Andrew

February 2008 Archives

Writing a (Tribes 1) Master Server

Posted on February 15, 2008 10:59 AM

While I wrote this in September 2007, for various reasons I did not get around to putting the finishing touches on it. Please pretend you're reading it then and not now!

After the hubbub over Sierra's announcement that they were ceasing multiplayer support for Tribes 1 and the resulting scramble to locate a replacement master server, I decided to give a shot at writing one. The required feature set appeared simple enough to only take a week or so to implement but with enough gotchas to keep it suitably interesting. While I only had a vague idea of what was required, I got a jump start on proper design by finding Half-Life and Team Fortress Networking: Closing the Loop on Scalable Network Gaming Backend Services by Yahn W. Bernier, an article detailing the design, implementation, and potential problems of the Half Life master server. Even though some of the topics did not apply to the Tribes 1 requirements, e.g. I can't alter the client's behavior to auto rate-limit the server list transmission, the article was still quite valuable and an interesting read even if you aren't implementing a master server.

Getting Started

My initial server was as basic as you can get: a hash table of servers behind a udp socket which wakes up periodically to time out servers. Tribes servers do not send a "going offline" packet, but as their heartbeat period is every 2 minutes the server will not remain in the list for long.

If the intended environment was a closed setting with a limited number of servers to track, this would be more than sufficient, but in the real world it is vulnerable to many issues and attacks, intentional or otherwise.

Issues and Attacks

Bogus Servers: The ability to add bogus servers is probably the most damaging problem due to the ability to inflate the server list which makes it possible to easily saturate the master server's upload by requesting the server list repeatedly. The two ways of adding bogus servers are either by IP spoofing or simply sending thousands of keepalives from your IP with different ports. The Half Life Networking article goes in to this problem in depth and offers the solution of a challenge-response system where the master server, on receiving a heartbeat, sends a random value to the server and only allows the heartbeat to be registered if the server sends the correct value back.

While the challenge/response system solves the problem of IP spoofing completely, it is still trivial to listen and respond to the challenges on your real IP. To defeat this attack, it is necessary to limit each IP to a certain number of registered servers. This way, even if you can intercept and respond to 20,000 challenges, only X will be accepted and the server list will be no worse for the wear. To be honest, limiting an IP to a low number of servers, say 3-5, should be in place regardless of the chance for spam. The only reason to run multiple servers on a single IP is either NAT or that the additional servers are backups/rarely used. Server CPU lag is one of the worst experiences to have online, although many mods run so poorly they tend to anticipate this fact and over-compensate with ridiculous weapons and items which leave skill completely out of the equation. That is a topic for another day though.

Of note is the fact that while Tribes network protocol does include a sequence number which can be used as a challenge, it is only 16 bits wide. To prevent brute force attacks (however unlikely), the challenge-response value should have a timeout of a few seconds at most.

Upload Starvation: If an IP is spamming list requests to the master server, it might be possible to saturate the master server's upload even with a relatively small server list. I wanted to be able to address the problem of spamming requests without requiring human intervention, but at the same time not penalize legitimate users who may click the refresh button a few times too often or have multiple players behind NAT.

The solution, ironically, came from Tribes. I implemented a penalty per IP system similar to the in-game chat spam penalty. Each request accrues a certain amount of penalty, and when the penalty reaches a certain limit the master server stops responding to that IP. The penalty is decreased by 1 for each second that passes, allowing the client to eventually access the master again. I also added a maximum penalty cap (currently at 10 seconds) so that bans will be over fairly quickly once the IP ceases to spam.

Unintentional flooding: Because Tribes has no auto-rate limiting mechanism to control the server list download speed, it's possible the master server could either upload the server list faster than the client can handle or saturate it's own upload by firing off too many packets at once when multiple requests come in. The obvious solution is for the master server to throttle it's upload, i.e. queuing up packets instead of sending them off instantly. While a simple rate limited FIFO queue would keep the server's upload from becoming saturated, it would also mean that after a certain point the packets on the tail end of a large queue would be timed out by the client before they had a chance to arrive. This means rate limiting needs to be done per-connection and not globally.

Tribes, however, throws a wrench in to how low you can rate limit a connection. When the first packet of a master server response arrives, Tribes creates X pending "ping" responses to wait for, where X is the number of packets left in the master server response. Because both the master server list request and server ping packets are the same (\x10\x03), Tribes piggybacks the master server list request in the server pending ping list and simply re-pings any packet in the list which times out. Unfortunately, the timestamps for a particular set of master server response packets are never updated when succeeding packets come in, meaning that if Tribes does not receive every master server packet within $pref::pingTimeoutTime milliseconds (default 900ms), it sends a new request for each remaining packet in the list.

So lets say the server list contains 3000 servers, and the master server batches 64 servers per packet. This would result in 47 packets at around ~450 bytes each, or 21k of data. If the master server uploads at 4k/s to the client, only 4k*0.9s = 3.6k of data will make it to the client before the remaining 38 packets are timed out. Tribes would then see 38 "pings" that timed out and re-ping each one, updating their packet keys and timestamps so that the packets still coming in from the previous request would be discarded. If Tribes hits $pref::pingRetryCount on any single master server packet, it will decide it can't contact a master server and either move on to the next master or pop up an error box. This unfortunately means the per-client rate limit has to be set high enough to make sure the server list can be transmitted within a couple seconds at most, or 900ms to be safe. Assuming a maximum of 500 servers, or ~4k of data, 5k/s per client should be a decent limit.

Due to a poor choice of data structures, Tribes itself limits how large the list can grow. The server list and pending ping list are stored in vectors which grow by 5 items at a time!, leading to tons of potential copying on vector resizes and painful O(N) lookups. When the "to ping" list reaches ~6000 servers, Tribes locks up for longer than I care to wait. This is exacerbated when you raise $pref::pingTimeoutTime to accommodate larger server lists, leading to a realistic ceiling of around 4000 servers regardless of the timeout bug.

Unresolved Attacks

Because Tribes has no challenge-response mechanism available for server list requests, there is nothing to be done about Upload Starvation by IP spoofing. Please try not to annoy anyone who has the ability to spoof and the desire to spam your master server!

Mirrors

I thought about doing something with mirrors, but there really is no good reason to bother.

1. CPU load is a non-issue. Even if there were 50,000 live servers sending heartbeats every 2 minutes, that's only 415 heartbeats a second. A heartbeat requires 1 hashtable lookup for the IP penalty, 1 hashtable lookup for the server table, and 3 hashtable lookups for the pending servers table if the server doesn't exist. A worst case of 5 hash table lookups * 415 = 2075 hashtable lookups a second which would not even register on a load monitor.

Using a separate process to spam my master with heartbeats, I can get up to ~50,000 heartbeats a second without sending a challenge packet and ~35,000 heartbeats sending the challenge (and using ~1000kb/s up in the process). Using a separate machine to spam heartbeats peaks at around ~14,000 a second with around ~30% CPU load for the master server.

A "realistic" test of 18,000 servers, 22 server list requests a second, and 450 heartbeats a second resulted in ~3-10% CPU load and 2,800 kb/s upload. Memory usage peaked at around 40mb, largely due to the 530 concurrent 120k server lists being served from memory at 5kb/s. Reducing the server count to 1,000 while keeping the requests and heartbeats the same results in no measurable CPU load and 900k peak memory usage.

2. Bandwidth, while an issue with a community supporting 50,000 servers, is not an issue with the Tribes community.

  • There are currently 121 active servers in the current master list.
  • According to archive.org's snapshot of gamespy's stats page, Tribes only had ~250 servers up in July 2004.
  • Going back even farther is one of Tim Sweeney's old news posts from October 1999, showing 589 servers during Tribes' heyday.

Lets do the math:

        Servers    Bytes / List     Req/s 40kb up   160kb up
1999        589            4323              9.47      37.90
2004        250            1950             21.01      84.02
2007        121            1047             39.12     156.49

Even a crappy cable modem can handle 39 reqs/s with the current list, and a T1 could do just as well with the server count from 1999. Obviously it makes more sense to find a host that will stay up than to make a tangled net of mirrors. There are also other problems with unorganized mirrors such as all of the mirrors a server reports to going down and the fact that the Tribes client queries mirrors sequentially instead of randomly.

If a host can't be found that is guaranteed to be up, then finding 5 such hosts and needlessly complicating the master server isn't going to make the system more robust.

Code!

The only thing I don't really like about the current code is the packet queuing. While the current server can obviously scale quite high, it should be possible to construct packets on the fly instead of batching them up in memory. The problem to be solved is how to keep multiple iterators in a hash table which can have elements added/removed at any time, and since the master has to tell Tribes how many packets it will be sending, to not send any additional servers which are added after the time of the list request. I have some ideas, but I should probably post this first instead of continuing to put it off.

The code is cross-platform, although I don't have anything other than Linux to test on so you will probably need to jump through some hoops to compile on BSD or OSX.

All of the tunable parameters are at the top of t1master.cpp and should be fairly self explanatory. I didn't want any external dependencies so the MOTD is "hardcoded", although you can pass it on the command line. I also didn't see a point in backing up the server list at any point since the list will fully regenerate itself in 2 minutes. Note that I do randomize the hash seed so even if a spoofer floods the server with bogus IPs, he won't be able to logjam the hash tables with collisions!

t1spam.cpp is the program I used for stress testing by hammering the server with heartbeats and listreqs. You should see the lines to comment/uncomment in t1master.cpp(masterserver::process) and optionally in t1master.cpp(main) to simulate random request sources to get realistic loads for the hash tables and penalties.

Tribes 1 Physics, Part One: Overview

Posted on February 20, 2008 11:04 PM

Games have been trying replicate the feel of Tribes 1 for almost as long as it's been out (Tribes 2 started development in mid-1999) and every single one of them has failed, usually miserably.

  • Tribes 2 physics are an abomination, although in hindsight it should have come as no surprise after the Base+ mod Dynamix play-tested in Tribes 1 to massive disdain. It's as if they were trying to build on the success of Tribes 1 when it was 2 weeks old, not 2 years. Unfortunately for the majority of the community not in the beta, we didn't find this out until after the game was paid for. Further mods such as Base++, Team Rabbit 2, and Classic attempted to rectify the situation yet were still only a pale ghost of Tribes 1.
  • Legends has always felt wrong despite how often they tweaked and twiddled the physics and boasted of having the original physics source code. If they did have the source, they either didn't know how to implement it or didn't have enough of the source to properly replicate all of the required physics.
  • Tribes Vengeance is so far removed from the feel of Tribes that it shouldn't even enter the discussion. Jetting is wrong, air movement shouldn't even exist, collisions are wrong, and skiing is a sick joke. I can only hope KineticPoet remembered what used to be and silently winced every time he sat down to work on the game.
  • The yet-to-be-released Fallen Empire: Legions appears to be following the "We're not a Tribes 1 clone, so let's make wacky changes to ram that home" mantra. Ideas like 6 way jetting (jetting down while in the air and laterally while on the ground), non-friction sliding a-la T:V, jetpack overdrive, a charge up sniper rifle, etc., sound like they could easily alter the game beyond good taste.

What all of these games ostensibly want is to appeal to Tribes 1 players, yet they attempt to accomplish this by using a different and/or completely arbitrary physics system, adding something that resembles a jetpack and skiing and hoping everyone likes it. While I don't know if a carbon copy of Tribes 1 on a modern engine would be a success, I do know that almost any Tribes 1 veteran will be unsatisfied with any Tribes style game that does not replicate the feel of Tribes 1 regardless of how often the developers hide behind the claim of "not Tribes 1".

These articles will solve that problem. I will provide everything needed for a 99% re-creation of the Tribes 1 physics on any 3d engine. There are a few pieces to the puzzle so I'll be breaking the topics in to separate articles for movement, collision, and explosions. This article will go over the basics of the engine and document the structures and constants I'll be using.

Basics

To start off, all units will be in meters, the default unit of Tribes. You will need to provide a separate function to convert the Tribes meters to your engine's native units, e.g. if 1 unit in your engine of choice is roughly equal to an inch, you will need to convert from Tribes meters to inches, i.e. multiply ~39.37. This will ensure that the authentic Tribes 1 constants are being used and that nothing has been messed up to due to being mis-scaled by hand. I will use MetersToUnits and UnitsToMeters to indicate when a conversion needs to be made.

Tribes runs at 31.25 ticks a second, or 0.032s per tick. Altering the ticks per second will result in slightly different physics as the gravity is accumulated based on the tick length while jumping is a singular impulse, so as the tick length grows, gravity slowly subsumes the jump impulse. There are other less obvious calculations based on the tick length which will be altered as well; some of these can be accounted for, some not.

The coordinate system is Z up, and Vectors are assumed to have overloaded operators.

All code examples will be in a pseudo C-like language and should be easy to translate in to any engine.

Physics Workflow (details omitted)

There is nothing special in Tribes 1 physics, skiing is not achieved by "Friction = 0", and the only feature an engine needs is the ability to properly detect collisions. There are a lot of nuances required to achieve a simulation that doesn't feel subtly wrong, but none are of the earth-shatteringly unique variety that many claim only the Tribes/Torque engine can replicate. They are, however, nuances that you are highly unlikely to match with pure guesswork.

Structures

These are the structures and values I will be using for detailed explanations of the physics workflow. Don't worry if you don't know what a value does just yet.

Tribes 1 Physics Series

Tribes 1 Physics, Part Two: Movement

Posted on February 24, 2008 11:48 PM

This article covers the movement physics of Tribes 1, i.e. jumping, jetting, and ground movement/friction. These account for ~90% of the "Tribes" feeling, although even 90% will still feel wrong to anyone who knows the authentic feel well. There will be some variables which are only set in the Collision code, but there shouldn't be any confusion as to what they do.

Movement Code

There are a few spots where Tribes took shortcuts with it's math and assumed that gravity would always be "0 0 -X", saving a couple multiplies on dots and such. I've replaced these spots to be gravity agnostic as I don't think the CPU savings are that big and they make the code difficult to understand. It shouldn't be hard to "re-optimize" if you feel the need. Dot products will be a .Dot method because overloaded multiplication operators confuse me.

Player.Tick

Player.Tick starts off by taking the user's movement input (left, right, forward, back) and creating the speed vector for walking and direction vector for jumping and jetting. This is pretty basic and there is nothing out of the ordinary here.

After this, the Player energy is updated, we check lastJumpableNormalTimestamp to see if a jump should be allowed, and then do the jumping, jetting, and walking. Most of the real work takes place in Jump, Jet, and Friction, so the tick function is fairly sparse.

A few things to note:

  • Tribes lets you jump up to 256ms after your last ground contact, allowing you to jump smoothly through little bumps and such where you technically leave the ground, but don't really appear to.
  • Tribes handles jetpack energy a little differently, i.e. once you get to around 5% of your jets, they cut off and recharge a little causing the jets to stutter. I haven't worked this part out yet, but may re-edit this section if I do.
  • Tribes applies gravity constantly, it is not hacked off when you are resting on a surface.
  • JETENERGY_CHARGE equals "8 + 3" with the "+ 3" being the recharge boost from an energy pack.
Player.Jump
Player.Jet

Side jets only kick in if the player is holding down a movement key and it's been longer than MAXJUMPTICKS since the last ground contact. Since jumping sets lastJumpableNormalTimestamp to the limit, jumping and jetting results in side jets being enabled immediately, while simply holding down your jets on the ground will give a little startup time of full jets regardless of whether a direction key is down.

Vector.ProjectOntoPlane

This is needed for the gravity agnostic Friction function. I don't know how common it is.

Player.Friction

Multiplying GROUNDTRACTION by currentFriction is unfortunately not the magical "Friction = 0" that supposedly causes skiing. currentFriction is decayed every tick when there hasn't been a ground contact, resulting in the ability to jet and slide against walls and ceilings without being slowed down by the contact friction.

ProjectOntoPlane is probably not needed since the player should always be oriented so the move will never include a component not in the gravity plane, but it doesn't hurt to include it.

Tribes 1 Physics Series