Andrew

Programming Archives

Writing a (Tribes 1) Master Server

Posted on February 15, 2008 10:59 AM

After the hubbub over Sierra's announcement that they were ceasing multiplayer support for Tribes 1 and the resulting scramble to locate a replacement master server, I decided to give a shot at writing one. The required feature set appeared simple enough to only take a week or so to implement but with enough gotchas to keep it suitably interesting... read more

C++ Templates and Class Inheritance

Posted on May 17, 2007 1:03 AM

The following code is not legal C++:

template < class type >
struct A {
	void f() {}
	type mX;
};

template < class type >
struct B : public A<type> {
	void g() { mY = ( mX ); f(); }
	type mY;
};

The best part is that unless you know the obscure reason why it is not legal, it appears legal and might even compile and run perfectly depending on which compiler you're using. Not surprisingly, that is exactly how I ran in to it. I was doing templated class inheritance and thought I was in the clear because everything ran fine with MSVC7.1 and ICC 9, but when I belatedly tried to compile with g++ 3.4.4, I ran in to the following errors.. read more

UTF-8 Conversion Tricks

Posted on April 14, 2007 3:04 AM

UTF-8 is a wonderfully simple encoding format with some very nice properties, but the juggling required to convert to UTF-16, and UTF-32 can be a little tricky and fairly easy to do poorly. This is further compounded by the various error conditions you must keep an eye out for, such as overlong encodings, reserved ranges, surrogate markers, incomplete sequences, and so on.

These are a couple tricks you can employ to hopefully keep the conversion fast and robust.

Tail Length Lookup


Our first trick is to use a lookup table for the initial byte. This allows you to both a) tell whether the byte is valid (80 to bf and fe to ff are invalid leading bytes, as well as f5 to fd if you don't want to handle 5 and 6 byte sequences) and b) determine the number of trailing bytes in the expected sequence. We will also need the length of the sequence to quickly ensure there are enough bytes in left in the input as well as for other upcoming tricks, so this actually results in multiple wins.. read more

Breaking SuperFastHash (and chipping lookup3)

Posted on March 29, 2007 3:31 AM

After the problems SuperFastHash had in Hash Algorithm Attacks, I decided to try and break it completely, i.e. generate collisions algorithmically instead of brute forcing them. The attempt was more successful than I had anticipated, although Paul is obviously aware of the weak mixing in the final bits as evidenced by his comment in the source code, "Force 'avalanching' of final 127 bits". My favorite collisions encountered would have to be "10/4 < 3", "10/5 = 2", and "10/6 > 1", which have the property of hashing to the same value while being mathematically correct!

As I was writing this, I came up with a way to attack Bob Jenkins' lookup3 as well. Unlike SuperFastHash, the lookup3 attack is due to the way the input bytes are being read in and does not indicate a deficiency in the mixing itself. If you are using lookup3 with hash tables, the core function will still be quite safe; it will only need to be modified if you are using it to generate unique 64bit identifiers and the input data could be altered for a malicious purpose.

With that said, let's look at the attacks.. read more

MSVC rolls over

Posted on March 22, 2007 12:44 AM

If you are using MSVC7 or earlier and you want to rotate some uints, you'll be unpleasantly surprised when.. read more

Hash Algorithm Attacks

Posted on March 13, 2007 10:12 PM

After my previous post on When Bad Hashing Means Good Caching, I got to thinking about Crosby/Wallach's Denial of Service via Algorithmic Complexity Attacks paper and wondered if any of the algorithms I was looking at were susceptible to induced collisions. While I had never done this before and didn't expect much success, the results were a little surprising.

Method

The basic method I used is the one as described by Crosby and Wallach:

The hash functions used by typical programs for their hash tables are generally not cryptographically strong functions like MD5 or SHA-1. Instead, they tend to be functions with 32 bits of internal state, designed primarily for speed. Because this state is limited, we need only find inputs such that the internal state after hashing is the same as the initial state.

Consider a hash function with the initial state of 0. Imagine we can find generators, or inputs k1,k2, ... ki such that 0 = Hash(k1) = Hash(k2) = ... = Hash(ki). Then the concatenation of any number of these generators in any combination and any order also hashes to 0. So, k1k2 also hashes to 0, as will k1k1 or k2k1k3k2. Thus, by finding three inputs k1, k2, k3 via exhaustive search and concatenating them combinatorially, we can generate a large number of collisions without requiring any additional searching. The number of possible collisions is bounded only by the maximum length to which we are willing to allow concatenated generators to grow.

I adapted this general idea into the following series of steps... read more

When Bad Hashing Means Good Caching

Posted on March 7, 2007 1:54 AM

I was testing various string hashing algorithms on chained hash tables, primarily to look at the bucket distribution and number of key comparisons with both prime and power of 2 sized tables. Each table node was set up to remember it's full hash value so bucket collisions would only drop to a key comparison on a true key collision. I initially wasn't concerned with run times, but I tacked on a timer anyway so I could get a quick metric on how collisions and distribution were affecting performance and wound up running in to a rather odd situation. For the purposes of this article, the tables were also pre-sized before inserting keys to avoid resizing overhead.

While I was testing many algorithms, only two are needed for this article: The string hash from STLPort 4.6.2 and Paul Hsieh's SuperFastHash. STLPort's hash is simply "hash = ( hash * 5 ) + string[ i ]" which should be fairly fast but produce weak distribution on large amounts of clustered keys. SuperFastHash should also be quite fast while having much stronger hashing no matter the key size or clustering.

The dataset that caused the problem was a simple 6 character sequential permutation in the form of.. read more