In his “Practical Security” column in the May/June issue of IEEE Internet Computing magazine, Stephen Farrell talks about the strength of cipher suites, and the fact that overall strength isn’t simply measured by whichever key length the user happens to be aware of. It’s a useful thing to understand if you’re responsible for implementing encryption, and it’s something we often get wrong, using strong asymmetric encryption up front but weaker symmetric encryption behind it, or the other way around.

As often happens, though, when we try to state things simply, he says something that’s a little confusing, which I’d like to look at more closely here. In talking about the strength of the hash (digest) algorithm, Stephen says this:

Similarly, if a digest operation’s output were small, say only 32 bits long, then an adversary could simply guess possible inputs until it found a matching output, probably only after roughly 65,000 attempts.

The number, 65,000, comes from the fact that 16 bits represent about 65,000 possible values (65,536, to be exact), and that 16 bits are half as many as 32 bits. On the surface, though, this looks like an error. If we were picking random values from a 32-bit range (which represents a bit more than 4 billion possible values), we’d expect the average number of attempts to crack it to be half the number of values, not half the number of bits: 2^31, or about 2 billion attempts, not 2^16, or about 65,000.
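To make that intuition concrete, here’s a minimal sketch in Python (the function name and the small 16-bit space are my choices, so it runs quickly): it tries candidates in random order, without repeats, until it hits a fixed target, and the average lands near half the size of the space.

```python
import random

def average_guesses(bits, trials=200):
    """Average number of guesses needed to find one fixed target in a
    2**bits space, trying each candidate once, in a random order."""
    candidates = list(range(2 ** bits))
    total = 0
    for _ in range(trials):
        target = random.choice(candidates)
        random.shuffle(candidates)
        total += candidates.index(target) + 1
    return total / trials

# A 16-bit space averages close to 2**15 (about 32,768) guesses:
# half the number of *values*, which is where the intuition comes from.
print(average_guesses(16))
```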

The trick, though, is that the hash algorithms we’re using have a theoretical collision-resistance strength equal to (at most) half the number of bits in their output. So, theoretically, the strength of a hash algorithm that’s similar to SHA but has a 32-bit output would be 16 bits. And that means that a brute-force attack would crack it in an average of about 65,000 attempts.
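You can watch this happen in practice. The sketch below truncates SHA-256 to 32 bits (the truncation, standing in for Stephen’s hypothetical hash, and the function name are my own choices) and hashes distinct inputs until two of them share a digest; it typically finishes after tens of thousands of attempts, around 2^16, not the 2 billion a “half the values” intuition would predict.

```python
import hashlib
import itertools

def find_collision(bits=32):
    """Hash distinct inputs until two of them share the same truncated
    digest; return the colliding pair and the number of attempts."""
    mask = (1 << bits) - 1
    seen = {}
    for i in itertools.count():
        msg = str(i).encode()
        digest = int.from_bytes(hashlib.sha256(msg).digest(), "big") & mask
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg

first, second, attempts = find_collision(32)
print(f"collision between {first!r} and {second!r} after {attempts:,} attempts")
```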

But why is the strength only equal to half the number of bits in the first place?

The fault lies in our intuitive, but incorrect, expectation of the rate at which random values collide. We’re mapping all plain-text messages into the set of numbers represented by 160 bits for SHA-1, or, in Stephen’s hypothetical example, 32 bits. Let’s simplify it a little, and consider a similar situation. Let’s map all people into the set of numbers from 1 to 366.

In other words, let’s consider people and their birthdays.

The “birthday paradox” — which isn’t really a paradox — is an unexpected result that I talked about in my first “Paradox” post. It hinges on the idea that we expect to need a large number of people before it’s likely that two have the same birthday — that we get, in hash terms, a collision — but that, in fact, we need relatively few.

Comparing it to our hash expectation, we’d figure we’d need 366 / 2, or 183 people, to have a 50% chance of a collision. In fact, though, we only need 23. Thinking of that in terms of bits, it’s a difference between an expected 7.5 bits and an actual 4.5 bits. (You can get lots more information about the birthday paradox at the Wikipedia entry.)
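The arithmetic is easy to check for yourself. This little Python calculation multiplies out the probability that everyone’s birthday is distinct, using the 366 days from our mapping:

```python
def no_shared_birthday(people, days=366):
    """Probability that `people` randomly chosen birthdays are all distinct."""
    p = 1.0
    for k in range(people):
        p *= (days - k) / days
    return p

for n in (10, 23, 183):
    print(f"{n:>3} people: {1 - no_shared_birthday(n):.3f} chance of a shared birthday")
# 23 people is already just past the 50% mark; 183 makes a
# collision a near-certainty, not a coin flip.
```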

The effect here is the same. Because the probability that some pair among our randomly chosen source strings hashes to the same value climbs rapidly, not in the linear way we expect, the strength of a hash algorithm with an output of n bits turns out to be not 2^n / 2, but, rather, 2^(n/2): not half the number of values, but half the number of bits.
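Plugging in the output sizes we’ve been discussing shows how far apart those two readings are (a quick check in Python, whose integers handle the big values directly):

```python
for n in (32, 160):
    print(f"{n}-bit output:")
    print(f"  half the values: 2**{n - 1} = {2 ** (n - 1):,}")
    print(f"  half the bits:   2**{n // 2} = {2 ** (n // 2):,}")
# For 32 bits: roughly 2 billion versus roughly 65,000 -- Stephen's number.
# For SHA-1's 160 bits: 2**159 versus 2**80, the commonly quoted strength.
```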