26 March, 2024
by Mark Teisman
A common way to do the coin flip in A/B testing, or online controlled experiments, is to hash the user identifier together with a seed that is associated with the experiment instance.
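As a rough sketch of that idea, the snippet below hashes a seed and a user identifier and maps the result to a variant. The function name `assign_variant`, the `:` separator, the variant labels, and the use of SHA-256 from Python's standard library are all assumptions made for illustration; any hash function with suitable properties could stand in.

```python
import hashlib


def assign_variant(user_id: str, experiment_seed: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant for one experiment.

    Hypothetical helper: the name, separator, and choice of SHA-256
    are illustrative assumptions, not a prescribed implementation.
    """
    # Join seed and user id with a separator so that different
    # (seed, user) pairs cannot collapse into the same input string.
    key = f"{experiment_seed}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer with an explicit
    # byte order, then reduce it to a variant index.
    bucket = int.from_bytes(digest[:8], "big")
    return variants[bucket % len(variants)]
```

The same user always lands in the same variant for a given seed, while a different seed reshuffles the assignment, so experiments stay independent of each other.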
There are many algorithms that could be used to produce this hash. The main functional requirements for the hashing functions are
Then there are some optional considerations. For instance, is it critical that the hash function produces the same output when used across different architectures? Architectures can differ in word size (32-bit versus 64-bit) and in endianness (big-endian versus little-endian), and the outputs of a given hash function may differ between them. In general, I think it's fair to choose an implementation that delivers consistent hashes across architectures. That way you don't place any constraints on the compatibility of the clients you have now or in the future.
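To make the endianness point concrete, here is a small sketch of how the same hash bytes turn into different integers depending on the byte order used to read them. The helper name `bytes_to_bucket` and the sample bytes are made up for illustration; the takeaway is that stating the byte order explicitly, rather than relying on the machine's native order, keeps the result identical everywhere.

```python
def bytes_to_bucket(raw: bytes, byteorder: str) -> int:
    """Interpret raw hash bytes as an unsigned integer.

    Hypothetical helper for illustration only.
    """
    return int.from_bytes(raw, byteorder)


# Four example bytes standing in for the start of a hash digest.
sample = b"\x01\x02\x03\x04"

as_big = bytes_to_bucket(sample, "big")        # 0x01020304 == 16909060
as_little = bytes_to_bucket(sample, "little")  # 0x04030201 == 67305985

# The two readings disagree, so a client that silently used its native
# byte order could assign a user to a different bucket than another client.
print(as_big, as_little)
```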
As for which algorithms to choose from, here are some options you could consider.
From this list, I would probably pick xxHash, which is highly performant and meets all previously stated requirements. xxHash passes the SMHasher test suite, which includes performance tests, differential tests, avalanche tests and more.