Table Size

When choosing a table size (capacity), it is recommended to pick a prime number at least as large as the desired capacity.

Why? There is an intuitive explanation on StackExchange. The general idea is that we compute the index as

// Compress hashcode to index range [0, M-1].
// Note: key.hashCode() can be negative in Java, so a real implementation
// should use Math.floorMod(key.hashCode(), M) to keep the index non-negative.
index = key.hashCode() % M;

Assume the value returned from key.hashCode() is $K$, and let $d$ be a common factor of $K$ and $M$. Writing $K = qM + r$, where $r$ is the resulting index, $d$ divides $r = K - qM$, so the key is hashed to a position that is a multiple of $d$. Therefore, to minimize collisions, we want to reduce the number of common factors between $M$ and $K$, which we do by choosing $M$ to be a number with very few factors: a prime number.
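To make this concrete, here is a minimal runnable sketch (the capacity 12 and the particular hash codes are arbitrary choices for illustration): every hash code that shares the factor 4 with $M = 12$ lands only on indices 0, 4, or 8.

public class CommonFactorDemo {
    public static void main(String[] args) {
        int M = 12;                                 // composite capacity: factors 2, 3, 4, 6
        int[] hashCodes = {4, 8, 16, 20, 28, 40};   // all share the factor 4 with M
        for (int k : hashCodes) {
            // Each index is a multiple of gcd(k, M) = 4, so only
            // slots 0, 4, and 8 of the 12 slots are ever reachable.
            System.out.println(k + " -> index " + (k % M));
        }
    }
}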

In particular, when you rehash, just doubling the table capacity gives you an even number. If the table capacity $M$ is even and key.hashCode() is also even, then the index will be even as well. This biases the entries toward the even indices, so only half of the table positions may ever be used.
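The bias is easy to observe with a small sketch (the capacities 16 and 17 and the sample hash codes are illustrative assumptions, not from the original): the same even hash codes occupy only even indices when $M = 16$, but spread across both even and odd indices when $M = 17$, a prime.

public class DoublingBiasDemo {
    public static void main(String[] args) {
        int[] evenHashes = {2, 4, 6, 8, 10, 12, 14, 18, 22, 26};

        System.out.print("M = 16 (even):  ");
        for (int k : evenHashes) System.out.print(k % 16 + " ");  // only even indices appear
        System.out.println();

        System.out.print("M = 17 (prime): ");
        for (int k : evenHashes) System.out.print(k % 17 + " ");  // indices of both parities
        System.out.println();
    }
}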

When it comes to implementing the hash table, a common practice is to tabulate primes $p_1, p_2, \dots, p_k$ (for some choice of $k$) and use them as a look-up table to pick the table size. Something like this:

int[] primes = {
    2, 5, 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421,
    12853, 25717, 51437, 102877, 205759, 411527, 823117,
    1646237, 3292489, 6584983, 13169977
};

If this strategy is implemented, the table capacity is either capped at $p_k$, or the implementation simply switches to doubling the capacity once it has reached (and surpassed) $p_k$.
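One way this fallback might look (a sketch; the nextCapacity name and the exact doubling rule are illustrative assumptions, not from the original):

public class PrimeCapacity {
    static final int[] PRIMES = {
        2, 5, 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421,
        12853, 25717, 51437, 102877, 205759, 411527, 823117,
        1646237, 3292489, 6584983, 13169977
    };

    // Return the smallest tabulated prime >= needed; once the table
    // is exhausted, fall back to plain doubling (no longer prime).
    static int nextCapacity(int needed) {
        for (int p : PRIMES) {
            if (p >= needed) return p;
        }
        int capacity = PRIMES[PRIMES.length - 1];
        while (capacity < needed) capacity *= 2;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(100));        // 197
        System.out.println(nextCapacity(20_000_000)); // 26339954, doubled past the table
    }
}

Note that capacities produced past $p_k$ are no longer prime; that is exactly the trade-off described above.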

Resources