8 minute read

I’ve been making my own chess engine both to sharpen up a bit on my C++ skills and because it is useful for a machine learning project I am doing involving chess. In particular I need to get the legal moves as quickly as possible into python to feed in as training data. I saw a warning never to try to write your own chess engine, so obviously I took the bait.

My first version is complete, and gives a big boost, but it still is seriously slow compared to expertly constructed engines. The way they achieve this is based on creating lots of lookup tables and using bit hacks on 64-bit wide words.

Update 2022-06-29: I ended up writing several versions: https://github.com/shaunharker/chessboard expanding upon ideas from what is written below


In chess engines one typically finds 64-bit wide unsigned integer types (uint64_t in C++) being understood as 8x8 bit matrices representing subsets of the 64 squares of the chessboard. We have to pick a convention, so we’ll make the square a8 correspond to the least significant bit, and then proceed left-to-right then top-to-bottom on the chessboard assigning the consecutive powers of two to the squares. In fact, this is precisely the story of the man who asked the king for a reward based on the following procedure: place one grain of rice on the first square a8, twice as many on b8, and so forth, all the way to h1.

A power of two s is identified with a square with s grains on it; more generally, a positive integer 0 <= x < 2**64 is taken to correspond to the unique set of squares that have precisely x grains on them when taken as a whole.

We can generalize this to nxn boards, where we want the total width of bits to be n**2, and allow 0 <= x < 2**(n*n) to represent a subset of an nxn board in the exact same manner.

Visualizing Bitboards

Big-endian digit ordering is the customary way people write integers, but if we want to visualize a 64-bit binary code as a subset of chessboard squares from White’s perspective according to our convention above, we need to reverse the bit order (i.e. make it little-endian digit ordering) and break it into 8 lines. (Or, more generally, n**2-bit broken into n lines.)

Here is python code that performs these contortions for us:

def little_endian(x, n=8):
    rep = bin(x)[2:]
    return ''.join(reversed('0'*(n*n-len(rep)) + rep))

def ascii_bitboard(x, n=8):
    return ''.join([ a+b for (a,b) in
      zip(little_endian(x, n), ['']*7 + ['\n'])])

def html_bitboard(x, n=8) -> str:
    bitstring = little_endian(x, n)
    chessboardstyle = """border-collapse: collapse; margin: 0 auto;"""
    tdstyle = """width: 25px; height: 25px; border: 1px solid #000;"""
    darksquare = """background-color: #d18b47;"""
    lightsquare = """background-color: #ffce9e;"""
    circlestyle = """width: 100%; height: 100%; background-color: black; border-radius: 70%; position: relative; top: 50%; left: 50%; transform: translate(-50%, -50%);"""
    circlediv = f"""<div style="{circlestyle}"></div>"""
    html = f'<div style="text-align: center;"><table style="{chessboardstyle}">'
    for i in range(n):
        html += '<tr>'
        for j in range(n):
            squarestyle = lightsquare if (i + j) % 2 == 0 else darksquare
            contents = circlediv if bitstring[i * n + j] == '1' else ""
            html += f'<td style="{tdstyle+squarestyle}">{contents}</td>'
        html += '</tr>'
    html += '</table></div>'
    return html

Bitboards as subsets of {0,1,...,n**2-1}

It is easy to see that an nxn bitboard can be used to represent a subset of set(range(n**2)) (i.e. {0,1,2,...,n**2-1}) by associating the least significant bit to 0, the next to 1, and so on.

To be precise:

def bitboard_from_set(S):
    return sum(2**s for s in S)

S = [2, 5, 8, 33]
x = bitboard_from_set(S)


Intersection, Union, and Symmetric Difference can be performed with &, |, and ^, respectively. Works out nicely.

ntz (number of trailing zeros)

We’d also like an efficient way to iterate over the elements of a set. One possible way to accomplish this is via a bithack for the number of trailing zeros function ntz. Equivalently, this gives the position of the least significant 1 in the binary representation (where indexing starts at 0).

The main idea to this end is that we can quickly compute the lowest element, remove it, compute the lowest element, remove it, etc. This is better than testing each of the 64 bits in for loop, particularly for bitboards with small population counts.

To isolate the least significant 1 in the binary sequence is easy: the formula ((x^(x-1)) >> 1) + 1 zeros out all 1’s that are not the least significant, leaving us with a power of 2. To finish, we need a quick way to take the base 2 logarithm of a power of two.

This is the basis for the following bithack for the number of trailing zeros (ntz) in order to read off minimal square in a bitboard. Henry Warren’s Hacker’s Delight book gives an algorithm based on De Bruijn sequences for 32-bits, and 64-bit implementations can be found in the wild. Here is the algorithm in python:

def ntz(x):  # for 0 <= x < 2**64
    return [
         0, 47,  1, 56, 48, 27,  2, 60,
        57, 49, 41, 37, 28, 16,  3, 61,
        54, 58, 35, 52, 50, 42, 21, 44,
        38, 32, 29, 23, 17, 11,  4, 62,
        46, 55, 26, 59, 40, 36, 15, 53,
        34, 51, 20, 43, 31, 22, 10, 45,
        25, 39, 14, 33, 19, 30,  9, 24,
        13, 18,  8, 12,  7,  6,  5, 63

Converting from square indexing {0,1,...,63} to bitboard representation and back is accomplished with the pair of functions f = ntz and g = lambda s: 1 << s. Calling these f and g respectively note that we have f(g(f(x)) == f(x), g(f(g(x))) == g(x)

popcnt (population count)

If we want to know the number of elements in the subset of {0,1,…,n**2-1} represented by a bitboard then we compute the so-called popcnt, which is the number of 1 bits in the binary representation. Again, there are bit hacks for this. A 64-bit popcnt algorithm written in Python is as follows:

def popcnt(x):  # 0 <= x < 2**64
    k1 = 0x5555555555555555
    k2 = 0x3333333333333333
    k4 = 0x0f0f0f0f0f0f0f0f
    kf = 0x0101010101010101
    x = x - ((x >> 1) & k1)
    x = (x & k2) + ((x >> 2) & k2)
    x = (x + (x >> 4)) & k4
    x = (x * kf) >> 56
    return x

Some modern processors have (will have?) popcnt and ntz instructions built-in (among others of the same flavor) for the machine word sizes.

Special Bitboards

Let’s distinguish some special bitboards that will appear. (Really, classes of bitboards for each n, but we take n=8 in our examples.)

The diagonal consists of the bits in the (n+1)*k positions for k in {0,...,(n-1)}.

def diagonal(n=8):
    return sum(2**((n+1)*k) for k in range(n))

print(diagonal())  # 9241421688590303745

The anti-diagonal consists of the bits in the (n-1)*(k+1) positions for k in {0, ..., (n-1)}.

def antidiagonal(n=8):
    return sum(2**((n-1)*(k+1)) for k in range(n))

print(antidiagonal())  # 72624976668147840

For an 8x8 bitboard, the eighth rank consists of the bits in positions 0, 1, …, 7, and so on to the first rank consisting of the last 8 positions. To generalize terminology, refer to the nth rank of an nxn bitboard as the end rank or top, and the first rank as the back rank or bottom.

For the 8x8 bitboard, the a-file consists of the bits in the positions 0, 8, 16, …, 56, the b-file of the bits in the positions 1, 9, 17, …, 57, and so on to the h-file. In the general case let’s define the left-file and the right-file.

Define columns to be files except with a differing indexing scheme: file a is column 0, file b is column 1, and so on.

Define rows to be ranks except with a different indexing scheme: rank 8 is row 0, rank 7 is row 1, and so on.


Rank 3:

def rank(r, n=8):
    return (2**n-1) << (n*(n-r))


Rank 8:

def top(n=8):
    return rank(n,n)



def bottom(n=8):
    return rank(1,n)


The a-file:

def left(n=8):
    return sum(2**(n*k) for k in range(n))


The h-file:

def right(n=8):
    return sum(2**(n*(k+1)-1) for k in range(n))


Using lookup tables

One of the applications one might use a bitboard for in a chess engine is as a masking function: proj = lambda x: x & mask. If the mask only allows eight bits through (i.e. popcnt(mask)==8) then there are only 256 values in the range of proj. What we would like is an efficient algorithm for computing a perfect hash (i.e. a bijection) f from the range of proj to the integers 0,1,…255, and we would prefer if the properties f(x|y)=f(x)|f(y), f(x&y)=f(x)&f(y), and f(~x)=~f(x) all held, i.e. that f was an isomorphism of Boolean algebras. We’d also like if f preserved order (or perhaps reversed it).

The approach I’ve cooked up here does not seem to be directly in Henry Warren’s Hacker’s Delight book, but it is inspired by the recurring trick (appearing in both ntz and popcnt above) of multiplying by “magic” numbers and rolling off all but the highest bit.

The idea is to look for functions of the form

lambda x: (magic*x)%(2**64) >> 56

Or, more generally, look for functions that can be built by combining bit shift operations and modular multiplications in $\mathbb{Z}_{2**64}$ in any order.

def modmul(x, y, m=(2**64)):
  return (x*y)%m

Mapping the bottom to the top:

f = lambda x, n: x >> (n**2-n)

Mapping the left to the bottom:

f = lambda x, n: modmul(x, antidiagonal(n), 2**(n**2))

Mapping the left to the bottom in reverse:

f = lambda x, n: modmul(x, diagonal(n), 2**(n**2))

It becomes clear that we can map the left to the bottom in any permutation we’d like by using 8x8 permutation matrices encoded as bitboards.

Mapping to the bottom:

f = lambda x, n: modmul(x, left(n), 2**(n**2))

We show an example mapping the bits of the diagonal to the bottom.

We show an example mapping the bits of the antidiagonal to the bottom.

Exercise. Show this works not just for diagonals but for any permutation matrix.

A full bit 8x8 bit matrix transposition algorithm is given in Section 7.3 of Hacker’s Delight:

y = (x & 0x8040201008040201)        |
    (x & 0x0080402010080402) << 7   |
    (x & 0x0000804020100804) << 14  |
    (x & 0x0000008040201008) << 21  |
    (x & 0x0000000080402010) << 28  |
    (x & 0x0000000000804020) << 35  |
    (x & 0x0000000000008040) << 42  |
    (x & 0x0000000000000080) << 49  |
    (x >>  7) & 0x0080402010080402  |
    (x >> 14) & 0x0000804020100804  |
    (x >> 21) & 0x0000008040201008  |
    (x >> 28) & 0x0000000080402010  |
    (x >> 35) & 0x0000000000804020  |
    (x >> 42) & 0x0000000000008040  |
    (x >> 49) & 0x0000000000000080

We note that 0x8040201008040201UL is our friend diagonal(8) from above.