# Chess Engine Bit Hacks

I’ve been making my own chess engine both to sharpen up a bit on my C++ skills and because it is useful for a machine learning project I am doing involving chess. In particular I need to get the legal moves as quickly as possible into python to feed in as training data. I saw a warning never to try to write your own chess engine, so obviously I took the bait.

My first version is complete, and gives a big boost, but it still is seriously slow compared to expertly constructed engines. The way they achieve this is based on creating lots of lookup tables and using bit hacks on 64-bit wide words.

Update 2022-06-29: I ended up writing several versions: https://github.com/shaunharker/chessboard expanding upon ideas from what is written below

## Bitboards

In chess engines one typically finds 64-bit wide unsigned integer types (`uint64_t`

in C++) being understood as 8x8 bit matrices representing subsets of the 64 squares of the chessboard. We have to pick a convention, so we’ll make the square a8 correspond to the least significant bit, and then proceed left-to-right then top-to-bottom on the chessboard assigning the consecutive powers of two to the squares. In fact, this is precisely the story of the man who asked the king for a reward based on the following procedure: place one grain of rice on the first square a8, twice as many on b8, and so forth, all the way to h1.

A power of two s is identified with a square with s grains on it; more generally, a positive integer `0 <= x < 2**64`

is taken to correspond to the unique set of squares that have precisely x grains on them when taken as a whole.

We can generalize this to nxn boards, where we want the total width of bits to be `n**2`

, and allow `0 <= x < 2**(n*n)`

to represent a subset of an nxn board in the exact same manner.

### Visualizing Bitboards

Big-endian digit ordering is the customary way people write integers, but if we want to visualize a 64-bit binary code as a subset of chessboard squares from White’s perspective according to our convention above, we need to reverse the bit order (i.e. make it little-endian digit ordering) and break it into 8 lines. (Or, more generally, `n**2`

-bit broken into `n`

lines.)

Here is python code that performs these contortions for us:

```
def little_endian(x, n=8):
rep = bin(x)[2:]
return ''.join(reversed('0'*(n*n-len(rep)) + rep))
def ascii_bitboard(x, n=8):
return ''.join([ a+b for (a,b) in
zip(little_endian(x, n), ['']*7 + ['\n'])])
def html_bitboard(x, n=8) -> str:
bitstring = little_endian(x, n)
chessboardstyle = """border-collapse: collapse; margin: 0 auto;"""
tdstyle = """width: 25px; height: 25px; border: 1px solid #000;"""
darksquare = """background-color: #d18b47;"""
lightsquare = """background-color: #ffce9e;"""
circlestyle = """width: 100%; height: 100%; background-color: black; border-radius: 70%; position: relative; top: 50%; left: 50%; transform: translate(-50%, -50%);"""
circlediv = f"""<div style="{circlestyle}"></div>"""
html = f'<div style="text-align: center;"><table style="{chessboardstyle}">'
for i in range(n):
html += '<tr>'
for j in range(n):
squarestyle = lightsquare if (i + j) % 2 == 0 else darksquare
contents = circlediv if bitstring[i * n + j] == '1' else ""
html += f'<td style="{tdstyle+squarestyle}">{contents}</td>'
html += '</tr>'
html += '</table></div>'
return html
```

### Bitboards as subsets of `{0,1,...,n**2-1}`

It is easy to see that an nxn bitboard can be used to represent a subset of `set(range(n**2))`

(i.e. `{0,1,2,...,n**2-1}`

) by associating the least significant bit to 0, the next to 1, and so on.

To be precise:

```
def bitboard_from_set(S):
return sum(2**s for s in S)
S = [2, 5, 8, 33]
x = bitboard_from_set(S)
html_bitboard(x)
```

#### logic

Intersection, Union, and Symmetric Difference can be performed with `&`

, `|`

, and `^`

, respectively. Works out nicely.

#### ntz (number of trailing zeros)

We’d also like an efficient way to iterate over the elements of a set. One possible way to accomplish this is via a bithack for the number of trailing zeros function `ntz`

. Equivalently, this gives the position of the least significant 1 in the binary representation (where indexing starts at 0).

The main idea to this end is that we can quickly compute the lowest element, remove it, compute the lowest element, remove it, etc. This is better than testing each of the 64 bits in `for`

loop, particularly for bitboards with small population counts.

To isolate the least significant 1 in the binary sequence is easy: the formula `((x^(x-1)) >> 1) + 1`

zeros out all 1’s that are not the least significant, leaving us with a power of 2. To finish, we need a quick way to take the base 2 logarithm of a power of two.

This is the basis for the following bithack for the number of trailing zeros (`ntz`

) in order to read off minimal square in a bitboard. Henry Warren’s *Hacker’s Delight* book gives an algorithm based on De Bruijn sequences for 32-bits, and 64-bit implementations can be found in the wild. Here is the algorithm in python:

```
def ntz(x): # for 0 <= x < 2**64
return [
0, 47, 1, 56, 48, 27, 2, 60,
57, 49, 41, 37, 28, 16, 3, 61,
54, 58, 35, 52, 50, 42, 21, 44,
38, 32, 29, 23, 17, 11, 4, 62,
46, 55, 26, 59, 40, 36, 15, 53,
34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30, 9, 24,
13, 18, 8, 12, 7, 6, 5, 63
][(0x03f79d71b4cb0a89*(x^(x-1)))>>58]
```

Converting from square indexing `{0,1,...,63}`

to bitboard representation and back is accomplished with the pair of functions `f = ntz`

and `g = lambda s: 1 << s`

. Calling these `f`

and `g`

respectively note that we have `f(g(f(x)) == f(x)`

, `g(f(g(x))) == g(x)`

#### popcnt (population count)

If we want to know the number of elements in the subset of {0,1,…,n**2-1} represented by a bitboard then we compute the so-called `popcnt`

, which is the number of 1 bits in the binary representation. Again, there are bit hacks for this. A 64-bit popcnt algorithm written in Python is as follows:

```
def popcnt(x): # 0 <= x < 2**64
k1 = 0x5555555555555555
k2 = 0x3333333333333333
k4 = 0x0f0f0f0f0f0f0f0f
kf = 0x0101010101010101
x = x - ((x >> 1) & k1)
x = (x & k2) + ((x >> 2) & k2)
x = (x + (x >> 4)) & k4
x = (x * kf) >> 56
return x
```

Some modern processors have (will have?) `popcnt`

and `ntz`

instructions built-in (among others of the same flavor) for the machine word sizes.

### Special Bitboards

Let’s distinguish some special bitboards that will appear. (Really, classes of bitboards for each `n`

, but we take `n=8`

in our examples.)

The *diagonal* consists of the bits in the `(n+1)*k`

positions for `k in {0,...,(n-1)}`

.

```
def diagonal(n=8):
return sum(2**((n+1)*k) for k in range(n))
print(diagonal()) # 9241421688590303745
html_bitboard(diagonal())
```

The *anti-diagonal* consists of the bits in the `(n-1)*(k+1)`

positions for `k in {0, ..., (n-1)}`

.

```
def antidiagonal(n=8):
return sum(2**((n-1)*(k+1)) for k in range(n))
print(antidiagonal()) # 72624976668147840
html_bitboard(antidiagonal())
```

For an 8x8 bitboard, the *eighth rank* consists of the bits in positions 0, 1, …, 7, and so on to the *first rank* consisting of the last 8 positions. To generalize terminology, refer to the nth rank of an nxn bitboard as the *end rank* or *top*, and the first rank as the *back rank* or *bottom*.

For the 8x8 bitboard, the *a-file* consists of the bits in the positions 0, 8, 16, …, 56, the *b-file* of the bits in the positions 1, 9, 17, …, 57, and so on to the *h-file*. In the general case let’s define the *left-file* and the *right-file*.

Define *columns* to be files except with a differing indexing scheme: file a is column 0, file b is column 1, and so on.

Define *rows* to be ranks except with a different indexing scheme: rank 8 is row 0, rank 7 is row 1, and so on.

#### examples

Rank 3:

```
def rank(r, n=8):
return (2**n-1) << (n*(n-r))
draw_html(rank(3))
```

Rank 8:

```
def top(n=8):
return rank(n,n)
draw_html(top())
```

Rank-1:

```
def bottom(n=8):
return rank(1,n)
draw_html(bottom())
```

The a-file:

```
def left(n=8):
return sum(2**(n*k) for k in range(n))
draw_html(left())
```

The h-file:

```
def right(n=8):
return sum(2**(n*(k+1)-1) for k in range(n))
draw_html(right())
```

## Using lookup tables

One of the applications one might use a bitboard for in a chess engine is as a masking function: `proj = lambda x: x & mask`

. If the mask only allows eight bits through (i.e. `popcnt(mask)==8`

) then there are only 256 values in the range of `proj`

. What we would like is an efficient algorithm for computing a *perfect hash* (i.e. a bijection) `f`

from the range of `proj`

to the integers 0,1,…255, and we would prefer if the properties `f(x|y)=f(x)|f(y)`

, `f(x&y)=f(x)&f(y)`

, and `f(~x)=~f(x)`

all held, i.e. that `f`

was an isomorphism of Boolean algebras. We’d also like if `f`

preserved order (or perhaps reversed it).

The approach I’ve cooked up here does not seem to be directly in Henry Warren’s *Hacker’s Delight* book, but it is inspired by the recurring trick (appearing in both `ntz`

and `popcnt`

above) of multiplying by “magic” numbers and rolling off all but the highest bit.

The idea is to look for functions of the form

```
lambda x: (magic*x)%(2**64) >> 56
```

Or, more generally, look for functions that can be built by combining bit shift operations and modular multiplications in $\mathbb{Z}_{2**64}$ in any order.

```
def modmul(x, y, m=(2**64)):
return (x*y)%m
```

#### Mapping the bottom to the top:

```
f = lambda x, n: x >> (n**2-n)
```

#### Mapping the left to the bottom:

```
f = lambda x, n: modmul(x, antidiagonal(n), 2**(n**2))
```

#### Mapping the left to the bottom in reverse:

```
f = lambda x, n: modmul(x, diagonal(n), 2**(n**2))
```

It becomes clear that we can map the left to the bottom in any permutation we’d like by using 8x8 permutation matrices encoded as bitboards.

#### Mapping to the bottom:

```
f = lambda x, n: modmul(x, left(n), 2**(n**2))
```

We show an example mapping the bits of the diagonal to the bottom.

We show an example mapping the bits of the antidiagonal to the bottom.

**Exercise.** Show this works not just for diagonals but for any permutation matrix.

## Related

A full bit 8x8 bit matrix transposition algorithm is given in Section 7.3 of *Hacker’s Delight*:

```
y = (x & 0x8040201008040201) |
(x & 0x0080402010080402) << 7 |
(x & 0x0000804020100804) << 14 |
(x & 0x0000008040201008) << 21 |
(x & 0x0000000080402010) << 28 |
(x & 0x0000000000804020) << 35 |
(x & 0x0000000000008040) << 42 |
(x & 0x0000000000000080) << 49 |
(x >> 7) & 0x0080402010080402 |
(x >> 14) & 0x0000804020100804 |
(x >> 21) & 0x0000008040201008 |
(x >> 28) & 0x0000000080402010 |
(x >> 35) & 0x0000000000804020 |
(x >> 42) & 0x0000000000008040 |
(x >> 49) & 0x0000000000000080
```

We note that `0x8040201008040201UL`

is our friend `diagonal(8)`

from above.