Compressing FEN with move and game details

Kumar Gaurav
Compressing FEN with move and game details

A position in chess is denoted using a FEN. While writing a Chess Database if you want to store moves that were played on a given position then you will have to save the FEN as well. 

 

Part 1: How to write a cost-effective database for chess from scratch

Part 2: Compressing a PGN File

Part 3: Compressing FEN with Moves and Game Details

 

A typical FEN looks like “rnbqkbnr/pppp1ppp/4p3/8/3P4/8/PPP1PPPP/RNBQKBNR w KQkq - 0 2”. This Fen denotes the below position

A Typical Position in Chess

This is part 3 of the “How to Write a cost-effective database for Chess from scratch” Series. Here I will be discussing the algorithm to compress FEN like the one I mentioned above.

Why do we need FEN compression?

We can store FEn without compression as well but that'll take huge space, let me tell you how.

In a typical game, Endgame usually starts after the first 50% of the moves. which means during the first 50% of the moves, most of the pieces are there on the board, and that led to a longer FEN string. On average, a FEN is 50 characters long.

Consider a database of 1 Million games of average 40 moves.

for 40 moves there will be 80 positions meaning 80 FENS

almost 20% of the position will repeat in the games so unique fens could be as below

80 * 10,000,000  * 0.2 = 160,000,000 

and space required to store these FENS alone would be 160,000,000 * 50 = 800,000,000 bytes which is close to 1GB.

So just saving FENS would be close to 1 GB for 1 Million games but you will have to save moves, the number of times the move was played in a given position along with statics of how many times White won or black or drawn. And also the ids of the games. From personal experience, I can tell, for a Million games it would easily take up to 2 GB of space. Now consider a Big database with 5-6 Million games, only FEN storage will take around 7-8 GB, and hence Cost of space will increase. Hence we need to compress at least the FEN if not other pieces of information like move game id etc. 

But before we go about compressing FEN we need to understand how FEN is constructed

 

Understanding FEN

A FEN has 6 parts

  • Position of pieces on the board
  • Turn
  • Castling options
  • En-passant square
  • Number of half moves from the last pawn move
  • Total number of full moves

The position of pieces on the board is divided into 8 parts separated by “/”. Each part denotes the rank, starting from the 8th rank then the 7th then the 6th, and so on.  So compressing a FEN is essentially storing the position of pieces on the board along with the other 5 pieces of information in the minimum possible space. there's a great discussion on the StackExchange for the same. Smallest chess board compression

I highly recommend checking this discussion.

The compression algorithm

 

Storing Board state

We consider the entire chess board as an 8x8 boolean matrix with false on an empty square and 1 on occupied squares. So the corresponding matrix for the above position would be as below

1 1 1 1 1 1 1 1

1 1 1 1 0 1 1 1

0 0 0 0 1 0 0 0

0 0 0 0 0 0 0 0

0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 0

1 1 1 0 1 1 1 1

1 1 1 1 1 1 1 1

Now make this matrix a 1D array of 64 booleans.

 

Storing position of Pieces

As explained in the PGN compression we can store each piece using 4 bits, so we will create another array of length equal to a number of 1s in the board array multiplied by 4. We will store corresponding bits of pieces in that array. With this step, we will be able to store the first part of the FEN (Board state and Position of pieces) in very few bits.

 

Storing Turn

The turn can be only two so only 1 bit would be sufficient for this, 0 for black and 1 for white (or vice versa).

 

Storing Castling Option

Castling options are denoted below

For White:

K - only king-side castling is possible

Q - only queen-side castling is possible

KQ - both side castling is possible

"-" - No castling is possible

For Black:

k - only king-side castling is possible

q - only queen-side castling is possible

kq - both side castling is possible

"-" - No castling is possible

NOTE: Please note that castling rights for black and white are differentiated by the case of the alphabet. For black it's lowercase.

Hence both side castling possibilities are limited to 4 values and 2 bits are sufficient for storing castling right. hence an array of 8-bit would store castling rights for black and white.

 

Storing En-passant square:

If you are unsure about en-passant, please check “What is en-passant” in FAQ. En-passant can only take place on 16 squares, 8 on 3rd rank and other 8 0n 6th rank. and these 16 values can be 4 bits, but since we also need to take care of no en-passant we will have to take an extra bit hence we will need 5 bits.

 

Storing a Total number of half moves since the last pawn move:

A chess game is considered drawn if no pawn is moved for the last 50 moves, hence this value is limited to 100 values. and 100 can be written using 7 bits. therefore we will need 7 bits for storing this value.

 

Storing the Total number of moves

Mathematically I am not sure how long a chess game can go following all standard chess rules. But practically this value shouldn't be more than 256 and hence we will need 8 bits more for this value.

 

Max number of bits taken for storing a FEN: 

64 (for board) + 128 ( for pieces) + 1 (for turn) + 4 (for castling) + 5 (for en-passant square) + 7 (for half moves) + 8 (for move number) = 217bits

 

Compressing fens

After following the above steps we will end up having 7 arrays. The final step would be to merge these arrays in the same order as mentioned above. 

Next, We will have to calculate the number of bits required to make the length fully divisible by 8 so that we can convert the boolean array into an array of Bytes and eventually use that for storing FEN.

 

Storing other information

In other information, we will have to store the below items

  1. List of moves that have been played in a given position
  2. Statics of the move, number of times white wins, black wins, and draws
  3. Ids of the most recent games where this position occurred. Maximum 1000

All these informations can be encoded further but in my algorithm, I have stored them in plain text as decompressing these values will add overhead to the decompressing process, and considering space benefit vs Tiem benefit, I did not find it preferable to compress these.

Take a look at the whole code here for compressing FENS 

 

Tags :

  • chess programming

About Author


Kumar Gaurav

Kumar Gaurav

A Software developer by profession and Chess player by passion. I write chess content regularly as part of my hobby and is very much invested in it. If you have any sugession for me, please contact

Leave a Comment:

Submit

Comments (6)


Jeff Lowery (fensterchess.com)

2024-01-29

Thank you for the thoughtful article.


Jeff Lowery (fensterchess.com)

2024-01-29

Thank you for the thoughtful article.


Steffen

2024-03-19

for you assumption that 256 would be enough to count moves I'd like you to have a look at this game: https://www.chessgames.com/perl/chessgame?gid=1268705


casanovanano3

2024-10-05

nanika1234567


casanovanano3

2024-10-05

nanika1234567


casanovanano3

2024-10-05

nanika1234567