Burrows-Wheeler Transform, or BWT, is a text indexing algorithm, which is used in data compression applications. [1]

How it works:

build an n by n matrix BWM containing all the cyclic rotations of your text t, which has length | t | = n, such that each row of BWM contains one rotation of t, and each cell of BWM contains a letter of t

sort the rows lexicographically

the rightmost column of BWM is BWT

For example [2], the following string T

^BANANA|

would give the following matrix BWM(T)

^BANANA|

|^BANANA

A|^BANAN

NA|^BANA

ANA|^BAN

NANA|^BA

ANANA|^B

BANANA|^

and if we sort the rows lexicographically, we end up with

ANANA|^B

ANA|^BAN

A|^BANAN

BANANA|^

NANA|^BA

NA|^BANA

^BANANA|

|^BANANA

which gives us BWT(T) in the last column, which is

BNN^AA|A

Once of the coolest things about BWT is that you can restore the original string from only its BWT!

You may also notice that BWT has a close link to the suffix array, but is much smaller in size. Algorithms are always being developed for searching indexes like BWT, for example, the FM-Index [3]. Check out my next blog post to learn about the FM-Index!

1. Burrows, Michael, and David J. Wheeler. "A block-sorting lossless data compression algorithm." (1994).

2. https://en.wikipedia.org/wiki/Burrows–Wheeler_transform

3. Ferragina, Paolo, and Giovanni Manzini. "Opportunistic data structures with applications." Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on. IEEE, 2000.

[73]