DOTD: Burrows-Wheeler Transform

15 Mar 2017

Burrows-Wheeler Transform, or BWT, is a text indexing algorithm, which is used in data compression applications. [1]

 

How it works:

  • build an by matrix BWM containing all the cyclic rotations of your text t, which has length | | = n, such that each row of BWM contains one rotation of t, and each cell of BWM contains a letter of t

  • sort the rows lexicographically

  • the rightmost column of BWM is BWT

For example [2], the following string T

^BANANA|

 

would give the following matrix BWM(T)

^BANANA|

|^BANANA

A|^BANAN

NA|^BANA

ANA|^BAN

NANA|^BA

ANANA|^B

BANANA|^

and if we sort the rows lexicographically, we end up with

ANANA|^B

ANA|^BAN

A|^BANAN

BANANA|^

NANA|^BA

NA|^BANA

^BANANA|

|^BANANA

which gives us BWT(T) in the last column, which is

BNN^AA|A

Once of the coolest things about BWT is that you can restore the original string from only its BWT!

You may also notice that BWT has a close link to the suffix array, but is much smaller in size. Algorithms are always being developed for searching indexes like BWT, for example, the FM-Index [3]. Check out my next blog post to learn about the FM-Index!
 

1. Burrows, Michael, and David J. Wheeler. "A block-sorting lossless data compression algorithm." (1994).
2. https://en.wikipedia.org/wiki/Burrows–Wheeler_transform

3. Ferragina, Paolo, and Giovanni Manzini. "Opportunistic data structures with applications." Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on. IEEE, 2000.

 

[73]

Share on Facebook
Share on Twitter
Share on LinkedIn
Please reload

Please reload

Related Posts
PhDomics by Fatima