Following my post on the de novo genome assembly problem, it should be pointed out that it can be phrased as the Shortest Common Superstring problem, which has been shown to be NP-hard.

Given a set of strings, we must report the shortest string containing all the strings in the set as factors.

For example, take the following strings in a set:

abaab

baba

aabbb

bbab

The SCS of this set would be bbabaabbb.

Part one of this series on Genome Assembly was posted on Day 65 and part two was posted on Day 67. Part four will be posted on day 71 - stay tuned!

This post is based upon a lecture given by my supervisor Dr. Solon Pissis.

[69]

Please reload

Please reload