Genome Assembly

7 Mar 2017

Genome assembly is the process of reconstructing a genomic sequence after DNA sequencing, during which the DNA molecule is cut into pieces.

Once sequenced, these pieces are called reads. To make the assembled sequence as accurate as possible, DNA sequencing is done so that each region of the genome is covered by multiple copies of the same read or by several slightly overlapping reads. The number of reads covering a given position is called the coverage (or depth): the higher the coverage, the more confidence we can have in the sequence. Read length can vary from 25 [1] to 1000 (Sanger sequencing) base pairs, and algorithms that take reads of such different lengths as input need to be designed differently.
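As a rough illustration of coverage, the Python sketch below assumes we already know where each read starts on the genome (a hypothetical toy input; in practice those positions would come from aligning the reads). It counts how many reads cover each position and computes the expected average coverage as N × L / G, i.e. the number of reads times the read length divided by the genome length. The function names are made up for this example.

```python
def per_base_coverage(genome_length, reads):
    """Count how many reads cover each position of the genome.

    reads is a list of (start, length) pairs with 0-based start positions
    (hypothetical input; real positions would come from an aligner).
    """
    coverage = [0] * genome_length
    for start, length in reads:
        # Clip reads that would run past the end of the genome.
        for pos in range(start, min(start + length, genome_length)):
            coverage[pos] += 1
    return coverage


def average_coverage(num_reads, read_length, genome_length):
    """Expected average coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


# Toy example: a 20 bp genome covered by four overlapping 8 bp reads.
reads = [(0, 8), (4, 8), (8, 8), (12, 8)]
print(per_base_coverage(20, reads))       # ends of the genome are covered once, the middle twice
print(average_coverage(len(reads), 8, 20))  # 1.6
```

Note that the per-position counts are uneven even though the average is 1.6: the ends of the genome are covered by fewer reads than the middle, which is one reason sequencing aims for a high average coverage.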


There are two types of assembly: de novo assembly and mapping assembly.


Mapping assembly utilises an already assembled reference sequence to which the new reads are aligned. De novo assembly, on the other hand, is done from scratch, without a reference. It is also significantly more computationally expensive, in terms of both time and memory.
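The difference between the two approaches can be sketched on a toy example. The Python code below is illustrative only, and all function names are hypothetical: `mapping_assembly` places each read where it matches the reference exactly and takes a per-position majority vote, whereas `greedy_de_novo` uses no reference and repeatedly merges the two sequences with the longest suffix-prefix overlap. This greedy heuristic is swapped in for simplicity; it is not necessarily the method covered in the lecture, it is easily fooled by repeats, and real aligners tolerate mismatches while real de novo assemblers typically build overlap graphs or de Bruijn graphs.

```python
from collections import Counter


def mapping_assembly(reference, reads):
    """Toy reference-guided assembly: place each read where it matches the
    reference exactly, then take a majority vote at every position.
    Positions that no read covers are reported as 'N'."""
    columns = [Counter() for _ in reference]
    for read in reads:
        start = reference.find(read)  # first exact match, or -1
        if start == -1:
            continue  # a real aligner would allow mismatches and gaps
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if no such overlap of at least min_len bases exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_de_novo(reads, min_len=3):
    """Toy de novo assembly: repeatedly merge the pair of contigs with the
    longest suffix-prefix overlap until no overlap of min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reference = "ATGCGTACGTTAGC"
reads = ["ATGCGT", "CGTACG", "TACGTT", "CGTTAGC"]
print(mapping_assembly(reference, reads))  # ATGCGTACGTTAGC
print(greedy_de_novo(reads))               # ['ATGCGTACGTTAGC']
```

Even on this tiny input, the de novo version compares every pair of contigs in every round, whereas the mapping version only looks each read up once in the reference, which hints at why assembling without a reference is so much more expensive.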


Part two will be posted on Day 67 - stay tuned! Once you've read all the posts, I would recommend watching the following lecture.


This post is based upon a lecture given by my supervisor Dr. Solon Pissis.


1. Barski, Artem, et al. "High-resolution profiling of histone methylations in the human genome." Cell 129.4 (2007): 823-837.


PhDomics by Fatima