#BioinformaticsTool: Oriloc

7 Jul 2017

Oriloc predicts the locus of the origin of replication (OriC) in a bacterial chromosome.


The main idea


It builds on the concept of GC-skew, described in my previous blog post, and starts by performing a DNA walk. A DNA walk plots the cumulative GC- and AT-skew in a plane, with each base moving the walker one step, using the following axis assignments:


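A: one step in the negative x direction, (-1, 0)

T: one step in the positive x direction, (+1, 0)

G: one step in the negative y direction, (0, -1)

C: one step in the positive y direction, (0, +1)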


Note that only the third position of each codon is recorded in the walk. This increases the signal-to-noise ratio, because the third position is the most variable [1]; see my post on the wobble position to understand why. Codon usage bias (and therefore GC-skew) is consequently best observed by recording only the third codon position in the DNA walk.
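
To make the walk concrete, here is a minimal sketch in Python. It is not Oriloc's own code: the function name `third_position_walk`, the `STEPS` table, and the choice to record one point per coding sequence are my assumptions, and the step directions follow my reading of the axis assignments above (A and T move along x, G and C move along y).

```python
# Step taken for each base (assumed reading of the axis assignments):
# A/T move the walker along x, G/C move it along y.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, -1), "C": (0, 1)}


def third_position_walk(cds_sequences):
    """Return the (x, y) points of a DNA walk over third codon positions.

    cds_sequences: in-frame coding sequences (strings), in genome order.
    One point is recorded after each coding sequence (an assumption made
    here so that each point can later be matched to a gene position).
    """
    x = y = 0
    points = [(0, 0)]  # the walk starts at the origin of the plane
    for cds in cds_sequences:
        # Indices 2, 5, 8, ... are the third positions of the codons.
        for i in range(2, len(cds), 3):
            dx, dy = STEPS.get(cds[i].upper(), (0, 0))  # ignore N etc.
            x += dx
            y += dy
        points.append((x, y))
    return points


walk = third_position_walk(["ATGAAA", "CCCGGT"])
```

Here the first sequence contributes a G and an A (third positions), and the second a C and a T, so the walk steps away from the origin and then returns to it.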


Once the DNA walk is plotted, a regression line is fitted to the trajectory. Each point of the trajectory is projected onto this line, and the distance from the origin of the plane, (0,0), to the projected point represents the cumulative skew at that point. These distances are then plotted against gene position: gene start/end position on the x-axis, cumulative skew on the y-axis.
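
A rough sketch of this step, under stated assumptions: the line is fitted by ordinary least squares, and the cumulative skew is taken as the signed position of each point along the direction of the fitted line. That signed value equals the distance from (0,0) only when the line passes through the origin, so treat it as an approximation of what is described above. The function name `cumulative_skew` is mine, not Oriloc's.

```python
import math


def cumulative_skew(points):
    """Project walk points onto the direction of their regression line.

    points: list of (x, y) tuples from a DNA walk.
    Returns one signed value per point (assumed stand-in for the
    distance along the regression line).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        # Degenerate case: the walk never moves along x,
        # so the line direction is the y axis.
        return [float(y) for _, y in points]
    slope = sxy / sxx  # least-squares slope of y on x
    norm = math.hypot(1.0, slope)
    ux, uy = 1.0 / norm, slope / norm  # unit vector along the line
    # Dot product with the unit vector = signed position along the line.
    return [x * ux + y * uy for x, y in points]


skew = cumulative_skew([(0, 0), (1, 1), (2, 2), (1, 1)])
```

In this toy walk every point lies on the line y = x, so each skew value is simply the point's distance from (0,0), rising and then falling again.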


If this plot contains two peaks, the positive and negative ones correspond to the origin and terminus of replication, respectively. If this plot contains only one peak, the sequence must have started from either the origin or terminus of replication.
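
Reading the peaks off the plot can be sketched as follows. This is a simplification that just takes the global maximum and minimum of the cumulative skew; `predict_origin_terminus` and its arguments are hypothetical names, and secondary peaks are ignored.

```python
def predict_origin_terminus(positions, skew):
    """Return (origin, terminus) as the positions of the extreme skew values.

    positions: gene positions along the chromosome (x-axis of the plot).
    skew: cumulative skew at each position (y-axis of the plot).
    The positive peak is taken as the origin and the negative peak as
    the terminus, as described in the text.
    """
    if len(positions) != len(skew) or not skew:
        raise ValueError("positions and skew must be non-empty and equal length")
    i_max = max(range(len(skew)), key=skew.__getitem__)
    i_min = min(range(len(skew)), key=skew.__getitem__)
    return positions[i_max], positions[i_min]


origin, terminus = predict_origin_terminus(
    [100, 500, 900, 1300, 1700],
    [0.0, 2.0, 1.0, -3.0, -1.0],
)
```

If either extreme falls on the first or last gene, that matches the single-peak case: the sequence probably starts at the origin or the terminus.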


What about unannotated genomes?


For newly sequenced genomes, the output of Glimmer 2.0, a program that predicts coding sequences, should be given as input along with the sequence. Note that using genome annotations instead of Glimmer 2.0 output does not seem to make a significant difference to the results.


Filtering out false-positives


If less than 50% of the putative leading strand comprises genes, the corresponding OriC prediction is discarded. This is due to the observed selective pressure for coding sequences to be on the leading strand.
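
One possible reading of this filter, sketched below, is to count the fraction of genes coded on the putative leading strand and reject the prediction when that fraction is below 0.5. This is my interpretation, not necessarily the exact criterion used by Oriloc. The sketch assumes a circular chromosome in which the "+" strand is leading between origin and terminus in the direction of increasing coordinates, and the "-" strand is leading on the other half; `leading_strand_fraction` and its parameters are hypothetical names.

```python
def leading_strand_fraction(genes, origin, terminus, genome_length):
    """Fraction of genes coded on the putative leading strand.

    genes: iterable of (position, strand) with strand "+" or "-".
    origin, terminus: predicted positions on a circular chromosome.
    """
    # Distance from origin to terminus, going towards higher coordinates.
    d_ter = (terminus - origin) % genome_length
    leading = 0
    total = 0
    for position, strand in genes:
        d_gene = (position - origin) % genome_length
        # True if the gene lies on the half running from origin to
        # terminus in the direction of increasing coordinates.
        forward_half = d_gene < d_ter
        if (forward_half and strand == "+") or (not forward_half and strand == "-"):
            leading += 1
        total += 1
    return leading / total if total else 0.0


genes = [(100, "+"), (200, "+"), (300, "-"), (700, "-"), (800, "+")]
fraction = leading_strand_fraction(genes, origin=0, terminus=500, genome_length=1000)
keep_prediction = fraction >= 0.5  # assumed 50% threshold from the text
```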


How accurate is it?


Oriloc correctly located OriC in two bacterial genomes. For E. coli, however, the correct OriC corresponded to a secondary peak, 7 kb away from the predicted one. The authors therefore stress the importance of human inspection of the output.




This post is based on the following application note:

Frank, A.C. and Lobry, J.R., 2000. Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics, 16(6), pp.560-561.


I also found this documentation to be of great use for understanding technical details.


Other references:

1. Navarre, W.W., Porwollik, S., Wang, Y., McClelland, M., Rosen, H., Libby, S.J. and Fang, F.C., 2006. Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella. Science, 313(5784), pp.236-238.

PhDomics by Fatima