The Z-curve of a DNA sequence is its representation as a three-dimensional curve built from three cumulative skew measurements. It is useful in genomic analysis because it gives a compact representation of a genome that can be converted back into the original sequence.

Definition

Each node of the curve has three coordinates:

x_i = (A_i + G_i) - (C_i + T_i)

which represents purine-pyrimidine skew

y_i = (A_i + C_i) - (G_i + T_i)

which represents amino-keto skew

z_i = (A_i + T_i) - (C_i + G_i)

which represents weak-strong hydrogen bond base skew

where i = 0, ..., n - 1, n is the length of the sequence, x_i, y_i and z_i lie in the range [-n, n], and A_i, C_i, G_i and T_i are the cumulative counts of each base in the sequence up to and including position i. Each node is connected to its neighbouring node by a straight line, and the curve is conventionally taken to start from the origin [0, 0, 0], before any base has been counted.
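The definition above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the names STEPS, BASES, z_curve and sequence_from_z_curve are hypothetical, the input is assumed to contain only the letters A, C, G and T, and the returned list is assumed to begin with the origin, so that entry i + 1 holds (x_i, y_i, z_i). Each base moves the curve by +1 or -1 along every axis, and the combination of signs is unique to that base, which is why the sequence can be read back from the differences between consecutive nodes.

```python
# Step (dx, dy, dz) contributed by each base, read off the three definitions.
STEPS = {
    "A": (1, 1, 1),    # purine, amino, weak
    "G": (1, -1, -1),  # purine, keto, strong
    "C": (-1, 1, -1),  # pyrimidine, amino, strong
    "T": (-1, -1, 1),  # pyrimidine, keto, weak
}
# Inverse lookup: each step vector identifies exactly one base.
BASES = {step: base for base, step in STEPS.items()}


def z_curve(sequence):
    """Return the Z-curve nodes of a DNA string, starting from the origin.

    Raises KeyError for any character other than A, C, G or T.
    """
    nodes = [(0, 0, 0)]
    for base in sequence.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = nodes[-1]
        nodes.append((x + dx, y + dy, z + dz))
    return nodes


def sequence_from_z_curve(nodes):
    """Recover the DNA string from the differences between consecutive nodes."""
    return "".join(
        BASES[(x1 - x0, y1 - y0, z1 - z0)]
        for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:])
    )
```

For example, sequence_from_z_curve(z_curve("GATTACA")) gives back "GATTACA".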

Application

AT-skew, defined by (x_i + y_i)/2, which equals A_i - T_i, and GC-skew, defined by (x_i - y_i)/2, which equals G_i - C_i, are used to predict the location of the origin of replication in a prokaryotic genome [1].
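As a rough illustration, both skews can be computed from the x and y coordinates alone. The function names below are hypothetical and the input is assumed to contain only A, C, G and T. The second function rests on an assumption that the text above does not state: a commonly used heuristic treats the position where the cumulative GC-skew is lowest as a candidate for the origin of replication. It is shown only as one possible way to use the skew values, not as the method of [1].

```python
# Contribution of each base to x (purine-pyrimidine) and y (amino-keto).
X_STEP = {"A": 1, "G": 1, "C": -1, "T": -1}
Y_STEP = {"A": 1, "C": 1, "G": -1, "T": -1}


def cumulative_skews(sequence):
    """Return (at_skew, gc_skew), two lists with one value per position.

    Raises KeyError for any character other than A, C, G or T.
    """
    at_skew, gc_skew = [], []
    x = y = 0
    for base in sequence.upper():
        x += X_STEP[base]
        y += Y_STEP[base]
        # x + y and x - y are always even, so integer division is exact.
        at_skew.append((x + y) // 2)  # equals A_i - T_i
        gc_skew.append((x - y) // 2)  # equals G_i - C_i
    return at_skew, gc_skew


def min_gc_skew_positions(sequence):
    """Return every position at which the cumulative GC-skew is lowest.

    Assumption: the minimum is used here as a candidate origin of replication.
    """
    _, gc_skew = cumulative_skews(sequence)
    if not gc_skew:
        return []
    lowest = min(gc_skew)
    return [i for i, value in enumerate(gc_skew) if value == lowest]
```

On a real genome the skew values are noisy, so the minimum is usually treated as a starting point for further analysis rather than as a definitive answer.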

