Finding complex motifs in biological sequences

23 Feb 2017

The following is a succinct, introductory guide to two efficient, accurate and easy-to-use bioinformatic tools for pattern matching in biological sequences: the aptly named Pattern Locator and ScanProsite.

 

These tools are different to BLAST in that their input can be degenerate or indeterminate patterns. For example, the following peptide motif of length 8 is found in caveolins, a family of membrane proteins: FED{L,V}IA{D,E}{P,A}. The letter in the last position could be a P or an A; this is known as a degenerate position.

 

Pattern Locator

DNA only.

Input: text sequence in FASTA format, motifs in IUPAC nomenclature.

Motif format example: TTYT, where Y = {C,T}
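To see what a degenerate IUPAC motif means in practice, here is a minimal Python sketch that expands each IUPAC code into a regular-expression character class and scans a sequence for it. This is not how Pattern Locator itself is implemented; the function names iupac_to_regex and find_motif are made up for illustration, and the sketch searches the given strand only.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as TTYT into a regex such as TT[CT]T."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Unambiguous bases stay as they are; degenerate codes become a class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets overlapping hits be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("TTYT"))
    print(find_motif("ATTCTGTTTTA", "TTYT"))
```

A sketch like this is fine for a quick check on a short sequence; for whole genomes or many motifs at once, a dedicated tool is the better choice.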

Links to associated publication, web service, downloadable source code and documentation.

 

ScanProsite

Proteins only.

Input: text sequence in FASTA format, motifs in PROSITE's regex format.

Motif format example: F-E-D-[LV]-I-A-[DE]-[PA], where each position is separated by a dash and square brackets contain all the letters in the set at a degenerate position.
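PROSITE patterns map closely onto ordinary regular expressions, so a rough Python sketch can illustrate how the caveolin motif above would be matched. This is an illustration only, not ScanProsite's own code: the function names prosite_to_regex and scan_protein and the test peptide are hypothetical, and the translation covers only the common syntax (dashes, square-bracket sets, curly-bracket exclusions, x wildcards, repeat counts in parentheses, and the < and > anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    regex = pattern.strip().rstrip(".")      # a trailing period ends a pattern
    regex = regex.replace("-", "")           # dashes only separate positions
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} means "not P"
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) becomes .{2,4}
    regex = regex.replace("x", ".")          # x means any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence-end anchors
    return regex


def scan_protein(sequence, pattern):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    # The lookahead wrapper lets overlapping hits be reported.
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in compiled.finditer(sequence.upper())]


if __name__ == "__main__":
    motif = "F-E-D-[LV]-I-A-[DE]-[PA]"
    print(prosite_to_regex(motif))
    # Made-up peptide containing one instance of the motif.
    print(scan_protein("MKFEDVIAEPLL", motif))
```

The translation is done with plain string replacement to keep it short, so it will not handle every corner of the PROSITE syntax; ScanProsite remains the reference for real analyses.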

Links to associated publication, web service, downloadable source code and documentation.

 

PhDomics by Fatima