#LitReview: CD-Search: protein domain annotations on the fly

17 Jun 2017

This blog post summarises the following paper and follows its structure.

Marchler-Bauer, A. and Bryant, S.H., 2004. CD-Search: protein domain annotations on the fly. Nucleic Acids Research, 32(suppl_2), pp.W327-W331.

Introduction

Conserved Domain Search (CD-Search) searches the input sequence against a collection of position-specific scoring matrices (PSSMs). The collection comprises several databases, including Pfam and SMART. Using the web service, one query can take up to 20 seconds, depending on the length of the query sequence and the size of the database being searched. CD-Search uses RPS-BLAST (reverse position-specific BLAST) and reports E-values analogous to those given by PSI-BLAST. Results are summarised graphically, with details available for each hit, including pairwise alignments between the input sequence and the PSSMs (represented by their consensus sequences). The graphical annotation also marks functional features, using information from the curated domain models; these help identify residues critical to the function of the protein domain and may help distinguish genuine matches from chance occurrences of the pattern in the input sequence. Notably, CD-Search also reports three-dimensional structural domains, which can be examined with a visualisation tool.
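To make the scoring idea concrete, here is a minimal Python sketch of how a single PSSM scores a query: each model column assigns a score to each residue, and sliding the matrix along the sequence gives a score for every ungapped window. The matrix `TOY_PSSM`, the `DEFAULT_SCORE` penalty and the helper names are hypothetical, and this is not RPS-BLAST itself, which uses gapped alignments and database-wide statistics. The E-value helper uses the standard Karlin-Altschul form, E = K·m·n·e^(-λS); the λ and K values passed in are assumptions, not CDD's actual parameters.

```python
import math

# Hypothetical toy PSSM: one dict per model column, mapping residue -> score.
# Values are invented for illustration; real PSSMs score all 20 amino acids.
TOY_PSSM = [
    {"G": 4, "A": 1},
    {"K": 5, "R": 2},
    {"S": 3, "T": 3},
]
DEFAULT_SCORE = -2  # assumed penalty for residues not listed in a column


def score_window(pssm, window):
    """Sum the column scores for each residue in an equal-length window."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_ungapped_hit(pssm, sequence):
    """Slide the PSSM along the sequence; return (best_score, start_index).

    Returns (None, -1) if the sequence is shorter than the model.
    """
    width = len(pssm)
    best = (None, -1)
    for start in range(len(sequence) - width + 1):
        s = score_window(pssm, sequence[start:start + width])
        if best[0] is None or s > best[0]:
            best = (s, start)
    return best


def karlin_altschul_evalue(score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S): expected chance hits scoring >= S."""
    return k * m * n * math.exp(-lam * score)
```

For example, `best_ungapped_hit(TOY_PSSM, "MAGKSLL")` returns `(12, 2)`, because the window `GKS` starting at index 2 scores 4 + 5 + 3. A higher score gives an exponentially smaller E-value, which is why E-values rather than raw scores are used to judge significance.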

Using CD-Search

CD-Search requires the following input parameters:

  • Database of PSSMs to search (the default, the Conserved Domain Database [CDD], includes all the PSSM collections, such as those mentioned above)

  • Input sequence

  • E-value threshold

  • Option to switch off the filter for low-complexity regions

  • Search mode (for example, switching from one to two passes over the database gives more refined results but takes longer)

  • Output format (for example, limiting the number of results shown)
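The low-complexity option can be illustrated with a short sketch. BLAST-family tools typically rely on the SEG algorithm for protein sequences; the Python below is a deliberately simpler stand-in that masks any window whose residue composition has low Shannon entropy. The window size, entropy threshold and `X` masking character are illustrative assumptions, not CD-Search defaults.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(sequence, window=12, threshold=2.2, mask_char="X"):
    """Replace residues in low-entropy windows with mask_char.

    Simplified stand-in for SEG (an assumption, not SEG itself): every window
    whose composition entropy falls below `threshold` is masked in full.
    """
    if len(sequence) < window:
        return sequence
    masked = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    # Rebuild the sequence, substituting masked positions.
    return "".join(mask_char if m else res for res, m in zip(sequence, masked))
```

Because whole windows are masked, a few residues flanking a repeat (for example, the edges of a poly-Q run) can be masked as well; real filters such as SEG treat segment boundaries more carefully.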

Interpreting Results

Each hit is drawn on a graphical overview of the whole input sequence and can be expanded to show details such as its E-value. The overview also highlights relationships between multiple domain hits, as well as low-complexity regions that were excluded from the search. Each hit is then shown at a finer scale as a pairwise alignment between the input sequence and the domain model, together with the percentage of the model covered by the alignment. The pairwise alignment can be expanded into a multiple sequence alignment, with the option to include the family members most similar to the input sequence.
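The per-hit details described above can be mirrored in a small data structure. This Python sketch computes the percentage of the domain model covered by an alignment, filters hits by an E-value cutoff, and lists pairs of hits that overlap on the query (one simple way to spot related or redundant domain hits). The `DomainHit` fields, the 0.01 cutoff and any example hits are hypothetical, not CD-Search's actual output format.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One hypothetical CD-Search-style hit (field names are illustrative)."""
    name: str
    evalue: float
    query_start: int   # 1-based, inclusive position on the input sequence
    query_end: int
    model_start: int   # 1-based, inclusive position in the domain model
    model_end: int
    model_length: int

    def model_coverage(self) -> float:
        """Percentage of the domain model covered by the alignment."""
        aligned = self.model_end - self.model_start + 1
        return 100.0 * aligned / self.model_length


def summarise(hits, evalue_cutoff=0.01):
    """Keep hits at or below the E-value cutoff, ordered along the query."""
    kept = [h for h in hits if h.evalue <= evalue_cutoff]
    kept.sort(key=lambda h: h.query_start)
    return [(h.name, h.query_start, h.query_end, round(h.model_coverage(), 1))
            for h in kept]


def overlapping_pairs(hits):
    """Pairs of hits whose query ranges overlap (candidate related hits)."""
    pairs = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if a.query_start <= b.query_end and b.query_start <= a.query_end:
                pairs.append((a.name, b.name))
    return pairs
```

For example, a hit aligning model positions 5 to 84 of a 100-position model covers 80 of its columns, so `model_coverage()` returns 80.0.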

An evidence viewer provides citations or three-dimensional displays for each hit, helping the user decide whether a hit with an E-value close to the threshold is genuine. The three-dimensional displays use NCBI's structure viewer, Cn3D, to show the structural information for the domain model together with the multiple sequence alignment of the domain family and the input sequence.

Alternative Routes to CD-Search Results

Firstly, all protein entries in the Entrez database, apart from the newest, include a summary of pre-computed CD-Search results. Secondly, CD-Search is run automatically alongside every BLASTp submission. Finally, as well as using the web service, users can download RPS-BLAST and the CD-Search databases to run searches locally.
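For the local route, a sketch in Python might wrap the BLAST+ `rpsblast` program and parse its tabular output. It assumes `rpsblast` is installed and on the `PATH` and that a CDD database has already been downloaded; the database path you pass in and the default E-value are hypothetical. The `-outfmt 6` option produces BLAST's standard 12-column tabular format.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def build_rpsblast_cmd(query_fasta, db_path, evalue=0.01):
    """Assemble an rpsblast argument list producing tabular output."""
    return ["rpsblast", "-query", query_fasta, "-db", db_path,
            "-evalue", str(evalue), "-outfmt", "6"]


def parse_outfmt6(text):
    """Parse tabular BLAST output into a list of dicts with typed values."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        for key in ("length", "mismatch", "gapopen",
                    "qstart", "qend", "sstart", "send"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def run_rpsblast(query_fasta, db_path, evalue=0.01):
    """Run rpsblast (must be installed and on PATH) and parse its hits."""
    cmd = build_rpsblast_cmd(query_fasta, db_path, evalue)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_outfmt6(result.stdout)
```

`build_rpsblast_cmd("query.fasta", "db/Cdd")` yields the arguments for `rpsblast -query query.fasta -db db/Cdd -evalue 0.01 -outfmt 6`; keeping command construction separate from `subprocess.run` makes it easy to inspect or test without running a search.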

Future Developments

Continued curation of the databases and the introduction of hierarchies of domain families, both to organise overlapping results better and to improve the overall quality of the databases.

A more user-friendly organisation of the results.

PhDomics by Fatima