Identifying conserved protein domains when running SequenceServer Cloud BLAST searches

SequenceServer is great for quickly and securely BLASTing unpublished datasets. We now also identify conserved protein domains for every query sequence, using NCBI's Conserved Domain Database (CDD). Results are displayed as part of the BLAST report and can also be downloaded for further processing.

CDD screenshots

These screenshots show what this looks like; we provide more context below.

Example of endonuclease domain identification

Each query now gets an additional plot like this one, which shows that the query sequence contains a conserved endonuclease domain. Here, the standard CDD includes several slightly different versions of that domain, which is why several domain models align to the query.

Example of other output data

A standard SequenceServer BLAST report now also includes a table with:

Raw data download

Domain identification results are also available as a separate download in JSON format. This is useful if you want to do further analysis on large numbers of genes, or for general functional protein annotation.
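As a rough sketch of what such further analysis could look like, the Python below counts how many queries contain each domain. The record layout and field names (`query_id`, `domains`, `accession`, `evalue`) and the placeholder accessions are assumptions made for illustration only; the actual structure of the downloaded JSON may differ, so inspect your own file before adapting this.

```python
import json
from collections import Counter

# Hypothetical layout: a list of per-query records with placeholder accessions.
# The real SequenceServer download may be structured differently.
EXAMPLE_JSON = """
[
  {"query_id": "gene1", "domains": [
      {"accession": "DOMAIN_A", "evalue": 1e-30},
      {"accession": "DOMAIN_B", "evalue": 2e-25}]},
  {"query_id": "gene2", "domains": [
      {"accession": "DOMAIN_A", "evalue": 0.5}]},
  {"query_id": "gene3", "domains": []}
]
"""

def count_domains(records, max_evalue=0.01):
    """Count queries per domain accession, keeping hits at or below max_evalue."""
    counts = Counter()
    for record in records:
        # Use a set so a domain hit several times in one query is counted once.
        accessions = {
            hit["accession"]
            for hit in record.get("domains", [])
            if hit["evalue"] <= max_evalue
        }
        counts.update(accessions)
    return counts

records = json.loads(EXAMPLE_JSON)
domain_counts = count_domains(records)
for accession, n in domain_counts.most_common():
    print(accession, n)
```

The E-value cutoff of 0.01 is an arbitrary choice for this sketch; a stricter or looser threshold may suit your dataset better.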

Why protein domains?

The Conserved Domain Database (CDD) is a collection of models of protein domains that are conserved across species. It is a great resource for inferring the functions of a protein. Each domain model is built from a multiple alignment of many sequences.

Domain identification complements a normal BLAST-type sequence similarity search: sometimes you get no BLAST hits, yet a domain identification tool can still detect protein domains. Because each domain model summarizes many aligned sequences, it can be more sensitive than aligning to individual sequences.
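To illustrate why a model built from many aligned sequences can be more sensitive, here is a toy sketch of a position-specific scoring profile. It is not how CDD or NCBI's search tools are implemented; the alignment is made up, and the pseudocount and uniform background frequencies are simplifying assumptions chosen to keep the example short.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Made-up ungapped alignment of a short motif (illustrative only).
ALIGNMENT = ["GKSGT", "GKTGS", "GRSGT", "GKSGS", "AKTGT"]

def build_profile(alignment, pseudocount=1.0):
    """Return per-column log-odds scores against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, sequence):
    """Sum the per-position scores of an ungapped sequence of profile length."""
    return sum(col[aa] for col, aa in zip(profile, sequence))

profile = build_profile(ALIGNMENT)

# This sequence combines residues seen at each position somewhere in the
# alignment, without being identical to any single aligned sequence.
mixed = "ARTGS"
unrelated = "WWWWW"
print(score(profile, mixed) > score(profile, unrelated))
```

The profile rewards any residue that has been observed at a given position, so a sequence can score well even when it differs from every individual member of the alignment.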

What is a protein domain?

A protein domain is a part of a protein that has a specific function. For example, a domain can bind a ligand or carry out catalysis. Protein domains can be identified by aligning sections of proteins that have a similar function.

This was done, for example, for the protein kinase domain, which is roughly 250 amino acids long and is found in proteins involved in phosphorylation.

Thousands of other protein domains are known and included in NCBI’s CDD. Some classic examples:

So what are the benefits of having protein domain information?

Protein domains are distinct functional and structural units in a protein. Information about domains can help to understand protein function, evolution, and interactions. The CDD helps in predicting the function of unknown proteins, offering a window into their potential roles in cellular processes.

This can be particularly useful for proteins that are not well annotated, or for proteins that are not well conserved across species. For example, if you have a protein that is only found in a single species, you may not be able to find any similar sequences in a BLAST search. But if that protein contains a conserved domain, you can still identify that domain, and thus get some information about the function of the protein.

CDD annotation can thus be used to:

Multiple domains for multiple functions?

Complex proteins often have multiple domains. Each domain can have a distinct structure and function, allowing the protein to perform multiple roles or interact with different molecules within the cell.

How do I try SequenceServer’s domain identification?

Conserved domain information is immediately available to all SequenceServer Cloud users. To create a secure SequenceServer Cloud instance for yourself or your team, sign up today.
