conkit.applications.cdhit module

Command line object for the CD-HIT sequence clustering application

class CdhitCommandline(cmd='cd-hit', **kwargs)[source]

Bases: Bio.Application.AbstractCommandline

Command line object for CD-HIT [1] [2]

http://cd-hit.org

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik’s Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).

CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the structure of a dataset and correcting bias within it.

[1] Li W, Jaroszewski L, Godzik A (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282-283.
[2]Li W, Jaroszewski L, Godzik A (2002). Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18, 77-82.

Examples

>>> from conkit.applications import CdhitCommandline
>>> cdhit_cline = CdhitCommandline()
>>> print(cdhit_cline)

You would typically run the command line with cdhit_cline() or via the subprocess module.
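
As a rough illustration of the subprocess route, the sketch below builds a basic CD-HIT argument list and runs it only if the executable is on the PATH. It is not part of the ConKit API: the helper names build_cdhit_args and run_cdhit and the file names test.fasta and test.cdhit are hypothetical. It assumes the standard CD-HIT options -i (input FASTA), -o (output prefix) and -c (sequence identity threshold).

```python
import shutil
import subprocess


def build_cdhit_args(input_fasta, output_prefix, identity=0.9, executable="cd-hit"):
    """Return the argument list for a basic CD-HIT clustering run.

    Hypothetical helper for illustration; not part of ConKit.
    """
    return [
        executable,
        "-i", str(input_fasta),    # input sequences in FASTA format
        "-o", str(output_prefix),  # output file name / prefix
        "-c", str(identity),       # sequence identity threshold
    ]


def run_cdhit(args):
    """Run CD-HIT via subprocess and return its standard output.

    Raises FileNotFoundError if the executable is not on the PATH and
    subprocess.CalledProcessError if CD-HIT exits with a non-zero status.
    """
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]!r} was not found on PATH")
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Build the command without executing it (file names are placeholders).
args = build_cdhit_args("test.fasta", "test.cdhit")
print(" ".join(args))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces. Call run_cdhit(args) once CD-HIT is installed and the input file exists.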