Command line object for CCMpred contact prediction application
CdhitCommandline(cmd='cd-hit', **kwargs)
Bases: Bio.Application.AbstractCommandline
Command line object for CD-HIT [1] [2]
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik’s Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).
CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting bias within a dataset.
[1] | Li W, Jaroszewski L, Godzik A (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282-283. |
[2] | Li W, Jaroszewski L, Godzik A (2002). Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18, 77-82. |
Examples
>>> from conkit.applications import CdhitCommandline
>>> cdhit_cline = CdhitCommandline()
>>> print(cdhit_cline)
You would typically run the command line with cdhit_cline() or via the subprocess module.
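As a rough illustration of the subprocess route, the sketch below runs anything whose str() is a full command line, which is how Bio.Application command line objects such as cdhit_cline render when printed. The helper name run_command_line is hypothetical and not part of conkit or Biopython. A stand-in Python command is used so the sketch runs without cd-hit installed, and POSIX-style quoting is assumed.

```python
import shlex
import subprocess
import sys


def run_command_line(cline):
    """Run a command line object (or plain string) and return (stdout, stderr).

    Hypothetical helper: `cline` may be any object whose str() is the full
    command, e.g. a Bio.Application-style command line object.
    """
    # Split the rendered command into an argument list (POSIX quoting rules).
    args = shlex.split(str(cline))
    # check=True raises CalledProcessError if the program exits non-zero.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout, result.stderr


# Stand-in command so the sketch is runnable without cd-hit;
# in practice you would pass cdhit_cline here instead.
demo_cline = shlex.join([sys.executable, "-c", "print('ok')"])
stdout, stderr = run_command_line(demo_cline)
print(stdout.strip())
```

Passing an argument list to subprocess.run, rather than a single string with shell=True, avoids shell interpretation of file names and option values.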