conkit.applications.cdhit module
Command line object for the CD-HIT sequence clustering application
class CdhitCommandline(cmd='cd-hit', **kwargs)
Bases: Bio.Application.AbstractCommandline
Command line object for CD-HIT [1] [2]
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik’s Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).
CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and aids in understanding the data structure and correcting bias within a dataset.
[1] Li W, Jaroszewski L, Godzik A (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282-283.
[2] Li W, Jaroszewski L, Godzik A (2002). Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18, 77-82.
Examples
>>> from conkit.applications import CdhitCommandline
>>> cdhit_cline = CdhitCommandline()
>>> print(cdhit_cline)
You would typically run the command line with cdhit_cline() or via the subprocess module.
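The subprocess route mentioned above can be sketched as follows. This is a minimal illustration, not ConKit's own implementation: build_cdhit_args and run_cdhit are hypothetical helpers, the file names are placeholders, and the only cd-hit options assumed are -i (input FASTA), -o (output file) and -c (sequence identity threshold).

```python
import shutil
import subprocess


def build_cdhit_args(input_fasta, output_fasta, identity=0.9, executable="cd-hit"):
    """Build the cd-hit argument list (hypothetical helper, not part of ConKit)."""
    # -i: input FASTA, -o: output file, -c: sequence identity threshold
    return [executable, "-i", input_fasta, "-o", output_fasta, "-c", str(identity)]


def run_cdhit(args):
    """Run cd-hit via subprocess; return None if the executable is not on PATH."""
    if shutil.which(args[0]) is None:
        return None
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(args, capture_output=True, text=True, check=True)


# Placeholder file names; only the command is built and printed here.
args = build_cdhit_args("input.fasta", "clustered.fasta")
print(" ".join(args))  # cd-hit -i input.fasta -o clustered.fasta -c 0.9
```

Passing the command as a list avoids shell quoting problems. If you already have a CdhitCommandline instance, str(cdhit_cline) yields the command as a single string, which could be split with shlex.split before being handed to subprocess.run.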