conkit.core.sequencefile module
SequenceFile container used throughout ConKit
-
class SequenceFile(id)
Bases: conkit.core.entity.Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances from a single sequence file. It provides methods to store and analyze these sequences.
-
meff
The number of effective sequences in the SequenceFile
Type: int
-
nseq
The number of sequences in the SequenceFile
Type: int
-
remark
The SequenceFile-specific remarks
Type: list
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)
-
ascii_matrix
The alignment encoded as a 2-D ASCII matrix
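As a rough illustration of the idea, an alignment can be turned into a 2-D matrix of ASCII codes by mapping every character to its ord() value, one row per sequence. The helper to_ascii_matrix is hypothetical and uses plain lists; ConKit's own implementation may differ, for example by returning a NumPy array.

```python
def to_ascii_matrix(sequences):
    """Encode equal-length aligned sequences as rows of ASCII codes.

    Hypothetical helper for illustration, not part of ConKit.
    """
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must all have the same length")
    # One row per sequence, one integer (the character's ASCII code) per column
    return [[ord(residue) for residue in seq] for seq in sequences]


matrix = to_ascii_matrix(["AC-D", "A-CD"])
print(matrix)  # [[65, 67, 45, 68], [65, 45, 67, 68]]
```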
-
diversity
The diversity of an alignment, defined as \(\sqrt{N}/L\), where N is the number of sequences in the alignment and L is the sequence length
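A worked example of the formula, as a minimal sketch: the helper name diversity mirrors the attribute but is hypothetical, and it assumes all sequences share the length of the first one.

```python
import math


def diversity(sequences):
    """Alignment diversity sqrt(N) / L (hypothetical helper).

    N is the number of sequences and L the sequence length, taken from the
    first sequence on the assumption that all sequences are aligned.
    """
    n = len(sequences)
    length = len(sequences[0])
    return math.sqrt(n) / length


# Four sequences of length 8: sqrt(4) / 8 = 2 / 8 = 0.25
print(diversity(["ACDEFGHI"] * 4))  # 0.25
```

Diversity grows only with the square root of N, so quadrupling the number of sequences at a fixed length doubles the value.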
-
empty
Status of emptiness of the SequenceFile, i.e. whether it contains no sequences
-
encoded_matrix
The alignment encoded for contact prediction
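The exact encoding ConKit applies is not stated here. One common scheme for contact-prediction input maps each residue to an integer index in a fixed alphabet. The sketch below assumes such a scheme; the alphabet order (ALPHABET), the treatment of unknown characters as gaps, and the helper encode_alignment are all hypothetical.

```python
# Assumed alphabet: the 20 standard amino acids followed by the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}
GAP_INDEX = INDEX["-"]


def encode_alignment(sequences):
    """Map each residue to an integer index (hypothetical scheme).

    Residues are upper-cased first; characters outside ALPHABET (e.g. 'X')
    receive the gap index.
    """
    return [[INDEX.get(residue.upper(), GAP_INDEX) for residue in seq] for seq in sequences]


print(encode_alignment(["AC-X", "YW-A"]))  # [[0, 1, 20, 20], [19, 18, 20, 0]]
```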
-
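filter(min_id=0.3, max_id=0.9, inplace=False)
Filter sequences from an alignment according to the minimum and maximum identity between the sequences
Parameters:
min_id (float, optional) – Minimum sequence identity [default: 0.3]
max_id (float, optional) – Maximum sequence identity [default: 0.9]
inplace (bool, optional) – Modify the SequenceFile in place [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Minimum sequence identity needs to be between 0 and 1
ValueError – Maximum sequence identity needs to be between 0 and 1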
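The text does not spell out the filtering algorithm, so the following is a minimal sketch of one common greedy scheme, not ConKit's own algorithm. It assumes identity is the fraction of alignment columns carrying the same non-gap residue, that the first sequence is always kept, and that a later sequence survives only if it is at least min_id identical to the first sequence and at most max_id identical to every sequence kept so far. The helpers identity and filter_by_identity are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of aligned columns where both sequences carry the same non-gap residue."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches / len(seq_a)


def filter_by_identity(sequences, min_id=0.3, max_id=0.9):
    """Greedy identity filter (hypothetical sketch, not ConKit's algorithm)."""
    if not 0 <= min_id <= 1:
        raise ValueError("Minimum sequence identity needs to be between 0 and 1")
    if not 0 <= max_id <= 1:
        raise ValueError("Maximum sequence identity needs to be between 0 and 1")
    first = sequences[0]
    kept = [first]
    for seq in sequences[1:]:
        if identity(first, seq) < min_id:
            continue  # too dissimilar to the first sequence
        if any(identity(other, seq) > max_id for other in kept):
            continue  # near-duplicate of a sequence already kept
        kept.append(seq)
    return kept


alignment = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIMM", "ACDEWWWWWW", "WWWWWWWWWW"]
print(filter_by_identity(alignment))  # ['ACDEFGHIKL', 'ACDEFGHIMM', 'ACDEWWWWWW']
```

The duplicate is dropped by the max_id check and the all-W sequence by the min_id check.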
-
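filter_gapped(min_prop=0.0, max_prop=0.9, inplace=True)
Filter sequences according to the minimum and maximum allowed gap proportion
Parameters:
min_prop (float, optional) – Minimum gap proportion [default: 0.0]
max_prop (float, optional) – Maximum gap proportion [default: 0.9]
inplace (bool, optional) – Modify the SequenceFile in place [default: True]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Minimum gap proportion needs to be between 0 and 1
ValueError – Maximum gap proportion needs to be between 0 and 1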
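A minimal sketch of gap-proportion filtering, assuming "-" is the only gap character and that a sequence is kept when its gap proportion lies within [min_prop, max_prop], both bounds inclusive. The helpers gap_proportion and filter_gapped are hypothetical stand-ins, not ConKit code.

```python
def gap_proportion(seq, gap="-"):
    """Fraction of positions in seq occupied by the gap character."""
    return seq.count(gap) / len(seq)


def filter_gapped(sequences, min_prop=0.0, max_prop=0.9):
    """Keep sequences whose gap proportion is within [min_prop, max_prop] (hypothetical)."""
    for name, value in (("Minimum", min_prop), ("Maximum", max_prop)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} gap proportion needs to be between 0 and 1")
    return [seq for seq in sequences if min_prop <= gap_proportion(seq) <= max_prop]


alignment = ["ACDE", "A--E", "----"]
print(filter_gapped(alignment, max_prop=0.5))  # ['ACDE', 'A--E']
```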
-
get_frequency(symbol)
Calculate the frequency of an amino acid (symbol) in each Multiple Sequence Alignment column
Parameters: symbol (str) – The amino acid symbol to count
Returns: A list containing the per alignment-column amino acid frequency count
Return type: list
Raises: RuntimeError – SequenceFile is not an alignment
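A minimal sketch of the per-column count, assuming plain string sequences. It treats unequal lengths as "not an alignment" and raises RuntimeError to mirror the documented error. The helper get_frequency here is a hypothetical stand-in, not the ConKit method.

```python
def get_frequency(sequences, symbol):
    """Count occurrences of symbol in each alignment column (hypothetical helper)."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise RuntimeError("sequences do not form an alignment")
    return [sum(1 for seq in sequences if seq[col] == symbol) for col in range(length)]


print(get_frequency(["ACD", "AC-", "GCD"], "A"))  # [2, 0, 0]
```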
-
get_meff_with_id(identity)
Calculate the number of effective sequences with the specified sequence identity
See also
-
get_weights(identity=0.8)
Calculate the sequence weights
This function calculates the sequence weights in the Multiple Sequence Alignment.
The mathematical function used to calculate Meff is
\[M_{eff}=\sum_{i}\frac{1}{\sum_{j}S_{i,j}}\]
where \(S_{i,j}\) is 1 if sequences i and j share at least the given sequence identity and 0 otherwise, so \(1/\sum_{j}S_{i,j}\) is the weight of sequence i.
Parameters: identity (float, optional) – The sequence identity to use for similarity decision [default: 0.8]
Returns: A list of the sequence weights in the alignment
Return type: list
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Sequence identity needs to be between 0 and 1
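The Meff formula can be checked on a toy alignment with the sketch below. It assumes identity is the fraction of columns carrying the same non-gap residue and that \(S_{i,j}=1\) when that identity is at least the threshold. Because \(S_{i,i}=1\), every weight lies in (0, 1]. The helpers identity and get_weights are hypothetical and are not ConKit's implementation.

```python
def identity(seq_a, seq_b):
    """Fraction of aligned columns where both sequences carry the same non-gap residue."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches / len(seq_a)


def get_weights(sequences, identity_threshold=0.8):
    """Weight of sequence i is 1 / sum_j S_ij (hypothetical helper)."""
    if not 0 <= identity_threshold <= 1:
        raise ValueError("Sequence identity needs to be between 0 and 1")
    weights = []
    for seq_i in sequences:
        # sum_j S_ij: number of sequences (including seq_i) above the threshold
        cluster_size = sum(1 for seq_j in sequences if identity(seq_i, seq_j) >= identity_threshold)
        weights.append(1.0 / cluster_size)
    return weights


alignment = ["ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]
weights = get_weights(alignment)
print(weights)       # [0.5, 0.5, 1.0]
print(sum(weights))  # M_eff = 2.0
```

The two near-identical sequences (90% identity) share one unit of weight, so together they count as a single effective sequence.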
-
is_alignment
Whether the SequenceFile is an alignment
Returns: A boolean status for the alignment
Return type: bool
-
meff
The number of effective sequences
-
nseq
The number of sequences
-
remark
The SequenceFile-specific remarks
-
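sort(kword, reverse=False, inplace=False)
Sort the SequenceFile
Parameters:
kword (str) – The key to sort the sequences by
reverse (bool, optional) – Sort in descending order [default: False]
inplace (bool, optional) – Modify the SequenceFile in place [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises: ValueError – kword not in SequenceFile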
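A minimal sketch of keyword-based sorting, assuming kword names an attribute present on every entry. The Seq dataclass is a hypothetical stand-in for a ConKit Sequence, and sort_sequences is a hypothetical helper. It raises ValueError for an unknown key, mirroring the documented error.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Seq:
    """Hypothetical stand-in for a ConKit Sequence entry."""
    id: str
    seq: str


def sort_sequences(entries, kword, reverse=False):
    """Return a new list sorted by the attribute named kword (hypothetical helper)."""
    if not all(hasattr(entry, kword) for entry in entries):
        raise ValueError(f"{kword} not in entries")
    return sorted(entries, key=attrgetter(kword), reverse=reverse)


entries = [Seq("foo", "ABCDEF"), Seq("bar", "ZYXWVU")]
print([e.id for e in sort_sequences(entries, "id")])                # ['bar', 'foo']
print([e.id for e in sort_sequences(entries, "id", reverse=True)])  # ['foo', 'bar']
```

Returning a new list corresponds to the inplace=False case. An in-place variant would call list.sort on the stored list instead.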
-
status
An indication of the residue status, i.e. true positive, false positive, or unknown
-
summary()
Generate a summary for the SequenceFile
Returns: A summary of the SequenceFile
Return type: str
-
to_string()
Return the SequenceFile as str
-
top_sequence
The first Sequence entry in SequenceFile
Returns: The first Sequence entry in SequenceFile
Return type: Sequence
-
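trim(start, end, inplace=False)
Trim the SequenceFile
Parameters:
start (int) – Start position of the trimmed region
end (int) – End position of the trimmed region
inplace (bool, optional) – Modify the SequenceFile in place [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile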
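A minimal sketch of column trimming. It assumes 1-based, inclusive start and end positions, which is a convention chosen for this example and not confirmed by the text. The helper trim is hypothetical and works on plain strings.

```python
def trim(sequences, start, end):
    """Keep alignment columns start..end (1-based, inclusive) in every sequence.

    Hypothetical helper; the indexing convention is an assumption.
    """
    if not 1 <= start <= end <= len(sequences[0]):
        raise ValueError("invalid trim range")
    return [seq[start - 1:end] for seq in sequences]


print(trim(["ABCDEF", "ZYXWVU"], 2, 4))  # ['BCD', 'YXW']
```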
-