conkit.core.sequencefile module
SequenceFile container used throughout ConKit
-
class SequenceFile(id)
Bases: conkit.core.entity.Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.
-
meff
The number of effective sequences in the SequenceFile
Type: int
-
nseq
The number of sequences in the SequenceFile
Type: int
-
remark
The SequenceFile-specific remarks
Type: list
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)
-
ascii_matrix
The alignment encoded as a 2-D ASCII matrix
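As a rough illustration, assuming that "2-D ASCII matrix" means one row per sequence holding the ASCII code (`ord`) of each character, a plain-Python sketch could look like this. The helper `to_ascii_matrix` is hypothetical and not part of ConKit.

```python
from typing import List


def to_ascii_matrix(seqs: List[str]) -> List[List[int]]:
    """Hypothetical helper: encode each aligned sequence as a row of ASCII codes."""
    # One row per sequence, one column per alignment position
    return [[ord(char) for char in seq] for seq in seqs]


matrix = to_ascii_matrix(["AC-D", "A-CD"])
print(matrix)  # [[65, 67, 45, 68], [65, 45, 67, 68]]
```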
-
diversity
The diversity of an alignment, defined by \(\sqrt{N}/L\), where N is the number of sequences in the alignment and L the sequence length
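The diversity formula can be checked by hand with a minimal sketch. It assumes the alignment is given as a list of equal-length strings; the helper name `diversity` here is a hypothetical stand-alone function, not the ConKit property itself.

```python
import math
from typing import List


def diversity(seqs: List[str]) -> float:
    """Hypothetical helper: sqrt(N) / L for an alignment of N sequences of length L."""
    n = len(seqs)
    length = len(seqs[0])  # assumes all sequences are aligned to the same length
    return math.sqrt(n) / length


print(diversity(["ABCD", "ABCE", "ABCF", "ABCG"]))  # 0.5
```

With four sequences of length four, \(\sqrt{4}/4 = 0.5\).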
-
empty
Whether the SequenceFile is empty
-
encoded_matrix
The alignment encoded for contact prediction
-
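filter(min_id=0.3, max_id=0.9, inplace=False)
Filter sequences from an alignment according to the minimum and maximum identity between the sequences
Parameters:
min_id (float, optional) – Minimum sequence identity [default: 0.3]
max_id (float, optional) – Maximum sequence identity [default: 0.9]
inplace (bool, optional) – Modify the SequenceFile in place rather than a copy [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Minimum sequence identity needs to be between 0 and 1
ValueError – Maximum sequence identity needs to be between 0 and 1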
-
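filter_gapped(min_prop=0.0, max_prop=0.9, inplace=True)
Filter all sequences with a gap proportion greater than the limit
Parameters:
min_prop (float, optional) – Minimum gap proportion [default: 0.0]
max_prop (float, optional) – Maximum gap proportion [default: 0.9]
inplace (bool, optional) – Modify the SequenceFile in place rather than a copy [default: True]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Minimum gap proportion needs to be between 0 and 1
ValueError – Maximum gap proportion needs to be between 0 and 1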
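A minimal sketch of gap-based filtering, assuming "-" is the gap character and that a sequence is kept when its gap proportion lies between min_prop and max_prop (inclusive). ConKit's exact boundary handling is not stated above, and `gap_proportion` and `filter_gapped_sketch` are hypothetical names.

```python
from typing import List


def gap_proportion(seq: str, gap: str = "-") -> float:
    """Fraction of positions in ``seq`` that are gap characters."""
    return seq.count(gap) / len(seq)


def filter_gapped_sketch(seqs: List[str], min_prop: float = 0.0, max_prop: float = 0.9) -> List[str]:
    """Hypothetical helper: keep sequences whose gap proportion lies in [min_prop, max_prop]."""
    if not 0 <= min_prop <= 1:
        raise ValueError("Minimum gap proportion needs to be between 0 and 1")
    if not 0 <= max_prop <= 1:
        raise ValueError("Maximum gap proportion needs to be between 0 and 1")
    return [s for s in seqs if min_prop <= gap_proportion(s) <= max_prop]


# Gap proportions: 0.0, 0.9 and 0.8
kept = filter_gapped_sketch(["ABCDEFGHIJ", "A---------", "AB--------"], max_prop=0.85)
print(kept)  # ['ABCDEFGHIJ', 'AB--------']
```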
-
get_frequency(symbol)
Calculate the frequency of an amino acid (symbol) in each Multiple Sequence Alignment column
Returns: A list containing the per-alignment-column amino acid frequency count
Return type: list
Raises:
RuntimeError – SequenceFile is not an alignment
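The per-column count can be sketched by transposing the alignment with `zip(*seqs)`. This assumes the result is a raw count of `symbol` per column (not a normalised fraction); `column_counts` is a hypothetical helper, not ConKit's implementation.

```python
from typing import List


def column_counts(seqs: List[str], symbol: str) -> List[int]:
    """Hypothetical helper: count how often ``symbol`` occurs in each alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise RuntimeError("Sequences are not aligned")
    # zip(*seqs) yields one tuple per alignment column
    return [column.count(symbol) for column in zip(*seqs)]


print(column_counts(["AAC", "ACC", "A-C"], "A"))  # [3, 1, 0]
```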
-
get_meff_with_id(identity)
Calculate the number of effective sequences with the specified sequence identity
See also
-
get_weights(identity=0.8)
Calculate the sequence weights
This function calculates the sequence weights in the Multiple Sequence Alignment.
The mathematical function used to calculate Meff is
\[M_{eff}=\sum_{i}\frac{1}{\sum_{j}S_{i,j}}\]
where \(S_{i,j}\) is 1 if sequences i and j are considered similar at the given sequence identity and 0 otherwise. The weight of sequence i is the summand \(1/\sum_{j}S_{i,j}\).
Parameters:
identity (float, optional) – The sequence identity to use for the similarity decision [default: 0.8]
Returns: A list of the sequence weights in the alignment
Return type: list
Raises:
ValueError – SequenceFile is not an alignment
ValueError – Sequence identity needs to be between 0 and 1
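The Meff formula can be made concrete with a small pure-Python sketch. It assumes that pairwise identity is the fraction of alignment positions with identical characters (gaps included) and that two sequences count as similar when their identity is at least the threshold; ConKit may define identity differently. `seq_identity` and `sequence_weights` are hypothetical names.

```python
from typing import List


def seq_identity(a: str, b: str) -> float:
    """Fraction of alignment positions where ``a`` and ``b`` carry the same character."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(seqs: List[str], identity: float = 0.8) -> List[float]:
    """Hypothetical helper: w_i = 1 / sum_j S_ij, with S_ij = 1 if identity(i, j) >= threshold."""
    if not 0 <= identity <= 1:
        raise ValueError("Sequence identity needs to be between 0 and 1")
    weights = []
    for a in seqs:
        # j runs over all sequences including i itself, so the sum is at least 1
        n_similar = sum(1 for b in seqs if seq_identity(a, b) >= identity)
        weights.append(1.0 / n_similar)
    return weights


alignment = ["ABCDE", "ABCDF", "VWXYZ"]
weights = sequence_weights(alignment, identity=0.7)
print(weights)       # [0.5, 0.5, 1.0]
print(sum(weights))  # 2.0, i.e. M_eff
```

The two near-identical sequences share one unit of weight, so the alignment behaves like two effective sequences.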
-
is_alignment
A boolean status for the alignment: True if all sequences in the SequenceFile have the same length
Return type: bool
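Assuming the alignment status is decided by checking that every sequence has the same length as the first one, the check reduces to a one-liner. `is_alignment` below is a hypothetical stand-alone function operating on plain strings.

```python
from typing import List


def is_alignment(seqs: List[str]) -> bool:
    """Hypothetical helper: True if every sequence has the same length as the first."""
    return all(len(s) == len(seqs[0]) for s in seqs)


print(is_alignment(["ABC", "A-C"]), is_alignment(["ABC", "AB"]))  # True False
```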
-
meff
The number of effective sequences
-
nseq
The number of sequences
-
remark
The
SequenceFile
-specific remarks
-
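sort(kword, reverse=False, inplace=False)
Sort the SequenceFile
Parameters:
kword (str) – The key to sort the sequences by
reverse (bool, optional) – Sort in descending order [default: False]
inplace (bool, optional) – Modify the SequenceFile in place rather than a copy [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile
Raises:
ValueError – kword not in SequenceFile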
-
status
An indication of the residue status, i.e., true positive, false positive, or unknown
-
summary()
Generate a summary for the SequenceFile
Returns: A summary of the SequenceFile
Return type: str
-
to_string()
Return the SequenceFile as str
-
top_sequence
The first Sequence entry in the SequenceFile
Returns: The first Sequence entry in the SequenceFile
Return type: Sequence
-
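trim(start, end, inplace=False)
Trim the SequenceFile
Parameters:
start (int) – First residue to include
end (int) – Final residue to include
inplace (bool, optional) – Modify the SequenceFile in place rather than a copy [default: False]
Returns: The reference to the SequenceFile, regardless of inplace
Return type: SequenceFile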
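The "reference regardless of inplace" pattern shared by filter, filter_gapped, sort and trim can be sketched on a toy container. This assumes start and end are 1-based and inclusive and that inplace=False works on a deep copy; `ToySequenceFile` is a hypothetical stand-in, not ConKit's class.

```python
import copy
from typing import List


class ToySequenceFile:
    """Hypothetical stand-in for SequenceFile that only stores raw sequence strings."""

    def __init__(self, seqs: List[str]) -> None:
        self.seqs = list(seqs)

    def trim(self, start: int, end: int, inplace: bool = False) -> "ToySequenceFile":
        # Work on the object itself or on a deep copy, then return that reference either way
        target = self if inplace else copy.deepcopy(self)
        # Assumes start/end are 1-based and inclusive, i.e. residues start..end are kept
        target.seqs = [s[start - 1:end] for s in target.seqs]
        return target


original = ToySequenceFile(["ABCDEF", "ZYXWVU"])
trimmed = original.trim(2, 4)
print(trimmed.seqs)   # ['BCD', 'YXW']
print(original.seqs)  # ['ABCDEF', 'ZYXWVU'] (unchanged because inplace=False)
```

Returning the reference in both cases lets calls be chained, while inplace controls whether the caller's object is mutated.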
-