SequenceFile container used throughout ConKit
SequenceFile(id)
Bases: conkit.core.entity.Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.
id
str – A unique identifier
is_alignment
bool – A boolean status for the alignment
meff
int – The number of effective sequences in the SequenceFile
nseq
int – The number of sequences in the SequenceFile
remark
list – The SequenceFile-specific remarks
status
int – An indication of the sequence file, i.e. alignment, no alignment, or unknown
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)
ascii_matrix
The alignment encoded in a 2-D ASCII matrix
calculate_freq(*args, **kwargs)
calculate_meff(*args, **kwargs)
calculate_neff_with_identity(*args, **kwargs)
calculate_weights(*args, **kwargs)
diversity
The diversity of an alignment, defined as \(\sqrt{N}/L\), where N is the number of sequences in the alignment and L is the sequence length
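The definition above can be worked through by hand. The following is a minimal standalone sketch that does not use ConKit; the helper name `alignment_diversity` and the plain list-of-strings alignment are assumptions made for illustration only.

```python
import math


def alignment_diversity(sequences):
    """Return sqrt(N) / L for a list of equal-length aligned sequences.

    Hypothetical helper for illustration; not part of ConKit.
    """
    if not sequences:
        raise ValueError("alignment is empty")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    # N = number of sequences, L = alignment length
    return math.sqrt(len(sequences)) / length


alignment = ["ABCDEF", "ABCDEA", "ABC-EF", "ZYXWVU"]
diversity = alignment_diversity(alignment)
print(round(diversity, 3))  # sqrt(4) / 6 -> 0.333
```

With N = 4 sequences of length L = 6 the value is 2/6, so deeper alignments of the same length score higher.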
empty
Whether the SequenceFile is empty
encoded_matrix
The alignment encoded for contact prediction
filter(min_id=0.3, max_id=0.9, inplace=False)
Filter sequences from an alignment according to the minimum and maximum identity between the sequences
Parameters: | |
---|---|
Returns: | The reference to the |
Return type: | |
Raises: |
|
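To make the idea of identity-based filtering concrete, here is a minimal standalone sketch that does not call ConKit. It assumes one possible selection rule: a sequence is kept only if its identity to every sequence already kept lies within [min_id, max_id]. The rule ConKit actually applies is not described on this page and may differ; the helper names `pairwise_identity` and `filter_by_identity` are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_by_identity(sequences, min_id=0.3, max_id=0.9):
    """Greedy filter (assumed rule, not ConKit's): keep a sequence only if its
    identity to every previously kept sequence lies within [min_id, max_id]."""
    kept = []
    for seq in sequences:
        if all(min_id <= pairwise_identity(seq, k) <= max_id for k in kept):
            kept.append(seq)
    return kept


alignment = [
    "AAAAAAAAAA",
    "AAAAAAAAAA",  # identity 1.0 to the first sequence: too similar
    "AAAAACCCCC",  # identity 0.5 to the first sequence: kept
    "CCCCCCCCCC",  # identity 0.0 to the first sequence: too dissimilar
]
filtered = filter_by_identity(alignment)
print(filtered)  # ['AAAAAAAAAA', 'AAAAACCCCC']
```

The sketch returns a new list, which corresponds loosely to calling the method with inplace=False.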
filter_gapped(min_prop=0.0, max_prop=0.9, inplace=True)
Filter all sequences with a gap proportion greater than the limit
Parameters: | |
---|---|
Returns: | The reference to the |
Return type: | |
Raises: |
|
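The gap-proportion criterion can be illustrated with a short standalone sketch that does not use ConKit. The gap character `-`, the inclusive bounds, and the helper names `gap_proportion` and `drop_gapped` are assumptions for illustration, not ConKit's documented behaviour.

```python
def gap_proportion(sequence, gap="-"):
    """Fraction of positions in a sequence that are gap characters."""
    if not sequence:
        raise ValueError("sequence is empty")
    return sequence.count(gap) / len(sequence)


def drop_gapped(sequences, min_prop=0.0, max_prop=0.9, gap="-"):
    """Keep sequences whose gap proportion lies within [min_prop, max_prop].

    Inclusive bounds are an assumption of this sketch.
    """
    return [s for s in sequences if min_prop <= gap_proportion(s, gap) <= max_prop]


alignment = [
    "ABCDEFGHIJ",  # gap proportion 0.0
    "AB--------",  # gap proportion 0.8
    "----------",  # gap proportion 1.0, above the default limit
]
print(drop_gapped(alignment))  # ['ABCDEFGHIJ', 'AB--------']
```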
get_frequency(symbol)
Calculate the frequency of an amino acid (symbol) in each Multiple Sequence Alignment column
Returns: | A list containing the per alignment-column amino acid frequency count |
---|---|
Return type: | list |
Raises: | RuntimeError – SequenceFile is not an alignment |
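A per-column count of this kind can be sketched in a few lines without ConKit. The helper name `column_counts` and the list-of-strings alignment are assumptions; the RuntimeError for unequal lengths mirrors the error documented above but is otherwise an illustrative choice.

```python
def column_counts(sequences, symbol):
    """Count how often `symbol` occurs in each column of an alignment.

    Hypothetical helper for illustration; not part of ConKit.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise RuntimeError("sequences are not an alignment")
    # One count per alignment column, in column order
    return [sum(seq[col] == symbol for seq in sequences) for col in range(length)]


alignment = ["AAC", "ACC", "A-C"]
print(column_counts(alignment, "A"))  # [3, 1, 0]
print(column_counts(alignment, "C"))  # [0, 1, 3]
```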
get_meff_with_id(identity)
Calculate the number of effective sequences with specified sequence identity
See also
get_weights(identity=0.8)
Calculate the sequence weights
This function calculates the sequence weights in the Multiple Sequence Alignment.
The mathematical function used to calculate Meff is
Parameters: | identity (float, optional) – The sequence identity to use for similarity decision [default: 0.8] |
---|---|
Returns: | A list of the sequence weights in the alignment |
Return type: | |
Raises: |
|
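As a rough illustration of how sequence weights relate to the number of effective sequences, the standalone sketch below assumes a commonly used reweighting scheme: each sequence's weight is the inverse of the number of sequences (itself included) that share at least `threshold` identity with it, and Meff is the sum of the weights. This scheme is an assumption and is not confirmed by this page; the helper names are hypothetical and ConKit is not called.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(sequences, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences within the threshold).

    Assumed scheme for illustration; not taken from ConKit.
    """
    weights = []
    for a in sequences:
        # The count includes the sequence itself, so it is always >= 1
        neighbours = sum(identity(a, b) >= threshold for b in sequences)
        weights.append(1.0 / neighbours)
    return weights


alignment = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
weights = sequence_weights(alignment)
meff = sum(weights)
print(weights)  # [0.5, 0.5, 1.0]
print(meff)     # 2.0
```

The first two sequences are 90% identical and so share one unit of weight, while the third counts fully, giving two effective sequences under this scheme.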
is_alignment
A boolean status for the alignment
Returns: | A boolean status for the alignment |
---|---|
Return type: | bool |
meff
The number of effective sequences
neff
nseq
The number of sequences
remark
The SequenceFile-specific remarks
sort(kword, reverse=False, inplace=False)
Sort the SequenceFile
Parameters: | |
---|---|
Returns: | The reference to the |
Return type: | |
Raises: |
|
status
An indication of the sequence file, i.e. alignment, no alignment, or unknown
summary()
Generate a summary for the SequenceFile
Returns: | |
---|---|
Return type: | str |
to_string()
Return the SequenceFile as str
top_sequence
The first Sequence entry in SequenceFile
Returns: | The first Sequence entry in SequenceFile |
---|---|
Return type: | Sequence |
trim(start, end, inplace=False)
Trim the SequenceFile
Parameters: | |
---|---|
Returns: | The reference to the |
Return type: |