SequenceFile container used throughout ConKit
SequenceFile(id)
Bases: conkit.core._entity._Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)
Attributes
id | The ID of the selected entity |
is_alignment | A boolean status for the alignment |
meff | The number of effective sequences |
nseq | The number of sequences |
remark | The SequenceFile-specific remarks |
status | An indication of the residue status, i.e. true positive, false positive, or unknown |
top_sequence | The first Sequence entry in SequenceFile |
Methods
add(entity) | Add a child to the Entity |
calculate_freq() | Calculate the gap frequency in each alignment column |
calculate_meff([identity]) | Calculate the number of effective sequences |
calculate_meff_with_identity(identity) | Calculate the number of effective sequences with specified sequence identity |
calculate_neff_with_identity(identity) | Calculate the number of effective sequences with specified sequence identity |
calculate_weights([identity]) | Calculate the sequence weights |
copy() | Create a shallow copy of Entity |
deepcopy() | Create a deep copy of Entity |
filter([min_id, max_id, inplace]) | Filter sequences from an alignment according to the minimum and maximum identity |
filter_gapped([min_prop, max_prop, inplace]) | Filter all sequences with a gap proportion greater than the limit |
remove(id) | Remove a child |
sort(kword[, reverse, inplace]) | Sort the SequenceFile |
to_string() | Return the SequenceFile as str |
trim(start, end[, inplace]) | Trim the SequenceFile |
ascii_matrix
The alignment encoded in a 2-D ASCII matrix
calculate_freq()
Calculate the gap frequency in each alignment column
This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.
Returns: | list |
---|---|
Raises: | MemoryError, RuntimeError |
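The per-column gap frequency can be illustrated with a minimal pure-Python sketch. It is not ConKit's implementation: the function name `gap_frequency`, the `-` gap symbol, and the use of plain strings instead of Sequence objects are assumptions made for illustration only.

```python
def gap_frequency(sequences, gap="-"):
    """Return the fraction of gap characters in each alignment column.

    Hypothetical helper: operates on plain strings, not ConKit objects.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All sequences must have the same length")
    nseq = len(sequences)
    # zip(*sequences) walks the alignment column by column
    return [column.count(gap) / nseq for column in zip(*sequences)]


alignment = ["AC-E", "A--E", "ACDE"]
freqs = gap_frequency(alignment)
print(freqs)
```

Columns with no gaps give 0.0, and a column that consists only of gaps would give 1.0.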
calculate_meff_with_identity(identity)
Calculate the number of effective sequences with specified sequence identity
See also
calculate_neff_with_identity(identity)
Calculate the number of effective sequences with specified sequence identity
See also
calculate_weights(identity=0.8)
Calculate the sequence weights
This function calculates the sequence weights in the Multiple Sequence Alignment.
The mathematical function used to calculate Meff is
Parameters: | identity : float, optional |
---|---|
Returns: | list |
Raises: | ImportError, ValueError |
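As an illustration of identity-based sequence weighting, the sketch below uses a scheme that is common in contact prediction: each sequence gets the weight \(w_i = 1/n_i\), where \(n_i\) counts the sequences (itself included) whose identity to sequence \(i\) reaches the threshold, and the effective count is \(M_{eff} = \sum_i w_i\). Whether ConKit applies exactly this definition, including the inclusive `>=` comparison, is an assumption. The helper names are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(sequences, identity=0.8):
    """Weight each sequence by 1 / (number of neighbours at >= identity).

    Assumption: a sequence always counts as its own neighbour, so the
    divisor is at least 1.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    weights = []
    for a in sequences:
        neighbours = sum(
            1 for b in sequences if sequence_identity(a, b) >= identity
        )
        weights.append(1.0 / neighbours)
    return weights


alignment = ["AAAAA", "AAAAC", "CCCCC"]
weights = sequence_weights(alignment, identity=0.8)
meff = sum(weights)  # effective number of sequences under this scheme
print(weights, meff)
```

The first two sequences are 80% identical and therefore share their weight, while the third has no close neighbour and keeps a weight of 1.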
diversity
The diversity of an alignment, defined as \(\sqrt{N}/L\), where N is the number of sequences in the alignment and L is the sequence length.
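A short worked example of the \(\sqrt{N}/L\) formula, written as a standalone function. The name `diversity` and its two integer arguments are illustrative and do not reflect how ConKit exposes the property.

```python
import math


def diversity(nseq, length):
    """Alignment diversity sqrt(N) / L for N sequences of length L."""
    if length <= 0:
        raise ValueError("length must be positive")
    return math.sqrt(nseq) / length


# 100 sequences of length 50: sqrt(100) / 50 = 10 / 50
value = diversity(100, 50)
print(value)
```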
empty
Whether the SequenceFile is empty
encoded_matrix
The alignment encoded for contact prediction
filter(min_id=0.3, max_id=0.9, inplace=False)
Filter sequences from an alignment according to the minimum and maximum identity between the sequences
Parameters: | min_id : float, optional; max_id : float, optional; inplace : bool, optional |
---|---|
Returns: | obj |
Raises: | MemoryError, RuntimeError, ValueError |
filter_gapped(min_prop=0.0, max_prop=0.9, inplace=True)
Filter all sequences with a gap proportion greater than the limit
Parameters: | min_prop : float, optional; max_prop : float, optional; inplace : bool, optional |
---|---|
Returns: | obj |
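The gap-proportion filter can be sketched on plain strings as follows. This is an illustration, not ConKit's code: the helper names, the `-` gap symbol, and the inclusive handling of both bounds are assumptions, and the sketch always returns a new list rather than modifying anything in place.

```python
def gap_proportion(sequence, gap="-"):
    """Fraction of positions in a sequence that are gaps."""
    return sequence.count(gap) / len(sequence)


def filter_gapped(sequences, min_prop=0.0, max_prop=0.9):
    """Keep sequences whose gap proportion lies within [min_prop, max_prop].

    Assumption: both bounds are inclusive.
    """
    return [
        seq for seq in sequences
        if min_prop <= gap_proportion(seq) <= max_prop
    ]


alignment = [
    "ACDEFGHIKL",  # 0% gaps
    "AC--------",  # 80% gaps
    "----------",  # 100% gaps, removed with the default limit
    "AC--FG--KL",  # 40% gaps
]
kept = filter_gapped(alignment)
print(kept)
```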
is_alignment
A boolean status for the alignment
Returns: | bool |
---|---|
meff
The number of effective sequences
neff
The number of effective sequences
nseq
The number of sequences
remark
The SequenceFile-specific remarks
sort(kword, reverse=False, inplace=False)
Sort the SequenceFile
Parameters: | kword : str; reverse : bool, optional; inplace : bool, optional |
---|---|
Returns: | obj |
Raises: | ValueError |
status
An indication of the residue status, i.e. true positive, false positive, or unknown
to_string()
Return the SequenceFile as str
top_sequence
The first Sequence entry in SequenceFile
Returns: | obj |
---|---|
trim(start, end, inplace=False)
Trim the SequenceFile
Parameters: | start : int; end : int; inplace : bool, optional |
---|---|
Returns: | obj |
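Trimming an alignment to a column range can be sketched on plain strings as below. The indexing convention is an assumption: `start` and `end` are treated here as 1-based and inclusive, which may differ from ConKit's behaviour. The standalone `trim` function is hypothetical.

```python
def trim(sequences, start, end):
    """Cut every sequence down to columns start..end.

    Assumption: indices are 1-based and the end column is included.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Convert the 1-based inclusive range to a Python slice
    return [seq[start - 1:end] for seq in sequences]


trimmed = trim(["ABCDEF", "ZYXWVU"], 2, 4)
print(trimmed)
```

Every sequence is cut to the same column range, so an aligned input stays aligned after trimming.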