SequenceFile container used throughout ConKit
SequenceAlignmentState
Bases: enum.Enum
Alignment states
aligned = 2
unaligned = 1
unknown = 0
SequenceFile(id)
Bases: conkit.core._entity._Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)
Attributes
id            The ID of the selected entity
is_alignment  A boolean status for the alignment
neff          The number of effective sequences
nseq          The number of sequences
remark        The SequenceFile specific remarks
status        An indication of the residue status, i.e. true positive, false positive, or unknown
top_sequence  The first Sequence entry in SequenceFile
Methods
add(entity)                             Add a child to the Entity
calculate_freq()                        Calculate the gap frequency in each alignment column
calculate_meff([identity])              Calculate the number of effective sequences
calculate_neff_with_identity(identity)  Calculate the number of effective sequences with specified sequence identity
calculate_weights([identity])           Calculate the sequence weights
copy()                                  Create a shallow copy of Entity
deepcopy()                              Create a deep copy of Entity
filter([min_id, max_id, inplace])       Filter an alignment
remove(id)                              Remove a child
sort(kword[, reverse, inplace])         Sort the SequenceFile
trim(start, end[, inplace])             Trim the SequenceFile
ascii_matrix
The alignment encoded in a 2D ASCII matrix
calculate_freq()
Calculate the gap frequency in each alignment column
This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.
Returns:  list


Raises:  MemoryError
RuntimeError
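The per-column gap frequency described above can be sketched in plain Python. This is a minimal illustration rather than ConKit's implementation: `gap_frequencies` is a hypothetical helper, and it assumes the alignment is given as equal-length strings with `-` as the gap character.

```python
def gap_frequencies(sequences, gap="-"):
    """Return the fraction of gap characters in each alignment column.

    Hypothetical helper for illustration; not part of ConKit.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    nseq = len(sequences)
    # zip(*sequences) walks the alignment column by column
    return [column.count(gap) / nseq for column in zip(*sequences)]


alignment = ["AC-E", "A--E", "ACDE", "-CDE"]
print(gap_frequencies(alignment))  # [0.25, 0.25, 0.5, 0.0]
```

The result has one entry per alignment column, which matches the `list` return type documented for `calculate_freq`.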

calculate_neff_with_identity(identity)
Calculate the number of effective sequences with a specified sequence identity
See also
calculate_weights(identity=0.8)
Calculate the sequence weights
This function calculates the sequence weights in the Multiple Sequence Alignment.
The mathematical function used to calculate Meff is
Parameters:  identity : float, optional


Returns:  list

Raises:  MemoryError
RuntimeError
ValueError
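As a rough illustration of identity-based sequence weighting, the sketch below assumes a commonly used scheme: each sequence is weighted by the inverse of the number of sequences (itself included) that share at least `identity` fraction of identical positions with it, and Meff is the sum of those weights. Both the scheme and the helper names `sequence_identity` and `sequence_weights` are assumptions for this example, not ConKit's confirmed implementation.

```python
def sequence_identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(sequences, identity=0.8):
    """Weight each sequence by 1 / (number of sequences within the cutoff).

    Hypothetical helper; the weighting scheme is an assumption.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    weights = []
    for seq in sequences:
        # A sequence always matches itself, so the count is at least 1
        neighbours = sum(
            sequence_identity(seq, other) >= identity for other in sequences
        )
        weights.append(1.0 / neighbours)
    return weights


alignment = ["AAAA", "AAAA", "AAAC", "CCCC"]
weights = sequence_weights(alignment, identity=0.75)
meff = sum(weights)  # effective number of sequences under this scheme
print(weights, meff)
```

Under this scheme a cluster of near-identical sequences contributes roughly one effective sequence in total, while an isolated sequence contributes a full count of one.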

empty
Whether the SequenceFile is empty
filter(min_id=0.3, max_id=0.9, inplace=False)
Filter an alignment
Parameters:  min_id : float, optional
max_id : float, optional
inplace : bool, optional


Returns:  obj

Raises:  MemoryError
RuntimeError
ValueError
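The exact filtering criterion is not spelled out here, so the following is only one plausible reading, sketched under stated assumptions: sequences are visited in order, and a sequence is kept only if its identity to every sequence kept so far lies within `[min_id, max_id]`. The helper `filter_by_identity` is hypothetical and returns a new list, which corresponds to `inplace=False`.

```python
def sequence_identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_by_identity(sequences, min_id=0.3, max_id=0.9):
    """Greedy identity filter; hypothetical, not ConKit's implementation."""
    if not 0.0 <= min_id <= max_id <= 1.0:
        raise ValueError("require 0 <= min_id <= max_id <= 1")
    kept = []
    for seq in sequences:
        # Assumption: a sequence survives only if its identity to every
        # sequence kept so far falls inside [min_id, max_id].
        if all(min_id <= sequence_identity(seq, k) <= max_id for k in kept):
            kept.append(seq)
    return kept


alignment = [
    "ACDEFGHIKL",  # first sequence, always kept
    "ACDEFGHIKL",  # identical to the first (identity 1.0), dropped
    "ACDEFGHIVV",  # identity 0.8 to the first, kept
    "ACDEFGWWWW",  # identity 0.6 to both kept sequences, kept
    "WWWWWWWWWW",  # identity 0.0 to the first, dropped
]
print(filter_by_identity(alignment))
# ['ACDEFGHIKL', 'ACDEFGHIVV', 'ACDEFGWWWW']
```

The upper bound removes redundant near-duplicates, and the lower bound removes sequences too distant to be reliably aligned.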

is_alignment
A boolean status for the alignment
Returns:  bool


neff
The number of effective sequences
nseq
The number of sequences
remark
The SequenceFile-specific remarks
sort(kword, reverse=False, inplace=False)
Sort the SequenceFile
Parameters:  kword : str
reverse : bool, optional
inplace : bool, optional


Returns:  obj

Raises:  ValueError
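The interplay of `kword`, `reverse`, and `inplace` can be illustrated with a small stand-alone sketch. It assumes that `kword` names an attribute of each entry and that an unknown attribute raises `ValueError`; the `Record` class and `sort_records` function are hypothetical stand-ins, not ConKit types.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Record:
    """Hypothetical stand-in for a sequence entry."""
    id: str
    seq: str


def sort_records(records, kword, reverse=False, inplace=False):
    """Sort records by the attribute named in kword (illustrative only)."""
    # Reject unknown keywords up front instead of failing mid-sort
    if any(not hasattr(r, kword) for r in records):
        raise ValueError(f"cannot sort by unknown attribute {kword!r}")
    # With inplace=False, sort a shallow copy and leave the input untouched
    target = records if inplace else list(records)
    target.sort(key=attrgetter(kword), reverse=reverse)
    return target


records = [Record("foo", "ABCDEF"), Record("bar", "ZYXWVU")]
by_id = sort_records(records, "id")
print([r.id for r in by_id])    # ['bar', 'foo']
print([r.id for r in records])  # ['foo', 'bar'], original order preserved
```

Returning the sorted container in both modes keeps call sites uniform regardless of whether `inplace` is set.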

status
An indication of the residue status, i.e. true positive, false positive, or unknown
top_sequence
The first Sequence entry in SequenceFile
Returns:  obj


trim(start, end, inplace=False)
Trim the SequenceFile
Parameters:  start : int
end : int
inplace : bool, optional


Returns:  obj
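Trimming amounts to slicing every sequence to the same column range. The sketch below assumes that `start` and `end` are counted from 1 and that both ends are inclusive; this convention is an assumption and is not stated above. `trim_alignment` is a hypothetical helper operating on plain strings.

```python
def trim_alignment(sequences, start, end):
    """Keep columns start..end of every sequence.

    Assumption: positions are 1-based and inclusive at both ends.
    """
    if start < 1 or end < start:
        raise ValueError("require 1 <= start <= end")
    # Convert the 1-based inclusive range to Python's 0-based half-open slice
    return [seq[start - 1:end] for seq in sequences]


alignment = ["ABCDEF", "ZYXWVU"]
print(trim_alignment(alignment, 2, 5))  # ['BCDE', 'YXWV']
```

Because the same slice is applied to every row, the trimmed sequences stay aligned with one another.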
