conkit.core.SequenceFile module

Storage space for a sequence file

class SequenceFile(id)[source]

Bases: conkit.core.Entity.Entity

An object representing a single sequence file

Examples

>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseqs=2)

Attributes

id The ID of the selected entity
is_alignment A boolean status for the alignment
nseqs The number of conkit.core.Sequence instances in the conkit.core.SequenceFile
remark The conkit.core.SequenceFile-specific remarks
status An indication of the residue status, i.e. true positive, false positive, or unknown
top_sequence The first conkit.core.Sequence entry in conkit.core.SequenceFile

Methods

add(entity) Add a child to the Entity
calculate_freq() Calculate the gap frequency in each alignment column
calculate_meff([identity]) Calculate the number of effective sequences
copy() Create a shallow copy of conkit.core.Entity
deepcopy() Create a deep copy of conkit.core.Entity
remove(id) Remove a child
sort(kword[, reverse, inplace]) Sort the conkit.core.SequenceFile

calculate_freq()[source]

Calculate the gap frequency in each alignment column

This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.

Returns:

frequency : list

A list containing the amino acid frequency count for each alignment column

Raises:

MemoryError

Too many sequences in the alignment

RuntimeError

conkit.core.SequenceFile is not an alignment
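
The per-column calculation can be illustrated without ConKit. The following is a minimal sketch, not the library's implementation: `gap_frequency` is a hypothetical helper, the alignment is assumed to be a list of equal-length strings, and `-` is assumed to be the gap character. The description above leaves open whether the returned values describe gaps or residues; the sketch follows the summary line and reports the fraction of gaps.

```python
def gap_frequency(alignment, gap="-"):
    """Return the fraction of gap characters in each alignment column.

    Hypothetical helper for illustration only; it is not part of ConKit.
    `alignment` is a list of strings that must all have the same length.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        # Mirrors the documented RuntimeError for non-aligned input
        raise RuntimeError("sequences differ in length, so this is not an alignment")
    nseqs = len(alignment)
    # zip(*alignment) walks over the columns instead of the rows
    return [column.count(gap) / nseqs for column in zip(*alignment)]


msa = ["AC-EF", "A--EF", "ACDEF", "-CDEF"]
freq = gap_frequency(msa)
print(freq)
```

Columns that contain no gap at all come out as 0.0, so the result can be used directly to mask poorly covered positions.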

calculate_meff(identity=0.7)[source]

Calculate the number of effective sequences

This function calculates the number of effective sequences (Meff) in the Multiple Sequence Alignment.

The mathematical function used to calculate Meff is

\[M_{eff}=\sum_{i}\frac{1}{\sum_{j}S_{i,j}}\]

Parameters:

identity : float, optional

The sequence identity threshold used to decide whether two sequences are similar [default: 0.7]

Returns:

meff : int

The number of effective sequences

Raises:

MemoryError

Too many sequences in the alignment for Hamming distance calculation

RuntimeError

SciPy package not installed

ValueError

conkit.core.SequenceFile is not an alignment

ValueError

Sequence Identity needs to be between 0 and 1
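
The Meff formula can be worked through in plain Python. This is a sketch under stated assumptions rather than ConKit's implementation: `effective_sequences` is a hypothetical helper, S_{i,j} is assumed to be 1 when the fraction of identical positions between sequences i and j is at least `identity` (and 0 otherwise), and every sequence counts as similar to itself. The sketch returns the raw sum as a float, whereas the method above is documented to return an int.

```python
def effective_sequences(alignment, identity=0.7):
    """Sum 1 / (number of similar sequences) over all sequences.

    Hypothetical helper for illustration only; it is not part of ConKit.
    `alignment` is a list of equal-length strings.
    """
    if not 0 <= identity <= 1:
        raise ValueError("Sequence identity needs to be between 0 and 1")
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences differ in length, so this is not an alignment")

    def fraction_identical(a, b):
        # Share of columns in which both sequences carry the same symbol
        return sum(x == y for x, y in zip(a, b)) / length

    meff = 0.0
    for seq_i in alignment:
        # Inner sum over j of S_ij; always >= 1 because seq_i matches itself
        similar = sum(
            1 for seq_j in alignment
            if fraction_identical(seq_i, seq_j) >= identity
        )
        meff += 1.0 / similar
    return meff


msa = ["ABCDEFGHIJ", "ABCDEFGHIK", "ZYXWVUTSRQ"]
print(effective_sequences(msa))                # first two sequences share one cluster
print(effective_sequences(msa, identity=1.0))  # only exact duplicates are merged
```

The pairwise loop is quadratic in the number of sequences, which is consistent with the MemoryError listed above for very large alignments when a full distance matrix is built.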

is_alignment

A boolean status for the alignment

Returns:

is_alignment : bool

True if the conkit.core.SequenceFile is an alignment, False otherwise
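
The page does not state what qualifies a sequence file as an alignment. A common criterion, assumed here and not confirmed by this page, is that all sequences have the same length. The hypothetical helper `looks_like_alignment` below sketches that check on plain strings.

```python
def looks_like_alignment(sequences):
    """Return True if all sequences share one length (assumed criterion).

    Hypothetical helper for illustration only; it is not part of ConKit.
    """
    # A set of lengths has exactly one element when every length is equal;
    # an empty input yields an empty set and is treated as "not an alignment"
    return len({len(seq) for seq in sequences}) == 1


aligned = looks_like_alignment(["ABC-EF", "AB-DEF"])
unaligned = looks_like_alignment(["ABCDEF", "ABC"])
print(aligned, unaligned)
```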

nseqs

The number of conkit.core.Sequence instances in the conkit.core.SequenceFile

Returns:

nseqs : int

The number of sequences in the conkit.core.SequenceFile

remark

The conkit.core.SequenceFile-specific remarks

sort(kword, reverse=False, inplace=False)[source]

Sort the conkit.core.SequenceFile

Parameters:

kword : str

The dictionary key to sort sequences by

reverse : bool, optional

Sort the sequences in descending order [default: False]

inplace : bool, optional

Replace the saved order of sequences [default: False]

Returns:

sequence_file : conkit.core.SequenceFile

The reference to the conkit.core.SequenceFile, regardless of inplace

Raises:

ValueError

kword not in conkit.core.SequenceFile
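
The interplay of kword, reverse, and inplace can be sketched on a plain list. This is an illustration under assumptions, not ConKit's code: `Record` is a hypothetical stand-in for conkit.core.Sequence, `sort_records` is a hypothetical helper, and kword is assumed to name an attribute of each entry.

```python
from collections import namedtuple

# Hypothetical stand-in for conkit.core.Sequence
Record = namedtuple("Record", ["id", "seq"])


def sort_records(records, kword, reverse=False, inplace=False):
    """Sort records by the attribute named `kword`.

    Hypothetical helper for illustration only; it is not part of ConKit.
    """
    if not all(hasattr(record, kword) for record in records):
        raise ValueError("kword %r is not available on every record" % kword)

    def key(record):
        return getattr(record, kword)

    if inplace:
        # Reorder the existing list and hand back the same object
        records.sort(key=key, reverse=reverse)
        return records
    # Leave the input untouched and return a sorted copy
    return sorted(records, key=key, reverse=reverse)


records = [Record("foo", "ABCDEF"), Record("bar", "ZYXWVU")]
by_id = sort_records(records, "id")
print([record.id for record in by_id])
```

In both modes a list is returned, which matches the documented behaviour of returning a reference regardless of inplace; only the inplace call changes the order of the original container.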

status

An indication of the residue status, i.e. true positive, false positive, or unknown

top_sequence

The first conkit.core.Sequence entry in conkit.core.SequenceFile

Returns:

top_sequence : conkit.core.Sequence, None

The first conkit.core.Sequence entry, or None if the conkit.core.SequenceFile is empty