conkit.core.sequencefile module

SequenceFile container used throughout ConKit

class SequenceAlignmentState

Bases: enum.Enum

Alignment states

aligned = 2

unaligned = 1

unknown = 0

class SequenceFile(id)

Bases: conkit.core._entity._Entity

A sequence file object representing a single sequence file

The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.

Examples

>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseq=2)

Attributes

`id`	The ID of the selected entity
`is_alignment`	A boolean status for the alignment
`neff`	The number of effective sequences
`nseq`	The number of sequences
`remark`	The `SequenceFile`-specific remarks
`status`	An indication of the residue status, i.e., true positive, false positive, or unknown
`top_sequence`	The first `Sequence` entry in `SequenceFile`

Methods

`add`(entity)	Add a child to the `Entity`
`calculate_freq`()	Calculate the gap frequency in each alignment column
`calculate_meff`([identity])	Calculate the number of effective sequences
`calculate_neff_with_identity`(identity)	Calculate the number of effective sequences with specified sequence identity
`calculate_weights`([identity])	Calculate the sequence weights
`copy`()	Create a shallow copy of `Entity`
`deepcopy`()	Create a deep copy of `Entity`
`filter`([min_id, max_id, inplace])	Filter an alignment
`remove`(id)	Remove a child
`sort`(kword[, reverse, inplace])	Sort the `SequenceFile`
`trim`(start, end[, inplace])	Trim the `SequenceFile`

ascii_matrix: The alignment encoded in a 2-D ASCII matrix
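To make the idea of an ASCII matrix concrete, here is a minimal sketch. It assumes each residue or gap character is stored as its ASCII code and that every row has the alignment's length. `ascii_matrix` below is a hypothetical stand-alone helper, not ConKit's property implementation.

```python
def ascii_matrix(sequences):
    """Encode equal-length aligned sequences as a 2-D list of ASCII codes.

    Hypothetical helper for illustration only.
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) > 1:
        raise ValueError("sequences are not aligned (unequal lengths)")
    # ord() maps each character to its code point, e.g. 'A' -> 65, '-' -> 45
    return [[ord(ch) for ch in s] for s in sequences]


matrix = ascii_matrix(["AC-E", "A-DE"])
print(matrix)  # [[65, 67, 45, 69], [65, 45, 68, 69]]
```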

calculate_freq()

Calculate the gap frequency in each alignment column

This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.

Returns:

list

A list containing the per alignment-column amino acid frequency count

Raises:

MemoryError

Too many sequences in the alignment

RuntimeError

SequenceFile is not an alignment
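Here is a minimal pure-Python sketch of a per-column frequency calculation. It assumes `-` is the only gap symbol and that the input is a list of equal-length strings. The hypothetical helper `column_residue_frequency` returns the fraction of non-gap characters in each column, and the gap frequency is its complement. It raises `RuntimeError` for unaligned input to mirror the documented behaviour, but it is not ConKit's code.

```python
def column_residue_frequency(sequences, gap="-"):
    """Per-column fraction of non-gap characters (hypothetical sketch)."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise RuntimeError("input is not an alignment")
    nseq = len(sequences)
    # zip(*sequences) yields the alignment columns one at a time
    return [sum(ch != gap for ch in column) / nseq for column in zip(*sequences)]


aln = ["AC-E", "A-DE", "ACD-", "A--E"]
freq = column_residue_frequency(aln)
print(freq)                   # [1.0, 0.5, 0.5, 0.75]
print([1 - f for f in freq])  # gap frequency: [0.0, 0.5, 0.5, 0.25]
```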

calculate_meff(identity=0.8): Calculate the number of effective sequences

See also

neff

calculate_neff_with_identity(identity): Calculate the number of effective sequences with specified sequence identity

See also

neff, calculate_weights

calculate_weights(identity=0.8)

Calculate the sequence weights

This function calculates the sequence weights in the Multiple Sequence Alignment.

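The mathematical function used to calculate Meff is

\[M_{eff}=\sum_{i}\frac{1}{\sum_{j}S_{i,j}}\]

where \(S_{i,j}=1\) if sequences \(i\) and \(j\) are considered similar at the given sequence identity threshold and \(S_{i,j}=0\) otherwise. Every sequence is similar to itself, so \(S_{i,i}=1\). The weight of sequence \(i\) is the summand \(1/\sum_{j}S_{i,j}\).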
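As a worked example of the formula, take three hypothetical aligned sequences, ACDE, ACDE and WWWW, with an identity threshold of 0.8. Assume \(S_{i,j}=1\) when two sequences share at least the threshold identity and 0 otherwise. The first two sequences are identical, and the third shares no position with them, so

\[S=\begin{pmatrix}1&1&0\\1&1&0\\0&0&1\end{pmatrix}\]

\[w_1=\frac{1}{1+1+0}=\frac{1}{2},\quad w_2=\frac{1}{1+1+0}=\frac{1}{2},\quad w_3=\frac{1}{0+0+1}=1\]

\[M_{eff}=\frac{1}{2}+\frac{1}{2}+1=2\]

Here \(w_i\) is the weight of sequence \(i\). The two redundant sequences together count as one effective sequence.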

Parameters:

identity : float, optional

The sequence identity to use for similarity decision [default: 0.8]

Returns:

list

A list of the sequence weights in the alignment

Raises:

MemoryError

Too many sequences in the alignment for Hamming distance calculation

RuntimeError

SciPy package not installed

ValueError

SequenceFile is not an alignment

ValueError

Sequence identity needs to be between 0 and 1
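The weighting scheme can be sketched in pure Python. This is a hypothetical illustration, not ConKit's implementation. `seq_identity`, `calculate_weights` and `calculate_meff` are stand-alone functions over lists of equal-length strings. Identity is taken as the fraction of positions with identical characters, with gaps compared like residues. \(S_{i,j}=1\) is assumed when that identity is at least the threshold.

```python
def seq_identity(a, b):
    """Fraction of aligned positions where a and b have the same character."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def calculate_weights(sequences, identity=0.8):
    """Weight of sequence i is 1 / sum_j S_ij (hypothetical sketch).

    Assumes S_ij = 1 when seq_identity(i, j) >= identity, else 0, so S_ii = 1.
    """
    if not 0 <= identity <= 1:
        raise ValueError("Sequence identity needs to be between 0 and 1")
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("input is not a non-empty alignment")
    weights = []
    for a in sequences:
        # Number of sequences (including a itself) similar to a
        n_similar = sum(seq_identity(a, b) >= identity for b in sequences)
        weights.append(1.0 / n_similar)
    return weights


def calculate_meff(sequences, identity=0.8):
    """Meff is the sum of the sequence weights."""
    return sum(calculate_weights(sequences, identity))


aln = ["ACDE", "ACDE", "WWWW"]
print(calculate_weights(aln))  # [0.5, 0.5, 1.0]
print(calculate_meff(aln))     # 2.0
```

The pairwise comparison grows quadratically with the number of sequences, so any approach that compares all pairs becomes costly for large alignments.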

empty: Whether the SequenceFile contains no sequences

filter(min_id=0.3, max_id=0.9, inplace=False)

Filter an alignment

Parameters:

min_id : float, optional

Minimum sequence identity [default: 0.3]

max_id : float, optional

Maximum sequence identity [default: 0.9]

inplace : bool, optional

Modify the SequenceFile in place instead of a copy [default: False]

Returns:

obj

The reference to the SequenceFile, regardless of inplace

Raises:

MemoryError

Too many sequences in the alignment for Hamming distance calculation

RuntimeError

SciPy package not installed

ValueError

SequenceFile is not an alignment

ValueError

Minimum sequence identity needs to be between 0 and 1

ValueError

Maximum sequence identity needs to be between 0 and 1
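The following sketch shows one simple identity-window filter. It uses a hypothetical, simplified rule: every sequence is compared only against the first (query) sequence, and sequences whose identity lies in `[min_id, max_id]` are kept. ConKit's own filter may compare sequence pairs differently, so the helper names and the selection rule are assumptions.

```python
def seq_identity(a, b):
    """Fraction of aligned positions where a and b have the same character."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_alignment(sequences, min_id=0.3, max_id=0.9):
    """Keep the query plus sequences whose identity to it is in [min_id, max_id].

    Hypothetical query-based rule; not ConKit's algorithm.
    """
    for label, value in (("Minimum", min_id), ("Maximum", max_id)):
        if not 0 <= value <= 1:
            raise ValueError(f"{label} sequence identity needs to be between 0 and 1")
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("input is not a non-empty alignment")
    query, *others = sequences
    kept = [s for s in others if min_id <= seq_identity(query, s) <= max_id]
    return [query] + kept


aln = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGWWWW", "WWWWWWWWWW"]
print(filter_alignment(aln))  # ['ACDEFGHIKL', 'ACDEFGWWWW']
```

With the defaults, the exact duplicate of the query (identity 1.0) and the unrelated sequence (identity 0.0) are removed. The third sequence (identity 0.6) is kept.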

is_alignment

A boolean status for the alignment

Returns:

bool

A boolean status for the alignment
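The following minimal sketch shows how an alignment status could be derived. It reuses the documented `SequenceAlignmentState` values through Python's `enum` module. The rule used here is an assumption for illustration: sequences of equal length count as aligned, and an empty input stays unknown. The helper names are also assumptions.

```python
import enum


class SequenceAlignmentState(enum.Enum):
    """Mirror of the documented alignment states."""

    unknown = 0
    unaligned = 1
    aligned = 2


def alignment_state(sequences):
    """Classify sequence strings (hypothetical rule, see lead-in)."""
    if not sequences:
        # Nothing to compare yet
        return SequenceAlignmentState.unknown
    if len({len(s) for s in sequences}) == 1:
        # All rows have the same length
        return SequenceAlignmentState.aligned
    return SequenceAlignmentState.unaligned


def is_alignment(sequences):
    """Boolean view of the state, analogous to SequenceFile.is_alignment."""
    return alignment_state(sequences) is SequenceAlignmentState.aligned


print(is_alignment(["AC-E", "A-DE"]))  # True
print(is_alignment(["ACDE", "AC"]))    # False
```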

neff: The number of effective sequences

nseq: The number of sequences

remark: The SequenceFile-specific remarks

sort(kword, reverse=False, inplace=False)

Sort the SequenceFile

Parameters:

kword : str

The dictionary key to sort sequences by

reverse : bool, optional

Sort the sequences in reverse order [default: False]

inplace : bool, optional

Replace the saved order of sequences [default: False]

Returns:

obj

The reference to the SequenceFile, regardless of inplace

Raises:

ValueError

kword not in SequenceFile
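The sort semantics can be sketched with `operator.attrgetter`. This sketch assumes that `kword` names an attribute present on every entry and that `inplace=False` works on a deep copy. The `Seq` dataclass and the `sort_sequences` function are hypothetical stand-ins for ConKit's `Sequence` and `SequenceFile.sort`.

```python
import copy
import operator
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical stand-in for a Sequence entry."""

    id: str
    seq: str


def sort_sequences(entries, kword, reverse=False, inplace=False):
    """Sort entries by the attribute named kword (hypothetical sketch)."""
    if not all(hasattr(e, kword) for e in entries):
        raise ValueError(f"{kword} not in entries")
    # Work on the caller's list only when inplace=True
    target = entries if inplace else copy.deepcopy(entries)
    target.sort(key=operator.attrgetter(kword), reverse=reverse)
    return target


entries = [Seq("foo", "ABCDEF"), Seq("bar", "ZYXWVU")]
by_id = sort_sequences(entries, "id")
print([e.id for e in by_id])    # ['bar', 'foo']
print([e.id for e in entries])  # ['foo', 'bar'], original order kept
```

Sorting a copy leaves the caller's order untouched. With `inplace=True`, the function reorders and returns the same list.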

status: An indication of the residue status, i.e., true positive, false positive, or unknown

top_sequence

The first Sequence entry in SequenceFile

Returns:

obj

The first Sequence entry in SequenceFile

trim(start, end, inplace=False)

Trim the SequenceFile

Parameters:

start : int

First residue to include

end : int

Final residue to include

inplace : bool, optional

Modify the SequenceFile in place instead of a copy [default: False]

Returns:

obj

The reference to the SequenceFile, regardless of inplace
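The following sketch trims sequences to a residue range. It assumes `start` and `end` are 1-based inclusive positions. The documentation says only "first" and "final" residue to include, so this indexing convention is an assumption. `trim_sequences` is a hypothetical helper that applies the same range to every sequence.

```python
def trim_sequences(sequences, start, end):
    """Cut every sequence to residues start..end (hypothetical sketch).

    Assumes 1-based, inclusive positions: start=1 is the first residue.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    # Convert 1-based inclusive [start, end] to the Python slice [start - 1, end)
    return [s[start - 1:end] for s in sequences]


print(trim_sequences(["ABCDEF", "ZYXWVU"], 2, 4))  # ['BCD', 'YXW']
```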