Core package

Core modules for hierarchy construction

class Contact(res1_seq, res2_seq, raw_score, distance_bound=(0, 8))[source]

Bases: conkit.core.Entity.Entity

A contact pair template to store all associated information

Examples

>>> from conkit.core import Contact
>>> contact = Contact(1, 25, 1.0)
>>> print(contact)
Contact(id="(1, 25)" res1="A" res1_seq=1 res2="A" res2_seq=25 raw_score=1.0)

Attributes

distance_bound
id
is_false_positive
is_true_positive
lower_bound
raw_score
res1
res2
res1_chain
res2_chain
res1_seq
res2_seq
res1_altseq
res2_altseq
scalar_score
status
upper_bound
weight

Methods

add
copy Generic (shallow and deep) copying operations.
deepcopy
define_false_positive
define_true_positive
remove

define_false_positive()[source]

Define a contact as false positive

define_true_positive()[source]

Define a contact as true positive

distance_bound

The lower and upper distance boundary values of a contact pair in Ångstrom [Default: 0-8Å].

is_false_positive

A boolean status for the contact

Parameters:

is_false_positive : bool

True / False

is_true_positive

A boolean status for the contact

Parameters:

is_true_positive : bool

True / False

lower_bound

The lower distance boundary value

raw_score

The prediction score for the contact pair

res1

The amino acid of residue 1 [default: X]

res1_altseq

The alternative residue sequence number of residue 1

res1_chain

The chain for residue 1

res1_seq

The residue sequence number of residue 1

res2

The amino acid of residue 2 [default: X]

res2_altseq

The alternative residue sequence number of residue 2

res2_chain

The chain for residue 2

res2_seq

The residue sequence number of residue 2

scalar_score

The raw_score scaled according to the average raw_score

status

An indication of the contact status, i.e. true positive, false positive, or unknown

upper_bound

The upper distance boundary value

weight

A separate internal weight factor for the contact pair

class ContactMap(id)[source]

Bases: conkit.core.Entity.Entity

A contact map object representing a single prediction

The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.

Examples

>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)

Attributes

coverage
id
ncontacts
precision
repr_sequence
repr_sequence_altloc
sequence
top_contact

Methods

add
assign_sequence_register
calculate_jaccard_index
calculate_scalar_score
copy Generic (shallow and deep) copying operations.
deepcopy
find
match
plot_map
remove
remove_neighbors
rescale
sort

assign_sequence_register(altloc=False)[source]

Assign the amino acids from Sequence to all Contact instances

Parameters:

altloc : bool

Use the res_altloc positions [default: False]

calculate_jaccard_index(other)[source]

Calculate the Jaccard index between two ContactMap instances

This score analyzes the difference of the predicted contacts from two maps,

\[J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]

where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and the \(\left|x \cup y\right|\) represents the number of elements in the union of \(x\) and \(y\).

The J-score has values in the range of \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps that share no contacts.

Parameters:

other : ContactMap

A ConKit ContactMap

Returns:

float

The Jaccard index

Warning

The Jaccard index lies in the range \([0, 1]\), where \(1\) means the maps contain identical contact pairs.

See also

match, precision

Notes

The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1-Jaccard_{index}\).

[1]Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, [doi: 10.1093/bib/bbw106].
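
The set formula above can be tried out on plain tuples. This is a minimal standalone sketch, not ConKit's implementation: the function name `jaccard_index` and the representation of a contact as a `(res1_seq, res2_seq)` tuple are assumptions made for illustration, as is the choice to treat two empty maps as identical.

```python
def jaccard_index(contacts_x, contacts_y):
    """Jaccard index of two collections of (res1_seq, res2_seq) pairs."""
    # Sort each pair so that (5, 30) and (30, 5) count as the same contact.
    x = {tuple(sorted(pair)) for pair in contacts_x}
    y = {tuple(sorted(pair)) for pair in contacts_y}
    union = x | y
    if not union:
        # Assumption: two empty maps are treated as identical.
        return 1.0
    return len(x & y) / len(union)


map_a = [(1, 10), (5, 30), (7, 40)]
map_b = [(1, 10), (30, 5), (8, 50)]
index = jaccard_index(map_a, map_b)
print(index)        # 2 shared contacts out of 4 distinct ones
print(1.0 - index)  # the corresponding Jaccard distance
```
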
calculate_scalar_score()[source]

Calculate a scaled score for the ContactMap

This score is a scaled score for all raw scores in a contact map. It is defined by the formula

\[{x}'=\frac{x}{\overline{d}}\]

where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.

The score is saved in a separate Contact attribute called scalar_score

This score is described in more detail in [2].

[2]S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
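
The scaling formula can be sketched on a plain list of raw scores. The function name `scalar_scores` is hypothetical and the code does not touch ConKit objects; it only illustrates dividing each raw score by the mean of all raw scores.

```python
def scalar_scores(raw_scores):
    """Divide every raw score by the mean of all raw scores."""
    if not raw_scores:
        raise ValueError("no raw scores given")
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        raise ValueError("mean raw score is zero, cannot scale")
    return [score / mean for score in raw_scores]


raw = [1.0, 2.0, 3.0]
scaled = scalar_scores(raw)
print(scaled)  # scores above 1 are better than the average prediction
```

After scaling, the mean of the scores is 1, which makes maps from different predictors easier to compare.
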
coverage

The sequence coverage score

The coverage score is calculated by analysing the number of residues covered by the predicted contact pairs.

\[Coverage=\frac{x_{cov}}{L}\]

The coverage score is calculated by dividing the number of residues covered by predicted contacts \(x_{cov}\) by the number of residues in the sequence \(L\).

Returns:

cov : float

The calculated coverage score

See also

precision
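
As a rough illustration of the coverage score, the sketch below counts the residues that take part in at least one predicted contact pair and divides by the sequence length. The function name `coverage` and the tuple representation of contacts are assumptions for this example; ConKit derives the value from the ContactMap and its sequence.

```python
def coverage(contacts, seq_len):
    """Fraction of residues that appear in at least one contact pair."""
    if seq_len <= 0:
        raise ValueError("sequence length must be positive")
    covered = set()
    for res1_seq, res2_seq in contacts:
        covered.add(res1_seq)
        covered.add(res2_seq)
    return len(covered) / seq_len


contacts = [(1, 10), (5, 30), (1, 30)]
print(coverage(contacts, 40))  # residues 1, 5, 10 and 30 out of 40
```
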

find(indexes, altloc=False)[source]

Find all contacts associated with the given residue indexes

Parameters:

indexes : list, tuple

A list of residue indexes to find

altloc : bool

Use the res_altloc positions [default: False]

Returns:

ContactMap

A modified version of the contact map containing the found contacts
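
The lookup can be pictured as a simple filter. The sketch below assumes that a contact is kept when either of its residues is in the requested indexes; that rule, the function name `find_contacts`, and the tuple representation are assumptions for illustration, not taken from the ConKit source.

```python
def find_contacts(contacts, indexes):
    """Keep the contacts in which at least one residue is in `indexes`."""
    wanted = set(indexes)
    return [pair for pair in contacts if pair[0] in wanted or pair[1] in wanted]


contacts = [(1, 10), (5, 30), (7, 40), (10, 40)]
found = find_contacts(contacts, [10, 30])
print(found)
```
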

match(other, remove_unmatched=False, renumber=False, inplace=False)[source]

Modify both hierarchies so residue numbers match one another.

This function is key when plotting contact maps or visualising them in 3-dimensional space, in particular when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.

Parameters:

other : ContactMap

A ConKit ContactMap

remove_unmatched : bool, optional

Remove all unmatched contacts [default: False]

renumber : bool, optional

Renumber the res_seq entries [default: False]

If True, res1_seq and res2_seq change but id remains the same

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

hierarchy_mod

ContactMap instance, regardless of inplace

Raises:

ValueError

Error creating reliable keymap matching the sequence in ContactMap

ncontacts

The number of Contact instances in the ContactMap

Returns:

ncontacts : int

The number of contacts in the ContactMap

plot_map(*args, **kwargs)[source]

Produce a 2D contact map plot

Warning

This function has been deprecated. Please use conkit.plot.ContactMapFigure instead.

precision

The precision (Positive Predictive Value) score

The precision value is calculated by analysing the true and false positive contacts.

\[Precision=\frac{TruePositives}{TruePositives + FalsePositives}\]

The status of each contact, i.e. true or false positive status, can be determined by running the match() function providing a reference structure.

Returns:

ppv : float

The calculated precision score

See also

coverage
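
Precision, or positive predictive value, is conventionally the number of true positives divided by all contacts classified as positive (true plus false). The sketch below applies that definition to a list of status labels; the labels "TP", "FP" and "unknown", the function name `precision`, and the return value of 0.0 for a map without classified contacts are assumptions made for this example.

```python
def precision(statuses):
    """Positive predictive value from per-contact status labels."""
    true_positives = sum(1 for status in statuses if status == "TP")
    false_positives = sum(1 for status in statuses if status == "FP")
    classified = true_positives + false_positives
    if classified == 0:
        # Assumption: no classified contacts yields a precision of 0.0.
        return 0.0
    return true_positives / classified


statuses = ["TP", "TP", "FP", "unknown", "TP"]
ppv = precision(statuses)
print(ppv)  # 3 true positives out of 4 classified contacts
```
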

remove_neighbors(min_distance=5, inplace=False)[source]

Remove contacts between neighboring residues

Parameters:

min_distance : int, optional

The minimum number of residues between contacts [default: 5]

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

contact_map : ContactMap

The reference to the ContactMap, regardless of inplace
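
One way to read min_distance is as a threshold on the sequence separation of the two residues. The sketch below keeps a contact when its residues are at least min_distance positions apart; whether ConKit treats the threshold as inclusive is an assumption here, as are the function name `drop_neighbors` and the tuple representation.

```python
def drop_neighbors(contacts, min_distance=5):
    """Keep contacts whose residues are at least `min_distance` apart."""
    return [
        (res1_seq, res2_seq)
        for res1_seq, res2_seq in contacts
        if abs(res1_seq - res2_seq) >= min_distance
    ]


contacts = [(1, 3), (1, 6), (10, 12), (5, 30)]
kept = drop_neighbors(contacts)
print(kept)
```
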

repr_sequence

The representative Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the normal res_seq positions

Returns:

sequence : conkit.core.Sequence

Raises:

TypeError

Sequence undefined

repr_sequence_altloc

The representative altloc Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the altloc res_seq positions

Returns:

sequence : Sequence

Raises:

ValueError

Sequence undefined

rescale(inplace=False)[source]

Rescale the raw scores in ContactMap

Rescaling of the data is done to normalize the raw scores to be in the range [0, 1]. The formula to rescale the data is:

\[{x}'=\frac{x-min(d)}{max(d)-min(d)}\]

\(x\) is the original value and \(d\) are all values to be rescaled.

Parameters:

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

contact_map : ContactMap

The reference to the ContactMap, regardless of inplace
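
The min-max formula above can be sketched on a plain list of raw scores. The function name `rescale_scores` is hypothetical, and raising an error when all scores are equal is a choice made for this example; the documentation does not say how ConKit handles that case.

```python
def rescale_scores(raw_scores):
    """Min-max normalise scores into the range [0, 1]."""
    lowest, highest = min(raw_scores), max(raw_scores)
    if highest == lowest:
        # The formula divides by zero when all scores are equal.
        raise ValueError("all scores are identical, cannot rescale")
    span = highest - lowest
    return [(score - lowest) / span for score in raw_scores]


raw = [2.0, 3.0, 6.0]
rescaled = rescale_scores(raw)
print(rescaled)  # the lowest score maps to 0, the highest to 1
```
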

sequence

The Sequence associated with the ContactMap

Returns:

Sequence

sort(kword, reverse=False, inplace=False)[source]

Sort the ContactMap

Parameters:

kword : str

The dictionary key to sort contacts by

reverse : bool, optional

Sort the contact pairs in descending order [default: False]

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

contact_map : ContactMap

The reference to the ContactMap, regardless of inplace

Raises:

ValueError

kword not in ContactMap
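
Sorting by a keyword can be pictured as sorting records by one of their attributes. The sketch below uses a hypothetical `ContactRecord` namedtuple as a stand-in for Contact and a hypothetical helper `sort_records`; it returns a new list and leaves the input untouched, which loosely mirrors inplace=False.

```python
from collections import namedtuple

# Hypothetical stand-in for a contact; not the ConKit Contact class.
ContactRecord = namedtuple("ContactRecord", ["res1_seq", "res2_seq", "raw_score"])


def sort_records(records, kword, reverse=False):
    """Return a new list of records sorted by the attribute named `kword`."""
    if any(not hasattr(record, kword) for record in records):
        raise ValueError("kword not in records: {}".format(kword))
    return sorted(records, key=lambda record: getattr(record, kword), reverse=reverse)


records = [
    ContactRecord(1, 10, 0.333),
    ContactRecord(5, 30, 0.667),
    ContactRecord(3, 20, 0.5),
]
by_score = sort_records(records, "raw_score", reverse=True)
print([record.raw_score for record in by_score])
```
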

top_contact

The first Contact entry in ContactMap

Returns:

top_contact : Contact, None

The first Contact entry in ContactMap

class ContactFile(id)[source]

Bases: conkit.core.Entity.Entity

A contact file object representing a single prediction file

The ContactFile class represents a data structure to hold all predictions from a single contact file. It contains functions to store, manipulate and organise ContactMap instances.

Examples

>>> from conkit.core import ContactMap, ContactFile
>>> contact_file = ContactFile("example")
>>> contact_file.add(ContactMap("foo"))
>>> contact_file.add(ContactMap("bar"))
>>> print(contact_file)
ContactFile(id="example" nseqs=2)

Attributes

author
method
remark
target
top_map

Methods

add
copy Generic (shallow and deep) copying operations.
deepcopy
remove
sort

author

The author of the ContactFile

method

The ContactFile-specific method

remark

The ContactFile-specific remarks

sort(kword, reverse=False, inplace=False)[source]

Sort the ContactFile

Parameters:

kword : str

The dictionary key to sort contact maps by

reverse : bool, optional

Sort the contact maps in descending order [default: False]

inplace : bool, optional

Replace the saved order of contact maps [default: False]

Returns:

contact_file : ContactFile

The reference to the ContactFile, regardless of inplace

Raises:

ValueError

kword not in ContactFile

target

The target name

top_map

The first ContactMap entry in ContactFile

Returns:

top_map : ContactMap, None

The first ContactMap entry in ContactFile

class Sequence(id, seq)[source]

Bases: conkit.core.Entity.Entity

A sequence template to store all associated information

Examples

>>> from conkit.core import Sequence
>>> sequence_entry = Sequence("example", "ABCDEF")
>>> print(sequence_entry)
Sequence(id="example" seq="ABCDEF" seqlen=6)

Attributes

id
remark
seq
seq_len

Methods

add
align_global
align_local
copy Generic (shallow and deep) copying operations.
deepcopy
remove

align_global(other, id_chars=2, nonid_chars=1, gap_open_pen=-0.5, gap_ext_pen=-0.1, inplace=False)[source]

Generate a global alignment between two Sequence instances

Parameters:

other : Sequence

id_chars : int, optional

nonid_chars : int, optional

gap_open_pen : float, optional

gap_ext_pen : float, optional

inplace : bool, optional

Replace the saved order of residues [default: False]

Returns:

Sequence

The reference to the Sequence, regardless of inplace

Sequence

The reference to the Sequence, regardless of inplace

align_local(other, id_chars=2, nonid_chars=1, gap_open_pen=-0.5, gap_ext_pen=-0.1, inplace=False)[source]

Generate a local alignment between two Sequence instances

Parameters:

other : Sequence

id_chars : int, optional

nonid_chars : int, optional

gap_open_pen : float, optional

gap_ext_pen : float, optional

inplace : bool, optional

Replace the saved order of residues [default: False]

Returns:

Sequence

The reference to the Sequence, regardless of inplace

Sequence

The reference to the Sequence, regardless of inplace

remark

The Sequence-specific remarks

seq

The protein sequence as str

seq_len

The protein sequence length

class SequenceFile(id)[source]

Bases: conkit.core.Entity.Entity

A sequence file object representing a single sequence file

The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.

Examples

>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseqs=2)

Attributes

id
is_alignment
nseqs
remark
status
top_sequence

Methods

add
calculate_freq
calculate_meff
copy Generic (shallow and deep) copying operations.
deepcopy
remove
sort
trim

calculate_freq()[source]

Calculate the gap frequency in each alignment column

This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.

Returns:

list

A list containing the per alignment-column amino acid frequency count

Raises:

MemoryError

Too many sequences in the alignment

RuntimeError

SequenceFile is not an alignment
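
The per-column gap frequency can be sketched on a list of aligned strings. The function name `gap_frequency` and the use of "-" as the gap symbol are assumptions for this example. The sketch returns the fraction of gaps in each column; subtracting each value from 1 gives the fraction of residues in that column.

```python
def gap_frequency(alignment, gap="-"):
    """Fraction of gap characters in each column of an alignment."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        # Sequences of unequal length (or no sequences) are not an alignment.
        raise ValueError("sequences do not form an alignment")
    ncols = lengths.pop()
    nseqs = len(alignment)
    return [
        sum(1 for seq in alignment if seq[col] == gap) / nseqs
        for col in range(ncols)
    ]


alignment = ["AC-E", "A--E", "ACDE", "-CDE"]
gaps = gap_frequency(alignment)
print(gaps)
print([1.0 - freq for freq in gaps])  # per-column residue fraction
```
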

calculate_meff(identity=0.7)[source]

Calculate the number of effective sequences

This function calculates the number of effective sequences (Meff) in the Multiple Sequence Alignment.

The mathematical function used to calculate Meff is

\[M_{eff}=\sum_{i}\frac{1}{\sum_{j}S_{i,j}}\]

Parameters:

identity : float, optional

The sequence identity to use for similarity decision [default: 0.7]

Returns:

int

The number of effective sequences

Raises:

MemoryError

Too many sequences in the alignment for Hamming distance calculation

RuntimeError

SciPy package not installed

ValueError

SequenceFile is not an alignment

ValueError

Sequence Identity needs to be between 0 and 1
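
The Meff formula can be sketched in pure Python under one common reading, which is an assumption here: \(S_{i,j}\) is 1 when sequences \(i\) and \(j\) share at least the given fraction of identical positions and 0 otherwise, so each sequence is down-weighted by the number of its close neighbours (itself included). The names `sequence_identity` and `effective_sequences` are hypothetical. ConKit relies on SciPy for the distance calculation and returns an int, whereas this sketch returns the unrounded sum.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of alignment columns in which two sequences agree."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def effective_sequences(alignment, identity=0.7):
    """Sum over sequences of 1 / (number of sequences similar to it)."""
    if not 0 <= identity <= 1:
        raise ValueError("identity needs to be between 0 and 1")
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences do not form an alignment")
    total = 0.0
    for seq_i in alignment:
        # Every sequence is similar to itself, so neighbours is at least 1.
        neighbours = sum(
            1 for seq_j in alignment
            if sequence_identity(seq_i, seq_j) >= identity
        )
        total += 1.0 / neighbours
    return total


alignment = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
meff = effective_sequences(alignment)
print(meff)  # the first two sequences cluster together, the third stands alone
```
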

is_alignment

A boolean status for the alignment

Returns:

bool

A boolean status for the alignment

nseqs

The number of Sequence instances in the SequenceFile

Returns:

int

The number of sequences in the SequenceFile

remark

The SequenceFile-specific remarks

sort(kword, reverse=False, inplace=False)[source]

Sort the SequenceFile

Parameters:

kword : str

The dictionary key to sort sequences by

reverse : bool, optional

Sort the sequences in reverse order [default: False]

inplace : bool, optional

Replace the saved order of sequences [default: False]

Returns:

SequenceFile

The reference to the SequenceFile, regardless of inplace

Raises:

ValueError

kword not in SequenceFile

status

An indication of the residue status, i.e. true positive, false positive, or unknown

top_sequence

The first Sequence entry in SequenceFile

Returns:

Sequence, None

The first Sequence entry in SequenceFile

trim(start, end, inplace=False)[source]

Trim the SequenceFile

Parameters:

start : int

First residue to include

end : int

Final residue to include

inplace : bool, optional

Replace the saved order of sequences [default: False]

Returns:

SequenceFile

The reference to the SequenceFile, regardless of inplace
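
Trimming amounts to cutting the same column range out of every sequence. The sketch below assumes that start and end are 1-based and both inclusive, which follows the wording "first residue to include" and "final residue to include" but is not confirmed by the documentation; the function name `trim_sequences` is hypothetical.

```python
def trim_sequences(sequences, start, end):
    """Cut columns start..end (assumed 1-based, inclusive) from each sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return [seq[start - 1:end] for seq in sequences]


sequences = ["ABCDEF", "ZYXWVU"]
trimmed = trim_sequences(sequences, 2, 4)
print(trimmed)
```
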