conkit.core.contactmap module

ContactMap container used throughout ConKit

class ContactMap(id)[source]

Bases: conkit.core._entity._Entity

A contact map object representing a single prediction

The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.

Examples

>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)

Attributes

coverage The sequence coverage score
id The ID of the selected entity
ncontacts The number of Contact instances in the ContactMap
precision The precision (Positive Predictive Value) score
repr_sequence The representative Sequence associated with the ContactMap
repr_sequence_altloc The representative altloc Sequence associated with the ContactMap
sequence The Sequence associated with the ContactMap
top_contact The first Contact entry in ContactMap

Methods

add(entity) Add a child to the Entity
assign_sequence_register([altloc]) Assign the amino acids from Sequence to all Contact instances
calculate_contact_density([bw_method]) Calculate the contact density in the contact map using Gaussian kernels
calculate_jaccard_index(other) Calculate the Jaccard index between two ContactMap instances
calculate_kernel_density(*args, **kwargs) Calculate the contact density in the contact map using Gaussian kernels
calculate_scalar_score() Calculate a scaled score for the ContactMap
copy() Create a shallow copy of Entity
deepcopy() Create a deep copy of Entity
find(register[, altloc, strict]) Find all contacts with one or both residues in register
match(other[, match_other, …]) Modify both hierarchies so residue numbers match one another.
remove(id) Remove a child
remove_neighbors([min_distance, …]) Remove contacts between neighboring residues
rescale([inplace]) Rescale the raw scores in ContactMap
sort(kword[, reverse, inplace]) Sort the ContactMap
assign_sequence_register(altloc=False)[source]

Assign the amino acids from Sequence to all Contact instances

Parameters:

altloc : bool

Use the res_altloc positions [default: False]

calculate_contact_density(bw_method='amise')[source]

Calculate the contact density in the contact map using Gaussian kernels

Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note that in rules 2 and 3, the value of \(\sigma\) is the smaller of the standard deviation of X and the normalized interquartile range.

Parameters:

bw_method : str, optional

The bandwidth estimator to use [default: amise]

Returns:

list

The list of per-residue density estimates

Raises:

RuntimeError

Cannot find SciKit package

ValueError

Undefined bandwidth method
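The idea of a Gaussian kernel density estimate can be illustrated without ConKit or SciKit. The following is a minimal sketch, not the library's implementation: the function name gaussian_density, the plain list of residue positions used as input, and the hand-picked bandwidth are all assumptions made for illustration. In practice the bandwidth would come from whichever estimator bw_method selects.

```python
import math


def gaussian_density(points, grid, bandwidth):
    """Evaluate a 1D Gaussian kernel density estimate on each grid value.

    Hypothetical helper: places one Gaussian kernel of width `bandwidth`
    on every data point and averages the kernels.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not points:
        raise ValueError("at least one data point is required")
    # Normalisation so that the estimate integrates to 1.
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


# Assumed input: residue positions that take part in contacts.
positions = [10, 12, 13, 40]
grid = [10, 25, 40]
density = gaussian_density(positions, grid, bandwidth=2.0)
```

The estimate is highest near residue 10, where three positions cluster, and close to zero at residue 25, which is far from every data point.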

calculate_jaccard_index(other)[source]

Calculate the Jaccard index between two ContactMap instances

This score analyzes the difference of the predicted contacts from two maps,

\[J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]

where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and the \(\left|x \cup y\right|\) represents the number of elements in the union of \(x\) and \(y\).

The J-score has values in the range of \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps that share no contacts.
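The formula can be checked on plain Python sets. This is a minimal sketch that does not use ConKit: contacts are represented as tuples of residue numbers, the function name jaccard_index is hypothetical, and treating two empty maps as identical is an assumption.

```python
def jaccard_index(x, y):
    """Return |x intersect y| / |x union y| for two collections of residue pairs."""
    # Order each pair so that (10, 1) and (1, 10) count as the same contact.
    set_x = {tuple(sorted(pair)) for pair in x}
    set_y = {tuple(sorted(pair)) for pair in y}
    union = set_x | set_y
    if not union:
        # Assumption: two empty maps are considered identical.
        return 1.0
    return len(set_x & set_y) / len(union)


map_a = [(1, 10), (5, 30), (7, 40)]
map_b = [(10, 1), (5, 30), (8, 50)]
score = jaccard_index(map_a, map_b)  # 2 shared pairs out of 4 distinct pairs
distance = 1.0 - score               # the corresponding Jaccard distance
```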

Parameters:

other : ContactMap

A ConKit ContactMap

Returns:

float

The Jaccard index

Warning

The Jaccard index ranges over \([0, 1]\), where \(1\) means the maps contain identical contact pairs.

See also

match, precision

Notes

The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1-Jaccard_{index}\).

[1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, [doi: 10.1093/bib/bbw106].

calculate_kernel_density(*args, **kwargs)[source]

Calculate the contact density in the contact map using Gaussian kernels

calculate_scalar_score()[source]

Calculate a scaled score for the ContactMap

This score is a scaled score for all raw scores in a contact map. It is defined by the formula

\[{x}'=\frac{x}{\overline{d}}\]

where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.
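As a worked illustration of the formula, the sketch below divides each raw score by the mean of all raw scores. It operates on a plain list of floats rather than on Contact instances, and the function name scalar_scores is hypothetical.

```python
def scalar_scores(raw_scores):
    """Scale every raw score by the mean of all raw scores."""
    if not raw_scores:
        raise ValueError("no raw scores to scale")
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        # Assumption: a zero mean is rejected rather than silently handled.
        raise ValueError("mean raw score is zero")
    return [x / mean for x in raw_scores]


raw = [0.25, 0.5, 0.75]
scaled = scalar_scores(raw)  # the mean is 0.5, so each score is doubled
```

A scaled score above 1 marks a contact that scores better than the average contact in the map.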

The score is saved in a separate Contact attribute called scalar_score.

This score is described in more detail in [2].

[2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.

coverage

The sequence coverage score

The coverage score is calculated by analysing the number of residues covered by the predicted contact pairs.

\[Coverage=\frac{x_{cov}}{L}\]

Here \(x_{cov}\) is the number of residues covered by at least one predicted contact pair and \(L\) is the number of residues in the sequence.
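A small worked example, reading \(x_{cov}\) as the number of distinct residues that take part in at least one contact. That reading follows the description of residues covered by the predicted contact pairs and is an interpretation, not a confirmed detail of ConKit. The function name and the tuple representation of contacts are hypothetical.

```python
def coverage(contacts, sequence_length):
    """Fraction of residues in the sequence that appear in any contact."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = set()
    for res1, res2 in contacts:
        covered.add(res1)
        covered.add(res2)
    return len(covered) / sequence_length


contacts = [(1, 10), (5, 30), (10, 30)]
# Residues 1, 5, 10 and 30 are covered: 4 residues out of 40.
score = coverage(contacts, sequence_length=40)
```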

Returns:

float

The calculated coverage score

See also

precision

empty

Empty contact map

find(register, altloc=False, strict=False)[source]

Find all contacts with one or both residues in register

Parameters:

register : int, list, tuple

A residue register, or a list of residue registers, to find

altloc : bool

Use the res_altloc positions [default: False]

strict : bool

Both residues of Contact in register [default: False]

Returns:

obj

A modified version of the ContactMap containing the found contacts
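The difference between the default and strict modes can be sketched on plain tuples. This is an illustration of the selection rule described above, not ConKit's code; the function name find_contacts is hypothetical and the altloc option is left out.

```python
def find_contacts(contacts, register, strict=False):
    """Select contacts whose residues appear in `register`.

    With strict=False one matching residue is enough; with strict=True
    both residues of a contact must be in the register.
    """
    if isinstance(register, int):
        register = [register]
    wanted = set(register)
    if strict:
        return [c for c in contacts if c[0] in wanted and c[1] in wanted]
    return [c for c in contacts if c[0] in wanted or c[1] in wanted]


contacts = [(1, 10), (5, 30), (10, 30)]
either = find_contacts(contacts, [10, 30])             # all three contacts
both = find_contacts(contacts, [10, 30], strict=True)  # only (10, 30)
```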

long_range_contacts

The long-range contacts found in the ContactMap

Long-range contacts are defined as x >= 24 residues apart

Returns:

obj

A copy of the ContactMap with long-range contacts only

match(other, match_other=False, remove_unmatched=False, renumber=False, inplace=False)[source]

Modify both hierarchies so residue numbers match one another.

This function is key when plotting contact maps or visualising them in 3-dimensional space, in particular when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.

Parameters:

other : ContactMap

A ConKit ContactMap

match_other : bool, optional

Match other to self [default: False]

remove_unmatched : bool, optional

Remove all unmatched contacts [default: False]

renumber : bool, optional

Renumber the res_seq entries [default: False]

If True, res1_seq and res2_seq changes but id remains the same

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

obj

ContactMap instance, regardless of inplace

Raises:

ValueError

Error creating reliable keymap matching the sequence in ContactMap

medium_range_contacts

The medium-range contacts found in the ContactMap

Medium range contacts are defined as 12 <= x <= 23 residues apart

Returns:

obj

A copy of the ContactMap with medium-range contacts only

ncontacts

The number of Contact instances in the ContactMap

Returns:

int

The number of contacts in the ContactMap

precision

The precision (Positive Predictive Value) score

The precision value is calculated by analysing the true and false positive contacts.

\[Precision=\frac{TruePositives}{TruePositives + FalsePositives}\]

The status of each contact, i.e. true or false positive status, can be determined by running the match() function providing a reference structure.
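The sketch below applies the standard definition of the positive predictive value, true positives divided by the sum of true and false positives, to a list of contact statuses. The string labels "TP" and "FP", the function name, and the choice to return 0.0 when no contact has a status are assumptions for illustration and do not reflect how ConKit stores the status.

```python
def precision(statuses):
    """Positive predictive value from a list of 'TP' / 'FP' labels."""
    true_pos = sum(1 for s in statuses if s == "TP")
    false_pos = sum(1 for s in statuses if s == "FP")
    if true_pos + false_pos == 0:
        # Assumption: no matched contacts yields a precision of 0.0.
        return 0.0
    return true_pos / (true_pos + false_pos)


# Three contacts agree with the reference structure, one does not.
score = precision(["TP", "TP", "FP", "TP"])
```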

Returns:

float

The calculated precision score

See also

coverage

remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)[source]

Remove contacts between neighboring residues

The algorithm works by keeping contact pairs that satisfy

min_distance <= x <= max_distance

Parameters:

min_distance : int, optional

The minimum number of residues between contacts [default: 5]

max_distance : int, optional

The maximum number of residues between contacts [default: maximum number permitted by the OS]

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

obj

The reference to the ContactMap, regardless of inplace
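The filtering rule can be sketched as follows, taking x to be the sequence separation between the two residues of a contact, which is an assumption about what x denotes. The function name keep_in_range is hypothetical. The default max_distance shown in the signature above equals sys.maxsize on 64-bit Python builds.

```python
import sys


def keep_in_range(contacts, min_distance=5, max_distance=sys.maxsize):
    """Keep contacts whose residues are min_distance..max_distance apart."""
    return [
        (res1, res2)
        for res1, res2 in contacts
        if min_distance <= abs(res1 - res2) <= max_distance
    ]


contacts = [(1, 3), (1, 6), (2, 40)]
default_filter = keep_in_range(contacts)               # drops (1, 3)
capped = keep_in_range(contacts, max_distance=10)      # also drops (2, 40)
```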

repr_sequence

The representative Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the normal res_seq positions

Returns:

obj

A Sequence object

Raises:

TypeError

Sequence undefined

repr_sequence_altloc

The representative altloc Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the altloc res_seq positions

Returns:

obj

A Sequence object

Raises:

ValueError

Sequence undefined

rescale(inplace=False)[source]

Rescale the raw scores in ContactMap

Rescaling of the data is done to normalize the raw scores to be in the range [0, 1]. The formula to rescale the data is:

\[{x}'=\frac{x-min(d)}{max(d)-min(d)}\]

\(x\) is the original value and \(d\) are all values to be rescaled.
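A worked example of this min-max rescaling on a plain list of scores. The function name is hypothetical, and mapping every score to 0.0 when all values are equal is an assumption made to avoid a division by zero.

```python
def rescale_scores(scores):
    """Min-max rescale scores into the range [0, 1]."""
    if not scores:
        return []
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumption: identical scores all map to 0.0.
        return [0.0 for _ in scores]
    return [(x - lowest) / (highest - lowest) for x in scores]


rescaled = rescale_scores([2.0, 4.0, 6.0])  # min maps to 0.0, max to 1.0
```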

Parameters:

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

obj

The reference to the ContactMap, regardless of inplace

sequence

The Sequence associated with the ContactMap

Returns:

obj

A Sequence object

short_range_contacts

The short-range contacts found in the ContactMap

Short range contacts are defined as 6 <= x <= 11 residues apart

Returns:

obj

A copy of the ContactMap with short-range contacts only
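The short, medium and long range definitions in this module can be summarised in one classifier. This is a sketch under the assumption that x is the absolute difference between the two residue numbers of a contact. The function name and the returned labels are hypothetical.

```python
def contact_range(res1, res2):
    """Classify a contact by sequence separation.

    Returns 'short' (6-11), 'medium' (12-23), 'long' (24 or more),
    or None when the residues are closer than 6 positions.
    """
    separation = abs(res1 - res2)
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return None


labels = [contact_range(1, 8), contact_range(1, 20), contact_range(1, 60)]
```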

sort(kword, reverse=False, inplace=False)[source]

Sort the ContactMap

Parameters:

kword : str

The dictionary key to sort contacts by

reverse : bool, optional

Sort the contact pairs in descending order [default: False]

inplace : bool, optional

Replace the saved order of contacts [default: False]

Returns:

obj

The reference to the ContactMap, regardless of inplace

Raises:

ValueError

kword not in ContactMap
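The behaviour of kword and reverse can be sketched with dictionaries standing in for Contact instances. The key names used here and the function name sort_contacts are illustrative assumptions. The sketch always returns a new list and does not model the inplace option.

```python
def sort_contacts(contacts, kword, reverse=False):
    """Return contacts sorted by the value stored under `kword`."""
    if any(kword not in contact for contact in contacts):
        raise ValueError("kword not in contacts: {}".format(kword))
    return sorted(contacts, key=lambda contact: contact[kword], reverse=reverse)


contacts = [
    {"id": (1, 10), "raw_score": 0.333},
    {"id": (5, 30), "raw_score": 0.667},
    {"id": (7, 40), "raw_score": 0.500},
]
# Highest-scoring contact first.
ranked = sort_contacts(contacts, "raw_score", reverse=True)
```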

top_contact

The first Contact entry in ContactMap

Returns:

obj, None

The first Contact entry in ContactMap