conkit.core.contactmap module

ContactMap container used throughout ConKit

class ContactMap(id)[source]

Bases: conkit.core.entity.Entity

A contact map object representing a single prediction

The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.

Examples

>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)
coverage

The sequence coverage score

Type:float
id

A unique identifier

Type:str
ncontacts

The number of Contact instances in the ContactMap

Type:int
precision

The precision (Positive Predictive Value) score

Type:float
repr_sequence

The representative Sequence associated with the ContactMap

Type:Sequence
repr_sequence_altloc

The representative altloc Sequence associated with the ContactMap

Type:Sequence
sequence

The Sequence associated with the ContactMap

Type:Sequence
top_contact

The first Contact entry

Type:Contact
as_dict(altloc=False)[source]

The ContactMap as a dictionary, where each key is a residue number and each value is a set of tuples with the id

Parameters:altloc (bool) – Use the res_altloc positions [default: False]
Returns:A dictionary representation of the ContactMap instance
Return type:dict
as_list(altloc=False)[source]

The ContactMap as a 2D-list containing contact-pair residue indexes

Parameters:altloc (bool) – Use the res_altloc positions [default: False]
as_set(altloc=False)[source]

The ContactMap as a 2D-set containing contact-pair residue indexes

Parameters:altloc (bool) – Use the res_altloc positions [default: False]
coverage

The sequence coverage score

The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs \(x_{cov}\) by the number of residues in the sequence \(L\).

\[Coverage=\frac{x_{cov}}{L}\]
Returns:The calculated coverage score
Return type:float

See also

precision
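
The coverage formula can be illustrated with a minimal, library-independent sketch. The helper name `coverage` and the plain tuple representation are hypothetical and not part of ConKit; the sketch assumes that "residues covered" means the distinct residues that appear in at least one contact pair.

```python
def coverage(contacts, seq_len):
    """Hypothetical helper (not the ConKit API): x_cov / L."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # x_cov: distinct residues that take part in at least one contact pair
    # (assumption about what "covered" means)
    covered = {res for pair in contacts for res in pair}
    return len(covered) / seq_len


pairs = [(1, 10), (5, 30), (10, 30)]
# Residues 1, 5, 10 and 30 are covered, so the score is 4 / 40
score = coverage(pairs, seq_len=40)
```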

empty

Empty contact map

filter(threshold, filter_by='raw_score', inplace=False)[source]

Filter out contacts below the selected threshold

Parameters:
  • threshold (int, float) – Threshold to be applied in the filter
  • filter_by (str) – Contact attribute to be used in the filter.
  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap

Raises:

TypeError – threshold must be int or float
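
As a rough illustration of the filtering behaviour described above, the sketch below works on plain `(res1, res2, raw_score)` tuples rather than Contact instances. The function name `filter_by_score` is hypothetical, and keeping contacts whose score equals the threshold is an assumption drawn from the wording "below".

```python
def filter_by_score(contacts, threshold):
    """Hypothetical helper (not the ConKit API): drop contacts scoring below threshold."""
    if not isinstance(threshold, (int, float)):
        raise TypeError("threshold must be int or float")
    # Assumption: a contact exactly at the threshold is kept
    return [c for c in contacts if c[2] >= threshold]


contacts = [(1, 10, 0.333), (5, 30, 0.667), (7, 20, 0.5)]
kept = filter_by_score(contacts, 0.5)
```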

find(register, altloc=False, strict=False, inverse=False)[source]

Find all contacts with one or both residues in register

Parameters:
  • register (int, list, tuple) – A residue register, or a list or tuple of residue registers, to find
  • altloc (bool) – Use the res_altloc positions [default: False]
  • strict (bool) – Both residues of Contact in register [default: False]
  • inverse (bool) – Select non-matching residues [default: False]
Returns:

A modified version of the ContactMap containing the found contacts

Return type:

ContactMap
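
The effect of the strict flag can be sketched on plain `(res1, res2)` tuples. The helper `find_contacts` is hypothetical and not the ConKit implementation; it deliberately leaves out the altloc and inverse options.

```python
def find_contacts(contacts, register, strict=False):
    """Hypothetical helper (not the ConKit API): select contacts by residue register."""
    if isinstance(register, int):
        register = [register]
    wanted = set(register)
    if strict:
        # Both residues of the contact must be in the register
        return [(i, j) for i, j in contacts if i in wanted and j in wanted]
    # At least one residue of the contact must be in the register
    return [(i, j) for i, j in contacts if i in wanted or j in wanted]


pairs = [(1, 10), (5, 30), (10, 30)]
either = find_contacts(pairs, 10)
both = find_contacts(pairs, [10, 30], strict=True)
```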

get_contact_density(bw_method='amise')[source]

Calculate the contact density in the contact map using Gaussian kernels

Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note, in rules 2 and 3, the value of \(\sigma\) is the smaller of the standard deviation of X or the normalized interquartile range.

Parameters:

bw_method (str, optional) – The bandwidth estimator to use [default: amise]

Returns:

The list of per-residue density estimates

Return type:

list

Raises:
get_jaccard_index(other)[source]

Calculate the Jaccard index between two ContactMap instances

This score analyzes the difference of the predicted contacts from two maps,

\[J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]

where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and the \(\left|x \cup y\right|\) represents the number of elements in the union of \(x\) and \(y\).

The J-score has values in the range of \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps with no contacts in common.

Parameters:other (ContactMap) – A ConKit ContactMap
Returns:The Jaccard index
Return type:float

See also

match(), precision

Warning

The Jaccard index lies in the range \([0, 1]\), where \(1\) means the maps contain identical contact pairs.

Note

The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1-Jaccard_{index}\).

[1]Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, [doi: 10.1093/bib/bbw106].
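
The Jaccard index formula can be reproduced with Python sets. This is a minimal sketch on plain `(res1, res2)` tuples; the function name `jaccard_index` is hypothetical, and both the order-insensitive pair handling and the result for two empty maps are assumptions rather than documented ConKit behaviour.

```python
def jaccard_index(x, y):
    """Hypothetical helper (not the ConKit API): |x ∩ y| / |x ∪ y|."""
    # Assumption: (i, j) and (j, i) describe the same contact
    sx = {tuple(sorted(pair)) for pair in x}
    sy = {tuple(sorted(pair)) for pair in y}
    union = sx | sy
    if not union:
        return 1.0  # assumption: two empty maps count as identical
    return len(sx & sy) / len(union)


map_a = [(1, 10), (5, 30), (7, 20)]
map_b = [(10, 1), (5, 30), (8, 40)]
index = jaccard_index(map_a, map_b)  # 2 shared contacts out of 4 distinct ones
distance = 1.0 - index               # the Jaccard distance
```
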
highest_residue_number

The highest residue sequence number among contacts in the ContactMap

Returns:Highest residue sequence number in the contact map
Return type:int
long_range

The long-range contacts found in the ContactMap

Long range contacts are defined as 24 <= x residues apart

Returns:A copy of the ContactMap with long-range contacts only
Return type:ContactMap
match(other, add_false_negatives=False, match_other=False, remove_unmatched=False, renumber=False, inplace=False)[source]

Modify both hierarchies so residue numbers match one another.

This function is key when plotting contact maps or visualising them in 3-dimensional space, in particular when residue numbers in the structure do not start at 0 or when peptide chain breaks are present.

Parameters:
  • add_false_negatives (bool) –

    Add false negatives to self, i.e. contacts that are in other but not in self

    Required for recall() and can be undone with remove_false_negatives()

  • other (ContactMap) – A ConKit ContactMap
  • match_other (bool, optional) – Match other to self [default: False]
  • remove_unmatched (bool, optional) – Remove all unmatched contacts [default: False]
  • renumber (bool, optional) –

    Renumber the res_seq entries [default: False]

    If True, res1_seq and res2_seq changes but id remains the same

  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

ContactMap instance, regardless of inplace

Return type:

ContactMap

Raises:
match_naive(other, add_false_negatives=False, match_other=False, inplace=False)[source]

Modify both hierarchies so residue numbers match one another

This function performs a naive match. It assumes the numbering in both contact maps is equivalent and it will not attempt to perform a sequence alignment.

Parameters:
  • other (ContactMap) – A ConKit ContactMap
  • add_false_negatives (bool) –

    Add false negatives to self, i.e. contacts that are in other but not in self

    Required for recall() and can be undone with remove_false_negatives()

  • match_other (bool, optional) – Match other to self [default: False]
  • inplace (bool, optional) – Replace the original contacts [default: False]
Returns:

ContactMap instance, regardless of inplace

Return type:

ContactMap

medium_range

The medium-range contacts found in the ContactMap

Medium range contacts are defined as 12 <= x <= 23 residues apart

Returns:A copy of the ContactMap with medium-range contacts only
Return type:ContactMap
ncontacts

The number of Contact instances

Returns:The number of contacts in the ContactMap
Return type:int
precision

The precision (Positive Predictive Value) score

The precision value is calculated by analysing the true and false positive contacts.

\[Precision=\frac{TruePositives}{TruePositives + FalsePositives}\]

The status of each contact, i.e. its true or false positive status, can be determined by running the match() function with a reference structure.

Returns:The calculated precision score
Return type:float

See also

coverage, recall

recall

The Recall (Sensitivity) score

The recall value is calculated by analysing the true positive and false negative contacts.

\[Recall=\frac{TruePositives}{TruePositives + FalseNegatives}\]

The status of each contact, i.e. its true positive or false negative status, can be determined by running the match() function with a reference structure.

Note

To determine and save the false negatives, please use the add_false_negatives keyword when running the match() function.

You may wish to run remove_false_negatives() afterwards.

Returns:The calculated recall score
Return type:float

See also

coverage, precision
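
The precision and recall formulas can be worked through on two sets of contact pairs, one predicted and one taken from a reference. This sketch bypasses match() entirely; the helper `precision_recall` is hypothetical, and returning 0.0 for an empty denominator is an assumption.

```python
def precision_recall(predicted, reference):
    """Hypothetical helper (not the ConKit API): scores from two sets of contact pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # predicted and present in the reference
    fp = len(predicted - reference)  # predicted but absent from the reference
    fn = len(reference - predicted)  # in the reference but not predicted
    # Assumption: an empty denominator yields 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


predicted = [(1, 10), (5, 30), (7, 20), (8, 40)]
reference = [(1, 10), (5, 30), (9, 50)]
# TP = 2, FP = 2, FN = 1
p, r = precision_recall(predicted, reference)
```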

reindex(index, altloc=False, inplace=False)[source]

Re-index the ContactMap

Parameters:
  • index (int) – The new starting index [assigned to the lowest existing index in the contact map]
  • altloc (bool) – Use the res_altloc positions [default: False]
  • inplace (bool) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap

Raises:

ValueError – Index must be positive
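
Re-indexing amounts to shifting every residue number by a constant so that the lowest existing index becomes the new starting index. The sketch below shows this on plain `(res1, res2)` tuples; the helper `reindex_pairs` is hypothetical, and treating 0 as a valid starting index is an assumption.

```python
def reindex_pairs(contacts, index):
    """Hypothetical helper (not the ConKit API): shift residue numbers to start at index."""
    if index < 0:  # assumption: only negative values are rejected
        raise ValueError("Index must be positive")
    if not contacts:
        return []
    lowest = min(min(i, j) for i, j in contacts)
    shift = index - lowest
    return [(i + shift, j + shift) for i, j in contacts]


pairs = [(5, 14), (8, 30)]
shifted = reindex_pairs(pairs, 1)  # lowest index 5 is mapped to 1
```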

remove_false_negatives(inplace=False)[source]

Remove false negatives from the contact map

Parameters:
  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap

remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)[source]

Remove contacts between neighboring residues

The algorithm works by keeping contact pairs that satisfy

min_distance <= x <= max_distance
Parameters:
  • min_distance (int, optional) – The minimum number of residues between contacts [default: 5]
  • max_distance (int, optional) – The maximum number of residues between contacts [default: sys.maxsize]
  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap
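
The selection rule can be sketched on plain `(res1, res2)` tuples. The helper `keep_by_separation` is hypothetical, and the sketch assumes that x is the sequence separation |res1 - res2| of a contact pair.

```python
import sys


def keep_by_separation(contacts, min_distance=5, max_distance=sys.maxsize):
    """Hypothetical helper (not the ConKit API): keep min_distance <= x <= max_distance."""
    # Assumption: x is the absolute difference of the two residue numbers
    return [(i, j) for i, j in contacts
            if min_distance <= abs(i - j) <= max_distance]


pairs = [(1, 3), (1, 6), (2, 40), (10, 12)]
default_kept = keep_by_separation(pairs)
bounded_kept = keep_by_separation(pairs, max_distance=10)
```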

repr_sequence

The representative Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the normal res_seq positions

Returns:
Return type:Sequence
Raises:TypeError – Sequence undefined
repr_sequence_altloc

The representative altloc Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the res_altseq positions

Returns:
Return type:Sequence
Raises:TypeError – Sequence undefined
rescale(inplace=False)[source]

Rescale the raw scores in ContactMap

Parameters:inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:The reference to the ContactMap, regardless of inplace
Return type:ContactMap
sequence

The Sequence associated with the ContactMap

Returns:A Sequence object
Return type:Sequence
set_scalar_score()[source]

Calculate and set the scalar_score for the ContactMap

This score is a scaled score for all raw scores in a contact map. It is defined by the formula

\[{x}'=\frac{x}{\overline{d}}\]

where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.

This score is described in more detail in [2].

[2]S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
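
The scaling formula above divides each raw score by the mean of all raw scores. A minimal sketch on a plain list of floats follows; the helper `scalar_scores` is hypothetical, and how an empty map or a zero mean should be handled is an assumption.

```python
def scalar_scores(raw_scores):
    """Hypothetical helper (not the ConKit API): x' = x / mean(raw scores)."""
    if not raw_scores:
        return []  # assumption: nothing to scale in an empty map
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        raise ValueError("mean raw score is zero")  # assumption: treat as an error
    return [x / mean for x in raw_scores]


raw = [1.0, 2.0, 3.0]
scaled = scalar_scores(raw)  # the mean is 2.0
```
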
set_sequence_register(altloc=False)[source]

Assign the amino acids from Sequence to all Contact instances

Parameters:altloc (bool) – Use the res_altloc positions [default: False]
Raises:ValueError – Undefined sequence
short_range

The short-range contacts found in the ContactMap

Short range contacts are defined as 6 <= x <= 11 residues apart

Returns:A copy of the ContactMap with short-range contacts only
Return type:ContactMap
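
The short, medium and long range definitions in this reference can be summarised in one small classifier. The function name `classify_range` is hypothetical, and the sketch assumes that x is the sequence separation |res1 - res2|.

```python
def classify_range(res1, res2):
    """Hypothetical helper (not the ConKit API): name the range class of a contact."""
    sep = abs(res1 - res2)  # assumption: x is the sequence separation
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return None  # closer than 6 residues: none of the three classes


labels = [classify_range(1, j) for j in (5, 7, 12, 13, 24, 25)]
```
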
singletons

Singleton contact pairs in the current ContactMap

Contacts are identified by a distance-based grouping analysis. A Contact is classified as a singleton if no other contacts are found within 2 residues.

Returns:
Return type:ContactMap
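
One possible reading of the singleton rule is sketched below on plain `(res1, res2)` tuples. This is not the ConKit grouping analysis: the helper `find_singletons` is hypothetical, and it assumes that "within 2 residues" means both residue numbers of another contact differ by at most 2 from those of the contact in question.

```python
def find_singletons(contacts, max_gap=2):
    """Hypothetical helper (not the ConKit API): contacts with no close neighbour."""
    result = []
    for idx, (i, j) in enumerate(contacts):
        # Assumption: a neighbour lies within max_gap on both residue positions
        has_neighbour = any(
            abs(i - k) <= max_gap and abs(j - m) <= max_gap
            for other, (k, m) in enumerate(contacts)
            if other != idx
        )
        if not has_neighbour:
            result.append((i, j))
    return result


pairs = [(1, 10), (2, 11), (5, 30), (20, 40)]
lonely = find_singletons(pairs)  # (1, 10) and (2, 11) are neighbours of each other
```
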
slice_map(l_factor, seq_len=None, inplace=False)[source]

Slice the contact map using an L * factor

Parameters:
  • l_factor (float) – L/N factor to be applied
  • seq_len (int) – Sequence length.
  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap

Raises:

ValueError – Either seq_len must be provided or sequence defined
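
A rough sketch of L-factor slicing on a plain list follows. The helper `slice_contacts` is hypothetical, and it assumes that slicing keeps the first int(L * l_factor) contacts in their current order, so the list would normally be sorted by score beforehand.

```python
def slice_contacts(contacts, l_factor, seq_len=None):
    """Hypothetical helper (not the ConKit API): keep the first L * l_factor contacts."""
    if seq_len is None:
        raise ValueError("Either seq_len must be provided or sequence defined")
    # Assumption: the cutoff is truncated to an integer number of contacts
    ncontacts = int(seq_len * l_factor)
    return contacts[:ncontacts]


ranked = [(1, 10), (5, 30), (7, 20), (8, 40), (2, 15), (3, 25)]
top = slice_contacts(ranked, l_factor=0.5, seq_len=8)  # 8 * 0.5 = 4 contacts
```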

sort(kword, reverse=False, inplace=False)[source]

Sort the ContactMap

Parameters:
  • kword (str) – The dictionary key to sort contacts by
  • reverse (bool, optional) – Sort the contact pairs in descending order [default: False]
  • inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns:

The reference to the ContactMap, regardless of inplace

Return type:

ContactMap

Raises:

ValueError – kword not in ContactMap

to_string()[source]

Return the ContactMap as str

top_contact

The first Contact entry

Returns:The first Contact entry in ContactMap
Return type:Contact