conkit.core.contactmap module
ContactMap container used throughout ConKit
-
class ContactMap(id)
Bases: conkit.core.entity.Entity
A contact map object representing a single prediction
The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.
Examples
>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)
-
ncontacts
The number of Contact instances in the ContactMap
Type: int
-
repr_sequence
The representative Sequence associated with the ContactMap
Type: Sequence
-
repr_sequence_altloc
The representative altloc Sequence associated with the ContactMap
Type: Sequence
-
sequence
The Sequence associated with the ContactMap
Type: Sequence
-
as_dict(altloc=False)
The ContactMap as a dictionary where each key corresponds with the residue number and the values are sets of tuples with the id
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
Returns: A dictionary representation of the ContactMap instance
Return type: dict
-
as_list(altloc=False)
The ContactMap as a 2D-list containing contact-pair residue indexes
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
-
as_set(altloc=False)
The ContactMap as a 2D-set containing contact-pair residue indexes
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
-
coverage
The sequence coverage score
The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs \(x_{cov}\) by the number of residues in the sequence \(L\).
\[Coverage=\frac{x_{cov}}{L}\]
Returns: The calculated coverage score
Return type: float
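The coverage formula can be illustrated with a minimal, ConKit-independent sketch. The function name `coverage` and the representation of contacts as plain `(res1, res2)` tuples are assumptions made for illustration; they are not the library's implementation.

```python
def coverage(contact_pairs, seq_len):
    """Fraction of residues that take part in at least one contact pair.

    Hypothetical helper: contact_pairs is an iterable of (res1, res2)
    tuples and seq_len is the number of residues L in the sequence.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # x_cov: the set of distinct residues touched by any contact
    covered = {res for pair in contact_pairs for res in pair}
    return len(covered) / seq_len


pairs = [(1, 10), (5, 30), (10, 30)]
score = coverage(pairs, seq_len=40)  # residues 1, 5, 10, 30 are covered
```

Using a set means a residue that appears in several contacts is counted once, which is what the numerator \(x_{cov}\) describes.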
-
empty
¶ Empty contact map
-
filter(threshold, filter_by='raw_score', inplace=False)
Filter out contacts below selected threshold
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
Raises: TypeError – threshold must be int or float
-
find(register, altloc=False, strict=False, inverse=False)
Find all contacts with one or both residues in register
Parameters:
Returns: A modified version of the ContactMap containing the found contacts
Return type:
-
get_contact_density(bw_method='amise')
Calculate the contact density in the contact map using Gaussian kernels
Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note, in rules 2 and 3, the value of \(\sigma\) is the smaller of the standard deviation of X or the normalized interquartile range.
Parameters: bw_method (str, optional) – The bandwidth estimator to use [default: amise]
Returns: The list of per-residue density estimates
Return type:
Raises:
- ImportError – Cannot find scikit-learn package
- ValueError – Undefined bandwidth method
- ValueError – ContactMap is empty
-
get_jaccard_index(other)
Calculate the Jaccard index between two ContactMap instances
This score analyzes the difference of the predicted contacts from two maps,
\[J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]
where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and \(\left|x \cup y\right|\) is the number of elements in the union of \(x\) and \(y\).
The J-score has values in the range of \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps that share no contacts.
Parameters: other (ContactMap) – A ConKit ContactMap
Returns: The Jaccard index
Return type: float
Warning
The Jaccard index ranges over \([0, 1]\), where \(1\) means the maps contain identical contact pairs.
Note
The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1-Jaccard_{index}\).
[1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, [doi: 10.1093/bib/bbw106].
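The relationship between the Jaccard index and the Jaccard distance can be shown with a small sketch on plain Python sets. The helper name `jaccard_index`, the tuple representation of contacts, and the choice to treat `(i, j)` and `(j, i)` as the same contact are assumptions made for illustration, not a description of ConKit's internals.

```python
def jaccard_index(pairs_x, pairs_y):
    """Jaccard index |x ∩ y| / |x ∪ y| of two collections of contact pairs."""
    # Assumption: a contact is unordered, so normalise each pair by sorting it
    x = {tuple(sorted(p)) for p in pairs_x}
    y = {tuple(sorted(p)) for p in pairs_y}
    union = x | y
    if not union:
        # Assumption: two empty maps are reported as sharing nothing
        return 0.0
    return len(x & y) / len(union)


map_a = [(1, 10), (5, 30), (7, 20)]
map_b = [(10, 1), (5, 30), (8, 25)]

index = jaccard_index(map_a, map_b)  # 2 shared contacts out of 4 distinct ones
distance = 1 - index                 # the Jaccard distance used in [1]
```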
-
highest_residue_number
The highest residue sequence number among contacts in the ContactMap
Returns: Highest residue sequence number in the contact map
Return type: int
-
long_range
The long range contacts found in the ContactMap
Long range contacts are defined as 24 <= x residues apart
Returns: A copy of the ContactMap with long-range contacts only
Return type: ContactMap
-
match(other, add_false_negatives=False, match_other=False, remove_unmatched=False, renumber=False, inplace=False)
Modify both hierarchies so residue numbers match one another.
This function is key when plotting contact maps or visualising contact maps in 3-dimensional space. It is particularly relevant when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.
Parameters:
- add_false_negatives (bool) – Add false negatives to self, which are contacts in other but not in self. Required for recall() and can be undone with remove_false_negatives()
- other (ContactMap) – A ConKit ContactMap
- match_other (bool, optional) – Match other to self [default: False]
- remove_unmatched (bool, optional) – Remove all unmatched contacts [default: False]
- renumber (bool, optional) – Renumber the res_seq entries [default: False]. If True, res1_seq and res2_seq change but id remains the same
- inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns: ContactMap instance, regardless of inplace
Return type:
Raises:
- ValueError – At least one of the input ContactMap instances is empty
- ValueError – Error creating reliable keymap matching the sequence in ContactMap
- RuntimeError – Error matching the contacts in ContactMap
-
match_naive(other, add_false_negatives=False, match_other=False, inplace=False)
Modify both hierarchies so residue numbers match one another
This function performs a naive match. It assumes the numbering in both contact maps is equivalent and it will not attempt to perform a sequence alignment.
Parameters:
- other (ContactMap) – A ConKit ContactMap
- add_false_negatives (bool) – Add false negatives to self, which are contacts in other but not in self. Required for recall() and can be undone with remove_false_negatives()
- match_other (bool, optional) – Match other to self [default: False]
- inplace (bool, optional) – Replace the original contacts [default: False]
Returns: ContactMap instance, regardless of inplace
Return type:
-
medium_range
The medium range contacts found in the ContactMap
Medium range contacts are defined as 12 <= x <= 23 residues apart
Returns: A copy of the ContactMap with medium-range contacts only
Return type: ContactMap
-
ncontacts
The number of Contact instances
Returns: The number of contacts in the ContactMap
Return type: int
-
precision
The precision (Positive Predictive Value) score
The precision value is calculated by analysing the true and false positive contacts.
\[Precision=\frac{TruePositives}{TruePositives + FalsePositives}\]
The status of each contact, i.e. true or false positive status, can be determined by running the match() function providing a reference structure.
Returns: The calculated precision score
Return type: float
-
recall
The Recall (Sensitivity) score
The recall value is calculated by analysing the true positive and false negative contacts.
\[Recall=\frac{TruePositives}{TruePositives + FalseNegatives}\]
The status of each contact, i.e. true positive and false negative status, can be determined by running the match() function providing a reference structure.
Note
To determine and save the false negatives, please use the add_false_negatives keyword when running the match() function.
You may wish to run remove_false_negatives() afterwards.
Returns: The calculated recall score
Return type: float
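The precision and recall formulas can be sketched from raw counts as follows. The function names and the choice to return 0.0 when the denominator is zero are assumptions made for this illustration; in ConKit the counts come from the contact statuses assigned by match().

```python
def precision(true_positives, false_positives):
    """TruePositives / (TruePositives + FalsePositives)."""
    total = true_positives + false_positives
    # Assumption: report 0.0 when there are no predicted contacts at all
    return true_positives / total if total else 0.0


def recall(true_positives, false_negatives):
    """TruePositives / (TruePositives + FalseNegatives)."""
    total = true_positives + false_negatives
    # Assumption: report 0.0 when the reference holds no contacts
    return true_positives / total if total else 0.0


p = precision(true_positives=30, false_positives=10)
r = recall(true_positives=30, false_negatives=20)
```

Precision only needs the predicted contacts, whereas recall also needs the reference contacts that were missed, which is why the false negatives have to be added to the map first.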
-
reindex(index, altloc=False, inplace=False)
Re-index the ContactMap
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
Raises: ValueError – Index must be positive
-
remove_false_negatives(inplace=False)
Remove false negatives from the contact map
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
-
remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)
Remove contacts between neighboring residues
The algorithm works by keeping contact pairs that satisfy min_distance <= x <= max_distance
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
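The keep-rule `min_distance <= x <= max_distance` can be sketched on plain tuples. This is an illustration only: it assumes x is the absolute difference between the two residue numbers of a contact, and the function below is a stand-alone helper rather than the ConKit method. The default upper bound shown in the signature equals `sys.maxsize` on a 64-bit build.

```python
import sys


def remove_neighbors(pairs, min_distance=5, max_distance=sys.maxsize):
    """Keep pairs whose sequence separation lies in [min_distance, max_distance]."""
    return [
        (i, j)
        for i, j in pairs
        # Assumption: x is the absolute sequence separation |i - j|
        if min_distance <= abs(i - j) <= max_distance
    ]


pairs = [(1, 3), (1, 6), (2, 40)]
kept = remove_neighbors(pairs)                     # drops (1, 3): separation 2
kept_capped = remove_neighbors(pairs, max_distance=20)
```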
-
repr_sequence
The representative Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the normal res_seq positions
Returns:
Return type: Sequence
Raises: TypeError – Sequence undefined
-
repr_sequence_altloc
The representative altloc Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the res_altseq positions
Returns:
Return type: Sequence
Raises: TypeError – Sequence undefined
-
rescale(inplace=False)
Rescale the raw scores in ContactMap
Parameters: inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
-
sequence
The Sequence associated with the ContactMap
Returns: A Sequence object
Return type: Sequence
-
set_scalar_score()
Calculate and set the scalar_score for the ContactMap
This score is a scaled score for all raw scores in a contact map. It is defined by the formula
\[{x}'=\frac{x}{\overline{d}}\]
where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.
This score is described in more detail in [2].
[2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
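The scaling formula can be worked through on a plain list of raw scores. The helper name `scalar_scores` and the error handling for empty or zero-mean input are assumptions made for illustration; they do not describe how ConKit stores the result on each Contact.

```python
def scalar_scores(raw_scores):
    """Divide every raw score x by the mean of all raw scores."""
    if not raw_scores:
        raise ValueError("no raw scores to scale")
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        # Assumption: a zero mean is treated as an error in this sketch
        raise ValueError("mean raw score is zero")
    return [x / mean for x in raw_scores]


scaled = scalar_scores([0.25, 0.5, 0.75])  # mean raw score is 0.5
```

A scaled score above 1 marks a contact that scores higher than the average contact in the map, which makes scores from different predictors easier to compare.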
-
set_sequence_register(altloc=False)
Assign the amino acids from Sequence to all Contact instances
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
Raises: ValueError – Undefined sequence
-
short_range
The short range contacts found in the ContactMap
Short range contacts are defined as 6 <= x <= 11 residues apart
Returns: A copy of the ContactMap with short-range contacts only
Return type: ContactMap
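The three range classes documented for short_range, medium_range and long_range can be summarised in one small sketch. It assumes that x is the absolute difference between the two residue numbers; the function name `range_class` and the string labels are hypothetical.

```python
def range_class(res1, res2):
    """Classify a contact by sequence separation using the documented bounds."""
    separation = abs(res1 - res2)  # assumption: x is |res1 - res2|
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return None  # closer than 6 residues: none of the three classes


labels = [range_class(1, 8), range_class(1, 20), range_class(1, 60)]
```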
-
singletons
Singleton contact pairs in the current ContactMap
Contacts are identified by a distance-based grouping analysis. A Contact is classified as singleton if no other contacts are found within 2 residues.
Returns:
Return type: ContactMap
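One possible reading of the singleton rule is sketched below. The documentation does not spell out the distance measure, so this example assumes that two contacts are neighbors when both of their residue numbers differ by at most 2; the helper `singletons` and its `radius` parameter are hypothetical and may differ from the grouping analysis ConKit performs.

```python
def singletons(pairs, radius=2):
    """Return the pairs that have no other pair within `radius` residues."""
    result = []
    for idx, (i, j) in enumerate(pairs):
        # Assumption: a neighbor is close in both residue positions
        has_neighbor = any(
            abs(i - k) <= radius and abs(j - l) <= radius
            for other, (k, l) in enumerate(pairs)
            if other != idx
        )
        if not has_neighbor:
            result.append((i, j))
    return result


pairs = [(1, 10), (2, 11), (20, 40)]
isolated = singletons(pairs)  # (1, 10) and (2, 11) neighbor each other
```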
-
slice_map(l_factor, seq_len=None, inplace=False)
Slice the contact map using an L * factor
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
Raises: ValueError – Either seq_len must be provided or sequence defined
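Slicing by an L * factor cutoff can be sketched as keeping the first `L * l_factor` contacts of a ranked list. This is an illustration under stated assumptions: contacts are `(res1, res2, raw_score)` tuples, they are ranked by raw score before slicing, and the cutoff is truncated to an integer. The helper `slice_top` is hypothetical and ConKit may order or round differently.

```python
def slice_top(contacts, l_factor, seq_len):
    """Keep the top seq_len * l_factor contacts by raw score."""
    # Assumption: rank by raw score, highest first, before slicing
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    # Assumption: the cutoff is truncated towards zero
    ncontacts = int(seq_len * l_factor)
    return ranked[:ncontacts]


contacts = [(1, 10, 0.3), (2, 20, 0.9), (3, 15, 0.6), (4, 18, 0.1)]
top = slice_top(contacts, l_factor=0.5, seq_len=4)  # keeps 2 contacts
```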
-
sort(kword, reverse=False, inplace=False)
Sort the ContactMap
Parameters:
Returns: The reference to the ContactMap, regardless of inplace
Return type:
Raises: ValueError – kword not in ContactMap
-
to_string()
Return the ContactMap as str
-
top_contact
The first Contact entry
Returns: The first Contact entry in ContactMap
Return type: Contact
-