ContactMap container used throughout ConKit
ContactMap(id)
Bases: conkit.core.entity.Entity
A contact map object representing a single prediction.
The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.
Examples
>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)
coverage
float – The sequence coverage score
id
str – A unique identifier
ncontacts
int – The number of Contact instances in the ContactMap
precision
float – The precision (Positive Predictive Value) score
repr_sequence
Sequence – The representative Sequence associated with the ContactMap
repr_sequence_altloc
Sequence – The representative altloc Sequence associated with the ContactMap
sequence
Sequence – The Sequence associated with the ContactMap
as_list(altloc=False)
The ContactMap as a 2D-list containing contact-pair residue indexes
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
assign_sequence_register(*args, **kwargs)
calculate_jaccard_index(*args, **kwargs)
calculate_kernel_density(*args, **kwargs)
calculate_scalar_score(*args, **kwargs)
coverage
The sequence coverage score
The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs \(x_{cov}\) by the number of residues in the sequence \(L\):
\(Coverage = \frac{x_{cov}}{L}\)
Returns: The calculated coverage score
Return type: float
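The coverage formula can be illustrated with a plain-Python sketch. It is not ConKit's implementation: it assumes contacts are given as simple (res1, res2) tuples and that \(x_{cov}\) counts each distinct residue once, however many contacts it takes part in.

```python
def coverage(contact_pairs, seq_len):
    """Fraction of sequence residues that take part in at least one contact."""
    # x_cov: number of distinct residues appearing in any predicted pair
    covered = {res for pair in contact_pairs for res in pair}
    return len(covered) / seq_len


# Pairs from the ContactMap example above; the sequence length of 40 is made up
print(coverage([(1, 10), (5, 30)], 40))  # 0.1
```

Four distinct residues (1, 5, 10, 30) out of 40 give a coverage of 0.1.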
empty
Empty contact map
find(register, altloc=False, strict=False)
Find all contacts with one or both residues in register
Returns: A modified version of the ContactMap
Return type: ContactMap
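A minimal sketch of the lookup, assuming (this is not stated above) that strict=False keeps contacts with at least one residue in register and strict=True requires both. Contacts are plain (res1, res2) tuples and altloc handling is left out; the function is an illustration, not ConKit's code.

```python
def find(contact_pairs, register, strict=False):
    """Return contacts with one (default) or both (strict) residues in register."""
    if isinstance(register, int):
        register = [register]  # accept a single residue number as well
    wanted = set(register)
    hits = []
    for res1, res2 in contact_pairs:
        in1, in2 = res1 in wanted, res2 in wanted
        # Assumed semantics: strict requires both residues, otherwise either suffices
        if (in1 and in2) if strict else (in1 or in2):
            hits.append((res1, res2))
    return hits


pairs = [(1, 10), (5, 30), (10, 30)]
print(find(pairs, [1, 10]))               # [(1, 10), (10, 30)]
print(find(pairs, [1, 10], strict=True))  # [(1, 10)]
```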
get_contact_density(bw_method='amise')
Calculate the contact density in the contact map using Gaussian kernels
Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note that in rules 2 and 3 the value of \(\sigma\) is the smaller of the standard deviation of X or the normalized interquartile range.
Parameters: bw_method (str, optional) – The bandwidth estimator to use [default: amise]
Returns: The list of per-residue density estimates
Return type: list
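ConKit's own bandwidth estimators (selected with bw_method, default amise) are not reproduced here. As a hedged illustration of the idea, the sketch below uses one well-known rule of thumb, Silverman's rule \(h = 0.9 \min(\sigma, IQR/1.349)\, n^{-1/5}\) for \(d = 1\), which takes the smaller of the standard deviation and the normalized interquartile range as described above. It then evaluates a Gaussian kernel density at every residue position. Pooling both residues of every contact into one 1D sample is an assumption of this sketch, not necessarily what ConKit does.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb for a 1D sample (d = 1)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Normalized IQR: dividing by 1.349 makes it estimate sigma for normal data
    iqr_sigma = (q3 - q1) / 1.349
    spread = min(sigma, iqr_sigma) if iqr_sigma > 0 else sigma
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(data, points, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for x in data)
        for p in points
    ]


def contact_density(contact_pairs, seq_len):
    """Per-residue density: one value for each residue 1..seq_len."""
    # Assumption: both residues of every contact are pooled into one 1D sample
    data = [res for pair in contact_pairs for res in pair]
    bandwidth = silverman_bandwidth(data)
    return gaussian_kde(data, range(1, seq_len + 1), bandwidth)


density = contact_density([(1, 10), (2, 11), (3, 12), (20, 40)], 40)
print(len(density))  # 40
```

Swapping in another estimator only changes silverman_bandwidth; the kernel evaluation stays the same.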
get_jaccard_index(other)
Calculate the Jaccard index between two ContactMap instances
This score analyzes the difference between the predicted contacts of two maps:
\(J(x, y) = \frac{\left|x \cap y\right|}{\left|x \cup y\right|}\)
where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and \(\left|x \cup y\right|\) is the number of elements in the union of \(x\) and \(y\).
The J-score takes values in the range \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps that share no contacts.
Parameters: other (ContactMap) – A ConKit ContactMap
Returns: The Jaccard index
Return type: float
Warning
The Jaccard distance also ranges over \([0, 1]\), but for the distance a value of \(0\) means the maps contain identical contact pairs.
Note
The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1 - Jaccard_{index}\).
[1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, doi: 10.1093/bib/bbw106.
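The index and its complement, the distance, can be sketched with Python sets. This is an illustration rather than ConKit's code: contacts are plain (res1, res2) tuples, (i, j) and (j, i) are assumed to be the same contact, and returning 1.0 for two empty maps is a choice made only for this sketch.

```python
def jaccard_index(pairs_x, pairs_y):
    """J(x, y) = |x & y| / |x | y| over sets of contact pairs."""
    # Assumption: (i, j) and (j, i) denote the same contact, so normalize order
    x = {tuple(sorted(p)) for p in pairs_x}
    y = {tuple(sorted(p)) for p in pairs_y}
    union = x | y
    if not union:
        return 1.0  # sketch's choice: two empty maps count as identical
    return len(x & y) / len(union)


a = [(1, 10), (5, 30), (8, 20), (3, 15)]
b = [(10, 1), (5, 30), (8, 20), (12, 40)]
print(jaccard_index(a, b))  # 0.6
distance = 1 - jaccard_index(a, b)  # the Jaccard distance used in [1]
```

Three shared contacts out of five distinct ones give an index of 0.6.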
long_range
The long-range contacts found in the ContactMap
Long-range contacts are defined as contacts whose residues are x residues apart in sequence, with 24 <= x
Returns: A copy of the ContactMap with long-range contacts only
Return type: ContactMap
long_range_contacts
match(other, add_false_negatives=False, match_other=False, remove_unmatched=False, renumber=False, inplace=False)
Modify both hierarchies so residue numbers match one another.
This function is key when plotting contact maps or visualising contact maps in 3-dimensional space, in particular when residue numbers in the structure do not start at 0 or when peptide chain breaks are present.
medium_range
The medium-range contacts found in the ContactMap
Medium-range contacts are defined as contacts whose residues are x residues apart in sequence, with 12 <= x <= 23
Returns: A copy of the ContactMap with medium-range contacts only
Return type: ContactMap
medium_range_contacts
ncontacts
The number of Contact instances
Returns: The number of contacts in the ContactMap
Return type: int
precision
The precision (Positive Predictive Value) score
The precision value is calculated from the true positive (TP) and false positive (FP) contacts:
\(PPV = \frac{TP}{TP + FP}\)
The status of each contact, i.e. true or false positive, can be determined by running the match() function with a reference structure.
Returns: The calculated precision score
Return type: float
recall
The recall (sensitivity) score
The recall value is calculated from the true positive (TP) and false negative (FN) contacts:
\(TPR = \frac{TP}{TP + FN}\)
The status of each contact, i.e. true positive or false negative, can be determined by running the match() function with a reference structure.
Note
To determine and save the false negatives, use the add_false_negatives keyword when running the match() function. You may wish to run remove_false_negatives() afterwards.
Returns: The calculated recall score
Return type: float
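The precision and recall formulas can be checked with a small sketch that counts per-contact labels. The "TP", "FP" and "FN" strings are hypothetical stand-ins for the status that match() assigns, and returning 0.0 when a denominator is zero is a choice made only for this sketch.

```python
from collections import Counter


def precision_recall(statuses):
    """PPV = TP / (TP + FP) and TPR = TP / (TP + FN) from per-contact labels."""
    # Hypothetical labels standing in for the status assigned by match()
    counts = Counter(statuses)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = ["TP", "TP", "TP", "FP", "FN", "FN"]
print(precision_recall(labels))  # (0.75, 0.6)
# Without the false negatives, recall is meaningless (always 1.0 here)
print(precision_recall([s for s in labels if s != "FN"]))  # (0.75, 1.0)
```

The second call shows why the false negatives must be added by match() before recall is computed, while precision is unaffected by them.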
reindex(index, altloc=False, inplace=False)
Re-index the ContactMap
Returns: The reference to the ContactMap
Return type: ContactMap
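The parameters of reindex are not described above, so the following is only a sketch under an assumption: re-indexing is taken to shift every residue number by one constant offset so that the lowest residue number becomes index. Contacts are plain (res1, res2) tuples, and validation of index is not covered.

```python
def reindex(contact_pairs, index):
    """Shift all residue numbers so that the lowest one becomes `index`."""
    # Assumed behaviour: a single constant offset applied to every residue
    lowest = min(res for pair in contact_pairs for res in pair)
    offset = index - lowest
    return [(res1 + offset, res2 + offset) for res1, res2 in contact_pairs]


print(reindex([(5, 14), (9, 34)], 1))  # [(1, 10), (5, 30)]
```

A constant offset preserves every sequence separation, so range-based filters give the same result before and after re-indexing.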
remove_false_negatives(inplace=False)
Remove false negatives from the contact map
Returns: The reference to the ContactMap
Return type: ContactMap
remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)
Remove contacts between neighboring residues
The algorithm works by keeping contact pairs that satisfy min_distance <= x <= max_distance, where x is the sequence separation of the two residues.
Returns: The reference to the ContactMap
Return type: ContactMap
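The filtering rule can be sketched as a list comprehension. It assumes x is the absolute difference of the two residue numbers and uses sys.maxsize as the default upper bound, which equals the 9223372036854775807 in the signature on 64-bit platforms. Contacts are plain (res1, res2) tuples; this is not ConKit's code.

```python
import sys


def remove_neighbors(contact_pairs, min_distance=5, max_distance=sys.maxsize):
    """Keep pairs whose separation x satisfies min_distance <= x <= max_distance."""
    return [
        (res1, res2)
        for res1, res2 in contact_pairs
        if min_distance <= abs(res1 - res2) <= max_distance
    ]


pairs = [(1, 3), (1, 6), (5, 30), (2, 90)]
print(remove_neighbors(pairs))                   # [(1, 6), (5, 30), (2, 90)]
print(remove_neighbors(pairs, max_distance=50))  # [(1, 6), (5, 30)]
```

Both bounds are inclusive, so a pair exactly min_distance apart, such as (1, 6), is kept.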
repr_sequence
The representative Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the normal res_seq positions
Return type: Sequence
Raises: TypeError – Sequence undefined
repr_sequence_altloc
The representative altloc Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the res_altseq positions
Return type: Sequence
Raises: TypeError – Sequence undefined
rescale(inplace=False)
Rescale the raw scores in ContactMap
Parameters: inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
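The rescaling method is not specified above. A common choice, and the one assumed here, is min-max normalization onto \([0, 1]\). The sketch works on a plain list of raw scores, and mapping all-equal scores to 0.0 is a choice made only for this sketch.

```python
def rescale(raw_scores):
    """Min-max normalize raw scores onto [0, 1] (assumed rescaling scheme)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [0.0 for _ in raw_scores]  # sketch's choice when all scores are equal
    return [(s - lo) / (hi - lo) for s in raw_scores]


print(rescale([2.0, 5.0, 10.0]))  # [0.0, 0.375, 1.0]
```

Min-max scaling keeps the ranking of contacts unchanged, so a score-based sort gives the same order before and after rescaling.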
sequence
The Sequence associated with the ContactMap
Returns: A Sequence object
Return type: Sequence
set_scalar_score()
Calculate and set the scalar_score for the ContactMap
This score scales every raw score in the contact map by the mean raw score. It is defined by the formula
\(s = \frac{x}{\overline{d}}\)
where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.
This score is described in more detail in [2].
[2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
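The scaling can be sketched on a plain list of raw scores, assuming the formula \(s = x / \overline{d}\) with \(\overline{d}\) the arithmetic mean. In ConKit the result is stored on each Contact; this sketch simply returns a new list.

```python
from statistics import mean


def scalar_scores(raw_scores):
    """Scale every raw score by the mean raw score: s = x / d_bar."""
    d_bar = mean(raw_scores)
    return [x / d_bar for x in raw_scores]


print(scalar_scores([1.0, 2.0, 3.0]))  # [0.5, 1.0, 1.5]
```

A scalar score above 1 marks a contact predicted more confidently than the average contact in the same map, which makes scores comparable across predictors with different raw-score scales.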
set_sequence_register(altloc=False)
Assign the amino acids from Sequence to all Contact instances
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
short_range
The short-range contacts found in the ContactMap
Short-range contacts are defined as contacts whose residues are x residues apart in sequence, with 6 <= x <= 11
Returns: A copy of the ContactMap with short-range contacts only
Return type: ContactMap
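The three sequence-separation classes (short 6 <= x <= 11 here, medium 12 <= x <= 23 and long 24 <= x in their own entries) can be sketched as a single classifier. It assumes x is the absolute difference of the residue numbers; the "none" label for pairs closer than 6 residues and both function names are hypothetical.

```python
def sequence_range(res1, res2):
    """Classify a contact by its sequence separation x = |res1 - res2|."""
    x = abs(res1 - res2)
    if x >= 24:
        return "long"
    if 12 <= x <= 23:
        return "medium"
    if 6 <= x <= 11:
        return "short"
    return "none"  # hypothetical label: x < 6 falls outside all three ranges


def filter_range(contact_pairs, wanted):
    """Keep only the pairs in one range, mirroring a copy with that range only."""
    return [p for p in contact_pairs if sequence_range(*p) == wanted]


pairs = [(1, 4), (1, 10), (5, 30), (2, 20)]
print([sequence_range(*p) for p in pairs])  # ['none', 'short', 'long', 'medium']
print(filter_range(pairs, "long"))          # [(5, 30)]
```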
short_range_contacts
singletons
Singleton contact pairs in the current ContactMap
Contacts are identified by a distance-based grouping analysis. A Contact is classified as a singleton if no other contacts are found within 2 residues.
Return type: ContactMap
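The distance-based grouping used by ConKit is not detailed above. The sketch below substitutes a simpler neighbourhood test: a contact (i, j) counts as a singleton when no other contact (k, l) has both \(|i - k| \le 2\) and \(|j - l| \le 2\). Contacts are plain (res1, res2) tuples, and the threshold parameter name is hypothetical.

```python
def singletons(contact_pairs, threshold=2):
    """Return pairs with no other pair within `threshold` residues on both ends."""
    result = []
    for i, (a1, a2) in enumerate(contact_pairs):
        isolated = all(
            not (abs(a1 - b1) <= threshold and abs(a2 - b2) <= threshold)
            for j, (b1, b2) in enumerate(contact_pairs)
            if j != i
        )
        if isolated:
            result.append((a1, a2))
    return result


pairs = [(1, 20), (2, 21), (3, 22), (10, 40)]
print(singletons(pairs))  # [(10, 40)]
```

The first three pairs support each other as a diagonal stretch, while (10, 40) has no neighbour and is flagged.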
sort(kword, reverse=False, inplace=False)
Sort the ContactMap
Returns: The reference to the ContactMap
Return type: ContactMap
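Sorting by a keyword can be sketched with operator.attrgetter, assuming kword names an attribute of each contact such as a raw score. SimpleContact and its field names are a hypothetical stand-in for ConKit's Contact, chosen to mirror the Contact(1, 10, 0.333) example above.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class SimpleContact:  # hypothetical stand-in for conkit's Contact
    res1_seq: int
    res2_seq: int
    raw_score: float


def sort_contacts(contacts, kword, reverse=False):
    """Return a new list ordered by the attribute named kword."""
    return sorted(contacts, key=attrgetter(kword), reverse=reverse)


contacts = [
    SimpleContact(1, 10, 0.333),
    SimpleContact(5, 30, 0.667),
    SimpleContact(3, 12, 0.5),
]
top = sort_contacts(contacts, "raw_score", reverse=True)
print([c.raw_score for c in top])  # [0.667, 0.5, 0.333]
```

sorted() is stable, so contacts with equal keys keep their original relative order.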
to_string()
Return the ContactMap as str
top_contact
The first Contact entry
Returns: The first Contact entry in the ContactMap
Return type: Contact