conkit.core.contactmap module
ContactMap container used throughout ConKit
-
class ContactMap(id)
Bases: conkit.core.entity.Entity
A contact map object representing a single prediction
The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.
Examples
>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)
-
ncontacts
The number of Contact instances in the ContactMap
Type: int
-
repr_sequence
The representative Sequence associated with the ContactMap
Type: Sequence
-
repr_sequence_altloc
The representative altloc Sequence associated with the ContactMap
Type: Sequence
-
sequence
The Sequence associated with the ContactMap
Type: Sequence
-
as_dict(altloc=False)
The ContactMap as a dictionary where each key corresponds to a residue number and the values are sets of tuples with the id
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
Returns: A dictionary representation of the ContactMap instance
Return type: dict
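The dictionary layout described for as_dict() can be pictured with plain Python. This is a minimal sketch, not ConKit's implementation: contacts_as_dict is a hypothetical helper, and it assumes that a contact id is the (res1, res2) tuple and that a contact is filed under both of its residues.

```python
from collections import defaultdict


def contacts_as_dict(pairs):
    """Map each residue number to the set of contact ids it takes part in.

    Assumption: a contact id is the (res1, res2) tuple itself.
    """
    mapping = defaultdict(set)
    for res1, res2 in pairs:
        # File the same id under both residues of the pair
        mapping[res1].add((res1, res2))
        mapping[res2].add((res1, res2))
    return dict(mapping)


pairs = [(1, 10), (5, 30), (1, 30)]
lookup = contacts_as_dict(pairs)
print(sorted(lookup))
```

A lookup such as lookup[30] then gives every contact that involves residue 30 without scanning the whole map.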
-
as_list(altloc=False)
The ContactMap as a 2D-list containing contact-pair residue indexes
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
-
as_set(altloc=False)
The ContactMap as a 2D-set containing contact-pair residue indexes
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
-
coverage
The sequence coverage score
The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs \(x_{cov}\) by the number of residues in the sequence \(L\).
\[Coverage=\frac{x_{cov}}{L}\]
Returns: The calculated coverage score
Return type: float
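The coverage formula can be checked by hand with a few lines of Python. This is a sketch under stated assumptions: the coverage function below is a hypothetical stand-in for the property, contacts are given as plain (res1, res2) tuples, and a residue counts as covered if it appears in at least one pair.

```python
def coverage(pairs, seq_len):
    """Fraction of sequence positions that occur in at least one contact pair."""
    # x_cov: the distinct residues touched by any predicted contact
    covered = {res for pair in pairs for res in pair}
    return len(covered) / seq_len


# Residues 1, 5, 10 and 30 are covered in a sequence of length 40
score = coverage([(1, 10), (5, 30), (1, 30)], seq_len=40)
print(score)
```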
-
empty
Empty contact map
-
filter(threshold, filter_by='raw_score', inplace=False)
Filter out contacts below the selected threshold
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
Raises: TypeError – threshold must be int or float
-
find(register, altloc=False, strict=False, inverse=False)
Find all contacts with one or both residues in register
Returns: A modified version of the ContactMap containing the found contacts
Return type: ContactMap
-
get_contact_density(bw_method='amise')
Calculate the contact density in the contact map using Gaussian kernels
Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note, in rules 2 and 3, the value of \(\sigma\) is the smaller of the standard deviation of X or the normalized interquartile range.
Parameters: bw_method (str, optional) – The bandwidth estimator to use [default: amise]
Returns: The list of per-residue density estimates
Return type: list
Raises:
- ImportError – Cannot find scikit-learn package
- ValueError – Undefined bandwidth method
- ValueError – ContactMap is empty
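To show what a Gaussian kernel density estimate over residue positions looks like, here is a minimal pure-Python sketch. It does not use scikit-learn and it does not reproduce any of the bandwidth estimators: the bandwidth is a fixed, hand-picked value, gaussian_kde_1d is a hypothetical helper, and treating every residue index of every contact as one data point is an assumption made for this illustration.

```python
import math


def gaussian_kde_1d(samples, points, bandwidth):
    """Evaluate a 1D Gaussian kernel density estimate at each of the given points."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - s) / bandwidth) ** 2) for s in samples)
        for p in points
    ]


pairs = [(1, 10), (2, 9), (5, 30)]
# Assumption: each residue index that takes part in a contact is one sample
samples = [res for pair in pairs for res in pair]
points = list(range(min(samples), max(samples) + 1))
density = gaussian_kde_1d(samples, points, bandwidth=2.0)  # bandwidth chosen by hand
print(len(density))
```

Positions close to many contact residues receive a higher density than positions far from any contact, which is the per-residue signal the method returns.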
-
get_jaccard_index(other)
Calculate the Jaccard index between two ContactMap instances
This score analyzes the difference of the predicted contacts from two maps,
\[J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]
where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and \(\left|x \cup y\right|\) is the number of elements in the union of \(x\) and \(y\).
The J-score has values in the range of \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps with no contacts in common.
Parameters: other (ContactMap) – A ConKit ContactMap
Returns: The Jaccard index
Return type: float
Warning
The Jaccard index ranges from \(0\) to \(1\), where \(1\) means the maps contain identical contact pairs.
Note
The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1-Jaccard_{index}\).
[1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, [doi: 10.1093/bib/bbw106].
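The Jaccard index and its relation to the Jaccard distance can be reproduced with Python sets. This is a sketch rather than ConKit's code: jaccard_index is a hypothetical helper, contacts are plain (res1, res2) tuples, and raising ValueError for two empty maps is a choice made for this example only.

```python
def jaccard_index(x, y):
    """Return |x ∩ y| / |x ∪ y| for two collections of contact pairs."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumption of this sketch: the index is undefined for two empty maps
        raise ValueError("Both contact maps are empty")
    return len(x & y) / len(union)


map_a = {(1, 10), (5, 30), (7, 20)}
map_b = {(1, 10), (7, 20), (8, 40)}
index = jaccard_index(map_a, map_b)  # 2 shared contacts out of 4 distinct ones
distance = 1 - index                 # the Jaccard distance used in [1]
print(index, distance)
```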
-
highest_residue_number
The highest residue sequence number among contacts in the ContactMap
Returns: Highest residue sequence number in the contact map
Return type: int
-
long_range
The long range contacts found in the ContactMap
Long range contacts are defined as 24 <= x residues apart
Returns: A copy of the ContactMap with long-range contacts only
Return type: ContactMap
-
match(other, add_false_negatives=False, match_other=False, remove_unmatched=False, renumber=False, inplace=False)
Modify both hierarchies so residue numbers match one another.
This function is key when plotting contact maps or visualising contact maps in 3-dimensional space. It matters in particular when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.
Parameters:
- add_false_negatives (bool) – Add false negatives to self, which are contacts in other but not in self. Required for recall() and can be undone with remove_false_negatives()
- other (ContactMap) – A ConKit ContactMap
- match_other (bool, optional) – Match other to self [default: False]
- remove_unmatched (bool, optional) – Remove all unmatched contacts [default: False]
- renumber (bool, optional) – Renumber the res_seq entries [default: False]. If True, res1_seq and res2_seq change but id remains the same
- inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns: ContactMap instance, regardless of inplace
Return type: ContactMap
Raises:
- ValueError – At least one of the input ContactMap instances is empty
- ValueError – Error creating reliable keymap matching the sequence in ContactMap
- RuntimeError – Error matching the contacts in ContactMap
-
match_naive(other, add_false_negatives=False, match_other=False, inplace=False)
Modify both hierarchies so residue numbers match one another
This function performs a naive match. It assumes the numbering in both contact maps is equivalent and it will not attempt to perform a sequence alignment.
Parameters:
- other (ContactMap) – A ConKit ContactMap
- add_false_negatives (bool) – Add false negatives to self, which are contacts in other but not in self. Required for recall() and can be undone with remove_false_negatives()
- match_other (bool, optional) – Match other to self [default: False]
- inplace (bool, optional) – Replace the original contacts [default: False]
Returns: ContactMap instance, regardless of inplace
Return type: ContactMap
-
medium_range
The medium range contacts found in the ContactMap
Medium range contacts are defined as 12 <= x <= 23 residues apart
Returns: A copy of the ContactMap with medium-range contacts only
Return type: ContactMap
-
ncontacts
The number of Contact instances
Returns: The number of contacts in the ContactMap
Return type: int
-
precision
The precision (Positive Predictive Value) score
The precision value is calculated by analysing the true and false positive contacts.
\[Precision=\frac{TruePositives}{TruePositives + FalsePositives}\]
The status of each contact, i.e. whether it is a true or a false positive, can be determined by running the match() function with a reference structure.
Returns: The calculated precision score
Return type: float
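The precision formula reduces to set arithmetic once predicted and reference contacts are known. This is a minimal sketch under stated assumptions: precision is a hypothetical helper working on plain (res1, res2) tuples, the reference set stands in for the contacts found in a reference structure, and returning 0.0 for an empty prediction is a choice made for this example.

```python
def precision(predicted, reference):
    """TruePositives / (TruePositives + FalsePositives) for two sets of pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)   # predicted and in the reference
    false_positives = len(predicted - reference)  # predicted but not in the reference
    if true_positives + false_positives == 0:
        return 0.0  # assumption of this sketch: no predictions gives 0.0
    return true_positives / (true_positives + false_positives)


predicted = {(1, 10), (5, 30), (7, 20), (8, 40)}
reference = {(1, 10), (7, 20), (9, 50)}
score = precision(predicted, reference)
print(score)
```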
-
recall
The Recall (Sensitivity) score
The recall value is calculated by analysing the true positive and false negative contacts.
\[Recall=\frac{TruePositives}{TruePositives + FalseNegatives}\]
The status of each contact, i.e. whether it is a true positive or a false negative, can be determined by running the match() function with a reference structure.
Note
To determine and save the false negatives, please use the add_false_negatives keyword when running the match() function. You may wish to run remove_false_negatives() afterwards.
Returns: The calculated recall score
Return type: float
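Recall needs the false negatives, that is, the reference contacts missing from the prediction, which is why they have to be added during matching. The sketch below shows the arithmetic only: recall is a hypothetical helper on plain (res1, res2) tuples and does not model the match() bookkeeping.

```python
def recall(predicted, reference):
    """TruePositives / (TruePositives + FalseNegatives) for two sets of pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # False negatives: contacts in the reference that were never predicted
    false_negatives = len(reference - predicted)
    if true_positives + false_negatives == 0:
        return 0.0  # assumption of this sketch: an empty reference gives 0.0
    return true_positives / (true_positives + false_negatives)


predicted = {(1, 10), (5, 30), (7, 20), (8, 40)}
reference = {(1, 10), (7, 20), (9, 50)}
score = recall(predicted, reference)  # 2 of the 3 reference contacts were found
print(score)
```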
-
reindex(index, altloc=False, inplace=False)
Re-index the ContactMap
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
Raises: ValueError – Index must be positive
-
remove_false_negatives(inplace=False)
Remove false negatives from the contact map
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
-
remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)
Remove contacts between neighboring residues
The algorithm works by keeping contact pairs that satisfy min_distance <= x <= max_distance
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
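The keep rule min_distance <= x <= max_distance can be sketched as a simple filter. The assumptions here are that x is the sequence separation |res1 - res2| and that contacts are plain tuples; keep_non_neighbors is a hypothetical helper, not the ConKit method. The default upper bound in the signature, 9223372036854775807, equals sys.maxsize on a 64-bit Python build, so by default only the lower bound has an effect.

```python
import sys


def keep_non_neighbors(pairs, min_distance=5, max_distance=sys.maxsize):
    """Keep pairs whose sequence separation lies in [min_distance, max_distance]."""
    return [
        (res1, res2)
        for res1, res2 in pairs
        if min_distance <= abs(res1 - res2) <= max_distance
    ]


pairs = [(1, 3), (1, 6), (1, 10), (2, 40)]
kept = keep_non_neighbors(pairs)                     # drops (1, 3): separation 2
windowed = keep_non_neighbors(pairs, 5, 10)          # also drops (2, 40): separation 38
print(kept, windowed)
```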
-
repr_sequence
The representative Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the normal res_seq positions
Return type: Sequence
Raises: TypeError – Sequence undefined
-
repr_sequence_altloc
The representative altloc Sequence associated with the ContactMap
The peptide sequence constructed from the available contacts using the res_altseq positions
Return type: Sequence
Raises: TypeError – Sequence undefined
-
rescale(inplace=False)
Rescale the raw scores in ContactMap
Parameters: inplace (bool, optional) – Replace the saved order of contacts [default: False]
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
-
sequence
The Sequence associated with the ContactMap
Returns: A Sequence object
Return type: Sequence
-
set_scalar_score()
Calculate and set the scalar_score for the ContactMap
This score is a scaled score for all raw scores in a contact map. It is defined by the formula
\[{x}'=\frac{x}{\overline{d}}\]
where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.
This score is described in more detail in [2].
[2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
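As a worked example of the scaling formula, the sketch below divides each raw score by the mean of all raw scores. scalar_scores is a hypothetical helper operating on a plain list of floats; it only illustrates the arithmetic and does not set any attribute on Contact objects.

```python
from statistics import mean


def scalar_scores(raw_scores):
    """Scale every raw score by the mean of all raw scores: x' = x / mean."""
    average = mean(raw_scores)
    return [score / average for score in raw_scores]


raw = [0.2, 0.4, 0.6]
scaled = scalar_scores(raw)
print(scaled)
```

By construction the scaled scores average to 1, so a value above 1 marks a contact scored higher than the map's average.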
-
set_sequence_register(altloc=False)
Assign the amino acids from Sequence to all Contact instances
Parameters: altloc (bool) – Use the res_altloc positions [default: False]
Raises: ValueError – Undefined sequence
-
short_range
The short range contacts found in the ContactMap
Short range contacts are defined as 6 <= x <= 11 residues apart
Returns: A copy of the ContactMap with short-range contacts only
Return type: ContactMap
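The short, medium and long range definitions on this page (6 to 11, 12 to 23, and 24 or more residues apart) can be combined into one classifier. This is a sketch that assumes x is the sequence separation |res1 - res2|; classify_range is a hypothetical helper, and returning None for separations below 6 is a choice made for this example.

```python
def classify_range(res1, res2):
    """Label a contact by sequence separation using the thresholds documented here."""
    separation = abs(res1 - res2)
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return None  # closer than 6 residues: none of the three classes applies


labels = [classify_range(1, res2) for res2 in (5, 7, 12, 13, 24, 25)]
print(labels)
```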
-
singletons
Singleton contact pairs in the current ContactMap
Contacts are identified by a distance-based grouping analysis. A Contact is classified as singleton if no other contacts are found within 2 residues.
Return type: ContactMap
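One way to read the singleton rule is sketched below. The source does not spell out the distance measure, so this example assumes that two contacts are neighbors when both of their residue numbers differ by at most 2; find_singletons is a hypothetical helper, and ConKit's grouping analysis may differ in detail.

```python
def find_singletons(pairs, radius=2):
    """Return the pairs that have no other pair within `radius` residues.

    Assumption: (a1, a2) and (b1, b2) are neighbors if |a1 - b1| <= radius
    and |a2 - b2| <= radius.
    """
    singletons = []
    for i, (a1, a2) in enumerate(pairs):
        has_neighbor = any(
            abs(a1 - b1) <= radius and abs(a2 - b2) <= radius
            for j, (b1, b2) in enumerate(pairs)
            if i != j
        )
        if not has_neighbor:
            singletons.append((a1, a2))
    return singletons


pairs = [(1, 10), (2, 11), (20, 40), (21, 50)]
isolated = find_singletons(pairs)
print(isolated)
```

Here (1, 10) and (2, 11) support each other, while (20, 40) and (21, 50) are too far apart in their second residue to count as neighbors.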
-
slice_map(l_factor, seq_len=None, inplace=False)
Slice the contact map using an L * factor
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
Raises: ValueError – Either seq_len must be provided or sequence defined
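Slicing by L * factor means keeping a number of contacts proportional to the sequence length. The sketch below assumes the contacts are already sorted with the best-scoring ones first and that the cut-off is truncated to an integer; both points are assumptions, and slice_contacts is a hypothetical helper.

```python
def slice_contacts(contacts, l_factor, seq_len):
    """Keep the first int(seq_len * l_factor) contacts of an already sorted list."""
    ncontacts = int(seq_len * l_factor)  # assumption: truncate rather than round
    return contacts[:ncontacts]


# (res1, res2, raw_score) tuples, sorted by descending score
contacts = [(1, 10, 0.9), (5, 30, 0.8), (7, 20, 0.7), (8, 40, 0.6), (3, 15, 0.5)]
top = slice_contacts(contacts, l_factor=0.5, seq_len=6)  # L/2 = 3 contacts
print(top)
```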
-
sort(kword, reverse=False, inplace=False)
Sort the ContactMap
Returns: The reference to the ContactMap, regardless of inplace
Return type: ContactMap
Raises: ValueError – kword not in ContactMap
-
to_string()
Return the ContactMap as str
-
top_contact
The first Contact entry
Returns: The first Contact entry in ContactMap
Return type: Contact
-