>>> import conkit.io
>>> conpred = conkit.io.read("toxd/toxd.mat", "ccmpred").top_map
>>> conpred.sequence = conkit.io.read("toxd/toxd.fasta", "fasta").top_sequence
>>> conpred.set_sequence_register()
Note, we’ve chosen the top ContactMap
immediately because this file only contained a single one. We also added the sequence straight away and assigned the sequence register using the set_sequence_register()
function. The latter makes sure that each residue number has the right amino acid assigned.
Similar to the example on plotting a ContactMap
, this is something you will almost certainly do every single time you obtain a contact prediction. Head over to the Contact Map Plotting example in case you haven’t looked at it.
Let’s assume we have created the following conkit.core.contactmap.ContactMap
plot
Looking at this conkit.core.contactmap.ContactMap
tells us that there is most likely a β-strand between residues 13-23 and 33-34. The spread of the contacts also gives us an indication that we are dealing with a monomeric protein structure. However, our contact prediction contains a lot more information that is not visible from this plot.
Assume you have prior knowledge about the protein target, in this case PDB entry 1DTX. Let’s assume that this prior knowledge includes information about a residue, e.g. residue 34. As a result, we might want to look at all contacts related to residue 34 in our conkit.core.contactmap.ContactMap
to see specifically if any relevant contacts were predicted.
In ConKit, we can use the find()
function to extract all contacts containing one or more residues.
>>> residue_34_map = conpred.find([34])
>>> print(residue_34_map)
ContactMap(id="map_1", ncontacts=59)
This print statement in the last line indicates that there a total of 59 contacts are associated with residue 34. Let’s rank them to see the most confidently predicted contact involving residue 34.
We use sort()
for ranking our map. We want to sort it by the raw_score
in reverse order, i.e. largest values at the top of the list. We don’t want to create a new ContactMap
but instead apply the changes to the existing one, thus we use the inplace=True
keyword argument.
>>> residue_34_map.sort("raw_score", reverse=True, inplace=True)
>>> print(residue_34_map[0])
Contact(id="(21, 34)" res1="P" res1_chain="" res1_seq=21 res2="R" res2_chain="" res2_seq=34 raw_score=0.459334760904)
This shows that the most confidently predicted contact is between residues 21 and 34 with a prediction score of 0.46. Considering our conkit.core.contactmap.ContactMap
from the step 2, this information would make sense as we assumed β-sheet in that region of residues.
Note
You can use the conkit-precision
script to achieve the same goal.
Finally, in some cases you either have already existing structural information, such as a crystal structure, or a predicted model and you want to assess how accurate or satisfied your contact prediction is. For this kind of analysis, the precision score is the ideal measure. It compares one set of contacts against a second and calculates the ratio between the matches and the total number of contacts compared.
In ConKit, this functionality is provided via the precision
attribute of each ContactMap
. If we want to use the latter, we need to match()
the two ContactMap
’s first. The following Python code shows the entire process.
>>> pdb = conkit.io.read("toxd/toxd.pdb", "pdb").top_map
>>> conpred.remove_neighbors(inplace=True)
>>> conpred.sort("raw_score", reverse=True, inplace=True)
>>> conpred.match(pdb, inplace=True)
>>> print(conpred[:59].precision)
0.59322033898305082
The output from this final print
statement tells us that the precision score for the 59 most confidently predicted contacts is 0.59.
We could also ask for the first 30 contacts only or the last 20. Each conkit.core.contactmap.ContactMap
understands Python slices, and thus any fraction can be successfully extracted.
>>> # Calculate the precision scores for various L threshold contacts
>>> for factor in (0.5, 1.0, 1.5):
... ncontacts = int(conpred.sequence.seq_len * factor)
... precision = conpred[:ncontacts].precision
... print("Ncontacts %d at factor %.1f ==> precision score %.3f" % (ncontacts, factor, precision))
Ncontacts 29 at factor 0.5 ==> precision score 0.828
Ncontacts 59 at factor 1.0 ==> precision score 0.593
Ncontacts 88 at factor 1.5 ==> precision score 0.432
As you can see, it’s simple to calculate late it for the three different factors 0.5
, 1.0
and 1.5
, i.e. L/2
, L
and 3L/2
.
If you would like to automate this process, or rather visualise the data, you can use the PrecisionEvaluationFigure
class. It accepts a matched ContactMap
and calculates the precision values at different thresholds to produce a plot.
To produce a precision evaluation plot with factors between 0.0 and 5.0 with a stepwise increase of 0.1, use the following command.
>>> conkit.plot.PrecisionEvaluationFigure(conpred, min_cutoff=0.0, max_cutoff=5.0, cutoff_step=0.1)
Note
You can also generate the final plot using the conkit-plot
script.