Core modules for hierarchy construction
Contact(res1_seq, res2_seq, raw_score, distance_bound=(0, 8))
Bases: conkit.core.Entity.Entity
A contact pair template to store all associated information
Examples
>>> from conkit.core import Contact
>>> contact = Contact(1, 25, 1.0)
>>> print(contact)
Contact(id="(1, 25)" res1="A" res1_seq=1 res2="A" res2_seq=25 raw_score=1.0)
Attributes: distance_bound, id, is_false_positive, is_true_positive, lower_bound, raw_score, res1, res2, res1_chain, res2_chain, res1_seq, res2_seq, res1_altseq, res2_altseq, scalar_score, status, upper_bound, weight
Methods: add, copy (generic shallow and deep copying operations), deepcopy, define_false_positive, define_true_positive, remove
distance_bound
    The lower and upper distance boundary values of a contact pair in Ångström [default: 0-8 Å].
is_false_positive
    A boolean status for the contact.
    Parameters:
        is_false_positive : bool
is_true_positive
    A boolean status for the contact.
    Parameters:
        is_true_positive : bool
lower_bound
    The lower distance boundary value.
raw_score
    The prediction score for the contact pair.
res1
    The amino acid of residue 1 [default: X].
res1_altseq
    The alternative residue sequence number of residue 1.
res1_chain
    The chain for residue 1.
res1_seq
    The residue sequence number of residue 1.
res2
    The amino acid of residue 2 [default: X].
res2_altseq
    The alternative residue sequence number of residue 2.
res2_chain
    The chain for residue 2.
res2_seq
    The residue sequence number of residue 2.
status
    An indication of the residue status, i.e. true positive, false positive, or unknown.
upper_bound
    The upper distance boundary value.
weight
    A separate internal weight factor for the contact pair.
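How `status` relates to `distance_bound` is not spelled out in this reference. One plausible reading, offered only as a sketch, is that a contact counts as a true positive when its distance in a reference structure falls within the bounds. The function `classify_contact`, the status strings, and the inclusive bounds are all assumptions for illustration and are not part of ConKit.

```python
def classify_contact(distance, distance_bound=(0, 8)):
    """Label a contact from an observed distance in Ångström.

    Hypothetical helper: the labels and the inclusive comparison are
    assumptions, not ConKit behaviour.
    """
    if distance is None:
        # No reference distance available, so nothing can be decided.
        return "unknown"
    lower, upper = distance_bound
    if lower <= distance <= upper:
        return "true positive"
    return "false positive"


labels = [classify_contact(d) for d in (6.5, 12.0, None)]
print(labels)
```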
ContactMap(id)
Bases: conkit.core.Entity.Entity
A contact map object representing a single prediction
The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.
Examples
>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)
Attributes: coverage, id, ncontacts, precision, repr_sequence, repr_sequence_altloc, sequence, top_contact
Methods: add, assign_sequence_register, calculate_jaccard_index, calculate_scalar_score, copy (generic shallow and deep copying operations), deepcopy, find, match, plot_map, remove, remove_neighbors, rescale, sort
assign_sequence_register(altloc=False)
    Assign the amino acids from Sequence to all Contact instances.
    Parameters:
        altloc : bool
calculate_jaccard_index(other)
    Calculate the Jaccard index between two ContactMap instances.
    This score analyzes the difference of the predicted contacts from two maps,
    \[J_{x,y} = \frac{\left|x \cap y\right|}{\left|x \cup y\right|}\]
    where \(x\) and \(y\) are the sets of predicted contacts from two different predictors, \(\left|x \cap y\right|\) is the number of elements in the intersection of \(x\) and \(y\), and \(\left|x \cup y\right|\) is the number of elements in the union of \(x\) and \(y\).
    The J-score has values in the range \([0, 1]\), with a value of \(1\) corresponding to identical contact maps and \(0\) to maps that share no contacts.
    Parameters:
        other
    Returns:
        float
    Warning
        The Jaccard index ranges over \([0, 1]\), where \(1\) means the maps contain identical contact pairs.
    Notes
        The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to \(1 - J_{x,y}\).
    [1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics. doi: 10.1093/bib/bbw106.
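The Jaccard index described above, the size of the intersection divided by the size of the union, can be sketched in plain Python with sets of residue pairs. This is an illustration only, not the ConKit implementation. The function name `jaccard_index`, the treatment of `(i, j)` and `(j, i)` as the same contact, and the value returned for two empty maps are assumptions.

```python
def jaccard_index(contacts_x, contacts_y):
    """Return |x ∩ y| / |x ∪ y| for two collections of residue pairs."""
    # Normalise each pair so that (1, 10) and (10, 1) count as one contact.
    x = {tuple(sorted(pair)) for pair in contacts_x}
    y = {tuple(sorted(pair)) for pair in contacts_y}
    union = x | y
    if not union:
        # Assumption: two empty maps are treated as identical.
        return 1.0
    return len(x & y) / len(union)


map_a = [(1, 10), (5, 30), (7, 40)]
map_b = [(10, 1), (5, 30), (8, 50)]
index = jaccard_index(map_a, map_b)  # 2 shared contacts out of 4 distinct ones
distance = 1.0 - index               # the Jaccard distance from the Notes
print(index, distance)
```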
calculate_scalar_score()
    Calculate a scaled score for the ContactMap.
    This score is a scaled score for all raw scores in a contact map. It is defined by the formula
    \[{x}' = \frac{x}{\overline{d}}\]
    where \(x\) corresponds to the raw score of each predicted contact and \(\overline{d}\) to the mean of all raw scores.
    The score is saved in a separate Contact attribute called scalar_score.
    This score is described in more detail in [2].
    [2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
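Reading the symbol definitions above as "each raw score divided by the mean of all raw scores", the scaling can be sketched as follows. That reading is an assumption, and `scalar_scores` is a hypothetical standalone function rather than the ConKit method.

```python
def scalar_scores(raw_scores):
    """Divide every raw score by the mean of all raw scores."""
    if not raw_scores:
        raise ValueError("at least one raw score is required")
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        raise ValueError("the mean raw score must be non-zero")
    return [score / mean for score in raw_scores]


raw = [0.25, 0.5, 0.75]
scaled = scalar_scores(raw)
print(scaled)
```

With this scaling a value above 1 marks a contact that scores better than the average contact in the same map.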
coverage
    The sequence coverage score.
    The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs, \(x_{cov}\), by the number of residues in the sequence, \(L\):
    \[Coverage = \frac{x_{cov}}{L}\]
    Returns:
        cov : float
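A small sketch of the coverage calculation, assuming that \(x_{cov}\) counts the distinct residues that take part in at least one predicted contact. The function `coverage` and its arguments are illustrative names, not the ConKit property.

```python
def coverage(contacts, sequence_length):
    """Fraction of residues that appear in at least one contact pair."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # Collect every residue number that occurs in any pair, without repeats.
    covered = {residue for pair in contacts for residue in pair}
    return len(covered) / sequence_length


contacts = [(1, 10), (5, 30), (10, 30)]
cov = coverage(contacts, 40)  # residues 1, 5, 10 and 30 are covered
print(cov)
```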
find(indexes, altloc=False)
    Find all contacts associated with the given indexes.
    Parameters:
        indexes : list, tuple
        altloc : bool
match(other, remove_unmatched=False, renumber=False, inplace=False)
    Modify both hierarchies so residue numbers match one another.
    This function is key when plotting contact maps or visualising contact maps in 3-dimensional space, in particular when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.
    Parameters:
        other
        remove_unmatched : bool, optional
        renumber : bool, optional
        inplace : bool, optional
    Returns:
        hierarchy_mod
    Raises:
        ValueError
ncontacts
    The number of Contact instances in the ContactMap.
    Returns:
        ncontacts : int
plot_map(*args, **kwargs)
    Produce a 2D contact map plot.
    Warning
        This function has been deprecated. Please use conkit.plot.ContactMapFigure instead.
precision
    The precision (Positive Predictive Value) score.
    The precision value is calculated by analysing the true and false positive contacts.
    The status of each contact, i.e. true or false positive, can be determined by running the match() function with a reference structure.
    Returns:
        ppv : float
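Precision in the usual sense is the number of true positives divided by the number of true plus false positives. The sketch below applies that definition to a list of status labels. The label strings, the choice to ignore contacts of unknown status, and the value returned when nothing has been classified are assumptions for illustration.

```python
def precision(statuses):
    """Positive predictive value: TP / (TP + FP)."""
    true_pos = sum(1 for status in statuses if status == "TP")
    false_pos = sum(1 for status in statuses if status == "FP")
    classified = true_pos + false_pos
    if classified == 0:
        # Assumption: report 0.0 when no contact has a known status.
        return 0.0
    return true_pos / classified


statuses = ["TP", "TP", "FP", "unknown"]
ppv = precision(statuses)  # two of the three classified contacts are correct
print(ppv)
```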
remove_neighbors(min_distance=5, inplace=False)
    Remove contacts between neighboring residues.
    Parameters:
        min_distance : int, optional
        inplace : bool, optional
    Returns:
        contact_map
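Filtering out contacts between sequence neighbours can be sketched as a comparison on the sequence separation of each pair. Whether ConKit drops pairs whose separation equals `min_distance` is not stated here, so this sketch assumes that pairs with a separation strictly below `min_distance` are removed. `filter_neighbors` is a hypothetical standalone function.

```python
def filter_neighbors(contacts, min_distance=5):
    """Keep only pairs whose residues are at least min_distance apart."""
    # Assumption: a separation equal to min_distance is kept.
    return [(i, j) for i, j in contacts if abs(i - j) >= min_distance]


contacts = [(1, 3), (1, 6), (10, 12), (5, 30)]
long_range = filter_neighbors(contacts)
print(long_range)
```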
repr_sequence
    The representative Sequence associated with the ContactMap.
    The peptide sequence constructed from the available contacts using the normal res_seq positions.
    Returns:
        sequence
    Raises:
        TypeError
repr_sequence_altloc
    The representative altloc Sequence associated with the ContactMap.
    The peptide sequence constructed from the available contacts using the altloc res_seq positions.
    Returns:
        sequence
    Raises:
        ValueError
rescale(inplace=False)
    Rescale the raw scores in ContactMap.
    Rescaling of the data is done to normalize the raw scores to be in the range [0, 1]. The formula to rescale the data is:
    \[{x}' = \frac{x - \min(d)}{\max(d) - \min(d)}\]
    where \(x\) is the original value and \(d\) are all values to be rescaled.
    Parameters:
        inplace : bool, optional
    Returns:
        contact_map
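Normalising values into [0, 1] from an original value \(x\) and the full set \(d\) is commonly done with min-max scaling, and the sketch below assumes that is what is meant. The function `rescale_scores` and its handling of the case where all scores are equal are illustrative choices, not ConKit behaviour.

```python
def rescale_scores(raw_scores):
    """Min-max scale scores so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(raw_scores), max(raw_scores)
    span = highest - lowest
    if span == 0:
        # Assumption: identical scores all map to 0.0 to avoid division by zero.
        return [0.0 for _ in raw_scores]
    return [(score - lowest) / span for score in raw_scores]


raw = [2.0, 4.0, 6.0]
rescaled = rescale_scores(raw)
print(rescaled)
```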
sequence
    The Sequence associated with the ContactMap.
    Returns:
        Sequence
sort(kword, reverse=False, inplace=False)
    Sort the ContactMap.
    Parameters:
        kword : str
        reverse : bool, optional
        inplace : bool, optional
    Returns:
        contact_map
    Raises:
        ValueError
top_contact
    The first Contact entry in ContactMap.
    Returns:
        top_contact
ContactFile(id)
Bases: conkit.core.Entity.Entity
A contact file object representing a single prediction file
The contact file class represents a data structure to hold all predictions with a single contact map file. It contains functions to store, manipulate and organise contact maps.
Examples
>>> from conkit.core import ContactMap, ContactFile
>>> contact_file = ContactFile("example")
>>> contact_file.add(ContactMap("foo"))
>>> contact_file.add(ContactMap("bar"))
>>> print(contact_file)
ContactFile(id="example" nseqs=2)
Attributes: author, method, remark, target, top_map
Methods: add, copy (generic shallow and deep copying operations), deepcopy, remove, sort
author
    The author of the ContactFile.
method
    The ContactFile-specific method.
remark
    The ContactFile-specific remarks.
sort(kword, reverse=False, inplace=False)
    Sort the ContactFile.
    Parameters:
        kword : str
        reverse : bool, optional
        inplace : bool, optional
    Returns:
        contact_map
    Raises:
        ValueError
target
    The target name.
top_map
    The first ContactMap entry in ContactFile.
    Returns:
        top_map
Sequence(id, seq)
Bases: conkit.core.Entity.Entity
A sequence template to store all associated information
Examples
>>> from conkit.core import Sequence
>>> sequence_entry = Sequence("example", "ABCDEF")
>>> print(sequence_entry)
Sequence(id="example" seq="ABCDEF" seqlen=6)
Attributes: id, remark, seq, seq_len
Methods: add, align_global, align_local, copy (generic shallow and deep copying operations), deepcopy, remove
align_global(other, id_chars=2, nonid_chars=1, gap_open_pen=-0.5, gap_ext_pen=-0.1, inplace=False)
    Generate a global alignment between two Sequence instances.
    Parameters:
        other
        id_chars : int, optional
        nonid_chars : int, optional
        gap_open_pen : float, optional
        gap_ext_pen : float, optional
        inplace : bool, optional
align_local(other, id_chars=2, nonid_chars=1, gap_open_pen=-0.5, gap_ext_pen=-0.1, inplace=False)
    Generate a local alignment between two Sequence instances.
    Parameters:
        other
        id_chars : int, optional
        nonid_chars : int, optional
        gap_open_pen : float, optional
        gap_ext_pen : float, optional
        inplace : bool, optional
seq
    The protein sequence as str.
seq_len
    The protein sequence length.
SequenceFile(id)
Bases: conkit.core.Entity.Entity
A sequence file object representing a single sequence file
The SequenceFile class represents a data structure to hold Sequence instances in a single sequence file. It contains functions to store and analyze sequences.
Examples
>>> from conkit.core import Sequence, SequenceFile
>>> sequence_file = SequenceFile("example")
>>> sequence_file.add(Sequence("foo", "ABCDEF"))
>>> sequence_file.add(Sequence("bar", "ZYXWVU"))
>>> print(sequence_file)
SequenceFile(id="example" nseqs=2)
Attributes: id, is_alignment, nseqs, remark, status, top_sequence
Methods: add, calculate_freq, calculate_meff, copy (generic shallow and deep copying operations), deepcopy, remove, sort, trim
calculate_freq()
    Calculate the gap frequency in each alignment column.
    This function calculates the frequency of gaps at each position in the Multiple Sequence Alignment.
    Returns:
        list
    Raises:
        MemoryError
        RuntimeError
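The per-column gap frequency can be illustrated on a toy alignment held as a list of equal-length strings. The function `gap_frequencies` and the use of `-` as the gap character are assumptions for this sketch, which does not use the ConKit classes.

```python
def gap_frequencies(alignment, gap="-"):
    """Return the fraction of gap characters in each alignment column."""
    if not alignment:
        raise ValueError("the alignment must contain at least one sequence")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    nseqs = len(alignment)
    # zip(*alignment) walks the alignment column by column.
    return [column.count(gap) / nseqs for column in zip(*alignment)]


alignment = ["AC-E", "A--E", "ACDE", "-CDE"]
frequencies = gap_frequencies(alignment)
print(frequencies)
```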
calculate_meff(identity=0.7)
    Calculate the number of effective sequences.
    This function calculates the number of effective sequences (Meff) in the Multiple Sequence Alignment.
    The mathematical function used to calculate Meff is
    Parameters:
        identity : float, optional
    Returns:
        int
    Raises:
        MemoryError
        RuntimeError
        ValueError
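The formula itself is not reproduced in this text. A commonly used definition, assumed here, weights each sequence by the reciprocal of the number of sequences (itself included) that share at least `identity` fractional identity with it, and sums those weights. The sketch below follows that definition on plain strings. `effective_sequences` is a hypothetical function, it returns a float rather than the int listed above, and it may differ from the ConKit implementation in detail.

```python
def effective_sequences(alignment, identity=0.7):
    """Sum of 1 / (number of neighbours at or above the identity cutoff)."""
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must lie between 0 and 1")
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    length = len(alignment[0])
    if length == 0:
        raise ValueError("sequences must not be empty")
    meff = 0.0
    for seq_i in alignment:
        neighbours = 0
        for seq_j in alignment:
            # Fraction of aligned positions carrying the same character.
            matches = sum(a == b for a, b in zip(seq_i, seq_j))
            if matches / length >= identity:
                neighbours += 1
        # Every sequence is its own neighbour, so neighbours >= 1.
        meff += 1.0 / neighbours
    return meff


alignment = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
meff = effective_sequences(alignment, identity=0.7)
print(meff)
```

In the toy alignment the first two sequences are 90% identical and share their weight, while the third stands alone, so three sequences count as two effective ones.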
is_alignment
    A boolean status for the alignment.
    Returns:
        bool
nseqs
    The number of Sequence instances in the SequenceFile.
    Returns:
        int
remark
    The SequenceFile-specific remarks.
sort(kword, reverse=False, inplace=False)
    Sort the SequenceFile.
    Parameters:
        kword : str
        reverse : bool, optional
        inplace : bool, optional
    Raises:
        ValueError
status
    An indication of the residue status, i.e. true positive, false positive, or unknown.
top_sequence
    The first Sequence entry in SequenceFile.
trim(start, end, inplace=False)
    Trim the SequenceFile.
    Parameters:
        start : int
        end : int
        inplace : bool, optional