hpotk.algorithm.similarity package
- hpotk.algorithm.similarity.calculate_ic_for_annotated_items(items: AnnotatedItemContainer, ontology: MinimalOntology, base: float | None = None, module_root: TermId | None = None, use_pseudocount: bool = False) AnnotationIcContainer [source]
Calculate information content (IC) for each TermId based on a collection of annotated items.
The calculation can be restricted to an ontology module: only the descendants of the provided module_root are included in the analysis. If use_pseudocount is True, the count of every ontology/module term is set to at least 1, even for terms that do not annotate any item.
- Parameters:
items – a collection of world items (e.g. diseases).
ontology – ontology with concepts used to annotate the items (e.g. Human Phenotype Ontology for diseases).
base – information content base or None for e (produces IC in nats)
module_root – the root of the ontology module to calculate the IC for.
use_pseudocount – assume that each ontology term annotates at least one item.
- Returns:
a container with mappings from TermId to information content in nats, bits, or another unit, depending on the base value.
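The idea behind the calculation can be sketched without the library. The block below is a minimal, self-contained illustration and not hpotk's implementation: the toy ontology, the item annotations, the helper names (ancestors, toy_ic), and the choice of the number of items as the denominator are all assumptions made for the example. It propagates each annotation to its ancestors, counts the items per term, and computes IC as the negative logarithm of the annotation frequency, with an optional pseudocount.

```python
import math
from collections import Counter

# Hypothetical toy ontology as child -> parents (not real HPO term IDs).
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
    "T:unused": ["T:b"],
}

# Hypothetical annotated items (e.g. diseases) and their direct annotations.
ITEMS = {
    "d1": ["T:a1"],
    "d2": ["T:a"],
    "d3": ["T:b"],
    "d4": ["T:a1", "T:b"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def toy_ic(items, base=None, use_pseudocount=False):
    """Compute IC = -log(count / n_items) for each annotated term."""
    counts = Counter()
    for annotations in items.values():
        implied = set()
        for term in annotations:
            implied |= ancestors(term)  # an annotation implies its ancestors
        counts.update(implied)          # each item is counted once per term
    if use_pseudocount:
        for term in PARENTS:
            counts[term] = max(counts[term], 1)
    n_items = len(items)
    ic = {}
    for term, count in counts.items():
        p = count / n_items
        ic[term] = -(math.log(p) if base is None else math.log(p, base))
    return ic


ic_bits = toy_ic(ITEMS, base=2)
ic_bits_pseudo = toy_ic(ITEMS, base=2, use_pseudocount=True)
```

In this sketch a term that annotates no item is absent from the result unless the pseudocount is enabled, in which case it receives the highest IC of the toy data set.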
- class hpotk.algorithm.similarity.AnnotationIcContainer[source]
Bases: Mapping[TermId, float], MetadataAware
A container for storing information content of item annotations.
- class hpotk.algorithm.similarity.SimilarityContainer(metadata: Mapping[str, str] | None = None)[source]
Bases: MetadataAware, Sized
A container for pre-calculated semantic similarity results.
- get_similarity(a: str, b: str) float [source]
Get similarity of two entries a and b.
- Parameters:
a – an item, e.g. HP:1234567
b – another item, e.g. HP:9876543
- Returns:
a non-negative semantic similarity
- items()[source]
Get a generator of semantic similarities.
Each item is a tuple with three items:
* left item (str)
* right item (str)
* similarity (float)
- property metadata: MutableMapping[str, str]
Get a mapping with entity metadata.
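To make the shape of this interface concrete, here is a small stand-in class that mirrors the documented members (get_similarity, items, metadata, and a size). It is a sketch under stated assumptions, not the library class: ToySimilarityContainer and set_similarity are hypothetical names, and the order-independent lookup and the 0.0 default for unknown pairs are guesses made for illustration only.

```python
class ToySimilarityContainer:
    """Hypothetical stand-in that mimics the documented interface."""

    def __init__(self, metadata=None):
        self._metadata = dict(metadata or {})
        self._data = {}

    @staticmethod
    def _key(a, b):
        # Assumption: similarity is symmetric, so store one ordering only.
        return (a, b) if a <= b else (b, a)

    def set_similarity(self, a, b, value):
        # Hypothetical helper for filling the toy container.
        self._data[self._key(a, b)] = value

    def get_similarity(self, a, b):
        # Assumption: pairs that were not stored have similarity 0.0.
        return self._data.get(self._key(a, b), 0.0)

    def items(self):
        # Yield (left item, right item, similarity) tuples.
        for (a, b), similarity in self._data.items():
            yield a, b, similarity

    @property
    def metadata(self):
        return self._metadata

    def __len__(self):
        return len(self._data)


container = ToySimilarityContainer({"source": "toy example"})
container.set_similarity("HP:9876543", "HP:1234567", 1.5)
for left, right, similarity in container.items():
    print(left, right, similarity)
```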
- hpotk.algorithm.similarity.precalculate_ic_mica_for_hpo_concept_pairs(ic: AnnotationIcContainer, hpo: MinimalOntology) SimilarityContainer [source]
Precalculate Resnik semantic similarity for HPO TermId pairs.
- Parameters:
ic – a mapping for obtaining the information content of a TermId.
hpo – HPO ontology.
- Returns:
a mapping with Resnik similarity for TermId pairs where the similarity \(s>0\).
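Resnik similarity of two terms is the information content of their most informative common ancestor (MICA). The following is a minimal sketch of that idea on a toy ontology, not the library's implementation: the term IDs, the IC values, the helper names (ancestors, toy_resnik_pairs), and the decision to include self-pairs are assumptions made for the example. Pairs with zero similarity are dropped, matching the \(s>0\) condition described above.

```python
from itertools import combinations_with_replacement

# Hypothetical toy ontology as child -> parents (not real HPO term IDs).
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
}

# Hypothetical information content values for the toy terms.
IC = {"T:root": 0.0, "T:a": 0.5, "T:b": 1.0, "T:a1": 1.0}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def toy_resnik_pairs(ic):
    """Map each term pair to the IC of its most informative common ancestor."""
    result = {}
    for a, b in combinations_with_replacement(sorted(ic), 2):
        common = ancestors(a) & ancestors(b)
        similarity = max((ic[t] for t in common if t in ic), default=0.0)
        if similarity > 0:  # keep only pairs with positive similarity
            result[(a, b)] = similarity
    return result


pairs = toy_resnik_pairs(IC)
```

Because the root has an IC of zero, any two terms whose only shared ancestor is the root are left out of the result, which keeps the precalculated table much smaller than the full set of pairs.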