hpotk.algorithm.similarity package

hpotk.algorithm.similarity.calculate_ic_for_annotated_items(items: AnnotatedItemContainer, ontology: MinimalOntology, base: float | None = None, module_root: TermId | None = None, use_pseudocount: bool = False) → AnnotationIcContainer

Calculate information content (IC) for each TermId based on a collection of annotated items.

The calculation can be restricted to an ontology module: only the descendants of the provided module_root are included in the analysis. If use_pseudocount is True, the count of every ontology/module term is set to at least 1, even for terms that do not annotate any of the items.

Parameters:
  • items – a collection of hpotk.annotations.AnnotatedItems

  • ontology – ontology with concepts used to annotate the items

  • base – the logarithm base for the information content, or None for e (which produces IC in nats)

  • module_root – the root of the ontology module to calculate the IC for.

  • use_pseudocount – assume that each ontology term annotates at least one of the items.

Returns:

a container with mappings from TermId to information content, expressed in nats, bits, or another unit depending on the base value
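
The calculation described above can be illustrated with a minimal, self-contained sketch. It does not use hpotk and is not the library's implementation: the toy ontology, the term IDs, and the helper names `ancestors_inclusive` and `toy_ic` are hypothetical, and the sketch assumes that IC is the negative logarithm of the fraction of items annotated with a term or any of its descendants.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology (child -> parents); these are not real HPO terms.
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
    "T:a2": ["T:a"],
}


def ancestors_inclusive(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def toy_ic(items, base=None, use_pseudocount=False):
    """Map each term to log(n / count), where count is the number of items
    annotated with the term or one of its descendants (an assumed definition)."""
    if not items:
        raise ValueError("at least one annotated item is required")
    counts = defaultdict(int)
    for annotations in items:
        # Propagate annotations to ancestors; each item counts once per term.
        implied = set()
        for term in annotations:
            implied |= ancestors_inclusive(term)
        for term in implied:
            counts[term] += 1
    if use_pseudocount:
        # Pretend that every ontology term annotates at least one item.
        for term in PARENTS:
            counts[term] = max(counts[term], 1)
    # base=None means natural logarithm, so the IC is in nats.
    log_base = 1.0 if base is None else math.log(base)
    n = len(items)
    return {t: math.log(n / c) / log_base for t, c in counts.items() if c > 0}


items = [{"T:a1"}, {"T:a2"}, {"T:b"}, {"T:a1", "T:b"}]
ic_bits = toy_ic(items, base=2)
```

In this sketch the root term annotates every item and therefore has an IC of 0, while a term seen in only one of the four items has an IC of 2 bits. Without a pseudocount, terms that annotate no item are simply absent from the result.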

class hpotk.algorithm.similarity.AnnotationIcContainer

Bases: Mapping[TermId, float], MetadataAware

A container for storing information content of item annotations.

to_csv(fh: str | IO)

Store the term ID to IC mapping with metadata into a CSV file.

Parameters:
  • fh – where to write the CSV data (a str or an IO object, per the signature)

class hpotk.algorithm.similarity.SimpleAnnotationIcContainer(data: Mapping[TermId, float], metadata: Mapping[str, str] | None = None)

Bases: AnnotationIcContainer

An implementation of an AnnotationIcContainer that is backed by a dict.

property metadata: MutableMapping[str, str]

Get a mapping with entity metadata.

class hpotk.algorithm.similarity.SimilarityContainer(metadata: Mapping[str, str] | None = None)

Bases: MetadataAware, Sized

A container for pre-calculated semantic similarity results.

get_similarity(a: str, b: str) → float

Get similarity of two entries a and b.

Parameters:
  • a – an item, e.g. HP:1234567

  • b – another item, e.g. HP:9876543

Returns:

a non-negative semantic similarity

set_similarity(a: str, b: str, sim: float)

Set semantic similarity for items a and b.

Parameters:
  • a – an item, e.g. HP:1234567

  • b – another item, e.g. HP:9876543

  • sim – a non-negative semantic similarity

items()

Get a generator of semantic similarities.

Each item is a tuple with three elements:
  • left item (str)

  • right item (str)

  • similarity (float)
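
To make the get/set/items contract concrete, here is a small stand-in written from scratch. `ToySimilarityStore` is a hypothetical class, not hpotk's SimilarityContainer, and it rests on two assumptions that this page does not confirm: that the similarity of (a, b) equals that of (b, a), and that a pair that was never stored has a similarity of 0.

```python
class ToySimilarityStore:
    """Hypothetical stand-in for a container of pairwise similarities."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(a, b):
        # Assumption: similarity is symmetric, so each pair is stored once
        # under a key with the two items in sorted order.
        return (a, b) if a <= b else (b, a)

    def set_similarity(self, a, b, sim):
        if sim < 0:
            raise ValueError("similarity must be non-negative")
        self._data[self._key(a, b)] = sim

    def get_similarity(self, a, b):
        # Assumption: pairs that were never stored have similarity 0.
        return self._data.get(self._key(a, b), 0.0)

    def items(self):
        # Yield (left item, right item, similarity) tuples.
        for (left, right), sim in self._data.items():
            yield left, right, sim

    def __len__(self):
        return len(self._data)


store = ToySimilarityStore()
store.set_similarity("HP:9876543", "HP:1234567", 1.5)
triples = list(store.items())
```

Storing each pair under a sorted key keeps the store half the size of a full matrix, at the cost of one comparison per lookup.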

property metadata: Mapping[str, str]

Get a mapping with entity metadata.

to_csv(fh: str | IO)

static from_csv(fh: str | IO)

hpotk.algorithm.similarity.precalculate_ic_mica_for_hpo_concept_pairs(ic: AnnotationIcContainer, hpo: MinimalOntology) → SimilarityContainer

Precalculate Resnik semantic similarity for HPO TermId pairs.

Parameters:
  • ic – a mapping for obtaining the information content of a TermId.

  • hpo – HPO ontology.

Returns:

a mapping with Resnik similarity for TermId pairs where the similarity \(s>0\).
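
For readers unfamiliar with the measure: the Resnik similarity of two terms is the information content of their most informative common ancestor (MICA). The sketch below shows that idea on toy data without using hpotk. The ontology, the IC values, and the names `resnik` and `precalculate` are hypothetical, and the sketch makes its own choices that may differ from the library: it scores only pairs of distinct terms and counts a term as its own ancestor.

```python
from itertools import combinations

# Hypothetical toy ontology (child -> parents) and IC values; not real HPO data.
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
    "T:a2": ["T:a"],
}
IC = {"T:root": 0.0, "T:a": 0.4, "T:b": 1.0, "T:a1": 1.0, "T:a2": 2.0}


def ancestors_inclusive(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    common = ancestors_inclusive(a) & ancestors_inclusive(b)
    return max((IC.get(term, 0.0) for term in common), default=0.0)


def precalculate(terms):
    """Score all pairs of distinct terms, keeping only similarities > 0."""
    sims = {}
    for a, b in combinations(sorted(terms), 2):
        s = resnik(a, b)
        if s > 0:
            sims[(a, b)] = s
    return sims


sims = precalculate(PARENTS)
```

Here the siblings T:a1 and T:a2 share the ancestor T:a, so their similarity is its IC of 0.4. T:a1 and T:b share only the root, whose IC is 0, so that pair is not stored.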