HPO annotations

The HPO project offers phenotype annotations that describe the connection between diseases and HPO terms. The annotations are available for download from the HPO annotation release site in tabular format.
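To give a feel for the tabular format, here is a minimal sketch that groups HPO term identifiers by disease using only the standard library. The column names (`database_id`, `disease_name`, `hpo_id`) and the `#version` header line are assumptions about the file layout, only a few columns are kept, and the `EXAMPLE:0000001` row is made up for illustration. The helper `group_by_disease` is hypothetical and not part of HPO toolkit.

```python
import csv
from collections import defaultdict

# Miniature, tab-separated stand-in for an annotation file.
# Column names and the '#version' line are assumptions; the second
# disease is invented purely for illustration.
SAMPLE = (
    "#version: 2023-10-09\n"
    "database_id\tdisease_name\thpo_id\n"
    "OMIM:256000\tLeigh syndrome\tHP:0000007\n"
    "OMIM:256000\tLeigh syndrome\tHP:0001427\n"
    "EXAMPLE:0000001\tMade-up disease\tHP:0000007\n"
)


def group_by_disease(text):
    """Map each disease identifier to the list of HPO term ids on its rows."""
    # Drop blank lines and '#'-prefixed comment/header lines before parsing.
    data_lines = [
        line for line in text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.DictReader(data_lines, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["database_id"]].append(row["hpo_id"])
    return dict(grouped)


annotations_by_disease = group_by_disease(SAMPLE)
print(annotations_by_disease["OMIM:256000"])
```

In practice the loader described below does this work, along with quality checks against the ontology, so a hand-rolled parser like this one is only useful for a quick look at the file.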

Load HPO annotations

HPO toolkit can parse and work with the disease models. Here we show how to load the HPO annotation file.

The loader uses HPO to quality-check the annotations, so we must load HPO first. Ideally, the HPO version matches the version of the annotations, although a more recent HPO release should generally work as well.

>>> import hpotk

>>> base_url = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/'
>>> hpo = hpotk.load_minimal_ontology(base_url + 'hp.json')

Now we can load the annotation file.

>>> from hpotk.annotations.load.hpoa import SimpleHpoaDiseaseLoader

>>> loader = SimpleHpoaDiseaseLoader(hpo)
>>> diseases = loader.load(base_url + 'phenotype.hpoa')
>>> diseases.version
>>> len(diseases)
12468

We loaded diseases, an instance of hpotk.annotations.HpoDiseases with 12468 disease models.

Now, we can iterate over all diseases:

>>> sum(1 for disease in diseases)
12468

or we can get a disease for a given identifier:

>>> disease = diseases['OMIM:256000']
>>> disease.name
'Leigh syndrome'

The identifier can be a CURIE str (above) or a hpotk.model.TermId:

>>> disease = diseases[hpotk.TermId.from_curie('OMIM:256000')]
>>> disease.name
'Leigh syndrome'
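One way to picture a container that accepts both key types is to normalize a CURIE str into a term identifier before the lookup. The sketch below uses hypothetical stand-ins (`MiniTermId`, `MiniDiseases`) and is not how HPO toolkit implements hpotk.TermId or HpoDiseases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniTermId:
    """Hypothetical stand-in for a term identifier: a prefix and a local id."""
    prefix: str
    id: str

    @classmethod
    def from_curie(cls, curie):
        # Split 'OMIM:256000' into ('OMIM', '256000') at the first colon.
        prefix, sep, local_id = curie.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"Not a CURIE: {curie!r}")
        return cls(prefix, local_id)

    def __str__(self):
        return f"{self.prefix}:{self.id}"


class MiniDiseases:
    """Hypothetical container keyed by term id that also accepts CURIE strings."""

    def __init__(self, names_by_curie):
        self._data = {
            MiniTermId.from_curie(curie): name
            for curie, name in names_by_curie.items()
        }

    def __getitem__(self, key):
        # Normalize str keys so both key types hit the same dict entry.
        if isinstance(key, str):
            key = MiniTermId.from_curie(key)
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data.values())


diseases_sketch = MiniDiseases({"OMIM:256000": "Leigh syndrome"})
print(diseases_sketch["OMIM:256000"])
print(diseases_sketch[MiniTermId.from_curie("OMIM:256000")])
```

Because the frozen dataclass is hashable and compares by value, two identifiers parsed from the same CURIE address the same entry.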

Disease model

HPO toolkit provides hpotk.annotations.HpoDisease to model the disease data. HpoDisease is a simple data class that offers little functionality beyond providing the data. Let’s check out the available attributes.

We can access the identifier and name of the disease:

>>> str(disease.identifier)
'OMIM:256000'
>>> disease.name
'Leigh syndrome'

We can access the phenotype annotations of the disease. For Leigh syndrome, there are 30 annotations:

>>> len(disease.annotations)
30

Let’s examine the first annotation in greater detail:

>>> a = next(iter(disease.annotations))
>>> str(a.identifier)
>>> hpo.get_term_name(a)

See also

See hpotk.annotations.HpoDiseaseAnnotation for more details on the phenotype annotations.

We can also access the modes of inheritance:

>>> for moi in sorted(disease.modes_of_inheritance):
...   print(moi, hpo.get_term_name(moi))
HP:0000007 Autosomal recessive inheritance
HP:0001427 Mitochondrial inheritance
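The attributes shown in this section can be combined into a short textual summary. The sketch below relies only on `identifier`, `name`, `annotations`, and `modes_of_inheritance`; the helper `summarize_disease` is hypothetical, and the example runs on a stand-in object that mimics those attribute names rather than on a real HpoDisease.

```python
from types import SimpleNamespace


def summarize_disease(disease, term_names):
    """Build a one-line summary from the attributes used in this section.

    `term_names` is a plain dict from CURIE to label. With a loaded
    ontology, `hpo.get_term_name` could play this role instead.
    """
    moi_ids = sorted(str(moi) for moi in disease.modes_of_inheritance)
    # Fall back to the raw identifier if no label is known.
    labels = [term_names.get(moi_id, moi_id) for moi_id in moi_ids]
    return (
        f"{disease.identifier} {disease.name}: "
        f"{len(disease.annotations)} annotations; "
        f"inheritance: {', '.join(labels)}"
    )


# Stand-in that mimics the attribute names only; not an hpotk class.
leigh = SimpleNamespace(
    identifier="OMIM:256000",
    name="Leigh syndrome",
    annotations=[object()] * 30,  # placeholders for the 30 annotations
    modes_of_inheritance={"HP:0001427", "HP:0000007"},
)

labels = {
    "HP:0000007": "Autosomal recessive inheritance",
    "HP:0001427": "Mitochondrial inheritance",
}
print(summarize_disease(leigh, labels))
```

Sorting the identifiers as strings keeps the output stable, since the modes of inheritance are held in an unordered collection in this sketch.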