I belong to the Comparative genomics group headed by Chris Ponting.

My current research interests include:

  • Comparative genomics within and between clades.
  • Evolutionary rate estimation.
  • Visualization of genomic data.

Scientific career

1992/1996  Student of Biotechnology at the Technical University of Braunschweig, Germany
1996/1997  Trainee in Haruki Nakamura's group at the Biomolecular Engineering Research Institute (BERI), Osaka
1998/2002  PhD at the EBI in Hinxton, Cambridge
2002/2004  PostDoc in Liisa Holm's Bioinformatics group at the University of Helsinki, Finland
2004/present  Investigator scientist in the Comparative genomics group at the MRC Functional Genetics Unit, Oxford, UK


I mostly program in Python, C++ and Perl. I build pipelines using shell scripts, Unix tools and makefiles. I have built a few web servers using Zope. The software I have written is open source and I am happy to assist with requests. The following software is available for download as is:
RADAR (C): A program to find repeats in protein sequences.
ADDA (C++): An algorithm to find domains in protein sequences.
alignlib (C++): A C++ library with Python bindings for classic sequence alignment.


I have been involved in the following projects:
OPTIC: Orthologous and paralogous transcripts in clades.
Platypus genome analysis: Assisting Chris Ponting with gene prediction and orthology analysis.
Monodelphis genome analysis: Assisting Chris Ponting and Leo Goodstadt with gene prediction and analysis.
Comparative genomics of fruit flies: Gene prediction, orthology and rate analysis in 12 fruit flies. See also the AAA Wiki.
ADDA: An algorithm to find domains in protein sequences (and a database).
PairsDB: A database of all-vs-all pre-computed BLAST and PSIBLAST alignments.
Global Trace Graph: A method for fold prediction.
RADAR: A program to find repeats in protein sequences.
RSDB: A database of non-redundant protein sequences at different levels of similarity.


Mikkelsen, TS., Wakefield, MJ., Aken, B. et al. (2007). Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447, 167-177.

Heger, A., Mallick, S., Wilton, C. et al. (2007). The global trace graph, a novel paradigm for searching protein sequence databases. Bioinformatics 23, 2361-2367.

Goodstadt, L., Heger, A., Webber, C. et al. (2007). An analysis of the gene complement of a marsupial, Monodelphis domestica: evolution of lineage-specific genes and giant chromosomes. Genome Res 17, 969-981.

Heger, A. and Ponting, CP. (2007). Variable strength of translational selection among 12 Drosophila species. Genetics 177, 1337-1348.

Heger, A. and Ponting, CP. (2007). Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res 17, 1837-1849.

Heger, A. and Ponting, CP. (2007). OPTIC: orthologous and paralogous transcripts in clades. Nucleic Acids Res.

Clark, AG., Eisen, MB., Smith, DR. et al. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203-218.

Heger, A., Korpelainen, E., Hupponen, T. et al. (2007). PairsDB atlas of protein sequence space. Nucleic Acids Res.

Heger, A., Wilton, CA., Sivakumar, A. et al. (2005). ADDA: a domain database with global coverage of the protein universe. Nucleic Acids Res 33, D188-91.

Lindblad-Toh, K., Wade, CM., Mikkelsen, TS. et al. (2005). Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438, 803-819.

Heger, A., Lappe, M. and Holm, L. (2004). Accurate detection of very sparse sequence motifs. J Comput Biol 11, 843-857.

Heger, A. and Holm, L. (2003). Exhaustive enumeration of protein domain families. J Mol Biol 328, 749-767.

Heger, A. and Holm, L. (2003). More for less in structural genomics. J Struct Funct Genomics 4, 57-66.

Heger, A. and Holm, L. (2003). Sensitive pattern discovery with 'fuzzy' alignments of distantly related proteins. Bioinformatics 19 Suppl 1, i130-7.

Heger, A. and Holm, L. (2001). Picasso: generating a covering set of protein family profiles. Bioinformatics 17, 272-279.

Dietmann, S., Park, J., Notredame, C. et al. (2001). A fully automatic evolutionary classification of protein folds: Dali domain dictionary version 3. Nucleic Acids Res 29, 55-57.

Park, J., Holm, L., Heger, A. et al. (2000). RSDB: representative protein sequence databases have high information content. Bioinformatics 16, 458-464.

Heger, A. and Holm, L. (2000). Rapid automatic detection and alignment of repeats in protein sequences. Proteins 41, 224-237.

Heger, A. and Holm, L. (2000). Towards a covering set of protein family profiles. Prog Biophys Mol Biol 73, 321-337.

Park, J., Dietmann, S., Heger, A. et al. (2000). Estimating the significance of sequence order in protein secondary structure and prediction. Bioinformatics 16, 978-987.

Heger, A., Higo, J. and Nakamura, H. (1997). Model building study of complex structures using NMR chemical shift change information. Proc. Japan Acad. 73B, 109-113.

Private Matter

I play Ultimate Frisbee in Reading and Oxford for a team called Discuits.
