THE HOMEOBOX PAGE


maintained by Thomas R. Bürglin 

For the latest classification update, see Bürglin, T.R.  (2011). Homeodomain subtypes and functional diversity. Subcell. Biochem., 52, 95-122.

And for plants, see
Mukherjee, K., Brocchieri, L., and Bürglin, T.R. (2009). A comprehensive classification and evolutionary analysis of plant homeobox genes. Mol. Biol. Evol., 26, 2775-2794.

On this page I try to maintain information relevant to homeobox genes (in particular about classification/evolution). Should you find any of this information out of date or wrong, or if you simply want to add some information, please send me an email.

This page and all figures are Copyright by Thomas R. Bürglin
 

What is a homeobox     Structure of the homeodomain    Classification of homeobox genes     Compilations of homeobox genes


What is a homeobox?

"Since their discovery in 1983, homeobox genes, and the proteins they encode, the homeodomain proteins, have turned out to play important roles in the developmental processes of many multicellular organisms. While certainly not the only developmental control genes, they have been shown to play crucial roles from the earliest steps in embryogenesis - such as setting up an anterior-posterior gradient in the egg of the fruit fly Drosophila melanogaster - to the very latest steps in cell differentiation - such as the differentiation of neurons in the nematode Caenorhabditis elegans (C. elegans). They have a wide phylogenetic distribution, having been found in baker's yeast, plants, and all animal phyla that have been examined so far. Since their original discovery, hundreds of homeobox genes have been described" 

"The homeobox was originally described as a conserved DNA motif of about 180 base pairs. The protein domain encoded by the homeobox, the homeodomain, is thus about 60 amino acids long. The first genes found to encode homeodomain proteins were Drosophila developmental control genes, in particular homeotic genes, from which the name "homeo"box was derived. However, many homeobox genes are not homeotic genes; the homeobox is a sequence motif, while "homeotic" is a functional description for genes that cause homeotic transformations."

These excerpts were taken from Bürglin, T.R. (1996) Homeodomain Proteins. In Meyers, R.A. (ed.), Encyclopedia of Molecular Biology and Molecular Medicine, Vol 3., VCH Verlagsgesellschaft mbH, Weinheim, pp. 55-76.

Thus, in summary one can say that the homeodomain is a DNA-binding domain that occurs in proteins that are usually transcription factors. These transcription factors regulate the transcription of other genes and hence very frequently play important roles in development of multicellular organisms.



Some definitions

Hox genes: Hox genes are a subgroup of homeobox genes. In vertebrates these genes are found in gene clusters on the chromosomes. In mammals four such clusters exist, called Hox clusters. The gene name "Hox" has been restricted to name Hox cluster genes in vertebrates. Only genes in the HOX cluster should be named Hox genes. So note: not every homeobox gene is a Hox gene; Hox genes are a subset of homeobox genes.

HOX cluster: The term Hox cluster refers to a group of clustered homeobox genes, named Hox genes in vertebrates, that play important roles in pattern formation along the anterior-posterior body axis. In fact, the first homeobox genes discovered were those of the Drosophila homeotic gene clusters, i.e. the "Antennapedia complex" and the "Bithorax complex", which together are referred to as HOM-C (homeotic complex). HOM-C in Drosophila is the evolutionary homolog of the vertebrate Hox clusters. The evolutionarily related homeobox gene clusters in other animals (e.g. chordates, insects, nematodes) are now also called HOX clusters.

homeodomain: a DNA-binding domain, usually about 60 amino acids in length, encoded by the homeobox.

homeobox: a fragment of DNA of about 180 basepairs (not counting introns), found in homeobox genes.
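
The size relationship in these two definitions (about 180 base pairs of coding sequence for about 60 amino acids) simply reflects three nucleotides per codon. A minimal sketch in Python, assuming an intron-free, in-frame homeobox sequence and the standard genetic code; the function name translate and the test sequence are illustrative only and do not come from any particular gene:

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate an in-frame coding sequence (no introns) into protein."""
    dna = dna.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# A placeholder 180-nucleotide sequence yields a 60-residue peptide.
placeholder_homeobox = "CGC" * 60
print(len(placeholder_homeobox), len(translate(placeholder_homeobox)))
```

If the homeobox of a gene is interrupted by introns, the genomic fragment is longer than 180 base pairs and has to be spliced before a translation like this makes sense.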


 

Structure of the homeodomain

Two different views of the Antennapedia homeodomain bound to DNA: View 1, View 2.
 
Here is a rotating view of the homeodomain bound to DNA: Rotating View of Antennapedia bound to DNA
 

Do you have a homeobox gene?

Use this consensus to see how well your gene matches. The top line is the consensus; the amino acids listed under each position occur with progressively lower frequency. For more divergent homeobox genes, you do not need a match with one of the listed amino acids at every position, but you should have matches at the more conserved positions indicated by the 3 different kinds of symbols. Some infrequently occurring amino acids are not shown. Note: this consensus has been derived from >350 typical (i.e. 60 amino acid long) homeodomain sequences. It does not include highly divergent or atypical homeodomains, in particular the large TALE superclass (see below for more info on TALE).
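
The comparison described above can also be done programmatically. The following is only a rough sketch in Python: the table CONSERVED lists four positions that are widely reported as strongly conserved in typical homeodomains (W48, F49, N51, R53) and is an illustrative stand-in, not the consensus from this page; the function name and the toy sequence are likewise hypothetical.

```python
# Illustrative subset of strongly conserved positions (1-based numbering
# within a typical 60 amino acid homeodomain). This is an assumption for
# the example, NOT the full consensus table shown on this page.
CONSERVED = {48: "W", 49: "F", 51: "N", 53: "R"}

def score_homeodomain(seq, profile=CONSERVED, expected_len=60):
    """Count how many profile positions carry one of the allowed residues.

    Returns (number_of_matches, {position: matched?}).
    """
    seq = seq.upper()
    if len(seq) != expected_len:
        raise ValueError(
            f"expected {expected_len} residues, got {len(seq)}; "
            "atypical homeodomains (e.g. TALE) need a different profile"
        )
    hits = {pos: seq[pos - 1] in allowed for pos, allowed in profile.items()}
    return sum(hits.values()), hits

# Toy 60-residue sequence (not a real protein) with the motif in place.
toy = "A" * 47 + "WFQNRR" + "A" * 7
matches, detail = score_homeodomain(toy)
print(matches, "of", len(CONSERVED), "conserved positions match")
```

To allow several amino acids at one position, as in the consensus, give that position a string of alternatives (for example "WF"). A candidate that fails at most of the highly conserved positions is unlikely to be a typical homeodomain, although divergent and atypical homeodomains will not fit a 60-residue profile like this one.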



Classification of homeobox genes

The most recent overview of homeobox genes and their classification can be found in:

Bürglin, T.R.  (2011). Homeodomain subtypes and functional diversity. Subcell. Biochem., 52, 95-122.

Homeodomain Proteins. Bürglin, T.R. (2005). In Meyers, R.A. (ed.), Encyclopedia of Molecular Cell Biology and Molecular Medicine, Wiley-VCH Verlag GmbH & Co., Weinheim, 179-222.

This is a fairly comprehensive overview. A few families and classes are still missing, and some fine classification still needs to be done (mainly within the Antennapedia superclass), but overall, the classification in this chapter will probably not be subject to major changes.
The following are some of the sources on which this chapter is based:

A Comprehensive Classification of Homeobox Genes. Bürglin, T.R. (1994) in Guidebook to the Homeobox Genes. Duboule, D. (ed.), Oxford University Press, Oxford, pp. 25-71.

The Evolution of Homeobox genes. Bürglin, T.R. (1995) in 'Biodiversity and Evolution', Arai, R., Kato, M., Doi, Y. (eds.), The National Science Museum Foundation, Tokyo, pp. 291-336, ISBN 4-9900433-0-8, The National Science Museum Foundation, 7-20 Ueno Park, Taito-ku, Tokyo 110, Japan.

Analysis of TALE superclass homeobox genes (MEIS, PBC, KNOX, Iroquois, TGIF) reveals a novel domain conserved between plants and animals. Bürglin, T.R. (1997) Nucleic Acids Research, 25, 4173-4180.

The PBC domain contains a MEINOX domain: Coevolution of Hox and TALE homeobox genes? Bürglin, T.R. (1998). Dev. Genes Evol., 208, 113-116.

Key points from that paper:
The PBC domain also shares similarity with the MEINOX domain.  Online version of the paper. 
Loss and gain of domains during evolution of cut superclass homeobox genes.  Bürglin, T.R., and Cassata, G. (2002). Int. J. Dev. Biol., 46, 115-123.
Key points from that paper:
The cut superclass of homeobox genes comprises four classes: Cux (with 3 cut domains and 1 homeodomain), ONECUT (with 1 cut domain and 1 homeodomain), CMP (Compass, with 1 COMPASS domain and 2 homeodomains), and SATB (with 1 COMPASS domain, 2 cut domains, and 1 homeodomain).

The Caenorhabditis elegans Six/sine oculis class homeobox gene ceh-32 is required for head morphogenesis. Dozier, C., Kagoshima, H., Niklaus, G., Cassata, G. and Bürglin, T.R. (2001). Dev. Biol., 236, 289-303.
       Contains a relatively recent phylogenetic tree of Six/so homeobox genes.

Regulation of ectodermal and excretory function by the C. elegans POU homeobox gene ceh-6. Bürglin, T.R., and Ruvkun, G. (2001). Development, 128, 779-790
       Contains a relatively recent phylogenetic tree of the POU homeobox genes.

A cluster of Drosophila homeobox genes involved in mesoderm differentiation programs.
Jagla, K., Bellard, M., and Frasch, M. (2001). Bioessays 23, 125-133.
Review of the Drosophila NK homeobox gene cluster

Dispersal of NK homeobox gene clusters in amphioxus and humans. Luke, G. N., Castro, L. F., McLay, K., Bird, C., Coulson, A., and Holland, P. W. (2003). Proc Natl Acad Sci U S A 100, 5292-5295.
The latest on the NK homeobox gene cluster in chordates.
NOTE: the notion of an EGH/EGHbox homeobox gene cluster that was proposed earlier has been abandoned!

Evolution of homeobox genes: Q50 Paired-like genes founded the Paired class. Galliot, B., de Vargas, C., and Miller, D. (1999). Dev Genes Evol 209, 186-197.
Phylogenetic analysis of paired and paired-like homeobox genes.

Origin of the paired domain. Breitling, R., and Gerber, J. K. (2000). Dev Genes Evol 210, 644-650.
The paired domain is derived from a transposase.

Functions of LIM-homeobox genes.
Hobert, O., and Westphal, H. (2000). Trends Genet 16, 75-83.

Phylogenetic tree of LIM homeobox genes.


Many additional references are found in the 2005 "Homeodomain proteins" book chapter.

Additional information, some a bit dated:

  • Some families in the prd-like class have conserved motifs upstream of the homeodomain (e.g., ceh-10). Svendsen, P. C., and McGhee, J. D. (1995). The C. elegans neuronally expressed homeobox gene ceh-10 is closely related to genes expressed in the vertebrate eye. Development 121, 1253-1262.
  • Brand new class: a sea urchin homologue for C. elegans ceh-19 has been found. Popodi, E., Andrews, M. E., and Raff, R. A. (1995). A sea urchin homologue of ceh-19, an unusual homeobox-containing gene from a nematode. Gene 164, 367-368. E. Popodi suggests ceh19 as class name.
  • Structure of the LIM domain protein CRP determined: Perez-Alvarado et al., 1994. Nature Struct. Biol., 1, 388ff.
  • New class: bsh class (bsh = brain-specific homeobox, Drosophila), due to a C. elegans homologue.
  • Nomenclature change: XlHbox8, mouse ipf-1, rat idx-1 and stf-1 (I guess they must be the same) have undergone a name change, which is approved by the International Mouse Nomenclature Committee. These genes are now all called pdx-1 (C.V.E Wright, pers. comm.). (I hope XlHbox8 is indeed pdx-1, and not a future pdx-2...). The name of this family is still the Xlox family.
  • A novel small conserved domain is found between mouse Sax1 and Drosophila NK-1/S59 upstream of the homeodomain. Smith, S. T., and Jaynes, J. B. (1996). A conserved region of engrailed, shared among all en-, gsc-, Nk1-, Nk2- and msh-class homeoproteins, mediates active transcriptional repression in vivo. Development 122, 3141-3150.
  • It seems more and more likely that the plant (so far mainly Arabidopsis) genes Athb-x/HATx cannot be assigned to any particular animal homeobox gene class. Thus, this group of genes is grouped into its own class, the HD-ZIP class.
  • New class of homeobox genes, resulting from genes sequenced by the C. elegans genome project: Prh class (77% identity vertebrate - C. elegans).
  • A new class, the Bar class, is formed by C. elegans genes and the Drosophila BarH genes (aka OM) (77% identity between C. elegans and Drosophila). The vertebrate gene Barx1, as well as the Xom = Xvent = Xbra = Vox genes have affinities to the Bar genes, although they do not seem to be direct vertebrate orthologues.
  • Xnot class: Xnot, zebrafish floating head, and a Drosophila gene (90Bre) form a new class. This class seems to have affinities to the ems class.
  • The Ptx1 = P-OTX gene (Lamonerie et al., 1996, Genes Dev. 10, 1284; Szeto et al., 1996 PNAS 93, 7706) is an orthologue of the C. elegans gene unc-30 (thus P-Otx is a misnomer), a very divergent member of the prd-like class. They form their own family, Ptx.

Compilations of homeobox gene lists in different species

    Saccharomyces cerevisiae

    Encephalitozoon cuniculi (Microsporidia)

    Under construction:

    Arabidopsis thaliana (plant) homeobox gene list table and supplementary material

    C. elegans homeobox genes


    Notes and other info:

    Nomenclature of vertebrate HOX cluster genes, listed with old names and accession numbers as in SwissProt. Hox gene list.

    Some older overviews and reviews:

    From the 1995 Homeobox Workshop in Monte Verita in Ascona, Switzerland.
    The group photo from the workshop, signed by Ed Lewis.
    Two more pictures of Ed Lewis on the evening of the Nobel Prize announcement (from left to right):
    Meera Berry and Ed Lewis, and Walter Gehring, Ed Lewis and the Manager of Monte Verita.


    Links

  • Search of PUBMED with the keyword "homeobox".
  • For structural views of homeodomain proteins, go to the RCSB PDB Protein Data Bank and search with the term "homeo*", or other terms such as e.g., "POU". You can look at structures on that Web site, or download the structures to view them in programs such as RasMol.
  • Gehring lab home page, with more pictures on homeobox genes.
  • Homeobox proteins, automatic alignments.
  • HOX-pro-Network of Vertebrate Homeobox Genes, including Promoter Regions Analysis Pages.
  • HOX Pro db, Homeobox genes database (much faster link than above)
  • CRP antibody products, have anti-homeobox antibodies.
  • Search for homeobox genes in the worm breeders gazette (WBG).
  • The Hox database



    To Bürglin lab homepage