Moose Lab Webpage
Department of Crop Sciences, University of Illinois

 

Comparative Genomics of Cereal Gene Promoters

 

The comparison of noncoding DNA sequences is a powerful approach to identify regulatory sequences among genes that perform similar functions in different but related species.  The underlying principle of this approach is that sequences important for determining conserved patterns of gene expression evolve more slowly than other noncoding sequences, due to functional constraints on the DNA sequence.  Surveys for conserved noncoding sequences (CNS) have been performed in a variety of organisms, including mammals (e.g. human-mouse comparisons), fruit flies, nematodes, yeast, bacteria, and plants in the Brassicaceae family, which includes the model plant species Arabidopsis.  CNS have been shown to be associated with sequences important for gene regulation, and may function as binding sites for transcription factors as well as determinants of chromatin organization, mRNA stability, or intron splicing.

 

We have surveyed for CNS among genes from the cultivated cereals maize, rice, sorghum, wheat, and barley.  Collectively, the grains from these cereals account for the majority of world caloric intake and are critical to global agriculture.  Each of these species belongs to the grass family, and the cereal genomes possess many of the same genes with slightly different patterns of organization.  Because the evolutionary relationships among the different cereal grass genomes are known, surveys for CNS among the grasses are useful not only for defining sequences important for gene regulation, but also for revealing patterns of gene evolution among the cereals.

Shown above is the output from an alignment of orthologous alcohol dehydrogenase1 (Adh1) genes from maize, sorghum, barley, and rice, using the VISTA alignment tool (http://www-gsd.lbl.gov/vista/). The pink areas indicate conserved blocks of noncoding sequence, and the blue blocks indicate conserved coding sequences.

 

To the right of the global alignment are the sequence alignments of the five CNS conserved across all four species, along with their relative positions within the genes. Below the sequence alignments are the phylogenetic relationships of the four species predicted from the Adh1 CNS, which are consistent with the known evolutionary relationships among these four grasses.

 

One test for the regulatory significance of short CNS identified in maize-rice gene comparisons is to determine whether known functional promoter regulatory elements appear at a higher frequency in CNS than in the entire maize and rice promoter sequence dataset. Such enrichment has been observed in comparisons of orthologous human and mouse genes (Levy et al., 2001). We performed this test by first calculating the frequency of occurrence of all possible heptad (7-bp) sequences in two sets of sequences: the 277.2 kbp of annotated promoter sequence from the 78 maize-rice orthologous gene pairs, and the entire collection of CNS (8.5 kbp) identified in the VISTA comparisons of these same promoter sequences.
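The heptad counting step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in our analysis: the function names (count_kmers, kmer_frequencies, enrichment) and the toy sequences are hypothetical, and the simple frequency ratio shown here is just one possible way to compare the two sequence sets.

```python
from collections import Counter

def count_kmers(sequences, k=7):
    """Count every overlapping k-mer (a heptad when k=7) across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows that contain ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def kmer_frequencies(counts):
    """Convert raw counts to relative frequencies (fraction of all counted windows)."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def enrichment(cns_counts, promoter_counts, kmer):
    """Ratio of a k-mer's frequency in CNS to its frequency in all promoter sequence."""
    cns_freq = kmer_frequencies(cns_counts).get(kmer, 0.0)
    promoter_freq = kmer_frequencies(promoter_counts).get(kmer, 0.0)
    return cns_freq / promoter_freq if promoter_freq else float("nan")

# Toy, made-up sequences for illustration only (not taken from the dataset).
promoters = ["ACGTACGTACGTTTGACCAAGG", "TTTTTTTTTTGGGGGGGGGG"]
cns = ["ACGTACGTAC"]
promoter_counts = count_kmers(promoters)
cns_counts = count_kmers(cns)
ratio = enrichment(cns_counts, promoter_counts, "ACGTACG")
```

A ratio above 1 would suggest that a heptad occurs more often in CNS than in promoter sequence as a whole. With real data, a statistical test would be needed before drawing any conclusion from such a ratio.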

As part of a project funded by the United States Department of Agriculture, Hena Guo, a graduate student in my laboratory, has conducted a survey for CNS among all publicly available genomic sequences from maize, wheat, barley and sorghum genes. All of these comparisons include orthologous sequences from the nearly completed rice genome. Listed below are links to html files and to a downloadable Microsoft Access relational database. Together these list all of the annotated genomic sequences used in our comparisons, along with summary information (number of CNS blocks per kbp of noncoding sequence and the proportion of noncoding sequence defined as CNS) from surveys for CNS among 81 sets of orthologous cereal genes. All of the CNS were defined using the VISTA tool with the criterion of >70% identity in a block of at least 10 bp. If you are interested in the output obtained from any one specific gene comparison, you may repeat the comparisons at the VISTA website (http://www-gsd.lbl.gov/vista/), or we can provide the output files upon request.
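To make the CNS criterion and the two summary statistics concrete, here is a rough Python sketch that scans an existing gapped pairwise alignment with a 10-column sliding window. It is an assumption-laden illustration, not VISTA's own algorithm: the function names (find_conserved_blocks, summarize), the rule for merging overlapping windows, and the toy alignment are all hypothetical.

```python
def find_conserved_blocks(aligned_a, aligned_b, window=10, min_identity=0.70):
    """Return (start, end) alignment-column ranges where a sliding window
    exceeds min_identity. Overlapping or adjacent windows are merged."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aligned_a.upper(), aligned_b.upper()
    # 1 where the column holds an identical, non-gap base; 0 otherwise.
    match = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    blocks = []
    for start in range(len(match) - window + 1):
        identity = sum(match[start:start + window]) / window
        if identity > min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # Extend the previous block instead of starting a new one.
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks

def summarize(blocks, noncoding_length_bp):
    """Compute CNS blocks per kbp and the proportion of sequence called CNS."""
    cns_bp = sum(end - start for start, end in blocks)
    return {
        "blocks_per_kbp": len(blocks) / (noncoding_length_bp / 1000),
        "proportion_cns": cns_bp / noncoding_length_bp,
    }

# Toy, made-up alignment: two conserved ends around a diverged middle.
seq_a = "ACGTACGTAC" + "A" * 20 + "GGCCTTAAGG"
seq_b = "ACGTACGTAC" + "C" * 20 + "GGCCTTAAGG"
blocks = find_conserved_blocks(seq_a, seq_b)
stats = summarize(blocks, noncoding_length_bp=len(seq_a))
```

Because a window passes whenever more than 70% of its columns match, a block found this way can extend a few columns into flanking diverged sequence. Block boundaries from this sketch should therefore not be expected to match those in the VISTA output.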

The results from our initial survey have recently been published in The Plant Cell.

Guo, H. and Moose, S.P. (2003) Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. The Plant Cell 15: 1143-1158.

Click here to view a pdf file of the article.

Link to html file listing annotated cereal genomic sequences: http://www.cropsci.uiuc.edu/faculty/moose/Database.htm

Link to html file summarizing results from VISTA comparisons: http://www.cropsci.uiuc.edu/faculty/moose/VistaResults.htm

Link to download relational database of genomic sequences and summaries of VISTA output from here.

Link to download xls file of heptad sequence occurrences from here.

*Please use Explorer or Netscape (greater than 6.0) to open or download the above links.