NEW YORK, Jan. 30 (GenomeWeb News) - When a researcher at Penn State
University offered Dan Peterson the use of a Genome Sequencer
20 for his pine genomics research, he jumped at the chance, and is
glad he did.
Peterson, director of the Mississippi Genome Exploration
Laboratory at Mississippi State University, said the instrument's
high-throughput capabilities have opened doors to research that he
hadn't planned to follow, including the evolution of non-coding,
repetitive sequences.
This type of information will help scientists further define and refine
the tree of life, and could be a boon for research into evolution.
"There is a lot to be learned from this in terms of understanding
the evolution of an organism," Peterson told GenomeWeb News.
All the plant genomes that have been sequenced so far belong to
flowering plants, or angiosperms. The gymnosperms, which include
conifers, diverged from angiosperms 130 million years ago. "One of the
things we are trying to get at is understanding how the genomes
of different gymnosperms have changed in comparison to angiosperms,"
Peterson said. "This is one group that has not been very well
explored and one of the major groups on earth."
Sequencing the pine genome is a difficult task: With 21
billion base pairs, it is seven times the size of the human genome.
Its mammoth length is due mostly to the large number of repetitive
DNA sequences, which do not code for genes but are essential to the
evolution of organisms. "Most eukaryotic genomes are mainly
repetitive DNA," explained Peterson. "Pine is one of the extreme
examples."
Peterson has coupled the 454 instrument with Cot analysis, a
technique pioneered in the 1960s that provides a way to separate
repetitive DNA sequences from the single- and low-copy "gene rich"
regions of the genome. If DNA is denatured by heating and then
cooled, sequences will begin to reassociate with complementary
strands.
"When that happens, the most common sequences, the ones that
are repeated, will find each other and form double stranded DNA
faster than the ones that are single copy or low copy," explained
Peterson. "The double-stranded repetitive DNA can then be separated
from the single-stranded low-copy DNA by hydroxyapatite
chromatography."
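The kinetics Peterson describes can be sketched numerically. What follows is a minimal illustration, assuming ideal second-order reassociation, in which the fraction of DNA still single-stranded at a given Cot value is 1 / (1 + k * Cot). The function names and the rate constants are hypothetical, chosen only to show that repeated sequences reanneal sooner than single-copy ones; they are not measurements from the pine genome.

```python
def fraction_single_stranded(cot, k):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * Cot).

    cot: product of initial DNA concentration and time (mol * s / L).
    k:   reassociation rate constant (L / (mol * s)); higher for
         sequences present in more copies.
    """
    if cot < 0:
        raise ValueError("cot must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (1.0 + k * cot)


# Hypothetical rate constants, for illustration only.
K_BY_COMPONENT = {
    "highly repetitive": 100.0,
    "moderately repetitive": 1.0,
    "single/low copy": 0.001,
}


def reassociation_profile(cot):
    """Fraction still single-stranded for each component at one Cot value."""
    return {
        name: fraction_single_stranded(cot, k)
        for name, k in K_BY_COMPONENT.items()
    }


if __name__ == "__main__":
    for cot in (0.01, 1.0, 100.0, 10000.0):
        profile = reassociation_profile(cot)
        summary = ", ".join(f"{name}: {frac:.3f}" for name, frac in profile.items())
        print(f"Cot = {cot:g} -> {summary}")
```

At an intermediate Cot value, most of the highly repetitive component in this toy model is already double-stranded while the single/low copy component is still almost entirely single-stranded, which is the difference the hydroxyapatite step exploits.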
In 2002, Peterson, then a postdoc at the University of Georgia, and his
postdoctoral advisor Andrew Paterson resurrected this old lab technique
and showed how Cot analyses could be coupled with DNA cloning and
high-throughput sequencing to efficiently elucidate unique sequence
information in a genome. Since then, Peterson has been using his
Cot-based cloning and sequencing technique to identify and study
gene-rich regions from large genome species.
The 454 sequencer has allowed Peterson and fellow pine
researcher John Carlson at Penn State to sequence isolated repeat and
single/low copy components to relatively high coverage without
having to do prior DNA cloning. To date, Peterson and Carlson have
used the 454 instrument to sequence 28 million base pairs of random
genomic DNA, 22 Mb of highly repetitive DNA, 20 Mb of moderately
repetitive DNA, and 31 Mb of single/low copy DNA.
"With the 454 sequencer, we decided to fractionate the pine
genome into highly repetitive, moderately repetitive, and
single/low copy components, and use the 454 to sequence these
different components," he said.
According to Peterson, one of the most interesting things to
come out of the work is based on the extremely high sequencing
coverage obtained for repetitive regions. "You can use contigs
assembled from the 454 reads to study the evolution of repetitive
elements," he said. Many plant repetitive segments, for example, are
retroelements, sequences that evolved from or perhaps gave rise to
retroviruses. "They have replicated over and over again," Peterson
said. "And over time, they evolve and kind of drift, so the copies
are not as similar as they were originally. You can actually
determine the age of these elements based on how much divergence you
see in them."
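The dating idea in Peterson's remarks can be illustrated with a short calculation. This is a sketch under stated assumptions, not the method used by his lab: it applies the standard Jukes-Cantor correction to the observed fraction of differing sites between two copies of an element, then divides by twice an assumed substitution rate. The rate of 1e-8 substitutions per site per year and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Substitutions per site under the Jukes-Cantor model.

    p: observed fraction of aligned sites that differ between two copies.
    The correction is undefined at or above 0.75, where the sequences
    are effectively saturated.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def estimated_age_years(p, rate_per_site_per_year=1e-8):
    """Rough time since two copies shared a common ancestor.

    Both copies accumulate changes independently, hence the factor of 2.
    The default rate is a hypothetical placeholder, not a pine estimate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return jukes_cantor_distance(p) / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.20):
        print(f"{p:.0%} divergence -> about {estimated_age_years(p):,.0f} years")
```

Under these assumptions, copies that differ at more sites come out older, which is the relationship Peterson describes; the absolute ages depend entirely on the substitution rate chosen.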
Sequencing repetitive sequences will also help researchers
identify genes. "If you know the repeat sequences, you can make sure
that you avoid using them in marker developments and physical
mappings as they will produce confusing uninformative results,"
Peterson explained.
Peterson's repeat evolution work is one area of research that
would have been too costly to pursue without a next-generation
sequencer. Others share the sentiment that next-generation
instruments will enable scientists to explore novel areas of
research.
Garth Ehrlich, executive director of the Center for Genomic
Sciences at Allegheny-Singer Research Institute, recently said the
Genome Sequencer 20 has enabled him to study bacterial
transformability. "It completely changes the kind of questions you
can ask," Ehrlich said.
Peterson points out that to look at the repetitive sequences,
his lab couldn't use the assembly software provided by 454. This
software is aimed at assembling sequences from bacteria, viruses,
and other small genomes; it's not geared for assembling
sequences from eukaryotic genomes that have a lot of repetitive DNA.
"What 454 has said is, 'We are not going to spend a lot of time
developing this new software to go with the instrument.' I can
understand that because there is other software out there," said
Peterson. "They have said that anyone who wants our basic software
can have it, we'll give you the code, and you can do with it what
you want."
According to Bill Spencer, director of worldwide system sales at 454,
the company is currently
developing a large genome assembler. "The software's current
capacity is limited to approximately 7 to 7.5 million 454 reads,
which limits the full genome assembly to bacterial and fungal
genomes as well as sections of larger eukaryotic genomes, such as
the assembly of BAC sequences."
Peterson said he is at the beginning of his sequencing
project, and there are many who are eager for its results. Loblolly
pine is the number one crop in the southeastern United States. While
loblolly and related southern pines cover six percent of US forest
land, they account for 58 percent of the wood produced in the country.
Scientists have a pure scientific interest in the project, and the
Department of Energy has an interest in pine as a biomass/biofuel crop.
Kate O'Rourke covers the next-generation
genome-sequencing market for GenomeWeb News. E-mail her at korourke@genomeweb.com.