The
Human Genome Project
So
then, we have a “working draft” of the human genome.
Dr Ewan Birney, one of the
lead researchers at the publicly funded European Bioinformatics
Institute (EBI) in Cambridge, UK, told the BBC that:
"The public project decided last year to accelerate
its rate of discovery to match the private project
and on 15 June we will say that we're effectively
90% done - 90% of the interesting bits."
But what constitutes ‘an interesting bit’? Dr Birney
doesn’t say, so we’re left guessing. Presumably
it includes protein-coding sequences, regulatory
regions and transcription-factor binding sites.
Still, this
is only about 3% of the genome. So what exactly is
the rest of it? Well, a significant proportion is
just repeated DNA. Repeats come in two flavours, mini
and micro. Each repeat is made up of a short ‘unit’ that
is copied end to end. For example, "ATATATATATATAT"
has a 2bp unit of "AT". Micro-repeats
have units of up to 5-6bp. Mini-repeats have units of 10bp or more. There
are virtually no 7, 8 or 9bp units observed. What is
the difference, you might ask? Well, micro-repeats
are thought to be the result of DNA polymerase ‘slippage’.
Basically, the copied DNA ends up with a few more repeats than
the original. After a few million years, you get a
lot of repeats. Mini-repeats are thought to be caused
by inaccurate crossing over in meiosis.
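To make the idea of a repeat ‘unit’ concrete, here is a minimal Python sketch
that finds the shortest unit tiling a stretch of DNA and labels it using the
size ranges quoted above. The function names and the exact cut-offs (6bp and
10bp) are assumptions made for illustration; this is not how any real
repeat-finding tool works.

```python
def smallest_repeat_unit(seq):
    """Return (unit, copies) for the shortest unit that tiles seq exactly."""
    n = len(seq)
    for size in range(1, n + 1):
        # A unit must divide the length evenly and rebuild the whole sequence.
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size
    return seq, 1  # only reached for an empty string


def classify_repeat(unit_length):
    """Label a unit length using the article's ranges (assumed cut-offs)."""
    if unit_length <= 6:
        return "micro"
    if unit_length >= 10:
        return "mini"
    return "rarely observed"  # the 7-9bp range


unit, copies = smallest_repeat_unit("ATATATATATATAT")
print(unit, copies, classify_repeat(len(unit)))  # AT 7 micro
```

A sequence with no internal repetition simply comes back as one copy of itself.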
Although
repeats are considered ‘junk’, they do have their uses.
VNTRs (Variable Number Tandem Repeats) allow DNA samples
to be told apart from one another; they are the basis
of the DNA fingerprint. Generally, genes are no good
for fingerprinting. Although most genes have several ‘versions’
or alleles, most of those alleles are not very common. Any change in
a gene will probably result in a change in the function
of the gene product, so it is almost always selected against.
To ensure that there is enough variation for fingerprinting
to be useful, there need to be 6 or more common alleles;
this is very rare in genes but frequent in VNTRs. Analysis
of 10 or more VNTRs is usually enough for a fingerprint.
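Why do 6 common alleles and 10 VNTRs do the job? A rough back-of-envelope
calculation in Python gives a feel for it. This sketch assumes
Hardy-Weinberg proportions, equally common alleles and independent loci,
none of which holds exactly in real populations, and the function names are
invented for the example. Real forensic calculations use measured allele
frequencies.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def locus_match_probability(allele_freqs):
    """Chance that two unrelated people share a genotype at one locus.

    Assumes Hardy-Weinberg proportions (an idealisation).
    """
    total = 0
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p, q = allele_freqs[i], allele_freqs[j]
        genotype_freq = p * p if i == j else 2 * p * q
        total += genotype_freq ** 2  # both people carry this genotype
    return total


def profile_match_probability(loci):
    """Multiply per-locus chances, assuming the loci are independent."""
    result = 1
    for freqs in loci:
        result *= locus_match_probability(freqs)
    return result


vntr = [Fraction(1, 6)] * 6   # six equally common alleles (assumed)
gene = [Fraction(1, 2)] * 2   # a gene with only two common alleles (assumed)

print(locus_match_probability(vntr))  # 11/216, about 5%
print(locus_match_probability(gene))  # 3/8
print(float(profile_match_probability([vntr] * 10)))
```

Under these assumptions a single VNTR is shared by about one pair in twenty,
but ten of them together push the chance of a coincidental match down to a
vanishingly small number.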
Repeats also have another interesting consequence for
sequencing. The chemical reactions used to read off
the bases are stalled by highly repetitive regions.
This means that about 10% of the genome is virtually
impossible to sequence! Chromosome 22, the first to
be sequenced, does in fact have a section of about 15Mb
(roughly 30% of the chromosome) that is unreadable.
The situation is worse in the case of the privately
funded Celera Genomics sequencing program. Celera uses
a shotgun sequencing technique. This breaks the DNA
into short fragments and sequences them. Overlaps
between the ends of these sequences are then looked for so
that they can be fitted together like a jigsaw. The
nature of this technique is that there will be gaps
where a section hasn’t been successfully sequenced.
Then add to this the problem of unsequenceable repeats.
Finally, there are ‘short repeated sequences’, which afflict
both programs. While they can be sequenced, they all
look exactly the same as each other, so we cannot pinpoint
where in the mass of repeats a segment fits.
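The jigsaw idea behind shotgun sequencing can be sketched in a few lines of
Python. This is a toy greedy assembler written for illustration only: the
function names, the minimum overlap of 3 bases and the example fragments are
all assumptions, and it is not Celera’s actual method, which has to cope
with read errors and millions of fragments.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps: the remaining pieces are split by gaps
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


print(greedy_assemble(["GATTACAG", "ACAGGCTT", "CTTAACG"]))
# ['GATTACAGGCTTAACG']
print(greedy_assemble(["GATTACAG", "TTAACG"]))
# ['GATTACAG', 'TTAACG']  (a gap: no overlap to join them)
```

The second call shows the gap problem: with a fragment missing, the pieces
cannot be joined. Repeats cause the opposite trouble, because identical
fragments overlap equally well in many places and the assembler has no way
of choosing between them.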
The importance of these regions is debatable. It is
generally agreed that these regions are the least likely
to contain any genes. However, Evan Eichler at Case Western Reserve
University in Cleveland has discovered arrays of genes at the edges
of repetitive regions. He believes that by ignoring
these regions, researchers are missing important genes.
This may prove to be very important, or of little relevance;
we just don’t know yet.
So then, while there is uproar over Celera sequencing
the human genome in a matter of months and at less
than a tenth of the cost of the public project, neither
will be complete. But they are also looking for different
things. Celera wants to uncover as many genes of commercial
value as quickly as possible. The HGP aims to sequence
as much of the genome as possible and make that information
available to all. It is much more thorough than Celera’s
approach. This said, Celera will discover more genes
in less time; there is a place for both, although
for my money, Celera comes across as a rushed job.