The Origin of Biological Information and the Higher Taxonomic Categories
By: Stephen C. Meyer
Proceedings of the Biological Society of Washington
September 29, 2004
On August 4th, 2004, an extensive review essay by Dr. Stephen C. Meyer, Director
of Discovery Institute’s Center for Science & Culture, appeared in the Proceedings
of the Biological Society of Washington (volume 117, no. 2, pp. 213-239). The
Proceedings is a peer-reviewed biology journal published at the National Museum
of Natural History at the Smithsonian Institution in Washington D.C.
In the article, entitled “The Origin of Biological Information and the Higher
Taxonomic Categories,” Dr. Meyer argues that no current materialistic theory
of evolution can account for the origin of the information necessary to build
novel animal forms. He proposes intelligent design as an alternative explanation
for the origin of biological information and the higher taxa.
Due to an unusual number of inquiries about the article and because the article
is presently not available online elsewhere, Dr. Meyer, the copyright holder,
has decided to make the article available now in HTML format on this website.
(Offprints are also available from Discovery Institute by writing to Keith Pennock
at Kpennock@discovery.org).
PROCEEDINGS OF THE BIOLOGICAL SOCIETY OF WASHINGTON
117(2):213-239. 2004
The origin of biological information and the higher taxonomic categories
Stephen C. Meyer
Introduction
In a recent volume of the Vienna Series in Theoretical Biology (2003), Gerd
B. Muller and Stuart Newman argue that what they call the “origination of organismal
form” remains an unsolved problem. In making this claim, Muller and Newman (2003:3-10)
distinguish two issues, namely, (1) the causes of form generation in the
individual organism during embryological development and (2) the causes responsible
for the production of novel organismal forms in the first place during the history
of life. To distinguish the latter case (phylogeny) from the former (ontogeny),
Muller and Newman use the term “origination” to designate the causal processes
by which biological form first arose during the evolution of life. They insist
that “the molecular mechanisms that bring about biological form in modern day embryos
should not be confused” with the causes responsible for the origin (or “origination”)
of novel biological forms during the history of life (p. 3). They further argue
that we know more about the causes of ontogenesis, due to advances in molecular
biology, molecular genetics and developmental biology, than we do about the
causes of phylogenesis–the ultimate origination of new biological forms during
the remote past.
In making this claim, Muller and Newman are careful to affirm that evolutionary
biology has succeeded in explaining how preexisting forms diversify under the
twin influences of natural selection and variation of genetic traits. Sophisticated
mathematically-based models of population genetics have proven adequate for mapping and understanding
quantitative variability and populational changes in organisms. Yet Muller and
Newman insist that population genetics, and thus evolutionary biology, has not
identified a specifically causal explanation for the origin of true morphological novelty
during the history of life. Central to their concern is what they see as the
inadequacy of the variation of genetic traits as a source of new form and structure.
They note, following Darwin himself, that the sources of new form and structure
must precede the action of natural selection (2003:3)–that selection must act
on what already exists. Yet, in their view, the “genocentricity” and “incrementalism”
of the neo-Darwinian mechanism have meant that an adequate source of new form and structure
has yet to be identified by theoretical biologists. Instead, Muller and Newman
see the need to identify epigenetic sources of morphological innovation during
the evolution of life. In the meantime, however, they insist neo-Darwinism lacks
any “theory of the generative” (p. 7).
As it happens, Muller and Newman are not alone in this judgment. In the last
decade or so a host of scientific essays and books have questioned the efficacy
of selection and mutation as a mechanism for generating morphological novelty,
as even a brief literature survey will establish. Thomson (1992:107) expressed
doubt that large-scale morphological changes could accumulate via minor phenotypic
changes at the population genetic level. Miklos (1993:29) argued that neo-Darwinism
fails to provide a mechanism that can produce large-scale innovations in form and complexity.
Gilbert et al. (1996) attempted to develop a new theory of evolutionary mechanisms
to supplement classical neo-Darwinism, which, they argued, could not adequately
explain macroevolution. As they put it in a memorable summary of the situation:
“starting in the 1970s, many biologists began questioning its (neo-Darwinism’s)
adequacy in explaining evolution. Genetics might be adequate for explaining
microevolution, but microevolutionary changes in gene frequency were not seen as able
to turn a reptile into a mammal or to convert a fish into an amphibian. Microevolution
looks at adaptations that concern the survival of the fittest, not the arrival
of the fittest. As Goodwin (1995) points out, ‘the origin of species–Darwin’s problem–remains
unsolved’” (p. 361). Though Gilbert et al. (1996) attempted to solve
the problem of the origin of form by proposing a greater role for developmental
genetics within an otherwise neo-Darwinian framework,1 numerous recent authors have continued
to raise questions about the adequacy of that framework itself or about the
problem of the origination of form generally (Webster & Goodwin 1996; Shubin
& Marshall 2000; Erwin 2000; Conway Morris 2000, 2003b; Carroll 2000; Wagner 2001;
Becker & Lonnig 2001; Stadler et al. 2001; Lonnig & Saedler 2002; Wagner & Stadler
2003; Valentine 2004:189-194).
What lies behind this skepticism? Is it warranted? Is a new and specifically
causal theory needed to explain the origination of biological form?
This review will address these questions. It will do so by analyzing the problem
of the origination of organismal form (and the corresponding emergence of higher
taxa) from a particular theoretical standpoint. Specifically, it will treat
the problem of the origination of the higher taxonomic groups as a manifestation
of a deeper problem, namely, the problem of the origin of the information (whether
genetic or epigenetic) that, as it will be argued, is necessary to generate
morphological novelty.
In order to perform this analysis, and to make it relevant and tractable to
systematists and paleontologists, this paper will examine a paradigmatic example
of the origin of biological form and information during the history of life:
the Cambrian explosion. During the Cambrian, many novel animal forms and body plans
(representing new phyla, subphyla and classes) arose in a geologically brief
period of time. The following information-based analysis of the Cambrian explosion
will support the claim of recent authors such as Muller and Newman that the mechanism
of selection and genetic mutation does not constitute an adequate causal explanation
of the origination of biological form in the higher taxonomic groups. It will
also suggest the need to explore other possible causal factors for the origin of
form and information during the evolution of life and will examine some other
possibilities that have been proposed.
The Cambrian Explosion
The “Cambrian explosion” refers to the geologically sudden appearance of many
new animal body plans about 530 million years ago. At this time, at least nineteen,
and perhaps as many as thirty-five phyla of forty total (Meyer et al. 2003),
made their first appearance on earth within a narrow five- to ten-million-year
window of geologic time (Bowring et al. 1993, 1998a:1, 1998b:40; Kerr 1993;
Monastersky 1993; Aris-Brosou & Yang 2003). Many new subphyla, between 32 and
48 of 56 total (Meyer et al. 2003), and classes of animals also arose at this
time with representatives of these new higher taxa manifesting significant morphological
innovations. The Cambrian explosion thus marked a major episode of morphogenesis
in which many new and disparate organismal forms arose in a geologically brief
period of time.
To say that the fauna of the Cambrian period appeared in a geologically sudden
manner also implies the absence of clear transitional intermediate forms connecting
Cambrian animals with simpler pre-Cambrian forms. And, indeed, in almost all
cases, the Cambrian animals have no clear morphological antecedents in earlier Vendian
or Precambrian fauna (Miklos 1993, Erwin et al. 1997:132, Steiner & Reitner
2001, Conway Morris 2003b:510, Valentine et al. 2003:519-520). Further, several
recent discoveries and analyses suggest that these morphological gaps may not be
merely an artifact of incomplete sampling of the fossil record (Foote 1997,
Foote et al. 1999, Benton & Ayala 2003, Meyer et al. 2003), suggesting that
the fossil record is at least approximately reliable (Conway Morris 2003b:505).
As a result, debate now exists about the extent to which this pattern of evidence
comports with a strictly monophyletic view of evolution (Conway Morris 1998a,
2003a, 2003b:510; Willmer 1990, 2003). Further, among those who accept a monophyletic
view of the history of life, debate exists about whether to privilege fossil
or molecular data and analyses. Those who think the fossil data provide a more
reliable picture of the origin of the Metazoa tend to think these animals arose
relatively quickly–that the Cambrian explosion had a “short fuse” (Conway Morris 2003b:505-506,
Valentine & Jablonski 2003). Some (Wray et al. 1996), but not all (Ayala
et al. 1998), who think that molecular phylogenies establish reliable divergence
times from pre-Cambrian ancestors think that the Cambrian animals evolved over a
very long period of time–that the Cambrian explosion had a “long fuse.” This
review will not address these questions of historical pattern. Instead, it will
analyze whether the neo-Darwinian process of mutation and selection, or other
processes of evolutionary change, can generate the form and information necessary
to produce the animals that arise in the Cambrian. This analysis will, for the
most part,2 therefore, not depend upon assumptions of either a long or short
fuse for the Cambrian explosion, or upon a monophyletic or polyphyletic view
of the early history of life.
Defining Biological Form and Information
Form, like life itself, is easy to recognize but often hard to define precisely.
Yet, a reasonable working definition of form will suffice for our present purposes.
Form can be defined as the four-dimensional topological relations of anatomical
parts. This means that one can understand form as a unified arrangement of body
parts or material components in a distinct shape or pattern (topology)–one
that exists in three spatial dimensions and which arises in time during ontogeny.
Insofar as any particular biological form constitutes something like a distinct
arrangement of constituent body parts, form can be seen as arising from constraints
that limit the possible arrangements of matter. Specifically, organismal form
arises (both in phylogeny and ontogeny) as possible arrangements of material
parts are constrained to establish a specific or particular arrangement with
an identifiable three-dimensional topography–one that we would recognize as
a particular protein, cell type, organ, body plan or organism. A particular “form,”
therefore, represents a highly specific and constrained arrangement of material
components (among a much larger set of possible arrangements).
Understanding form in this way suggests a connection to the notion of information
in its most theoretically general sense. When Shannon (1948) first developed
a mathematical theory of information, he equated the amount of information transmitted
with the amount of uncertainty reduced or eliminated in a series of symbols
or characters. Information, in Shannon’s theory, is thus imparted as some options
are excluded and others are actualized. The greater the number of options excluded,
the greater the amount of information conveyed. Further, constraining a set
of possible material arrangements by whatever process or means involves excluding
some options and actualizing others. Thus, to constrain a set of possible material
states is to generate information in Shannon’s sense. It follows that the constraints
that produce biological form also impart information. Or conversely, one might
say that producing organismal form by definition requires the generation of
information.
In classical Shannon information theory, the amount of information in a system
is also inversely related to the probability of the arrangement of constituents
in a system or the characters along a communication channel (Shannon 1948). The
more improbable (or complex) the arrangement, the more Shannon information, or
information-carrying capacity, a string or system possesses.
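The inverse relation just described is conventionally written as a logarithm of probability. The formulation below is the standard textbook one and is not notation taken from this article; the symbols I, P and x are introduced here only for illustration.

```latex
I(x) = -\log_2 P(x) \quad \text{(bits)}
```

On this measure an outcome with probability 1/2 carries one bit, and each halving of the probability adds one more bit, so less probable arrangements carry more information.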
Since the 1960s, mathematical biologists have realized that Shannon’s theory
could be applied to the analysis of DNA and proteins to measure the information-carrying
capacity of these macromolecules. Since DNA contains the assembly instructions
for building proteins, the information-processing system in the cell represents a kind
of communication channel (Yockey 1992:110). Further, DNA conveys information
via specifically arranged sequences of nucleotide bases. Since each of the four
bases has a roughly equal chance of occurring at each site along the spine of
the DNA molecule, biologists can calculate the probability, and thus the information-carrying
capacity, of any particular sequence n bases long.
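The calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes, as the text does, four equally likely bases at every site; the function names `sequence_probability` and `carrying_capacity_bits` are hypothetical and do not come from the article or from any cited source.

```python
import math
from fractions import Fraction

BASES = "ACGT"  # assumption: four bases, each equally likely at every site


def sequence_probability(length: int, alphabet_size: int = len(BASES)) -> Fraction:
    """Exact probability of one specific sequence of the given length
    when every symbol is equally likely and sites are independent."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return Fraction(1, alphabet_size ** length)


def carrying_capacity_bits(length: int, alphabet_size: int = len(BASES)) -> float:
    """Information-carrying capacity in bits:
    -log2(probability) = length * log2(alphabet_size)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    n = 10
    print(sequence_probability(n))    # 1/1048576
    print(carrying_capacity_bits(n))  # 20.0
```

Under these assumptions each base contributes two bits, so capacity grows linearly with sequence length while the probability of any one specific sequence falls exponentially. The figure measures capacity only; it says nothing about whether a sequence is functional.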
The ease with which information theory applies to molecular biology has created
confusion about the type of information that DNA and proteins possess. Sequences
of nucleotide bases in DNA, or amino acids in a protein, are highly improbable
and thus have large information-carrying capacities. But, like meaningful sentences
or lines of computer code, genes and proteins are also specified with respect
to function. Just as the meaning of a sentence depends upon the specific arrangement
of the letters in a sentence, so too does the function of a gene sequence depend
upon the specific arrangement of the nucleotide bases in a gene. Thus, molecular
biologists beginning with Crick equated information not only with complexity
but also with “specificity,” where “specificity” or “specified” has meant “necessary
to function” (Crick 1958:144, 153; Sarkar 1996:191).3 Molecular biologists such as Monod
and Crick understood biological information–the information stored in DNA and
proteins–as something more than mere complexity (or improbability). Their notion
of information associated both biochemical contingency and combinatorial complexity
with DNA sequences (allowing DNA’s carrying capacity to be calculated), but
it also affirmed that sequences of nucleotides and amino acids in functioning
macromolecules possessed a high degree of specificity relative to the maintenance
of cellular function.
The ease with which information theory applies to molecular biology has also
created confusion about the location of information in organisms. Perhaps because
the information-carrying capacity of the gene could be so easily measured, it
has been easy to treat DNA, RNA and proteins as the sole repositories of biological
information. Neo-Darwinists in particular have assumed that the origination
of biological form could be explained by recourse to processes of genetic variation
and mutation alone (Levinton 1988:485). Yet if one understands organismal form
as resulting from constraints on the possible arrangements of matter at many
levels in the biological hierarchy–from genes and proteins to cell types and
tissues to organs and body plans–then clearly biological organisms exhibit many
levels of information-rich structure.
Thus, we can pose a question, not only about the origin of genetic information,
but also about the origin of the information necessary to generate form and
structure at levels higher than that present in individual proteins. We must
also ask about the origin of the “specified complexity,” as opposed to mere complexity,
that characterizes the new genes, proteins, cell types and body plans that arose
in the Cambrian explosion. Dembski (2002) has used the term “complex specified
information” (CSI) as a synonym for “specified complexity” to help distinguish
functional biological information from mere Shannon information–that is, specified
complexity from mere complexity. This review will use this term as well.
The Cambrian Information Explosion
The Cambrian explosion represents a remarkable jump in the specified complexity
or “complex specified information” (CSI) of the biological world. For over three
billion years, the biological realm included little more than bacteria and
algae (Brocks et al. 1999). Then, beginning about 570-565 million years ago (mya),
the first complex multicellular organisms appeared in the rock strata, including
sponges, cnidarians, and the peculiar Ediacaran biota (Grotzinger et al. 1995).
Forty million years later, the Cambrian explosion occurred (Bowring et al. 1993).
The emergence of the Ediacaran biota (570 mya), and then to a much greater extent
the Cambrian explosion (530 mya), represented steep climbs up the biological
complexity gradient.
One way to estimate the amount of new CSI that appeared with the Cambrian animals
is to count the number of new cell types that emerged with them (Valentine 1995:91-93).
Studies of modern animals suggest that the sponges that appeared in the late
Precambrian, for example, would have required five cell types, whereas the more complex
animals that appeared in the Cambrian (e.g., arthropods) would have required
fifty or more cell types. Functionally more complex animals require more cell
types to perform their more diverse functions. New cell types require many new
and specialized proteins. New proteins, in turn, require new genetic information.
Thus an increase in the number of cell types implies (at a minimum) a considerable
increase in the amount of specified genetic information. Molecular biologists have
recently estimated that a minimally complex single-celled organism would require
between 318 and 562 kilobase pairs of DNA to produce the proteins necessary
to maintain life (Koonin 2000). More complex single cells might require upward
of a million base pairs. Yet to build the proteins necessary to sustain a complex
arthropod such as a trilobite would require orders of magnitude more coding
instructions. The genome size of a modern arthropod, the fruit fly Drosophila melanogaster, is
approximately 180 million base pairs (Gerhart & Kirschner 1997:121, Adams et al.
2000). Transitions from a single cell to colonies of cells to complex animals
represent significant (and, in principle, measurable) increases in CSI.
Building a new animal from a single-celled organism requires a vast amount of
new genetic information. It also requires a way of arranging gene products–proteins–into
higher levels of organization. New proteins are required to service new cell
types. But new proteins must be organized into new systems within the cell; new
cell types must be organized into new tissues, organs, and body parts. These,
in turn, must be organized to form body plans. New animals, therefore, embody
hierarchically organized systems of lower-level parts within a functional whole.
Such hierarchical organization itself represents a type of information, since
body plans comprise both highly improbable and functionally specified arrangements
of lower-level parts. The specified complexity of new body plans requires explanation
in any account of the Cambrian explosion.
Can neo-Darwinism explain the discontinuous increase in CSI that appears in
the Cambrian explosion–either in the form of new genetic information or in
the form of hierarchically organized systems of parts? We will now examine the
two parts of this question.
Novel Genes and Proteins
Many scientists and mathematicians have questioned the ability of mutation and
selection to generate information in the form of novel genes and proteins. Such
skepticism often derives from consideration of the extreme improbability (and
specificity) of functional genes and proteins.
A typical gene contains over one thousand precisely arranged bases. For any
specific arrangement of four nucleotide bases of length n, there is a corresponding
number of possible arrangements of bases, $4^{n}$. For any protein, there are $20^{n}$
possible arrangements of protein-forming amino acids. A gene 999 bases in length
represents one of $4^{999}$ possible nucleotide sequences; a protein of 333 amino
acids is one of $20^{333}$ possibilities.
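To give a sense of the magnitudes involved, the sizes of the two sequence spaces named above can be computed exactly with arbitrary-precision integers. The sketch below is illustrative only; the helper names `sequence_space_size` and `order_of_magnitude` are hypothetical and are not drawn from the article.

```python
def sequence_space_size(length: int, alphabet_size: int) -> int:
    """Number of distinct sequences of the given length over an alphabet
    of the given size (alphabet_size ** length), as an exact integer."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return alphabet_size ** length


def order_of_magnitude(value: int) -> int:
    """Power of ten of a positive integer: its decimal digit count minus one."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    return len(str(value)) - 1


if __name__ == "__main__":
    gene_space = sequence_space_size(999, 4)      # 4^999 nucleotide sequences
    protein_space = sequence_space_size(333, 20)  # 20^333 amino acid sequences
    print(order_of_magnitude(gene_space))     # 601
    print(order_of_magnitude(protein_space))  # 433
```

The two lengths correspond because three bases encode one amino acid, so 999 bases specify 333 residues. These numbers count all possible sequences; they do not by themselves indicate how many of those sequences are functional.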
Since the 1960s, some biologists have thought functional proteins to be rare
among the set of possible amino acid sequences. Some have used an analogy with
human language to illustrate why this should be the case. Denton (1986:309-311),
for example, has shown that meaningful words and sentences are extremely rare
among the set of possible combinations of English letters, especially as sequence
length grows. (The ratio of meaningful 12-letter words to 12-letter sequences is $1/10^{14}$; the
ratio of 100-letter sentences to possible 100-letter strings is $1/10^{100}$.)
Further, Denton shows that most meaningful sentences are highly isolated from one another
in the space of possible combinations, so that random substitutions of letters
will, after a very few changes, inevitably degrade meaning. Apart from a few
closely clustered sentences accessible by random substitution, the overwhelming
majority of meaningful sentences lie, probabilistically speaking, beyond the reach
of random search.
Denton (1986:301-324) and others have argued that similar constraints apply
to genes and proteins. They have questioned whether an undirected search via
mutation and selection would have a reasonable chance of locating new islands
of function–representing fundamentally new genes or proteins–within the time available
(Eden 1967, Shutzenberger 1967, Lovtrup 1979). Some have also argued that alterations
in sequencing would likely result in loss of protein function before fundamentally
new function could arise (Eden 1967, Denton 1986). Nevertheless, neither the extent
to which genes and proteins are sensitive to functional loss as a result of
sequence change, nor the extent to which functional proteins are isolated within
sequence space, has been fully known.
Recently, experiments in molecular biology have shed light on these questions.
A variety of mutagenesis techniques have shown that proteins (and thus the genes
that produce them) are indeed highly specified relative to biological function
(Bowie & Sauer 1989, Reidhaar-Olson & Sauer 1990, Taylor et al. 2001). Mutagenesis
research tests the sensitivity of proteins (and, by implication, DNA) to functional
loss as a result of alterations in sequencing. Studies of proteins have long
shown that amino acid residues at many active positions cannot vary without functional
loss (Perutz & Lehmann 1968). More recent protein studies (often using mutagenesis
experiments) have shown that functional requirements place significant constraints
on sequencing even at non-active site positions (Bowie & Sauer 1989, Reidhaar-Olson
& Sauer 1990, Chothia et al. 1998, Axe 2000, Taylor et al. 2001). In particular,
Axe (2000) has shown that multiple as opposed to single position amino acid
substitutions inevitably result in loss of protein function, even when these
changes occur at sites that allow variation when altered in isolation. Cumulatively,
these constraints imply that proteins are highly sensitive to functional loss
as a result of alterations in sequencing, and that functional proteins represent highly
isolated and improbable arrangements of amino acids–arrangements that are far
more improbable, in fact, than would be likely to arise by chance alone in the
time available (Reidhaar-Olson & Sauer 1990; Behe 1992; Kauffman 1995:44; Dembski
1998:175-223; Axe 2000, 2004). (See below the discussion of the neutral theory
of evolution for a precise quantitative assessment.)
Of course, neo-Darwinists do not envision a completely random search through
the set of all possible nucleotide sequences–so-called “sequence space.” They
envision natural selection acting to preserve small advantageous variations
in genetic sequences and their corresponding protein products. Dawkins (1996),
for example, likens an organism to a high mountain peak. He compares climbing
the sheer precipice up the front side of the mountain to building a new organism
by chance. He acknowledges that this approach up “Mount Improbable” will not succeed.
Nevertheless, he suggests that there is a gradual slope up the backside of the
mountain that could be climbed in small incremental steps. In his analogy, the
backside climb up “Mount Improbable” corresponds to the process of natural selection
acting on random changes in the genetic text. What chance alone cannot accomplish
blindly or in one leap, selection (acting on mutations) can accomplish through
the cumulative effect of many slight successive steps.
Yet the extreme specificity and complexity of proteins present a difficulty,
not only for the chance origin of specified biological information (i.e., for
random mutations acting alone), but also for selection and mutation acting in
concert. Indeed, mutagenesis experiments cast doubt on each of the two scenarios
by which neo-Darwinists envisioned new information arising from the mutation/selection
mechanism (for review, see Lonnig 2001). For neo-Darwinism, new functional genes
arise either from non-coding sections in the genome or from preexisting genes. Both scenarios
are problematic.
In the first scenario, neo-Darwinists envision new genetic information arising
from those sections of the genetic text that can presumably vary freely without
consequence to the organism. According to this scenario, non-coding sections
of the genome, or duplicated sections of coding regions, can experience a protracted
period of “neutral evolution” (Kimura 1983) during which alterations in nucleotide
sequences have no discernible effect on the function of the organism. Eventually,
however, a new gene sequence will arise that can code for a novel protein. At
that point, natural selection can favor the new gene and its functional protein
product, thus securing the preservation and heritability of both.
This scenario has the advantage of allowing the genome to vary through many
generations, as mutations “search” the space of possible base sequences. The
scenario has an overriding problem, however: the size of the combinatorial space
(i.e., the number of possible amino acid sequences) and the extreme rarity and
isolation of the functional sequences within that space of possibilities. Since
natural selection can do nothing to help generate new functional sequences,
but rather can only preserve such sequences once they have arisen, chance alone–random
variation–must do the work of information generation–that is, of finding the
exceedingly rare functional sequences within the set of combinatorial possibilities.
Yet the probability of randomly assembling (or “finding,” in the previous sense)
a functional sequence is extremely small.
Cassette mutagenesis experiments performed during the early 1990s suggest that
the probability of attaining (at random) the correct sequencing for a short
protein 100 amino acids long is about 1 in $10^{65}$