Re: Callling all cladisticists

Posted by Marcus G. Daniels on Jan 04, 2009; 4:25am
URL: http://friam.383.s1.nabble.com/Callling-all-cladisticists-tp2106906p2108081.html

Nicholas Thompson wrote:
>
> But what, then, about cladistics?  Cladistics is a dark art of
> classification that uses a variety of obscure incantations to label
> relations amongst species without, so far as I understand, any
> reference to evolution.  Yet, as I understand it, cladistics is not
> arbitrary.
In both cases it boils down to selecting a set of features and assigning
each one a set of character states.  With DNA, the job is already done
because the character states are A, G, C, or T in long strings.  But one
can also consider an encoding like C=has claws, !C=no claws, L=has
lungs, !L=no lungs, V=has vertebrae, !V=no vertebrae, F=fur, !F=no
fur, and so on.  To make a taxonomy, distance-based similarity
techniques such as neighbor-joining are often used.  To go to the next
step and consider an evolutionary model, things get complex fast
because, for example, it is necessary to be able to say how a critter
goes from having no hair to having it, or develops lungs, and to say
what the relative importance of those things is.  On the other hand, it
is not nearly so hard if the transition you want to describe is an
adenine changing to a guanine, which is chemistry.
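
To make the encoding concrete, here is a minimal sketch in Python.  The
taxa and their trait sets are made up purely for illustration, and the
Hamming distance (count of differing character states) is just one
simple choice of distance, not what any particular phylogenetics
package does.

```python
# Hypothetical traits and taxa, chosen only to illustrate the encoding.
TRAITS = ["claws", "lungs", "vertebrae", "fur"]

TAXA = {
    "cat":    {"claws", "lungs", "vertebrae", "fur"},
    "lizard": {"claws", "lungs", "vertebrae"},
    "trout":  {"vertebrae"},
    "worm":   set(),
}


def encode(present):
    """Turn a set of traits into a tuple of 0/1 character states."""
    return tuple(1 if trait in present else 0 for trait in TRAITS)


def hamming(a, b):
    """Count the positions at which two character-state tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(taxa):
    """Pairwise Hamming distances, keyed by (name, name)."""
    codes = {name: encode(traits) for name, traits in taxa.items()}
    return {(p, q): hamming(codes[p], codes[q]) for p in codes for q in codes}


if __name__ == "__main__":
    d = distance_matrix(TAXA)
    for name in TAXA:
        print(name, encode(TAXA[name]), [d[(name, other)] for other in TAXA])
```

A matrix like this is the input a distance method works from.  Note
that nothing in it says how a trait is gained or lost, which is the
part an evolutionary model would have to supply.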

I think a high-level description of conceptual model features (like
those Joshua suggested) as character states would work for making
similarity trees without an evolutionary model behind them.   The main
work there is deciding on the features.  
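
As a rough sketch of what "similarity tree without an evolutionary
model" could look like, the following clusters feature profiles by
average linkage (UPGMA-style).  This is a simpler stand-in for
neighbor-joining, not neighbor-joining itself, and the model names and
feature vectors are hypothetical placeholders, not the features Joshua
suggested.

```python
from itertools import combinations

# Hypothetical conceptual models, each scored 0/1 on five made-up features.
PROFILES = {
    "model_a": (1, 1, 0, 0, 0),
    "model_b": (1, 1, 1, 0, 0),
    "model_c": (0, 0, 1, 1, 1),
    "model_d": (0, 0, 0, 1, 1),
}


def hamming(a, b):
    """Count the positions at which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage_tree(profiles):
    """Repeatedly merge the two closest clusters; return a nested tuple."""
    # Each key is a subtree (a name or a nested tuple); each value lists
    # the leaf names under that subtree.
    clusters = {name: [name] for name in profiles}

    def avg_dist(pair):
        a, b = pair
        total = sum(
            hamming(profiles[x], profiles[y])
            for x in clusters[a]
            for y in clusters[b]
        )
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=avg_dist)
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


if __name__ == "__main__":
    print(average_linkage_tree(PROFILES))
```

The clustering itself is mechanical.  Everything interesting is in
which features go into the profiles, which is the point made above.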

And on the other extreme, one could probably come up with some very
crude evolutionary model for local change of machine code based on
context and knowledge of common programming idioms and/or the source
language and compiler.  Even if you had that, though, one thing that is
assumed by most phylogenetics programs is a multiple alignment.  That
is, any code fragment found anywhere in a given program must have a
counterpart in every other program, aligned down to the opcode.  Then
there's the small matter that horizontal gene transfer happens all the
time in software, as third-party libraries get pulled in and dropped
and code gets refactored.  In principle, I bet that with sufficient
effort one could recover the revision history of some large project
like GCC from various binaries of different ages.  But it is better
just to go to the revision control system and look at the history
directly.  With GCC it goes back 20 years or something.
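
To show what "aligned down to the opcode" means, here is a small sketch
using Python's standard difflib.  It aligns only two sequences, whereas
a phylogenetics program would want all of them aligned at once, and the
opcode lists are invented for the example rather than taken from any
real binary.

```python
from difflib import SequenceMatcher

# Hypothetical opcode streams from two builds of the same routine.
OLD = ["push", "mov", "call", "add", "ret"]
NEW = ["push", "mov", "call", "call", "add", "pop", "ret"]


def align(a, b, gap="-"):
    """Pad two opcode lists with gaps so both rows have the same length
    and matching runs of opcodes sit in the same columns."""
    row_a, row_b = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        width = max(len(seg_a), len(seg_b))
        row_a += seg_a + [gap] * (width - len(seg_a))
        row_b += seg_b + [gap] * (width - len(seg_b))
    return row_a, row_b


if __name__ == "__main__":
    row_old, row_new = align(OLD, NEW)
    print(" ".join(f"{op:5}" for op in row_old))
    print(" ".join(f"{op:5}" for op in row_new))
```

Each column is then one "site" that could be compared across versions.
A library pulled in wholesale shows up as a long run of gaps in every
program that lacks it, which is where the alignment assumption starts
to strain.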

Marcus

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org