Holiday reading anyone? Info theory boosts clustering

Tom Johnson
 ======================
 Info theory boosts clustering
 ======================
 The emerging field of clustering aims to help scientists analyse
 mountains of data like genome sequencing, astronomical observations and
 market behaviour by automatically grouping like pieces of data.

 Princeton University researchers have taken a fresh approach to the
 clustering problem using information theory to generalise the process,
 which removes the need to define ahead of time what makes pieces of data
 similar to each other.

 The method determines how much information each piece of data has in
 common regardless of the nature of the information, and it boils down to
 finding the best trade-off between maximising the apparent relatedness
 of pieces of data while minimising the number of bits needed to describe
 the data. The method can be used with any kind of data and performs
 better than previous clustering algorithms, the researchers say.
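
 To make the trade-off described above concrete, here is a minimal sketch
 in Python. It is not the researchers' code or their algorithm. It assumes
 hard cluster assignments, equally likely items, and a score of the form
 "average within-cluster similarity minus T times the bits needed to say
 which cluster an item is in". The function name and the parameter
 `temperature` (T) are made up for illustration.

```python
import math
from collections import defaultdict


def tradeoff_score(similarity, labels, temperature):
    """Toy score: average within-cluster similarity minus temperature * bits.

    similarity  -- square list-of-lists, similarity[i][j] for items i and j
    labels      -- labels[i] is the cluster assigned to item i (hard assignment)
    temperature -- hypothetical trade-off parameter T (an assumption)
    """
    n = len(labels)
    clusters = defaultdict(list)
    for item, label in enumerate(labels):
        clusters[label].append(item)

    avg_similarity = 0.0
    bits = 0.0
    for members in clusters.values():
        p_cluster = len(members) / n
        # Mean similarity over all ordered pairs in the cluster (self-pairs included).
        within = sum(similarity[i][j] for i in members for j in members)
        within /= len(members) ** 2
        avg_similarity += p_cluster * within
        # With hard assignments and equally likely items, the description
        # cost is the entropy of the cluster sizes, in bits.
        bits -= p_cluster * math.log2(p_cluster)

    return avg_similarity - temperature * bits, avg_similarity, bits
```

 With four items in two obvious blocks (similarity 1 inside a block, 0
 across) and T = 0.25, the two-cluster labelling [0, 0, 1, 1] scores 0.75,
 while lumping everything into one cluster scores 0.5. Raising T makes the
 extra bit cost more, and the single lump eventually wins.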

See http://www.pnas.org/cgi/content/abstract/0507432102v1
and
Technology Research Magazine - December 20, 2005
 http://unu-merit.nl/i&tweekly/ref.php?nid=2456

Unfortunately, only subscribers can see the full text of the article.  So
far. But here's a link to the lead author's homepage, which has much of the
same material.
http://www.princeton.edu/~nslonim/
And this is helpful, too:
http://www.genomics.princeton.edu/biophysics-theory/Clustering/web-content/index.html

A paper of related interest is:

Estimating mutual information and multi-information in large networks
Noam Slonim, Gurinder S. Atwal, Gašper Tkačik, and William Bialek
Joseph Henry Laboratories of Physics, and Lewis-Sigler
Institute for Integrative Genomics
Princeton University, Princeton, New Jersey 08544
{nslonim,gatwal,gtkacik,wbialek}@princeton.edu

Abstract
We address the practical problems of estimating the information
relations that characterize large networks. Building on methods
developed for analysis of the neural code, we show that reliable
estimates of mutual information can be obtained with manageable
computational effort. The same methods allow estimation of
higher order, multi-information terms. These ideas are illustrated
by analyses of gene expression, financial markets, and consumer
preferences. In each case, information theoretic measures correlate
with independent, intuitive measures of the underlying structures
in the system.

This is found at
http://www.genomics.princeton.edu/biophysics-theory/DirectMI/0502017.pdf
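
For anyone who wants a feel for what "estimating mutual information" means
before opening the PDF, here is a minimal sketch of the naive plug-in
estimate for two discrete variables, in Python. This is the textbook
count-based estimator, not the method from the paper, and the function name
is made up for illustration. The plug-in estimate is known to be biased
upward when samples are scarce, which is part of why getting reliable
estimates is a practical problem in the first place.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Naive plug-in estimate of I(X;Y) in bits from paired discrete samples.

    xs, ys -- equal-length sequences of hashable observations
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")

    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y

    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of counts
        mi += (count / n) * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi
```

Two perfectly matched binary sequences such as [0, 0, 1, 1] and [0, 0, 1, 1]
give 1 bit; [0, 0, 1, 1] against [0, 1, 0, 1] gives 0 bits.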

-tj
--
==============================================
J. T. Johnson
Institute for Analytic Journalism
www.analyticjournalism.com
505.577.6482(c)                                 505.473.9646(h)
http://www.jtjohnson.com               tom at jtjohnson.com

"You never change things by fighting the existing reality.
To change something, build a new model that makes the
existing model obsolete."
                                                   -- Buckminster Fuller
==============================================

Roger Critchlow-2
On 12/23/05, J T Johnson <tom at jtjohnson.com> wrote:

>
> Unfortunately, only subscribers can see the full text of the article.  So
> far. But here's a link to the lead author's homepage, which has much of the
> same material.
> http://www.princeton.edu/~nslonim/
> And this is helpful, too:
> http://www.genomics.princeton.edu/biophysics-theory/Clustering/web-content/index.html

The [Download the paper here] link on this page appears to be the
arXiv preprint of the PNAS paper. There's a GPL'ed MATLAB
implementation of the algorithm in the left sidebar if you'd rather
try first and RTFM later.

It reads very nicely, but most clustering algorithms seem reasonable.
You really need to take it for a spin and see if the results make
sense.

-- rec --