A CMU thesis proposal. I am sure machine readable documentation is, or will
be, available in case anyone is interested.
Frank
---
Frank C. Wimberly????? 140 Calle Ojo Feliz????? Santa Fe, NM 87505
Phone:???505 995-8715 or 505 670-9918 (cell)
[hidden email] or
[hidden email]
?
-----Original Message-----
From: Diane Stidle [mailto:
[hidden email]]
Sent: Friday, October 29, 2004 1:19 PM
To:
[hidden email]
Cc:
[hidden email];
[hidden email];
[hidden email]
Subject: Thesis Proposal - Tools for Large Graph Mining
Thesis Proposal - Deepayan Chakrabarti
Date: 11/12/04
Time: 1:30pm
Place: 4623 Wean Hall
Thesis Committee: Guy Blelloch, Christos Faloutsos, Chris Olston, Jon
Kleinberg
Title: Tools for Large Graph Mining
Abstract:
Graphs show up in a surprisingly diverse set of disciplines, ranging from
computer networks to sociology, biology, ecology and many more.? How do
such "normal" graphs look like? How can we spot abnormal subgraphs
within them?? Which nodes/edges are "suspicious?" How does a virus
spread over a graph?? Answering these questions is vital for outlier
detection (such as terrorist cells, money laundering rings), forecasting,
simulations (how well will a new protocol work on a realistic computer
network?), immunization campaigns and many other applications.
We attempt to answer these questions in two parts.? First, we investigate
recurring patterns in real-world graphs, to gain a deeper understanding of
their structure. This leads to the development of the R-MAT model of graph
generation for creating synthetic but "realistic" graphs, which match
many of the patterns found in real-world graphs, including power-law and
lognormal degree distributions, small diameter and "community" effects.
The second part of our work is targeted at applications: what
patterns/properties of a graph are important for solving specific
problems? Here, we investigate the propagation behavior of a computer
virus over a network, and find a simple formula for the epidemic threshold
(beyond which any viral outbreak might become an epidemic). We also
develop a scalable, parameter-free method for finding groups of
"similar" nodes in a graph, corresponding to homogeneous regions (or
CrossAssociations) in the binary adjacency matrix of the graph. This can
help navigate the structure of the graph, and find un-obvious patterns.
Our proposed work involves extending these ideas, and applying these
principles to other areas.