Event Details

Massive Graph Mining

Presenter: Dr. James Abello - Senior Member of Technical Staff, AT&T Labs and Rutgers University
Supervisor: Dr. Nigel Horspool, Chair, Department of Computer Science

Date: Thu, July 4, 2002
Time: 13:30:00 - 14:30:00
Place: Engineering Office Wing Building (EOW), Room # 430

ABSTRACT

ABSTRACT:

A variety of massive data sets exhibit an underlying structure that can be modeled as dynamic weighted multi-digraphs. Their sizes range from tens of gigabytes to petabytes. These include the World Wide Web, Internet Traffic and Telephone Call Detail. The sheer volume of these data sets brings with it a series of computational and visualization challenges due mainly to the I/O and screen bottlenecks.

We present external memory algorithms for connectivity, minimum spanning trees and maximal matchings together with heuristics for quasi-clique finding and diameter computations. From the visualization store we describe an external memory hierarchy that allow us to use computer graphics techniques, like dynamic visibility, to provide the user with navigation control. This hierarchical decomposition lends itself to a unified treatment of the I/O and screen bottlenecks.

We will discuss our results with graphs having on the order of 200 million vertices and several billion edges and we will point out some mathematical problems that have surfaced along the way.

The overall goal is to extract useful information that can be brought into a user's palm top, and to export the techniques to other mining domains.

NOTE: Dr. James Abello is a candidate for a Canada Research Chair