2006 Digital Symposium Collection

eMailSift: Email Classification Based on Structure and Content

Manu Aery and Sharma Chakravarthy

Session 11: Data Representation

Abstract

In this paper we propose a novel approach to email classification that uses both the structure and the content of the emails in a folder. Our approach is based on the premise that representative (common and recurring) structures or patterns can be extracted from a pre-classified email folder and then used effectively to classify incoming emails. A number of factors that influence representative structure extraction and classification are analyzed conceptually and validated experimentally. In our approach, the notion of inexact graph match is leveraged to derive structures that provide coverage for characterizing folder contents. Extensive experimentation validates the selection of parameters and the effectiveness of our approach for email classification.
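
The idea of extracting recurring structures from a folder and matching incoming emails against them inexactly can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a deliberately simplified star-shaped graph (one email node with a labeled edge to each token), and every name and parameter here (email_to_edges, representative_pattern, inexact_match, classify, min_support, tolerance) is hypothetical and chosen only for illustration.

```python
from collections import Counter

def email_to_edges(email):
    """Turn an email (dict of field -> text) into a set of (field, token) edges.

    Assumption: a star-shaped graph with one central email node and one
    labeled edge per token. This is a simplification for illustration.
    """
    return {(field, tok) for field, text in email.items()
            for tok in text.lower().split()}

def representative_pattern(folder, min_support=0.6):
    """Edges that recur in at least min_support of the folder's emails."""
    counts = Counter(edge for email in folder for edge in email_to_edges(email))
    needed = min_support * len(folder)
    return {edge for edge, n in counts.items() if n >= needed}

def inexact_match(pattern, email, tolerance=0.25):
    """True if at most tolerance * |pattern| pattern edges are missing."""
    if not pattern:
        return False
    missing = len(pattern - email_to_edges(email))
    return missing <= tolerance * len(pattern)

def classify(email, patterns, tolerance=0.25):
    """Return the folder whose pattern the email matches best, or None."""
    best, best_score = None, 0.0
    edges = email_to_edges(email)
    for name, pattern in patterns.items():
        if inexact_match(pattern, email, tolerance):
            score = len(pattern & edges) / len(pattern)
            if score > best_score:
                best, best_score = name, score
    return best

# Made-up, pre-classified folders used only to exercise the sketch.
folders = {
    "travel": [
        {"subject": "flight booking confirmation", "from": "agent@airline.example"},
        {"subject": "flight booking update", "from": "agent@airline.example"},
        {"subject": "hotel booking confirmation", "from": "desk@hotel.example"},
    ],
    "grants": [
        {"subject": "grant proposal deadline", "from": "office@univ.example"},
        {"subject": "grant proposal review", "from": "office@univ.example"},
    ],
}
patterns = {name: representative_pattern(msgs) for name, msgs in folders.items()}

incoming = {"subject": "flight booking receipt", "from": "agent@airline.example"}
print(classify(incoming, patterns))  # travel
```

In this sketch the incoming email lacks the "confirmation" edge of the travel pattern, yet it is still assigned to that folder because one missing edge out of four falls within the tolerance. That is the role an inexact match plays here: a pattern can cover emails that resemble it without reproducing it exactly.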