Welcome to D
SIGMOD 2005
PODS 2005
SIGMOD-RECOR
CIDR 2005
CIKM 2005
COMAD 2005
CVDB 2005
DaMoN 2005
Data Enginee
DEBS05
DMSN 2005
DOLAP 2005
GIR 2005
GIS 2005
Hypertext 20
ICDE 2005
ICDM 2005
IHIS 2005
IQIS 2005
JCDL 2005
KRAS 2005
MDM 2005
MIR 2005
MobiDE 2005
P2PIR 2005
RIDE 2005
SBBD 2005
SIGIR 2005
<<< = SIGIR'05 Pap>>>
SIGIR-FORUM
SIGKDD 2005
SIGKDD-EXP
SSDBM 2005
TIME 2005
TKDE 2005
TODS 2005
VLDB 2005
VLDBJ 2005
WebDB 2005
WIDM 2005

An application of text categorization methods to gene ontology annotation


Kazuhiro Seki and Javed Mostafa

  View Paper (PDF)  

Return to Categorization and classification


Abstract

This paper describes an application of IR and text categorization methods to a highly practical problem in biomedicine, specifically, Gene Ontology (GO) annotation. GO annotation is a major activity in most model organism database projects and annotates gene functions using a controlled vocabulary. As a first step toward automatic GO annotation, we aim to assign GO domain codes given a specific gene and an article in which the gene appears, which is one of the task challenges at the TREC 2004 Genomics Track. We approached the task with careful consideration of the specialized terminology and paid special attention to dealing with various forms of gene synonyms, so as to exhaustively locate the occurrences of the target gene. We extracted the words around the gene occurrences and used them to represent the gene for GO domain code annotation. As a classifier, we adopted a variant of k-Nearest Neighbor (kNN) with supervised term weighting schemes to improve the performance, making our method among the top-performing systems in the TREC official evaluation. Moreover, it is demonstrated that our proposed framework is successfully applied to another task of the Genomics Track, showing comparable results to the best performing system.

BIBTEX


@inproceedings{1076060,
  author = {Kazuhiro Seki and Javed Mostafa},
  title = {An application of text categorization methods to gene ontology annotation},
  booktitle = {SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval},
  year = {2005},
  isbn = {1-59593-034-5},
  pages = {138--145},
  location = {Salvador, Brazil},
  doi = {http://doi.acm.org/10.1145/1076034.1076060},
  publisher = {ACM Press},
  address = {New York, NY, USA},
  
}



©2006 Association for Computing Machinery