Information Extraction from Biomedical Literature: Methodology, Evaluation and an Application

Subramaniam L Venkata; Mukherjea Sougata; Kankar Pankaj; Srivastava Biplav; Batra Vishal S; Kamesam Pasumarti V; Kothari Ravi

dc.contributor.author	Subramaniam L Venkata
dc.contributor.author	Mukherjea Sougata
dc.contributor.author	Kankar Pankaj
dc.contributor.author	Srivastava Biplav
dc.contributor.author	Batra Vishal S
dc.contributor.author	Kamesam Pasumarti V
dc.contributor.author	Kothari Ravi
dc.date.accessioned	2018-01-22T17:24:35Z
dc.date.available	2018-01-22T17:24:35Z
dc.date.issued	2003
dc.identifier.uri	http://hdl.handle.net/123456789/6929
dc.description.abstract	Journals and conference proceedings are the dominant mechanisms for reporting new biomedical results. The unstructured nature of such publications makes it difficult to apply data mining or automated knowledge discovery techniques. Annotation (or markup) of these unstructured documents is the first step in making them machine analyzable. In this paper we first present a system called BioAnnotator for identifying and annotating biological terms in documents. BioAnnotator uses domain-based dictionary look-up for recognizing known terms and a rule engine for discovering new terms. The combination of dictionary look-up and rules results in good performance (87% precision and 94% recall on the GENIA 1.1 corpus for extracting general biological terms based on an approximate matching criterion). To demonstrate the subsequent mining and knowledge discovery activities that are made feasible by BioAnnotator, we also present a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.
dc.format	application/pdf
dc.title	Information Extraction from Biomedical Literature: Methodology, Evaluation and an Application
dc.type	generic
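
The abstract describes combining dictionary look-up for known terms with rules for discovering new ones. The abstract does not give BioAnnotator's actual dictionary or rule set, so the following is only a minimal sketch of the general idea: the term list `KNOWN_TERMS`, the suffix pattern `SUFFIX_RULE`, and the function `annotate` are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical mini-dictionary; a real system would load a domain lexicon.
KNOWN_TERMS = {"interleukin-2", "t cell"}

# Hypothetical rule: a word ending in "ase" is treated as a candidate new term.
SUFFIX_RULE = re.compile(r"\b[A-Za-z0-9-]+ase\b")


def annotate(text):
    """Return (start, end, term, source) spans, sorted by position."""
    spans = []
    lowered = text.lower()

    # Step 1: case-insensitive dictionary look-up for known terms.
    for term in KNOWN_TERMS:
        for m in re.finditer(re.escape(term), lowered):
            spans.append((m.start(), m.end(), text[m.start():m.end()], "dictionary"))

    # Step 2: rule-based discovery, skipping anything that overlaps a dictionary hit.
    for m in SUFFIX_RULE.finditer(text):
        overlaps = any(s < m.end() and m.start() < e for s, e, _, _ in spans)
        if not overlaps:
            spans.append((m.start(), m.end(), m.group(0), "rule"))

    return sorted(spans)


sample = "Interleukin-2 activates tyrosine kinase in T cell lines."
for start, end, term, source in annotate(sample):
    print(f"{start:>3}-{end:<3} {term:<15} {source}")
```

Giving dictionary hits priority over rule matches is a design choice made for this sketch; the abstract does not say how the original system resolves overlaps between the two sources.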