"Is This Document Relevant? . . . Probably": A Survey of Probabilistic Models in Information Retrieval

Crestani Fabio; Lalmas Mounia; Rijsbergen Cornelis J Van; Campbell Iain

dc.contributor.author	Crestani Fabio
dc.contributor.author	Lalmas Mounia
dc.contributor.author	Rijsbergen Cornelis J Van
dc.contributor.author	Campbell Iain
dc.date.accessioned	2018-01-16T18:50:45Z
dc.date.available	2018-01-16T18:50:45Z
dc.date.issued	1999
dc.identifier.uri	http://hdl.handle.net/123456789/6266
dc.description.abstract	This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined, and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
1. HISTORY OF PROBABILISTIC MODELING IN IR
In information retrieval (IR), probabilistic modeling is the use of a model that ranks documents in decreasing order of their evaluated probability of relevance to a user's information needs. Past and present research has made much use of formal theories of probability and of statistics in order to evaluate, or at least estimate, those probabilities of relevance. These attempts are to be distinguished from looser ones such as the "vector space model" [Salton 1968], in which documents are ranked according to a measure of similarity to the query. A measure of similarity is not directly interpretable as a probability. In addition, similarity-based models generally lack the theoretical soundness of probabilistic models. The first attempts to develop a probabilistic theory of retrieval were made over 30 years ago [Maron and Kuhns 1960; Miller 1971], and since then there has been a steady development of the approach. There are already several operational IR systems based upon probabilistic or semiprobabilistic models. One major obstacle in probabilistic or semiprobabilistic IR models is finding methods, both theoretically sound and computationally efficient, for estimating the probabilities used to evaluate the probability of relevance. The problem of estimating these probabilities is difficult to tackle unless some simplifying assumptions are made. In the early ...
dc.format	application/pdf
dc.title	"Is This Document Relevant? . . . Probably": A Survey of Probabilistic Models in Information Retrieval
dc.type	journal-article
dc.source.volume	30
dc.source.issue	4
dc.source.journal	ACM Computing Surveys
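The abstract contrasts ranking documents by an estimated probability of relevance with the similarity ranking of the vector space model, and notes that the probabilities can only be estimated under simplifying assumptions. The Python sketch below illustrates that contrast under assumptions of our own, not the survey's: one classic simplification (binary term independence with no relevance information, so each query term's probability of occurring in relevant documents is fixed at 0.5 and its probability in non-relevant documents is estimated from document frequency with 0.5 smoothing), a hypothetical toy collection, and hypothetical function names `rank_probabilistic` and `rank_cosine`. The probabilistic score is a sum of log-odds contributions, so it orders documents the same way as the estimated probability of relevance without being that probability itself.

```python
import math
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "probabilistic retrieval ranks documents by probability of relevance",
    "d2": "the vector space model ranks documents by similarity to the query",
    "d3": "feedback improves probability estimates",
    "d4": "cooking recipes and kitchen tips",
    "d5": "gardening in small urban spaces",
    "d6": "a history of library catalogues",
}


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately naive assumption)."""
    return text.lower().split()


def term_weights(docs, query_terms):
    """Log-odds weight per query term, assuming term independence and no
    relevance information: log((N - n_t + 0.5) / (n_t + 0.5)), where N is the
    collection size and n_t the number of documents containing the term."""
    n_docs = len(docs)
    doc_sets = [set(tokenize(text)) for text in docs.values()]
    weights = {}
    for term in query_terms:
        n_t = sum(1 for terms in doc_sets if term in terms)
        weights[term] = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))
    return weights


def rank_probabilistic(docs, query):
    """Rank by summed log-odds weights of the query terms present in each
    document; under the stated assumptions this ordering matches decreasing
    estimated probability of relevance."""
    query_terms = set(tokenize(query))
    weights = term_weights(docs, query_terms)
    scores = {}
    for doc_id, text in docs.items():
        present = query_terms & set(tokenize(text))
        scores[doc_id] = sum(weights[t] for t in present)
    # sorted() is stable, so tied documents keep their collection order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def rank_cosine(docs, query):
    """Vector-space ranking: cosine similarity of raw term-frequency vectors.
    The score is a similarity in [0, 1], not a probability."""
    q = Counter(tokenize(query))

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    scores = {d: cosine(q, Counter(tokenize(t))) for d, t in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, ranker in (("probabilistic", rank_probabilistic), ("cosine", rank_cosine)):
        print(name, ranker(DOCS, "probability relevance")[:3])
```

On this toy data both rankers place `d1` first and `d3` second; the difference lies in what the scores mean. The cosine value measures vector overlap only, while the probabilistic score rewards rarer query terms ("relevance" occurs in one document, "probability" in two) because, under the independence assumption, they carry more evidence about relevance.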