| dc.description.abstract |
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined, and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described. 1. HISTORY OF PROBABILISTIC MODELING IN IR. In information retrieval (IR), probabilistic modeling is the use of a model that ranks documents in decreasing order of their evaluated probability of relevance to a user's information needs. Past and present research has made much use of formal theories of probability and of statistics in order to evaluate, or at least estimate, those probabilities of relevance. These attempts are to be distinguished from looser ones such as the "vector space model" [Salton 1968], in which documents are ranked according to a measure of similarity to the query. A measure of similarity is not directly interpretable as a probability. In addition, similarity-based models generally lack the theoretical soundness of probabilistic models. The first attempts to develop a probabilistic theory of retrieval were made over 30 years ago [Maron and Kuhns 1960; Miller 1971], and since then there has been a steady development of the approach. There are already several operational IR systems based upon probabilistic or semiprobabilistic models. One major obstacle for probabilistic or semiprobabilistic IR models is finding methods, both theoretically sound and computationally efficient, for estimating the probabilities used to evaluate the probability of relevance. The problem of estimating these probabilities is difficult to tackle unless some simplifying assumptions are made. In the early |
|