A Comparison of Event Models for Naive Bayes Text Classification

Andrew McCallum‡†    Kamal Nigam†
mccallum@ ******@

‡Just Research
4616 Henry Street
Pittsburgh, PA 15213

†School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract

Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by

learning, especially when the number of attributes is large.

Document classification is just such a domain with a large number of attributes. The attributes of the examples to be classified are words, and the number of different words can be quite large indeed. While some simple document classification tasks can be accurately performed with vocabulary sizes less than one hundred, complex tasks on real-world data from the Web and newswire articles do best with vocabulary sizes in the thousands. Naive Bayes has been successfully
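The distinction between the two event models can be made concrete in code. The sketch below is a minimal illustration, not the paper's implementation: the vocabulary and per-class word probabilities are made-up toy values, chosen only to show that the multi-variate Bernoulli model treats a document as a vector of binary word-presence features (so absent words contribute (1 − p) factors and repeated words count once), while the multinomial model treats it as a bag of word occurrences (so each occurrence contributes a factor and absent words contribute nothing).

```python
import math

# Hypothetical toy vocabulary and per-class word probabilities.
# These numbers are invented purely for illustration.
vocab = ["ball", "game", "election", "vote"]
p_word = {
    "sports":   [0.4, 0.4, 0.1, 0.1],
    "politics": [0.1, 0.1, 0.4, 0.4],
}

def bernoulli_log_likelihood(doc_words, cls):
    """Multi-variate Bernoulli event model: every vocabulary word is a
    binary feature; words absent from the document contribute log(1 - p)."""
    present = set(doc_words)          # presence/absence only; counts ignored
    ll = 0.0
    for w, p in zip(vocab, p_word[cls]):
        ll += math.log(p) if w in present else math.log(1.0 - p)
    return ll

def multinomial_log_likelihood(doc_words, cls):
    """Multinomial event model: a uni-gram language model; every word
    occurrence contributes log p(w | class), so counts matter."""
    probs = dict(zip(vocab, p_word[cls]))
    return sum(math.log(probs[w]) for w in doc_words)

doc = ["ball", "ball", "game"]
# The Bernoulli likelihood ignores the repeated "ball";
# the multinomial likelihood counts it twice.
```

In a full classifier, either log-likelihood would be combined with a class prior via Bayes' rule; here only the per-class document likelihoods are shown, since that is exactly where the two event models differ.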