4.1.3 Knowledge Combination
The HMM structure makes strong independence assumptions: (1) that features depend only on the current state (and in practice, as we saw, only on the event label) and (2) that each word+event label depends only on the last N−1 tokens. In return, we get a computationally efficient structure that allows information from the entire sequence W, F to inform the posterior probabilities needed for classification, via the forward-backward algorithm.
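To make the role of the forward-backward algorithm concrete, the following is a minimal sketch, not the authors' implementation, of how per-position state posteriors can be computed for an HMM. The initial distribution pi, transition matrix A, and per-position emission likelihoods B are hypothetical stand-ins for the word+event N-gram and feature models described above.

```python
import numpy as np

def forward_backward_posteriors(pi, A, B):
    """pi: (S,) initial state probs; A: (S, S) transition probs;
    B: (T, S) emission likelihoods P(f_t | state) at each position t.
    Returns a (T, S) array of posteriors P(state_t | entire sequence)."""
    T, S = B.shape
    alpha = np.zeros((T, S))   # scaled forward probabilities
    beta = np.zeros((T, S))    # scaled backward probabilities
    scale = np.zeros(T)

    # Forward pass: accumulate evidence from the start of the sequence.
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass: accumulate evidence from the end of the sequence.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    # Posterior over states at each position, given the whole sequence.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

The point of the sketch is that each posterior combines evidence from both directions of the sequence, which is exactly what makes the HMM decomposition attractive despite its strong independence assumptions.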
More problematic in practice is the integration
of multiple word-level features, such as POS tags
and chunker output. Theoretically, all tags could
simply be included in the hidden state representation
to allow joint modeling of words, tags, and
events. However, this would drastically increase the
size of the state space, making robust model estimation
with standard N-gram techniques difficult. A
method that works well in practice is linear interpolation,
whereby the conditional probability estimates
of various models are simply averaged, thus
reducing variance.
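As an illustration (assumed, not taken from the paper), the sketch below linearly interpolates conditional probability estimates from two hypothetical component models, such as a word-based and a POS-based event model. Equal weights correspond to the simple averaging described above; in practice the weight could also be tuned on held-out data.

```python
def interpolate(p_word, p_pos, lam=0.5):
    """Return lam * p_word + (1 - lam) * p_pos for each event label."""
    return {e: lam * p_word[e] + (1.0 - lam) * p_pos.get(e, 0.0)
            for e in p_word}

# Example: estimates of P(event | context) from the two component models.
p_word = {"boundary": 0.30, "no_boundary": 0.70}
p_pos  = {"boundary": 0.10, "no_boundary": 0.90}
print(interpolate(p_word, p_pos))   # {'boundary': 0.2, 'no_boundary': 0.8}
```

Because the interpolated estimate averages over models, an overconfident estimate from any single sparsely trained component is damped, which is the variance-reduction effect the text refers to.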