On Jan 9, 2010, at 12:18 PM, Grant Ingersoll wrote:
For text, you can actually compute perplexity, which measures how well
cluster membership predicts what words are used. This is nice because you
don't have to worry about the entropy of real-valued numbers.
Do you have a good ref. on perplexity and/or some R code (or other)?
In looking a little more at this (via http://en.wikipedia.org/wiki/Perplexity), it seems we may already have most of this, given that o.a.m.math.stats.LogLikelihood has the entropy calculation and perplexity
is just b^entropy(), right? Or am I misreading?
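
To make sure I have the definition straight, here is a minimal sketch in Python of what I mean by b^entropy. It does not use Mahout at all; the function names entropy and perplexity are made up for illustration. It only covers the perplexity of a single word distribution estimated from raw counts, not the cluster-conditional case described above.

```python
import math

def entropy(counts, base=2.0):
    """Shannon entropy, in units of log `base`, of the distribution
    obtained by normalizing a list of raw counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing (0 * log 0 is taken as 0)
            p = c / total
            h -= p * math.log(p, base)
    return h

def perplexity(counts, base=2.0):
    """Perplexity of the normalized distribution: base ** entropy."""
    return base ** entropy(counts, base)

# Four equally frequent words: entropy is 2 bits, so perplexity is about 4.
print(perplexity([1, 1, 1, 1]))
```

If that is right, a uniform distribution over k words should give a perplexity of k, and a single word that always occurs should give 1, regardless of the base chosen.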
-Grant