3.4 Smoothing
Last updated
Was this helpful?
Last updated
Was this helpful?
DEF Smoothing / discounting means shaving off some probability mass from frequent events and giving it to unseen events.
Laplace smoothing / add one smoothing increase every count of words by 1. Therefore .
adjusted count is a virtual count measuring the effect of a smoothing algorithm. Normalizing adjusted count by N will give us the smoothed probability. In the case of Laplace smoothing, .
discount is defined as the ratio of the adjusted counts to the original counts: .
Add-k smoothing increase every count of words by k. Therefore .
It turns out that add-k still doesn’t work well for language modeling, generating counts with poor variances and often inappropriate discounts. ( ref: [1] )
[1] Gale, W. A. and Church, K. W. (1994). What is wrong with adding one?. In Oostdijk, N. and de Haan, P. (Eds.), Corpus-Based Research into Language, 189–198. Rodopi.