3.4 Smoothing
DEF Smoothing (discounting) means shaving off some probability mass from more frequent events and giving it to events never seen in training.
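Laplace smoothing (add-one smoothing) increases the count of every word by 1 before normalizing. Writing $c_i = \text{count}(w_i)$, $N$ for the total number of word tokens and $|V|$ for the vocabulary size, $P_L(w_i) = \frac{c_i + 1}{N + |V|}$. The denominator grows by $|V|$ because each of the $|V|$ word types receives one extra count, so the probabilities still sum to 1.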
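A minimal Python sketch of the add-one unigram estimate, assuming a tokenized corpus and a fixed vocabulary that contains every corpus token plus possibly some unseen words. The function name `laplace_unigram_probs` and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter


def laplace_unigram_probs(tokens, vocab):
    """Add-one (Laplace) smoothed unigram probabilities.

    P_L(w) = (count(w) + 1) / (N + |V|), where N = len(tokens)
    and |V| = len(vocab). Hypothetical helper for illustration.
    """
    counts = Counter(tokens)  # words absent from the corpus get count 0
    n = len(tokens)
    v = len(vocab)
    return {w: (counts[w] + 1) / (n + v) for w in vocab}


# Hypothetical toy corpus; "dog" is in the vocabulary but never observed.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {"the", "cat", "sat", "on", "mat", "dog"}

probs = laplace_unigram_probs(tokens, vocab)
print(probs["the"])  # (2 + 1) / (6 + 6) = 0.25
print(probs["dog"])  # (0 + 1) / (6 + 6) = 1/12
```

The unseen word "dog" now gets a nonzero probability, and the numerators add up to 3 + 2 + 2 + 2 + 2 + 1 = 12 = N + |V|, so the distribution still sums to 1.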
The adjusted count $c^*$ is a virtual count that describes how a smoothing algorithm affects the numerator: normalizing the adjusted count by $N$ gives the smoothed probability. For Laplace smoothing, $c_i^* = N \, P_L(w_i) = (c_i + 1)\frac{N}{N + |V|}$.
The discount $d_c$ is the ratio of the adjusted count to the original count: $d_c = \frac{c^*}{c}$.
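The adjusted count and the discount can be computed directly from the Laplace formula. This is a small sketch under the same assumptions as a plain unigram model (tokenized corpus, vocabulary covering every token); the helper names `laplace_adjusted_counts` and `discounts` are hypothetical. The discount is only computed for words with $c > 0$, since $c^*/c$ is undefined for unseen words.

```python
from collections import Counter


def laplace_adjusted_counts(tokens, vocab):
    """Adjusted counts c* = N * P_L(w) = (c + 1) * N / (N + |V|)."""
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    return {w: (counts[w] + 1) * n / (n + v) for w in vocab}


def discounts(tokens, adjusted):
    """Discount d_c = c* / c, defined only for words seen at least once."""
    counts = Counter(tokens)
    return {w: adjusted[w] / counts[w] for w in adjusted if counts[w] > 0}


# Hypothetical toy corpus: N = 6 tokens, |V| = 6 word types.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {"the", "cat", "sat", "on", "mat", "dog"}

adjusted = laplace_adjusted_counts(tokens, vocab)
d = discounts(tokens, adjusted)
print(adjusted["the"], d["the"])  # 1.5 0.75  (c = 2 shrinks to 1.5)
print(adjusted["cat"], d["cat"])  # 1.0 1.0   (c = 1 is unchanged here)
print(adjusted["dog"])            # 0.5       (the unseen word gains mass)
```

The adjusted counts still sum to $N$ (1.5 + 4 × 1.0 + 0.5 = 6): the 0.5 shaved off "the" is exactly what the unseen word "dog" receives.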
Add-k smoothing increases the count of every word by $k$ (usually a fraction smaller than 1) instead of 1. Therefore $P_k(w_i) = \frac{c_i + k}{N + k|V|}$.
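A minimal sketch of add-k for unigrams, again assuming a tokenized corpus and a vocabulary that covers every token. The function name `add_k_unigram_probs` and the particular $k$ values tried are hypothetical; in practice $k$ has to be chosen somehow, for example by tuning on held-out data.

```python
from collections import Counter


def add_k_unigram_probs(tokens, vocab, k):
    """Add-k smoothed unigram probabilities: (c + k) / (N + k|V|)."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    return {w: (counts[w] + k) / (n + k * v) for w in vocab}


# Hypothetical toy corpus; "dog" is in the vocabulary but never observed.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {"the", "cat", "sat", "on", "mat", "dog"}

for k in (1.0, 0.5, 0.05):
    probs = add_k_unigram_probs(tokens, vocab, k)
    print(k, round(probs["dog"], 4))
```

With $k = 1$ this reduces to Laplace smoothing; as $k$ shrinks, less probability mass is moved onto the unseen word.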
It turns out that add-k still does not work well for language modeling: it generates counts with poor variances and often inappropriate discounts [1].
Reference
[1] Gale, W. A. and Church, K. W. (1994). What is wrong with adding one? In Oostdijk, N. and de Haan, P. (Eds.), Corpus-Based Research into Language, 189–198. Rodopi.