3.4 Smoothing

DEF Smoothing / discounting means shaving off some probability mass from frequent events and giving it to unseen events.

Laplace smoothing / add-one smoothing increases every word count by 1 before normalizing. With $N$ the total number of tokens, $|V|$ the vocabulary size, and $c_i=\mathrm{count}(w_i)$, this gives $P_L(w_i)=\frac{c_i+1}{N+|V|}$ .
- adjusted count $c_i^*$ is a virtual count that measures the effect of a smoothing algorithm on the original count. Normalizing the adjusted count by $N$ gives the smoothed probability. In the case of Laplace smoothing, $c_i^* = NP_L(w_i) = (c_i+1)\frac{N}{N+|V|}$ .
- discount $d_c$ is the ratio of the adjusted count to the original count: $d_c=c^*/c$ . It is defined only for $c>0$ .
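
As a worked illustration with made-up numbers (not taken from the source), assume a toy corpus of $N=10$ tokens over a vocabulary of $|V|=4$ words with counts 6, 3, 1 and 0. For the most frequent word ($c=6$), the Laplace formulas above give:

$$
P_L = \frac{6+1}{10+4} = 0.5, \qquad c^* = (6+1)\frac{10}{14} = 5, \qquad d_c = \frac{5}{6} \approx 0.83
$$

For the unseen word ($c=0$), the discount is undefined, but the probability and adjusted count are:

$$
P_L = \frac{0+1}{10+4} \approx 0.071, \qquad c^* = (0+1)\frac{10}{14} \approx 0.71
$$

In this toy case the frequent word gives up one count's worth of mass, and the adjusted counts of all four words still sum to $N=10$.
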
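
Add-k smoothing increases every word count by a fractional amount $k$ (for example 0.5 or 0.05) instead of 1. Therefore $P_k(w_i) = \frac{c_i+k}{N+k|V|}$ . Setting $k=1$ recovers Laplace smoothing.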
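
The formulas in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`add_k_probs`, `adjusted_counts`, `discounts`) and the toy corpus are hypothetical, and the sketch assumes every token in the corpus belongs to the vocabulary.

```python
from collections import Counter


def add_k_probs(tokens, vocab, k=1.0):
    """Add-k unigram estimate P_k(w) = (c_w + k) / (N + k|V|) for each w in vocab.

    Assumes every token appears in vocab; k=1 is Laplace (add-one) smoothing.
    """
    counts = Counter(tokens)
    n = len(tokens)                      # N: total number of tokens
    denom = n + k * len(vocab)           # N + k|V|
    return {w: (counts[w] + k) / denom for w in vocab}


def adjusted_counts(probs, n):
    """Adjusted count c* = N * P(w): the smoothed probability expressed as a count."""
    return {w: n * p for w, p in probs.items()}


def discounts(tokens, adjusted):
    """Discount d_c = c* / c, computed only for words with a nonzero original count."""
    counts = Counter(tokens)
    return {w: adjusted[w] / counts[w] for w in adjusted if counts[w] > 0}


# Hypothetical toy corpus: 10 tokens, vocabulary of 4 words, "d" is unseen.
tokens = ["a"] * 6 + ["b"] * 3 + ["c"]
vocab = ["a", "b", "c", "d"]

laplace = add_k_probs(tokens, vocab, k=1.0)
half = add_k_probs(tokens, vocab, k=0.5)
c_star = adjusted_counts(laplace, len(tokens))
d = discounts(tokens, c_star)

for w in vocab:
    print(w, round(laplace[w], 3), round(half[w], 3), round(c_star[w], 3))
print(d)
```

A smaller `k` moves less mass to the unseen word `"d"`, so the frequent words are discounted less than under add-one smoothing. In both cases the probabilities sum to 1.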

It turns out that add-k still doesn’t work well for language modeling: it generates counts with poor variances and often inappropriate discounts [1].

Reference

[1] Gale, W. A. and Church, K. W. (1994). What is wrong with adding one? In Oostdijk, N. and de Haan, P. (Eds.), Corpus-Based Research into Language, 189–198. Rodopi.
