oneminusp.com Computational Finance, Markets, Programming & co

13 Feb 2010

Entropy estimators and predictability

In previous posts I discussed the local uncertainty h_n^{(1)} and the block entropy H_n. We also saw a rapid decrease in the uncertainty estimated from H_n as n grows, which is due to sampling errors. With larger n our empirical probability estimate n_i / n (the count of the i-th n-gram over the total number of observed n-grams) gets worse because many more samples would be required to "fill up the histogram", i.e. some n-grams are never observed and the ones that are observed have poor probability estimates.
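To make the "fill up the histogram" problem concrete, here is a minimal sketch that measures which fraction of all possible n-grams actually shows up in a fixed-length sample. The random binary sequence, the sample length of 1000 and the helper names `ngram_counts` and `coverage` are assumptions made for illustration; they are not taken from the earlier posts.

```python
import random
from collections import Counter

def ngram_counts(symbols, n):
    """Count overlapping n-grams (as tuples) in a sequence."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def coverage(symbols, n, alphabet_size):
    """Fraction of the alphabet_size**n possible n-grams seen at least once."""
    return len(ngram_counts(symbols, n)) / alphabet_size ** n

# Hypothetical data: 1000 fair coin flips (an assumption, not market data).
random.seed(0)
seq = [random.randint(0, 1) for _ in range(1000)]

for n in (2, 6, 10, 14):
    print(n, coverage(seq, n, 2))
```

With 1000 symbols there are at most 987 overlapping 14-grams, while 2^14 = 16384 are possible, so most of the histogram is necessarily empty at that block length no matter what the source looks like.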

There is a vast number of papers and techniques on reducing the bias and variance of entropy estimates, and I decided to write a few posts about this, with the aim of finding the best entropy estimators for our (local) uncertainty measure. With a suitable entropy estimator we will be able to analyse local predictabilities conditioned on a larger number of previous symbols with higher significance.

The estimator we have used so far is called the "plug-in" or maximum likelihood estimator (MLE) and is defined as

\hat{H}(X) = - \sum_{x \in X} \hat{P}(x) \log \hat{P}(x)

where \hat{P}(x) = n_x / n, i.e. the number of occurrences n_x of the word x divided by the total number n of observed words. It is well known that the maximum likelihood estimator is negatively biased. What does that mean?
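As a quick illustration, the plug-in estimator can be sketched in a few lines and tried on a source whose true entropy is known. The choices below are assumptions made for the example, not part of the definition above: base-2 logarithms (bits), a uniform source over 8 symbols, a deliberately small sample of 20 draws, and the function name `plugin_entropy`.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (maximum likelihood) entropy estimate, in bits."""
    n = len(samples)
    counts = Counter(samples)
    # P_hat(x) = n_x / n; symbols that never occur contribute nothing.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(1)
k = 8                    # alphabet size (assumption for this sketch)
true_h = math.log2(k)    # entropy of a uniform source over k symbols: 3 bits
n = 20                   # deliberately small sample size
trials = 2000

# Average the estimate over many independent small samples.
avg = sum(plugin_entropy([random.randrange(k) for _ in range(n)])
          for _ in range(trials)) / trials
print("true entropy:", true_h, "mean plug-in estimate:", avg)
```

Averaged over many repetitions, the estimate stays below the true value of 3 bits: with 20 draws over 8 symbols the empirical distribution can never be exactly uniform, so every single estimate falls short.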