Entropy estimators and predictability
In previous posts I discussed the local uncertainty and the block entropy. We also saw a rapid decrease in uncertainty, which is due to sampling errors. With larger n our empirical probability estimate gets worse, because more samples are required to "fill up the histogram": some n-grams are never observed, and the observed ones have poor probability estimates.
There is a vast number of papers and techniques for reducing the bias and variance of entropy estimates, and I decided to write a few posts about them, with the aim of finding the best entropy estimator for our (local) uncertainty measure. With a suitable entropy estimator we will be able to analyse local predictabilities conditioned on a larger number of previous symbols with higher significance.
The estimator we have used so far is called the "plug-in" or maximum likelihood estimator (MLE) and is defined as
$$\hat{H} = -\sum_{x} \hat{p}(x) \log \hat{p}(x)$$
where $\hat{p}(x) = n_x / N$, so $n_x$ is the number of occurrences of the word x in the whole space of N observed words. It is well known that the MLE is negatively biased. What does that mean?
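As a rough illustration of the plug-in estimator and its negative bias, here is a minimal sketch in R. The function name plugin_entropy, the base-2 logarithm, and the simulated uniform 3-symbol source are all assumptions made for this example, not part of the earlier posts.

```r
# Plug-in (maximum likelihood) entropy estimate from a vector of observed symbols.
# base = 2 (bits) is an assumption; the post does not state the log base.
plugin_entropy <- function(x, base = 2) {
  counts <- as.numeric(table(x))   # n_x: occurrences of each observed symbol
  p_hat <- counts / sum(counts)    # relative frequencies n_x / N
  -sum(p_hat * log(p_hat, base = base))
}

# Illustration of the negative bias on small samples (hypothetical setup):
# draw short sequences from a uniform 3-symbol source with true entropy log2(3).
set.seed(1)
true_entropy <- log2(3)
estimates <- replicate(2000, plugin_entropy(sample(0:2, 10, replace = TRUE)))
mean_estimate <- mean(estimates)
bias <- mean_estimate - true_entropy   # negative: the estimate undershoots on average
```

With only 10 samples per sequence the estimate can never exceed log2(3), so its average necessarily falls below the true value. This is the kind of systematic underestimation that "negatively biased" refers to.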
Local order and predictability – Implementation
Part 1 discussed a paper on local order and predictability of time series. I will now describe the implementation of the described functions in R.
First we assume that we already have our real returns data partitioned into symbols, so that the alphabet size is 3. Thus our time series is just a vector of the values 0, 1 and 2.
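For readers who do not have partitioned data at hand, one possible way to obtain such a vector is sketched below. The post does not say how the partition was built, so the equal-probability (tercile) bins and the simulated returns are purely assumptions for illustration.

```r
# Hypothetical example data: simulated daily returns standing in for real ones.
set.seed(42)
returns <- rnorm(300, mean = 0, sd = 0.01)

# Assumption: equal-probability bins, with break points at the empirical terciles.
breaks <- quantile(returns, probs = c(0, 1/3, 2/3, 1))

# cut() labels the bins 1..3; subtracting 1 gives the symbols 0, 1, 2.
# Note: heavily tied data can produce duplicate breaks, which cut() rejects.
symbols <- as.integer(cut(returns, breaks = breaks, include.lowest = TRUE)) - 1L
```

Equal-probability bins keep the three symbols about equally frequent, but other partitions (for example fixed thresholds around zero) would work with the rest of the code just as well.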
Next, all our functions will consider trajectories of that original vector. I will implement this as a sliding window of length n. So if our sequence is 012020120 the function slide will create the array 012, 120, 202, 020, 201, 012, 120 out of it.
slide <- function(seq, windowsize) {
  # One window per valid start position (the +1 keeps the last window).
  steps <- length(seq) - windowsize + 1
  # Each row of accu holds one window of length windowsize.
  accu <- array(0, dim = c(steps, windowsize))
  for (i in seq_len(steps)) {
    # Window i covers positions i .. i + windowsize - 1.
    accu[i, ] <- seq[i:(i + windowsize - 1)]
  }
  return(accu)
}
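As a cross-check of the sliding window on the example sequence, the sketch below builds the same windows with embed() from the stats package and counts the resulting n-grams. The helper name windows_of is hypothetical, and this is an alternative formulation for illustration rather than the implementation used in the post.

```r
# Sliding windows via embed() from the stats package (attached by default).
# embed() returns each window in reverse time order, so the columns are flipped back.
windows_of <- function(x, n) {
  embed(x, n)[, n:1, drop = FALSE]
}

x <- c(0, 1, 2, 0, 2, 0, 1, 2, 0)           # the example sequence 012020120
w <- windows_of(x, 3)                        # one row per window
words <- apply(w, 1, paste, collapse = "")   # one string per window, e.g. "012"
ngram_counts <- table(words)                 # occurrences of each observed 3-gram
```

For the 9-symbol example this gives 7 windows, and the counts in ngram_counts are the raw material for the empirical probabilities used in the entropy estimates.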
Local order and predictability of financial time series
In this series of posts I will discuss an implementation and tests of the paper Local order, entropy and predictability of financial time series by L. Molgedey and W. Ebeling. (pdf)
The paper presents an excellent application of information theory to time series analysis. The idea is simple: is it possible to find sub-trajectories in financial time series (here the daily returns of some indices or stocks) where a "local order" exists with higher than average predictability?
I won't explain the paper in full, so please have a look at the pdf above for notation and details. However, I will describe the most important concepts below. We consider one-dimensional, discretely partitioned time series. The authors use the Shannon entropy H as the basic tool to measure the uncertainty or predictability of the probability distribution described by the time series. For a certain trajectory of length n, the uncertainty of predicting the next state is the difference in Shannon entropies for trajectories of length n+1 and n:
$$h_n = H_{n+1} - H_n$$
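The difference of block entropies described above can be sketched in R as follows. The helper names block_entropy and next_symbol_uncertainty, the plug-in estimate of the probabilities, the base-2 logarithm, and the two test sequences are all assumptions for illustration; the paper should be consulted for the exact definitions.

```r
# Plug-in estimate of the block entropy H_n of a symbol vector x, in bits.
# block_entropy and next_symbol_uncertainty are hypothetical helper names.
block_entropy <- function(x, n) {
  # All length-n windows; embed() reverses each window, which does not
  # change the entropy because it only relabels the words.
  windows <- embed(x, n)
  words <- apply(windows, 1, paste, collapse = "")
  p_hat <- as.numeric(table(words)) / length(words)
  -sum(p_hat * log2(p_hat))
}

# Uncertainty of the next symbol given n previous ones: H_{n+1} - H_n.
next_symbol_uncertainty <- function(x, n) {
  block_entropy(x, n + 1) - block_entropy(x, n)
}

set.seed(7)
random_seq <- sample(0:2, 5000, replace = TRUE)   # no structure: close to log2(3) bits
periodic_seq <- rep(c(0, 1, 2), times = 20)       # fully predictable: close to 0 bits
h_random <- next_symbol_uncertainty(random_seq, 1)
h_periodic <- next_symbol_uncertainty(periodic_seq, 1)
```

The two sequences bracket the range of interest: a structureless source keeps the uncertainty near its maximum of log2(3) bits, while a periodic one drives it to roughly zero. Small deviations from the ideal values come from the finite sample and from edge effects in the window counts.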
Information Theory and Financial Markets
I would like to discuss and implement ideas from papers applying information-theoretical (IT) notions to trading in financial markets. I will provide links to all the papers I read on this topic and describe certain concepts in more detail.
The current list is:
Uncertainty analysis in financial markets: can entropy be a solution?, Andreia Dionísio, Rui Menezes and Diana A. Mendes (pdf)
Forecasting Foreign Exchange Market Movements via Entropy Coding, Arman Glodjo, Campbell R. Harvey (pdf)
Local order, entropy and predictability of financial time series, L. Molgedey and W. Ebeling (pdf)
These three papers all use Shannon entropy in place of more traditional statistical measures. What is interesting, however, is that each of them applies entropy in a different way.
The first paper, by Dionísio, compares entropy as a measure of uncertainty with variance/standard deviation in portfolio management.
The second paper, by Glodjo, applies techniques from coding theory (the original and most successful application of IT) to forecasting high-frequency time series. It also provides good arguments for using IT in finance.
The last paper, by Molgedey, uses conditional entropy directly on returns time series to quantify "local order" in highly stochastic time series. A local order would be a point in time where the next step is more predictable than average.
I will have a look at some of those techniques in more detail and might implement some of them to see if I can replicate the authors' results.