Local order and predictability – Implementation
Part 1 discussed a paper on local order and predictability of time series. I will now describe the implementation of the described functions in R.
First we assume that already have our real returns data partitioned into symbols so
is 3. Thus our time series is just a vector of values 0 1 2.
Next, all our functions will consider trajectories of that original vector. I will implement this as a sliding window of length n. So if our sequence is 012020120 the function slide will create the array 012, 120, 202, 020, 201, 012, 120 out of it.
slide <- function(seq,windowsize) {
steps <- length(seq)-windowsize
start <- 1
stop <- windowsize
accu <- array(0,dim=c(steps,windowsize))
for(i in 1:(steps)) {
#print(seq[start:stop])
accu[i,] <- seq[start:stop]
start <- start+1
stop <- start+windowsize-1
}
return(accu)
}
The probability of a trajectory of length n is . This is simply modeled by the function prob_n by the empirical distribution of the number of times the ngram is found, divided by the total number of windows
prob_n <- function(seq, ngram) {
l <- dim(seq)
count <- 0
for(j in 1:l[1]) {
if(identical(seq[j,], ngram)) { count<-count+1 }
}
return(count/l[1])
}
The actual entropy is the average uncertainty in an ngram given the complete sequence. So if a specific ngram has probability
then its weighted self-information is
. This quantity is added up for all possible ngram sequences. This is implemented in the following function
H_n <- function(seq,n,lambda) {
nseq <- slide(seq,n)
# get the unique ngrams
uniques <- unique(nseq)
dim_u <- dim(uniques)
accu <- 0
# for every unique ngram, get its prob_n and add to its entropy
for(i in 1:dim_u[1]) {
p <- prob_n(nseq, uniques[i,])
#accu <- accu + p
accu <- accu + (-p *log(p)/log(lambda))
}
return(accu)
}
The average uncertainty of predicting the next symbol , given that you've already seen n previous ones is just the difference of as follows
h_n <- function(seq,n,lambda) {
return(H_n(seq,n+1,lambda) - H_n(seq,n,lambda))
}
The more interesting bit is how to calculate the conditional probabilities in ,
. Say the ngram we are looking at is the sequence 0120,
is the uncertainty in predicting the next symbol after observing 0120. This is implemented in a way where we only consider the sequences of length 5 with an unknown fifth symbol 0120x. The code h_n_cond finds the empirical probability distribution for x in the restricted sample space of 0120x. Once the probability distribution is accumulated in the variable probdist, we simply calculate the
based entropy of it to arrive at
.
h_n_cond <- function(seq, ngram, lambda) {
nextcounts <- array(data=0, dim=c(lambda,1))
l <- dim(seq)
skip <- length(ngram)
#count <- 0
for(j in 1:l[1]) {
if(identical(seq[j,], ngram)) {
#count <- count + 1
# keep count for each possible next step element
if(j < l[1]-(skip-1)) {
nextcounts[seq[j+skip,1]+1] <- nextcounts[seq[j+skip,1]+1] + 1
}
}
}
probdist <- (nextcounts/sum(nextcounts))
entropy(probdist, lambda)
}
Testing
Next up, we need some real data to test our functions on! I downloaded all of DJI historical data which was available from Yahoo. The data is stored in the variable dji and I will only consider dji$Close prices for now. I generate the daily logarithmic returns which will be partitioned into the discrete symbols 0, 1 and 2. We will use the same partitioning as in the paper. In R this is quite straightforward as:
> djipart <- ifelse(djiret < -0.0025, 0, ifelse(djiret > 0.0034, 2, 1))
The authors justified the asymmetric partitioning due to the slightly positive mean logarithmic prices. I rather see a negative mean of -0.0001835142 in my data but I want to reproduce the numbers of the paper as well as possible so I use theirs.
Before running any functions on the real data I want to test it on uniformly distributed random data
> sequnif <-round(runif(10000, min=0,max=100)) %% 3
Obviously, I expect any uncertainties calculated as equally likely .. there shouldn't be any higher predictability by considering longer trajectories
> h_n(sequnif, 3, 3)
[1] 0.997702
> h_n(sequnif, 4, 3)
[1] 0.9928539
> sequnif4 <- slide(sequnif,4)
> h_n_cond(sequnif4, c(1,0,1,0), 3)
[1] 0.9980187
> h_n_cond(sequnif4, c(1,0,1,1), 3)
[1] 0.9982527
> h_n_cond(sequnif4, c(0,1,1,1), 3)
[1] 0.9974415
Our functions seem to produce the expected almost total uncertainty answers. The conditional entropies for n from 1-6 is plotted in the picture below. Here are some numerical values:
> h_n(djipart, 3, 3)
[1] 0.9841248
> h_n(djipart, 4, 3)
[1] 0.9788154
> h_n(djipart, 5, 3)
[1] 0.9679176
> h_n(djipart, 6, 3)
[1] 0.9429579
We observe a nice gradual decrease of uncertainty for longer trajectories.

This graph pretty much reflects the shape of Fig. 2 in the original paper. Again, the graph reflects the average uncertainty of predicting the next symbol for ngrams of different lengths (with values from the x axis).
The local uncertainty for specific ngrams can be queried with h_n_cond as follows:
> h_n_cond(dji4, c(0,2,1,2), 3)
[1] 0.9924063
> h_n_cond(dji4, c(1,1,1,1), 3)
[1] 0.9245277
The next state after observing pattern 1111 is more predictable than the pattern 0212 by 6.7%!
The next graph I will generate is the local uncertainty of length 4 of the last 200 trading day closes using a rolling window as follows
len <- dim(dji4)[1]
v4 <- vector()
for(i in 1:200) {
h <- h_n_cond(dji4, dji4[i,], 3)
v4 <- c(v4,h)
}
> plot(rev(v4), type='l', axes=F, xlab="DJI close", ylab="local uncertainty")
> axis(1,at=pretty(c(1,200)),labels=rev(dji$Close[pretty(c(0,201))]))
> axis(2)

Which resembles the graph from Fig. 1 from the original paper. Given the original definitions we should have that the mean of that local uncertainty sequence v4 should be close to the average uncertainty of length 4:
> mean(v4)
[1] 0.9764447
> h_n(djipart, 4, 3)
[1] 0.9788154
The minimal value in this graph is at 0.9245.
What we haven't considered so far is how significant the predictions are. The MLE estimator of entropy we are using here is biased and the sample error grows rapidly with increasing ngram length. The paper is using something they call surrogate sequences to calculate its significance. I don't really know about that approach but I will look into it and write about it in a future article in this series of local uncertainties. One approach I could imagine is using different entropy estimators with bias correction.
Hope you enjoyed this article so far, please leave a comment if you feel like it.