oneminusp.com Computational Finance, Markets, Programming & co

29 Jan 2010

Local order and predictability: Significance testing

The two previous posts described an implementation of a paper about finding local order (return patterns with higher than average predictability of the next symbol) in financial time series.

One important question left unanswered so far concerns the significance of the local uncertainties h_n(A_1 \dots A_n). Does a deviation from almost no order (h_n > 0.99) really mean something, or is it due to imprecision or undersampling of the empirical probabilities? As the original paper notes, the larger we choose n, i.e. the more previous trading days we use to predict the next one, the more ngrams are possible, and therefore the more samples we need to approximate the probabilities p^{(n)} reasonably accurately.
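To get a feeling for how quickly the samples thin out, here is a rough sketch that counts the possible ngrams for \lambda = 3 symbols and divides a series length by that count. The series length n_obs = 2500 is a made-up figure for illustration only, not a number from the paper or from my data.

```r
lambda <- 3        # number of symbols in the partition
n      <- 1:8      # window lengths to compare
n_obs  <- 2500     # hypothetical series length (assumption, illustration only)

ngrams <- lambda^n # number of distinct ngrams of length n

# Average number of observations per ngram if all ngrams were equally likely
data.frame(n = n,
           possible_ngrams = ngrams,
           avg_samples_per_ngram = n_obs / ngrams)
```

The count grows as \lambda^n, so each additional day in the window divides the average number of samples per ngram by three.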

There are two ways to go:

  • As in the original paper, use empirical probabilities and the basic plugin entropy estimator, and restrict n to at most 5, as their significance level K dictates (more on that below)
  • Experiment with larger n, using more sophisticated probability and entropy estimators
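For reference, the plugin estimator in the first option simply inserts relative frequencies into the entropy formula. Below is a minimal sketch, assuming the symbols sit in a plain vector and the logarithm is taken to base \lambda so that the result lies between 0 and 1. The name plugin_entropy is made up for illustration, and this is the generic estimator, not the paper's exact local uncertainty h_n.

```r
# Generic plugin (maximum likelihood) entropy estimate of a symbol vector.
# Illustrative helper, not the paper's h_n.
plugin_entropy <- function(x, lambda = 3) {
  p <- table(x) / length(x)          # empirical probabilities of observed symbols
  -sum(p * log(p, base = lambda))    # entropy in units of log(lambda)
}

plugin_entropy(c(0, 1, 2, 0, 1, 2))  # uniform over three symbols
plugin_entropy(c(0, 0, 0, 0, 1, 2))  # skewed distribution, lower entropy
```

Because table() only lists symbols that actually occur, no zero probabilities reach the logarithm. That is also the weakness of the plugin approach: ngrams that never occur in the sample are silently treated as impossible.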

We will do both. But for now I'll concentrate on the significance level K as introduced in the paper. A so-called surrogate sequence of length n is generated out of the partitioned time series. These surrogates have the same mean and standard deviation as the original sequence; you can think of a surrogate as a random shuffling of the sequence with some further rules. The local uncertainties from the surrogates are called h_n^S(A_1 \dots A_n). The significance level K is then calculated as:

K_n(A_1 \dots A_n) = \vert \frac{h_n(A_1 \dots A_n) - \langle h_n^S(A_1 \dots A_n) \rangle}{\sigma_{h_n^S}}\vert
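The formula above is a plain z-score against the surrogate distribution. A minimal sketch in R, assuming the observed local uncertainty and a vector of surrogate local uncertainties for the same ngram have already been computed; the function name significance_k and the numbers are made up for illustration, and the surrogate generation itself is not shown.

```r
# K = |h - mean(h^S)| / sd(h^S) for one ngram.
# h_obs:  observed local uncertainty
# h_surr: local uncertainties of the same ngram from several surrogates
significance_k <- function(h_obs, h_surr) {
  stopifnot(length(h_surr) >= 2)     # sd() needs at least two values
  abs(h_obs - mean(h_surr)) / sd(h_surr)
}

h_surr <- c(0.98, 0.99, 1.00)        # invented surrogate values, illustration only
k <- significance_k(0.95, h_surr)
k
```

With these invented numbers K comes out at about 4, meaning the observed uncertainty lies four surrogate standard deviations away from the surrogate mean.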

26 Jan 2010

Local order and predictability – Implementation

Part 1 discussed a paper on local order and predictability of time series. I will now describe the implementation of the described functions in R.

First we assume that we already have our real returns data partitioned into symbols A_t \in \{0,1,2\}, so \lambda is 3. Thus our time series is just a vector of the values 0, 1 and 2.

Next, all our functions will consider trajectories A_1 \dots A_n of that original vector. I will implement this as a sliding window of length n. So if our sequence is 012020120 and n is 3, the function slide will create the array 012, 120, 202, 020, 201, 012, 120 out of it.

slide <- function(seq,windowsize) {
	# number of windows: one per possible start position
	steps <- length(seq)-windowsize+1
	accu <- array(0,dim=c(steps,windowsize))
	for(i in seq_len(steps)) {
		# row i holds the window that starts at position i
		accu[i,] <- seq[i:(i+windowsize-1)]
	}
	return(accu)
}
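As a cross-check on the windowing described above, base R's embed() can build the same kind of matrix without an explicit loop. This is only a sketch of an alternative; slide_embed is a name I made up for illustration and it is not used elsewhere in these posts.

```r
# Alternative sliding window via stats::embed (illustrative helper name).
slide_embed <- function(x, windowsize) {
  stopifnot(windowsize >= 1, length(x) >= windowsize)
  # embed() puts the newest value of each window first, so reverse the columns
  embed(x, windowsize)[, windowsize:1, drop = FALSE]
}

x <- c(0, 1, 2, 0, 2, 0, 1, 2, 0)
w <- slide_embed(x, 3)

# Collapse each row into a string to compare with the example in the text
windows <- apply(w, 1, paste, collapse = "")
windows
```

For the example sequence this yields the seven windows 012, 120, 202, 020, 201, 012, 120, one per possible start position.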