Category: Investing Insight

Investing insight to make you a better investor.

The Shifting Sands of Auto-Correlation

Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over successive time intervals. Our previous post on using run length encoding discussed how you can inspect the streakiness of returns. ACF allows you to inspect the relationship between day-0 and day-n returns.

Let’s have a look at the ACF of NIFTY and BANKNIFTY returns. Here is a plot of the ACFs of both real returns and up/down returns in 2015.

Zoom into the up/down ACF of NIFTY: see how day-0 direction correlates strongly with day-1, day-2 and day-3? Combine this with what we got from running a run-length encoding (rle) and you might just have a trend-following strategy.
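To make the two flavours of ACF concrete, here is a minimal sketch in plain Python. The function names `autocorr` and `direction` and the sample returns are made up for illustration; the charts in this post were not necessarily produced this way. It uses the standard sample autocorrelation estimator and treats a zero return as a down day, which is an assumption.

```python
def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag (standard ACF estimator)."""
    n = len(series)
    mean = sum(series) / n
    # Total variation around the mean (undefined ACF if the series is constant).
    var = sum((x - mean) ** 2 for x in series)
    # Co-variation between the series and its lagged copy.
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def direction(returns):
    """Drop magnitude: 1 for an up day, 0 otherwise (assumption: zero counts as down)."""
    return [1 if r > 0 else 0 for r in returns]


# Hypothetical daily returns, for illustration only.
returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.004, 0.012, -0.001]

acf_real = [autocorr(returns, k) for k in range(1, 4)]             # day-1..day-3, real returns
acf_updn = [autocorr(direction(returns), k) for k in range(1, 4)]  # day-1..day-3, up/down
```

A positive value at lag k says day-0 and day-k tend to move the same way; a negative value says they tend to move in opposite directions.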

Before we run off to make our millions, let’s see how other years fared. Enter 2021.

While NIFTY’s day-0 direction correlates strongly with day-1, it is inversely correlated with day-2 and day-3. Did it become mean-reversion-y? One implication of this shift is that a trend model tuned to work well with 2015 data is unlikely to repeat its performance in 2021.

However, the day-1 correlation is positive in every yearly ACF from 2015 through 2021. What if you just bet that an up day will be followed by another up day and vice versa?
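The bet described above can be sketched in a few lines. This is an illustration only: the names `persistence_hit_rate` and `persistence_returns` and the sample data are hypothetical, and costs, slippage and zero-return days are ignored (a zero return is treated as a down day).

```python
def sign(r):
    """+1 for an up day, -1 otherwise (assumption: zero counts as down)."""
    return 1 if r > 0 else -1


def persistence_hit_rate(returns):
    """Fraction of days whose direction matches the previous day's direction."""
    signs = [sign(r) for r in returns]
    hits = sum(1 for today, tomorrow in zip(signs, signs[1:]) if today == tomorrow)
    return hits / (len(signs) - 1)


def persistence_returns(returns):
    """Daily returns from going long after an up day and short after a down day."""
    return [sign(prev) * cur for prev, cur in zip(returns, returns[1:])]


# Hypothetical daily returns, for illustration only.
sample = [0.01, 0.02, -0.01, -0.02, 0.01]
hit_rate = persistence_hit_rate(sample)
strategy = persistence_returns(sample)
```

A hit rate only slightly above 50% may not survive transaction costs, so this is a starting point for a backtest rather than a result.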

Euclid vs. Hamming Distance

Our previous post explored the differences between CAPM Beta and Hamming distance. Think of Beta as a linear regression between two time-series and Hamming distance as the number of days when the direction of returns differed. The usefulness of the Euclidean distance for non-reverting time-series is somewhere between the two.

Extending the previous example using HDFC and keeping everything else the same, here’s what the Euclidean distance measure looks like.

The higher the distance, the farther apart the two curves and the worse the index hedge. Here’s the equity curve that can help map returns to distance.
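The post does not spell out how the distance is computed, so the following is only one plausible reading: build a cumulative return curve for the stock and for the index over the same look-back window and take the Euclidean distance between them. The function names and the sample returns are hypothetical.

```python
import math


def cumulative_curve(returns):
    """Growth of 1 unit invested, compounding each period's return."""
    curve, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        curve.append(level)
    return curve


def euclidean_distance(a, b):
    """Square root of the summed squared gaps between two equal-length curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical returns over one look-back window, for illustration only.
stock = [0.010, -0.004, 0.007, 0.002]
index = [0.006, -0.002, 0.003, 0.001]
distance = euclidean_distance(cumulative_curve(stock), cumulative_curve(index))
```

Because the curves compound, a gap that opens early in the window keeps contributing to the distance until the two curves converge again, which is one reason the measure behaves differently from a day-by-day direction count.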

Reproducing the Beta and Hamming Distance charts:

From a linear portfolio point-of-view, which of these series is more “predictable?” Is it possible to specify bands beyond which things “break?” And does using shorter look-backs help?

Beta vs. Hamming

The beta of a portfolio is often used to hedge it against the market. We did a brief intro in our post: A Gentle Introduction to Hedging. And previously, we discussed how the Hamming distance can unearth relationships by simplifying the data that we have. Here, we bring the two concepts together.

Linear-payoffs

CAPM Beta is a glorified linear regression between two return streams. It is useful in the context of linear-payoff portfolios.
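As a sketch of what "glorified linear regression" means: the slope of an ordinary least squares fit of asset returns on market returns is the covariance of the two divided by the variance of the market. The function name `beta` and the sample numbers below are made up for illustration, and the risk-free rate is ignored.

```python
def beta(asset_returns, market_returns):
    """OLS slope of asset returns on market returns: cov(asset, market) / var(market)."""
    n = len(asset_returns)
    if n != len(market_returns):
        raise ValueError("return streams must have the same length")
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Hypothetical returns: the asset moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
asset = [0.02, -0.04, 0.06, 0.00]
b = beta(asset, market)  # slope of the fitted line, close to 2 here
```

A beta near 2 suggests hedging each unit of the asset with roughly two units of the index, which only makes sense when the payoff is linear.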

For example, a typical long-only fund can use its portfolio beta to measure sensitivity to the market and to hedge against it. A single-stock portfolio with only HDFC in it will exhibit varying beta with respect to different markets.

If you are after a linear-payoff (long stocks or futures outright), beta can be a useful metric to track.

Convex-payoffs

Betas are useless if you are trying to hedge or analyze a portfolio with convex payoffs, such as an options portfolio. Here, you care more about up/down days relative to an index. This is where Hamming distances are useful.

A Hamming distance of 70 over a 250-day return stream means that by flipping the direction of just 28% of the sample (70 of 250 days), the up/down series will equalize.
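Here is a minimal sketch of the calculation, assuming both return streams cover the same days and that a zero return counts as a down day. The names `up_down` and `hamming` and the sample returns are hypothetical.

```python
def up_down(returns):
    """Drop magnitude: 1 for an up day, 0 otherwise (assumption: zero counts as down)."""
    return [1 if r > 0 else 0 for r in returns]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical returns for a stock and an index over the same five days.
stock = [0.012, -0.004, 0.007, -0.010, 0.003]
index = [0.008, 0.002, 0.005, -0.006, -0.001]

days_apart = hamming(up_down(stock), up_down(index))
share_apart = days_apart / len(stock)  # fraction of days with opposite direction
```

Dividing by the sample length gives the share of days on which the two moved in opposite directions, which is how 70 over 250 days translates to 28%.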

In our HDFC-only single-stock portfolio example above, we see that its beta over NIFTY/BANK-NIFTY is vastly different whereas its Hamming distances closely track each other. This behavior can be used to construct trades that go beyond being long-only equity.

Hamming Distance

Previously, we discussed how removing information from data can be useful. And our discussion on using Euclidean Distance for Pattern Matching showed how you can use a rolling window to identify matching segments within a time-series. What if we mix the two ideas together?

If you transform a time-series of returns to 0-1, then you can use the Hamming distance, the minimum number of substitutions required to change one string into the other (Wikipedia), as a measure of similarity.

For example, take the most recent 20-day VIX time-series and “match” it with a rolling window of historical 20-day VIX segments and sort it by its Hamming Distance.

Here, on the second row, we see that by just flipping two bits, the 20-day sequence ending on 2020-05-18 matches with the 20-day sequence ending 2021-11-16. If you are looking for a rough up/down days match, then this is a blisteringly fast way to compute it.
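The rolling-window match can be sketched as follows. The function name `rank_matches` and the toy bit sequence are made up for illustration, and windows are identified by the index of their last observation rather than by date. Historical windows that overlap the most recent one are kept here; excluding them is a design choice left to the reader.

```python
def rank_matches(bits, window):
    """Compare the latest `window` bits with every earlier window of the same length.

    Returns (hamming_distance, end_index) pairs sorted by distance, where
    end_index is the position of the last observation in the historical window.
    """
    target = bits[-window:]
    matches = []
    # `end` is exclusive; stopping before len(bits) skips the target window itself.
    for end in range(window, len(bits)):
        segment = bits[end - window:end]
        distance = sum(1 for x, y in zip(segment, target) if x != y)
        matches.append((distance, end - 1))
    return sorted(matches)


# Toy up/down sequence (1 = up, 0 = down), for illustration only.
bits = [1, 0, 1, 1, 0, 1, 0, 1, 1]
ranked = rank_matches(bits, 3)
best_distance, best_end = ranked[0]
```

For long histories, a common speed-up is to pack each window into an integer and count the set bits of the XOR of two windows, which gives the same distance.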

Direction vs. Magnitude

Sometimes, it is useful to remove information from the data that you have.

Let’s say you have a time-series of returns: +0.001, +0.001, +0.001, +0.1, -0.01, -0.01. What if you removed magnitude information and kept only the direction? You end up with: UP, UP, UP, UP, DN, DN. Now, you can analyze this transformed dataset using a whole bunch of algorithms designed to work on binary sequences.

Run Length Encoding (rle) is one such algo. We used it while looking for streaks (Part I, Part II). We dismissed the backtest as a datamining artefact. Which it might very well be. However, if you believe that a time-series can exhibit both trend and mean-reversion, then looking at it through this lens can be useful.

Knowing the “average” length of streaks can also help in position sizing in a trend-following system and regime classification.
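Using the six returns from the example above, here is a small sketch of the direction transform followed by run-length encoding and the average streak length. The function names are hypothetical, a zero return is assumed to count as DN, and `itertools.groupby` stands in for a dedicated rle routine.

```python
from itertools import groupby


def to_direction(returns):
    """Keep only the direction of each return (assumption: zero counts as DN)."""
    return ["UP" if r > 0 else "DN" for r in returns]


def run_length_encode(sequence):
    """Collapse consecutive repeats into (value, streak_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(sequence)]


returns = [0.001, 0.001, 0.001, 0.1, -0.01, -0.01]
runs = run_length_encode(to_direction(returns))  # [("UP", 4), ("DN", 2)]

# Average streak length across all runs, regardless of direction.
average_streak = sum(length for _, length in runs) / len(runs)  # 3.0
```

Splitting the average by direction (UP runs versus DN runs) is a natural next step if up and down streaks behave differently.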

Stay tuned.