The World Complex: deconstructing algos


Wednesday, April 2, 2014

From the small to the big: earthquakes, avalanches, and high-frequency trading

I've been talking about scale invariance a lot lately. I became interested in the topic quite a few years ago in the context of geological phenomena like earthquakes and avalanches. The Gutenberg-Richter law describing the size-frequency relationship for earthquakes was one of the first natural laws based on scale invariance, but interest in the topic really picked up with the Bak et al. paper in 1987 (pdf - may only be a temporary link).

The cause of this relationship is still foggy, as is the physical mechanism linking the small and large earthquakes. The best proposed explanation is that the scale-invariant distribution of events allows for the most efficient flow of energy (and information) through the system (but it isn't clear why that should be so).

So back in the early '90s I was estimating recurrence intervals for certain hazardous events and I started trying to work out a methodology for detecting scale invariance in the geologic record. Using the Gutenberg-Richter Law, you can estimate the likelihood of a large earthquake in an area based on the number of small earthquakes. There were interesting implications for areas where the recurrence interval of large earthquakes is longer than the local recorded history (as in much of Canada). At the time, there were seismic hazard maps produced by the USGS which showed significant earthquake risk in zones which mysteriously ended right at the Canadian border.
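The Gutenberg-Richter law is usually written log10 N = a - b*M, where N is the number of earthquakes of magnitude M or greater in a given region and period. As a rough sketch of the extrapolation described above, the snippet below fits a and b to counts of small events by ordinary least squares and then reads off a recurrence interval for a large one. The counts are made-up numbers chosen for illustration, not a real catalogue, and the function names are hypothetical.

```python
import math

def fit_gutenberg_richter(magnitudes, annual_counts):
    """Least-squares fit of log10(N) = a - b*M; returns (a, b)."""
    xs = magnitudes
    ys = [math.log10(n) for n in annual_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = mean_y - slope * mean_x
    return a, -slope

def recurrence_interval_years(a, b, magnitude):
    """Average number of years between events of at least this magnitude."""
    return 1.0 / (10 ** (a - b * magnitude))

# Hypothetical annual counts of events at or above each magnitude (made-up data).
mags = [2.0, 3.0, 4.0]
counts = [1000.0, 100.0, 10.0]
a, b = fit_gutenberg_richter(mags, counts)
interval_m7 = recurrence_interval_years(a, b, 7.0)
print(f"a = {a:.2f}, b = {b:.2f}, M>=7 about every {interval_m7:.0f} years")
```

The extrapolation is only as good as the assumption that the same straight line holds from the small events all the way up to the large ones, which is exactly the scale-invariance question.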

One of my classmates in my undergrad days (we're back in the 80s, now) studied the correlation between microquakes and fluid injection at oil extraction operations in southwestern Ontario. The oil companies were surprisingly cooperative until they understood the point of the research, after which they started to withhold data.

And here is the mystery. The principle of scale invariance in earthquakes would suggest that increasing the number of small earthquakes should increase the number of large earthquakes, at least in the short term. Yet our understanding of the dynamics of earthquakes tells us that lubricating the fault should allow stresses to be relieved through the small earthquakes, which should reduce the chance of a large quake in the longer term. (This idea has been proposed at various times over the past fifty years, but for various obvious reasons, it has never been deliberately pursued).

By the early 2000s, other geophysicists (notably Didier Sornette, but there were others) had moved a portion of their data processing expertise into studying econometric time series. I made this move later as I gradually came to appreciate the key problem with developing quantitative techniques when the data were suspect. First of all, the measurements themselves are inaccurate. More importantly, our estimate of the timing of each observation was just that--an estimate. Most quantitative methods assume that the observations are evenly spaced in time. Failing that, they assume you know the timing of your observations. The consequences of errors in the timing are terrible, and frequently underestimated. The point is that it is difficult to develop excellent quantitative methods when the data are terrible.

The big advantage of working with economic time series--pricing data, in particular--is the elimination of the observational errors. When a transaction occurs, there is no doubt about either the price or the time--right down to the millisecond scale.

I started looking at market macrostructure--because (several years ago) nothing interesting ever happened on a scale of less than about an hour. Until just the past few years. Suddenly, strange, rich, unusual behaviours began to occur in individual stock prices, and even indices, on the millisecond scale. I didn't know what was causing it--but it sure was interesting.

Three seconds on the tilt-a-whirl.

This was the signature of the onset of HFT. I was initially interested in it for entirely different reasons than most of you. After Crutchfield's (1994) paper (pdf) on emergence, I had been pondering the idea of how to recognize a fundamental change in a complex system. Again, my interest was in the earth system as a whole, and how to recognize whether or not new observations were pointing to a fundamental change in its mode of operation.

Given our understanding that the number of large avalanches is positively correlated to the number of small avalanches, it seems pretty clear that (as Nanex and Zerohedge have been saying) the damaged market microstructure is mirrored in the increasing number of flash crashes since Reg NMS. Unfortunately, our murky understanding of how the microstructure causes the macrostructural changes can be used by the regulatory authorities to avoid investigation. They can't see a smoking gun.

We would normally expect the micro-crashes to eventually relieve imbalances in the system, improving its long-run stability. (Perhaps this is how the SEC justifies the practice). But unlike earthquakes and avalanches, these uncountably many small crashes are not reducing the imbalances. One reason is that the cause of the imbalances is separate from HFT--the dollars keep being shoveled to the top of the mountain as fast as, if not faster than, HFT brings them cascading down. Another reason is that the trades (mostly) get unwound--so the exchanges push most of the snow back to the mountaintop after the avalanche.

HFT certainly benefits unfairly from the system, but isn't responsible for it. If anything, it is a symptom of corruption--but the cause of the corruption is elsewhere.

Accordingly, my modest proposal for dealing with HFT is this--nothing. Don't bust trades--let them stand. I'd be curious to see the response of the various Ivy-League endowment funds and pension funds when they suffer brutal, near-instantaneous, multi-billion-dollar losses. At a guess, I would probably hear the screaming up here. How would real companies, producing real products, react to a sudden monkey-hammering of their stock price, especially if it triggered debt covenants? Maybe they would all exit the market en masse. It might even force a real change.

Tuesday, November 5, 2013

Happy anniversary chaos!

Fifty years ago, Edward Lorenz published the first paper (pdf) generally recognized to discuss chaos.

Lorenz didn't call what he had discovered 'chaos'. It's not clear that he really understood the importance of what he had discovered. He knew it was interesting, and when scientists find something interesting they publish it, and worry about the ramifications later.

What Lorenz had discovered is that a deterministic system could have unpredictability. It is difficult to convey how unexpected this discovery was at the time, because the idea of what is now called chaos has disseminated (although imperfectly) through our common culture. A deterministic system is one in which the rules are well described (via equations) which operate on data to produce results. Since the time of Laplace's Dr. Manhattan quote, it had always been assumed that nothing unexpected could arise from such a system when an initial position and the rules of motion were defined to arbitrary precision. Unexpected behaviour should only result from randomness.

So when Lorenz put together a simple model for atmospheric convection in the presence of heating, using three simple differential equations, simple boundary conditions, and an arbitrary starting point, there would have been no reason to suspect that anything unexpected might occur. After all, all the parameters in the equations were known.

In essence, what he discovered was that minute variances in starting conditions led to extremely large variations in outcome. This again was unexpected, because our knowledge was largely built on assumptions of linear behaviour, in which small variations only grow larger slowly. Lorenz's interpretation of what he had discovered was to correctly point out that long-term weather forecasting was impossible, because it was impossible to measure the present state of the system with perfect accuracy--and the range of possible differing outcomes from the measurement accuracy was essentially the range of all possible weather.
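The sensitivity to starting conditions is easy to reproduce. The sketch below integrates the three Lorenz equations with the commonly quoted parameter values (sigma = 10, rho = 28, beta = 8/3) from two starting points that differ by one part in a hundred million, and tracks how far apart the two trajectories drift. The fourth-order Runge-Kutta integrator, the step size, the starting point, and the size of the nudge are all arbitrary choices made for illustration.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separation_history(start, nudge=1e-8, dt=0.01, steps=4000):
    """Distance between two runs whose starting x differs by `nudge`."""
    a = start
    b = (start[0] + nudge, start[1], start[2])
    history = []
    for _ in range(steps + 1):
        dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        history.append(dist)
        a, b = rk4_step(a, dt), rk4_step(b, dt)
    return history

history = separation_history((1.0, 1.0, 1.0))
print(f"initial separation: {history[0]:.1e}")
print(f"largest separation in the final stretch: {max(history[-500:]):.2f}")
```

By the end of the run the two trajectories are as far apart as two unrelated points on the attractor, even though nothing random was ever introduced.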

The discovery and formalization of chaos theory led to entirely new fields of study encompassing different aspects of nonlinear dynamics and complex systems. Among them is one field of endeavour which has been a point of interest on this blog--complexity.

What do we mean by complexity? Actually, I'll write about this in a future posting. For now, let's just note the relative unpredictability of complex systems and get into the whys of it all later.

- - - - - - -

This post is a bit belated, because Lorenz's publication was actually in March. But something as momentous as chaos should be celebrated over the course of an entire year.

Sometimes we go all out and celebrate something over a couple of years. International Geophysical Year (1957-58) and International Heliophysical Year (2007-08) come to mind.

There have already been numerous celebratory events so far this year. But first, a word about the enablers of this year's celebrations on the markets.

High-frequency trading spams the exchanges with empty quotes destined to be cancelled--so much that it appears that many legitimate offers do not get filled at optimum pricing, as the system becomes overwhelmed with meaningless numbers.

According to the exchanges, HFT is a good thing. It increases liquidity, or at least that is the axiom that guides their acceptance. Unfortunately, observation tells us that the opposite may be the case--that HFT causes liquidity to vanish precisely at the time it is most needed.

In the last 50 years, we entered the nonlinear world. But our thinking--especially institutional thinking--is still trapped in the linear world.

In the linear world, if something is a benefit, then more of it is a greater benefit. But in the nonlinear world, one may be a benefit and two may be better, yet three could turn out to be horrifying.

So in celebrating 50 years of chaos, the exchanges (with their sponsors, the algos) have brought you the following celebratory events.

Flash crash on the German market. Twitter feed flash crash. (Appropriately enough, both of these were in April). The Anadarko flash crash. Information travels faster than the speed of light! Closing of the Nasdaq options bourse. Not to mention hundreds of strange trade executions across all the exchanges.

How to lose lots of money in 45 ms by Nanex.

Most of these problems are the (un)predictable result of the interaction of numerous algorithms. Some may have been errors, or the so-called 'fat finger' trades; others may have been other forms of human or algo error.

Algo error. Was that supposed to happen?

The markets are not what they used to be. The overall superstate has changed over the last ten years from one dominated by humans to one dominated by machines. The result has been a series of entirely new phenomena, which we have earlier termed 'innovation'.

The year isn't over yet. I look forward to the next special event. I don't think I have long to wait.

- - - - - - - - - - - - - -

And then there's this. I was going to put in something by King Crimson here, but this seemed more appropriate.

Friday, October 4, 2013

One more time--the distinction between human- and algo-trading

The markets do not act like they once did. The trading in certain stocks is operating on time-scales so small that they cannot be in response to human thought. Not only are certain individuals able to access key information before others and so respond to news releases faster than the speed of light, but certain entities have free range to post and cancel orders on a microsecond basis, and queue-jump by shaving off (or adding on) tiny fractions of a penny from their orders.

Stocks traded by humans tend to make significant moves on a timescale of minutes to days. Even when there is a news event that radically changes the apparent value of a company, if there are only humans in the market, the move takes time to occur. Below are a couple of charts for Detour Gold (I currently have no position in this stock).

Normally, when looked at on a ms timescale, the graph is not really distinguishable from a straight line.

The little squares occur because all the price-changes I saw in the course of the day were a penny. On this scale it scarcely matters which axis is the current price and which is the lagged-price.

Once the algos get involved, the millisecond phase space plots get a lot more interesting. Some of them are works of art! Below, some plots for Century Casinos (I have no position in this one, either). Data here.

Algos playing tug-o-war.

Nice to look at, but maybe not so nice to trade against.

Remember the adage about playing poker: If you don't know who the sucker is . . .

Sunday, September 22, 2013

Why do Ma and Pa play in a rigged market?

“They have been able to pay off politicians with political campaign funds and have been granted informal and unspoken yet complete immunity from prosecution, setting the scene for even bigger confiscations of investor capital. With the risk of legal repercussions so small and the temptation to steal so large, why would any of them not take advantage? What do they have to do to stop people from entrusting them with their savings? Put up neon signs that say, “We steal your money”?”

– Dimitri Orlov – The Five Stages of Collapse

Sometimes even that doesn't help.

"Darn it, Pa! I told ya not ta buy AAPL on margin!"

"But Ma, I thought you wuz talkin' 'bout apples."

The problem is that Ma and Pa believe they are above average--in intellect, in wisdom, in luck, in investing acumen. When they see that sign that says their money will be stolen, they all have faith that it will happen to somebody else.

When I was a kid, I wanted to be a pool shark. Never mind that I wasn't much good at it. I researched the concept at the local magazine stand and discovered that the secret was not to beat the other guy with great shooting, but to beat him in a way that he would think he had beat himself, through mistakes. The reason it works is because when your victim makes a good shot, he believes that is his normal capability, but a mistake is not--and tends to be discounted. That discounting is essential, because if your victim believes he only lost because of his own mistakes, he will be confident he will win the next time. On the other hand, if you simply run the table, chances are he won't play again.

This only works because people are usually not as clever as they think they are.

Consequently, Ma and Pa can read about what is being done to them--and they will think it is only a problem for other people. Their orders will get filled in a timely fashion. The stocks they buy have true liquidity--not the phantom liquidity of offers that are never to be filled. And when they file for bankruptcy, it will merely have been bad luck.

Saturday, January 5, 2013

Invasive behaviour and extinction in the retail market

The term "invasive species" has been used to describe new types of plants or animals that have been introduced to a new area, whereupon they change the local biosystem.

The sudden appearance of new lifeforms in an environment can cause rapid losses in some of the species present prior to this appearance. Biosystems are dynamic systems with considerable stability, and often the arrival of new species simply causes a slight change in the dynamics of the system, which continues on with only small cosmetic changes.

On occasion, however, the new players overwhelm the stabilizing factors in the system, which undergoes dramatic changes, eventually stabilizing in a new configuration that is highly detrimental to many of the original players in the system.

Which brings me to today's invasive species.

Although high-frequency trading has been around for nearly a decade, it didn't hit public consciousness until the "flash crash" of May 2010. In the past two years, the incidence of HFT flash crashes has expanded (see archive here) to the point where they are causing significant pain to the retail investors.

Much has been written about the impact of HFT, and a broad survey of the literature is so contradictory that I have to feel that some authors are not writing honestly. For every article about manipulation and increased volatility and reduced liquidity, there are academic papers like this one claiming that HFT "improves liquidity and enhances the informativeness of quotes".

Many of the characteristics of successful invasive species are shared by HFT algorithms: 1) fast growth; 2) rapid reproduction; 3) the ability to alter form (mutate) to suit current conditions; 4) tolerance of a wide range of conditions (except perhaps transparency); and 5) ability to live off a wide variety of food types. As a bonus, living in contact with humans also helps invasive species.

What in the market is going extinct? The retail investor.

How so? It comes about through the erosion in their margins brought about by HFT. In the presence of HFT, the unsophisticated investor pays a higher price on the buy and receives a lower price on the sell than would be the case otherwise. The professional traders manage to maintain their margins--the losses of the unsophisticated are the profits of the algos.

As our markets have come to resemble casinos, investment is increasingly like gambling. A typical gambler in a casino, where winning is determined by chance, is eventually ruined. Gambler's ruin is inevitable in a fair game--but comes faster now that the bias is negative because algos are skimming a little off each one of our gambler's bets.
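The effect of a small negative bias on the time to ruin can be seen in a quick simulation. This is a minimal sketch under stated assumptions: a gambler with a stake of 20 units makes one-unit bets against an unlimited bank, and the "skim" is modelled as a win probability of 0.48 instead of 0.50. Those numbers, the cap on the number of bets (needed because a fair game can run for a very long time), and the function names are all arbitrary illustrative choices.

```python
import random

def steps_to_ruin(stake, p_win, rng, max_steps=5000):
    """Play unit bets until the stake hits zero or max_steps is reached."""
    steps = 0
    while stake > 0 and steps < max_steps:
        stake += 1 if rng.random() < p_win else -1
        steps += 1
    return steps

def mean_steps(stake, p_win, trials=200, seed=1):
    """Average number of bets survived (runs that hit the cap count as the cap)."""
    rng = random.Random(seed)
    return sum(steps_to_ruin(stake, p_win, rng) for _ in range(trials)) / trials

fair = mean_steps(20, 0.50)
skimmed = mean_steps(20, 0.48)
print(f"fair game:    about {fair:.0f} bets survived on average")
print(f"skimmed game: about {skimmed:.0f} bets survived on average")
```

Because the fair-game figure is truncated by the cap, it understates how long a fair game can last, so the gap between the two cases is if anything larger than the printout suggests.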

Friday, August 12, 2011

Machines without memory

The Masters of the Market stay
In darkened rooms where 'lectrons play
And talking heads cannot convey
The new idea's birth

For they hunger in their secret dreams
For the trading highs of cruel machines
Projected on a million screens
Without a sense of worth.

At last the HFT algo show!
The crash nobody could foresee!
Your neck inside the rope!
Indices without hope!
The looters that ignore the SEC!

Perfumed fingers through the till
The asks all grow but never fill
The red ink has begun to spill
The market starts to tank!

The regulators and the traders too
Are uncertain if the plunge is through
And consult their charts to find a clue
But the frozen screens go blank!

At last the HFT algo show!
High VIX grown of a fractal seed!
A bubbilicious time!
A dark and dirty crime!
The bulging eyes of traders strangled by their greed!

(apologies to Alan Moore)

A paper by W. S. Rea and co-authors reminds me of why the methods of analysis I used in this article failed to provide any useful insight despite appearing to work on longer term charts here, here, here, and here.

The time series outputs of some dynamic systems possess long memory--meaning that the present behaviour is influenced by the entire past history of the system. It may be that recent events have a larger statistical impact on the present, but the characteristic of a long memory requires that even events in the distant past are reflected in the present behaviour.
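One common way to look for memory in a series (not necessarily the approach taken in the paper mentioned above) is the sample autocorrelation: in a long-memory series it decays slowly as the lag increases, rather than dropping quickly to zero. The sketch below is a plain implementation of that statistic, exercised on a toy series whose structure is known in advance; the function name and the toy data are illustrative assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom

# Toy series that flips sign at every step, so neighbours are anti-correlated
# and values two steps apart are strongly correlated.
flipping = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
lag1 = autocorrelation(flipping, 1)
lag2 = autocorrelation(flipping, 2)
print(f"lag 1: {lag1:.2f}, lag 2: {lag2:.2f}")
```

On real data one would compute this over a wide range of lags and look at how quickly the values fall away.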

How memory is "stored" in the system varies. For instance, in the days when HFT was a distant dream, the response of a stock's price to a good quarter would depend at least in part on the company's past behaviour. One which disappointed quarter after quarter would not benefit as much from a good quarter as one which had a history of meeting or exceeding expectations--the market might exhibit some skepticism. Where is this memory stored?

In climate systems, the memory may be "stored" in slow-response variables, which may yet influence the reactions of fast-response variables to various forcings. The geological system is extremely complicated, because local climatic factors, which are driven by such things as ocean currents and the distribution of continental land masses are strongly influenced over the long-term by tectonic activity; and over shorter timescales by the distribution of fresh water bodies, themselves being altered in response to isostatic uplift. Slow variations occasionally lead to catastrophic events, meaning sudden irreversible changes can occur in what had been a slowly evolving system.

Many economic systems appear to have long memory. The activities of today are influenced by events of the past. Nixon striking down the last vestige of the gold standard, Volcker raising interest rates, the "strong dollar" policy of Summers, the wars of Bush the Elder and Bush the Younger, Obama's latest raise of the debt ceiling--all of these have had impacts that have rippled through the USD gold price from then until today.

Dynamic systems analysis--at least the type used on this blog--works best on systems that have this kind of memory. The lessons of the past must echo, at least in some form, through the system for analysis to give us some interpretable results, as they do for unemployment and the gold-silver ratio.

HFT algos are different. They have no memory and only dream of a concept of value. Arguably, they estimate a value on the basis of variability in observed parameters; but their means of acquiring or disposing of a stock subverts the normal method of price discovery. Each trade during a flash crash has no identifiable connection with previous trades, but represents the maximizing of an unpredictable opportunity. When the flash is done, normal trading between humans resumes as if nothing had happened.

Tuesday, August 9, 2011

Flash crash: business model or indicator?

After the series on deconstructing algos, a few things become clear:

1) HFT, by and large, does not increase liquidity. On the contrary, it works by reducing liquidity at key intervals (during periods of determined buying or selling), resulting in larger price moves than would otherwise be the case.

2) We can distinguish the brief episodes when an algo clears out those pesky human bids from those when two algos are going toe-to-toe in a stat arb war, as well as from those intervals when an algo is taking some hapless mutual or pension fund to the cleaners.

Examples below are accessible through here, except for the first one which was posted here.

Eliminating human bids before the fun begins--2 s.

Scalping the fund by removing liquidity in the face of determined buying. Note the sudden rapid rise in stock price while the fund buys, and the price returns to normal afterward.

Two algos slugging it out. Notice a lot of activity in the bid/ask but very few trades actually occur.

Two algos duel. Then, at 10:25, a committed buyer shows up for a scalping.

A committed seller experiences HFT (note the rapid decline in price).

The flash crashes occur because somebody needs to sell a quantity of shares. The algos "perceive" the orders coming to market and choke off liquidity, and the seller gets a poor price.

The flash "rises" occur because somebody needs to buy. In response to demand, the algos again remove liquidity.

In neither case is liquidity being offered when it is needed. In fact, the exact opposite is occurring. By systematically removing liquidity when it is needed most, HFT algos destabilize the system. This destabilization is merely a side effect--the algos increase the profits of the companies that operate them. But this is very much like the Enron method of doing business--shut down plants at a time of soaring electricity demand to line your own pockets while possibly bringing down the electrical network.

Days with a lot of flash crashes, as on Friday (August 5), are days where there is a lot of institutional selling. It is possible that the focussed withdrawal of liquidity by these service providers contributed to the rather steep decline of the indices on that day.

Friday, August 5, 2011

Deconstructing algos part 5: Are there any humans in the market?

In the past few days, some unusual behaviour has been occurring in after-hours trading of Earthlink shares.

Are there any humans in this market? Hello?

After hours pricing on ELNK, August 2, 2011. Lots of action. Image from Nanex.

Details of the above image. Same source.

And here's the trading action. Not much considering all the bidding activity.

I think what we are seeing is the elimination of humans from the market. Two algos, using their own stat-arb approaches have a differing opinion about ELNK. One thinks it is a buy at any price below, say, $8--the other thinks it a sell at any price better than, say, $7.95. It is normal for such differences of opinion to exist--indeed, they have to exist for the market to exist. When two humans meet in the market, with just such a difference in opinion, they would soon come to an agreement, the price being dependent on which participant gives away his opinion first.

Each algo tries to maximize its own gain. And they do this by showing only a small offering at the best price that doesn't attract any attention. As soon as some interest is shown in their bid, it is cancelled and moved to a much more favourable price. It would be as if one of the human traders had opined "I might be interested in selling some ELNK at $7.95", and then when anyone expresses an interest, suddenly changes his mind, and says, "actually, I meant $8.15." Then the other trader says, "well, if you came down to $8.10, I might be interested," and just as the first trader goes to agree, the other suddenly says, "actually I mean $7.75." This goes back and forth for a while until the inevitable fistfight breaks out.

No trading would occur. This approach provides no liquidity.

It is a contest, like the game where you try to step on your opponent's foot. One favoured tactic is to dangle your foot in front, luring your opponent into an attack, pulling it out of the way as he does so, and then quickly counterattacking your opponent's extended foot. Every so often one of the opponents manages to touch the other and a trade goes through. Otherwise, the bids and offers just go up and down furiously.

* * * * * * * * * * *

In the last "Deconstructing algos" article we looked at two-dimensional reconstructed phase space portraits of busted trade data for CNTY; original data acquired from the Nanex site here.

As described earlier, one approach to creating a geometric representation of a phase space from a time series is to generate a time-delay plot, in which the values of our time series are plotted against lagged values of the same series. We use a constant lag for reasons described here.
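A time-delay plot of this kind needs nothing more than pairing each value with the value a fixed number of observations earlier. The sketch below counts the lag in observations, which amounts to treating the samples as evenly spaced in time; the prices are made-up numbers and the function name is a hypothetical one chosen for illustration.

```python
def delay_pairs(series, lag):
    """Pair each value with the value `lag` observations earlier."""
    return [(series[i], series[i - lag]) for i in range(lag, len(series))]

# Made-up prices, in the order the trades were reported.
prices = [68.0, 61.5, 55.0, 47.0, 40.0, 31.0, 23.0]
pairs = delay_pairs(prices, lag=2)
for current, lagged in pairs:
    print(current, lagged)
```

Plotting the first element of each pair against the second, and joining the points in sequence, gives the two-dimensional portrait. A series of N values with lag k yields N - k pairs.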

Now, in the CNTY data (and in the data series in today's articles) the time control isn't as fine as we would like. In particular, even though the trades are presented in order, the time stamps only extend to the second. We may have 250 transactions in order in that second, but we don't actually know the length of time between any of them. How do we come up with a constant lag?

We can't. What I did in the last episode was assume that all trades were evenly spaced. In reality, this was unlikely. The result is that my phase space portraits were distorted somewhat from reality. How much distortion depends on how far from evenly spaced the samples are. In practice, with lots of points, the distortion isn't really going to be bad unless you have more than 80% of the trades compressed into an interval comprising less than 20% of the time investigated. This seems unlikely, but it would be nice to be able to check. Intuitively, it seems likely that the many trades at similar values occur close together in time.

A geological time series may be a representation of midsummer temperature, captured at thousand-year intervals. We don't know what the temperature does in between each of our observations, but it would be reasonable to assume that it varies, probably in quasiperiodic fashion. Worse, our control over the timing of our samples is nowhere near as nice as we like to pretend. Ask a geologist if his samples really are separated by thousand-year intervals and he will smile and have a distant look in his eye. In reality, the samples are at uncertain intervals, and the time series is fitted to some sort of time scale, and the geological parameters of interest have been interpolated (usually in a linear fashion).

Pricing series are different. Each of our observations is one sale. There is no doubt what the price is between sales. By convention the price between sales is the price of the last sale. So there is no need to interpolate data.
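The last-sale convention makes the price a step function of time, so looking up the price at any instant is a search for the most recent trade. A minimal sketch, assuming trade times are sorted and expressed in seconds after the open; the times, prices, and function name are hypothetical.

```python
from bisect import bisect_right

def price_at(trade_times, trade_prices, t):
    """Price at time t: that of the most recent sale at or before t."""
    i = bisect_right(trade_times, t)
    if i == 0:
        return None  # no sale has occurred yet
    return trade_prices[i - 1]

# Hypothetical trades: seconds after the open, and the price of each sale.
times = [0.0, 0.4, 0.9]
prices = [16.00, 15.20, 14.00]
print(price_at(times, prices, 0.5))   # between the second and third sale
print(price_at(times, prices, 2.0))   # after the last sale
```

No interpolation is involved: the value returned is always a price at which a trade actually took place.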

Let's look at a simple example. Brown-Forman Corp. (BF.A) had some interesting gyrations on July 12, 2011, as detailed on the Nanex strange days page.

We observe 46 trades time-stamped 09:30:01. Notice the stock trades from $68 down to $23 during this second. The trades are not quite evenly spaced, but I have created the time-delay pseudo phase space plot by assuming they are, and plotting the price of one of the trades with this time stamp against the fifth trade prior (with the same time stamp). Hence we have 42 paired trades to put on a scatter plot. By convention we draw a trajectory through them in sequence. Here is what the resulting pseudo phase space plot looks like.

A masterpiece of flash impressionism! Look at the elegant lines. It looks ready to take flight, free at last from human meddling with the stock price! The initial trades are near the upper right, the final trades took place at the left lower tip.

Now we can add some trading density to the graph. We know the location of each of the paired trades. We select the volume--either that of the original trade or that of the lagged trade--it doesn't matter which, but be consistent! I have chosen the lagged volume and contoured using various bin sizes. In these graphs, the bins are 2x2 squares, centred in the midpoint of the four squares.

The above plot used fairly large bins. Each bin has a $20 trading range. I had to use such large bins because there weren't very many trades. The contours are at 10% intervals, meaning that all bins (2x2 boxes) centred within the first shaded contour contain at least 10% of all trades during the one second interval represented in the plot. Most trades occur in the $60-$70 range. The trading density thins out at the lower price intervals.
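One reading of the binning described above is sketched below: volume is first summed in square grid cells, and each bin is then the 2x2 block of cells that share a grid corner, so that neighbouring bins overlap. With $10 cells this gives bins with a $20 trading range. This is an interpretation rather than the exact procedure used for the plots, and the trades, volumes, and function names are made up for illustration.

```python
from collections import defaultdict

def cell_volumes(pairs, volumes, cell):
    """Sum traded volume in square grid cells of side `cell` (price units)."""
    cells = defaultdict(float)
    for (price, lagged), vol in zip(pairs, volumes):
        cells[(int(price // cell), int(lagged // cell))] += vol
    return cells

def bin_fractions(cells):
    """Fraction of total volume in each 2x2 block of cells.

    A block is keyed by the grid corner shared by its four cells.
    """
    total = sum(cells.values())
    corners = set()
    for (i, j) in cells:
        corners.update({(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)})
    fractions = {}
    for (ci, cj) in corners:
        block = sum(cells.get((ci - di, cj - dj), 0.0)
                    for di in (0, 1) for dj in (0, 1))
        fractions[(ci, cj)] = block / total
    return fractions

# Made-up (price, lagged price) pairs and the volume attached to each pair.
pairs = [(65.0, 68.0), (62.0, 58.0), (45.0, 62.0), (25.0, 45.0)]
volumes = [100, 300, 400, 200]
fractions = bin_fractions(cell_volumes(pairs, volumes, cell=10.0))
print(max(fractions.values()))
```

Contouring the resulting fractions at 10% steps would then shade the regions where a bin holds at least that share of the volume.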

Here is the same plot with smaller bins.

Smaller binning gives a better image of what's going on. Here we see the greatest trading density was actually in the $50 range. There are five disjoint basins (six disjoint areas, maybe). Other than that I don't know how to interpret this. I'm not sure whether there is any point in trying to tease out any more information from it.

Let us look at trades for ASIA on July 14, 2011.

The stock began trading near $16 and within 1 s had retreated to $14.

Trading density plot.

Here I've used an absolute trading density (i.e. number of shares traded). The most shares traded in one bin was in excess of 50,000 (labelled on diagram). Instead of contouring, I shaded the bins in accordance with the legend. The labelled dot is the first state at 9:30:01.

This exercise is really about displaying the data in a different form in the hopes that we can make some kind of interpretation of it. It is always possible that no interpretation is possible. This has made me dizzy. I am posting these (and will post a few more shortly) in the hopes that someone sees something of note.

Or perhaps this is the correct interpretation.

Saturday, July 30, 2011

Deconstructing algos, part 4: Phase space reconstructions of CNTY busted trades suggests high speed gang-bangs in the market


Thursday, July 28, 2011

Similarities between paleoclimate transfer function determination and statistical arbitrage

Paleoclimate transfer functions

Back in the day I was given the task of converting a set of programs from FORTRAN 66 to something that could be run on a PC. These programs were designed to use a wide variety of paleoclimatic indicator sets--in this case the relative abundances of 30 species of foraminifera, and known distributions of summer and winter temperature and salinities at the ocean surface over thousands of surface samples, and convert them into a transfer function which related the desired environmental parameters to the foraminiferal abundance data.

This really brings back memories.

The basic idea is this: Summertime surface temperature (Ts) will be a function of the foraminiferal species abundance. If the abundances measured at a particular point were represented by the probability series p1, p2, p3, . . . , p30 (where Σp = 1), then an expression might be found as follows:

Ts = a1p1 + a2p2 + a3p3 + . . . + a30p30 + a31p1p1 + a32 p1p2 + . . . for a whole lot of terms. Assuming we used all first and second-order terms, we would have to develop 496 parameters in the above equation. That is rather a lot, particularly for the computers we were using when FORTRAN 66 was in vogue (well, okay, it was obsolete then--we were really using FORTRAN 77).

So instead of using all of the foraminiferal species abundance data, we would use factor analysis to simplify the data. Factor analysis is a bit of statistical wizardry which groups species which behave in a similar fashion together into a single factor. We would use the minimum number of factors to represent an acceptable amount of variance--in the case of transfer functions for the North Atlantic we reduced our thirty species to six factors. Our expression for temperature then becomes:

Ts = a1f1 + a2f2 + a3f3 + . . . + a6f6 + a7f1f1 + a8f1f2 + . . . +a27f6f6 + a28. Values for a1 to a28 were found by multiple regression. The PCs of the late 80's were capable of running such programs in a reasonable length of time.
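The parameter counts are easy to check. The sketch below is illustrative only (the names `expand_terms` and `transfer_function` are mine, not from the original FORTRAN programs): it lists the first-order terms, every second-order product, and a constant, in the same order as the expression above, and then evaluates the polynomial for a given set of coefficients. Finding the coefficients by multiple regression is not shown.

```python
from itertools import combinations_with_replacement

def expand_terms(values):
    """Return first-order terms, all second-order products, and a constant 1."""
    linear = list(values)
    quadratic = [values[i] * values[j]
                 for i, j in combinations_with_replacement(range(len(values)), 2)]
    return linear + quadratic + [1.0]

def transfer_function(coeffs, values):
    """Evaluate Ts = sum(a_k * term_k) for the expanded terms."""
    terms = expand_terms(values)
    if len(coeffs) != len(terms):
        raise ValueError("expected %d coefficients" % len(terms))
    return sum(a * t for a, t in zip(coeffs, terms))

# Thirty species versus six factors: count the parameters to be fitted.
n_species_terms = len(expand_terms([0.0] * 30))
n_factor_terms = len(expand_terms([0.0] * 6))
```

With thirty species the count includes the constant term; with six factors the last coefficient, a28, plays the same role.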

Once all of the surface samples were run for the present day, you could look at the foraminifera found at different levels of a dated core. Let's say you have a sample from a level in a core known to be 100,000 years old. You count the numbers of the different species of foraminifera, convert your observations into the factors determined above, and apply the factor loading scores to the transfer function, and you calculate the sea surface temperature at the site of your core 100,000 years ago.

As you might expect, there are a lot of things that can go wrong. The environmental preferences of one or more species might change on geological timescales. Different species might bloom during different times of the year, and this may also change due to evolutionary pressures or some confounding effect like iron seeding in coastal seas due to variability in surface runoff. Nevertheless, the technique has been a mainstay of paleoclimatologists for about forty years.

Approaches to statistical arbitrage

The idea of statistical arbitrage is that there is a particular stock (or commodity or bond or what-have-you) which is mispriced relative to a model based on observations. This model would have the form of a transfer function as described above, but instead of using species abundance data, we use observed values of related financial data, including such things as the price of one or more indices, perhaps the unemployment rate, the inflation rate, the price of gold (or other commodities), and so on. Like in the transfer functions described above, having accurate financial data is critical (as opposed to manipulated "official" data sets).

There are at least two principal approaches to statistical arbitrage: 1) trading individual stocks (or commodities or bonds or whatever) against a model, and 2) matched long-short trades between any number of stocks.

Instead of creating a transfer function between your observations and a particular stock price, you might find the ratio between the prices of two stocks. If the modeled ratio differs from the observed ratio, there may be an arbitrageable opportunity by going long the underpriced stock and short the overpriced one. Your assumption is that the dynamics of the ratio between the two prices have not changed, so you plan to take advantage of the reversion to the mean. You don't know whether mean reversion will occur by the overpriced stock falling or the underpriced one rising, but the paired trade should work provided the relationship between the stocks does return to the mean.
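A bare-bones version of the paired trade might look like the sketch below. It is only an illustration under loud assumptions: the "model" is nothing more than the historical mean of the price ratio, the entry threshold of two standard deviations is arbitrary, and the name `ratio_signal` and the prices are made up.

```python
from statistics import mean, stdev

def ratio_signal(prices_a, prices_b, entry_z=2.0):
    """Compare the latest A/B price ratio with its own history."""
    ratios = [a / b for a, b in zip(prices_a, prices_b)]
    history, latest = ratios[:-1], ratios[-1]
    # z-score: how far the latest ratio sits from the historical mean.
    z = (latest - mean(history)) / stdev(history)
    if z > entry_z:
        return z, "short A, long B"   # A looks overpriced relative to B
    if z < -entry_z:
        return z, "long A, short B"   # A looks underpriced relative to B
    return z, "no trade"

# Hypothetical prices, for illustration only.
prices_b = [10.0] * 6
z_stretched, action_stretched = ratio_signal(
    [10.0, 10.2, 9.8, 10.1, 9.9, 12.0], prices_b)
z_flat, action_flat = ratio_signal(
    [10.0, 10.2, 9.8, 10.1, 9.9, 10.0], prices_b)
```

The signal says nothing about which leg will move, only that the ratio is stretched, which is why both legs are traded together.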

Speed comes into play because it is an advantage to be the first market participant to recognize the arbitrage opportunity. As the world is now filled with algorithms seeking these opportunities they tend not to be available for long. One recent exception was in Canadian banks a few years ago, when two of the major banks were thought to be in trouble (and their dividend yields rose quite sharply as share prices fell)--there was a great trade in going long the high-yielding stocks and short the low-yielding stocks (on the assumption that in Canada there will not be a bank failure).

Many institutions calculate highly involved stat-arb positions involving matched long and short positions over large numbers of stocks. It can be difficult for a human to see how all the matches work. For human traders, stat arb probably works best in paired stocks, or a stock vs. an index.

For the record, I don't have any problem with algos pursuing stat arb. It seems to me to be an inherently fair process, and potentially it does improve liquidity if the under/overvaluation is driven by some sort of investor frenzy. I would have a problem if part of the strategy of the algo is to interfere with the access to information of market competitors by creating latency.

Friday, July 8, 2011

Deconstructing algos 3: Quote stuffing as a means of restoring arbitrageable latency

In a recent article Nanex has shown that quote stuffing can slow down the updating of series of stock prices, bids and asks. The article was less clear about why one might do that. There could be arbitraging opportunities.

One of the first games these clowns got into was latency arbitrage. HFTer offers a number of shares for sale at one price, and at the first sign of interest, pulls all of the offers and resubmits them at a higher price. The latency comes into play because when another player sends in his orders to fill HFTer's offers, these orders all find their way to the market via differing routes, each of which has a different latency (lag time)--so instead of all arriving at once, they arrive singly, giving HFTer time to pull the rest of his offers.

Early this year, Royal Bank of Canada (RBC) launched a new trading program called Thor, which was designed to avoid latency arbitrage. The gist of the program was that the system would monitor the various latencies to all the different exchanges to which orders would be routed, and artificially delay the submission along the fastest route so that all the orders would arrive simultaneously on all exchanges. While perfection did not occur, the early claims were that the various latencies would be measured in microseconds, which at the time seemed reasonable.
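The arithmetic behind the idea is simple enough to sketch. This is not RBC's implementation, about which I know nothing beyond the public description. It just assumes that every route faster than the slowest one is held back by the difference, and the venue names and latencies (in microseconds) are invented.

```python
def hold_back_delays(latencies_us):
    """Delay each route so every order lands when the slowest route's does."""
    slowest = max(latencies_us.values())
    return {venue: slowest - lat for venue, lat in latencies_us.items()}

# Hypothetical measured one-way latencies, in microseconds.
measured = {"A": 350, "B": 900, "C": 1200}
delays = hold_back_delays(measured)

# If the measurements are right, every order arrives at the same instant.
arrivals = {venue: delays[venue] + measured[venue] for venue in measured}
```

Everything depends on the measured latencies being right at the moment the orders go out.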

Presumably RBC is not the only player that has developed such software.

Now we hear that orders are being stuffed down different channels at such speeds as to change the latencies. In the Nanex article we see:

Today (June 28, 2011) between 10:35 and 11:17, algorithms running on multiple option exchanges (6 or more), drove excessively high quote rates for SPY options (and 2 or three other symbols that I haven't identified yet). Fortunately this was a quiet trading period. A total of about 400,000,000 excessive quotes were generated -- that is, compared and scaled to the previous day. In one 100ms period, 2,000 SPY option contracts had about 16,000 fluttering quotes (some combination of nominal changes in bid, bid size, ask, ask size) resulting in saturating/delaying all SPY options on that line. These events occurred several times per minute during the interval. If these algorithms include more symbols, or if they run again during an active market, we will see severe problems. It is shocking to see this so widely distributed across so many exchanges and contracts simultaneously.

I'm pretty sure this is not intended to be damaging to the market. The fact that it runs for short bursts on limited channels suggests that there is a particular target. A target like Thor.

Saturating the quotes on individual lines will change the time lags (latency factor) during the intervals the quotes are generated. For Thor to work properly, it has to estimate by observation the precise lag between sending an order and having it arrive on each market. Randomly changing the lags for the different lines would confound RBC's (and others') attempts at ensuring all its orders arrive on all markets at the same time.
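To see why, consider what happens when the lags in force differ from the lags that were measured. The sketch below assumes orders are held back according to the measured latencies and then travel at the actual ones. The function name `arrival_spread` and all of the numbers are hypothetical.

```python
def arrival_spread(measured_us, actual_us):
    """Gap between first and last arrival when delays use stale latencies."""
    slowest = max(measured_us.values())
    arrivals = [slowest - measured_us[venue] + actual_us[venue]
                for venue in measured_us]
    return max(arrivals) - min(arrivals)

# Hypothetical latencies, in microseconds.
measured = {"A": 350, "B": 900, "C": 1200}
quiet = dict(measured)                       # lines behave as measured
stuffed = {"A": 350, "B": 2400, "C": 1250}   # line B saturated with quotes

spread_quiet = arrival_spread(measured, quiet)
spread_stuffed = arrival_spread(measured, stuffed)
```

A spread of zero leaves no window for pulling orders. Any spread created by the stuffing is a window the latency arbitrageur gets back.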

The quote stuffing in this case is intended to be noise, and its intent is to give the latency arbitrageurs the upper hand, as it is easier to generate an immense amount of random noise than it is to formulate an anticipatory response to it in real (microsecond) time.

The only approach I can see (if this is possible) is some kind of all-or-nothing fill on your orders. So your orders arrive on the different markets at slightly different times, but they aren't triggered until they are all ready and then they trigger at the same time. Of course the arbitrageur is probably also the "market maker" and can see you trying to match up your orders prior to execution, leaving you up shit creek in any case.

I really can't see an argument for these actions providing liquidity.

Tuesday, July 5, 2011

The evolutionary arms race in the realm of HFT algos

The history of life is littered with abundant evidence for evolutionary arms races, by which one (or one group) of organism(s) develops some advantage over competitors, predators, or prey, which is soon after countered by the disadvantaged group. The dance has continued from the earliest times of life until the present, and is presumed to be ongoing. Indeed, it is one of the central selective pressures effecting evolution--by eliminating the losers of the arms race.

As I am not an evolutionary biologist, I was thinking in particular of asymmetric races, in which competing organisms adopt different methods, rather than symmetric races.

My interest in such things stems from having a son (and many, many relatives) with G6PD deficiency, a relatively common enzyme deficiency--selected for in humans most likely because it confers some resistance to malaria. The chief drawback of this genetic condition is that eating certain foods (or taking certain medications) can cause massive destruction of red blood cells.

How does such a condition appear? Like most genetic conditions it most likely is an example of a random mutation which hangs around because it is selected for in malarial environments.

Plants have developed toxins over evolutionary history, and one such class of toxins causes destruction of red blood cells. Mammals (among other animals) have developed enzymes that break down these toxins, and the breakdown products are now beneficial. In fact we call these toxins "antioxidants".

Ironically, the enzyme G6PD apparently plays a role in the life-cycle of the malaria parasite, as those who have this condition and who are infected by malaria typically carry lower parasite loads.

In the digital realm, the concept of evolutionary arms races has been around since about 1980, and such races are most commonly observed in the ongoing battle between computer viruses and antivirus products.

They also appear in the realm of high-frequency trading (HFT), although as no one is eager to produce manuals on how their specific systems work it is a little harder to see how.

Years ago we used to see stops getting busted on in-demand shares--which we soon recognized as a sign that the particular stocks targeted would soon see gains. It always happened during a quiet trading time, usually after about 11 in the morning, and suddenly all the bids would be hit until a massive stop-loss was triggered and picked up. I remember in 2002 seeing CDE-N knocked down 30% in a matter of minutes, followed by a massive pick-up of some sucker's stop-loss, followed by furious action as the price bounced back. That passed for HFT in those days.

One of the modern approaches to HFT is latency arbitrage, whereby some entities are able to see more up-to-date buy and sell orders than the general public sees, and use this info to either scoop up the market with an arbitraged advantage or withdraw orders only to replace them moments later at a higher price. For instance, you may be trying to buy shares in company ABC, but as there are differing time-delays for each of the markets on which you are seeking shares--as soon as your first order appears on an exchange, all other available share orders at your buy price are cancelled and resubmitted at a higher price.

Recently, RBC announced a new program called "Thor", meant to combat latency arbitrage. The idea was that RBC would monitor the latency for all markets and use that info to ensure their orders arrive on all markets simultaneously.

Well, what's an HF arbitrageur to do? Why not try quote stuffing? A large number of quotes on a single stock can slow down the reporting on an entire channel, so why not use it to vary the latencies by random factors, making it more difficult for a program like Thor to work? If the latency factors for each market start varying randomly, choosing the appropriate lags for Thor becomes impossible.

Saturday, July 2, 2011

Deconstructing algos--reconstructing the system

Our market system is predicated on the assumption that all market participants have equal access to information. In the real world, this is not the case. As companies pursue their interests, discoveries are made, unusually large transactions occur, and participants in the companies in question acquire material information before many other market participants. The market has rules which are intended to prevent those with access to privileged information from being able to profit unfairly. Thus, the rules against trading on insider information.

Now, as has been demonstrated abundantly, it is clear that there are other entities with access to privileged information. This information has not arisen from the normal business activities of market participants--it has been deliberately created and vomited into the market through quote stuffing in order to overwhelm the system's ability to update market prices and to create many, many, many small arbitrage opportunities.

Very few market participants can carry out quote stuffing, or can create and cancel orders hundreds of times per second. The effect is to tip the playing field in favour of these large entities, and the law of large numbers ensures that profits flow towards them. To my mind this is something very different from trading on insider information.

There is a libertarian argument that insider trading is victimless and should not be considered a crime. I admit to some sympathy towards this view as it seems to fit into a kind of justice--usually someone is doing something socially useful and this creates an opportunity to make additional profit.

For those who think that quote stuffing represents a form of natural justice (IFS Bank has invested in the technology and therefore deserves its ill-gotten profits at the expense of everyone else), I would like to point out the moral difference. The money made from quote stuffing is not part of a socially useful activity. The proper role for financing has always been to make money by attempting to create something that generates cash flow, whether it be a mine, a factory, or an apartment block. Carrying out thousands of transactions per second in order to scalp fractions of pennies each time does not create wealth--it transfers it towards the HF traders.

How do we regulate the market to return it to a semblance of normality? The current market rules did not envision the kinds of advantages that can arise through quote stuffing--consequently there is no mechanism for bringing its practitioners under control. What should be done?

Quote stuffing has to be ended. One method is to place a tax on each stuffed quote, and to remove the exemptions that certain market participants hold for Exchange cancellation fees. The fact that such exemptions exist guarantees that certain market participants will hold an advantage over other participants--something that is not consistent with a fair market.

As has also been pointed out--how can such changes occur when the regulators have been captured by the perpetrators?

Thursday, June 30, 2011

Deconstructing algos, part 2: Leveraging chaos into high-frequency arbitrage opportunities

The recent elegant explanation for the activities of the HFT algos by Nanex seems likely to be a better one than the analysis that follows, as it answers the all-important question--why? In the following analysis we will look a little bit at how, but most of our interpretation of the results is coloured by the Nanex explanation. It explains why so many trades happen outside the bid-ask spread, particularly as the bid and ask prices are moving rapidly. They are scalping fractions of pennies from some poor fool who has data more than a few ms old.

As this is the reason, the method of choosing bid and ask prices pales in significance next to the methodology of stuffing the orders. This methodology I know nothing about and will not address. This article will address how to use this stuffing to create endless opportunities for arbitrage.

The principal advantage discussed in the Nanex report is stuffing the market with so many orders that competitors have trouble seeing the present state of the market. Whenever such inefficiencies are created, an arbitraging opportunity may also be created.

One method of creating arbitrage opportunities is through manipulation of time. We are accustomed to thinking of information flow as instantaneous, but it is limited by the speed of light. How might HFT take advantage of this?

Imagine for a moment that transatlantic communications were somehow extremely limited, so that a trader in New York could not see the present price of a stock in Paris, but would only see it after a two-hour delay. Any market participant who could somehow overcome this limitation would find a myriad of arbitrage opportunities.

Now look at the present. Let's suppose International Face Sucker (IFS) starts stuffing 100,000 bids per second into the pipe in New York. Let us say that those bids are x1, x2, x3, . . .

A market participant in California, Hedge Fool LLP (HF), is in the market and starts looking at the stream of bids coming down the pipe.

At 100,000 bids per second, the electronic signal only travels about 3 km between each bid. So at the time when HF sees the first bid (x1), and prepares its response (say, y1), IFS is actually sending quote x1500 into the dataverse. Where is the market? What is the current price?
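The numbers can be checked on the back of an envelope, or with the few lines below. The sketch assumes the signal travels at the vacuum speed of light (real networks are slower, which only makes matters worse) and takes the New York to California path as roughly 4,500 km, which is my round figure and not a measured route length.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum speed of light, km per second

def km_between_quotes(quotes_per_second):
    """Distance the signal covers in the time between consecutive quotes."""
    return SPEED_OF_LIGHT_KM_S / quotes_per_second

def quotes_in_flight(distance_km, quotes_per_second):
    """Number of quotes sent while the first one is still travelling."""
    travel_time_s = distance_km / SPEED_OF_LIGHT_KM_S
    return int(travel_time_s * quotes_per_second)

spacing = km_between_quotes(100_000)        # km between successive bids
in_flight = quotes_in_flight(4_500, 100_000)  # assumed 4,500 km path
```

So by the time HF sees x1, something on the order of fifteen hundred later quotes are already on their way.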

Now suppose IFS has a branch in California. They have the same algo as IFS New York, and are running it, so locally they observe x1 and HF's response y1--but they already know what x1500 is (or is that "is going to be"?), not to mention all of x2 through x1499. Might there be an arbitrage opportunity? Might there be 100,000 such opportunities every second?

A fraction of a penny 100,000 times a second--it isn't long before you're into real money.

Now IFS has branches in London, Paris, Sydney, Tokyo, Shanghai, Moscow--they are all running these arbitrage trades and who knows--maybe they are stuffing orders into their local bourses, using an algo known by all other branches and are arbitraging them as well.

The role of chaos

Not that it matters much, but what sort of algos are they using? I think they are mostly pretty simple.

The algo on the bizarre spreads seen here is straightforward, but it is hard to see how it profits.

As I've written elsewhere, the nat gas trading algo looked very similar to a simple chaotic function--the first such function ever identified.

Natural gas over a brief interval on June 8, 2011. Graph from Nanex.

Nat gas price from above graph plotted against linear time.

Plot of first 5500 values for x using the Lorenz equation with parameters σ = 10, ρ = 24.7, β = 8/3.

You might think that using such a simple, well known function would mean that anyone could tag along for the ride. But you would be mistaken. Chaotic functions have a property called sensitivity to initial conditions, which makes them very useful in this particular application. Even in the unlikely event that some disgruntled former employee steals the software, its use will be extremely limited.

Note in the equations above we have three flow parameters, σ, ρ, and β, for which there are an unlimited number of choices resulting in chaotic behaviour of the overall function. In addition, we may choose any starting location, and we can also vary the time steps (basically x2 = x1 + time-step * dx/dt). Any arbitrarily small difference in any of the above parameters/variables leads to a dramatically different future evolution of the function. For instance, the two plots below (blue and red) are identical in all respects except that σ = 10.01 for blue and σ = 10.00 for red (where only the red appears, the two curves are essentially identical).

The plot above represents about 16000 intervals, which could probably be squeezed into fewer than 5000 quotes, which IFS could blast out in maybe a twentieth of a second. If HF had stolen the program, and entered every parameter correctly, except for a typo ("10.01" instead of "10.00"), then their estimate of the IFS bids would be accurate for only about 25 ms. After that, HF might as well guess.
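Anyone can repeat the experiment. The sketch below steps the standard Lorenz equations (dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz) forward with the simple update described above, once with σ = 10.00 and once with σ = 10.01. The starting point (1, 1, 1), the time step of 0.01, and the 5500 steps are my assumptions, not necessarily the values behind the plots.

```python
def lorenz_x(sigma, rho, beta, start=(1.0, 1.0, 1.0), dt=0.01, steps=5500):
    """Return the x values of the Lorenz system using simple Euler steps."""
    x, y, z = start
    xs = [x]
    for _ in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        # Euler update: new value = old value + time-step * derivative.
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs

red = lorenz_x(10.00, 24.7, 8.0 / 3.0)
blue = lorenz_x(10.01, 24.7, 8.0 / 3.0)

# Step-by-step distance between the two x series.
gap = [abs(b - r) for b, r in zip(blue, red)]
```

Plotting `red`, `blue`, and `gap` against the step number shows how long the two series track each other before the typo matters.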

We could imagine IFS deciding on the next day's choice of parameters late in the evening, sending the numbers in an encrypted message to all their offices worldwide, and the next day they are all happily arbitraging away 100,000 times a second. They could change the parameters on an hourly basis--or every minute--it requires only a small amount of information to unspool an unlimited number of bids.

The only practical use for this software, if stolen, would be to use the same quote-stuffing method so your international subs could arbitrage the hell out of the market. But that would be manipulation, if it falls into the wrong (read "your") hands. In the hands of IFS, of course, it is proper and judicious market management.

Dust flux, Vostok ice core