The World Complex: dynamic systems


Friday, December 23, 2011

Innovation in complex systems

Innovation has been on my mind a lot lately. Unfortunately, not the kind that results in iPhones and the like.

We normally think of innovation as a good thing. But not all innovations are good ones. As counterexamples, let's consider recent political innovations in the US that allow indefinite detention without trial of anyone accused of terror-related activities; or the use of Predator drones to target American citizens.

My interest has been innovation in the Earth system--particularly in the behaviour of the climate system over the past two million years. The problem with recognizing innovation is that we tend to interpret any activities in light of what we already know--consequently it is difficult to discover anything new. Our first tendency would be to explain our new observations as a special case of what we already know. We resist the idea that something new is occurring.

The Earth system is driven by a few global parameters which interact with myriad local agents; yet, contrary to expectations, instead of dissolving into noise, highly ordered global-scale structure arises. We may call such structures emergent properties, and the means by which they arise is termed emergence.

The problem of how these global structures arise from multitudes of interacting local agents is, shall we say, a non-trivial problem. They are in no way predictable from our knowledge of the local interactions; nevertheless we agree that emergence is in accordance with physical laws.

In earth systems, such emergent properties include plate tectonics, glaciations, superplume events, and some mass extinction events.

The emergent properties of a system may change. These changes may or may not be related to specific change(s) on the local level. For the purpose of this essay, I am referring to such changes as innovation.

Possible examples of innovation in Earth systems include the (somewhat controversial) proposed change in mode of tectonics in Archaean time; (very controversial) Neoproterozoic glaciation (i.e., "snowball Earth"); and magnetic pole reversals.

I have been considering change in operation of the climate system during the Mid-Pleistocene (from about 1 million years ago to about 500 thousand years ago).

I present the following probability density plots of the 2-d phase space reconstructions of the ice volume proxy, produced using the time delay method with a delay of 6 thousand years. Each of the figures below is calculated from 150 thousand years of data.
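A density of this kind could be computed along the following lines. This is only a sketch: the sine wave standing in for the ice volume proxy, the 1-kyr sample spacing, the grid size, and the helper names delay_pairs and density_grid are all assumptions made for illustration.

```python
import math
from collections import Counter

def delay_pairs(series, lag):
    """Pair each value with the value `lag` samples earlier: (x(t - lag), x(t))."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]

def density_grid(pairs, lo, hi, bins):
    """Fraction of states that fall in each occupied cell of a bins x bins grid."""
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in pairs:
        # Clamp so that a value exactly equal to `hi` lands in the last cell.
        ix = min(int((x - lo) / width), bins - 1)
        iy = min(int((y - lo) / width), bins - 1)
        counts[(ix, iy)] += 1
    return {cell: n / len(pairs) for cell, n in counts.items()}

# Hypothetical stand-in for the proxy: a 41-kyr sine wave sampled every 1 kyr
# for 150 kyr. Real proxy data would replace this list.
proxy = [math.sin(2 * math.pi * t / 41.0) for t in range(150)]
pairs = delay_pairs(proxy, lag=6)  # 6 samples = 6 kyr at this assumed spacing
density = density_grid(pairs, lo=-1.0, hi=1.0, bins=20)
```

Cells with a high fraction are the places where the system lingers; a clean limit cycle shows up as a ring of occupied cells rather than as discrete blobs.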

Starting from the Early Pleistocene . . .

Limit cycles (green dashed ellipses) are common in the Early Pleistocene, less so later.

Areas of Lyapunov stability, labelled A1 and A2, represent relatively ice-free conditions. Current global ice volume is comparable to A2, and A1 represents even less ice than at present. Limit cycles in the Early Pleistocene (representing slow, steady growth and decay of ice sheets) start from either the A1 or A2 condition.

The Late Pleistocene is characterized by discrete areas of high probability, suggesting rapid transitions between longer periods of stability. A2 represents an interglacial condition, and A3 to A6 represent separate metastable ice configurations of greater volume respectively. A6 represents a glacial maximum condition, as we experienced about 18,000 years ago.

Climate dynamics as inferred from global ice volume seems to have changed during the Pleistocene epoch. Was it innovation?

Opinions about what happened during the Mid-Pleistocene include changes in atmospheric CO2 leading to greater glaciations, cumulative cooling in the deep ocean changing the nature of the glacial-interglacial transition, erosive uncovering of crystalline bedrock leading to greater thickness of ice sheets, and spontaneous (chaotic) change. There is general agreement that there is no obvious external forcing or any fundamental change in the low-level dynamics leading to the change in climate behaviour, so it is at least possible to argue that the climate system began to act in an "innovative" fashion (provided we state that we do not view this innovation as having been directed in any way).

Let's look at another system instead--one represented by the share price of Century Casinos.

The chart of the daily closing price looks a little like my portfolio--up to a high in April, and all downhill from there.

The two-dimensional reconstructed phase space doesn't look much different from those of other stocks I've looked at in the past.

Actually, this has been smoothed a little, using a 3-point moving average.

There appears to be nothing interesting in the share price activity over the past year--unless we look at daily high prices instead of closing prices.

And here we see something unexpected--a singular spike in share price on June 21, where the share price bounced between about $3 and $8 several times over the day, on first a one-minute timescale, and around mid-day at a one-second timescale.

To investigate dynamics on this timescale, we have to construct our time-delay phase space with a small lag.

In two seconds of trading we have numerous fluctuations between $3 and $7. Lots of money to be made here! (or there would have been had the exchanges not cancelled all the trades).

A few minutes later we get this over one second.

This is orders of magnitude different from what we see in the annual behaviour of the stock, and even considerably different from the bowl of spaghetti above. This figure actually represents a phase space portrait of a random walk. Yes, you can trade randomly if you are quick enough.

So what is the difference between the trading in CNTY on June 21 and every other day this year? Another innovation--high-frequency trading, but in a form which creates the illusion of liquidity by placing lots of orders and then cancelling them as they begin to be filled. The resulting moves in a stock can be dramatic.

Suppose an institutional investor needs to buy a million shares of CNTY (perhaps part of some proprietary arbitrage position). The buyer looks at the depth chart and sees that there are a million shares being offered at $3, so the buyer attempts to fill the order--only to discover that he gets perhaps a thousand shares, the rest of the offer is cancelled, and there are now a million shares offered at $3.05. The tug-of-war may continue, but if the buyer is motivated, the share price may rise considerably in a remarkably short period of time.

Remember that the original intent of having a bid and ask price is that the various offerings were intended to be sold. The idea that these offerings would be used only as bait and not represent real liquidity is indeed innovative, but unhelpful.

Unlike the change in climate dynamics in the mid-Pleistocene, the change in dynamics in share price of CNTY is symptomatic of a fundamental change in the operation of the market, and this change is detrimental to the majority of its participants.

Saturday, October 22, 2011

Inference of dynamics for complex systems part 2

Phase space portraits

As we left our last installment we had the problem of a series of observations from some interesting system, and we were seeking a means of understanding it. First of all, however, we had some doubts as to whether the measurements we have made will tell us anything about the system, or whether there will be other information needed in order to make any useful inferences.

Approaches to studying dynamic systems include both qualitative studies of the general trends of a system and quantitative studies in which invariant properties of the system are evaluated [Abarbanel, 1996]. System dynamics are evaluated by reconstructing the system’s phase space, which is a geometrical representation of the system projected in a “space” created of different variables [Packard et al., 1980; Abarbanel, 1996]. The climate system can be described by a phase space with coordinates x1, x2, x3, . . . xn, and the functions x1(t), x2(t), x3(t), . . ., xn(t) (the outputs of the system). As time (t) varies, the sequential plot of points of coordinate {x1(t), x2(t), x3(t), . . ., xn(t)} describes the time evolution of the system in phase space.

The number of output functions (n) is called the embedding dimension [Sauer et al., 1991]. The evolution of the system is marked by the trajectory traced out by sequential plots of individual states with coordinates defined by the values of the n functions at each observed time. Describing the trajectory of the system as it flows through phase space is a qualitative means of characterizing the dynamics of the system. The system may also be characterized quantitatively in terms of its invariant properties, such as the Lyapunov exponents and the correlation dimension of the system, which can be calculated from the phase space portrait [Abarbanel, 1996].

Phase space from multiple time series

How do we select the coordinates? One method is to create a phase space by plotting scatterplots of several different records which have been sampled at the same time intervals. For instance, Saltzman and Verbitsky [1994] created a phase space using, as variables, ice mass, ocean temperature, and atmospheric CO2. The state of the system is defined by its location in phase space at a particular time. The plot of successive states through time traces out the trajectory of the system. Traditionally the trajectory is constructed by drawing a curved line, rather than straight line segments through the states in sequence.

The drawback with the Saltzman and Verbitsky approach in paleoclimate is that it is difficult to find many records that have been sampled at the same intervals. You are restricted to the portion of the geologic record covered by the shortest record. Additionally, there are errors in both magnitude and time.

Let's not worry about interpretation yet. Today is only about basic methodologies.

Economic systems can quite profitably be studied using this approach, mainly because there are so many economic time series, the errors tend to be small (except see here), and the timing is usually well constrained. So we can compare US unemployment rate to interest rates, for instance.

Data from BLS site.

Commonly we might look at observations like the one above, and not draw the trajectory (the curve that runs sequentially through the data). Instead, a traditional approach might have been to draw a line of best fit in the hopes of defining a correlation. In looking at the above figure, we see two clusters of observations. Past experience tells us it is risky to define a line of best fit using the traditional methods in this way, as the result is heavily weighted by the line between the centres of each cluster.
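The risk with clustered data can be shown with made-up numbers. In the sketch below (synthetic Gaussian clusters, not the BLS data; pearson_r and cluster are hypothetical helpers), neither cluster has any internal correlation, yet the pooled data give a correlation coefficient close to 1 simply because the two centres are far apart.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the example is repeatable

def cluster(cx, cy, n=200):
    """n points scattered independently in x and y around (cx, cy)."""
    xs = [rng.gauss(cx, 1.0) for _ in range(n)]
    ys = [rng.gauss(cy, 1.0) for _ in range(n)]
    return xs, ys

ax, ay = cluster(0.0, 0.0)
bx, by = cluster(10.0, 10.0)

r_a = pearson_r(ax, ay)              # within the first cluster: near zero
r_b = pearson_r(bx, by)              # within the second cluster: near zero
r_all = pearson_r(ax + bx, ay + by)  # pooled: dominated by the gap between centres
```

The pooled fit says nothing about how the variables relate within either cluster, which is the reason for drawing the trajectory instead.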

Similarly we can look at the average duration of unemployment vs unemployment rate.

Data from BLS site.

Or unemployment rate (vertical axis) vs monetary measures.

Data from BLS and St. Louis Fed site.

Or house prices vs real interest rates.

Data from Shiller [2005].

Defining a phase space from multiple variables requires multiple records. The state space can only be characterized over the duration of the shortest record. Dating errors will lead to various forms of distortion in the projected phase space. The economic time series tend to lend themselves well to this form of projection, because many of them exist to any arbitrary level of precision. If you choose month-end or year-end prices, there are normally no dating errors.
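As a minimal sketch of the shortest-record restriction, assuming each record is held as a mapping from sample time to value (the function name joint_states and the numbers are purely illustrative):

```python
def joint_states(record_a, record_b):
    """Build (time, a, b) states using only the times present in both records."""
    shared_times = sorted(set(record_a) & set(record_b))
    return [(t, record_a[t], record_b[t]) for t in shared_times]

# Hypothetical records keyed by an integer time step; the values are made up.
record_a = {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5}
record_b = {2: 7.0, 3: 6.0, 4: 5.0}

states = joint_states(record_a, record_b)  # only steps 2 and 3 overlap
```

Anything outside the overlap is simply lost, and a dating error in either record pairs up values that never coexisted.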

Phase space from a single time series

It is pretty uncommon to have more than one geological time series of sufficient length with good dating control. So geologists will normally have to work with a single time series. The method below can similarly be used in other types of time series as well.

When you have one time series, you may wonder how much dynamic information it contains. Fortunately, ergodic theory suggests that dynamic information about the entire system is contained in each time series output from the system [Abarbanel, 1996]. Therefore, a phase space portrait reflecting the dynamics of the entire system may be reconstructed from a single time series.

Time-derivative method

Packard et al. [1980] propose a method in which the function is plotted along one axis, and its various time derivatives are plotted on the other axes. If we use the simplest two-dimensional case, the graph would consist of a scatterplot of the function against its first time derivative. (i.e. y vs. dy/dt). An example of such a plot appears on the masthead of the blog.

In the above figure, we see the ice volume proxy on the horizontal axis (ice volume increases towards the right) plotted against its first derivative over an interval of time lasting about 120,000 years. The numbers on the graph represent the time in thousands of years before present (ka BP). The rate of change of ice volume is plotted with +ve on top, so that as global ice volume grows (near A, for example), the system will move towards the right through phase space.

Any equilibria in this type of figure must necessarily occur along the zero rate of change axis.
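A two-dimensional version of the time-derivative method might be sketched as follows. The central-difference estimate of the derivative is an assumption (the text does not say how the derivative was estimated), and derivative_states is a hypothetical name.

```python
def derivative_states(times, values):
    """Return (y, dy/dt) states for the interior points of a time series.

    `times` must increase in the forward direction of time; dy/dt is
    estimated with a central difference, so the two end points are dropped.
    """
    states = []
    for i in range(1, len(values) - 1):
        rate = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        states.append((values[i], rate))
    return states

# Toy series y = t^2, whose true derivative is 2t.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [t * t for t in times]
states = derivative_states(times, values)
```

States with a rate near zero sit on the horizontal axis of such a plot, which is where any equilibria must lie.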

Note the error bars presented on some of the states. Similar error bars would be found at all other states in the figure as well. The error in estimating the rate of change is a consequence of the error in measurement being similar in size to the difference between successive measurements. The size of the error bars is large compared to the variability of some parts of the trajectory--consequently our confidence in this trajectory is not as great as it otherwise might be.
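A back-of-envelope error propagation shows why. It assumes each measurement carries an independent error of standard deviation \sigma_y, evenly spaced samples \Delta t apart, and no dating error; these are simplifying assumptions for illustration, not properties of the actual record.

```latex
\left.\frac{dy}{dt}\right|_{t_i} \approx \frac{y_{i+1} - y_i}{\Delta t},
\qquad
\sigma_{dy/dt} \approx \frac{\sqrt{2}\,\sigma_y}{\Delta t},
\qquad
\frac{\sigma_{dy/dt}}{\left|dy/dt\right|} \approx \frac{\sqrt{2}\,\sigma_y}{\left|y_{i+1} - y_i\right|}
```

When the difference between successive measurements is about the same size as \sigma_y, the relative error in the estimated rate is therefore of order 140%.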

Time-delay method

We reduce these errors by reconstructing the phase space by the time delay method [Packard et al., 1980], in which the elements of a time series are plotted against n-1 lagged observations from the same series (figure 2B). Identifying the lags and the embedding dimension (n) are key decisions in the reconstruction. To simplify things in the following discussion we shall only use two dimensions. Thus we reconstruct our phase space portrait by a scatterplot of the data against a lagged copy of itself. The optimum lag is defined by the first minimum of the average mutual information function [Fraser and Swinney, 1986]; however, for quasiperiodic data we find that this tends to be close to the first zero crossing of the autocorrelation function (about ¼ of the period of the dominant waveform).
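A rough sketch of the two-dimensional case follows. For a pure sinusoid, a lag of a quarter period is where the autocorrelation first crosses zero, so this sketch picks the lag that way rather than computing the average mutual information. The test signal and the function names are assumptions for illustration only.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

def first_zero_crossing_lag(series, max_lag):
    """Smallest lag at which the autocorrelation is no longer positive."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) <= 0.0:
            return lag
    return None  # no crossing found within max_lag

def delay_embed(series, lag):
    """2-D states: (value `lag` samples earlier, current value)."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]

# Hypothetical quasiperiodic signal: a sine wave with a period of 40 samples.
period = 40
wave = [math.sin(2 * math.pi * i / period) for i in range(400)]

lag = first_zero_crossing_lag(wave, max_lag=100)
states = delay_embed(wave, lag)
```

For this test signal the chosen lag lands close to a quarter of the period. With a lag that is too short the states collapse onto the diagonal, which is why the choice matters.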

Thus for ice volume:

Here we are looking at a two-dimensional phase space reconstructed from ice volume proxy data covering about 200,000 years. In this projection, lower glacial ice volume is at the lower left corner of the plot, with greater ice volume towards the upper right corner. We'll interpret these later. Moving on:

Case-Shiller index

Official unemployment rate

Detour Gold Corp.

CNTY busted trades (1 s of trading activity each figure)

Gold-silver ratio in phase space

Dynamic systems, like climate, have historically been analyzed using power spectral methods, such as the Fourier transform and wavelet analysis [Hays et al., 1976; Imbrie et al., 1992]. This has been a reflection of the predominantly linear assumptions underlying early analytical methods.

The power spectrum is not an invariant property of a nonlinear time series [Abarbanel, 1996], meaning that significant changes may appear in the power spectrum despite the lack of changes in the dynamics of the system. Therefore, changes in power spectrum are insufficient evidence to infer changes in dynamics.

In our next installment we'll talk a bit about equilibrium and what any of the above plots have to say about it.

References

Abarbanel, H. D. I. (1996), Analysis of Observed Chaotic Data, Springer-Verlag, New York.

Fraser, A. M., and H. L. Swinney (1986), Independent coordinates for strange attractors from mutual information, Phys. Rev. A, 33, 1134-1140.

Hays, J. D., J. Imbrie, and N. J. Shackleton (1976), Variations in the Earth’s orbit: Pacemaker of the ice ages, Science, 194, 1121-1132.

Imbrie, J., et al. (1992), On the structure and origin of major glaciation cycles, 1, Linear responses to Milankovitch forcing, Paleoceanography, 7, 701-738.

Packard, N. H., J. P. Crutchfield, J. D. Farmer and R. S. Shaw (1980), Geometry from a time series, Phys. Rev. Lett., 45, 712-716.

Saltzman, B., and M. Verbitsky (1994), Late Pleistocene climatic trajectory in the phase space of global ice, ocean state, and CO2: observations and theory, Paleoceanography, 9, 767-779.

Sauer, T., J. A. Yorke and M. Casdagli (1991), Embedology, Journal of Statistical Physics, 65, 579-616.

Shiller, R. J. (2005), Irrational Exuberance, 2nd ed., Princeton University Press.

Wednesday, October 19, 2011

Inference of dynamics for complex systems, part 1

Today I will start over with the analysis of dynamic systems, describing a methodology and some of the rationale behind the interpretations from previous postings, as it occurs to me that all of this stuff, though discussed before, is buried in the archives and is not easy to pull together.

This will also be good for me as I have to put together some kind of paper on the topic for one or more conferences in the first half of next year. GAC, in St. John's next year, will be a given as it is my old alma mater, but I am giving thought to presenting at the upcoming 3rd Multiconference on Complexity, Informatics and Cybernetics.

You are studying an interesting system, with many components. You know that many of the components interact, but you don't know the details of their interaction. If the interactions vary with changing conditions within the system (feedback) it may be described as a complex adaptive system. Examples of such systems include, but are not limited to, ecosystems and other biological systems, the stock market and other economic systems, the climate system, and some would argue, the entire earth system.

The behaviour of such systems is typically nonlinear, and typically characterized by self-organization and emergent phenomena. The presence of negative feedback gives the system a form of resilience, allowing it to resist perturbations; and the presence of positive feedback causes the system to experience episodes of rapid change, usually resulting in a shift from one equilibrium condition to another. Multistability (the presence of more than one equilibrium condition) is a common feature of such systems.

The system has input signals, which may be time-dependent; however, you may only be able to observe some of these signals, and there may be input signals of which you are unaware. There are output signals, which you observe and compile into one or more time series; however, there is no way to know if your output signal is important in terms of developing a global understanding of the system of interest.

There are conditions within the system which influence the manner in which the input signals feed through to the output signals. You may have an inkling of some of these rules (commonly expressed as differential equations) but normally your understanding of these rules is incomplete. You hope to understand your system by deducing these equations on the basis of your observations.

Here are some examples of systems we may wish to study.

Daily closing prices for Detour Gold Corp. (DGC-T), from late November 2009 to October 2011.

Gold-silver ratio.

Case-Shiller index. Data from Robert Shiller data page.

Unemployment rate (from US BLS site).

Trading activity in Century Casinos, June 21, 2011. From Nanex.

Paleoclimate proxy records over the past two million years. Magnetic susceptibility of loess (proxy for Himalayan monsoon strength) at top. Deep water 18-O record (proxy for global glacial ice volume) at bottom.

At first glance, the problem seems insurmountable. How do you study a system when you can't even be sure that your observations are meaningful? What if you have failed to observe the most important observable parameters?

It is especially bad for the geological time series, for in addition to the above problem, there are both errors in measurement and errors in the date (or time) of each observation.

In future installments, we will work through the data sets shown above; but we will start with some thoughts on equilibrium.

Friday, September 30, 2011

Recognizing change in complex systems part 3: Unemployment and real interest rates

This post is for those who still think that lower interest rates will lead to lower unemployment.

Information comes from the Bureau of Labor Statistics and FRED. Strangely, the historical data from the BLS does not match the data downloaded from the same site some months ago for previous posts on this topic--the differences are about 0.7% (i.e., the recent correction reduced the unemployment rate by 0.7% for December 2010).

The scatterplot of real interest rates against unemployment rate shows two distinct areas of Lyapunov stability in phase space. Real interest rates are calculated by subtracting the official inflation rate (calculated from CPI data for all urban consumers, all items, annualized and smoothed through a 3-pt MA) from the 3-month treasury yield. The two areas are separated by a brief (four month) excursion into relatively high real interest rates. The lower-unemployment region of phase space is occupied from January 2001 until August 2008.
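The real interest rate calculation might be sketched as below. The details are assumptions: the monthly CPI change is annualized here by compounding over twelve months, and the 3-pt moving average is taken as centred. Neither choice is spelled out above, and the function names are hypothetical.

```python
def annualized_inflation(cpi):
    """Month-over-month CPI change, compounded to an annual rate in percent.

    Entry j of the result refers to month j + 1 of the CPI series.
    """
    return [((cpi[i] / cpi[i - 1]) ** 12 - 1.0) * 100.0 for i in range(1, len(cpi))]

def real_rates(cpi, yields):
    """Treasury yield minus smoothed annualized inflation, month by month.

    `cpi` and `yields` are monthly lists of equal length N; the result covers
    months 2 .. N-2, because differencing and the centred filter trim the ends.
    """
    inflation = annualized_inflation(cpi)
    smoothed = [
        (inflation[i - 1] + inflation[i] + inflation[i + 1]) / 3.0
        for i in range(1, len(inflation) - 1)
    ]
    # smoothed[j] refers to month j + 2 of the original series.
    return [yields[j + 2] - s for j, s in enumerate(smoothed)]

# Sanity check with made-up numbers: flat prices mean zero inflation,
# so the real rate should equal the nominal yield.
flat_cpi = [100.0] * 6
nominal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rates = real_rates(flat_cpi, nominal)
```

Plotting these rates against the unemployment rate for the same months gives the kind of phase space described above.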

Notice that there is no discernable correlation between unemployment rate and interest rate. I recognize that this observation based on possibly manipulated data sets is at odds with the axioms of Keynesian economics and therefore should not be discussed.

The system experienced a bifurcation in late 2008. When real interest rates fell in late 2008, unemployment unexpectedly rose and the system settled into a new area of stability, where it has remained since.

The policy of frantically lowering interest rates has failed to bring down unemployment because of a fundamental change within the economic system. Continuing to hold interest rates low will not undo the irreversible change that occurred in 2008. It might be a good thing to spend some effort on understanding the dynamics of the economic system rather than continuing with actions based on axioms that are clearly at odds with the actual universe.

I recognize that the idea of the economic system undergoing fundamental changes to its dynamics is at odds with the axioms of Keynesian economics and therefore should not be discussed.

Feedback is a common feature of dynamic systems. In certain dynamic systems, there are areas of phase space where the system is dominated by negative feedback. Perturbations to the system are resisted. If the perturbation is large enough, however, the system may enter a state wherein positive feedbacks are dominant, in which case the system evolves rapidly through phase space until it arrives in (usually) a new area of phase space, where once again negative feedbacks dominate and the system regains some form of stability. These areas of stability are sometimes described as attractors, but for reasons discussed previously, we prefer to describe them as areas of Lyapunov stability.
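A toy model can make this concrete. The equation dx/dt = x - x^3 is an assumption chosen purely for illustration (it is not derived from any of the data here): it has stable states at x = -1 and x = +1, where negative feedback dominates, separated by a zone around x = 0 where positive feedback takes over.

```python
def settle(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x**3 from x0 with Euler steps; return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x

# Start from the stable state at x = -1 and apply two perturbations.
small_kick = settle(-1.0 + 0.5)  # still on the same side of x = 0: decays back to -1
large_kick = settle(-1.0 + 1.5)  # pushed past x = 0: runs away to the state at +1
```

The small perturbation is resisted, while the large one carries the system rapidly into the other area of stability, where it stays.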

A similar change is observed in the plot of unemployment duration vs real interest rates, once again covering the period from 2001 to present. Notice that the average duration of unemployment actually shows no correlation with real interest rates.

Observing the change is easy (if we disregard Keynesian axioms). Deducing the nature of the change is more difficult.

One observation that leaps out at me is this. Real interest rates fell to an extreme low in August 2005, followed by an extreme high in October 2006. They fell to an extreme low in June 2008, and rose to an extreme high in November 2008. In the first case, there were no dire effects on unemployment. But the second time around, we got a bifurcation.

Is the answer here?

House prices were still rising in late 2005. They were falling in late 2008. Perhaps a fluctuation in interest rates when people believe they are becoming more wealthy is not harmful, but one that occurs during a time when our perception of wealth is falling leads to a massive loss in confidence. Or at least a sudden realization that we couldn't afford all this debt.

If the change in economic dynamics is caused by a sudden negative perception of debt, then manipulating the interest rates downward will not and cannot bring us back to a paradise of low unemployment. Particularly if it is accompanied by declines in the Case-Shiller index and the stock market.

Tuesday, September 27, 2011

Recognizing change in complex systems: excursions vs. bifurcations

Continued from last time.

Once again, we have a fifteen-year plot of gold/copper ratio vs. silver/rough rice ratio. We are continuing our discussion of whether the event labelled C, which is still unfolding, is likely to be an excursion (which will then return to the region populated by most of the graph) or a bifurcation (which will lead us to a new area of Lyapunov stability somewhere new in phase space).

It appears to be at least a once-in-a-generation event. But how significant is it?

For this spike to represent a bifurcation as opposed to an excursion, we would have to see the function settle in around a new area of phase space. Looking at the shape of the spike, the likely area for such a new area of stability will be centred near the "C", and could extend from the far right of the graph to the kink in the curve near (350, 1.5). For this to form with any degree of satisfaction is likely to take about two more years. If we don't see any sign of orbiting about the "C", then the most likely outcome is a return to the 15-year area of stability.

I found it interesting that on this graph, the silver spike of 1998 does not appear. It is lost in the middle of the area of stability.

Is it possible that the late spike in silver is due to its comparison to a soft commodity, which perhaps hasn't performed as well as metals? We can check this by changing our ratios slightly.

Same four commodities--but compared differently. This time we are looking at the silver($/oz) to copper ($/lb) ratio against gold to rough rice. There are three significant excursions, labelled A, B, and C.

Excursion A represents the spike in silver price in 1998 due to the Warren Buffett purchase. Excursion B is the rise in price in copper in 2006, which exceeded (in percentage terms) the gain in silver which occurred at the same time.

Excursion C has two phases. For the first six months or so, the excursion is dominated by increased gold prices (compared to rice), and for the last 13 months or so, the excursion records the outperformance of silver.

Once again, we won't be sure that this is a bifurcation as opposed to an excursion unless the phase space settles somewhere near or above the "C" for at least another two years.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

It is obvious gold has not behaved well lately. Some of the stocks have been hit as well. We haven't looked at Detour Gold Corp. for a while.

The recent price action has mirrored that of gold, with a sharp rise from the end of July and a sharper decline in the past few trading days.

The two-dimensional reconstructed phase space of the last eight months of the price of DGC-T.

The phase space portrait occupies a Lyapunov-stable area at around $30 for about six months, and from the beginning of August, the system evolves towards the $37 area. The system completes an orbit in the $37 range before leaving the area. This behaviour suggests that there may be another LSA in the $37 range, but I would feel more comfortable with it after a few more orbits.

What happens next? The two most likely scenarios would appear to be a return to the $30 LSA, or a return to the incipient LSA in the $37 range. Being long Detour, I would be encouraged by a reinforcement of the $37 LSA. Unfortunately, this outcome is nearly impossible. The reason is in the nature of the construction of the graph. Recall that the phase space reconstruction is generated by the price at closing of a particular day (on the vertical axis) against the closing price four trading days earlier (horizontal axis).

The coordinates of the last point are ($37.60, $31.00). In four days, the value of the x-coordinate is going to be $31 (today's closing price). In three days, the value of the x-coordinate is going to be $31.17 (yesterday's closing price). I don't know what the closing price of Detour will be on those days--but if it is at about the current price, then the state will lie within LSA30. The only possible hope for the Detour price state to reach the orbit at $37 would be for the price to recover to about $37 within the next three trading days--by Friday. Even then the state would not lie within LSA37, but at least the trajectory would indicate that that would be the likely outcome.

In fact, if we don't see some reassuring action tomorrow--when x falls to $33.60--we will be right on the outer edge of LSA30.
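The reason the next few x-coordinates are already fixed can be seen in a short sketch of the four-day delay construction. The closes are the ones quoted above; the close from two days ago is not given, so it is left as None rather than guessed, and upcoming_x_coordinates is a hypothetical helper name.

```python
def upcoming_x_coordinates(closes, lag):
    """x-coordinates of the next `lag` states: the last `lag` closes, in order."""
    return closes[-lag:]

# Closing prices, oldest first: four days ago, three days ago, two days ago
# (not quoted in the text, hence None), yesterday, today.
closes = [37.60, 33.60, None, 31.17, 31.00]
lag = 4  # trading days between the horizontal and vertical coordinates

current_state = (closes[-1 - lag], closes[-1])  # (close 4 days ago, today's close)
next_x = upcoming_x_coordinates(closes, lag)    # tomorrow first, four days out last
```

Only the y-coordinates of the coming states are still open, so a return to the $37 orbit needs the closing price itself to recover before the x-coordinates slide down to around $31.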

There is another, less desirable outcome. The price may continue falling to the previous LSA near the $23 level. Doh!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Time to liquidate more cocoa. Sadly, the good Ghanaian stuff is all gone. I am forced to take the South American stuff. But I can at least sweeten it a little with this.

Saturday, September 24, 2011

Recognizing change in complex systems

"So I came downstairs and was surprised to find the dog reading the paper."

Such is the beginning of a typical shaggy dog story. If the story is true to form, the punchline would be something like the dog has been getting his news from the internet for years, or some such. If this were a true story, it would be that the dog is usually sleeping when I come downstairs. In such a case, I would presumably notice right away that the dog's behaviour was unusual.

One of the outstanding problems of complex dynamic systems is recognizing (hopefully in real time) when a change in behaviour is occurring.

Today we look at the long (ish)-term behaviour of some commodities with respect to one another. I haven't thought much about their economic significance. Today will just be arm-waving at charts, with a view to see if we can recognize the presence of change in the market fundamentals.

First up--some soft commodities. Below we see the ratio of cocoa prices to rough rice (contracts as defined in 1996).

The principal impact on the graph is from the first Ivorian civil war. I find it interesting that the second war didn't really impact the price as much. Possibly this is because of rapidly increasing Ghanaian cocoa production. According to Voice of America, Ivorian production is doomed due to years of under-investment.

The reconstructed phase space appears below. I have used the time delay method. To construct this figure I have smoothed the above data set through a 3-pt moving average filter, as the unfiltered data looked really noisy.
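A 3-pt moving average filter of this kind could look like the following. Whether the window was centred or trailing isn't stated, so the centred form here is an assumption, as is the function name.

```python
def moving_average_3(series):
    """Centred 3-point moving average; the first and last points are dropped."""
    return [
        (series[i - 1] + series[i] + series[i + 1]) / 3.0
        for i in range(1, len(series) - 1)
    ]

# Toy data: a straight line passes through the filter unchanged.
smoothed = moving_average_3([1.0, 2.0, 3.0, 4.0, 5.0])
```

The smoothed series is then lagged against itself in the usual way. The filter shortens the record by two points and damps the fastest fluctuations, so very brief events will look smaller than they were.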

The fifteen years of data are confined to a comparatively small region of state space, with the only interesting feature the large excursion during the first civil war. This event lasted about two years. Note the magnitude, and more importantly, note the outcome--the system reverted back to the same area of phase space from which it began.

The second civil war by contrast is scarcely noticeable.

There are other ways to construct phase space portraits. Instead of reconstructing them from a single time series we can build them from scatterplots of two (or more) time series. In studying geological systems I try to avoid this technique because it is difficult to build two time series of equivalent length and similar sample rates. For commodity time series this is less of a problem.

So let's go all out and plot the gold/copper ratio against the silver/rough rice ratio (all month-end closing prices).

IIRC, I have multiplied the silver price by 100 in the above chart. Silver and gold used $/oz, copper measured in $/lb.

Notice how for most of the fifteen year record, the states all plot within a relatively large oval (let's call this an LSA) within phase space. There have been three significant deviations from the LSA over the past fifteen years. The excursion marked A represents the strength in gold during the 2008-9 global meltdown. The excursion marked B is a rise in tandem of both silver and copper during the year 2006.

The last excursion (C) represents the recent rise in silver. This ongoing excursion appears to have lasted 17 months so far. The multi-billion dollar question is whether this is an excursion expected to revert to the LSA, or has a bifurcation occurred, with the system evolving towards a new LSA somewhere else in phase space.
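One crude way to put numbers on "excursion" is to draw an oval around the LSA and count how long the state stays outside it. The sketch below assumes an axis-aligned ellipse with a hand-picked centre and semi-axes; these are hypothetical parameters, not something fitted to the chart above.

```python
def outside_ellipse(point, centre, semi_axes):
    """True if a 2-D point lies outside an axis-aligned ellipse."""
    dx = (point[0] - centre[0]) / semi_axes[0]
    dy = (point[1] - centre[1]) / semi_axes[1]
    return dx * dx + dy * dy > 1.0


def excursions(points, centre, semi_axes):
    """Return (start_index, length) for each run of consecutive outside points."""
    runs, start = [], None
    for i, p in enumerate(points):
        if outside_ellipse(p, centre, semi_axes):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:   # excursion still in progress at the end of the record
        runs.append((start, len(points) - start))
    return runs
```

With monthly data, the length of the final run is the age of the ongoing excursion in months. The sketch cannot say whether that run will revert or not; it only measures how long the system has been away.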

Refer once again to the phase space plot of cocoa/rough rice.

Bifurcations and excursions are both rare events. The excursions occur on at best a decadal scale. Bifurcations are more rare. Clearly they have happened in the past. For instance, the long-term gold/silver ratio was approximately 16 for centuries, but has not been near this ratio for several decades. A bifurcation occurred in the last century sometime. Perhaps we are in one now. Either way, I think it might be prudent to stock up on rice.

Disclosure: long gold, long silver, long copper, long cocoa, long rice. Sadly, I was forced to liquidate some of my cocoa holdings in the recent market turbulence.

But it was comforting.

Update (April, 2012) - I overlooked this story in the explanation for the rising price of cocoa in 2010. It seems to fit in with the first peak in the cocoa/rice ratio in 2010.

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us--the bid price, the ask price, and the prices of trades. Additional information such as volume, size of bids, and size of asks may also be of use. In principle both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with observations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. Below, we will go through in some detail the reasoning behind our assumption that there are competing interests involved.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, or the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankees game or when the secretary in the front office misspells 'the'.
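A minimal sketch of the windowed cross-correlation idea, assuming the bid and ask have already been sampled onto a common, evenly spaced time grid. The function names, window length, step, and maximum lag are all illustrative choices rather than settled parameters.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(bid, ask, max_lag):
    """Lag k maximising corr(bid[t], ask[t + k]); k > 0 means the bid leads."""
    best_k, best_r = 0, -2.0
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = bid[:len(bid) - k], ask[k:]
        else:
            x, y = bid[-k:], ask[:len(ask) + k]
        r = pearson(x, y)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


def windowed_lags(bid, ask, window, step, max_lag):
    """Best lag in each window, as (window_start, lag, correlation)."""
    return [(s,) + best_lag(bid[s:s + window], ask[s:s + window], max_lag)
            for s in range(0, len(bid) - window + 1, step)]
```

A change in the sign of the best lag from one window to the next marks a candidate trigger point, where leader and follower swap roles.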

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phase space of two or more different functions.

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging a hierarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) are the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant, as the state that appears first before another particular state is referred to as the predictive state, and the following state is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.
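The bookkeeping behind that construction can be sketched in a few lines. What follows is a heavily simplified stand-in for a real ε-machine reconstruction: it symbolises price changes as up, down, or flat, estimates the probability of each successor symbol given a fixed-length history, and lumps together histories that predict the same thing. The symbol alphabet, history length, tolerance, and function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def symbolise(prices):
    """Map successive price changes to 'U' (up), 'D' (down) or 'F' (flat)."""
    return ''.join('U' if b > a else 'D' if b < a else 'F'
                   for a, b in zip(prices, prices[1:]))


def next_symbol_probs(symbols, history_len):
    """Estimate P(next symbol | preceding history) from observed counts."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(symbols)):
        counts[symbols[i - history_len:i]][symbols[i]] += 1
    return {h: {s: n / sum(c.values()) for s, n in c.items()}
            for h, c in counts.items()}


def group_histories(probs, tol=0.05):
    """Merge histories whose next-symbol distributions agree within tol."""
    groups = []
    for hist, dist in sorted(probs.items()):
        for g in groups:
            ref = probs[g[0]]
            keys = set(ref) | set(dist)
            if all(abs(ref.get(k, 0.0) - dist.get(k, 0.0)) <= tol for k in keys):
                g.append(hist)
                break
        else:
            groups.append([hist])
    return groups
```

Each group plays the role of a predictive state, and the estimated probabilities give the weights on the transitions to successor states. A proper reconstruction would also test whether the differences between distributions are statistically significant, which this sketch does not attempt.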

Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern themselves with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Friday, June 10, 2011

Flash crash the nat gas!

As shown in posts on Zero Hedge (correction: these are originally from Nanex), there have been some bizarre patterns in the trading of natural gas in the past couple of days. The charts below are from the evening of June 8, 2011 and come from the first of the two articles linked above:

At first glance this looks like nearly perfect chaos.

The last figure represents a one-dimensional projection of a Lorenz butterfly curve, shown in its glory below.
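For anyone wishing to reproduce such a projection, here is a minimal sketch that integrates the Lorenz equations with a fourth-order Runge-Kutta step and keeps only the x coordinate. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the usual textbook choices; the starting point, step size, and number of steps are arbitrary assumptions.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(a + h * b for a, b in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trajectory(start=(1.0, 1.0, 1.0), dt=0.01, steps=5000):
    """Return the list of states visited, including the starting point."""
    path = [start]
    for _ in range(steps):
        path.append(rk4_step(path[-1], dt))
    return path


# One-dimensional projection: the x coordinate as a time series.
x_projection = [p[0] for p in trajectory()]
```

Plotted against time, the x series shows oscillations that grow around one lobe before flipping to the other.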

In reality the trading data isn't as nice as it first appears. There has been a bit of playing around with the time axis on the first two plots. I have subsequently digitized the data with trades at half-second intervals (but I'll outline the caveats below).

The axis on the bottom is time in seconds, starting from 19:40:37 on 08-Jun-11. The digitization is at half-second intervals, because the analysis below requires evenly spaced data. There were some difficulties, however. There was not always a trade right on the half- or full-second mark. Frequently the two nearest trades on either side of the desired time interval were at the same price so that it would be reasonable to use that price. Sometimes, however, the two nearest prices were quite far apart--for these I used a midpoint between bid and ask at the moment--however arguably this is not a price, particularly when we observe that many trades took place either above the ask or below the bid. Additionally, there are intervals where the midpoint between the bid and ask is actually undefinable, as during the interval from 19:42:04 to 19:42:06 where the bid price fluctuates in a complex fashion while the ask remains constant. So there are risks in this analysis.
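The digitization rules above can be written down roughly as follows. This is a sketch of the procedure as described, not the code actually used. It assumes trades arrive as sorted (time, price) pairs and quotes as sorted (time, bid, ask) pairs, with times in seconds from the start of the record. It makes no attempt to handle the intervals where the midpoint is not well defined.

```python
from bisect import bisect_left, bisect_right


def resample(trades, quotes, t0, t1, dt=0.5):
    """Sample a price every dt seconds from t0 to t1 inclusive."""
    t_times = [t for t, _ in trades]
    q_times = [t for t, _, _ in quotes]
    out = []
    for i in range(int(round((t1 - t0) / dt)) + 1):
        t = t0 + i * dt
        j = bisect_left(t_times, t)
        if j < len(trades) and t_times[j] == t:
            out.append((t, trades[j][1]))        # a trade exactly on the mark
            continue
        before = trades[j - 1][1] if j > 0 else None
        after = trades[j][1] if j < len(trades) else None
        if before is not None and before == after:
            out.append((t, before))              # neighbouring trades agree
            continue
        k = bisect_right(q_times, t) - 1         # latest quote at or before t
        if k >= 0:
            _, bid, ask = quotes[k]
            out.append((t, (bid + ask) / 2))     # fall back to the midpoint
        else:
            out.append((t, None))                # nothing to go on
    return out
```

The midpoint fallback is where the caveats above bite, since the midpoint is arguably not a price at all.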

The two dimensional lagged state space (using a 2-s lag to minimize mutual information) looks as below:
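At half-second sampling a 2-s lag is four samples. One common recipe for choosing such a lag is to take the first local minimum of the mutual information between the series and a lagged copy of itself. The sketch below uses a simple histogram estimator; the bin count and function names are assumptions, and this is not necessarily how the lag above was arrived at.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information I(X;Y), in bits."""
    def binned(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0    # guard against a constant series
        return [min(int((a - lo) / width), bins - 1) for a in v]
    bx, by = binned(x), binned(y)
    n = len(bx)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def first_minimum_lag(series, max_lag, bins=8):
    """Smallest lag at which the lagged mutual information has a local minimum."""
    mi = [mutual_information(series[:-k], series[k:], bins)
          for k in range(1, max_lag + 1)]
    for idx in range(1, len(mi) - 1):
        if mi[idx] < mi[idx - 1] and mi[idx] <= mi[idx + 1]:
            return idx + 1                 # mi[idx] belongs to lag idx + 1
    return mi.index(min(mi)) + 1           # fall back to the global minimum
```

The returned lag is in samples, so it must be multiplied by the sample spacing to get a lag in seconds.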

Not quite as beautiful as the Lorenz butterfly curve. I smoothed the data through a 3-point moving average filter as the original really looked like hell. The sinusoidal waves that slowly increase in size are reflected in the two-d state space as a spiral, drawn from the centre outwards, until the curve flies off into a new area of phase space.

The conventional dynamic explanation of the nat gas trading curve would be of a system in equilibrium--but the equilibrium is unstable, and the wobbles get progressively larger until the system shifts to a new equilibrium. Such a dynamic interpretation is incorrect. The Lorenz equilibrium is all equilibrium. What we perceive as a sudden shift in equilibrium is actually part of the equilibrium state.

Disequilibrium--it's new equilibrium.

From a trading perspective, the nat gas trading is more straightforward. Someone is able to profit from variability. The proper positioning of puts and calls may allow you to benefit by a large move (either up or down) in the price of a stock or commodity. After that, you can try to hit any buy or sell stops if you are able to drive the price up or down (or in this case, both ways, until the sell stops were hit, triggering a cascade in price). If there had been more buy stops, then the price would have melted up. It was simply a matter of luck that the price melted down.

As in all markets there are two (or more) participants. Leaving the actual trades for a minute and scrutinizing the fluctuations in the bid and ask, we see a complicated psychological game being played. I see many of the tricks that I have seen in thinly traded gold stocks priced by a market maker. You see the bid (or trade price) creep up towards the ask price, then run away, perhaps hoping to draw the ask price down.

But the time frame is completely different. I used to see this play out over the course of a trading day (my favourite was when I would get a partial fill of, say 500 shares of some penny stock before the price would swing away, and could imagine the market maker saying "do you want to pay full transaction fees for a $100 sale, or are you going to meet my price and fill your order?"). In this case, these games are being played on a split-second scale.

Game theory has been digitized and is running on a level of complexity that leaves TIT FOR TAT in the dust. Unfortunately my textbook on the subject is being used by my seven-year-old, so I can't get into strategy games between different players.

Many of the price rises happen while trades are occurring above the ask price. How does that happen? Is the asker crossing transactions with fictional entities to lure the buyer? Similarly, many trades occur below the bid price as the price is falling. Are these fictional crosses? Are these real trades or just a gimmick to lure in another party (hurry up and buy--look how fast price is falling!)?

The chaotic appearance of this function is simply an emergent property of the gaming algorithms.

Lastly, I will point out that my analysis is roughly seven orders of magnitude slower than the frequency of some of the trades.

Dust flux, Vostok ice core