The World Complex: Epsilon machines

Showing posts with label Epsilon machines. Show all posts

Wednesday, April 2, 2014

From the small to the big: earthquakes, avalanches, and high-frequency trading

I've been talking about scale invariance a lot lately. I became interested in the topic quite a few years ago in the context of geological phenomena like earthquakes and avalanches. The Gutenberg-Richter law describing the size-frequency relationship for earthquakes was one of the first natural laws based on scale invariance, but interest in the topic really picked up with the Bak et al. paper in 1987 (pdf - may only be a temporary link).

The cause for this relationship is still foggy, as is the physical mechanism between the small and large earthquakes. The best proposed explanation is that the scale-invariant distibution of events allows for the most efficient flow of energy (and information) through the system (but it isn't clear why that should be so).

So back in the early '90s I was estimating recurrence intervals estimates for certain hazardous events and I started trying to work out a methodology for detecting scale invariance in the geologic record. Using the Gutenberg-Richter Law, you can estimate the likelihood of a large earthquake in an area based on the number of small earthquakes. There were interesting implications for areas where the recurrence interval of large earthquakes is longer than the local recorded history (as in much of Canada). At the time, there were seismic hazard maps produced by the USGS which showed significant earthquake risk in zones which mysteriously ended right at the Canadian border.

One of my classmates in my undergrad days (we're back in the 80s, now) studied the correlation between microquakes and fluid injection at oil extraction operations in southwestern Ontario. The oil companies were surprisingly cooperative until they understood the point of the research, after which they started to withhold data.

And here is the mystery. The principle of scale invariance in earthquakes would suggest that increasing the number of small earthquakes should increase the number of large earthquakes at least in the short term. Yet our understanding of the dynamics of earthquakes tells us that lubricating the fault should allow stresses to be relieved through the small earthquakes, which in the long-run should reduce the chance of a large quake in the longer term. (This idea has been proposed at various times over the past fifty years, but for various obvious reasons, it has never been deliberately pursued).

By the early 2000s, other geophysicists (notably Didier Sornette, but there were others) had moved a portion of their data processing expertise into studying econometric time series. I made this move later as I gradually came to appreciate the key problem with developing quantitative techniques when the data were suspect. First of all, the measurements themselves are inaccurate. More importantly, our estimate of the timing of each observation was just that--an estimate. Most quantitative methods assume that the observations are evenly spaced in time. Failing that, they assume you know the timing of your observations. The consequences of errors in the timing are terrible, and frequently underestimated. The point is that it is difficult to develop excellent quantitative methods when the data are terrible.

The big advantage of working with economic time series--pricing data, in particular, is the elimination of the observational errors. When a transaction occurs, there is no doubt about either the price of the time--right down to the millisecond scale.

I started looking at market macrostructure--because (several years ago) nothing interesting ever happened on a scale of less than about an hour. Until just the past few years. Suddenly, strange, rich, unusual behaviours began to occur in individual stock prices, and even indices, on the millisecond scale. I didn't know what was causing it--but it sure was interesting.

Three seconds on the tilt-a-whirl.

This was the signature of onset of HFT. I was initially interested in it for entirely different reasons than most of you. After Crutchfield's (1994) paper (pdf) on emergence, I had been pondering the idea of how to recognize a fundamental change in a complex system. Again, my interest was in the earth system as a whole, and how to recognize whether or not new observations were pointing to a fundamental change in its mode of operation.

Given our understanding that the number of large avalanches is positively correlated to the number of small avalanches, it seems pretty clear that (as Nanex and Zerohedge has been saying) the damaged market microstructure is mirrored in the increasing number of flash crashes since Reg NMS. Unfortunately, our murky understanding of how the microstructure causes the macrostructural changes can be used by the regulatory authorities to avoid investigation. They can't see a smoking gun.

We would normally expect the micro-crashes to eventually relieve imbalances in the system, improving its long-run stability. (Perhaps this is how the SEC justifies the practice). But unlike earthquakes and avalanches, these uncountably many small crashes are not reducing the imbalances. One reason is that the cause of the imbalances is separate from HFT--the dollars keep being shoveled to the top of the mountain as fast as, if not faster, than HFT brings them cascading down. Another reason is that the trades (mostly) get unwound--so the exchanges push most of the snow back to the mountaintop after the avalanche.

HFT certainly benefits unfairly from the system, but isn't responsible for it. If anything, it is a symptom of corruption--but the cause of the corruption is elsewhere.

Accordingly, my modest proposal for dealing with HFT is this--nothing. Don't bust trades--let them stand. I'd be curious to see the response of the various Ivy-League endowment funds and pension funds when they suffer brutal, near-instantaneous, multi-billion-dollar losses. At a guess, I would probably hear the screaming up here. How would real companies, producing real products, react to a sudden monkey-hammering of their stock price, especially if it triggered debt covenants? Maybe they would all exit the market en masse. It might even force a real change.

Friday, June 8, 2012

Innovation in earth systems poster presentation

Presented at GAC in St. Johns early last week.

It looks like it can't be enlarged in this format. Later I'll try breaking it up into different panels and posting separately. The 3-d projection at lower right will be the hardest to present.

Update:

This is just the long 3-d image set at original size. Not sure about your best viewing options.

The main issue I note is that after trying to trace particular attractors through all of the different windows all at once, my interpretation of the number of areas of Lyapunov stability is a little different than when I looked at them sequentially. I now think that the ice-minimum attractor in the Early Quaternary is the same as that in the late Quaternary, even though the range of O-18 values represented in it has drifted over the last two million years. In earlier postings I had interpreted these as different attractors on the basis of the different values.

Secular drift in the positions of attractors in phase space speaks to slow changes in governing parameters of the climate system. On the scale of two million years, likely candidates are tectonic uplift (particularly around the North Atlantic), strengthening of the modern oceanographic circulation system after the closing of the Panamanian isthmus, drawdown of atmospheric CO2 due to enhanced erosion of uplifted highlands--at least these occur to me off the top of my head.

Monday, March 12, 2012

Constructing an epsilon machine--a worked example, part 1

Started feeling a little under the weather just as the madness of PDAC settled down. Really busy over the past few days.Fell asleep feverish and had this world-transforming idea of an entirely new kind of testable hypothesis. I was convinced that I had something about which papers--no, entire scientific volumes--would be published. Unfortunately when I awoke it was gone, and the parts that I remember (it seemed to involve measuring the girth of everyone who left a particular poster session at which I would present--that and a great deal of cotton stuffing) do not seem promising. But I'll work on it and let you know how it goes.

In the meantime I want to work in more detail on concepts that I have just glossed over--epsilon-machine (which may unfortunately be confused with machine-epsilon--I do wish these researchers would do a search before naming something) reconstruction from paleoclimatic data. I have earlier posted here and here on this topic, and have presented here at GAC a couple of years ago.

We will be working with the ocean condition probability density plots shown here, based on data presented in Raymo et al. (2004) and methodologies described in Gipp (2001) (also here).

Probability density plot. The probability density portrait was computed over the interval 1299 ka to 1029 ka. The two ellipses are preliminary interpretations based on the contours of probability density--I inferred limit cycles in addition to the single LSA labelled O4. The lower limit cycle I have called LC12, and the upper one is LC23.

Now let's add some of the reconstructed phase space portrait.

This is an enlargement of the above plot. The green curve is the reconstructed phase space over the interval 1119 ka to 1059 ka (so 60 000 years). At 1119 ka, the system is in the lower limit cycle--it then moves into the upper limit cycle where it stays until at least 1059 ka. The little yellow blobs represent direct observations (at 3 ky intervals). The green line is a curve fitted through the points, as suggested by Abarbanel (1997). We record this as a transition from LC12 to LC23. Over the entirety of the record--or over a distinct subset--we will see many transitions. These will be counted and enumerated to produce our epsilon machine.

References

Abarbanel, H. D. I., 1996. Analysis of Observed Chaotic Data, Springer-Verlag, New York.

Crutchfield, J. P., 1994. The calculi of emergence: Computation, dynamics, and induction. Physica D, v. 75, 11-54.

Gipp, M. R., 2001. Interpretation of climate dynamics from phase space portraits: Is the climate system strange or just different? Paleoceanography, v. 16, 335-351.

Raymo, M. E., Oppo, D. W., Flower, B. P., et al., 2004. Stability of North Atlantic water masses in face of pronounced climate variability during the Pleistocene. Paleoceanography, v. 19, 13p.
doi: 10.1029/2003PA000921.

Thursday, January 19, 2012

Why geology (also economics) cannot be a branch of physics

I have been working my way through a new paper by Ellison et al. (2012), from the same group that brought us epsilon machines.

In this paper the authors investigate systems which lie between those governed by classical mechanics (which are reversible, and both their past and future are easily divined from present observations assuming we know the equations of motion) and thermodynamic systems (the past of which, once in equilibrium, are fundamentally unknowable); which could be to say, systems which lie between physics (undergraduate, at any rate) and chemistry.

Systems for which our observations are incomplete, and which are characterized by extreme sensitivity to initial conditions are the focus of the paper. These systems are stochastic.

At this my ears (or maybe my eyes) perk up--we have been looking at just these sort of systems in the course of this blog.

The key insight in the paper is the potential for irreversibility in such systems. Irreversibility, in this case, means that the computational effort of prediction (forecasting the future) and retrodiction (modelling the past) are not equivalent. For these systems, it is insufficient to characterize the forward time-evolution of the system--one must also characterize the backward time-evolution as well; otherwise its description is incomplete.

These characterizations are built through Markov chain analysis, leading to an epsilon-machine construction. A reversible process is one in which both the forward-evolving and backward-evolving epsilon machines are identical.

In an earlier post, I constructed epsilon machines for early Quaternary paleoclimates using as predictive states the high-probability ice volumes A1, A2, and A3, from probability density plots like the one below:

Probability density for reconstructed phase space portrait of global ice

volume proxy from 1700 to 1550 ka.

In the above diagram, A1 represents an interglacial condition (much less ice than at present), A2 was an intermediate glacial condition (comparable to what we call an interglacial state today), and the maximal glacial state A3, which here in the late Quaternary would only be considered a mild glacial state.

These states were defined from the probability density plots of reconstructed phase spaces way back when I was still using a window of length 270 ky. More recently I have reconstructed phase space for global ice volume with a window length of only 150 ky, but haven't transcribed all the state changes yet.

In retrospect, I could have placed the border between α1 and α2 in a different place than I did in the figure above. As a demonstration, I shall leave things as they are.

Of the three epsilon machines depicted, α2 and α3 are reversible, whereas α1 is not. The sequence of states in α1 was as follows: A2-A3-A1-A3-A2-A3-A2-A3-A2-A1. The probability computed for the arrow from A2 to A3 is the probability that A3 occurs given A2; which we find by observing all occurrences of A2 (4) and counting the number of them that are immediately followed by A3 (3), for a probability of 0.75. The other probabilities are calculated in a similar fashion.

To time-reversed epsilon machine is constructed the same way, but with the states in reverse order: A1-A2-A3-A2-A3-A2-A3-A1-A3-A2. The structure will appear to be the same, but the probabilities of each transition will differ.

The tilde (~) over the α1 tells us that this is the time-reversed epsilon machine. We note that it is not the same as α1.

The Mid-Quaternary epsilon machine appears to be irreversible.

The general structure of the forward and time-reversed version of α4 are similar. The Markov chain was short, beginning with A1 and ending with A4. The reversed version, therefore, has one more A4 and one less A1 than the forward version, and it is this difference that explains the changes in probabilities for the different transitions. Hence, α4 may actually be reversible.

The late Quaternary epsilon machine appears to be irreversible.

We recognize these as being irreversible because the backward evolving epsilon machine is different.

The goal is to then combine the forward and reverse models into a single bidirectional model (Crutchfield et al., 2009; Ellison et al., 2009). I haven't figured out how to do that yet.

Consider for a moment the reconstructed phase space portrait for the Case-Shiller index of house prices.

Defining causal states by a similar process as for the global ice volume proxy, we would see something like "BAB?"--with the question mark referring to the ongoing excursion generated by our friends at the Federal Reserve. It isn't really a long enough string to do any interesting Markov chain analysis or to construct an epsilon machine. We just haven't seen enough Fed intervention* to devise a predictive model.

Well, my homework now is to figure out the business of constructing the bidirectional model, and how to calculate the difference in entropy between the forward and backward process.

*On the other hand, I think we have already seen quite enough.

References

Crutchfield, J. P., C. J. Ellison, and J. R. Mahoney. Time's barbed arrow: Irreversibility, crypticity, and
stored information. Phys. Rev. Lett., 103(9):094-101, 2009.

Ellison, C. J., J. R. Mahoney, and J. P. Crutchfield. Prediction, retrodiction, and the amount of information
stored in the present. J. Stat. Phys., 136(6):1005-1034, 2009.

Ellison, C. J., J. R. Mahoney, R. G. James, et al., 2012. Information symmetries in irreversible processes. on arxiv (waiting for official publication)

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us--the bid price, the ask price, and the prices of trades. Additional information which may also be of use are such things as volume, size of bids, size of asks, and so on. In principal both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with obsrevations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. We will go though in some detail the reasoning behind our assumption that there are competing interests involved below.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankee's game or when the secretary in the front office misspells 'the'.

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phases space of two or more different functions.

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging an heirarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) is the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant, as the state that appears first before another particular state is referred to as the predictive state, and the following state is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.

Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern itself with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Tuesday, June 14, 2011

Information theoretic approaches to characterizing complex systems, part 1: complexity of reconstructed epsilon machines

Introduction

In earlier posts, I opined that the behaviour of various climate subsystems showed greater complexity in the Late Pleistocene than in the early Pleistocene. This opinion is shaped by observations of the behaviour of these varying subsystems (Himalayan monsoon strength, global ice volume, and oceanographic conditions) inferred from proxy records up to about 2 million years in length.

Any such argument would be strengthened by a number. So the challenge I will address over the next few posts in this series will be--how to characterize the complexity of the output of a dynamic series by a single number. After all, in order to compare complexity between two periods, it would be helpful to have a single parameter to compare.

The principal information theoretic concept we shall use is Shannon's (1949) measurement of entropy. The trick is deciding how to apply this parameter.

Entropy of the epsilon machine for the ice volume proxy

Let's look at epsilon machine reconstructions of the ice volume proxy.

Three separate first-order epsilon machines describe portions of the Early Pleistocene variations in the ice volume proxy. A1 represents a minimum ice state, A2 is roughly what goes for an interglacial at present, and A3 is the maximum ice state of the early Pleistocene, which would pass for a minor glacial event in the late Pleistocene.

The Mid-Pleistocene epsilon machine looks more complex.

The Late Pleistocene epsilon machine reconstruction for the ice volume looks to be more complex than any of the Early Pleistocene reconstructions. But is it? How can we tell?

The approach we will try here is to characterize the complexity by the entropy of all of the state transitions. Entropy is expressed as -Σp(i)log p(i) for all values of p(i).* Entropy is considered to be a measure of the "information" in a stream of data. This expression is normally applied in systems where Σp(i) = 1, a condition not met in the figures above. The probabilities of each pathway leading from each of the predictive states adds up to 1; so that the total of all "probabilities" adds up to the number of predictive states in the epsilon machine (two or three in the Early Quaternary, four in the mid Quaternary, six in the late Quaternary.

Do we add up the probabilities as they appear? Should we divide all probabilities by the total number of predictive states so we end up with Σ p(i) = 1? Should we weight the various probabilities to reflect the relative importance of an individual predictive state?

Let's see what happens.

First off, consider a system based not on the proxy data, but on a model. Say, the Late Quaternary global ice volume model of Paillard (1998).

Quite provocative given the current state of the economy! Actually, the I stands for interglacial regime, the M for mild glacial regime, and the F for full glacial regime. The system bumps along from state to state, but there are no probabilities listed as there is only one possible successor state from each predictive state.

Is the above system more complex, less complex, or the same as one with a single state--say, "I".

From a dynamic system, which would use topological arguments--both systems would be equally complex (or simple, in this case), as there is no choice of successor state from each predictive state. From an information sense, it is not at all clear the two systems are the same.

I M F I M F I M F I M F I M F I M F . . .

I I I I I I I I I I I I I I I I I I I I I I . . .

It depends on whether you allow yourself to 'group' the Is, Ms, and Fs into words, which repeat. From a geological perspective, there is a difference in the complexity between the two systems--having three separate predictive states is different than having a single repeated predictive state.

However, if we calculate entropy [-Σp(i)log p(i)] for all the states in both systems, we come up with a value of zero. This is because the probability of each transition (I → M, M→ F, F → I or I → I in the second example) is 1. If, on the other hand, we establish the probability of each of the transitions (I → M, M → F, and F → I) as 1/3, then the entropy is 1.585, as compared to zero for the system with a single predictive state.

The implied complexity is 3x greater for the I M F system as compared to I. Seems reasonable. Now let's consider the epsilon machine construction for the Early Quaternary ice volume proxy.

There are two possibilities to recalculating the probabilities of each transition for α1, for instance: we could divide each of the probabilities by by the number of predictive states (remembering that the probability on the unlabelled A1 → A3 arrow is 1), or we could multiply the probability of each transition by the probability of the originating predictive state. In the interval from 1870-1700 ka, we find p(A1) = 0.2, p(A2) = 0.4, p(A3) = 0.4.

By method 1, the entropy for α1 is 2.13. By method 2, the entropy for α1 is 2.17. Not too different.

By both methods, the entropy for both α2 and α3 is 1.

For the Mid Pleistocene, the entropy for α4 (by method 1) is 3.23. We observe p(A1) = 0.19, p(A2) = 0.125, p(A3) = 0.31, p(A4) = 0.375, so by method 2, entropy for α4 is 3.21. Again, not much different from method 1.

For the Late Pleistocene, the entropy of α5 (by method 1) is 3.42. We observe p(A1) = 0.027, p(A2) = 0.243, p(A3) = 0.243, p(A4) = 0.297, p(A5) = 0.162, p(A6) = 0.027. The entropy of α5 is 2.85, which is considerably lower than by method 1. I think this is because observations of A1 and A6 are rare, as these predictive states are only observed once each during the Late Pleistocene.

Entropy of epsilon machines for paleomonsoon strength proxy

Now consider the reconstructed epsilon machines for the paleomonsoon strength proxy. We shall only use method 2 in calculating entropy.

Three predictive states in the Early Quaternary, dominated by M1 and M2. Given p(M1) = 0.50, p(M2) = 0.41, p(M3) = 0.09, then the entropy of μ1 is 1.83.

In the Late Quaternary, there are six predictive states, with observed probabilities as follows: p(M1) = 0.4, p(M2) = 0.2, p(M3) = 0.17, p(M4) = 0.1, p(M5) = 0.1, p(M6) = 0.03. The entropy of μ2 is 3.67.

Conclusions

Method 1 is the easier calculation, however method 2 is a better calculation. However, method 1 can be used as long as the distribution of predictive states is not too far from even.

In summary

Time Entropy (ice volume) Entropy (paleomonsoon)

Late Pleistocene 2.85 3.67

Mid-Pleistocene 3.21 1.83

Early Pleistocene 1-2.1 1.83

By this test, the behaviour of the climate system has been more complex in the Late Pleistocene than it was in the Early Pleistocene.

In our next installment, we look at how we can use the characterize the complexity for the probability density calculation for each window shown here, to give us a nice smooth graph of complexity of the climate system through time.

References

Crutchfield, J. P., 1994. The calculi of emergence: Computation, dynamics, and induction. Physica D 75: 11-54.
Gipp, M. R., 2001. Interpretation of climate dynamics from phase space portraits: Is the climate system strange or just different? Paleoceanography, 16, 335-351.
Kukla, G., Z. S. An, J. L. Melice, J. Gavin, and J. L. Xiao, 1990. Magnetic susceptibility record of Chinese loess. Trans. R. Soc. Edinburgh Earth Sci., 81: 263-288.
Paillard, D., 2001. Glacial cycles: Toward a new paradigm. Reviews of Geophysics, 3: 325-346.
Shackleton, N. J., A. Berger, and W. R. Peltier, 1990. An alternative astronomical calibration of the Lower Pleistocene timescale based on ODP site 677, Trans. R. Soc. Edinburgh, Earth Sci., 81: 251-261.
Shannon, Claude (1949). "Communication Theory of Secrecy Systems". Bell System Technical Journal 28 (4): 656–715.

* We calculate all logarithms in a base of 2, in accordance with the nerds who came up with this concept.

Wednesday, April 27, 2011

A framework for applying computation theory to the study of earth history

Introduction

Today we look at a different means of characterizing the complexity of geological data sets using Crutchfield's (1994) "epsilon-machine" concept.

Crutchfield (1994) presented a framework for attacking the following questions. How do we discover anything new when everything we can describe must be expressed in the language of our current understanding? How does innovative behaviour arise from deterministic dynamic systems?

The kernel of Crutchfield's (1994) paper is that the "work" of complex natural systems can be likened to computation, and is capable of being studied in the same manner. Complex systems may thus be understood in terms of how they process information. Computational theory, then, lies at the very centre of the study of complex natural systems.

Computation has an heirarchical architecture, and it is this structure that makes complex behaviour possible. The necessity of heirarchies in the generating innovative or complex behaviour has been previously noted by Hofstadter (1979) and Casti (1992).

The epsilon machine approach is one in which we organize our observations of natural systems in a heirarchical architecture in order to enhance our understanding of them. Today we will look at some examples of this approach using geological time series.

Most of the images below are from the presentation I made last year at GAC (pdf).

First of all, let us consider how an earth system might be considered to be carrying out computation. Consider climate.

The global climate system can be viewed as a machine that operates on inputs, which include the amount of radiation received from the Sun; its latitudinal variations (which change with key astronomical parameters); and possibly other inputs beyond the experience of this author. We don't necessarily know all possible inputs.

The machine uses an unknown series of subroutines, each of which are influenced by factors such as the strength of various ocean currents, the pattern of overall atmospheric and oceanic circulation, the configuration of the continents, levels of atmospheric dust, atmospheric composition, etc. To make matters more interesting, some of these parameters are themselves outputs of the system.

The outputs of the system include the obvious ones like global ice volume, and global temperature distributions. Realistically, however, they also include a myriad of local temperatures, ice conditions, water temperatures, wave conditions and so forth. Like the inputs, there are more outputs than can be conveniently itemized. From the standpoint of understanding the workings of the computer, our opinions on important global parameters (say, ice volume) may not actually be significant.

To make things more challenging, we only perceive the present outputs of the system directly. We have to infer previous outputs from the geological record, which does not necessarily record the outputs cleanly. Normally we infer the outputs through the use of paleoclimate proxy data (frequently isotopic ratios, but sometimes also ratios of certain organic molecules, or the chirality of certain organic isomers. Implicit in this approach is the assumption that the relationship between the output and its proxies are constant, which is by no means certain as most of these proxies are related to the energy relationship between various organisms and their immediate surroundings: a relationship which may be expected to vary through time as as innovative behaviours are selected for by competitive or environmental pressures.

Even after the geological information is recorded (usually in sediments), it is subject to alteration by tectonic activity, or changes in temperature, or chemical alteration, and it can be challenging to recognize and correct for such changes.

In effect the geologist is trying to work out the computational details of an unknown program running on an unknown operating system with an unknown computational architecture. Some of the inputs are known, but an unknown number of additional inputs also exist, some of which may be more important than the known inputs. Some outputs are known, but these may have been altered to an extent which can be difficult to ascertain.

Is there any way we can decide which of the many possible outputs of the system to study?

The answer is surprising. According to Packard et al. (1980), it doesn't matter. All outputs of the system contain all of its information, albeit not necessarily in a form that is easy to recognize. As specialists, it is to our benefit to concentrate on our favourite record (or records). So I will show you how to construct epsilon machines using two of my favourite time series, and you can follow along with your own.

Epsilon machine construction

The process for constructing an epsilon machine is as follows:

1) define predictive and successor states - these are repetitive strings within the data, which are followed in a more or less predictive fashion by other repeated strings.

In the figure above, I have identified two repeated sequences, labelled A (1,2,3,2), and B (2,3,0,1). In the series above, assuming that the sequence of data runs left to right, it appears that A is the predictive state and B, which follows it, is a successor state. Since we are only looking at a fragment of the total data stream, it is just as likely that B is the predictive state and A is the successor state.

2) Construct the epsilon machine by calculating and recording the probabilities of all observed relationships between predictive and successor states.

3) If there are more than one epsilon machine of a given order, then each may itself be a predictive (or successor) state of the next higher order. These higher-order predictive/successor states can be used to define a higher-order epsilon machine in the same manner as in step 2.

In the diagram above, we have defined three epsilon machines from our hypothetical data set. Their existence allows us to construct a second-order epsilon machine from the three first-order epsilon machines.

This third step is where the heirarchical nature of epsilon machines can be realized. Note also that as in mathematical induction we are not specifying the order, so this step is repeated, and epsilon machines of successively higher order are defined until the data gives out. Given sufficient data, you may construct epsilon machines of arbitrary order.

Predictive states in paleoclimate proxies

Let's look at our data.

Here are two well-known data sets: the magnetic susceptibility record from a long loess record, which is a proxy for the strength of the Himalayan monsoon (Kukla et al., 1990); and the deep-sea d18O record, which is a proxy for global ice volume (Shackleton et al., 1990). The d18O record has been available from NGDC servers--the magnetic susceptibility record was obtained from the original publication, and digitized by this author.

The first obstacle in applying this method to geological time series is in defining predictive/successor states. The uncertainties in typical geologic records are so great that strings of identical observations are extremely unlikely. We need to find another approach to defining predictive and successor states.

Herein I propose that we determine predictive and successor states from probability density plots of two dimensional reconstructed phase space portraits. The particular approach we will use involves the use of the time delay space (Packard et al., 1980; Mañé, 1981).

Sequential viewing of the probability density plots for gives you an animation effect. The key here is to note that the regions of high probability are relatively constant throughout the Quaternary and are interpreted to represent areas of Lyapunov stability. These regions of stability shall be our predictive/successor states.

Epsilon machines from the paleomonsoon proxy

In the early Quaternary, we observe three predictive states in the probability density plot of the two-dimensional reconstructed phase space portraits of the paleomonsoon proxy. These have been labelled M1, M2, and M3.

Depending on your browser you may need to click on the above figure to see the magic.

Nice simple epsilon machine (μ1) for the paleomonsoon proxy in the early Quaternary. Som of the simplicity has arisen from a long interval in which M2 appears to be the only stable state, although close inspection of the record actually suggests a limit cycle around M2.

Three more predictive states are apparent in the Late Quaternary, leading to a more complex epsilon machine we will call μ2. The sequence of observed first-order epsilon machines is μ1, μ2, which is not a particularly inspiring second-order epsilon machine, but perhaps the Quaternary isn't finished yet.

Epsilon machines from the ice volume proxy

In the early Quaternary, there are only three predictive states--labelled A1, A2, and A3 (Gipp, 2001). For the purposes of this article we are ignoring the limit cycles seen here. A fourth predictive state (A4) appears in the mid Quaternary, and the fifth (and sixth) appear only in the Late Quaternary. Also in the Late Quaternary, predictive state A1 all but disappears.

We observe three distinct interrelationships (epsilon machines) between our three predictive/successor states active during the Early Quaternary. The epsilon machines have been labelled α1, α2, α3 and each are active during different portions of the record (it may be that α2 and α3 simply represent limit cycles).

What does this tell us? From a computational standpoint, for α1--if the first predictive state was A1, then the next successor state is always A3. From A3, there is a 0.25 probability that A1 is the next successor, otherwise it is A2. From A2, there is a 0.75 probability that A3 is the next successor, otherwise it is A1. Given this information, one can use a stochastic process to generate a string of successor states that satisfy the criteria of the epsilon machine, but which would not necessarily be the exact sequence of observed states.

A climate model of this stage of the Quaternary could be readily tested against the observed model by testing to see whether there is a recognizable string of predictive/successor states in the output, and whether they satisfy (approximately) the observed probabilities (we should recognize that our observed probabilities may be relatively poor estimates of the actual probabilities). There are statistical tools for demonstrating the likelihood that the probabilities generated by the computer model are in agreement with the observed probabilities.

During the mid Quaternary, ice volume state A4 appears for the first time, and the epsilon machine (α4) shows increased complexity.

In the late Quaternary, the appearance of predictive state A5 ushers in a new, more complex epsilon machine I have called α5. The asymmetric sawtooth nature of glaciations in late Quaternary are reflected by the high probability of a rapid drop from A5 to A2 (p = 0.57), A3 (p = 0.14), or A1 via A6 (p = 0.14).

The five first-order epsilon machines we define through the Quaternary may be used as predictive/successor states and used to construct a second-order epsilon machine. There are not enough transitions to establish probabilities for transition from one second-order state to another, so our knowledge of this machine is lacking. At present it appears to be: α1, α2, α3, α2, α4, α5. It is a little more interesting than the second-order epsilon machine for the paleomonsoon proxy, but not completely satisfying. Nevertheless, it is still useful as a descriptor for the climate system, and can certainly be used to test computer models of climate.

Examples of testing models against the geologic record

Computer models of ice volume may be tested by determining the predictive or successor states from the model output, and comparing the resultant epsilon machines to the observed system. For a computer model to be considered successful, it should generate more than one first-order epsilon machine.

Let's try it.

Model 1: Parennin and Paillard (2003) model for Late Quaternary glacial cycles recognizes three metastable states: 1) an interglacial regime; 2) a mild glacial regime; and 3) a full glacial regime. The steps from one regime to another are driven by variable responses of the ice volume to northern hemisphere insolation modified by the extant ice volume. When in the mild glacial regime, ice volume does not drop suddenly in response to increasing insolation; nor does the system evolve back to the interglacial regime. Each transition from one state to another represents an irreversible change. Deglaciation only occurs from the full glacial regime in response to an increase in insolation.

The epsilon machine for this model output is summarized below.

. . . where I represents the interglacial regime, M represents mild glacial regime, and F represents the full glacial regime. In terms of the predictive/successor states used in the epsilon machine method, I corresponds to A2, M to A3, and F to A5.

The epsilon machine from the Paillard (2001) is simple, but no innovative behaviour is likely due to its tight definition. As there is only one first-order epsilon machine, there will clearly be no higher-order epsilon machines. Nor is it clear how to extend the model backward to the mid or early Quaternary.

A model constructed by Saltzman and Verbisky (1994) uses multiple feedbacks ('+' means positive feedback; '-' means negative) in order to generate a results which could be tested against a three-dimensional phase space defined by global ice volume, ocean temperature, and amospheric carbon dioxide. Interestingly, the resultant model successfully captured the multistability that characterizes the data; albeit one in which only two predictive/successor states are observed. The model was only run over a limited portion of the Late Quaternary, during an interval in which only states A2, A4, and A5 are observed.

If Kauffman's (1993) assertion that the number of metastable states (or Lyapunov-stable areas) in a complex system is a function of the square root of the number of parameters holds true, then the Saltzman-Verbitsky model has between about 1/2 and 1/9 of the number of parameters necessary to generate a model as complex in behaviour as the observed Late Quaternary ice volume proxy.

Nevertheless, the generation of multistability in this model represents a form of innovation (such multistability is not explicitly defined in the model), which places this model in my view above the Parennin and Paillard (2003) model--and indeed above virtually all other models of Late Quaternary climate encountered by this author.

Improperly nested epsilon machines in the geologic system

The observed first-order epsilon machines above (alpha-1 through alpha-5) can themselves be considered predictive or sucessor states (we shall call them second-order states to distinguish them from A1 through A5). The sequence of second-order states defines an epsilon machine of the second order. The second order epsilon machine, once completed, represents a third-order predictive or successor state, and given the number of such states, it may prove possible to define a third order epsilon machine.

Successively higher-ordered epsilon machines can be defined as long as there is data to support them. In theory there may be no limit to the order of machine that may be identified in geologic records; but in practice data are limited and we do not expect to easily rise above the second order epsilon machine.

For climate change, higher-order epsilon machines are expected to be related to higher-order cycles within earth history. Higher-order epsilon machines may relate to major tectonic events, and it may eventually prove possible to define epsilon machines of order W, related to Wilson cycles. Beyond that, would be a larger epsilon machine related to the formation (and eventual destruction) of Earth.

The heirarchy of computational processes necessarily reflects the heirarchy of geological processes. But these heirarchies are not perfectly nested. As related in the introduction, some of the results of Wilson-cycle dynamics feeds into the atmospheric circulation. As a consequence, the heirarchies are not nested, they are tangled. Such tangling of heirarchies is a requirement of innovative, complex behaviour (Hofstadter, 1979), and is no doubt the reason that such behaviour is a feature of geologic systems.

References

Casti, J., 1992. Reality Rules: Understanding the world through mathematics.
Crutchfield, J. P., 1994. The calculi of emergence: Computation, dynamics, and induction. Physica D 75: 11-54.
Gipp, M. R., 2001. Interpretation of climate dynamics from phase space portraits: Is the climate system strange or just different? Paleoceanography, 16, 335-351.
Gipp, M. R., in press. Imaging system change in reconstructed phase space: Evidence for multiple bifurcations in Quaternary paleoclimate. Paleoceanography.
Hofstadter, D., 1979. Godel, Escher, Bach: An Eternal Golden Braid.
Kauffman, S. (1993), The Origins of Order: Self-Organization and Selection in Evolution, Oxford Univ. Press, New York, 734 p.
Kukla, G., Z. S. An, J. L. Melice, J. Gavin, and J. L. Xiao, 1990. Magnetic susceptibility record of Chinese loess. Trans. R. Soc. Edinburgh Earth Sci., 81: 263-288.
Mañé, R.,1981. On the dimension of the compact invariant sets of certain non-linear maps, in Dynamical Systems and Turbulence, Warwick 1980, edited by D. Rand and L. S. Young, pp. 230-241, Springer-Verlag, New York.
Packard, N. H., J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, 1980. Geometry from a time series, Phys. Rev. Lett., 45, 712-716.
Paillard, D., 2001. Glacial cycles: Toward a new paradigm. Reviews of Geophysics, 3: 325-346.
Parrenin, F. and D. Paillard (2003), Amplitude and phase of glacial cycles from a conceptual model, Earth Planet. Sci. Lett., 214, 243-250.
Saltzman, B., and Verbitsky, M., 1994. Late Pleistocene climatic trajectory in the phase space of global ice, ocean state, and CO2: Observations and theory. Paleoceanography, 9: 767–779.
Shackleton, N. J., A. Berger, and W. R. Peltier, 1990. An alternative astronomical calibration of the Lower Pleistocene timescale based on ODP site 677, Trans. R. Soc. Edinburgh, Earth Sci., 81: 251-261.

Dust flux, Vostok ice core