The World Complex: information theory

Showing posts with label information theory. Show all posts

Thursday, September 22, 2011

Information theoretic approaches to characterizing complex systems, part 4: Formalizing a method for optimizing window lengths for probability density diagrams from phase space

When the market looks like it does today, it is better to think about other things.

Some time ago we began looking at the problem of determining the optimum window over which the probability density of the phase space should be calculated. The problem is essentially one of optimization, where if the window is too short the probability density plot will not be accurately represented, but if it is too long, then interesting features may be smoothed out.

Once again we look at a > 2 million year proxy for the strength of the Himalayan paleomonsoon (top figure).

Reconstructing a phase space portrait over small (say 30 thousand years) intervals gives us highly variable results because of the variability in the dynamic behaviour of the system over comparatively short timeframes. As we have seen elsewhere, many of these complex climatic systems are characterized by intervals during which the state space is confined within a comparatively small area of Lyapunov stability, interspersed with intervals during which the system evolves rapidly towards another area of stability.

While confined within an LSA, the probability density will consist of a few large values spread over a small area. The entropy (in an information theory sense) will be small.

When the system is evolving towards a new LSA, the probability density will consist of many small values spread over a large area. The entropy will be large.

Successive values of entropy calculated over small time windows will show a lot of variability. Some of that variability will be due to secular changes in the complexity of the system, and some will be due to the granularity of our observations. If we start choosing longer time windows, we get tend to get both episodes of stability and bifurcation within each window, so the effects of granularity ideally vanish and only the secular variations remain.

Variability declines as the window length increases from 30 ky (thousand years) to 150 ky. Now, each of the above graphs consists of a string of data, so we can do better than eyeballing a comparison.

The methodology proposed then is to normalize each of the strings of data above, and then compare the zero-lag cross-correlation of the entropy of two successive window-lengths to the zero-lag autocorrelation of the entropy observed in shorter of the two windows.

For instance, in the figure above, the entropy observed over 30-ky windows varies from about 1.5 to 3.5. We normalize the data by dividing the difference between each observation and the mean of the series by its standard deviation i.e. x(norm)= (x - mean)/(standard deviation)

We similarly normalize the entropy values for the 60-ky window. The zero-lag cross-correlation will have a lower value than the zero-lag autocorrelation--but how much lower is a measure of how different the two curves are. We find that the ratio of the two values improves as we look at longer windows, as below.

The graph converges in the general direction of the value of 1 as the window gets longer. A value of one would imply perfect correlation between the two entropy functions. So we choose the window length for which the zero-lag cross-correlation is sufficiently high for our purposes. In the example above, I would find that the 150 ky window is sufficient.

For the late Quaternary, a window length of 150 ky also appears suitable.

I'm pretty sure that the different rates at which the cross-correlations approach 1 in the Late and Early Quaternary paleomonsoon proxy are telling us something about the dynamics of the system over these two intervals--but I'm not yet sure what.

Thursday, July 7, 2011

Information theoretic approaches to characterizing complex systems, part 3: optimizing the window for constructing probability density of state space trajectories

In earlier essays I discussed creating plots of the probability density of a 2-d reconstructed phase space portrait as a means for investigating the dynamics of complex systems. For examples I used proxy records representing global ice volume, Himalayan paleomonsoon strength, and oceanographic conditions.

Today's discussion centres around using entropy (in an information theoretic sense) as a tool for deciding whether the window selected for calculating the probability density is sufficient.

We've seen this before. The probability density plot of the trajectory of the 2-d phase space portrait for one 270-ky window of the paleomonsoon strength proxy data suggests that the phase space is characterized by five areas of high probability--five areas where the trajectory of the phase space tends to be confined in small areas for episodes of time before rapidly moving towards another such area. In earlier articles we have argued that these represent areas of Lyapunov stability (LSA).

I used a 270-ky window for all the plots. Why this number? Why any number?

Selection of the width of the window is important, and at the time I was creating these plots, I did not have a technique in place for prescribing this width. If the window is wide, resolving power is lower. If the window is narrow, resolving power is higher--consequently one would think to choose a narrow (i.e. short length of time) window. But if the window is too short, then the observed trajectory over the window is not representative of the phase space of the function.

The trajectories from two neighbouring 30-ky windows in phase space.

If we constructed probability-density plots from the two trajectories in the diagram above, the plots would not be very interesting. It would be even worse if the windows were shorter--say 1 ky, for instance. Then each successive probability density plot would be a blob translated some short distance in phase space, tracing out the trajectory of the entire function, alternating between being stretched a little and "snapping back".

Using entropy to prescribe window length

Entropy will be inversely related to the size of the region of phase space that encompasses the trajectory within the window of time under investigation. If the phase space travels from one LSA to another, then the phase space is characterized by alternating episodes of confinement to a small area punctuated by sweeping trajectories as the system moves to a new state of metastability. A graph of the entropy of successive windows of probability data would appear very noisy, as the entropy varies from very low values for windows dominated by confinement to one LSA, to much higher values during the shifts from one LSA to another.

As the window is lengthened, the noise level of the graph of successive values of entropy declines.

The diagram above is a series of graphs of successive values of entropy calculated over overlapping windows of length 30 ky (at top), 60 ky, 90 ky, 120 ky, and 150 ky (at bottom) using the calculated probability density of the 2-dimensional phase space reconstructed from the time series of the paleomonsoon strength proxy, during the early Quaternary. As the window lengthens, we see the plot become smoother until it reflects primarily secular changes rather than accidents of local trajectories.

The plot here seems to suggest that I can get by using a window of only 150 ky, rather than the 270 ky that I actually used.

The story for the trajectories in the Late Quaternary is the same--the minimum window width looks to be about 150 ky.

You'll have to excuse me. I have a lot of redrafting to do.

A more formal treatment of this method is given here.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Note: "ky" means thousand years, and refers to duration
"ka" means thousand years ago and refers to a specific time

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us--the bid price, the ask price, and the prices of trades. Additional information which may also be of use are such things as volume, size of bids, size of asks, and so on. In principal both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with obsrevations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. We will go though in some detail the reasoning behind our assumption that there are competing interests involved below.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankee's game or when the secretary in the front office misspells 'the'.

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phases space of two or more different functions.

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging an heirarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) is the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant, as the state that appears first before another particular state is referred to as the predictive state, and the following state is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.

Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern itself with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Friday, June 17, 2011

Information theoretic approaches to characterizing complex systems, part 2: complexity of climate proxy data from probability density plots in state space

Last time the World Complex applied simple information theory to the probabilities of particular state transitions within the constructed epsilon machine representations of paleoclimate proxies.

Probability density is calculated by the method outlined here, and below, we see one frame of data for the oceanographic condition proxy (mid-ocean d13-C variability). In essence, the data are unfolded into two dimensions using the time delay method of phase space reconstruction. The probability density is estimated by mapping out a set of overlapping boxes through phase space, and the amount of time the reconstructed state lies within each box is calculated over a window (I have used 270-thousand-year windows--your mileage may vary). Similar windows have been created at 30 thousand year intervals going back nearly two million years.

The entropy of this distribution of data is calculated by reducing the number in each cell in the array above to a probability (a fraction of 270 thousand year duration). In the calculation of the above array, each observation is slotted into four separate cells, so the total "probabilities" add up to 4. The ideal correction would be to count up one cell in every four. What works as well is to use p(i) = t(i)/270,000, where t(i) are the elements of the array, sum up [-Σp(i)log p(i)] for all non-zero terms and divide the total by four.

Calculating entropy for all windows for the oceanographic condition proxy and plotting the results below . . .

Higher values indicate higher entropy, which we normally equate with greater complexity. Oceanographic changes were highly complex in the early Quaternary (until about 1150 thousand years ago); but complexity fell through the middle Quaternary, reaching a minimum at about 750 thousand years ago); increased until

about 450 thousand years ago, and has generally declined since.

I should point out here that when we average data over a window (say from 0-270 thousand years ago), the standard is to plot the result at the midpoint of the window (i.e., 135 thousand years ago). When you look at a plot of the 200-day moving average for a stock price, they plot that average at the end of the window, rather than the middle. This is an important difference.

If we consider individual probability density plots, we see that the area of phase space occupied by the attractor over the Quaternary has varied, but is generally about the same size at different times. We shall see below that this is not the case for the ice volume proxy and the paleomonsoon strength proxy.

The variation of complexity in the ice volume proxy phase space reconstruction with time is as follows:

There is a marked increase in complexity from the Early Quaternary (right side of the graph) to the Late Quaternary (left side of the graph). That complexity has dropped rather sharply in the most recent windows, and this last bit of variability appears to exceed the variability within the graph. Is this an indication we are on the cusp or a major change to the global climate system?

The variation in complexity for the paleomonsoon proxy over the past 2 million years is as follows:

Once again there is a significant increase in complexity in the behaviour of the phase space portrait of the paleomonsoon strength proxy throughout the Quaternary. In the early Quarternary the behaviour of the system is quite simple, and is dominated by tight orbits around a single area of high probability. Complexity is much higher in the late Quaternary, but does fall off somewhat fairly recently.

Probability density plot for phase space portrait of paleomonsoon strength proxy,

over a window from 1421 to 1691 thousand years ago.

Probability density plot for phase space portrait of paleomonsoon strength proxy,

from 281 to 551 thousand years ago. The larger number of LSA areas (high probability)

suggests greater complexity of behaviour in the Late Quaternary.

The overall picture of complexity through the Quaternary is in accordance with that inferred from the epsilon machine computation of our last installment.

Tuesday, June 14, 2011

Information theoretic approaches to characterizing complex systems, part 1: complexity of reconstructed epsilon machines

Introduction

In earlier posts, I opined that the behaviour of various climate subsystems showed greater complexity in the Late Pleistocene than in the early Pleistocene. This opinion is shaped by observations of the behaviour of these varying subsystems (Himalayan monsoon strength, global ice volume, and oceanographic conditions) inferred from proxy records up to about 2 million years in length.

Any such argument would be strengthened by a number. So the challenge I will address over the next few posts in this series will be--how to characterize the complexity of the output of a dynamic series by a single number. After all, in order to compare complexity between two periods, it would be helpful to have a single parameter to compare.

The principal information theoretic concept we shall use is Shannon's (1949) measurement of entropy. The trick is deciding how to apply this parameter.

Entropy of the epsilon machine for the ice volume proxy

Let's look at epsilon machine reconstructions of the ice volume proxy.

Three separate first-order epsilon machines describe portions of the Early Pleistocene variations in the ice volume proxy. A1 represents a minimum ice state, A2 is roughly what goes for an interglacial at present, and A3 is the maximum ice state of the early Pleistocene, which would pass for a minor glacial event in the late Pleistocene.

The Mid-Pleistocene epsilon machine looks more complex.

The Late Pleistocene epsilon machine reconstruction for the ice volume looks to be more complex than any of the Early Pleistocene reconstructions. But is it? How can we tell?

The approach we will try here is to characterize the complexity by the entropy of all of the state transitions. Entropy is expressed as -Σp(i)log p(i) for all values of p(i).* Entropy is considered to be a measure of the "information" in a stream of data. This expression is normally applied in systems where Σp(i) = 1, a condition not met in the figures above. The probabilities of each pathway leading from each of the predictive states adds up to 1; so that the total of all "probabilities" adds up to the number of predictive states in the epsilon machine (two or three in the Early Quaternary, four in the mid Quaternary, six in the late Quaternary.

Do we add up the probabilities as they appear? Should we divide all probabilities by the total number of predictive states so we end up with Σ p(i) = 1? Should we weight the various probabilities to reflect the relative importance of an individual predictive state?

Let's see what happens.

First off, consider a system based not on the proxy data, but on a model. Say, the Late Quaternary global ice volume model of Paillard (1998).

Quite provocative given the current state of the economy! Actually, the I stands for interglacial regime, the M for mild glacial regime, and the F for full glacial regime. The system bumps along from state to state, but there are no probabilities listed as there is only one possible successor state from each predictive state.

Is the above system more complex, less complex, or the same as one with a single state--say, "I".

From a dynamic system, which would use topological arguments--both systems would be equally complex (or simple, in this case), as there is no choice of successor state from each predictive state. From an information sense, it is not at all clear the two systems are the same.

I M F I M F I M F I M F I M F I M F . . .

I I I I I I I I I I I I I I I I I I I I I I . . .

It depends on whether you allow yourself to 'group' the Is, Ms, and Fs into words, which repeat. From a geological perspective, there is a difference in the complexity between the two systems--having three separate predictive states is different than having a single repeated predictive state.

However, if we calculate entropy [-Σp(i)log p(i)] for all the states in both systems, we come up with a value of zero. This is because the probability of each transition (I → M, M→ F, F → I or I → I in the second example) is 1. If, on the other hand, we establish the probability of each of the transitions (I → M, M → F, and F → I) as 1/3, then the entropy is 1.585, as compared to zero for the system with a single predictive state.

The implied complexity is 3x greater for the I M F system as compared to I. Seems reasonable. Now let's consider the epsilon machine construction for the Early Quaternary ice volume proxy.

There are two possibilities to recalculating the probabilities of each transition for α1, for instance: we could divide each of the probabilities by by the number of predictive states (remembering that the probability on the unlabelled A1 → A3 arrow is 1), or we could multiply the probability of each transition by the probability of the originating predictive state. In the interval from 1870-1700 ka, we find p(A1) = 0.2, p(A2) = 0.4, p(A3) = 0.4.

By method 1, the entropy for α1 is 2.13. By method 2, the entropy for α1 is 2.17. Not too different.

By both methods, the entropy for both α2 and α3 is 1.

For the Mid Pleistocene, the entropy for α4 (by method 1) is 3.23. We observe p(A1) = 0.19, p(A2) = 0.125, p(A3) = 0.31, p(A4) = 0.375, so by method 2, entropy for α4 is 3.21. Again, not much different from method 1.

For the Late Pleistocene, the entropy of α5 (by method 1) is 3.42. We observe p(A1) = 0.027, p(A2) = 0.243, p(A3) = 0.243, p(A4) = 0.297, p(A5) = 0.162, p(A6) = 0.027. The entropy of α5 is 2.85, which is considerably lower than by method 1. I think this is because observations of A1 and A6 are rare, as these predictive states are only observed once each during the Late Pleistocene.

Entropy of epsilon machines for paleomonsoon strength proxy

Now consider the reconstructed epsilon machines for the paleomonsoon strength proxy. We shall only use method 2 in calculating entropy.

Three predictive states in the Early Quaternary, dominated by M1 and M2. Given p(M1) = 0.50, p(M2) = 0.41, p(M3) = 0.09, then the entropy of μ1 is 1.83.

In the Late Quaternary, there are six predictive states, with observed probabilities as follows: p(M1) = 0.4, p(M2) = 0.2, p(M3) = 0.17, p(M4) = 0.1, p(M5) = 0.1, p(M6) = 0.03. The entropy of μ2 is 3.67.

Conclusions

Method 1 is the easier calculation, however method 2 is a better calculation. However, method 1 can be used as long as the distribution of predictive states is not too far from even.

In summary

Time Entropy (ice volume) Entropy (paleomonsoon)

Late Pleistocene 2.85 3.67

Mid-Pleistocene 3.21 1.83

Early Pleistocene 1-2.1 1.83

By this test, the behaviour of the climate system has been more complex in the Late Pleistocene than it was in the Early Pleistocene.

In our next installment, we look at how we can use the characterize the complexity for the probability density calculation for each window shown here, to give us a nice smooth graph of complexity of the climate system through time.

References

Crutchfield, J. P., 1994. The calculi of emergence: Computation, dynamics, and induction. Physica D 75: 11-54.
Gipp, M. R., 2001. Interpretation of climate dynamics from phase space portraits: Is the climate system strange or just different? Paleoceanography, 16, 335-351.
Kukla, G., Z. S. An, J. L. Melice, J. Gavin, and J. L. Xiao, 1990. Magnetic susceptibility record of Chinese loess. Trans. R. Soc. Edinburgh Earth Sci., 81: 263-288.
Paillard, D., 2001. Glacial cycles: Toward a new paradigm. Reviews of Geophysics, 3: 325-346.
Shackleton, N. J., A. Berger, and W. R. Peltier, 1990. An alternative astronomical calibration of the Lower Pleistocene timescale based on ODP site 677, Trans. R. Soc. Edinburgh, Earth Sci., 81: 251-261.
Shannon, Claude (1949). "Communication Theory of Secrecy Systems". Bell System Technical Journal 28 (4): 656–715.

* We calculate all logarithms in a base of 2, in accordance with the nerds who came up with this concept.

Dust flux, Vostok ice core