The World Complex: probability density plot


Thursday, November 21, 2013

The Classification Problem

Posting has been light as I have been gobsmacked by something I discovered in a book that I've had for almost twenty years. I've always had trouble understanding it. I'm a geologist, and find this sort of thing (pdf) challenging.

It has to do with these probability density plots I've been making in phase space. I developed the idea intuitively, but publishing has always been a slog because I had difficulty presenting a theoretical justification of my approach. I had made a leap of faith that each area of high probability density in phase space was centred about an attractor of indeterminate type.

It was a bit of a fluke getting the paper published in Paleoceanography--the reviewers weren't sure they agreed with it but were willing to give it a go. In terms of number of citations it ranks among the least influential publications in the journal's history.

The discovery was an interpretation of Zeeman's classification problem. Suppose a system is described by a vector field on a manifold (say, a 2-d plane, which is what I have been using, although any surface is possible), so that the time evolution of the system traces a trajectory that flows along the vectors. Suppose also that the system is somewhat noisy, so that there is a small random component to its evolution along the trajectory. Zeeman's idea was that the end-state probability density, starting from all possible initial states on the manifold, is then an invariant property of the vector field. What you end up with is a diffuse ball of higher probability around each of the attractors on the manifold in phase space.
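To make the idea concrete, here is a toy simulation in Python. It is my own illustration, not Zeeman's construction and not the method used in the paper: the potential V(x, y) = (x^2 - 1)^2/4 + y^2/2, the noise level and the step size are all arbitrary choices. The vector field -grad V has two point attractors, at (-1, 0) and (1, 0). Many particles started uniformly over part of the plane and nudged by small random kicks (Euler-Maruyama integration) should end up as two diffuse balls around them.

```python
import numpy as np

def simulate_noisy_flow(n_particles=2000, n_steps=4000, dt=0.01, noise=0.3, seed=0):
    """Euler-Maruyama integration of dX = -grad V(X) dt + noise dW for many
    initial states, with the toy potential V(x, y) = (x**2 - 1)**2 / 4 + y**2 / 2.
    The flow -grad V has point attractors at (-1, 0) and (1, 0) and a saddle at (0, 0).
    """
    rng = np.random.default_rng(seed)
    # Initial states spread uniformly over a square of the plane
    pts = rng.uniform(-2.0, 2.0, size=(n_particles, 2))
    for _ in range(n_steps):
        x, y = pts[:, 0], pts[:, 1]
        drift = np.column_stack((-x * (x**2 - 1), -y))  # -grad V
        kicks = noise * np.sqrt(dt) * rng.standard_normal(pts.shape)
        pts = pts + drift * dt + kicks
    return pts

final = simulate_noisy_flow()
# Histogram of final x positions, bins of width 0.5 centred on -2, -1.5, ..., 2
counts, edges = np.histogram(final[:, 0], bins=9, range=(-2.25, 2.25))
print(counts)
```

The histogram should show two humps centred near x = -1 and x = +1, with very few particles left near the saddle at x = 0: the end state reflects the attractors of the vector field, not the initial conditions.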

I read this as a justification of my intuitive approach, and he's a real mathematician.

Edit: Reference

Zeeman, E. C., 1988. Stability of dynamical systems. Nonlinearity, 1: 115. doi.org/10.1088/0951-7715/1/1/005

Friday, March 2, 2012

Snapshots of multistability in the climate system

Today the World Complex presents images from the recently redrafted movie of the probability density plot of the proxy record for global ice volume over the past two million years. The reason for the redrafting was to shorten the window, improving the resolution of the individual frames.

The methodology for deriving these plots from original data has been previously described here. The O-18 data used below are from Shackleton et al. (1990). Variations in O-18 in the deep ocean reflect global volumes of glacial ice.

This figure is a map of an ice-volume phase space over the interval 189 ka to 39 ka (ka = thousand years ago). There are three distinct regions of higher probability (grey areas) in phase space, which represent stable global glacial volumes. This figure suggests that over the interval in question there were three stable global ice configurations, one corresponding roughly to the interglacial condition we have today (at lower left) and two more with considerably more (glacial) ice, and that transitions from one to another happened relatively rapidly. Because the probability of any state outside the three areas of Lyapunov stability (LSAs) is low, the system spent little time in transit between them, so global climate change was rapid during the interval in question. Glacial ice volumes therefore have three conditions of equilibrium, punctuated by brief episodes of rapid change. Using our dynamic interpretations from previous articles, we have inferred three areas of Lyapunov stability in the time delay state space of global ice volume.

The graph only tells us about global ice volume, not where the ice is. Thus we cannot infer the global glacial configurations for each of the three LSAs.

In the 699-519 ka interval (still Late Quaternary) we still see multiple areas of stability. There may be a limit cycle in the low-volume section of the probability density diagram.

The interval 1509-1359 ka was characterized by a large limit cycle, with a couple of particular regions of higher probability. The high probability peak at lower left represents an area of Lyapunov stability. Limit cycle behaviour in the ice volume system suggests oscillatory growth and decay of ice sheets.

This plot shows a limit cycle and two areas of Lyapunov stability. The lower one, near (3.5, 3.5), is the same as the one in the figure immediately above. The second area of attraction, near (3.9, 3.9), is present in the figure representing the interval 1599-1449 ka above.

In general, limit cycle behaviour is more common in the Early Quaternary, and simple LSA multistable behaviour is more common in the Late Quaternary. This observation is reinforced by reconstructed phase space portraits of smoothed C-13 measurements from Cibicidoides sp. (Raymo et al., 2004).

The C-13 data are purported to represent oceanographic conditions and to reflect overall glacial conditions, with lower values corresponding to glacial maxima (Bickert et al., 1997). The phasing of variability in the C-13 differs from that of O-18 at different frequencies and is thought to reflect changes in oceanographic flow at least partially in response to glacial cycles (Raymo et al., 2004).

In the Early Quaternary, the probability density plots of the reconstructed state space mainly suggest limit cycle behaviour. The period of the oscillations is approximately 41 ky.

In the Late Quaternary, the oceanographic state is more suggestive of multiple metastable equilibria, punctuated by brief episodes of rapid change.

Limit cycle behaviour is still observed in some windows in the Late Quaternary . . .

. . . but multistability (multiple equilibria) is the predominant state in the Late Quaternary.

I have recently completed epsilon machine reconstructions for the C-13 predictive states (at least the forward-evolving epsilon machines, as described briefly here in the references) and will be posting these shortly.

References

Bickert, T., Curry, W. B., and Wefer, G., 1997. Late Pliocene to Holocene (2.6-0 Ma) western equatorial Atlantic deep-water circulation: Inferences from benthic stable isotopes. In Shackleton, N. J., et al. (eds.), Proceedings of the Ocean Drilling Program, Scientific Results, v. 154: 241-254.

Raymo, M. E., Oppo, D. W., Flower, B. P., et al., 2004. Stability of North Atlantic water masses in face of pronounced climate variability during the Pleistocene. Paleoceanography, v. 19: 13p. doi: 10.1029/2003PA000921.

Shackleton, N. J., Berger, A., and Peltier, W. R., 1990. An alternative astronomical calibration of the Lower Pleistocene timescale based on ODP Site 677. Transactions of the Royal Society of Edinburgh: Earth Sciences, v. 81: 251-261.

Friday, December 23, 2011

Innovation in complex systems

Innovation has been on my mind a lot lately. Unfortunately, not the kind that results in iPhones and the like.

We normally think of innovation as a good thing. But not all innovations are good ones. As counterexamples, let's consider recent political innovations in the US that allow indefinite detention without trial of anyone accused of terror-related activities; or the use of Predator drones to target American citizens.

My interest has been innovation in the Earth system--particularly in the behaviour of the climate system over the past two million years. The problem with recognizing innovation is that we tend to interpret any activities in light of what we already know--consequently it is difficult to discover anything new. Our first tendency would be to explain our new observations as a special case of what we already know. We resist the idea that something new is occurring.

The Earth system is driven by a few global parameters which interact with myriad local agents; yet, contrary to expectations, instead of dissolving into noise, highly ordered global-scale structure arises. We may call such structures emergent properties, and the means by which they arise is termed emergence.

The problem of how these global structures arise from multitudes of interacting local agents is, shall we say, a non-trivial problem. They are in no way predictable from our knowledge of the local interactions; nevertheless we agree that emergence is in accordance with physical laws.

In earth systems, such emergent properties include plate tectonics, glaciations, superplume events, and some mass extinction events.

The emergent properties of a system may change. These changes may or may not be related to specific change(s) on the local level. For the purpose of this essay, I am referring to such changes as innovation.

Possible examples of innovation in Earth systems include the (somewhat controversial) proposed change in mode of tectonics in Archaean time; (very controversial) Neoproterozoic glaciation (i.e., "snowball Earth"); and magnetic pole reversals.

I have been considering change in operation of the climate system during the Mid-Pleistocene (from about 1 million years ago to about 500 thousand years ago).

I present the following probability density plots of the 2-d phase space reconstructions of the ice volume proxy, produced using the time delay method with a delay of 6 thousand years. Each of the figures below is calculated from 150 thousand years of data.
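A minimal sketch of the 2-d time delay reconstruction, assuming the record has already been put on an evenly spaced age grid. Real proxy records such as ODP 677 are unevenly sampled, so linear interpolation with np.interp is used here as a stand-in for whatever resampling you prefer. The function name delay_embed_2d and the 2-ky sample spacing are hypothetical; with that spacing a 6-ky delay is 3 samples. Pairing x(t) with x(t + lag) is one common convention; pairing with x(t - lag) gives a mirror image of the same portrait.

```python
import numpy as np

def delay_embed_2d(series, lag):
    """Return the 2-d time-delay reconstruction as rows (x(t), x(t + lag)).

    `lag` is in samples: on a grid spaced dt ky apart, a 6-ky delay is
    lag = round(6 / dt).
    """
    x = np.asarray(series, dtype=float)
    if not 1 <= lag < len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    return np.column_stack((x[:-lag], x[lag:]))

# Synthetic, unevenly sampled stand-in record (not real data): a 41-ky cycle
rng = np.random.default_rng(0)
ages = np.sort(rng.uniform(0.0, 600.0, 400))        # ka
values = np.sin(2 * np.pi * ages / 41.0)
grid = np.arange(0.0, 600.0, 2.0)                   # resample every 2 ky
regular = np.interp(grid, ages, values)             # linear interpolation
points = delay_embed_2d(regular, lag=round(6 / 2))  # 6-ky delay = 3 samples
print(points.shape)                                 # (297, 2)
```

Each row of points is one location on the reconstructed trajectory; the probability density plots are built by counting how often the trajectory visits each part of this plane.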

Starting from the Early Pleistocene . . .

Limit cycles (green dashed ellipses) are common in the Early Pleistocene, less so later.

Areas of Lyapunov stability, labelled A1 and A2, represent relatively ice-free conditions. Current global ice volume is comparable to A2, and A1 represents even less ice than at present. Limit cycles in the Early Pleistocene (representing slow, steady growth and decay of ice sheets) start from either the A1 or A2 condition.

The Late Pleistocene is characterized by discrete areas of high probability, suggesting rapid transitions between longer periods of stability. A2 represents an interglacial condition, and A3 to A6 represent separate metastable ice configurations of successively greater volume. A6 represents a glacial maximum condition, as we experienced about 18,000 years ago.

Climate dynamics as inferred from global ice volume seems to have changed during the Pleistocene epoch. Was it innovation?

Opinions about what happened during the Mid-Pleistocene include changes in atmospheric CO2 leading to greater glaciations, cumulative cooling in the deep ocean changing the nature of the glacial-interglacial transition, erosive uncovering of crystalline bedrock leading to greater thickness of ice sheets, and spontaneous (chaotic) change. There is general agreement that there is no obvious external forcing or any fundamental change in the low-level dynamics leading to the change in climate behaviour, so it is at least possible to argue that the climate system began to act in an "innovative" fashion (provided we state that we do not view this innovation as having been directed in any way).

Let's look at another system instead--one represented by the share price of Century Casinos.

The chart of the daily closing price looks a little like my portfolio--up to a high in April, and all downhill from there.

The two-dimensional reconstructed phase space doesn't look much different from those of other stocks I've looked at in the past.

Actually, this has been smoothed a little, using a 3-point moving average.

There appears to be nothing interesting in the share price activity over the past year--unless we look at daily high prices instead of closing prices.

And here we see something unexpected: a singular spike in share price on June 21, when the share price bounced between about $3 and $8 several times over the day, first on a one-minute timescale and then, around midday, on a one-second timescale.

To investigate dynamics on this timescale, we have to construct our time-delay phase space with a small lag.

In two seconds of trading we have numerous fluctuations between $3 and $7. Lots of money to be made here! (or there would have been had the exchanges not cancelled all the trades).

A few minutes later we get this over one second.

This is orders of magnitude different from what we see in the annual behaviour of the stock, and even considerably different from the bowl of spaghetti above. This figure actually represents a phase space portrait of a random walk. Yes, you can trade randomly if you are quick enough.

So what is the difference between the trading in CNTY on June 21 and every other day this year? Another innovation--high-frequency trading, but in a form which creates the illusion of liquidity by placing lots of orders and then cancelling them as they begin to be filled. The resulting moves in a stock can be dramatic.

Suppose an institutional investor needs to buy a million shares of CNTY (perhaps part of some proprietary arbitrage position). The buyer looks at the depth chart and sees that there are a million shares being offered at $3, so the buyer attempts to fill the order--only to discover that he gets perhaps a thousand shares, the rest of the offer is cancelled, and there are now a million shares offered at $3.05. The tug-of-war may continue, but if the buyer is motivated, the share price may rise considerably in a remarkably short period of time.

Remember that the original intent of having a bid and ask price is that the various offerings were intended to be sold. The idea that these offerings would be used only as bait and not represent real liquidity is indeed innovative, but unhelpful.

Unlike the change in climate dynamics in the mid-Pleistocene, the change in dynamics in share price of CNTY is symptomatic of a fundamental change in the operation of the market, and this change is detrimental to the majority of its participants.

Friday, December 16, 2011

Inference of dynamics for complex systems, part 4: long records

Today we look at phase space reconstructions of long climate proxies (which are records of some geological parameter which is believed to be related to some climatic parameter--used because we have no way of directly measuring temperatures or global ice volumes of the distant past).

The proxy I will be looking at today is the ca. 2-million-year-long record of deep-sea d18O (the deviation, in parts per thousand, of the ratio of O-18 to O-16 in a sample from that of a reference standard) from ODP 677.
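For reference, the standard definition of the per-mil delta notation, where each ratio is the abundance of O-18 relative to O-16:

```latex
\delta^{18}\mathrm{O} = \left( \frac{\left({}^{18}\mathrm{O}/{}^{16}\mathrm{O}\right)_{\mathrm{sample}}}{\left({}^{18}\mathrm{O}/{}^{16}\mathrm{O}\right)_{\mathrm{standard}}} - 1 \right) \times 1000 \quad (\text{per mil})
```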

The record is actually inverted, as it is a proxy for ice volume. In the figure above, the curve is near the top of the graph at times of low ice volume (i.e., interglacials) and near the bottom during glacial maxima.

Reconstructing the phase space over the past 585,000 years (since 585 ka in the figure below), using a delay of 6000 years (noted as 6 ky below), gives us the following.

Now we need to decide what sort of system this graph describes. Is it like this?

Or more like this one?

There are a lot of loops in our reconstructed phase space portrait. Are there any areas of Lyapunov stability? It is not too easy to see directly from the portrait.

To simplify it, we can divide up the data into bins and contour the density of data in each bin. I have called these "probability density plots" in previous posts. With sufficient data, you may be able to use a Gaussian kernel estimator--as many commonly used mathematical software packages contain such a feature (you may have to create your own subroutines to work in two or more dimensions).
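A sketch of both approaches in Python with NumPy. The helper binned_density counts samples per grid cell (for an evenly sampled record this is proportional to the time spent in each cell); gaussian_kde_2d is a hand-rolled isotropic Gaussian kernel estimate, where bandwidth is a hypothetical smoothing length you would have to choose. Both names are my own. If SciPy is available, scipy.stats.gaussian_kde also accepts multi-dimensional data, which saves writing the subroutine yourself.

```python
import numpy as np

def binned_density(points, bins=40):
    """Fraction of samples falling in each cell of a regular 2-d grid.
    For an evenly sampled record this is the fraction of time spent there."""
    counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    return counts / counts.sum(), xedges, yedges

def gaussian_kde_2d(points, grid_x, grid_y, bandwidth):
    """Isotropic Gaussian kernel density estimate evaluated on a grid.
    Returns an array of shape (len(grid_x), len(grid_y)) that integrates to ~1."""
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    # Offsets from every grid node to every data point: shape (nx, ny, N)
    dx = gx[..., None] - points[:, 0]
    dy = gy[..., None] - points[:, 1]
    kernels = np.exp(-(dx**2 + dy**2) / (2.0 * bandwidth**2))
    return kernels.sum(axis=-1) / (len(points) * 2.0 * np.pi * bandwidth**2)

# Usage on a synthetic stand-in trajectory (not real data)
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2))
prob, xe, ye = binned_density(pts, bins=20)
grid = np.linspace(-4.0, 4.0, 41)
dens = gaussian_kde_2d(pts, grid, grid, bandwidth=0.3)
```

The kernel version holds an array of grid nodes times data points in memory, so for long records you would evaluate it in chunks or hand the job to a library routine. Either array can then be contoured.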

The probability density plot (modified from the Paleoceanography paper) is a lot easier to interpret than the original phase space portrait (second figure from top). The peaks in probability density (labelled A2 through A5) are interpreted as areas of Lyapunov stability. Global ice volume over the past 750 thousand years (and much more) is characterized by multistability: there are multiple equilibria. During any given interval only one equilibrium is "in play", but the equilibrium point is subject to abrupt change from, say, A5 to A2, over very short intervals.
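Picking out the peaks can be partly automated. The hypothetical helper below flags grid cells that exceed all eight of their neighbours and are at least some fraction of the highest peak; the 20% threshold is an arbitrary assumption. Deciding that those peaks are areas of Lyapunov stability remains an interpretive step that the code does not make for you.

```python
import numpy as np

def density_peaks(density, min_fraction=0.2):
    """Return (i, j) indices of cells that are strict local maxima of a 2-d
    density array (compared with all 8 neighbours) and that are at least
    min_fraction of the global maximum."""
    d = np.asarray(density, dtype=float)
    # Pad with -inf so cells on the border can still qualify as peaks
    padded = np.pad(d, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones_like(d, dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = padded[1 + di:1 + di + d.shape[0], 1 + dj:1 + dj + d.shape[1]]
            is_peak &= d > neighbour
    is_peak &= d >= min_fraction * d.max()
    return [tuple(int(v) for v in idx) for idx in np.argwhere(is_peak)]
```

Because the comparison is strict, a flat-topped peak spread evenly over several cells will be missed; smoothing the density first (or contouring by eye, as in the figure) avoids that.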

The above image was constructed from a "window" (a shorter section of the record) of 750 thousand years. The entire record might be studied in a series of three such windows. Windows of lesser duration offer a higher-resolution characterization of the system dynamics.

Same data set, shorter window.

Creating a probability density plot is a robust method of limited computational difficulty which can simplify your interpretations of the dynamics from long, complex data sets.

Thursday, September 22, 2011

Information theoretic approaches to characterizing complex systems, part 4: Formalizing a method for optimizing window lengths for probability density diagrams from phase space

When the market looks like it does today, it is better to think about other things.

Some time ago we began looking at the problem of determining the optimum window over which the probability density of the phase space should be calculated. The problem is essentially one of optimization, where if the window is too short the probability density plot will not be accurately represented, but if it is too long, then interesting features may be smoothed out.

Once again we look at a proxy, more than 2 million years long, for the strength of the Himalayan paleomonsoon (top figure).

Reconstructing a phase space portrait over small (say 30 thousand years) intervals gives us highly variable results because of the variability in the dynamic behaviour of the system over comparatively short timeframes. As we have seen elsewhere, many of these complex climatic systems are characterized by intervals during which the state space is confined within a comparatively small area of Lyapunov stability, interspersed with intervals during which the system evolves rapidly towards another area of stability.

While confined within an LSA, the probability density will consist of a few large values spread over a small area. The entropy (in an information theory sense) will be small.

When the system is evolving towards a new LSA, the probability density will consist of many small values spread over a large area. The entropy will be large.
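A quick check of the two cases, using the Shannon entropy -Σ p log p with the natural logarithm (the base is my assumption; a different base only rescales the values). The two 10 x 10 grids are hypothetical probability densities: one confined to four cells, one spread evenly over all hundred.

```python
import numpy as np

def shannon_entropy(prob):
    """Shannon entropy -sum(p log p), natural log, over the non-zero cells
    of a probability array (renormalized so it sums to 1)."""
    p = np.asarray(prob, dtype=float).ravel()
    p = p[p > 0]
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

confined = np.zeros((10, 10))
confined[4:6, 4:6] = 1.0        # trajectory stuck in 4 cells
spread = np.ones((10, 10))      # trajectory visiting all 100 cells equally
print(shannon_entropy(confined))  # log(4), about 1.386
print(shannon_entropy(spread))    # log(100), about 4.605
```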

Successive values of entropy calculated over small time windows will show a lot of variability. Some of that variability will be due to secular changes in the complexity of the system, and some will be due to the granularity of our observations. If we start choosing longer time windows, we tend to get both episodes of stability and bifurcation within each window, so the effects of granularity ideally vanish and only the secular variations remain.

Variability declines as the window length increases from 30 ky (thousand years) to 150 ky. Now, each of the above graphs consists of a string of data, so we can do better than eyeballing a comparison.

The methodology proposed then is to normalize each of the strings of data above, and then compare the zero-lag cross-correlation of the entropy of two successive window lengths to the zero-lag autocorrelation of the entropy observed in the shorter of the two windows.

For instance, in the figure above, the entropy observed over 30-ky windows varies from about 1.5 to 3.5. We normalize the data by dividing the difference between each observation and the mean of the series by the standard deviation of the series, i.e., x(norm) = (x - mean)/(standard deviation).

We similarly normalize the entropy values for the 60-ky window. The zero-lag cross-correlation will have a lower value than the zero-lag autocorrelation--but how much lower is a measure of how different the two curves are. We find that the ratio of the two values improves as we look at longer windows, as below.

The ratio approaches 1 as the window gets longer. A value of one would imply perfect correlation between the two entropy functions. So we choose the window length for which the zero-lag cross-correlation is sufficiently high for our purposes. In the example above, I would find that the 150 ky window is sufficient.
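The comparison can be sketched in a few lines of Python, assuming the two entropy series are evaluated at the same window positions (the function names are hypothetical). One consequence worth noting: with z-scored series the zero-lag autocorrelation is exactly 1, so the ratio reduces to the ordinary Pearson correlation coefficient between the two entropy curves.

```python
import numpy as np

def normalize(x):
    """z-score: (x - mean) / standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def zero_lag_ratio(entropy_short, entropy_long):
    """Zero-lag cross-correlation of two normalized entropy series divided by
    the zero-lag autocorrelation of the shorter-window series. Both series
    are assumed to be sampled at the same window positions."""
    a = normalize(entropy_short)
    b = normalize(entropy_long)
    if a.shape != b.shape:
        raise ValueError("series must be aligned on the same window positions")
    cross = np.mean(a * b)
    auto = np.mean(a * a)  # equals 1 for a z-scored series
    return float(cross / auto)

# Synthetic stand-ins (not real data): a shared trend plus window-dependent noise
rng = np.random.default_rng(0)
trend = np.linspace(2.0, 3.0, 60)          # hypothetical secular change
h_30 = trend + rng.normal(0.0, 0.5, 60)    # noisier shorter-window entropy
h_60 = trend + rng.normal(0.0, 0.2, 60)    # smoother longer-window entropy
print(zero_lag_ratio(h_30, h_60))
```

Computing this ratio for each successive pair (30/60, 60/90, and so on) gives the curve whose approach to 1 is used to pick the window.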

For the late Quaternary, a window length of 150 ky also appears suitable.

I'm pretty sure that the different rates at which the cross-correlations approach 1 in the Late and Early Quaternary paleomonsoon proxy are telling us something about the dynamics of the system over these two intervals--but I'm not yet sure what.

Thursday, July 7, 2011

Information theoretic approaches to characterizing complex systems, part 3: optimizing the window for constructing probability density of state space trajectories

In earlier essays I discussed creating plots of the probability density of a 2-d reconstructed phase space portrait as a means for investigating the dynamics of complex systems. For examples I used proxy records representing global ice volume, Himalayan paleomonsoon strength, and oceanographic conditions.

Today's discussion centres around using entropy (in an information theoretic sense) as a tool for deciding whether the window selected for calculating the probability density is sufficient.

We've seen this before. The probability density plot of the trajectory of the 2-d phase space portrait for one 270-ky window of the paleomonsoon strength proxy data suggests that the phase space is characterized by five areas of high probability--five areas where the trajectory of the phase space tends to be confined in small areas for episodes of time before rapidly moving towards another such area. In earlier articles we have argued that these represent areas of Lyapunov stability (LSA).

I used a 270-ky window for all the plots. Why this number? Why any number?

Selection of the width of the window is important, and at the time I was creating these plots, I did not have a technique in place for prescribing this width. If the window is wide, resolving power is lower. If the window is narrow, resolving power is higher--consequently one would think to choose a narrow (i.e. short length of time) window. But if the window is too short, then the observed trajectory over the window is not representative of the phase space of the function.

The trajectories from two neighbouring 30-ky windows in phase space.

If we constructed probability-density plots from the two trajectories in the diagram above, the plots would not be very interesting. It would be even worse if the windows were shorter--say 1 ky, for instance. Then each successive probability density plot would be a blob translated some short distance in phase space, tracing out the trajectory of the entire function, alternating between being stretched a little and "snapping back".

Using entropy to prescribe window length

Entropy increases with the size of the region of phase space that encompasses the trajectory within the window of time under investigation. If the trajectory travels from one LSA to another, then the phase space portrait is characterized by alternating episodes of confinement to a small area, punctuated by sweeping trajectories as the system moves to a new state of metastability. A graph of the entropy of successive windows of probability data would appear very noisy, as the entropy varies from very low values for windows dominated by confinement to one LSA, to much higher values during the shifts from one LSA to another.

As the window is lengthened, the noise level of the graph of successive values of entropy declines.

The diagram above is a series of graphs of successive values of entropy calculated over overlapping windows of length 30 ky (at top), 60 ky, 90 ky, 120 ky, and 150 ky (at bottom) using the calculated probability density of the 2-dimensional phase space reconstructed from the time series of the paleomonsoon strength proxy, during the early Quaternary. As the window lengthens, we see the plot become smoother until it reflects primarily secular changes rather than accidents of local trajectories.
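A sketch of how such a stack of entropy curves might be produced, assuming an evenly sampled, already delay-embedded trajectory and a fixed grid shared by every window, so that entropies from different windows are comparable. Window and step are in samples, and the function name is hypothetical. The synthetic random-walk record in the usage lines is a stand-in, not the paleomonsoon proxy.

```python
import numpy as np

def windowed_entropy(points, window, step, bins, extent):
    """Entropy (natural log) of the binned 2-d density in successive
    overlapping windows of an (N, 2) delay-embedded trajectory.
    extent = [[xmin, xmax], [ymin, ymax]] is the grid shared by all windows."""
    values = []
    for start in range(0, len(points) - window + 1, step):
        seg = points[start:start + window]
        counts, _, _ = np.histogram2d(seg[:, 0], seg[:, 1], bins=bins, range=extent)
        p = counts[counts > 0] / counts.sum()
        values.append(-(p * np.log(p)).sum())
    return np.array(values)

# Synthetic stand-in record, embedded with a lag of 3 samples
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=3000))
pts = np.column_stack((series[:-3], series[3:]))
lo, hi = pts.min(), pts.max()
extent = [[lo, hi], [lo, hi]]
for w in (15, 30, 45, 60, 75):   # e.g. 30 to 150 ky at a 2-ky sample spacing
    h = windowed_entropy(pts, w, step=5, bins=20, extent=extent)
    print(w, np.std(np.diff(h)))  # roughness of the entropy series
```

The last column is one crude way to put a number on "noise level": the scatter of the step-to-step changes in entropy, which should fall as the window lengthens.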

The plot here seems to suggest that I can get by using a window of only 150 ky, rather than the 270 ky that I actually used.

The story for the trajectories in the Late Quaternary is the same--the minimum window width looks to be about 150 ky.

You'll have to excuse me. I have a lot of redrafting to do.

A more formal treatment of this method is given here.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Note: "ky" means thousand years, and refers to duration
"ka" means thousand years ago and refers to a specific time

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us: the bid price, the ask price, and the prices of trades. Additional information which may also be of use includes such things as volume, size of bids, size of asks, and so on. In principle both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with observations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. We will go through the reasoning behind our assumption that there are competing interests involved in some detail below.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently, or whether each one leads intermittently, which would support the idea that these are distinct duelling algorithms. It seems likely that the bid price will lead when both are declining, and the ask will lead when both are climbing. We should test this hypothesis.
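A sketch of the windowed cross-correlation in Python, under two assumptions that are mine rather than anything known about the data: the bid and ask quotes have been resampled onto a common regular time grid (for example, the last quote in force every millisecond), and the peak of the correlation is a fair indicator of which series leads. Function names are hypothetical. A positive lag means the bid leads the ask by that many samples.

```python
import numpy as np

def lead_lag(bid, ask, max_lag):
    """Lag (in samples) at which the normalized cross-correlation of two
    equally spaced series peaks, and the peak value.
    A positive lag means the bid leads: bid[t] best matches ask[t + lag]."""
    bid = np.asarray(bid, dtype=float)
    ask = np.asarray(ask, dtype=float)
    if bid.std() == 0 or ask.std() == 0:
        return 0, float("nan")  # a flat quote in this window carries no timing information
    b = (bid - bid.mean()) / bid.std()
    a = (ask - ask.mean()) / ask.std()
    n = len(b)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            r = np.mean(b[:n - lag] * a[lag:])
        else:
            r = np.mean(b[-lag:] * a[:n + lag])
        if r > best_r:
            best_lag, best_r = lag, float(r)
    return best_lag, best_r

def windowed_lead_lag(bid, ask, window, step, max_lag):
    """(window start, best lag, peak correlation) for successive windows."""
    results = []
    for start in range(0, len(bid) - window + 1, step):
        seg = slice(start, start + window)
        results.append((start, *lead_lag(bid[seg], ask[seg], max_lag)))
    return results

# Synthetic check (not market data): the ask copies the bid 5 samples later
rng = np.random.default_rng(0)
noise = rng.normal(size=1000)
bid, ask = noise[5:], noise[:-5]
for start, lag, r in windowed_lead_lag(bid, ask, window=200, step=200, max_lag=10):
    print(start, lag, round(r, 2))  # lag 5 in each window
```

A change of sign in the best lag from one window to the next would be a candidate trigger point of the kind described below.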

One goal of this analysis will be to see if we can detect trigger points, where one series stops following and begins leading. We will locate these times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, or the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankees game or when the secretary in the front office misspells 'the'.

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phase space of two or more different functions.

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging a hierarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the system.

This particular approach is built around discrete computation, so it is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) are the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant: the state that appears immediately before another particular state is referred to as the predictive state, and the state that follows it is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.
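The counting step can be sketched in Python. This is a deliberate simplification: each observed symbol is treated as its own state, which amounts to a first-order Markov approximation, whereas a full ε-machine reconstruction (Crutchfield, 1994) groups histories into causal states according to the futures they predict. The function name and the symbol alphabet are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Estimate P(successor | predictive state) from a sequence of discrete
    state labels by counting the observed one-step transitions."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        s: {t: n / sum(c.values()) for t, n in c.items()}
        for s, c in counts.items()
    }

# Hypothetical symbol stream: 'u' = price up, 'd' = price down, 'f' = flat
stream = list("uudfuudfuudf")
print(transition_probabilities(stream))
# {'u': {'u': 0.5, 'd': 0.5}, 'd': {'f': 1.0}, 'f': {'u': 1.0}}
```

Repeating this on successive blocks of data and comparing the resulting tables is one naive way to flag when the underlying computation appears to change.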

Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be-completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will address the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Friday, June 17, 2011

Information theoretic approaches to characterizing complex systems, part 2: complexity of climate proxy data from probability density plots in state space

Last time the World Complex applied simple information theory to the probabilities of particular state transitions within the constructed epsilon machine representations of paleoclimate proxies.

Probability density is calculated by the method outlined here; below we see one frame of data for the oceanographic condition proxy (mid-ocean d13-C variability). In essence, the data are unfolded into two dimensions using the time delay method of phase space reconstruction. The probability density is estimated by mapping out a set of overlapping boxes through phase space and calculating the amount of time the reconstructed state lies within each box over a window (I have used 270-thousand-year windows--your mileage may vary). Similar windows have been created at 30-thousand-year intervals going back nearly two million years.
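A sketch of the windowed procedure follows, under stated assumptions: the proxy series is evenly sampled at interval `dt`, and the overlapping boxes are laid out as squares of side `2 * step` starting every `step` units, so that each observation falls in four boxes (matching the four-cell counting described in the next paragraph). The window and stride (270 and 30 thousand years) come from the text; the box layout, the delay, and all function names are hypothetical, not the exact method linked above.

```python
import numpy as np


def delay_embed(x, lag):
    """Return 2-D time-delay coordinates (x[t], x[t + lag]) as an (N - lag, 2) array."""
    x = np.asarray(x, dtype=float)
    return np.column_stack((x[:-lag], x[lag:]))


def box_times(points, step, dt, origin, shape):
    """Time spent in overlapping boxes of side 2*step, laid out every `step` units.

    Box (i, j) covers fine cells i-1..i and j-1..j, so every point is counted
    in four boxes and the array sums to 4 * (number of points) * dt.
    """
    cells = np.floor((points - origin) / step).astype(int)
    times = np.zeros(shape)
    for i, j in cells:
        times[i:i + 2, j:j + 2] += dt  # the four boxes sharing this fine cell
    return times


def windowed_densities(x, lag, dt, window, stride, step):
    """Yield (start_index, box-time array) for each window along the trajectory."""
    pts = delay_embed(x, lag)
    origin = pts.min(axis=0)  # common origin so boxes line up between windows
    shape = tuple(np.floor((pts.max(axis=0) - origin) / step).astype(int) + 2)
    n_win, n_stride = int(round(window / dt)), int(round(stride / dt))
    for start in range(0, len(pts) - n_win + 1, n_stride):
        yield start, box_times(pts[start:start + n_win], step, dt, origin, shape)
```

With `dt` in thousands of years, calling `windowed_densities(x, lag, dt, window=270, stride=30, step=...)` produces one box-time array per 270-thousand-year window, each offset by 30 thousand years.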

The entropy of this distribution is calculated by converting the number in each cell of the array above to a probability (a fraction of the 270-thousand-year window). In the calculation of the above array, each observation is slotted into four separate cells, so the total "probabilities" add up to 4. The ideal correction would be to count only one cell in every four. What works as well is to use p(i) = t(i)/270,000, where t(i) are the elements of the array, sum -p(i) log p(i) over all non-zero terms, and divide the total by four.
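The divide-by-four entropy recipe above can be written as a short function. This is a sketch, assuming `times` is an array of cell times t(i) such as the overlapping-box array described earlier; the text does not state the log base, so the natural log is an assumption here (use `np.log2` for bits). The function name is hypothetical.

```python
import numpy as np


def window_entropy(times, window_length, overlap=4):
    """Shannon entropy of an overlapping-box time array, corrected for multiple counting.

    p(i) = t(i) / window_length sums to `overlap` (4 when every observation sits
    in four boxes), so the sum of -p log p over non-zero cells is divided by `overlap`.
    Natural log is assumed.
    """
    p = np.asarray(times, dtype=float).ravel() / window_length
    p = p[p > 0]  # only non-zero terms contribute
    return float(-(p * np.log(p)).sum() / overlap)


# A state that never leaves one spot fills four boxes for the whole window: entropy 0.
print(window_entropy(np.full((2, 2), 270_000.0), 270_000.0))
```

A trajectory that splits its time evenly between two well-separated spots gives eight cells at p = 0.5 each, and the corrected entropy comes out as log 2.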

Calculating the entropy for every window of the oceanographic condition proxy gives the results plotted below.

Higher values indicate higher entropy, which we normally equate with greater complexity. Oceanographic changes were highly complex in the early Quaternary (until about 1150 thousand years ago); complexity then fell through the middle Quaternary, reaching a minimum about 750 thousand years ago, increased until about 450 thousand years ago, and has generally declined since.

I should point out here that when we average data over a window (say from 0-270 thousand years ago), the standard is to plot the result at the midpoint of the window (i.e., 135 thousand years ago). On a plot of the 200-day moving average for a stock price, by contrast, the average is plotted at the end of the window rather than the middle. This is an important difference: the same average ends up half a window-length later on a stock chart than it would on one of these proxy plots.
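The two plotting conventions differ only in the time index assigned to each average. A minimal sketch (hypothetical function name, evenly sampled series assumed) returns the same averages with either centred or end-of-window time indices:

```python
import numpy as np


def moving_average(x, n, align="center"):
    """Moving average over n samples, plus the sample index each value is plotted at.

    align="center": plotted at the middle of the window (usual for proxy averages).
    align="end": plotted at the last sample (usual for stock-chart moving averages).
    """
    x = np.asarray(x, dtype=float)
    means = np.convolve(x, np.ones(n) / n, mode="valid")  # means[k] averages x[k:k+n]
    starts = np.arange(len(means))
    if align == "center":
        t = starts + (n - 1) / 2
    elif align == "end":
        t = starts + (n - 1)
    else:
        raise ValueError("align must be 'center' or 'end'")
    return t, means
```

The averages are identical in both cases; the end-aligned indices are shifted later by (n - 1)/2 samples, which is why a trailing average appears to lag the data.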

If we consider individual probability density plots, we see that the area of phase space occupied by the attractor over the Quaternary has varied, but is generally about the same size at different times. We shall see below that this is not the case for the ice volume proxy and the paleomonsoon strength proxy.

The variation of complexity in the ice volume proxy phase space reconstruction with time is as follows:

There is a marked increase in complexity from the Early Quaternary (right side of the graph) to the Late Quaternary (left side of the graph). Complexity has dropped rather sharply in the most recent windows, and this last bit of variability appears to exceed the variability in the rest of the graph. Is this an indication that we are on the cusp of a major change to the global climate system?

The variation in complexity for the paleomonsoon proxy over the past 2 million years is as follows:

Once again there is a significant increase in complexity in the behaviour of the phase space portrait of the paleomonsoon strength proxy throughout the Quaternary. In the early Quaternary the behaviour of the system is quite simple and is dominated by tight orbits around a single area of high probability. Complexity is much higher in the late Quaternary, but falls off somewhat in the most recent windows.

Probability density plot for phase space portrait of paleomonsoon strength proxy, over a window from 1421 to 1691 thousand years ago.

Probability density plot for phase space portrait of paleomonsoon strength proxy, from 281 to 551 thousand years ago. The larger number of LSA areas (high probability) suggests greater complexity of behaviour in the Late Quaternary.

The overall picture of complexity through the Quaternary is in accordance with that inferred from the epsilon machine computation of our last installment.

Monday, May 23, 2011

Modification of probability density for stocks rising steeply in price

Today's study is based on an 18-month record of daily closing prices for Silver Wheaton Corp. (disclosure--long position)

The chart shows a pretty nice move up to the $45 range; the price has since pulled back.

The double bumps near the end of the record provide enough information to define a time delay at the first minimum of the average mutual information of the time series against sequential lags. A reasonable approximation of this lag is sixteen trading days, and this lag is used in creating the phase space portrait below.
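The lag selection above can be sketched as follows. The text does not say how the average mutual information was estimated, so the simple histogram estimator (16 bins, result in bits) is an assumption, as are the function names. The first local minimum of AMI against lag is then taken as the time delay.

```python
import numpy as np


def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of the mutual information (bits) between x[t] and x[t + lag]."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x[t]
    py = pxy.sum(axis=0, keepdims=True)  # marginal of x[t + lag]
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())


def first_local_minimum(values):
    """Index of the first interior value lower than its left neighbour and not above its right."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None


def first_ami_minimum_lag(x, max_lag=60, bins=16):
    """Smallest lag at which the AMI curve has its first local minimum, or None."""
    lags = list(range(1, max_lag + 1))
    k = first_local_minimum([average_mutual_information(x, lag, bins) for lag in lags])
    return None if k is None else lags[k]
```

For a daily closing-price series, `first_ami_minimum_lag(closes)` would return a lag in trading days. Histogram estimates are noisy on short records, so the result is best checked against a plot of the AMI curve rather than taken blindly.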

Share price phase space and turbulent flow

The turbulent eddies diffuse outward as share prices rise. The simplest explanation is that there is no inherent scale for price changes.

The graph above shows the daily change in closing price for Silver Wheaton, expressed as a percentage, over the past 18 months. There is no real trend, nor are the percentage moves larger or smaller when the price is higher. Because a given percentage move is a larger dollar move at a higher price, the effect is to produce a diffuse phase space portrait for the higher-price area of the graph. Consequently, when the probability density is plotted, the areas in phase space representing higher prices will have lower probabilities than might otherwise be the case.

The simplest way to correct for this effect is to plot the reconstructed phase space on a logarithmic scale rather than a normal scale.

The bin size remains constant on the logarithmic scale, which means that the bins become larger in price terms as we move into the high-price area of the reconstructed state space. The effect is to increase somewhat the probability density in the high-price area.
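The log-scale correction can be sketched as a delay reconstruction of log10(price) binned with a constant width. This is a minimal sketch, assuming positive prices; the bin width and function name are hypothetical (the sixteen-day lag in the usage line comes from the text). Mapping the edges back to dollars shows each bin covering the same percentage range, so a bin near $40 is about twice as wide in dollars as one near $20.

```python
import numpy as np


def log_phase_space_density(prices, lag, width):
    """Probability density of the 2-D delay reconstruction of log10(price).

    Bins have constant width in log10 space; the second return value gives
    the same bin edges in price terms, which grow geometrically.
    """
    logp = np.log10(np.asarray(prices, dtype=float))  # prices must be > 0
    u, v = logp[:-lag], logp[lag:]
    lo, hi = logp.min(), logp.max()
    n_bins = int(np.ceil((hi - lo) / width)) or 1
    edges = lo + width * np.arange(n_bins + 1)
    counts, _, _ = np.histogram2d(u, v, bins=[edges, edges])
    return counts / counts.sum(), 10.0 ** edges


# e.g. density, price_edges = log_phase_space_density(closes, lag=16, width=0.02)
```

A constant log-space width is equivalent to bins that scale with price, which matches the observation that percentage moves show no dependence on price level.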

The only compelling attractors for price here are in the $17 to $21 range. As with the Pelangio chart last time, this reflects the amount of time spent at these price levels. The only cure is to stay at higher prices (SLW is near $35 at this writing).

Also recall that, as with most other technical approaches, resistance and support reflect psychological factors. For mining companies, the drill can defeat technical analysis.

Thursday, May 19, 2011

Probability density in phase space--Pelangio Exploration Inc.

Today we'll take a look at some charts for Pelangio Exploration Inc. (disclosure--have formerly done some contract work for this company, currently have a long position). They are currently active in Ghana at two sites, and a little less active in northern Ontario.

I remember back in December and January hoping this was forming a cup-and-handle pattern. But the handle keeps going down. Will it bottom soon?

A two-dimensional phase space projection (no trajectory, only the data) is presented below. As with the similar charts for Detour Gold, I have used a four-day time delay.

The data are dense enough to create a probability density graph using a box $0.05 on a side.
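The four-day delay and $0.05 box from the text can be combined into a short sketch. The boxes here are simple non-overlapping squares on a grid aligned to multiples of $0.05; whether the original charts used overlapping boxes is not stated, so that layout, like the function names, is an assumption. Picking out the most-occupied box is only the crudest way to locate the high-probability areas described as attractors below.

```python
import numpy as np


def price_density(prices, lag=4, box=0.05):
    """Fraction of time the delay reconstruction (p[t], p[t + lag]) spends in each box."""
    p = np.asarray(prices, dtype=float)
    lo = np.floor(p.min() / box) * box  # align the grid to multiples of `box`
    n = int(np.ceil((p.max() - lo) / box)) + 1  # one spare box guards against rounding
    edges = lo + box * np.arange(n + 1)
    counts, _, _ = np.histogram2d(p[:-lag], p[lag:], bins=[edges, edges])
    return counts / counts.sum(), edges


def densest_box(density, edges):
    """Price ranges (x-range, y-range) of the most-occupied box."""
    i, j = np.unravel_index(np.argmax(density), density.shape)
    return (edges[i], edges[i + 1]), (edges[j], edges[j + 1])
```

Contouring `density` (for example with a plotting library's contour function) gives the kind of chart discussed next.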

Contouring the probability density shows us that from our present price in the 50-55c range, a further fall will bring the price under the influence of the massive attractor in the 15-20c range. There are attractors in the 35c and 45c areas, but these are weak.

If the price can hold or rise slightly, it will remain under the influence of the attractors in the 60c to 80c range.

Drill, drill!

BTW, this is not investment advice. DYODD.

Dust flux, Vostok ice core