Dust flux, Vostok ice core

Two-dimensional phase space reconstruction of dust flux from the Vostok core over the period 186-4 ka using the time derivative method. Dust flux is on the x-axis; its rate of change is on the y-axis. From Gipp (2001).
Showing posts with label Natural gas. Show all posts

Friday, April 27, 2012

USGS presents projections of future discoveries of oil and natural gas

The United States Geological Survey (USGS) has released its estimates for remaining undiscovered oil and natural gas resources in the world exclusive of the United States (pdf). The US estimates have been done as separate documents, one of which is here.

The estimates are probabilistic. The headline numbers, 565 billion barrels of conventional oil and 5,606 trillion cubic feet of natural gas, are enough for about 20 years of global consumption of oil (and somewhat more for gas), and have a probability of about 50% of being achieved. Maybe there will be more--maybe there will be less.

That 20 years of current consumption assumes that all the Africans, Chinese, and Indians who want cars but don't have them remain unable to get them.

Underneath the headline numbers we can limit the variability by considering the F95 (95% likely to exceed) and F5 (5% likely to exceed) estimates, from which we see a 90% likelihood of extracting between 200 billion and 1.2 trillion barrels of conventional oil and a 90% chance of finding between about 2,000 trillion and 12,100 trillion cubic feet of natural gas.

What is economic depends on price, so if the price goes high enough we will probably be able to extract the F5 levels. That is promising, but even then it is still only about 50 years of oil at current levels of consumption (although consumption should also fall if the real price rises).

Also: these are conventional numbers, and don't include resources that depend on solving such problems as harvesting gas hydrates from continental slopes.

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:


Let us also consider the pricing action in CNTY on June 21, 2011:


In both of these examples (many more such examples exist) there are three time series of interest to us--the bid price, the ask price, and the prices of trades. Additional information which may also be of use includes volume, size of bids, size of asks, and so on. In principle both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with observations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. Below we will go through in some detail the reasoning behind our assumption that there are competing interests involved.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.
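The windowed lead/lag test described above could be sketched as follows. This is only a minimal illustration, not the analysis itself: the function names, the window length, the step, and the maximum lag are all hypothetical choices, and the inputs are assumed to be bid and ask series already sampled on a common, evenly spaced time grid. In practice it may be better to correlate price changes rather than price levels, since shared trends inflate the correlation at every lag.

```python
import numpy as np

def best_lag(bid, ask, max_lag):
    """Return (lag, r) maximizing the correlation of bid[t] with ask[t + lag].

    A positive lag means the bid leads the ask; a negative lag means it trails.
    """
    bid = np.asarray(bid, dtype=float)
    ask = np.asarray(ask, dtype=float)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = bid[:len(bid) - lag], ask[lag:]
        else:
            a, b = bid[-lag:], ask[:len(ask) + lag]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined for a constant segment
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

def windowed_lags(bid, ask, window, step, max_lag):
    """Apply best_lag to successive windows; returns (start, lag, r) tuples."""
    results = []
    for start in range(0, len(bid) - window + 1, step):
        lag, r = best_lag(bid[start:start + window],
                          ask[start:start + window], max_lag)
        results.append((start, lag, r))
    return results
```

A run of windows in which the sign of the best lag flips would be a candidate location for the trigger points discussed next.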

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, or the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankees game or when the secretary in the front office misspells 'the'.

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phase space of two or more different functions.

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging a hierarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) are the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant: the state that appears before another particular state is referred to as the predictive state, and the state that follows is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.
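The bookkeeping step (identify predictive and successor states, then estimate the probabilities linking them) can be illustrated with a deliberately simplified sketch. It is not Crutchfield's full reconstruction, which also merges histories that predict the same futures; here a state is simply assumed to be a string of k consecutive symbols, and the three-letter alphabet ('U', 'D', 'S' for up, down, same) is a hypothetical choice made for the example.

```python
from collections import Counter, defaultdict

def symbolize(prices):
    """Map successive price changes to 'U' (up), 'D' (down) or 'S' (same)."""
    symbols = []
    for prev, cur in zip(prices, prices[1:]):
        symbols.append('U' if cur > prev else 'D' if cur < prev else 'S')
    return ''.join(symbols)

def transition_probabilities(symbols, k):
    """Estimate P(successor | predictive) for states of k consecutive symbols."""
    counts = defaultdict(Counter)
    for i in range(len(symbols) - k):
        predictive = symbols[i:i + k]           # state observed first
        successor = symbols[i + 1:i + 1 + k]    # state observed one step later
        counts[predictive][successor] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }

# Usage: a strictly alternating price series
table = transition_probabilities(symbolize([1, 2, 1, 2, 1, 2, 1]), k=1)
```

Comparing such tables between successive windows of data is one crude way to flag a change in the behaviour of the underlying algorithm.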


Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern themselves with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Friday, June 10, 2011

Flash crash the nat gas!

As shown in posts on Zero Hedge (correction: these are originally from Nanex), there have been some bizarre patterns in the trading of natural gas in the past couple of days. The charts below are from the evening of June 8, 2011 and come from the first of the two articles linked above:



At first glance this looks like nearly perfect chaos.


The last figure represents a one-dimensional projection of a Lorenz butterfly curve, shown in its glory below.


In reality the trading data isn't as nice as it first appears. There has been a bit of playing around with the time axis on the first two plots. I have subsequently digitized the data with trades at half-second intervals (but I'll outline the caveats below).


The axis on the bottom is time in seconds, starting from 19:40:37 on 08-Jun-11. The digitization is at half-second intervals, because the analysis below requires evenly spaced data. There were some difficulties, however. There was not always a trade right on the half- or full-second mark. Frequently the two nearest trades on either side of the desired time were at the same price, so it was reasonable to use that price. Sometimes, however, the two nearest prices were quite far apart. For these I used the midpoint between bid and ask at that moment, although arguably this is not a price, particularly when we observe that many trades took place either above the ask or below the bid. Additionally, there are intervals where the midpoint between the bid and ask is actually undefinable, as during the interval from 19:42:04 to 19:42:06, where the bid price fluctuates in a complex fashion while the ask remains constant. So there are risks in this analysis.
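For readers who want to repeat the digitization, the rule described above might be written roughly as follows. This is a sketch under assumptions: the function and argument names are hypothetical, the quotes are supplied as callables returning the bid and ask at a given time, and any difference at all between the two neighbouring trade prices is treated as "far apart" (the original digitization was done by eye, so the real threshold is not recorded). Intervals where the midpoint is undefinable still need to be handled by hand.

```python
import bisect

def resample_trades(trade_times, trade_prices, grid_times, bid_at, ask_at):
    """Put irregular trades onto an even time grid.

    trade_times must be sorted ascending (seconds from the start of the record).
    bid_at and ask_at are callables giving the quoted bid and ask at time t.
    """
    out = []
    for t in grid_times:
        i = bisect.bisect_left(trade_times, t)
        if i < len(trade_times) and trade_times[i] == t:
            out.append(trade_prices[i])      # a trade falls exactly on the mark
            continue
        before = trade_prices[i - 1] if i > 0 else None
        after = trade_prices[i] if i < len(trade_times) else None
        if before is not None and before == after:
            out.append(before)               # neighbouring trades agree
        else:
            # neighbours disagree (or one is missing): fall back on the
            # bid/ask midpoint, which is arguably not a traded price
            out.append(0.5 * (bid_at(t) + ask_at(t)))
    return out
```

With half-second spacing, grid_times would be 0.0, 0.5, 1.0, and so on, measured from 19:40:37.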

The two-dimensional lagged state space (using a 2-s lag to minimize mutual information) looks as below:


Not quite as beautiful as the Lorenz butterfly curve. I smoothed the data through a 3-point moving average filter as the original really looked like hell. The sinusoidal waves that slowly increase in size are reflected in the two-d state space as a spiral, drawn from the centre outwards, until the curve flies off into a new area of phase space.
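The steps behind this plot (smooth, choose a lag from the mutual information, then plot each value against its lagged copy) can be sketched as below. The histogram estimator and the number of bins are assumptions made for illustration and are not necessarily what was used for the figure; the function names are hypothetical. Note that with half-second sampling a 2-s lag corresponds to 4 samples.

```python
import numpy as np

def moving_average3(x):
    """Centred 3-point moving average; output is 2 samples shorter than input."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(3) / 3.0, mode="valid")

def mutual_information(x, lag, bins=16):
    """Histogram estimate (in bits) of the mutual information of x[t] and x[t + lag]."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x[t]
    py = p.sum(axis=0, keepdims=True)   # marginal of x[t + lag]
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def first_minimum_lag(x, max_lag, bins=16):
    """First local minimum of mutual information over lags 1..max_lag.

    Falls back on the global minimum if no local minimum is found.
    """
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1                # lags are counted from 1
    return int(np.argmin(mi)) + 1

def delay_embed(x, lag):
    """Return an (N - lag, 2) array of points (x[t], x[t + lag])."""
    x = np.asarray(x, dtype=float)
    return np.column_stack((x[:-lag], x[lag:]))
```

Plotting the first column of delay_embed(moving_average3(prices), 4) against the second gives a picture of the kind described here, with the growing oscillations appearing as an outward spiral.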

The conventional dynamic explanation of the nat gas trading curve would be of a system in equilibrium--but the equilibrium is unstable, and the wobbles get progressively larger until the system shifts to a new equilibrium. Such a dynamic interpretation is incorrect. The Lorenz equilibrium is all equilibrium. What we perceive as a sudden shift in equilibrium is actually part of the equilibrium state.

Disequilibrium--it's the new equilibrium.

From a trading perspective, the nat gas trading is more straightforward. Someone is able to profit from variability. The proper positioning of puts and calls may allow you to benefit from a large move (either up or down) in the price of a stock or commodity. After that, you can try to hit any buy or sell stops if you are able to drive the price up or down (or in this case, both ways, until the sell stops were hit, triggering a cascade in price). If there had been more buy stops, then the price would have melted up. It was simply a matter of luck that the price melted down.

As in all markets there are two (or more) participants. Leaving the actual trades for a minute and scrutinizing the fluctuations in the bid and ask, we see a complicated psychological game being played. I see many of the tricks that I have seen in thinly traded gold stocks priced by a market maker. You see the bid (or trade price) creep up towards the ask price, then run away, perhaps hoping to draw the ask price down.

But the time frame is completely different. I used to see this play out over the course of a trading day (my favourite was when I would get a partial fill of, say 500 shares of some penny stock before the price would swing away, and could imagine the market maker saying "do you want to pay full transaction fees for a $100 sale, or are you going to meet my price and fill your order?"). In this case, these games are being played on a split-second scale.

Game theory has been digitized and is running on a level of complexity that leaves TIT FOR TAT in the dust. Unfortunately my textbook on the subject is being used by my seven-year-old, so I can't get into strategy games between different players.

Many of the price rises happen while trades are occurring above the ask price. How does that happen? Is the asker crossing transactions with fictional entities to lure the buyer? Similarly, many trades occur below the bid price as the price is falling. Are these fictional crosses? Are these real trades or just a gimmick to lure in another party (hurry up and buy--look how fast price is falling!)?

The chaotic appearance of this function is simply an emergent property of the gaming algorithms.

Lastly, I will point out that my analysis is roughly seven orders of magnitude slower than the frequency of some of the trades.