The World Complex: chaos


Saturday, January 18, 2020

The History of the East Asia Monsoon

So I went to Washington DC last week for the AGU Chapman Conference on the East Asian monsoon. I found it to be a very rewarding conference, and even learned a bit about navigating around Washington on transit, as I was on a limited budget.

The conference was in AGU headquarters, which is near to Dupont Circle.

Not all that far from the Mall, although I didn't visit this time.

Speaking of scientists . . .

I was speaking during the opening session, which was about climate dynamics (and its role in the changes in monsoonal strength through geologic history). A major dynamic factor has been the rise of the Himalayan mountains and the Tibetan Plateau during the period of interest, and there is still a lot of debate about the importance of these tectonic events to the development of the monsoon. Some of the modeling studies suggest that the mountains only change the specific location of the rainfall, and that monsoon behaviour might occur even if there were no continents at all.

My work was based on analysis of global to regional proxy data sets, and has been summarized in all these places. Unfortunately, due to limited time, after working through the phase space reconstructions, I had to rush through the statistical computation part, and wasn't certain whether any of the message made it to the audience ungarbled. Fortunately, I was able to learn that at least some members of the audience understood the message.

The afternoon sessions were all about paleoceanographic records of the monsoon. Over the past decade, the International Ocean Discovery Program (IODP, formerly ODP and DSDP) has put down a number of boreholes in the Indus and Bengal Fans, and other boreholes in the Huang He fan and the Sea of Japan also provide useful records of at least some parts of the monsoon. The records I studied were generally global in scope--these other records allow for regional variations to be studied.

The next day's sessions dealt with continental environments (a common issue was the change in photosynthetic pathways of plants in response to environmental change during the Miocene) and records of continental erosion. Erosion is important because either rising mountains or increased rainfall will lead to increased erosion.

The last session was on modeling the effects of tectonic uplift as well as changes in the timing of the uplift, because there is still some disagreement about when the Tibetan Plateau was formed. I mean disagreement between it being less than 10 million years ago to more than 40 million years ago, which is a significant difference of opinion for something so recent.

The last portion of the conference was to break up into groups for focussed discussions on topics of interest leading to the testing of several hypotheses proposed at the start of the conference. I started off in the wrong room, so I was with the tectonic modeling people rather than the climate modeling people, but was still able to ask about whether anyone had successfully had chaos appear in their model output. Results were inconclusive.

For the second group meeting I joined the combined discussion between the climate modelers and the paleoceanographic records group. Over the course of the discussion I eventually managed to come up with a proposal. See if the modelers observe chaos, and see if they can tell which style of chaos they have. Such chaos will be manifested as spatial variability in some climate effects, such as the location of the maximum rainfall. The models may have the type of spatial variability modeled correctly, but the specific timing of variations will be incorrect. That spatial variability will be recorded in the widely spaced paleoceanographic records which already exist. The type of chaos observed in the models will tell us what to look for in the cores; from the cores we can obtain the correct timing of the modeled chaotic spatial variations of the monsoon system.

Exiting the Metro Station at Dupont Circle

I wasn't sure how the last part of the conference would go--early on, many of the old hands were of the opinion that nothing ever comes from these things. But I thought it was pretty rewarding, particularly as it was during these sessions that I came to realize that people felt that whatever I was doing was worthwhile.

Alone in my corner of the world, I had never been sure.

Night flight back to Toronto

Tuesday, November 5, 2013

Happy anniversary chaos!

Fifty years ago, Edward Lorenz published the first paper (pdf) generally recognized to discuss chaos.

Lorenz didn't call what he had discovered 'chaos'. It's not clear that he really understood the importance of what he had discovered. He knew it was interesting, and when scientists find something interesting they publish it, and worry about the ramifications later.

What Lorenz had discovered is that a deterministic system could have unpredictability. It is difficult to convey how unexpected this discovery was at the time, because the idea of what is now called chaos has disseminated (although imperfectly) through our common culture. A deterministic system is one in which the rules are well described (via equations) which operate on data to produce results. Since the time of Laplace's Dr. Manhattan quote, it had always been assumed that nothing unexpected could arise from such a system when an initial position and the rules of motion were defined to arbitrary precision. Unexpected behaviour should only result from randomness.

So when Lorenz put together a simple model for atmospheric convection in the presence of heating, using three simple differential equations, simple boundary conditions, and an arbitrary starting point, there would have been no reason to suspect that anything unexpected might occur. After all, all the parameters in the equations were known.

In essence, what he discovered was that minute variances in starting conditions led to extremely large variations in outcome. This again was unexpected, because our knowledge was largely built on assumptions of linear behaviour, in which small variations only grow larger slowly. Lorenz's interpretation of what he had discovered was to correctly point out that long-term weather forecasting was impossible, because it was impossible to measure the present state of the system with perfect accuracy--and the range of possible differing outcomes from the measurement accuracy was essentially the range of all possible weather.
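To make the sensitivity concrete, here is a minimal sketch (Python, standard library only) that integrates the three Lorenz equations from two starting points differing by one part in a hundred million and tracks how far apart they drift. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the ones usually quoted from the 1963 paper; the fourth-order Runge-Kutta integrator, the step size, the starting point and the function names are illustrative assumptions, not a reconstruction of Lorenz's own computation.

```python
import math

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one step with fourth-order Runge-Kutta."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def add(s, k, h):
        # s + h * k, component by component
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(add(state, k1, dt / 2))
    k3 = deriv(add(state, k2, dt / 2))
    k4 = deriv(add(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation_over_time(start, nudge=1e-8, dt=0.01, steps=3000):
    """Distance between two trajectories whose starting x differs by `nudge`."""
    a = start
    b = (start[0] + nudge, start[1], start[2])
    distances = []
    for _ in range(steps):
        a = lorenz_step(a, dt)
        b = lorenz_step(b, dt)
        distances.append(math.dist(a, b))
    return distances

if __name__ == "__main__":
    d = separation_over_time((1.0, 1.0, 1.0))
    for step in (0, 500, 1000, 2000, 2999):
        print(f"t = {(step + 1) * 0.01:5.2f}  separation = {d[step]:.3e}")
```

The separation should grow roughly exponentially at first and then level off at about the size of the attractor, which is the numerical face of the forecasting limit described above.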

The discovery and formalization of chaos theory led to entirely new fields of study encompassing different aspects of nonlinear dynamics and complex systems. Among them is one field of endeavour which has been a point of interest on this blog--complexity.

What do we mean by complexity? Actually, I'll write about this in a future posting. For now, let's just note the relative unpredictability of complex systems and get into the whys of it all later.

- - - - - - -

This post is a bit belated, because Lorenz's publication was actually in March. But something as momentous as chaos should be celebrated over the course of an entire year.

Sometimes we go all out and celebrate something over a couple of years. International Geophysical Year (1957-58) and International Heliophysical Year (2007-08) come to mind.

There have already been numerous celebratory events so far this year. But first, a word about the enablers of this year's celebrations on the markets.

High-frequency trading spams the exchanges with empty quotes destined to be cancelled--so much that it appears that many legitimate offers do not get filled at optimum pricing, as the system becomes overwhelmed with meaningless numbers.

According to the exchanges, HFT is a good thing. It increases liquidity, or at least that is the axiom that guides their acceptance. Unfortunately, observation tells us that the opposite may be the case--that HFT causes liquidity to vanish precisely at the time it is most needed.

In the last 50 years, we entered the nonlinear world. But our thinking--especially institutional thinking--is still trapped in the linear world.

In the linear world, if something is a benefit, then more of it is a greater benefit. But in the nonlinear world, one may be a benefit, and two may be better, but three could turn out to be horrifying.

So in celebrating 50 years of chaos, the exchanges (with their sponsors, the algos) have brought you the following celebratory events.

Flash crash on the German market. Twitter feed flash crash. (Appropriately enough, both of these were in April.) The Anadarko flash crash. Information travels faster than the speed of light! Closing of the Nasdaq options bourse. Not to mention hundreds of strange trade executions across all the exchanges.

How to lose lots of money in 45 ms by Nanex.

Most of these problems are the (un)predictable result of the interaction of numerous algorithms. Some may have been errors, or the so-called 'fat finger' trades; others may have been other forms of human or algo error.

Algo error. Was that supposed to happen?

The markets are not what they used to be. The overall superstate has changed over the last ten years from one dominated by humans to one dominated by machines. The result has been a series of entirely new phenomena, which we have earlier termed 'innovation'.

The year isn't over yet. I look forward to the next special event. I don't think I have long to wait.

- - - - - - - - - - - - - -

And then there's this. I was going to put in something by King Crimson here, but this seemed more appropriate.

Friday, June 24, 2011

Deconstructing algos, part 1

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I'll take a break from it and look a little farther forward--how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today's discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us--the bid price, the ask price, and the prices of trades. Additional information which may also be of use includes such things as volume, size of bids, size of asks, and so on. In principle both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with observations at irregular intervals.

We don't have access to the code involved in these algorithms--nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. We will go through the reasoning behind our assumption that there are competing interests involved in some detail below.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, or the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankees game or when the secretary in the front office misspells 'the'.
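The windowed cross-correlation described above might be sketched as follows. This is only an illustration in plain Python: it assumes the bid and ask series have already been resampled onto a common, evenly spaced time grid, and the function names, window length, step and maximum lag are all hypothetical choices. A positive lag here means the ask follows the bid (the bid leads); a negative lag means the ask leads.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(bid, ask, max_lag):
    """Return (lag, correlation) at which ask best matches bid.

    Positive lag: ask[t + lag] resembles bid[t], i.e. the bid leads.
    Negative lag: bid[t - lag] resembles ask[t], i.e. the ask leads.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = bid[:len(bid) - lag], ask[lag:]
        else:
            x, y = bid[-lag:], ask[:len(ask) + lag]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best

def windowed_lags(bid, ask, window, step, max_lag):
    """Best lag in each window, as a list of (window start index, lag, correlation)."""
    out = []
    for start in range(0, len(bid) - window + 1, step):
        lag, r = best_lag(bid[start:start + window],
                          ask[start:start + window], max_lag)
        out.append((start, lag, r))
    return out
```

A change in the sign of the best lag from one window to the next would mark a candidate trigger point, to be checked against the trades and quotes around that time.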

Phase space reconstruction--the relevant time series (bid prices, ask prices, trade prices) each represent one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phase space of two or more different functions.
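For readers unfamiliar with the reconstruction step, a minimal sketch of the time-delay method follows. It takes a single evenly sampled series and builds vectors from the current value and earlier values spaced by a fixed lag. The function name, the embedding dimension and the lag are illustrative assumptions; choosing them well is a separate problem.

```python
def delay_embed(series, dim, lag):
    """Return delay vectors (x[t], x[t - lag], ..., x[t - (dim - 1) * lag])."""
    start = (dim - 1) * lag  # first index with enough history behind it
    return [
        tuple(series[t - k * lag] for k in range(dim))
        for t in range(start, len(series))
    ]

if __name__ == "__main__":
    # Toy series standing in for an evenly sampled bid price (made-up values).
    toy = [4.80, 4.81, 4.83, 4.82, 4.80, 4.79, 4.81, 4.84]
    for point in delay_embed(toy, dim=2, lag=2):
        print(point)
```

Plotting the resulting points in sequence traces the trajectory; a short window simply yields a short list of points, which is the first difficulty listed above.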

Epsilon machine reconstruction--We will need to try to identify the actual "work" done by these programs. How do they decide on a price? How do they "decide" to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994). The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging a hierarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) are the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated states are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant, as the state that appears before another particular state is referred to as the predictive state, and the following state is the successor state.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.
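As a rough illustration of the bookkeeping involved, the sketch below turns a price series into a string of up/down/flat symbols, tabulates how often each short history (the predictive state) is followed by each symbol (the successor), and then groups histories that predict the same thing. This is a deliberately simplified stand-in, not Crutchfield's full reconstruction procedure: the symbol alphabet, the fixed history length, the tolerance and the greedy grouping are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def symbolize(prices, tol=0.0):
    """Map successive price changes to 'U' (up), 'D' (down) or 'F' (flat)."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        change = cur - prev
        out.append("U" if change > tol else "D" if change < -tol else "F")
    return "".join(out)

def next_symbol_distributions(symbols, history_len):
    """Estimate P(next symbol | preceding history) for every observed history."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(symbols)):
        counts[symbols[i - history_len:i]][symbols[i]] += 1
    return {
        hist: {s: n / sum(c.values()) for s, n in c.items()}
        for hist, c in counts.items()
    }

def group_histories(dists, tol=0.05):
    """Greedily group histories whose next-symbol probabilities agree within tol."""
    groups = []  # each entry: (representative distribution, member histories)
    for hist in sorted(dists):
        d = dists[hist]
        for rep, members in groups:
            keys = set(rep) | set(d)
            if all(abs(rep.get(k, 0.0) - d.get(k, 0.0)) <= tol for k in keys):
                members.append(hist)
                break
        else:
            groups.append((d, [hist]))
    return [members for _, members in groups]
```

Running the same tabulation over successive windows and watching for a change in the groups is one crude way to ask whether the algorithm behind the series has changed its behaviour.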

Information theory--as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density. The yet-to-be completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern themselves with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can't guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.

Friday, June 10, 2011

Flash crash the nat gas!

As shown in posts on Zero Hedge (correction: these are originally from Nanex), there have been some bizarre patterns in the trading of natural gas in the past couple of days. The charts below are from the evening of June 8, 2011 and come from the first of the two articles linked above:

At first glance this looks like nearly perfect chaos.

The last figure represents a one-dimensional projection of a Lorenz butterfly curve, shown in its glory below.

In reality the trading data isn't as nice as it first appears. There has been a bit of playing around with the time axis on the first two plots. I have subsequently digitized the data with trades at half-second intervals (but I'll outline the caveats below).

The axis on the bottom is time in seconds, starting from 19:40:37 on 08-Jun-11. The digitization is at half-second intervals, because the analysis below requires evenly spaced data. There were some difficulties, however. There was not always a trade right on the half- or full-second mark. Frequently the two nearest trades on either side of the desired time interval were at the same price so that it would be reasonable to use that price. Sometimes, however, the two nearest prices were quite far apart--for these I used a midpoint between bid and ask at the moment--however arguably this is not a price, particularly when we observe that many trades took place either above the ask or below the bid. Additionally, there are intervals where the midpoint between the bid and ask is actually undefinable, as during the interval from 19:42:04 to 19:42:06 where the bid price fluctuates in a complex fashion while the ask remains constant. So there are risks in this analysis.
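The gridding rules just described could be written down roughly as below. This is a sketch of the stated rules only, with assumptions filling the gaps: the data layout (sorted lists of tuples), the function name, and the choice to fall back to the quote midpoint whenever the neighbouring trades disagree at all (rather than only when they are "quite far apart") are mine for the purpose of illustration. It does not address the intervals where the midpoint itself is ill-defined.

```python
import bisect

def resample_trades(trades, quotes, start, n_points, step=0.5):
    """Put irregular trades onto an even time grid.

    trades: sorted list of (time_s, price)
    quotes: sorted list of (time_s, bid, ask)
    Returns a list of (grid_time, price or None).
    """
    trade_times = [t for t, _ in trades]
    quote_times = [t for t, _, _ in quotes]
    grid = []
    for i in range(n_points):
        t = start + i * step
        j = bisect.bisect_left(trade_times, t)
        if j < len(trades) and trade_times[j] == t:
            grid.append((t, trades[j][1]))       # a trade exactly on the mark
            continue
        before = trades[j - 1][1] if j > 0 else None
        after = trades[j][1] if j < len(trades) else None
        if before is not None and before == after:
            grid.append((t, before))             # neighbouring trades agree
            continue
        q = bisect.bisect_right(quote_times, t) - 1
        if q >= 0:
            _, bid, ask = quotes[q]
            grid.append((t, (bid + ask) / 2))    # fall back to quote midpoint
        else:
            grid.append((t, None))               # nothing usable at this time
    return grid
```

Keeping a flag for which rule produced each grid value would make it easy to see how much of the later analysis rests on midpoints rather than actual trades.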

The two dimensional lagged state space (using a 2-s lag to minimize mutual information) looks as below:

Not quite as beautiful as the Lorenz butterfly curve. I smoothed the data through a 3-point moving average filter as the original really looked like hell. The sinusoidal waves that slowly increase in size are reflected in the two-d state space as a spiral, drawn from the centre outwards, until the curve flies off into a new area of phase space.
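The smoothing and the lag selection mentioned above can be sketched in a few lines. The moving average below is assumed to be centred, and the mutual information is estimated from a simple equal-width histogram with an arbitrary number of bins; neither detail is specified in this post, so treat both as illustrative. With half-second sampling, a 2-s lag corresponds to a lag of 4 samples.

```python
import math
from collections import Counter

def moving_average3(xs):
    """Centred 3-point moving average (drops the first and last sample)."""
    return [(xs[i - 1] + xs[i] + xs[i + 1]) / 3 for i in range(1, len(xs) - 1)]

def mutual_information(xs, ys, bins=8):
    """Histogram estimate of mutual information, in bits, between two series."""
    def binned(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    bx, by = binned(xs), binned(ys)
    n = len(bx)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )

def lagged_mutual_information(series, max_lag, bins=8):
    """Mutual information between the series and itself at lags 1..max_lag."""
    return [
        mutual_information(series[:-lag], series[lag:], bins)
        for lag in range(1, max_lag + 1)
    ]

if __name__ == "__main__":
    # Toy oscillation standing in for the digitized price series.
    toy = [math.sin(0.4 * i) for i in range(400)]
    curve = lagged_mutual_information(moving_average3(toy), max_lag=12)
    for lag, value in enumerate(curve, start=1):
        print(lag, round(value, 3))
```

A lag at which this curve dips is a candidate delay for the two-dimensional plot; the estimate is sensitive to the bin count, so it is worth repeating with a few different values.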

The conventional dynamic explanation of the nat gas trading curve would be of a system in equilibrium--but the equilibrium is unstable, and the wobbles get progressively larger until the system shifts to a new equilibrium. Such a dynamic interpretation is incorrect. The Lorenz equilibrium is all equilibrium. What we perceive as a sudden shift in equilibrium is actually part of the equilibrium state.

Disequilibrium--it's the new equilibrium.

From a trading perspective, the nat gas trading is more straightforward. Someone is able to profit from variability. The proper positioning of puts and calls may allow you to benefit by a large move (either up or down) in the price of a stock or commodity. After that, you can try to hit any buy or sell stops if you are able to drive the price up or down, (or in this case, both ways, until the sell stops were hit triggering a cascade in price). If there had been more buy stops, then the price would have melted up. It was simply a matter of luck that the price melted down.

As in all markets there are two (or more) participants. Leaving the actual trades for a minute and scrutinizing the fluctuations in the bid and ask, we see a complicated psychological game being played. I see many of the tricks that I have seen in thinly traded gold stocks priced by a market maker. You see the bid (or trade price) creep up towards the ask price, then run away, perhaps hoping to draw the ask price down.

But the time frame is completely different. I used to see this play out over the course of a trading day (my favourite was when I would get a partial fill of, say 500 shares of some penny stock before the price would swing away, and could imagine the market maker saying "do you want to pay full transaction fees for a $100 sale, or are you going to meet my price and fill your order?"). In this case, these games are being played on a split-second scale.

Game theory has been digitized and is running on a level of complexity that leaves TIT FOR TAT in the dust. Unfortunately my textbook on the subject is being used by my seven-year-old, so I can't get into strategy games between different players.

Many of the price rises happen while trades are occurring above the ask price. How does that happen? Is the asker crossing transactions with fictional entities to lure the buyer? Similarly, many trades occur below the bid price as the price is falling. Are these fictional crosses? Are these real trades or just a gimmick to lure in another party (hurry up and buy--look how fast price is falling!)?

The chaotic appearance of this function is simply an emergent property of the gaming algorithms.

Lastly, I will point out that my analysis is roughly seven orders of magnitude slower than the frequency of some of the trades.

Wednesday, May 4, 2011

Bifurcation in gold-silver ratio! Where are we headed?

Is the recent exciting rise in the price of silver telling us something? Let's investigate.

The World Complex presents a two-dimensional reconstructed phase space portrait (using the time delay method) for monthly average gold and silver prices from January 1996 to April 2011. The time delay is twelve months, meaning that this plot shows the gold/silver ratio plotted against the ratio one year earlier.

Silver pricing data comes from here. Gold price monthly data from the World Gold Council website.

Don't get lost in the details. The important thing isn't every loop and swirl in the trajectory, nor is it even the trajectory itself. It is the space. For almost the entire data set, the system has cycled within a large area, which may have been an area of Lyapunov stability. The only exceptions are the excursion to the lower right area of the graph (an unusually high ratio coupled with an unusually high rate of increase in the gold/silver ratio, which happened in late 2008 to early 2009--I'm sure we all remember that one); and two excursions to the far left of the graph (in which the ratio is low and had also declined rapidly in the previous year)--these being the "Warren Buffett event" of 1998, and our present excursion.

Notice that the Warren Buffett excursion was short-lived. Indeed, the sudden reversal of the trajectory looks a little unnatural. We eagerly wait to see whether our current excursion will be similarly short-lived, and marked by a similar reversal. Unfortunately, although we are eager, the time taken for such an event to play out is months to years, so patience is the order of the day.

The next several months will be crucial. If there are truly problems in silver supplies, it would be logical to expect to see the trajectory of our dynamic system leap into a new area of phase space--probably one beyond the boundaries of the graph.

One idea currently circulating is that bifurcations are preceded by a period of extreme stability (what we might term low volatility). It is not clear if the period of low volatility is dynamically necessary (clearing the road, as it were), or is merely an empirical observation. The volatility of the gold/silver ratio over the past couple of years has been breathtaking, so if a bifurcation is occurring, it would provide a counterargument to the extreme stability hypothesis.

Tuesday, September 21, 2010

A chaotic toy

Here is a fun toy illustrating chaos--the three body problem.

Lots of fun. Here are some screenshots.

Try two bodies and see for yourself sensitivity to initial conditions.

Both planets start off with almost the same initial conditions, and as time progresses they drift farther apart. The rate of this drift (which would be described by the Lyapunov exponent) is tied to the initial mass you give Sun 1 (on the left). When the value is very low, the Lyapunov exponent is similarly low, and it takes a long time for the orbits to diverge.
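If you could record the distance between the two planets at regular times, a rough estimate of that rate of drift is the slope of the logarithm of the separation against time. The sketch below does this with an ordinary least-squares fit. The function name and inputs are hypothetical, and the estimate is only meaningful over the early stretch where the separation is still growing exponentially, before it saturates at the size of the orbits.

```python
import math

def divergence_rate(times, separations):
    """Least-squares slope of ln(separation) against time.

    Separations must all be positive. The slope approximates the largest
    Lyapunov exponent only while growth is still exponential.
    """
    logs = [math.log(s) for s in separations]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den
```

A small slope means the two orbits stay close for a long time, which is the behaviour described for a low mass on Sun 1.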

This is a lot like a game we used to have years ago, except you would set the initial position and velocity vector for a number of celestial bodies around a central Sun. It was great fun having them collide, and vexatious trying to make a stable solar system.

Tuesday, July 20, 2010

First steps into complexity part 1

I will try to document some of my thinking as I moved from a standard mechanistic viewpoint of science to one that was more complex.

I have been involved in Quaternary climate studies since I began my MSc in marine geology at Memorial University of Newfoundland. There I worked with Dr. Ali Aksu ostensibly on a typical marine geology study of a sedimentary basin on the continental shelf of Nova Scotia, but I also spent some time pondering Quaternary climate change--in particular, the Milankovitch theory of astronomically driven climate change.

At first the problem was a straightforward technical problem--how to tease out the appropriate signals from marine records. In the course of background reading, I encountered a relatively unknown paper (at least by geologists), by E. N. Lorenz in Quaternary Research in 1976 (far more famous were his earlier works on nondeterminism in weather prediction). The QR paper presented alternative ideas concerning the fundamental architecture of the global climate system and challenged the geological community to test them and so determine the nature of climate change on the Quaternary timescale. Most of the literature of the time considered the climate system to be deterministic, while yet acknowledging that there were nonlinearities which complicated the whole thing--but it was clear that the nonlinearities were hoped to be local in nature and that they could be dealt with through a judicious series of fudge factors. Lorenz described three possibilities for climate: 1) a straightforward "transitive" system, in which the system outputs can be linked to the system inputs by a simple set of differential equations; 2) what he called intransitive (what we would now term multistability; i.e., a system as above but with different sets of differential equations operating at different times); and 3) what he termed "almost intransitive", and called "strange attractors" in other publications, and which we now refer to as simple chaos.

I say that the paper is poorly known as I have never seen any commentary on it. Nor, for many years, did there appear to be a clear attempt to distinguish among these different modes of operation. To be sure, there have been publications advocating any one of these modes (here and here), but most of these were attempts to show observations which supported the proposed mode, rather than using observations to test between the different modes. More recently, various climate models have been proposed in which the modal operation is taken as a given.

I finished my MSc., then shifted to University of Toronto to carry out Ph.D. research with Dr. Nick Eyles. My principal thesis was again a geological one which concerned itself with tectonic influence on the development of glaciated continental margins, using Eastern Canada and the Gulf of Alaska as contrasting examples. However I also devoted a lot of time to Lorenz's proposed problem of Quaternary climate change. My approaches to this problem followed several branches. The first was improved signal processing (mainly through alterations in Fourier transform, including attempts to use maximum entropy or other methods). The second was looking at other data sets. The third involved developing entirely new techniques for processing information.

This last approach very quickly came to absorb most of my spare time.

In 1990, the concept of fractals had been around for a while, but its application in earth sciences was still very much leading edge (I was actually thinking of the first edition of this book). The push to educate earth science professionals had only just begun. At Scarborough there was a post-doc in geography who was trying to make a name for himself by publishing paper after paper in which he reported the fractal dimension of some geographical feature. He had published something like a dozen papers in a year, each of which, I must assume, was very short.

The concept of nonlinear dynamics was also very cutting edge in earth sciences. I proposed teaching a course on the topic, going so far as to propose that we teach our own mathematics to earth science students, but the idea didn't go anywhere.

I had encountered an interesting idea in a paper by Imbrie and Imbrie, in which they proposed that it was not ice volume directly that responded to solar insolation, but the rate of change of ice volume. At the time this struck me as a brilliant insight, and I immediately constructed a figure showing the connection between insolation in the northern hemisphere and the rate of change of ice volume calculated from first differences from a deep sea O-18 record.

Plot comparing insolation at 65N and the rate of change of ice volume from a deep sea O-18 isotope record. A panel from the ill-fated Paleoceanography paper described below.

I then had the idea of constructing a figure in which I plotted the inferred global ice volume against its rate of change, once again calculated from first differences. The graph would be a curve, in which each point would represent the "state" of the system at a particular time, and when all the points were plotted in sequence, a trajectory would be traced which should reflect the dynamics of the ice volume system.

Part of my first two-dimensional phase space reconstruction of global ice volume. The small numbers represent time in thousands of years before present (ka).

Points on the graph that lie above the x-axis represent intervals where ice volume is increasing, and ice volume is decreasing over the segments of the trajectory below the x-axis. The further from the x-axis, the more rapid the growth or retreat of global ice. The plot above shows the relatively slow advance of glaciers from the period beginning about 120 ky ago until about 20 ky ago, followed by rapid deglaciation.
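The construction is simple enough to sketch in a few lines: pair each value of the record with a first-difference estimate of its rate of change. The forward difference, the evenly spaced sampling and the function names are assumptions for the purpose of illustration, as is the helper that picks out the slow-moving points near the x-axis.

```python
def phase_portrait(values, dt=1.0):
    """Pair each value with its rate of change from a forward first difference."""
    return [
        (values[i], (values[i + 1] - values[i]) / dt)
        for i in range(len(values) - 1)
    ]

def dwell_points(portrait, max_rate):
    """Values at which the magnitude of the rate of change is at most max_rate."""
    return [value for value, rate in portrait if abs(rate) <= max_rate]

if __name__ == "__main__":
    # Made-up record: a slow build-up, a pause, then a rapid drop.
    toy = [0.0, 0.2, 0.5, 0.9, 1.0, 1.0, 1.0, 0.3, 0.0]
    for value, rate in phase_portrait(toy):
        print(value, round(rate, 2))
```

Points with a positive rate plot above the x-axis (growing ice) and points with a negative rate plot below it; the values returned by `dwell_points` are where the trajectory lingers.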

What was immediately noticeable in observing the function over the past 500 thousand years was that there were particular areas on the graph to which the function seemed attracted. It moved very rapidly towards them, and tended to stay in them for long periods of time before rapidly moving to another. All of these regions plotted along the x-axis, and corresponded to particular volumes of global ice. The location made sense, because it implied that there were particular volumes of ice which were more stable than others. During the times when ice volume was stable, its rate of change must be low--hence it would be impossible to find a small region of attraction off the x-axis.

Now this is a phase space portrait, in two dimensions, using the time-derivative method (Packard et al., 1980). At the time I did this, I didn't know what to call it. I was certain it had been done before, but even in today's world of search engines, without knowing the terminology it is very difficult to find information. I knew that I was on to something, but didn't know what.

In the meantime, I had had another idea for testing climate records for multistability--at least this was a test to distinguish multistability from the transitive case using information theory (I didn't understand enough about simple chaos to devise a test for it). My approach was that if climate had one or more stable states, then there should be measurable differences in the information between the climate record (again the deep ocean O-18 isotopic record) and the driver (which was presumed to be northern hemisphere insolation). If there were multiple stable modes of climate, then the insolation would be encrypted, as if by a polyalphabetic key, and there would be a change in a particular quantity called the index of coincidence, which is the likelihood that two randomly selected characters in a string of text are identical. There were challenges in applying this, not the least of which was that it required that the data should be 'binned' and it was not at all clear how the bin size in the observed data stream should be linked to that of the northern hemisphere insolation. This work was presented at two conferences in 1991 and 1992, and was awarded a top student paper prize in 1991. But when I wrote the paper and submitted it to Paleoceanography, I overlooked one of the cardinal rules of scientific writing.
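
To make the index of coincidence concrete, here is a small Python sketch. It is an illustration, not a reconstruction of the original analysis: the equal-width binning in `bin_series` is an assumption (the choice of bin size is exactly the problem I never resolved), and both function names are made up for this example.

```python
from collections import Counter


def bin_series(values, n_bins):
    """Turn a numeric series into integer symbols using equal-width bins.

    Equal-width binning is an assumption of this sketch, not a
    recommendation; other binning schemes would give other symbols.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just outside the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def index_of_coincidence(symbols):
    """Probability that two symbols drawn without replacement are identical."""
    n = len(symbols)
    if n < 2:
        raise ValueError("need at least two symbols")
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Toy series (made-up numbers), binned into two symbols.
symbols = bin_series([0.0, 1.0, 2.0, 3.0, 4.0], 2)
ic = index_of_coincidence(symbols)
```

A stream made of a single repeated symbol has an index of 1, and a stream in which no symbol repeats has an index of 0. The comparison of interest is between the index of the binned climate record and that of the binned driver.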

Always look like you know what you are doing.

I have always been fascinated by the intellectual process of the scientific endeavour. This fascination led me to make a basic mistake in presenting my experiment and results. In the course of my work I had discovered what appeared to be a novel use for the process of autoencryption--by which I mean using the message as its own key in a polyalphabetic substitution cipher. The charming result is a coded stream that cannot be unambiguously decrypted even by an intended recipient who has been furnished with the key. Such a method of encryption, understandably, had no real application, and so the behaviour of the index of coincidence for this style of encryption was not well known. However, I did not discover this until I was forced to come up with an explanation for a rise in the index of coincidence in the observed signals (compared with the presumed driver).
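
One simple way to see the ambiguity is sketched below in Python. This is only one possible reading of "the message as its own key": a Vigenère-style shift over a 26-letter alphabet in which each letter is shifted by its own position, with no offset. The exact scheme applied to the binned climate data is not spelled out here, so the function `autoencrypt` and the alphabet are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # 26 symbols; an assumption of this sketch


def autoencrypt(message):
    """Shift each letter by its own alphabet position (message as its own key).

    Only lowercase a-z is handled; any other character raises ValueError.
    """
    out = []
    for ch in message:
        m = ALPHABET.index(ch)
        out.append(ALPHABET[(m + m) % len(ALPHABET)])
    return "".join(out)


# 'a' (position 0) and 'n' (position 13) both encrypt to 'a',
# so the ciphertext cannot be decrypted unambiguously.
collision = autoencrypt("an")
distinct_outputs = len(set(autoencrypt(ALPHABET)))
```

Under this reading, the 26 plaintext letters collapse onto 13 ciphertext letters, so repeated symbols become more likely in the coded stream. The collapse depends on the alphabet having an even number of symbols; with an odd number the same doubling would be one-to-one.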

So I wrote the paper this way. Testable hypothesis with two possible outcomes, conduct experiment, find unanticipated outcome, explain why unanticipated outcome was left out of the original hypothesis, modify hypothesis, conclusion. The paper was rejected. It may have been accepted had I submitted the modified hypothesis as the original one, tested it, and reported a result. I had thought that the process of discovery would be interesting to others. In the case of peer-reviewed journals, this view was mistaken. In the course of revisions, I came to realize that the binning issues mentioned above were unresolvable, and reluctantly abandoned this approach, returning to the reconstructed phase space portrait.

Dust flux, Vostok ice core