Hi Roger,
Here is a copy of the eventslide on impulse events. The plot doesn't have legends so it's impossible to read.
For your info, it contains the following:
[I've copied this note to Dick, Dean and York, since they might be interested]
Eventslide analysis.
The eventslide looks at the cumulative Z-score for events versus a time offset of the formal event examination periods. A negative timeshift, T<0, means that events are calculated using an earlier starting time for the event. A positive shift means the event is given a later start time than the formal prediction. The durations of the events are unchanged from the formal predictions in these calculations. A timeshift T, then, means that all events are given start times offset by T from the original predictions. Looking at successive shifts is like "sliding" the event periods in lockstep over the data.
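For concreteness, the lockstep slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formal calculation: the data are assumed to be a single stream of standardized samples (mean 0, variance 1), each event score is taken as a simple Stouffer Z over its window, and the cumulative score as a Stouffer Z across events. The names event_z and eventslide are hypothetical, and the formal recipes are not reproduced.

```python
import numpy as np

def event_z(data, start, duration):
    """Stouffer Z of the standardized samples in [start, start + duration).

    Assumption: `data` holds independent samples with mean 0 and variance 1.
    """
    if start < 0 or start + duration > len(data):
        raise ValueError("shifted event window falls outside the data")
    window = data[start:start + duration]
    return window.sum() / np.sqrt(len(window))

def eventslide(data, events, offsets):
    """Cumulative Z across all events for each lockstep offset T.

    data    : 1-D array of standardized samples
    events  : list of (start_index, duration) pairs, in samples
    offsets : iterable of integer shifts T, in samples
              (T < 0 starts every event earlier, T > 0 later)
    Durations are left unchanged; only the start times move.
    """
    curve = []
    for t in offsets:
        zs = [event_z(data, start + t, dur) for start, dur in events]
        # Assumed combination rule: Stouffer Z over the event scores.
        curve.append(sum(zs) / np.sqrt(len(zs)))
    return np.array(curve)
```

Plotting the returned curve against the offsets gives a slide trace of the kind described here, with T=0 corresponding to the original prediction periods.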
The eventslide analysis is a way to judge qualitatively how sensitive the cumulative Z-score of events is to the exact prediction periods. One possibility is that the predictions merely guess or estimate the times when the data actually deviate anomalously. In this case, one would expect that sliding the event periods in time would give a gradual rise and fall of the Z-score. The width of the rise/fall would give some indication of the "real" width of events.
An alternate possibility is that the cumulative Z-score is due to an anomalous intuition on the part of experimenters. In this case the event periods are guesses about fluctuations in random data and a simple model would say that sliding the event start times should quickly extinguish the Z-score.
An early look is shown in the plot eventslide2.gif. The time offset, T, steps in 1/2 hour intervals in this plot. The Z-score is 3.6 at T=0 and after a single step all significance is lost. Timesteps of 5 minutes give a similar result.
One could take this as an indication that an intuition model is favored. Certainly a model that assumes anomalous data deviations would have difficulty explaining why the formal predictions should be so finely tuned to the data anomalies, especially since most of the prediction periods are rounded to whole hours. However, this interpretation is difficult to make because the Z-score is calculated using events that range from 1 minute to 24 hours in length and a variety of recipes are applied, as per the formal prediction specifications.
To address these difficulties the eventslide analysis is refined in two ways:
1. Events of 1/2 hour or less and events longer than 1 day are removed from the event set.
2. A uniform recipe (the so-called 'deviation statistic') is applied to the events.
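The first refinement amounts to a simple duration filter, sketched below. The event records are assumed to be dictionaries with a hypothetical "duration" field holding a timedelta; the function name reduce_event_set is likewise an assumption. The deviation statistic of step 2 is not specified in this note, so it is not sketched.

```python
from datetime import timedelta

def reduce_event_set(events):
    """Drop events of 1/2 hour or less and events longer than 1 day.

    Assumption: each event is a dict with a "duration" timedelta.
    """
    shortest = timedelta(minutes=30)
    longest = timedelta(days=1)
    return [e for e in events if shortest < e["duration"] <= longest]
```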
Applying these simplifications excludes about 30 events. The reduced set has approximately 145 events and retains enough significance for the analysis. (The cumulative Z-score for the subset increases to about 4.0 using the formal recipes, and Z = 3.7 obtains using the deviation statistic.)
The top panel in the plot ImpulseSlide.gif compares the formal (red) and uniform (blue) recipes applied to this subset of events. The cumulative Z-score is calculated for timesteps of 15 minutes. [When studying the plots, remember that Z-score magnitudes less than about 2 are not significant. It is helpful to compare right- and left-hand panels.]
The formal (red) trace in the top panel suggests that, although the slide analysis still shows a dominant spike at T=0, there is now some additional weight for offsets within 2 hours of T=0 when compared to the full event set in the previous graphic. Interestingly, the slide width of ±2 hours is more prominent when the uniform deviation statistic is applied (blue trace). Folding the deviation statistic and reduced event set into the eventslide analysis suggests that there is a width of about 2 to 4 hours to the slide that was not evident in the first eventslide analysis. This favors the view that anomalous deviations occur in the data and that event predictions approximately frame these periods (although this is not to say that one couldn't make a model in which an experimenter effect accounted for the eventslide width).
Ideally, the eventslide analysis would use events that had equal lengths and unambiguous start times. It is thus interesting to identify the 55 events whose start points are both unforeseeable and clearly defined. These "impulse" events include earthquakes and terrorist attacks, for example, but not events with immediate anticipation such as the death of the pope or the conclusion of a trial or other contest. The middle and bottom panels of the figure show the slide analysis for the subsets of impulse and non-impulse events.
The middle panel shows the slide calculation using the formal recipes. The black trace shows the impulse events and the red trace shows the remaining (non-impulse) events. Both traces show a width as before, but there is not enough significance to examine the difference between them.
The bottom panel shows the same slide analysis using the uniform deviation statistic. Here an interesting and marginally significant trend obtains. The impulse events (black) and non-impulse events (blue) both show slide widths of 2-4 hours. However, the impulse events have weight shifted to early times whereas the non-impulse events have slide weight symmetrically distributed about T=0.
If the slide width is due to inexact guesses of deviation periods, the width should be symmetric about T=0 for events with ambiguous start and end points. This is the situation for the non-impulse events. The T=0 symmetry should be broken when the start period of the event is clearly defined. In this case, the data deviation and the event prediction are both related to a known moment in time. The expected shape of the slide width should be positively skewed (rising more sharply at negative timeshifts and tailing off at positive shifts). Qualitatively, this is seen in the impulse (black) trace, but the statistics are very weak [modeling the slide would allow a probability value to be assigned to the impulse slide skewness]. The impulse trace is also displaced toward negative timeshifts by 1-2 hours. This is an indication that the data deviations tend to start before the event start times. This would indeed be a surprising result, but there is a precedent in the September 11, 2001 analysis, which found a strong statistical departure in one statistic roughly 3 hours before the terrorist attacks began. Stronger statistical evidence will be needed to study further the possibility that data deviations precede the events.
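As a rough way to put numbers on the displacement and skewness of a slide trace, one could treat the positive part of the Z-score curve as a weight distribution over T and take its weighted moments. This is only a descriptive sketch under that assumption, not the modeling mentioned above, and it assigns no probability value; the name slide_shape is hypothetical.

```python
import numpy as np

def slide_shape(offsets, z):
    """Weighted centroid and skewness of a slide trace.

    Assumption: only positive Z values carry weight; negative values
    are clipped to zero before the moments are taken.
    """
    t = np.asarray(offsets, dtype=float)
    w = np.clip(np.asarray(z, dtype=float), 0.0, None)
    total = w.sum()
    if total == 0:
        raise ValueError("trace has no positive weight")
    centroid = (w * t).sum() / total
    var = (w * (t - centroid) ** 2).sum() / total
    if var == 0:
        raise ValueError("all weight sits at a single offset")
    skew = (w * (t - centroid) ** 3).sum() / total / var ** 1.5
    return centroid, skew
```

A negative centroid would correspond to a trace displaced toward early times, and a positive skew to a trace that rises sharply at negative timeshifts and tails off at positive ones.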