In general, the step detection algorithm has to extract unknown steps of height given an unknown mean value of the signal, . Based on the mean shift model given by
with the step function and the emission time , a statistical treatment of this problem is possible using the bootstrapping and cumulative sum (BCSUM) algorithm [170].
The BCSUM detection algorithm has the ability of detecting either positive or negative steps in a data set. From the BCSUM procedure the decision that either a discrete step in the underlying data samples exists or not is obtained by selection of a detection sensitivity parameter . In the following the cumulative sum charts [167, 168] are discussed, the bootstrapping mechanism is introduced and finally the advanced detection method is presented.
First consider a series of independent measurement points with the number of samples. The signal distributions before and after a change in the signal mean are given by and , respectively, with their corresponding means and , where is the step height. Introducing the log-likelihood ratio given by
a change in the signal mean is reflected by a change of the sign
of the mean value of the log-likelihood ratio [168] of the data set given by
and is illustrated in Figure 10.1.
Together with the assumption of normally distributed samples with the probability density function
the log-likelihood ratio for and is
and the change point position is at the sample with index
In the case of trapping events, a negative mean shift is obtained and the log-likelihood ratio for reads
with the change point at the sample index position
Since the change of the mean is unknown the detection of either positive or negative changes has to be provided by the algorithm. Combining both cases, and , the log-likelihood ratio for normally distributed samples can be formulated as
where the threshold value already considers the prefactor . With the choice of the CSUM chart function is
The sample index of the change point is given by
A frequently used method for parameter estimation of a set of data samples for unknown underlying distributions is bootstrapping. Its big advantage is that it is fully automatic and it does not matter how complicated the mathematical model for the probability distribution is.
For a given set of samples of length a bootstrap sample is obtained by randomly choosing samples from with replacement. Since bootstrapping is a resample technique with replacement, the samples out of can occur never, once or more often in the bootstrap estimate [171].
The statistical nature of bootstrapping includes the necessity of a huge number of bootstrap samples . This circumstance inevitably leads to computationally expensive procedures.
The BCSUM algorithm is a Monte Carlo based combination of bootstraps and cumulative sum charts to detect changes in the signal mean. The divide and conquer procedure to detect one change point in the measurement data set is:
1. Calculate the bootstrap samples from .
2. Create the cumulative sum charts for each bootstrap sample with .
3. Estimate the spans of each chart and collect them into a sorted list from the minimum to the maximum .
4. With the detection sensitivity the decision threshold is obtained as , where denotes the list index operator. For a data set the list index operator is
5. Create the CSUM chart from the initial data set and evaluate
6. A change point in the underlying sequence occurs if . Split the data set at the change point position and recursively continue with 1) for each subsequence until all change points are detected.
Critical parameters for the detection outcome of the BCSUM algorithm is the number of bootstraps and the detection sensitivity . For statistical relevance, the bootstrapping and the calculation of has to be performed many times, is recommended.
A choice of means that just the most dominant changes in mean are detected. For very fast events, which often just span two or three measurement data samples, the detection sensitivity is a crucial parameter. To extract small and very fast changes is recommended. Moreover, the position of the change point in the measurement data is directly accessible via the CSUM chart obtained from the initial data series.
Last, but not least it has to be noted that the algorithm can be applied to uniformly and non-uniformly sampled data as well. This property is very important in the context of the TDDS. To cover several decades in time a compromise between the sampling rate and the accumulated recovery time is necessary in order to achieve a feasible amount of measurement data.
« PreviousUpNext »Contents