Welcome to the Time Series Toolbox
The Time Series Toolbox enables users to perform preliminary analysis on either user-uploaded time series data or preloaded United States Geological Survey (USGS) streamflow gage data. Without programming expertise, users can deploy streamlined analysis pipelines, uncovering previously hidden data patterns and rapidly moving from data acquisition to analytic insight. The tool thus enables more consistent, repeatable, and efficient time series analysis.
This web tool applies various statistical tests to facilitate a better understanding of the data. In particular, the tool can detect nonstationarities in the historical record to help the user segment the record into datasets whose statistical properties can be considered stationary. Users can also explore their data through three different time series models: Auto-Regressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS), and Linear Models (TSLM). These models can be applied in the forecasting, error handling, interpretation, and decomposition of climate data.
This tool was developed in conjunction with USACE Engineering Technical Letter (ETL) 1100-2-3, Guidance for Detection of Nonstationarities in Annual Maximum Discharges, to detect nonstationarities in maximum annual flow time series. Per this ETL 1100-2-3, engineers will be required to assess the stationarity of all streamflow records analyzed in support of hydrologic analysis carried out for USACE planning and engineering decision-making purposes.
This functionality is contained within four different sheets:
Explore Data - This sheet allows users to select a) their own data or b) preloaded streamflow datasets. From there, the user can visualize that data for immediate inspection and evaluation, with further exploration on the Data Summary, Magnificent Seven, and Seasonality tabs if applicable. If USGS preloaded data is selected, the user can also visually confirm the location of a gage.
Model-Based Analysis - The Trend Analysis sheet utilizes two different statistical methods for trend detection and outputs the corresponding trend line coefficients and trend hypothesis tests for significance.
Nonstationarity Detection - The Nonstationarity Detector sheet uses a dozen different statistical methods to detect the presence of both abrupt and smooth nonstationarities in the period of record.
Time Series Modeling - For deeper inspection, the last tab fits a time series model to the uploaded data. Users can choose between a Time Series Linear Model, an Auto-Regressive Integrated Moving Average Model, and an Exponential Smoothing Model, extracting both model fit statistics and the model's forecasts for the uploaded data.
Please acknowledge the U.S. Army Corps of Engineers for producing this time series analysis tool as part of their progress in climate preparedness and resilience and making it freely available.
Data updated as of August 2023
If you have any questions or comments, please let us know by contacting our team: firstname.lastname@example.org
Tool Updates, September 2023
Addition of Monthly Preloaded Streamflow Data - The tool now includes preloaded monthly streamflow and gage height data. This data is directly pulled from the USGS Water Services API, so these values will reflect up-to-date data from USGS.
Enabled Seasonally Decomposed Components - Users can now directly analyze seasonally decomposed components (trend, random, seasonality) in the tool rather than having to download and re-upload the data.
1. Select Data Source
2. Select Search Method
2. Upload Data Set
Define the path to the file you want to upload. It should be a CSV file with two columns: the first is the date vector (mm/dd/yyyy, mm-dd-yyyy, or yyyy) and the second is the data for analysis. The first row should contain column headers.
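A file in this layout can be checked before upload with a short script. Below is a minimal Python/pandas sketch; the column names Date and Flow and the helper load_series are illustrative, not part of the tool:

```python
import io
import pandas as pd

def load_series(csv_source):
    """Read a two-column CSV (date, value); the first row holds headers.
    Dates may be mm/dd/yyyy, mm-dd-yyyy, or a bare yyyy."""
    df = pd.read_csv(csv_source)
    df.columns = ["date", "value"]                 # enforce the expected two-column layout
    # Element-wise parsing lets the three accepted date formats coexist in one column.
    df["date"] = df["date"].astype(str).apply(pd.to_datetime)
    return df.set_index("date")["value"].sort_index()

# Hypothetical inline sample standing in for an uploaded file:
sample = io.StringIO("Date,Flow\n01/15/2001,120\n02-15-2001,95\n2002,110\n")
series = load_series(sample)
```

A bare year such as 2002 parses to January 1 of that year, which is sufficient for annual series.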
3. Apply Preprocessing Methods
Time Series Data Table
Magnificent Seven Analysis
Magnificent Seven Description
As outlined in Archfield et al. (2014), the Magnificent Seven summary statistics are effective in data characterization and classification. Please note: the values below are linear combinations of order statistics (L-moments) rather than ordinary moments. Sample L-moments are defined in Hosking (1990).
1. L-Mean - The average value of the data--a measure of location. For reference, the standard normal distribution has an L-Mean of 0. Calculations use the algorithm given in Hosking (1996, p. 14).
2. L-CV (Coefficient of L-Variance) - The ratio of the second L-moment (a measure of dispersion) to the mean of the data. For positive-valued data, this coefficient takes values between 0 and 1. If the mean of the data set is zero, the coefficient approaches infinity and hence cannot be calculated. Please see Hosking (1990, 1996) for more information.
3. L-Skewness - The measure of asymmetry of the probability distribution. A normal distribution has an L-Skewness of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew. L-Skewness ranges from -1 to 1, with values greater than 0.300 indicative of large skewness. Please see Hosking (1990, 1996) for more information.
4. L-Kurtosis - The measure of tail density of the probability distribution (i.e., how much of the distribution is contained in the tails). For reference, a uniform distribution has an L-Kurtosis of 0, while a normal distribution has an L-Kurtosis of approximately 0.123. Please see Hosking (1990, 1996) for more information.
5. AR1 - The autoregressive lag-one correlation coefficient (i.e., how predictive the previous value in the time series is of the next value). Long-term monthly means are used to deseasonalize the data. The code normalizes the data and then fits a first-order autoregression using the Yule-Walker method. Values can be positive or negative.
6. Amplitude - A measure of the height of the best-fitting annual sinusoidal curve. Amplitude values are always positive numbers (e.g., 4, 1.5, 108). First, flows are standardized and then fitted to the linear model a*cos(2*pi*t) + b*sin(2*pi*t), where a and b are the fitted coefficients. The final value is calculated as sqrt(a^2 + b^2).
7. Phase - The measure, in radians, of the angle of the best-fitting annual sinusoidal curve at time zero. Using radians, each of the values will be between −π and π. The same pre-processing steps used for calculating Amplitude are used to calculate Phase. However, the final value is calculated as arctan(−b/a), where a and b are the fitted cosine and sine coefficients.
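For reference, the statistics above can be reproduced in a few lines. The following Python/NumPy sketch computes the sample L-moments via probability-weighted moments (Hosking 1990), the lag-one autocorrelation, and the amplitude/phase fit; the function names are illustrative, and the tool's own implementation follows Hosking's Fortran routines:

```python
import numpy as np

def sample_l_moments(x):
    """First four sample L-moments from probability-weighted moments,
    plus the L-ratios described above."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1, l2 = b0, 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"L-Mean": l1, "L-CV": l2 / l1,
            "L-Skewness": l3 / l2, "L-Kurtosis": l4 / l2}

def ar1(x):
    """Lag-one autocorrelation of the normalized series (the first-order
    Yule-Walker estimate)."""
    z = (x - np.mean(x)) / np.std(x)
    return float(np.sum(z[:-1] * z[1:]) / np.sum(z * z))

def amplitude_phase(x, period=12):
    """Least-squares fit of a*cos(2*pi*t) + b*sin(2*pi*t) to standardized
    flows; amplitude = sqrt(a^2 + b^2), phase = arctan(-b/a) on (-pi, pi]."""
    z = (x - np.mean(x)) / np.std(x)
    t = np.arange(len(x)) / period
    A = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
    (a, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(np.hypot(a, b)), float(np.arctan2(-b, a))
```

Note that arctan2(−b, a) matches arctan(−b/a) while preserving the full −π to π range of quadrants.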
1. Archfield, S. A., Kennen, J. G., Carlisle, D. M., and Wolock, D. M. (2014). An objective and parsimonious approach for classifying natural flow regimes at a continental scale. River Research and Applications, 30, 1166-1183. doi:10.1002/rra.2710
2. Hosking, J. R. M. (1990). L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B, 52, 105-124
3. Hosking, J. R. M. (1996). Fortran routines for use with the method of L-moments, Version 3. Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
Seasonal Cycle Graph
The seasonal cycle graph shows monthly data aggregated across all years in the record, including the monthly minimum, maximum, and average values. Together, these values show how much a given month varies from year to year.
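The aggregation behind this graph can be sketched in a few lines of Python/pandas. The series below is synthetic and purely illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical ten years of monthly flows (a smooth annual cycle).
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
flows = pd.Series(100 + 30 * np.sin(2 * np.pi * idx.month / 12), index=idx)

# Aggregate across all years by calendar month: min, mean, and max per month.
cycle = flows.groupby(flows.index.month).agg(["min", "mean", "max"])
```

The result is a 12-row table, one row per calendar month, matching the three series shown on the seasonal cycle graph.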
Seasonal Cycle Data
Use this page to detect the presence and severity of trends. It is recommended to have at least 10 years of data in each analysis period. Please update the years first and then perform the analysis on your data.
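The two statistical methods used here are described in the manual. One common nonparametric choice for trend significance testing in hydrology is the Mann-Kendall test, sketched below as an assumption-laden illustration (it may not be exactly the tool's pair of methods; ties are ignored for brevity):

```python
import math
import numpy as np

def mann_kendall(x):
    """Mann-Kendall trend test with the normal approximation; the
    variance formula below assumes no tied values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = 0
    for i in range(n - 1):
        s += int(np.sign(x[i + 1:] - x[i]).sum())   # concordant minus discordant pairs
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided p-value
    return s, z, p
```

A strictly increasing record yields a large positive S and a p-value near zero, flagging a significant upward trend.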
Trend Line Coefficients
Trend Hypothesis Test
Seasonal Decomposition: This tool uses a series of statistical methods to identify and define seasonal patterns in the data. These techniques take into account underlying trends in the data, as well as noise and natural variability.
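The core idea can be illustrated with a classical additive decomposition, x = trend + seasonal + residual. The NumPy sketch below is illustrative only; the tool's own decomposition methods may differ:

```python
import numpy as np

def classical_decompose(x, period=12):
    """Classical additive decomposition: x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centered moving average for an even period: half weights at the ends.
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    half = period // 2
    trend[half:n - half] = np.convolve(x, w, mode="valid")
    detrended = x - trend
    # Average detrended values by position within the cycle, ignoring edge NaNs.
    seasonal = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal -= seasonal.mean()                     # force seasonal effects to sum to zero
    seasonal_full = np.tile(seasonal, n // period + 1)[:n]
    residual = x - trend - seasonal_full
    return trend, seasonal_full, residual
```

On a series built from a linear trend plus a pure annual sinusoid, this recovers the seasonal component exactly (up to rounding) wherever the centered average is defined.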
Define decomposition analysis:
Seasonal Decomposition of Uploaded Data
Nonstationarity Analysis: This tool uses statistical testing to detect the presence of nonstationarities in the uploaded data. These tests examine the data for nonstationarities (i.e., changes) in its mean, variance, or distribution.
Large Datasets and Computational Complexity: Three of the tests are computationally expensive: Energy Divisive, Smooth Lombard Wilcoxon, and Smooth Lombard Mood. Selecting the checkbox to remove computationally expensive tests excludes these three and focuses the analysis on the remaining nine methods. For reference, a dataset of 2,000 entries takes approximately 3 minutes with all tests running.
*** If nonstationarities that appear in the time series graphic are not all appearing on the heatmap, please try expanding the graph by enlarging the application screen.
The USGS streamflow gage sites available for assessment within this application include locations where there are discontinuities in USGS peak flow data collection throughout the period of record and gages with short records. Engineering judgment should be exercised when carrying out analysis where there are significant data gaps.
In general, a minimum of 30 years of continuous streamflow measurements must be available before this application should be used to detect nonstationarities in flow records.
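Among the rank-based change-point tests commonly used to detect an abrupt shift in a flow record is the Pettitt test. The compact sketch below is illustrative and may not match any of the tool's twelve methods exactly:

```python
import numpy as np

def pettitt_test(x):
    """Pettitt (1979) rank-based test for a single abrupt shift in the median."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.sign(np.subtract.outer(x, x))            # s[i, j] = sign(x[i] - x[j])
    # U_t accumulates sign(x_j - x_i) over all pairs split at position t.
    U = np.array([s[t + 1:, :t + 1].sum() for t in range(n - 1)])
    K = np.abs(U).max()
    tau = int(np.abs(U).argmax()) + 1               # length of the first segment
    p = 2.0 * np.exp(-6.0 * K**2 / (n**3 + n**2))   # approximate significance
    return tau, K, min(p, 1.0)
```

A record with a clear step change yields a change point at the step and a p-value near zero, signaling a nonstationarity in the mean.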
(Sensitivity parameters are described in the manual. Engineering judgment is required if non-default parameters are selected.)
Test for Breakpoints
Approach: This tool uses linear regression, and the analysis of model errors with hypothesis testing, to identify points in the data that reflect sharp changes in behavior, suggesting the need for segmented analysis.
Missing Values: Breakpoint analysis models typically struggle to handle missing values. To facilitate robust analysis, this function concatenates the data across missing values to smooth out gaps and detect breakpoints accurately. The final visualization and analysis maintain your originally selected approach to handling missing values, but the underlying methodology reduces the possibility that missing values misalign analytic insights.
Trend Significance: To aid in the understanding of the breakpoints, this tool has the option to show segment trend lines. For these individual segments, it also provides significance testing using the t-test. Within the data table, the slope value is marked with * or ** when the test indicates a significant trend (at the 0.05 (*) or 0.01 (**) level).
Metrics used in determining optimal breakpoints
Residual Sum of Squares (RSS): the sum of squared residuals, where a residual measures how far the regression line is from the original data. This term measures the amount of variance in a data set that is not explained by the regression model itself. Both RSS and BIC are directly used in the selection of breakpoints.
Bayesian Information Criterion (BIC): an index, based on Bayesian statistics, used to determine which model is best for a given dataset. In this case, the criterion helps determine the optimal number of structural breaks. The BIC adds a penalty term that favors more parsimonious models over more complex models; this penalty helps prevent overfitting.
Root Mean Square Error (RMSE): the standard deviation of the residual values, where a residual measures how far the regression line is from the original data.
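How RSS and BIC interact in breakpoint selection can be sketched for the single-breakpoint case. This Python example is a simplified illustration, not the tool's algorithm, and the BIC parameter counts follow one common convention (others exist):

```python
import numpy as np

def rss_of_line(t, y):
    """Residual sum of squares of an ordinary least-squares line fit."""
    A = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def best_single_breakpoint(y, min_seg=5):
    """Scan candidate break positions, keep the split minimizing total RSS,
    then compare BIC with and without the break."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    rss0 = rss_of_line(t, y)
    bic0 = n * np.log(rss0 / n) + 2 * np.log(n)      # single line: 2 parameters
    rss1, b = min(
        (rss_of_line(t[:k], y[:k]) + rss_of_line(t[k:], y[k:]), k)
        for k in range(min_seg, n - min_seg)
    )
    bic1 = n * np.log(rss1 / n) + 5 * np.log(n)      # two lines + break: 5 parameters
    return (b, bic1) if bic1 < bic0 else (None, bic0)
```

A series with a clear level shift is split at the shift, and the penalized BIC of the two-segment model beats the single-line model, so the break is retained.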
Breakpoint Segment Details
Build Time Series Models
This section helps users determine an appropriate time series model, applying techniques that control for seasonality, trend, and nonstationarities and visualizing the resulting outputs.
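As a flavor of what model fitting involves, the mechanics of exponential smoothing with a trend component (Holt's linear method, one member of the ETS family) can be written in a few lines. This Python sketch is illustrative only and does not reproduce the tool's model fitting:

```python
import numpy as np

def holt_linear(y, alpha=0.5, beta=0.3, horizon=6):
    """Holt's linear-trend exponential smoothing: recursive level/trend
    updates, then straight-line extrapolation for the forecast."""
    y = np.asarray(y, dtype=float)
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev = level
        level = alpha * obs + (1 - alpha) * (level + trend)   # smooth the level
        trend = beta * (level - prev) + (1 - beta) * trend    # smooth the trend
    return level + trend * np.arange(1, horizon + 1)          # h-step-ahead forecast
```

On a perfectly linear series the level and trend updates reproduce the data exactly, so the forecast continues the line; on real data, alpha and beta govern how quickly the model adapts to change.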