Tree ring reconstruction procedures using cross validation techniques

Principal components analysis (PCA) has been used extensively to reconstruct past climatic variability by regressing the climatic variable (e.g., streamflow) on components derived from tree-ring chronologies. However, this technique allows several procedural choices that may result in very different reconstructions. For this reason, proper validation statistics are needed to select the model with the lowest prediction error and the fewest variables, and to ensure the necessary connection with the physical processes being modeled. Including too many variables may result in "overfitting": the model reproduces even the smallest variations in the observed data used to fit it, but has low predictive skill. This leads us to reconsider whether the coefficient of determination (R-squared) and the mean square error (MSE) are sufficient criteria for selecting the model that most accurately predicts climatic variability. Independent testing or cross validation may be a better way to identify the model that best represents the physical processes using the minimum number of variables (a parsimonious model) and that reduces the influence of unrelated noise. This study uses 17 tree-ring index chronologies from different sites in the upper Colorado River Basin to predict streamflow at Lee's Ferry, the legal point separating the upper and lower Colorado River Basins. Combinations of up to eight variables with the lowest cross-validation standard error were selected from these 17 chronologies. This procedure was repeated for four cases: using rotated components, using unrotated components, using all the modes, and pre-selecting the modes with eigenvalues higher than one (while guaranteeing that 85% or more of the total variance was explained). For each of the four cases, two approaches for selecting the modes were checked. The first approach simply lets stepwise regression select the modes that are significant according to a t-test. The second approach starts with the first mode and then adds subsequent modes one by one, in order, until a mode fails the t-test (skipping components was not allowed). Before a mode was finally accepted, the sign of the correlation between the predictand and a particular predictor had to match the sign of the coefficient for that variable in the final regression. The results show that cross validation is a good tool for determining the most parsimonious model, one that has a low MSE and remains consistent with the underlying physical processes.
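The contrast drawn above between R-squared, MSE, and a cross-validation standard error can be illustrated with a minimal sketch. It is not the study's procedure: the data are synthetic (17 made-up "chronologies" sharing one common signal), the names `pc_scores` and `fit_stats` are hypothetical, the components are unrotated and computed once from the full record rather than refit in each fold, and modes are added strictly in order and compared by leave-one-out error instead of being screened with a t-test.

```python
import numpy as np

def pc_scores(X):
    """Unrotated principal component scores of the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U * S  # column j holds the scores of mode j+1

def fit_stats(P, y):
    """Return (R2, in-sample RMSE, leave-one-out CV standard error) for OLS of y on P."""
    n = len(y)
    A = np.column_stack([np.ones(n), P])          # intercept plus selected modes
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # For OLS, the leave-one-out residual equals resid / (1 - leverage),
    # where leverage is the diagonal of the hat matrix.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q**2, axis=1)
    loo_resid = resid / (1.0 - leverage)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean())**2)
    rmse = np.sqrt(np.mean(resid**2))
    cv_se = np.sqrt(np.mean(loo_resid**2))
    return r2, rmse, cv_se

# Synthetic stand-in data (assumption): 80 "years", 17 "sites", one shared signal.
rng = np.random.default_rng(0)
n_years, n_sites = 80, 17
signal = rng.standard_normal(n_years)
X = np.outer(signal, rng.uniform(0.5, 1.0, n_sites)) + rng.standard_normal((n_years, n_sites))
y = signal + 0.5 * rng.standard_normal(n_years)   # stand-in for streamflow

scores = pc_scores(X)
# Models using modes 1..k, k = 1..8, with no skipping of components.
results = [fit_stats(scores[:, :k], y) for k in range(1, 9)]
for k, (r2, rmse, cv_se) in enumerate(results, start=1):
    print(f"modes 1..{k}: R2={r2:.3f}  RMSE={rmse:.3f}  CV SE={cv_se:.3f}")

best_k = 1 + int(np.argmin([r[2] for r in results]))
print("lowest CV standard error with", best_k, "mode(s)")
```

Because the models are nested, R-squared can only stay the same or increase as modes are added, so it cannot by itself flag overfitting; the leave-one-out error is never smaller than the in-sample error and can rise when a mode contributes only noise.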


Colorado River Basin; Hydroclimatology; Instruments and techniques; Paleoclimatology; Runoff prediction; Streamflow modelling


Climate | Environmental Sciences | Fresh Water Studies


Hugo Hidalgo a.k.a. Hugo Hidalgo-Leon

Presented at the American Geophysical Union, 1998 Fall Meeting, San Francisco, California, December 6-10.

