The United States Geological Survey (USGS) has collected continuous instantaneous time-series data at intervals commonly ranging from 5 to 60 minutes. Historically, these instantaneous data have been processed into various daily values, such as the daily maximum, minimum, and/or mean, primarily to provide concise values for publication in paper reports. In recent years, and particularly since the USGS began making real-time instantaneous data available on NWISWeb in 1994, more attention has been given to historical instantaneous data, and USGS offices have received increasing requests for these data. Some challenges in meeting those requests are:
Most historical instantaneous data are paper-based: they were never stored on a computer, or they were deleted from computers after the daily values were computed in order to save storage space. In most cases these data still exist as original field records, but creating digital data from the paper-based records takes significant effort.
Instantaneous data have not historically received the same level of quality control as the official published daily values. For example, periods of sensor fouling may affect the calculation of daily values for a water-quality parameter. In these situations, the daily values are typically not published, but the erroneous instantaneous data remain in the database.
For more information on the accuracy codes, see the comments/README.
Processing and quality control of this historical dataset was co-funded by an NSF supplement to the White Clay Creek LTREB project (NSF award 1052716) and by internal funds from the USGS PA Water Science Center. Coordination was supported by the CRB-CZO project (NSF award 0724971).
In developing a system to provide historical instantaneous data, the USGS has built a process that compares the available instantaneous data to the published daily values and assigns an accuracy code based on the result. Accuracy codes are as follows:
0 - The published daily mean* value for a water-quality parameter is zero.
1 - A daily mean* value for a water-quality parameter, calculated from the instantaneous data on this day, is within 1 percent of the published daily mean*.
2 - A daily mean* value for a water-quality parameter, calculated from the instantaneous data on this day, differs from the published daily mean* by more than 1 percent and up to 5 percent.
3 - A daily mean* value for a water-quality parameter, calculated from the instantaneous data on this day, differs from the published daily mean* by more than 5 percent and up to 10 percent.
*For pH data after 1999, the median is used as the comparative statistic in place of the mean.
If the daily mean (or median, where appropriate) calculated from the instantaneous values on a given day differs from the published daily mean or median by more than 10 percent, those instantaneous values are excluded from the archive. In addition, instantaneous data that correspond to an estimated daily mean or median value are automatically excluded from the archive, regardless of any comparison.
An additional classification is available at the limited number of sites where quality assurance of individual instantaneous values has been performed. When this classification is used, no comparison to the daily mean values is made.
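The comparison described above can be sketched in Python as follows. This is a minimal illustration, not the USGS processing code: the function names are hypothetical, the percent difference is assumed to be computed relative to the published daily value, and "after 1999" is assumed to mean calendar year 2000 or later. A return value of `None` means the day's instantaneous values are excluded from the archive. Code 9 is not handled because it involves no comparison.

```python
from datetime import date
from statistics import mean, median
from typing import Optional, Sequence


def daily_statistic(values: Sequence[float], parameter: str, day: date) -> float:
    """Comparative daily statistic: median for pH after 1999, otherwise mean.

    Assumption: "after 1999" is read as calendar year 2000 or later.
    """
    if parameter == "pH" and day.year > 1999:
        return median(values)
    return mean(values)


def accuracy_code(
    values: Sequence[float],
    published: float,
    parameter: str,
    day: date,
    published_is_estimated: bool = False,
) -> Optional[int]:
    """Return accuracy code 0-3, or None if the day's values are excluded.

    Hypothetical helper; the percent difference is assumed to be relative
    to the published daily value.
    """
    # Values tied to an estimated daily value are excluded without comparison.
    if published_is_estimated or not values:
        return None
    # Code 0: the published daily value is zero (no percent comparison possible).
    if published == 0:
        return 0
    computed = daily_statistic(values, parameter, day)
    pct_diff = abs(computed - published) / abs(published) * 100
    if pct_diff <= 1:
        return 1
    if pct_diff <= 5:
        return 2
    if pct_diff <= 10:
        return 3
    return None  # more than 10 percent different: excluded


# Usage: a computed mean 3 percent above the published value gets code 2.
print(accuracy_code([10.3], 10.0, "temperature", date(1995, 7, 1)))
```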
9 - The instantaneous value is considered correct by the collecting USGS Water Science Center. A published daily mean value does not exist and/or no comparison was made.
It is important to note that, other than for classification code 9, the values available in the archive have not been individually reviewed and approved. They have been automatically compared against the published daily mean value and found to have no gross errors when used to compute a daily mean. Individual instantaneous values may still have significant error. For example, a data spike might cause a given value to be off by several hundred percent, yet when that value is combined with the other values in the daily mean calculation, its impact on the daily mean may be 10 percent or less, so the spike can still be added to the archive with an accuracy code of 2 or 3. Users of the archive are thus strongly encouraged to review all data prior to use.
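The spike example above can be checked with simple arithmetic. The sketch below is a hypothetical illustration, assuming a day of otherwise constant readings in which a single value is replaced by a spike; it reports how far that one spike shifts the daily mean. With 15-minute data (96 values per day), a reading 300 percent too high moves the mean by about 3.1 percent, within the range for accuracy code 2. With hourly data (24 values), the same spike moves the mean by 12.5 percent, enough for the day to be excluded.

```python
from statistics import mean


def spike_effect_on_mean(n_values: int, baseline: float, spike_factor: float) -> float:
    """Percent change in the daily mean caused by one spiked reading.

    Assumes n_values readings that all equal `baseline`, except one reading
    that is `spike_factor` times the baseline (e.g. 4.0 = 300 percent too high).
    """
    clean = [baseline] * n_values
    spiked = clean.copy()
    spiked[0] = baseline * spike_factor
    return (mean(spiked) - mean(clean)) / mean(clean) * 100


# 5-minute, 15-minute, and hourly recording intervals
for n in (288, 96, 24):
    print(n, round(spike_effect_on_mean(n, 20.0, 4.0), 2))
```

The shorter the recording interval, the more readings share the day and the more a single spike is diluted in the daily mean.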
A cursory review of all processed water-quality data, the oldest of which date back to October 1985, took place prior to public distribution. Apparent erroneous data “spikes” were addressed in the following ways based on the best judgment of the reviewer:
a. Data were removed when determined to be an obvious error and the correct value could not be estimated based on adjacent data.
b. Data were corrected if adjacent data provided a reasonable degree of confidence as to what the correct value should be.
c. Data were left as is if the questionable value was at least minimally feasible, allowing the user of the data to make the final determination.
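The review described above relied on the reviewer's judgment. Users who want to screen the data themselves might start with a simple automated flag such as the hypothetical one below, which is not the procedure USGS used. It marks any value that differs from the median of its neighboring values by more than a chosen relative threshold and returns the indices without changing the data, leaving the final decision to the user. The window size and 50 percent threshold are arbitrary assumptions, and a relative threshold behaves poorly for parameters near zero, such as water temperature in winter.

```python
from statistics import median
from typing import List, Sequence


def flag_spikes(values: Sequence[float], window: int = 2, rel_threshold: float = 0.5) -> List[int]:
    """Return indices of values that deviate sharply from their neighbors.

    A value is flagged when |value - m| / |m| > rel_threshold, where m is the
    median of up to `window` readings on each side (excluding the value itself).
    Values whose neighbor median is zero are skipped.
    """
    flagged: List[int] = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbors = [values[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            continue
        m = median(neighbors)
        if m == 0:
            continue  # relative deviation is undefined
        if abs(values[i] - m) / abs(m) > rel_threshold:
            flagged.append(i)
    return flagged


readings = [20.0, 20.1, 20.2, 80.0, 20.1, 20.0, 19.9]
print(flag_spikes(readings))
```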
hydrology, stream temperature, stream chemistry, christina river
XML is in ISO-19115 geographic metadata format, compatible with ESRI Geoportal Server.
Citation for This Webpage
US Geological Survey (2007). "CZO Dataset: Christina River Basin - Stream Water Chemistry, Stream Suspended Sediment, Stream Water Temperatures, Groundwater Depth (1985-2007) - USGS." Retrieved 21 Apr 2019, from http://criticalzone.org/national/data/dataset/2506/
Data Use Policy
1. Use our data freely. All CZO Data Products* except those labelled Private** are released to the public and may be freely copied, distributed, edited, remixed, and built upon under the condition that you give acknowledgement as described below. Non-CZO data products — like those produced by USGS or NOAA — have their own use policies, which should be followed.
2. Give proper citation and acknowledgement. Publications, models and data products that make use of these datasets must include proper citation and acknowledgement. Most importantly, provide a citation in a similar way as a journal article (i.e. author, title, year of publication, name of CZO “publisher”, edition or version, and URL or DOI access information. See http://www.datacite.org/whycitedata). Also include at least a brief acknowledgement such as: “Data were provided by the NSF-supported Southern Sierra Critical Zone Observatory” (replace with the appropriate observatory name).
3. Let us know how you will use the data. The dataset creators would appreciate hearing of any plans to use the dataset. Consider consultation or collaboration with dataset creators.
*CZO Data Products. Defined as data collected with any monetary or logistical support from a CZO.
**Private. Most private data will be released to the public within 1-2 years, with some exceptionally challenging datasets up to 4 years. To inquire about potential earlier use, please contact us.
Data Sharing Policy
All CZO investigators and collaborators who receive material or logistical support from a CZO agree to:
1. Share data privately within 1 year. CZO investigators and collaborators agree to provide CZO Data Products* — including data files and metadata for raw, quality controlled and/or derived data — to CZO data managers within one year of collection of samples, in situ or experimental data. By default, data values will be held in a Private CZO Repository**, but metadata will be made public and will provide full attribution to the Dataset Creators†.
2. Release data to public within 2 years. CZO Dataset Creators will be encouraged after one year to release data for public access. Dataset Creators may choose to publish or release data sooner.
3. Request, in writing, data privacy up to 4 years. CZO PIs will review short written applications to extend data privacy beyond 2 years and up to 4 years from time of collection. Extensions beyond 3 years should not be the norm, and will be granted only for compelling cases.
4. Consult with creators of private CZO datasets prior to use. In order to enable the collaborative vision of the CZO program, data in private CZO repositories will be available to other investigators and collaborators within that CZO. Releasing or publishing any derivative of such private data without explicit consent from the dataset creators will be considered a serious scientific ethics violation.
* CZO Data Products. Defined as data collected with any monetary or logistical support from a CZO. Logistical support includes the use of any CZO sensors, sampling infrastructure, equipment, vehicles, or labor from a supported investigator, student or staff person. CZO Data Products can acknowledge multiple additional sources of support.
** Private CZO Repository. Defined as a password-protected directory on each CZO’s data server. Files will be accessible by all investigators and collaborators within the given CZO, and logins will be maintained by that local CZO’s data managers. Although data values will not be accessible by the public or ingested into any central data system (e.g., CUAHSI HIS), metadata will be fully discoverable by the public. This provides the dual benefit of giving attribution and credit to dataset creators and the CZO in general, while protecting intellectual property until publications are complete.
† Dataset Creators. Defined as the people who are responsible for designing, collecting, analyzing and providing quality assurance for a dataset. The creators of a dataset are analogous to the authors of a publication, and datasets should be cited in an analogous manner following the emerging international guidelines described at http://www.datacite.org/whycitedata.