INTERIM GUIDANCE FOR DATA PUBLICATION

The CZ Hub Team is currently developing a Data Submission Portal to meet the needs of the CZ Collaborative Network. The Portal will provide guidance on which data repository each type of product should be deposited in, and it will allow you to submit data to the appropriate repository directly through the Portal. Once the Portal is operational, we recommend that all data and research products be submitted to the appropriate repository through the Portal. While the Portal is under development, we provide the following interim guidance for submitting data.

If you need help with data management or have questions about the guidance provided here, you can contact us in a number of ways.

Join Us in the CZNet Slack Workspace

The CZNet Slack workspace was established to foster communication between the thematic clusters and with the CZNet Hub team. If you haven’t joined the CZNet Slack workspace, we encourage you to join us there. There are specific channels set up within the workspace for topical discussions. If you would like to join the CZNet Slack workspace, please email us at cznet@cuahsi.org.

Contact Us Directly via Email

For specific questions about CZNet data management, please contact us at cznet@cuahsi.org. Be sure to include “CZNet” in the subject line of the email you send. We will route these emails to the person(s) who we think can best answer your questions.

In developing the CZ Hub Data Submission Portal, and in this interim data publication guidance, our initial focus is on the repositories listed below. We anticipate adding support for additional repositories via the Data Submission Portal as resources allow. If the supported repositories listed below will not meet your needs, choose a trusted repository that follows international best practices by offering:

  1. A public landing page for your dataset/product.

  2. A unique identifier/digital object identifier (DOI) and URL for accessing the landing page for your dataset/product.

  3. A formal citation for your dataset/product.

You can find directories of data repositories at https://www.re3data.org and https://fairsharing.org.

We provide general guidance on the supported repositories below, but we are happy to discuss any questions you may have and make more specific recommendations (contact us):

  1. HydroShare (http://www.hydroshare.org): A general-purpose repository for water-science-related datasets and models. Submit datasets, models, and other research products that are water-related. This includes time series of hydrologic observations, time series of data from weather stations, geospatial datasets, etc. HydroShare is flexible and allows upload of files in any format. HydroShare also includes a linked JupyterHub server if you wish to upload executable Jupyter Notebooks using Python or R with your content. For information about how to submit to HydroShare, you can access HydroShare’s help system at https://help.hydroshare.org.

  2. EarthChem (http://www.earthchem.org): A repository for submitting data derived from material samples such as soil, sediment, pore water, or rock specimens; cores; and other physical objects. The primary focus is on geochemical, geochronological, petrological, and mineralogical data. EarthChem strongly recommends using the data submission templates available on the EarthChem website (https://earthchem.org/ecl/templates/), which provide guidance for properly documenting data quality and provenance. Though data can be contributed to the EarthChem Library in a wide range of formats (https://ecl.earthchem.org/fileformat.php), they always need to be documented with relevant information regarding the analytical data quality. Submission guidelines for EarthChem can be accessed at https://www.earthchem.org/ecl/submission-guidelines/. For help, go to https://earthchem.org/resources/support/earthchem-library-documentation/ or write to info@earthchem.org.

  3. Zenodo (http://www.zenodo.org): A catch-all repository that may be an appropriate place to upload content that does not fit the other repositories. Zenodo does not limit what you can upload and is not domain specific. For more information about Zenodo and submitting data, see https://zenodo.org/record/787445.

  4. SESAR (https://www.geosamples.org): A sample registry that catalogs sample metadata, sample images, and other information to make samples more discoverable, accessible, and reusable. SESAR allows you to obtain an IGSN for each of your samples. An IGSN is a globally unique identifier that is essential for unambiguously citing samples in datasets and publications (https://www.igsn.org). Use SESAR to submit metadata about your samples and obtain IGSN Global Sample Numbers. Resources for researchers can be accessed at https://www.geosamples.org/resources/researchers, and tutorials and FAQs can be accessed at https://www.geosamples.org/resources/help.

  5. OpenTopography (https://opentopography.org): A repository that facilitates community access to high-resolution, Earth science-oriented topography data and related tools and resources. Submit high-resolution topography data acquired with lidar and other technologies. A tutorial describing how to submit data to OpenTopography can be accessed at https://cloud.sdsc.edu/v1/AUTH_opentopography/www/docs/CommunityDataspaceTutorial.pdf.

If you choose to submit your data or research products to a repository other than the ones listed above, we will still need to know about those submissions.

Please see our specific guidance for CZNet data and research products for information on what to do if you submit to a different repository.

We offer the following general guidelines for submitting your data and other research products:

  1. Start assembling your data now: We recommend that you begin thinking about which products (e.g., data, images, samples, code) you will generate. Assemble and document them as early as possible after generating them (it will save you time later), and register them within one of the above repositories. You need not wait until your data are ready for formal publication. HydroShare, EarthChem, and Zenodo all offer the ability to upload private content, allowing you to capture and store your data safely while your research and publication process is ongoing. Then share and publish when you are ready. Register your samples online in SESAR.

  2. Tag your data with appropriate metadata: Use the repository’s submission form and templates to enter descriptive metadata for your products. Metadata should include information needed to discover and cite your products, but also additional information required to interpret your content (e.g., procedures used to collect or analyze samples or observations).

  3. Use common, open, and accepted data formats: We know that the data you are creating are diverse and that there is not always specific guidance for how to organize your data and which file formats to use. We have provided some links to general data management resources in the section at the end of this document. If you have questions about what to do, please contact us.

  4. Include a “readme” file: A readme file can include detailed information about the structure or content of your data or research product. It can also be used to describe how to perform specific analyses to make your research results more reproducible. If you upload a “readme.txt” or “readme.md” file to your resource in HydroShare, it will be displayed on the landing page for your resource.
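
As one way to get started on guideline 4, the short Python sketch below generates a readme skeleton that you can then fill in by hand. It is a minimal illustration only: the section headings, the function names build_readme and write_readme, and the example title are assumptions made for this sketch, not a CZNet or HydroShare requirement.

```python
from pathlib import Path

# Section headings and prompts are illustrative assumptions, not a CZNet requirement.
README_SECTIONS = [
    ("Overview", "What the dataset or product is and why it was created."),
    ("File structure", "List each file or folder and what it contains."),
    ("Variables and units", "Define every column or variable, with units."),
    ("Methods", "Procedures used to collect or analyze samples or observations."),
    ("Reproducing results", "Steps or scripts needed to repeat specific analyses."),
]


def build_readme(title, sections=README_SECTIONS):
    """Return Markdown text for a readme skeleton with placeholder prompts."""
    lines = [f"# {title}", ""]
    for heading, prompt in sections:
        lines += [f"## {heading}", "", f"_{prompt}_", ""]
    return "\n".join(lines)


def write_readme(folder, title):
    """Write readme.md into the given folder and return its path."""
    path = Path(folder) / "readme.md"
    path.write_text(build_readme(title), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical title, used only to show the output format.
    print(build_readme("Example stream temperature dataset"))
```

Replace each italic prompt with your own text before uploading the file with your data.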

We have included links to additional documentation that may assist you in preparing your datasets and research products for sharing and publication. We are always happy to discuss specific questions you may have (contact us).

The CZ Hub Team will be working to provide a coordinated view of all CZNet data/research products submitted to reputable data repositories via cataloging and data discovery functionality. We will be using some specific information to find and catalog resources submitted to the different repositories (see below). If the information we request below is missing, we may not be able to find your submitted resource and include it in the CZNet metadata index and discovery tools.

NOTE: When complete, the Data Submission Portal will prompt you for the following information for all datasets and products submitted through the Portal. However, we know that some may choose to submit directly to a repository without using the Data Submission Portal, so it is important to specify this information regardless of how or when you submit your data.

To accomplish this, we ask that you do the following, regardless of which repository you deposit your data in:

  1. Tag your dataset/product with the subject keyword “CZNet”.

  2. Ensure that you enter funding information, including funding agency (National Science Foundation), award number, and, if possible given the metadata elements available, award title.

  3. Make sure that your dataset is publicly available. HydroShare, EarthChem, and Zenodo enable private content. We may not be able to discover and catalog your datasets/products until you have made them publicly available.
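
The three requests above can be treated as a simple pre-submission checklist. The Python sketch below shows one way to check a metadata record before you submit. The dictionary keys (keywords, funding, agency, award_number, is_public) and the function name check_cznet_metadata are hypothetical and do not correspond to the metadata schema of HydroShare, EarthChem, Zenodo, or any other repository. You would need to map them to the fields your repository actually uses.

```python
def check_cznet_metadata(record):
    """Return a list of problems; an empty list means all three checks pass.

    The record layout is a hypothetical example, not a repository schema.
    """
    problems = []

    # 1. Subject keyword "CZNet" (compared case-insensitively here, by assumption).
    keywords = [k.strip().lower() for k in record.get("keywords", [])]
    if "cznet" not in keywords:
        problems.append('missing subject keyword "CZNet"')

    # 2. Funding information: agency and award number; award title is optional.
    funding = record.get("funding", [])
    has_award = any(
        f.get("agency", "").strip() and f.get("award_number", "").strip()
        for f in funding
    )
    if not has_award:
        problems.append("missing funding agency and/or award number")

    # 3. Public availability.
    if not record.get("is_public", False):
        problems.append("dataset is not publicly available")

    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only; the award number is not real.
    example = {
        "keywords": ["CZNet", "soil moisture"],
        "funding": [
            {"agency": "National Science Foundation", "award_number": "0000000"}
        ],
        "is_public": False,
    }
    for problem in check_cznet_metadata(example):
        print("-", problem)
```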

Guidance for HydroShare, EarthChem, and Zenodo

We will be using “community” and “group” functionality within the different repositories to help group and present data from the cluster projects. When you submit to HydroShare, EarthChem, or Zenodo, we ask that you associate your dataset/product with an appropriate Group or Community, depending on which repository you submit to:

  1. HydroShare - share your dataset/product with your Cluster’s HydroShare Group. We have created a HydroShare “Group” for each thematic cluster project. You can share your HydroShare resource with your Group by clicking on the “Manage Access” button on the resource’s landing page and giving view or edit access to your cluster’s group. This can be done at any time after you create your resource in HydroShare.

  2. EarthChem - associate your dataset/product with the “Critical Zone” community within EarthChem. See https://earthchem.org/communities/cznet/

  3. Zenodo - associate your dataset/product with the “Critical Zone Data and Research Products” community (https://www.zenodo.org/communities/czdata/). When you create a new upload, search for “czdata” or “Critical Zone Data and Research Products” to find the correct community.

Registering Samples with SESAR

If you have physical samples, you should register them with the System for Earth Sample Registration (SESAR) at https://www.geosamples.org/.

Guidance for Data/Products Submitted to Other Repositories

We know that our list of supported repositories may not meet the needs of all of the cluster projects for all of the different types of data products. If you choose to submit your data/product(s) to a repository other than the ones in our list of supported repositories, we will be unable to discover and catalog those products unless you tell us about them.

When the CZ Hub Data Submission Portal is complete, we will provide functionality for registering products submitted to other repositories so that we can find them. In the interim, we encourage you to maintain a list of URLs/citations for those datasets/products so that they can be entered when that functionality becomes available. This will only be required for datasets that are NOT submitted to HydroShare, EarthChem, or Zenodo.
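
One lightweight way to maintain such a list is a small CSV file that you append to each time you deposit a product elsewhere. The Python sketch below illustrates the idea. The column names, the function names log_external_product and read_external_log, and the file layout are assumptions made for this sketch. The Portal's eventual registration format is not yet defined, so treat this only as a convenient way to keep the information in one place.

```python
import csv
from pathlib import Path

# Column names are illustrative assumptions, not a Portal requirement.
FIELDS = ["title", "repository", "url", "citation"]


def log_external_product(log_path, title, repository, url, citation):
    """Append one product to the CSV log, writing a header if the file is new."""
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {
                "title": title,
                "repository": repository,
                "url": url,
                "citation": citation,
            }
        )


def read_external_log(log_path):
    """Return every logged product as a list of dictionaries."""
    with Path(log_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, calling log_external_product("external_products.csv", ...) once per deposit builds up a file that can be opened in any spreadsheet program and shared with the CZ Hub Team later.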

We strongly recommend creating formal linkages between the datasets you have deposited within the above-mentioned repositories and any manuscripts that you submit for publication that use or are based on the data. This ensures that anyone who discovers the paper can link to and access the data used, and anyone who discovers the data can link to and access publications that have used the data.

After depositing your datasets within one of the above repositories, we recommend formally citing the data within any research manuscripts based on the data that you submit for publication. The Author Guide for the journal to which you submit may determine how datasets are cited in the text of the paper. In the absence of specific instructions from the journal, we recommend citing data within the text of the paper using the journal’s citation style (e.g., Author(s), Year) and including a full bibliographic citation to the dataset within the references section of the paper.
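
If you need to assemble a full bibliographic citation for a dataset yourself, the Python sketch below shows one generic pattern: Author(s) (Year). Title. Repository. DOI link. The pattern, the function name format_data_citation, and the example values are assumptions made for illustration. Most repositories display a recommended citation on the dataset landing page, and your journal's style takes precedence where it differs.

```python
def format_data_citation(authors, year, title, repository, doi):
    """Build a generic 'Author(s) (Year). Title. Repository. DOI link' citation.

    authors is a list of names already written as 'Family, G. N.'.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        names = authors[0]
    else:
        # Join as 'A, B, & C', matching the style of the reference list.
        names = ", ".join(authors[:-1]) + ", & " + authors[-1]
    return f"{names} ({year}). {title}. {repository}. https://doi.org/{doi}"


if __name__ == "__main__":
    # Hypothetical dataset and placeholder DOI, for illustration only.
    print(
        format_data_citation(
            ["Doe, J.", "Roe, R."],
            2021,
            "Example soil moisture dataset",
            "HydroShare",
            "10.0000/example",
        )
    )
```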

Once a paper has been accepted for publication and a digital object identifier (DOI) or citation has been issued for the paper, you should modify the metadata for the dataset in the repository to include a link to the paper as a “related resource.”

The following publications may be useful in considering the data management needs of your thematic cluster as well as cross-cluster and other collaborations:

Guidelines for Structuring and Formatting Data

Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214. https://doi.org/10.1890/0012-9623-90.2.205

Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2–10. https://doi.org/10.1080/00031305.2017.1375989

Goodman, A., Pepe, A., Blocker, A. W., Borgman, C. L., Cranmer, K., Crosas, M., Di Stefano, R., Gil, Y., Groth, P., Hedstrom, M., Hogg, D. W., Kashyap, V., Mahabal, A., Siemiginowska, A., & Slavkovic, A. (2014). Ten Simple Rules for the Care and Feeding of Scientific Data. PLOS Computational Biology, 10(4), e1003542. https://doi.org/10.1371/journal.pcbi.1003542

Hart, E. M., Barmby, P., LeBauer, D., Michonneau, F., Mount, S., Mulrooney, P., Poisot, T., Woo, K. H., Zimmerman, N. B., & Hollister, J. W. (2016). Ten Simple Rules for Digital Data Storage. PLOS Computational Biology, 12(10), e1005097. https://doi.org/10.1371/journal.pcbi.1005097

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), Article 10. https://doi.org/10.18637/jss.v059.i10


Guidelines for Citing Data

Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLOS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416

Guidelines for Making Data More Reusable

White, E., Baldridge, E., Brym, Z., Locey, K., McGlinn, D., & Supp, S. (2013). Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution, 6(2), Article 2. https://doi.org/10.4033/iee.2013.6b.6.f


Guidelines for Selecting a Data Repository

Sansone, S.-A., McQuilton, P., Cousijn, H., Cannon, M., et al. (2020). Data Repository Selection: Criteria That Matter. Zenodo. https://doi.org/10.5281/zenodo.4084763