“BiG CZ” Scientific Software Integration Proposal Submitted to NSF

19 Mar 2013

SI2-SSI: The community-driven BiG CZ software system for integration and analysis of bio- and geoscience data in the critical zone.

Fourteen participants from the successful CZ-EarthCube Domain Workshop (Jan. 21-23, 2013) accepted the charge from all 103 workshop participants to translate the workshop vision into a proposal to the Scientific Software Integration (SSI) solicitation of NSF's Office of Cyberinfrastructure (OCI).  The mandate came from a strong and nearly unanimous vision that was developed in breakout groups.

The central scientific challenge of the critical zone science community is to develop a “grand unifying theory” of the critical zone through a theory-model-data fusion approach to answer:

  1. How do tectonics, lithology, climate and biology co-determine the evolution of critical zone structure and function?
  2. What are the drivers of energy and material fluxes (e.g. water, sediment, carbon, nutrients, solutes) moving through the critical zone?
  3. How will critical zone structure, function and evolution respond to human and natural disturbances over various temporal and spatial scales?

After much hard work over an intense few weeks, the proposal team was proud to submit on behalf of the CZ community a proposal entitled "SI2-SSI: The community-driven BiG CZ software system for integration and analysis of bio- and geoscience data in the critical zone."

Vision for the Proposed BiG CZ Software System

Our overall goal is to co-develop with the critical zone science community a web-based integration and visualization environment for joint analysis of cross-scale bio- and geoscience processes in the critical zone (BiG CZ), spanning experimental and observational designs.

This BiG CZ software stack would consist of the BiG CZ Portal web application, the BiG CZ Toolbox, and the BiG CZ Central software infrastructure that together enable interoperability for cross-scale multi-modal discovery, visualization, access, and publication of data through a simple yet powerful user interface and open APIs. 

Objective 1. Engage the CZ and broader community to co-develop and deploy the BiG CZ software stack to meet their specific needs through a CZ community advisory committee and a series of co-design and training & testing workshops.

Objective 2. Develop the BiG CZ Portal web application for intuitive, high-performance map-based discovery, visualization, access and publication of data on critical zone structure and function by scientists, resource managers, citizen-science volunteers, and the general public.

Objective 3. Develop the BiG CZ Toolbox to enable cyber-savvy CZ scientists to directly access BiG CZ APIs to search, access, manage and publish data using a turnkey open-source scripting and database package.

Objective 4. Develop the BiG CZ Central software stack to bridge data systems developed for multiple critical zone domains into a single metadata catalog for single query search, visualization and access from the Portal web app or using more powerful APIs, and to serve as a public repository for data published via the Toolbox or via web forms.

The entire BiG CZ software system will be developed on public repositories as a modular suite of fully open source software projects, and will be built around a new Observations Data Model Version 2.0 (ODM2) that is currently under development by the CZOData team (NSF Award #1224638).

BiG CZ SSI Project Deliverables

Objective 1 Deliverables: Community Engagement in Software Design.

  • CZ Community Advisory Board (CZCAB), to advise feature development priorities, provide feedback on releases, recruit workshop participants and promote the BiG CZ software system.
  • Co-design workshops, two in year 1, to prioritize requested functionality and give specific feedback on our detailed plans and mockups for the BiG CZ software stack.
  • 3-day BootCamp Training & Testing workshops, four total in years 2-4 for 20 attendees each, both to elevate the overall computational capabilities of CZ scientists while teaching them to be power users of the BiG CZ system (Portal, Toolbox, APIs), and to gather feedback on usability and desired new features. Modeled after Software Carpentry hands-on bootcamps, the BiG CZ BootCamps will include a third day focused on BiG CZ usage. We will target students, postdocs, and early-career investigators.
  • Citizen-science & Resource Manager workshop, one in late year 3, to evaluate the value of our integrated earth surface data and visualization system to professionals and volunteers who manage and monitor natural resources.
  • 7-day CZ Science Synthesis Institute, one early in year 4, immediately after a 3-day BootCamp. A cohort of 14-16 BootCamp graduates will be mentored by 3-6 senior CZ investigators and 3-6 project team members to use their newly developed skills with the BiG CZ software system to address 2-3 high-priority synthesis objectives identified through the earlier community engagement. This institute is based on the very successful model developed in NSF’s exploratory Hydrologic Synthesis program (Thompson et al., 2012), and will serve as the ultimate “Training and Testing” evaluation of the BiG CZ software system.

Objective 2 Deliverables: The BiG CZ Portal web application for high-performance map-based discovery, visualization, access and publication of data on critical zone structure and function.

  • The BiG CZ Portal web application client to the BiG CZ Central system would enable a user to zoom to any location in the contiguous USA (i.e. the lower 48) to search, view, filter, select and download heterogeneous datasets, including:
    • Points with sensor and sample based observational data, with pop-ups showing 2D data series displays (i.e. time series or depth profiles) and metadata. This functionality will be based on the software behind the NANOOS Visualization System (NVS);
    • 2D satellite and GIS imagery from many different agencies and sources. This functionality will be based on GeoTrellis, which forms the foundation of the Model My Watershed web modeling app;
    • 2D and 3D interactive visualization of select datasets, such as geophysical “fenceline” images of subsurface structure, topography (similar to Google Earth) or canopy structure from LiDAR point clouds. Interactive visualization functionalities will be based on a number of open source libraries, such as the Cesium WebGL Virtual Globe and Map Engine JavaScript library, VisIt (http://en.wikipedia.org/wiki/VisIt), and the Data-Driven Documents (D3) JavaScript library.
  • Map views will be filterable by a time period, variable, medium, data provider, data creators (authors) and a large number of other parameters.
  • A suite of data publication/registration forms will be integrated into the BiG CZ Portal web app, to assist “long tail” scientists to tag, publish and share their “dark data”. These forms will:
    • be readily accessible from the map-based discovery & visualization system through various buttons and quick-links;
    • allow users to map file-level and field-level metadata to the BiG CZ shared vocabulary system;
    • accept a wide variety of file types, including CSV, tab-delimited and other text files, MS Excel workbooks, GIS files and many other file types;
    • publish to one of our many partnering data repositories (e.g. IEDA, CUAHSI, KBase) or the BiG CZ Central ODM2 database, selected through guided user choice (see Obj. 4, BiG CZ Central).
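
The map-view filters listed above (time period, variable, medium, data provider) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the SeriesRecord fields, the filter_series function and the sample catalog entries are hypothetical and do not describe any actual BiG CZ interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SeriesRecord:
    """Hypothetical catalog entry describing one data series."""
    provider: str
    variable: str
    medium: str
    start: date
    end: date


def filter_series(
    records: Iterable[SeriesRecord],
    variable: Optional[str] = None,
    medium: Optional[str] = None,
    provider: Optional[str] = None,
    period: Optional[Tuple[date, date]] = None,
) -> List[SeriesRecord]:
    """Return the records that match every filter that is not None.

    `period` is a (start, end) pair of dates; a record matches when its
    own date range overlaps the requested period.
    """
    selected = []
    for rec in records:
        if variable is not None and rec.variable != variable:
            continue
        if medium is not None and rec.medium != medium:
            continue
        if provider is not None and rec.provider != provider:
            continue
        if period is not None:
            p_start, p_end = period
            # Two closed ranges overlap unless one ends before the other begins.
            if rec.end < p_start or rec.start > p_end:
                continue
        selected.append(rec)
    return selected


# Illustrative, made-up catalog entries.
catalog = [
    SeriesRecord("CUAHSI", "discharge", "surface water", date(2005, 1, 1), date(2010, 12, 31)),
    SeriesRecord("IEDA", "nitrate", "soil", date(2011, 6, 1), date(2012, 6, 1)),
    SeriesRecord("CUAHSI", "nitrate", "surface water", date(2009, 1, 1), date(2012, 12, 31)),
]

hits = filter_series(catalog, variable="nitrate", period=(date(2012, 7, 1), date(2013, 1, 1)))
print([(r.provider, r.medium) for r in hits])
```

Treating each filter as optional keeps the function close to how a map interface behaves, where a user may set any combination of filters or none at all.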

Objective 3 Deliverables: The BiG CZ Toolbox to enable cyber-savvy CZ scientists and data managers to manage and publish the data they produce through a single scientist-focused toolkit. The BiG CZ Toolbox will incorporate access to multiple BiG CZ APIs from a single package for searching, accessing, transforming and analyzing heterogeneous data using readily deployed open-source scripting and database packages, and for loading cross-discipline data into the BiG CZ infrastructure. Training and Testing workshops/bootcamps (Obj. 1) will focus on using the BiG CZ Toolbox. The BiG CZ Toolbox includes two main capabilities:

  • Facilitation of local data management and publication to data repositories via BiG CZ Central. Components will include:
    • A ready-to-use cross-platform relational database schema based on ODM2 implemented in PostgreSQL, to facilitate local data management of both sensor and sample based datasets. This database will build on the capabilities of ODM 1.1, which are focused on site-based time series for particular variables. The ODM2 information model is being designed to address results associated with a variety of experimental protocols, such as specimen-based laboratory experiments; observations from sensor arrays, moving sensors, and ex situ sample analyses; and tracking of observation result provenance through processing chains.
    • ODM Tools 2.0, a cross-platform update to ODM Tools for managing sensor networks and their data (http://his.cuahsi.org/odmtools.html), using the ODM2 information model. ODM Tools 2.0 will be a set of Python modules designed to function out of the box with a local ODM2 database or configurable to work with a number of other information repositories.
    • Streaming sensor middleware, possibly based on Data Turbine (http://www.dataturbine.org/) or the CUAHSI HIS Streaming Data Loader, configured out of the box to load data from a stream into a local ODM2 database or configurable to work with a number of other data systems.
    • Web service interfaces for publishing near real-time or historical data to BiG CZ Central. Project tools will facilitate service deployment associated with ODM2 databases, with flexibility for mapping to other data models. We anticipate using recently developed technologies, such as WebSockets (Fette, 2011) or CoAP (Shelby et al., 2012; Dingee, 2013), to address real-time performance requirements. We will expand the "Water One Flow in Python" library (WOFpy, http://pythonhosted.org/WOFpy/) and prototype OGC SOS service implementations.
    • A publication coordination tool to maintain common identifiers and spatial-temporal registration when publishing genomic and environmental data to appropriate repositories (KBase and ODM2-related repositories, respectively).
  • API for direct access to BiG CZ Central to programmatically search, ingest, transform and analyze heterogeneous data using the powerful, widely-used Python language and computing environment. Components will include:
    • Web services interfaces for searching and fetching data from BiG CZ Central and associated catalogs for local, programmatic and interactive analysis. This capability will build on existing Python libraries for web-service based data access (OWSLib, ulmo, pyoos), prototype implementations being developed for ODM2, and data access and parsers for KBase that use the new “MG-RAST” REST API currently under development. These data services will be wrapped into a more consistent object presentation for easier use, relying on core Python scientific libraries for data handling and efficient local storage and transformation (NumPy, SciPy, Pandas, PyTables).
    • Consistent data exploration and visualization capabilities based on Python-based or enabled scientific libraries, including matplotlib, Pandas and VisIt.
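
The idea of wrapping heterogeneous data services into a more consistent object presentation can be sketched as follows. This is an illustration under stated assumptions, not the Toolbox design: the Observation class, the two adapter functions and the payload field names are all hypothetical, and only the Python standard library is used so that the sketch stays self-contained.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Common in-memory shape for one observed value (hypothetical)."""
    source: str
    variable: str
    time: datetime
    value: float
    units: str


def from_sensor_service(row: dict) -> Observation:
    # Hypothetical time-series payload: ISO 8601 timestamp, numeric text value.
    return Observation(
        source="sensor_service",
        variable=row["VariableCode"],
        time=datetime.fromisoformat(row["DateTimeUTC"]),
        value=float(row["DataValue"]),
        units=row["Units"],
    )


def from_sample_service(row: dict) -> Observation:
    # Hypothetical sample-analysis payload: date string and text result.
    return Observation(
        source="sample_service",
        variable=row["analyte"],
        time=datetime.strptime(row["collected"], "%Y-%m-%d"),
        value=float(row["result"]),
        units=row["unit"],
    )


# One adapter per source; adding a source means adding one function here.
ADAPTERS: Dict[str, Callable[[dict], Observation]] = {
    "sensor_service": from_sensor_service,
    "sample_service": from_sample_service,
}


def normalize(source: str, rows: List[dict]) -> List[Observation]:
    """Convert raw rows from a named source into Observation objects."""
    adapter = ADAPTERS[source]
    return [adapter(row) for row in rows]


# Made-up example payloads.
sensor_rows = [
    {"VariableCode": "temperature", "DateTimeUTC": "2013-01-21T12:00:00",
     "DataValue": "4.5", "Units": "degC"},
]
sample_rows = [
    {"analyte": "nitrate", "collected": "2013-01-22", "result": "0.8", "unit": "mg/L"},
]

observations = normalize("sensor_service", sensor_rows) + normalize("sample_service", sample_rows)
observations.sort(key=lambda obs: obs.time)
for obs in observations:
    print(obs.source, obs.variable, obs.time.isoformat(), obs.value, obs.units)
```

Once every source yields the same object type, downstream steps such as sorting, plotting or loading into a table no longer need to know where a value came from.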

Objective 4 Deliverables: The BiG CZ Central software stack to bridge data systems developed for multiple critical zone domains using linked data principles (Berners-Lee, 2007; Heath and Bizer, 2011). BiG CZ Central would provide capabilities to:

  • Register and catalog data-series level metadata from domain-specific data services for sensor- and sample-based Earth observation data managed in the ODM2 information model, and other information models adopted by the community. Our data repository partners will include:
    • CUAHSI Hydrologic Information System (HIS, http://his.cuahsi.org/) and its daughter systems, such as HydroShare (http://www.cuahsi.org/HydroShare.aspx), which have already integrated hydrological and meteorological data from dozens of federal and state agencies and hundreds of academic projects.
    • Integrated Earth Data Applications (IEDA, http://www.iedadata.org/) and its many catalogs, such as the EarthChem data repository (http://www.earthchem.org/) and the System for Earth Sample Registration (SESAR, http://www.geosamples.org/).
    • DOE’s Systems Biology Knowledgebase (KBase, http://kbase.science.energy.gov/), which is a large, emerging software and data environment bridging molecular and systems biology of microbes, plants, and their communities.
    • OpenTopography (http://www.opentopography.org/), a repository and access point for LiDAR data.
    • U.S. and State Geological Surveys through the U.S. Geoscience Information Network (USGIN, http://usgin.org).
  • Develop a BiG CZ data repository to collect and integrate datasets at multiple spatial and temporal granularities for the CZ community, along with sensor and sample management information that is not presently captured along with datasets by any data repository that we know of. This data repository will be built on a scalable architecture for high performance to support the mapping and visualization capabilities of BiG CZ Portal.
  • Develop web services for observation data based on the ODM2 information model.
  • Provide a web-service interface that gives clients such as the BiG CZ Portal single stream/query access to data from multiple, different data systems, using controlled vocabularies from ODM2 and mappings to the vocabularies of other systems that the BiG CZ community adopts.
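
A single query against multiple data systems through a shared vocabulary can be sketched as below. The sketch is illustrative only: the system names, the vocabulary mapping, the in-memory records and the federated_query function are all assumptions, and real systems would be reached through web services rather than Python dictionaries.

```python
from typing import Dict, List

# Hypothetical mapping from a shared vocabulary term to the local
# variable name used by each data system.
VOCABULARY_MAP: Dict[str, Dict[str, str]] = {
    "nitrate": {"system_a": "NO3", "system_b": "nitrate_as_n"},
    "water_temperature": {"system_a": "Temp_W", "system_b": "wtemp"},
}

# Toy in-memory stand-ins for remote data systems (hypothetical).
SYSTEMS: Dict[str, List[dict]] = {
    "system_a": [
        {"site": "A1", "var": "NO3", "value": 0.8},
        {"site": "A2", "var": "Temp_W", "value": 11.2},
    ],
    "system_b": [
        {"site": "B7", "var": "nitrate_as_n", "value": 1.3},
    ],
}


def federated_query(term: str) -> List[dict]:
    """Query every system for one shared term and merge the results."""
    local_names = VOCABULARY_MAP.get(term)
    if local_names is None:
        raise KeyError(f"unknown shared vocabulary term: {term}")
    merged = []
    for system, rows in SYSTEMS.items():
        local = local_names.get(system)
        if local is None:
            continue  # this system has no mapping for the term
        for row in rows:
            if row["var"] == local:
                # Report results under the shared term, keeping provenance.
                merged.append({
                    "system": system,
                    "site": row["site"],
                    "variable": term,
                    "value": row["value"],
                })
    return merged


results = federated_query("nitrate")
print(results)
```

Keeping the mapping in one table means a client only ever sees the shared term, while each system continues to use its own variable names internally.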



Files

"BiG CZ SSI" Proposal, 2013
(1 MB pdf)
Full Text "BiG CZ SSI" Proposal


People Involved

CZO
Non-CZO

Robert Cheetham - Founder & CEO, Azavea, Inc.

Aaron Packman - Professor, Northwestern University

Emma Aronson - Assistant Professor, UC Irvine

Roelof Versteeg - Subsurface Insights

Stephen Richard - Arizona Geological Survey

Gary Berg-Cross - Executive Secretary, Spatial Ontology Community of Practice (SOCoP)

David Valentine - Research Programmer, San Diego Supercomputer Center

Folker Meyer - Microbial Communities Lead, DOE KBASE, Argonne National Lab and Computation Institute, Univ. of Chicago

Christopher Henry - Microbes Lead, DOE KBASE, Argonne National Lab and Computation Institute, Univ. of Chicago


Publications

2013

SI2-SSI: The community-driven BiG CZ software system for integration and analysis of bio- and geoscience data in the critical zone. Aufdenkampe, A.K., Zaslavsky, I., Mayorga, E., Horsburgh, J. and Lehnert, K. (2013): Submitted to NSF Solicitation 13-525 Scientific Software Integration.
