CISN label CGS | USGS | Caltech | CalEMA | UCB

NCEMC - DBMS Issues






Doug Neuhauser, doug@seismo.berkeley.edu
Lind Gee, lind@seismo.berkeley.edu
DRAFT - 2001/10/02

Archive Database Parameters

Information to be stored in the archive (data center) database and made available to the public.
  1. Associated information for each event:
    1. Earthquake hypocenter(s), with associated observations, such as:
      1. phase arrivals.
      2. azimuth info.
      3. error information.
    2. Earthquake magnitudes.
      1. associated amplitude readings, or other associated information.
    3. Earthquake mechanisms.
      1. associated information.
    4. Ground motion observations (PGA, PGV, spectral info) (*).
    5. Shakemaps (*).
    6. Other earthquake products (finite fault parameters, etc.) (*).
    The system should be able to store for each event multiple versions of derived products (such as hypocenters, magnitudes, mechanisms, and shakemaps), and to provide an indication of which version is currently the "preferred" version.

  2. Operational parameters for the data center.
    1. Channel meta-data over time.
    2. Channel state-of-health and/or usage information (?) (*).
    3. System control and configuration parameters (?) (*).

  3. Event review status and history (*).

  4. Waveforms and/or pointers and descriptors for waveforms.
(*) = Functionality not currently in NCEDC schema.
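
The requirement above, that each event can hold multiple versions of a derived product along with an indication of which one is "preferred", can be illustrated with a small in-memory sketch. This is not the NCEDC schema; the class names (`Event`, `ProductVersion`), field names, and sample values are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProductVersion:
    """One version of a derived product (hypothetical structure)."""
    kind: str      # e.g. "hypocenter", "magnitude", "mechanism", "shakemap"
    version: int   # assigned in order of arrival, starting at 1
    source: str    # which system or analyst produced this version
    values: dict   # product-specific parameters


@dataclass
class Event:
    """An event that keeps every version and records which one is preferred."""
    event_id: str
    versions: Dict[str, List[ProductVersion]] = field(default_factory=dict)
    preferred: Dict[str, int] = field(default_factory=dict)  # kind -> version

    def add_version(self, kind, source, values, prefer=True):
        # Earlier versions are never overwritten; a new one is appended.
        history = self.versions.setdefault(kind, [])
        pv = ProductVersion(kind, len(history) + 1, source, values)
        history.append(pv)
        if prefer:
            self.preferred[kind] = pv.version
        return pv

    def get_preferred(self, kind) -> Optional[ProductVersion]:
        number = self.preferred.get(kind)
        if number is None:
            return None
        return self.versions[kind][number - 1]


# Illustrative values only.
event = Event("ev0001")
event.add_version("magnitude", "rt-system", {"type": "Md", "value": 4.1})
event.add_version("magnitude", "analyst", {"type": "Mw", "value": 4.3})
# A later automatic rerun is stored but does not displace the analyst's value.
event.add_version("magnitude", "rt-rerun", {"type": "Md", "value": 4.0}, prefer=False)
```

Keeping the "preferred" indicator separate from the version history means that changing the preferred version never deletes or rewrites an earlier one.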

NCSS Real-time and Event Collator Database Needs

The NCSS Real-Time Processing and Event Collator would use a database for the following operations:
  1. Provide a reliable and consistent storage, retrieval system, and communication area for earthquake parameters and data from real-time earthquake system(s).

  2. Provide a reliable and consistent storage, retrieval system, and communication area for network and operational parameters for the real-time earthquake processing system.

  3. Provide a source for near-real-time data flow from the real-time system(s) to the analysis systems and NCEDC, or any other downstream or cross-stream processing system.

  4. Provide a mechanism to control and log near-real-time notification.

(Note: The Collator database provides the mechanism to control and log offline event analysis and review.)

Real-time Database Parameters

Information to be stored in the real-time database and made available in near real time to the Event Collator and the data center.
  1. Associated information for each event:
    1. Earthquake hypocenter(s), with associated observations, such as:
      1. phase arrivals.
      2. azimuth info.
      3. error information.
    2. Earthquake magnitudes.
      1. associated amplitude readings, or other associated information.
    3. Earthquake mechanisms.
      1. associated information.
    4. Ground motion observations (PGA, PGV, spectral info).
    5. Shakemaps.
    6. Other earthquake products (finite fault parameters, etc.).
    The system should be able to store for each event multiple versions of derived products (such as hypocenters, magnitudes, mechanisms, and shakemaps), and to provide an indication of which version is currently the "preferred" version.

  2. Unassociated parameters (?)
    1. unassociated phase arrivals (?)
    2. unassociated ground motion observations (?)

  3. Operational parameters for the real-time system.
    1. Channel meta-data (current and recent past).
    2. Channel state-of-health and/or usage information.
    3. System control and configuration parameters.

  4. Event review status and history.

  5. Waveforms and/or pointers and descriptors for waveforms.

  6. Notification rules (relatively static).

  7. Notification status.

  8. Notification history.

NCSS design questions

  1. The current proposal is to use the NCEDC/TriNet DBMS schema for the data archive.
    1. Should the same schema be used for the Event Collator? Why?
    2. Should the same schema be used for the Real-Time Processing system? Why?

  2. Do the different components such as the Real-Time Processing and the Event Collator require separate databases? Why?

  3. If different DBMS schemas are used in the Real-Time Processing and Event Collator components, what type of data should be sent from the Real-Time Processing system to the Event Collator?

  4. What is the granularity of information flow between these components?
    1. at the single observation level or database table row,
    2. at the earthquake component level (hypocenter and associated phases),
    3. at the earthquake level (all info for a single event).

  5. What mechanism should be used to transfer data between the RT and Collator DBMS?
    1. Database replication tools?
    2. Application program?
    3. Combination of the above?

  6. How do we ensure reliable data transfer between the RT and Collator DBMS?

  7. Where should rapid event review take place - within the RT or the Event Collator?

  8. Where should event revision take place - within the RT or the Event Collator?

  9. Should any DBMS in the system store information from other sites within the NCSS or other data sources such as SCSN/TriNet, UNR, or NEIC?
    1. other real-time system(s) covering the same reporting area?
    2. other real-time system(s) covering adjoining areas?
    If so, which DBMS?
    If so, how should we use this information? How do we associate this information with our own information, and how does this affect data migration to other adjacent systems or archive systems downstream?

  10. What data (if any) should flow from the archive to the Event Collator, and from the Event Collator to the Real Time Processing component?

  11. How should parametric data be sent to the archive in near real time? Should it come from only the master Event Collator, or from all Event Collators?

  12. How should waveforms be delivered to the archive in near real time?
    1. Should event waveforms be available at the archive in near real time?
    2. Should continuous waveforms be available at the archive in near real time?
    3. Should waveforms be pushed to the archive, pulled by the archive, or a combination (push waveform requests to archive, archive pulls waveforms)?
    4. How do we incorporate data quality control (QC) if data is sent in near real time to the archive?

  13. What applications are needed to interact with:
    1. The Real-Time Processing component?
    2. The Event Collator component?
    3. The Data Archive?

  14. How does the design of the real-time system and the underlying DBMS and APIs facilitate participation in:
    1. CISN
    2. ANSS

Software design issues

There are a number of fundamental design philosophies that differ between the existing Earthworm software used by Menlo Park and the existing REDI software used by the BSL. Some of these arise from fundamental differences in the original design goals of the systems, and some arise from conditions imposed outside of the ideal design goal, such as the physical separation of USGS/MP and UCB, and the CPU time required for certain computations (such as moment tensors).

If we want to work towards common software for our monitoring networks, we will need to address these issues. I bring them up now in order that we can acknowledge these issues, and if possible, decouple them from decisions made about database and schema issues.

I want to ensure that our discussions and decisions about database schemas and APIs do not implicitly (or explicitly) dictate decisions about the use of existing software or existing software designs. For example, a decision to use the EW database schema and/or EW API should NOT imply that EW software (in its current configuration) be used throughout the new real-time system. Nor should a decision to use a different API or database schema than the EW system imply that EW software would NOT be used in places within the system.

There are several major design areas that I think we need to discuss.

  1. Data transport and communication reliability between system components:

    Both the EW and REDI systems use some volatile and unreliable mechanisms for communication between some system components, such as broadcast, multicast, and memory-based transport rings. These mechanisms are unreliable in the sense that data is not guaranteed to reach the receiver if the receiver is unavailable when the data is sent. The initial design of these systems was strictly for "real-time notification", but we are now looking at integrating notification functionality with permanent and authoritative event processing and data archiving.

  2. Reconfiguration capabilities:

    Much of the software currently in use does not provide means for dynamic (or runtime) reconfiguration or restart capability without loss of data. This limits the ability of the system to incorporate new data, change data or processing characteristics, or fix and install updated software without loss of data and information.

  3. Scheduling and computer resource allocation:

    Due to the different characteristics of the processing currently performed by the EW and REDI software, the EW and REDI systems have very different approaches to scheduling and computer resource allocation. The EW system by and large processes data in a time-ordered (first-come, first-served) manner. The REDI system provides an overall schedule to determine what processing should be performed, at what time, and in what order. As we integrate the full range of data processing at both sites and implement new flows of data between the sites, we need to evaluate the entire system scheduling and resource requirements, and to perform Q/A testing to ensure that we can handle the system load.

  4. Joint design and development:

    The development of a unified software system for earthquake monitoring in northern California requires that both institutions participate equally in the design and implementation of the software. This effort must be coordinated with CISN partners and the ANSS to ensure that we can accomplish our goals of unified state and national earthquake monitoring. This process may require careful negotiation and compromise for all parties in order to reach consensus.

  5. Control of our destiny...

    What is the process by which we, the users of the DBMS system, can influence or control the direction of the database schema and application software design?
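
The reliability concern in item 1 above (data is lost if the receiver is unavailable when it is sent) can be illustrated with a toy comparison. The sketch below is not how Earthworm or REDI is implemented; `VolatileRing` and `AckedOutbox` are hypothetical names, and the outbox is held in memory here, whereas a real system would presumably persist it on disk or in a DBMS.

```python
from collections import deque


class VolatileRing:
    """Fixed-size ring: when full, the oldest message is overwritten."""

    def __init__(self, size):
        self.buf = deque(maxlen=size)

    def send(self, msg):
        self.buf.append(msg)  # silently discards the oldest entry when full

    def receive_all(self):
        msgs = list(self.buf)
        self.buf.clear()
        return msgs


class AckedOutbox:
    """Sender retains every message until the receiver acknowledges it."""

    def __init__(self):
        self.pending = {}   # sequence number -> message
        self.next_seq = 1

    def send(self, msg):
        seq = self.next_seq
        self.pending[seq] = msg
        self.next_seq += 1
        return seq

    def unacknowledged(self):
        return sorted(self.pending.items())

    def ack(self, seq):
        self.pending.pop(seq, None)


messages = [f"pick-{i}" for i in range(5)]

ring = VolatileRing(size=3)
outbox = AckedOutbox()
for m in messages:          # the receiver is offline while these are sent
    ring.send(m)
    outbox.send(m)

# The receiver comes back online and reads whatever is still available.
ring_delivered = ring.receive_all()
outbox_delivered = []
for seq, msg in outbox.unacknowledged():
    outbox_delivered.append(msg)
    outbox.ack(seq)         # the sender may discard the message only now
```

In this toy run the ring has already overwritten the two oldest messages by the time the receiver returns, while the acknowledged outbox delivers all five. The cost is that the sender must store unacknowledged messages for as long as the receiver is away.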



California Integrated Seismic Network
www@cisn.org

Part of the Advanced National Seismic System


Last modified: Mon Jul 25 14:18:51 PDT 2011
