Scalable processing of ozone data from the Odin satellite

The Odin satellite has measured ozone and related gases in the stratosphere and mesosphere for more than 15 years, making this one of the longest data series with global coverage in atmospheric science. To provide the best possible reference data set to the science community, ESA has funded a complete overhaul of the data set. This covers everything from a review of the sensor calibration algorithm to a complete review of the algorithms that derive the concentrations of the different species at different heights and locations in the atmosphere.

The instrument is a passive sub-millimetre radiometer that measures the energy emitted by different molecules through the limb of the atmosphere. The atmosphere is scanned over different heights around 60 times per orbit, and the satellite makes about 15 orbits a day, so the measurements quickly cover most of the Earth’s atmosphere. As an example, we have been able to follow the development of ozone depletion over the polar regions. We can also follow the circulation of air masses globally.
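
As a rough order-of-magnitude check of the figures above, the sketch below multiplies them out. It assumes uninterrupted scanning every day of the year, which the text does not state, so the numbers are only indicative.

```python
# Back-of-the-envelope estimate from the approximate figures in the text.
SCANS_PER_ORBIT = 60   # "around 60 times per orbit"
ORBITS_PER_DAY = 15    # "about 15 orbits a day"
YEARS = 15             # "more than 15 years"

scans_per_day = SCANS_PER_ORBIT * ORBITS_PER_DAY

# Assumption (not from the text): continuous operation, 365 days per year.
scans_total = scans_per_day * 365 * YEARS

print(f"about {scans_per_day} scans per day")
print(f"about {scans_total:,} scans in {YEARS} years")
```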

The REST API is the interface for all communication between the different components in the system. As an example, the number-crunching “processors” are not aware of the technology behind the REST API: they only ask for data and deliver results back to the API. This makes it easy for us to scale, move or change the underlying data storage technology.
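
The decoupling described above can be illustrated with a minimal sketch. All class and function names here are hypothetical and are not taken from the Odin code base; the point is only that a client written against the API keeps working when the storage backend is swapped.

```python
import json
from typing import Dict, Optional, Protocol


class Storage(Protocol):
    """Hypothetical storage interface hidden behind the API."""

    def load(self, key: str) -> Optional[dict]: ...
    def save(self, key: str, value: dict) -> None: ...


class DictStorage:
    """Backend A: plain Python objects kept in memory."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def load(self, key: str) -> Optional[dict]:
        return self._items.get(key)

    def save(self, key: str, value: dict) -> None:
        self._items[key] = value


class JsonStorage:
    """Backend B: values kept as JSON strings, as a document store might."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def load(self, key: str) -> Optional[dict]:
        blob = self._blobs.get(key)
        return None if blob is None else json.loads(blob)

    def save(self, key: str, value: dict) -> None:
        self._blobs[key] = json.dumps(value)


class Api:
    """Stand-in for the REST API; clients never see the backend."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def get(self, key: str) -> Optional[dict]:
        return self._storage.load(key)

    def put(self, key: str, value: dict) -> None:
        self._storage.save(key, value)


def processor(api: Api, key: str) -> None:
    """A client that only knows the API: read input, write a result."""
    data = api.get(key)
    api.put(key + "/result", {"total": sum(data["values"])})


# The same processor runs unchanged against either backend.
for backend in (DictStorage(), JsonStorage()):
    api = Api(backend)
    api.put("scan-1", {"values": [1, 2, 3]})
    processor(api, "scan-1")
    print(type(backend).__name__, api.get("scan-1/result"))
```

In the real system the boundary is an HTTP interface rather than a Python class, but the design choice is the same: clients depend only on the interface.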

Technology

The Odin science community is using the SNIC Science Cloud. This enables scalable processing of Odin data, both for testing alternative algorithms and for generating standard products.

To be able to process the data set as quickly as possible, we have packaged a set of “processors” in Docker images, ready to be deployed on any Docker-enabled computer. These Docker images are self-contained, with auxiliary data sets and code, ready to be fed with measurements from the Odin satellite. Once an image is deployed, the running container asks the Odin-API, a REST service, for the next available measurement and immediately starts to crunch the data. When the processing is done, it delivers the results back to the Odin-API and starts over with a new measurement.
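
The fetch, process and deliver cycle described above can be sketched roughly as follows. This is a minimal illustration, not the actual processor code: `FakeOdinApi`, `next_measurement`, `deliver` and `crunch` are hypothetical names, and the in-memory queue stands in for HTTP requests to the real Odin-API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class FakeOdinApi:
    """In-memory stand-in for the Odin-API (hypothetical interface)."""

    def __init__(self, measurements: List[dict]) -> None:
        self._queue: Deque[dict] = deque(measurements)
        self.results: Dict[int, dict] = {}

    def next_measurement(self) -> Optional[dict]:
        # A real client would issue an HTTP request here.
        return self._queue.popleft() if self._queue else None

    def deliver(self, measurement_id: int, result: dict) -> None:
        # A real client would send the result back over HTTP here.
        self.results[measurement_id] = result


def crunch(measurement: dict) -> dict:
    """Placeholder for the real retrieval algorithm."""
    spectrum = measurement["spectrum"]
    return {"mean": sum(spectrum) / len(spectrum)}


def run_processor(api: FakeOdinApi,
                  process: Callable[[dict], dict] = crunch) -> int:
    """Fetch, process, deliver, and repeat until no work is left."""
    done = 0
    while True:
        measurement = api.next_measurement()
        if measurement is None:
            return done
        api.deliver(measurement["id"], process(measurement))
        done += 1


api = FakeOdinApi([
    {"id": 1, "spectrum": [1.0, 2.0, 3.0]},
    {"id": 2, "spectrum": [4.0, 6.0]},
])
print(run_processor(api), api.results)
```

Because each worker pulls its own work, adding capacity is a matter of starting more containers. A long-running worker would presumably wait and retry when no measurement is available instead of returning, as this sketch does.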

Users can browse and download data from the user interface at http://odin.rss.chalmers.se, and power users can communicate directly with the REST API to analyse data programmatically or start new processing campaigns.

Virtual Research Environments for Clinical Metabolomics

PhenoMeNal is a 3-year EU Horizon 2020 project (2015-2018) that will develop a standardised e-infrastructure for analysing medical metabolic phenotype data. This comprises the development of standards for data exchange, pipelines, computational frameworks and resources for the processing, analysis and information-mining of the massive amounts of medical molecular phenotyping and genotyping data that will be generated by metabolomics applications now entering research and the clinic.

At the Spjuth research group we lead WP5, “Operation and maintenance of PhenoMeNal grid/cloud”. Our aim is to provide PhenoMeNal and researchers with the capability to spawn secure Virtual Research Environments (VRE or VE) with easy access to scalable, interoperable data and tools for data analysis. These virtual environments should be able to run on most hardware architectures, ranging from single laptops and workstations to private and public cloud (IaaS) providers.

We use MANTL to set up and provide a microservice-oriented virtual infrastructure. In PhenoMeNal, all partners provide tools as Docker images that are automatically built, tested, and pushed to DockerHub by a continuous integration system (Jenkins). Within MANTL we provide long-running services using Marathon, including the Jupyter and Galaxy workflow systems, which can orchestrate microservices-based pipelines using e.g. Chronos or Kubernetes.
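
Marathon takes long-running services as JSON application definitions, submitted to its `/v2/apps` REST endpoint. The sketch below builds such a definition for a Jupyter service. It is an illustration only: the helper `marathon_app`, the app id, the image name and the resource values are assumptions rather than PhenoMeNal's actual configuration, and the field layout follows the older Marathon format with port mappings inside the `docker` section, so it should be checked against the Marathon version in use.

```python
import json


def marathon_app(app_id: str, image: str, container_port: int,
                 cpus: float = 0.5, mem: int = 512, instances: int = 1) -> dict:
    """Build a Marathon app definition for a Docker-based service."""
    return {
        "id": app_id,
        "cpus": cpus,            # CPU shares per instance
        "mem": mem,              # memory per instance, in MB
        "instances": instances,  # number of copies Marathon keeps running
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free host port.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0}
                ],
            },
        },
    }


# Illustrative values only; not taken from the PhenoMeNal deployment.
app = marathon_app("/jupyter", "jupyter/base-notebook", 8888)
print(json.dumps(app, indent=2))
```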

So far we have successfully provisioned PhenoMeNal VREs on Google Cloud Platform, EBI Embassy Cloud (OpenStack), and SNIC Science Cloud (OpenStack). We are currently experimenting with Packer to speed up the provisioning of virtual machines within the VRE, and with Consul for federating multiple VREs. Another ongoing project is to use Apache Spark for distributed data analysis within the VRE.

Links:

http://www.farmbio.uu.se/forskning/researchgroups/pb/PhenoMeNal/

http://www.farmbio.uu.se/forskning/researchgroups/pb/Data-intensive/

http://phenomenal-h2020.eu/