Domain Specific Language for “Variable Transformation” with PySpark

by Jiao

In quantitative finance, a sophisticated model system can consist of a large collection of supervised ML models. Each model may require hundreds of features, many of which are transformed from “raw” input data columns using PySpark. Keeping these transformations consistent across hundreds of ML models in an integrated model system can be a daunting task, and it becomes harder still as the transformations evolve during model research and development. A domain-specific language for expressing such transformations was created to provide not only a much cleaner and more succinct grammar, but also a structured specification that can be automatically scanned for human errors such as cyclic definitions, conflicts, and typos. It greatly improves the efficiency and productivity of model development.
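To make the idea of a scannable specification concrete, here is a minimal sketch in plain Python. It is not the DSL described above: the column names, the expression syntax, and the `validate` helper are all hypothetical, and the translation to PySpark column expressions is left out. It only shows how a structured spec allows typos, conflicts, and cyclic definitions to be caught before any data is touched.

```python
import ast
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical raw input columns and derived-variable definitions.
RAW_COLUMNS = {"price", "volume"}
SPEC = {
    "log_price": "log(price)",
    "dollar_volume": "price * volume",
    "log_dollar_volume": "log(dollar_volume)",
}
KNOWN_FUNCTIONS = {"log", "abs"}  # assumed function vocabulary


def referenced_names(expression):
    """Return the variable names an expression refers to."""
    tree = ast.parse(expression, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - KNOWN_FUNCTIONS


def validate(spec, raw_columns):
    """Check a spec and return derived variables in a safe evaluation order."""
    conflicts = set(spec) & set(raw_columns)
    if conflicts:
        raise ValueError(f"derived names shadow raw columns: {sorted(conflicts)}")
    graph = {}
    for name, expression in spec.items():
        deps = referenced_names(expression)
        unknown = deps - set(raw_columns) - set(spec)
        if unknown:  # most likely a typo in a variable name
            raise ValueError(f"{name!r} uses undefined names: {sorted(unknown)}")
        graph[name] = deps & set(spec)  # keep only derived-to-derived edges
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic definition: {err.args[1]}") from err


evaluation_order = validate(SPEC, RAW_COLUMNS)
print(evaluation_order)
```

Because the dependencies are extracted from the spec itself, the same graph that detects cycles also gives an order in which the derived columns can be computed.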

CNN model for methane detection using remote sensing imagery

by Jiao

A convolutional neural network (CNN) model was built to predict trace-gas concentration from remote sensing images for the first time.

A remote sensing system with multi-spectral or hyper-spectral capabilities can identify materials and their compositions on the ground from a few miles (airplanes) to hundreds of miles (satellites) up in the sky. It can also be used to detect a trace gas (such as methane, an important greenhouse gas) near the ground if conditions are right.

Such detection relies on the individual spectrum of each pixel in the multi- or hyper-spectral remote sensing image. A trace gas like methane has a signature absorption spectrum that can sometimes be identified in remote-sensing spectra. As a result, the methane concentration can be estimated from each single-pixel spectrum, and an image of methane concentration can be derived accordingly. This is the pixel-wise approach. However, its results can be quite noisy, with very large error bars, for reasons such as the low reflectivity of some ground covers and fluctuations in detector sensitivity.

A CNN model can mitigate some of these problems by combining information from neighboring pixels. However, training such a model with supervised learning requires a large amount of data, and remote sensing data with ground truth for trace-gas concentration is scarce and very expensive to acquire in large quantities.
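As a rough illustration of why neighboring pixels help, the sketch below applies a fixed 3x3 averaging kernel to a noisy, uniform concentration map. This is a deliberately simplified stand-in, not the CNN described here: a CNN learns its kernels from data, whereas this one is fixed, and the image size, noise level, and concentration value are made up.

```python
import math
import random


def mean_filter_3x3(image):
    """Average each pixel with its available neighbors (a fixed 3x3 kernel)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def rms_error(image, truth):
    """Root-mean-square deviation of every pixel from a constant true value."""
    values = [(pixel - truth) ** 2 for row in image for pixel in row]
    return math.sqrt(sum(values) / len(values))


random.seed(0)
TRUE_CONCENTRATION = 1.0  # arbitrary units, assumed uniform for simplicity
NOISE_SIGMA = 0.5         # made-up per-pixel retrieval noise
SIZE = 32

noisy = [
    [TRUE_CONCENTRATION + random.gauss(0.0, NOISE_SIGMA) for _ in range(SIZE)]
    for _ in range(SIZE)
]
smoothed = mean_filter_3x3(noisy)

rms_raw = rms_error(noisy, TRUE_CONCENTRATION)
rms_smoothed = rms_error(smoothed, TRUE_CONCENTRATION)
print(f"pixel-wise RMS error: {rms_raw:.3f}, after 3x3 averaging: {rms_smoothed:.3f}")
```

Plain averaging only works this well because the true field is uniform. A real plume has spatial structure that a fixed kernel would blur, which is one reason to learn the kernels instead.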

To overcome the data problem, we combined micro-scale meteorology modeling and atmospheric radiative transfer simulation to generate synthetic satellite images. This allowed us to build a sufficient number of training, validation, and testing images for our CNN model. Compared to the traditional pixel-wise approach, this novel CNN-based method greatly improves the robustness of methane detection.

This is a Satelytics project.

Web application for resolution of DGS instruments: first release

by Jiao

The new website should be pretty straightforward to navigate, but here are some quick explanations:

At the start, the instrument selection is shown on the top left of the page. Click on one instrument to start.

Clicking “Help” under the page banner shows some information about the data and modeling.

Each instrument has two tabs: one for inelastic energy resolution plots and one for elastic resolution plots.

On the inelastic tab, there is a form for choosing the incident energy and some (chopper) settings. Clicking the “Calculate” button calculates the resolution curve using those settings.

Clicking “Summary” above the resolution-vs-energy plot displays some basic information, such as the FWHM at the elastic line.

Below the summary, a polynomial fit result may be displayed (only for CNCS for now).

Clicking the download button below the resolution-vs-energy plot downloads a CSV file.

The elastic tab shows plots of the elastic resolution/flux information. These plots differ from instrument to instrument, because different kinds of data exist for different instruments.

Super-resolution satellite image correlation helps study glacier erosion law

by Jiao

Published in Science on Oct 9, 2015: https://science.sciencemag.org/content/350/6257/193

From Wikipedia: “A glacier is a persistent body of dense ice that is constantly moving under its own weight; it forms where the accumulation of snow exceeds its ablation (melting and sublimation) over many years, often centuries”

  • Glacier erosion has obvious effects on the landscape of the Earth
  • Glacial erosion rates span several orders of magnitude from polar and dry regions to temperate alpine glaciers, and from hill-slope landscapes to steep, tectonically active mountain ranges

A power law for glacier erosion was proposed by Jonathan Harbor, Bernard Hallet, and Charles Raymond (Nature, 1988).

But what is the value of b, the exponent in that power law?

Sub-pixel image correlation techniques for satellite imagery were developed and used in this study to obtain an accurate 3D model of the glacier surface and then its speed, u, which helped constrain the value of b.
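To show how measured speeds can constrain the exponent, here is a minimal sketch that assumes the erosion law has the commonly quoted form e = K * u^b, where e is the erosion rate and K is a constant. The symbols e and K are assumptions, since the text above names only u and b. Taking logarithms turns the law into a straight line, log e = log K + b log u, so b is the slope of a least-squares fit. The numbers below are synthetic and chosen only to exercise the code, not taken from the study.

```python
import math


def fit_power_law(speeds, erosion_rates):
    """Least-squares fit of log(e) = log(K) + b * log(u); returns (K, b)."""
    xs = [math.log(u) for u in speeds]
    ys = [math.log(e) for e in erosion_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope in log-log space
    k = math.exp(mean_y - b * mean_x)     # intercept gives the prefactor
    return k, b


# Synthetic, noise-free data generated from made-up parameters.
K_TRUE, B_TRUE = 1e-4, 2.0
speeds = [10.0, 30.0, 100.0, 300.0]
erosion_rates = [K_TRUE * u ** B_TRUE for u in speeds]

k_fit, b_fit = fit_power_law(speeds, erosion_rates)
print(f"fitted K = {k_fit:.3g}, fitted b = {b_fit:.3f}")
```

With real data the points scatter around the line, and the uncertainty in b depends strongly on how accurately u is measured, which is where the image correlation work comes in.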

  • ~50k lines of Python/C++ code for satellite image correlation
  • Includes the full DEM workflow: ancillary data correction (bundle correction), tie-point generation, orthorectification, correlation, triangulation, and gridding
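The core idea of sub-pixel correlation can be sketched in one dimension. The example below is a generic textbook illustration, not the method used in the study: it cross-correlates two sampled profiles at integer lags and then fits a parabola through the peak and its two neighbors to locate the maximum between samples. The Gaussian test profile and its 0.3-pixel shift are made up.

```python
import math


def gaussian_profile(n, center, sigma):
    """Sample a Gaussian bump on n integer pixel positions."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


def cross_correlation(a, b, max_lag):
    """Return {lag: sum_i a[i] * b[i + lag]} for integer lags."""
    n = len(a)
    return {
        lag: sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-max_lag, max_lag + 1)
    }


def subpixel_shift(a, b, max_lag=5):
    """Estimate the shift of b relative to a with sub-pixel precision."""
    corr = cross_correlation(a, b, max_lag)
    # Search interior lags only, so both neighbors of the peak exist.
    best = max(range(-max_lag + 1, max_lag), key=lambda lag: corr[lag])
    y_minus, y_zero, y_plus = corr[best - 1], corr[best], corr[best + 1]
    # Vertex of the parabola through the three points around the peak.
    offset = 0.5 * (y_minus - y_plus) / (y_minus - 2 * y_zero + y_plus)
    return best + offset


TRUE_SHIFT = 0.3  # pixels, made up for the demonstration
reference = gaussian_profile(64, center=30.0, sigma=3.0)
shifted = gaussian_profile(64, center=30.0 + TRUE_SHIFT, sigma=3.0)

estimated_shift = subpixel_shift(reference, shifted)
print(f"true shift: {TRUE_SHIFT}, estimated: {estimated_shift:.3f}")
```

The integer-lag peak alone would report a shift of zero here. The parabolic refinement recovers most of the fractional part, although it is only approximate because the correlation peak is not exactly a parabola.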

Energy dependence of the flux and elastic resolution for the ARCS neutron spectrometer

by Jiao

Flux and elastic resolution for a wide range of instrument conditions were measured and modeled for the ARCS chopper spectrometer.

This work provides a key reference for users of the ARCS instrument, making it easier for them to plan their experiments. The online interactive resolution/flux plots help ARCS users make an informed tradeoff between resolution and flux while taking their experiment requirements into account.

Research details:

  • A series of vanadium calibration measurements over a wide range of incident energies was performed on the ARCS direct geometry chopper spectrometer.
  • Analytical models were able to reproduce most of the general trends observed in the measurements.
  • MCViNE simulations were used to confirm the modeling results where the experimental results were not reliable enough for direct measurement of the resolution.

This work was published at https://doi.org/10.1016/j.physb.2018.11.027.

Monte Carlo neutron ray-tracing simulations for neutron scattering experiments

by Jiao


  • A software framework for Monte Carlo ray-tracing simulation of modern neutron instruments, including novel sample/sample environments, and sophisticated detector systems.
  • It contributes to the design of all 8 selected instruments in the first phase of the STS project. Relevant publications can be found here.
  • ~570k lines of C++/Python.
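For readers new to the technique, the toy sketch below shows the basic Monte Carlo ray-tracing loop: sample random rays, propagate each one geometrically, and count those that pass a component. It does not use the framework's API and models nothing real. The point source, the single slit, and all dimensions are assumptions chosen so that the estimate can be checked against a closed-form answer.

```python
import math
import random


def simulate_slit_transmission(n_rays, max_angle, distance, half_width, seed=0):
    """Fraction of rays from a point source that pass a slit downstream.

    Each ray leaves the source at an angle drawn uniformly from
    [-max_angle, max_angle] (radians) and travels in a straight line.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_rays):
        angle = rng.uniform(-max_angle, max_angle)
        x_at_slit = distance * math.tan(angle)  # transverse position at the slit
        if abs(x_at_slit) <= half_width:
            passed += 1
    return passed / n_rays


# Made-up geometry: 10 m flight path, 10 cm wide slit, +/- 0.02 rad divergence.
MAX_ANGLE = 0.02
DISTANCE = 10.0
HALF_WIDTH = 0.05

estimate = simulate_slit_transmission(100_000, MAX_ANGLE, DISTANCE, HALF_WIDTH)
analytic = math.atan(HALF_WIDTH / DISTANCE) / MAX_ANGLE
print(f"Monte Carlo estimate: {estimate:.4f}, analytic value: {analytic:.4f}")
```

A full instrument simulation chains many such components (guides, choppers, samples, detectors) and tracks more neutron state, such as energy, time of flight, and statistical weight. The sample-propagate-tally structure stays the same.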