Domain Specific Language for “Variable Transformation” with PySpark

by Jiao

In quantitative finance, a sophisticated model system can consist of a large collection of supervised ML models. Each model may require hundreds of features, many of which are derived from “raw” input data columns using PySpark. Keeping these transformations consistent across hundreds of ML models in an integrated model system is a daunting task, and it becomes harder still as the transformations evolve during model research and development. A domain-specific language for expressing such transformations was created to provide not only a cleaner and more succinct grammar, but also a structured specification that can be automatically scanned for human errors such as cyclic definitions, conflicting definitions, and typos. This greatly improves the efficiency and productivity of model development.
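To make the idea of a "structured specification that can be scanned" more concrete, here is a minimal sketch in plain Python. It is not the DSL described in this post: the `name = expression` line format, the helper names (`parse_spec`, `dependencies`, `evaluation_order`), and the column and function names are all hypothetical, chosen only to illustrate how conflicting definitions, typos, and cyclic definitions might be caught before any Spark job runs.

```python
import re
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Assumptions for illustration only: the raw input columns and the
# function names that may appear inside an expression.
RAW_COLUMNS = {"price", "volume"}
KNOWN_FUNCTIONS = {"log", "sqrt", "abs"}


def parse_spec(text):
    """Parse 'name = expression' lines and reject conflicting definitions."""
    definitions = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, expression = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'name = expression'")
        name = name.strip()
        if name in definitions:
            raise ValueError(f"line {line_no}: conflicting definition of {name!r}")
        definitions[name] = expression.strip()
    return definitions


def dependencies(expression):
    """Return the identifiers an expression refers to, excluding functions."""
    identifiers = set(re.findall(r"\b[A-Za-z_]\w*", expression))
    return identifiers - KNOWN_FUNCTIONS


def evaluation_order(definitions, raw_columns):
    """Check for typos and cycles, then return a safe evaluation order."""
    defined = set(definitions)
    graph = {}
    for name, expression in definitions.items():
        deps = dependencies(expression)
        unknown = deps - raw_columns - defined
        if unknown:  # most likely a typo in a column or feature name
            raise ValueError(f"{name!r} references undefined name(s): {sorted(unknown)}")
        graph[name] = deps & defined  # only derived features form graph edges
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic definition: {err.args[1]}") from err


if __name__ == "__main__":
    spec = """
    # derived features, in any order
    scaled = log_price / volume
    log_price = log(price)
    """
    definitions = parse_spec(spec)
    print(evaluation_order(definitions, RAW_COLUMNS))
```

Under these assumptions, once a specification passes the checks, each expression could be applied in the returned order, for example with `DataFrame.withColumn` and `pyspark.sql.functions.expr`, so that every derived feature is computed only after the features it depends on.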
