Extract, Transform and Load
This post shows you how easy it is to create an end-to-end ETL process in OpenDataDSL.
For this example, we are going to extract some publicly available currency exchange data in XML and load it into our database using our own defined structure.
Defining the data model
We are going to create a simple type to store our FX data as follows:
ForeignExchange = type
// All foreign exchange (FX) data
category as String()
source as String()
product as String()
base as String()
currency as String()
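For comparison, here is the same structure expressed as a Python dataclass. This is an illustration only, not OpenDataDSL code; it simply mirrors the five string fields defined above.

```python
from dataclasses import dataclass

@dataclass
class ForeignExchange:
    """Mirror of the OpenDataDSL type above; all fields are strings."""
    category: str
    source: str
    product: str
    base: str
    currency: str

# Example instance using the values the transformer assigns later in the post
usd = ForeignExchange(
    category="Foreign Exchange",
    source="European Central Bank",
    product="ECB_FX",
    base="EUR",
    currency="USD",
)
```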
Extracting the XML data
We are going to extract the data from the following URL: https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
The following line of code grabs the data and saves it into a variable called xml:
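To illustrate what that step does, here is an equivalent sketch in Python rather than OpenDataDSL. It parses a small inline sample with the same shape as the ECB eurofxref-daily.xml feed (the real feed also carries gesmes/ecb XML namespaces, omitted here so the example runs offline):

```python
import xml.etree.ElementTree as ET

# Sample with the same Envelope/Cube/Cube/Cube nesting as the ECB daily feed
sample = """
<Envelope>
  <Cube>
    <Cube time="2024-01-02">
      <Cube currency="USD" rate="1.0956"/>
      <Cube currency="AUD" rate="1.6147"/>
    </Cube>
  </Cube>
</Envelope>
"""

root = ET.fromstring(sample)
day = root.find("./Cube/Cube")  # the dated Cube element
rates = {c.get("currency"): float(c.get("rate")) for c in day.findall("Cube")}
print(day.get("time"), rates)
```

This nesting is why the transformer below can refer to `xml.Cube.Cube.time` for the observation date.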
Transforming the data into our data model
We now need to transform this 'xml' variable into a list of 'ForeignExchange' data types; to do this, we create a transformer:
ECB_FX = transform xml into ForeignExchange as fx
create with Cube
unique id = "ECB_FX_EUR" + fx.currency
SPOT = TimeSeries(xml.Cube.Cube.time, "BUSINESS", fx.rate)
category = "Foreign Exchange"
product = "ECB_FX"
source = "European Central Bank"
base = "EUR"
currency = fx.currency
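The mapping the transformer performs can be sketched in plain Python. This is a hypothetical re-statement, not the OpenDataDSL implementation: each per-currency Cube element becomes one record, and SPOT is reduced here to a single dated value rather than a full TimeSeries.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Envelope>
  <Cube>
    <Cube time="2024-01-02">
      <Cube currency="USD" rate="1.0956"/>
      <Cube currency="AUD" rate="1.6147"/>
    </Cube>
  </Cube>
</Envelope>
"""

def transform(xml_text):
    """Mirror the field assignments in the ECB_FX transformer above."""
    root = ET.fromstring(xml_text)
    day = root.find("./Cube/Cube")      # xml.Cube.Cube in the DSL
    models = []
    for fx in day.findall("Cube"):      # one Cube per currency
        models.append({
            "_id": "ECB_FX_EUR" + fx.get("currency"),
            "category": "Foreign Exchange",
            "product": "ECB_FX",
            "source": "European Central Bank",
            "base": "EUR",
            "currency": fx.get("currency"),
            # Simplified stand-in for TimeSeries(time, "BUSINESS", rate)
            "SPOT": {"time": day.get("time"),
                     "calendar": "BUSINESS",
                     "value": float(fx.get("rate"))},
        })
    return models

models = transform(SAMPLE)
```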
Testing the transformation
To check that the data is transformed correctly, we can run the following to see what the transformed data looks like:
models = ECB_FX.run(xml)
This prints out a JSON array of 'ForeignExchange' models, a short extract is shown below:
"category": "Foreign Exchange",
"description": "ECB Euro FX reference rates EUR/AUD",
"source": "European Central Bank"
Loading the data into the database
The final part is to load the data into the database. This can be done simply by iterating through the models and saving them; a more robust way is to batch them up and send them to the loader service. Here are both methods:
for model in models
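The two loading patterns can be sketched generically in Python. None of the names below are OpenDataDSL APIs; an in-memory stub stands in for the database purely to contrast per-model saves with a single batched call.

```python
class StubDatabase:
    """Hypothetical stand-in for a database client."""
    def __init__(self):
        self.saved = []

    def save(self, model):
        # One round-trip per model
        self.saved.append(model)

    def save_batch(self, models):
        # One round-trip for the whole list
        self.saved.extend(models)

models = [{"_id": "ECB_FX_EURUSD"}, {"_id": "ECB_FX_EURAUD"}]

# Method 1: save one model at a time
db_one = StubDatabase()
for model in models:
    db_one.save(model)

# Method 2: batch them up and send them in a single call
db_batch = StubDatabase()
db_batch.save_batch(models)
```

Batching reduces network round-trips and lets the loader service treat the whole set as one unit of work, which is why it is the more robust choice for production loads.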
This post shows a simple one-off example of extracting XML data from a web URL, transforming it into our own data model, and loading it into the OpenDataDSL database. The custom part of this process is the transformer, which understands both the input and output data structures; the OpenDataDSL transformer reduces the whole process to a pure mapping exercise.
To make this into a production-ready process, we would create a workflow that joins all the components together, and a process to schedule that workflow.