Jobs and Jobfiles

You can load and run job configurations (combinations of an Extractor, multiple transformers, and multiple loaders) from a simple XML file. In this file you express which classes to use, and what the arguments are.

Here's a simple example:

<?xml version="1.0" ?>
<job name="Invoice information">
    <extractor>
        <class>BiSight\Etl\Extractor\PdoExtractor</class>
        <argument name="dbname">my_input_dbname</argument>
        <argument name="sql">
            <![CDATA[
                SELECT c.firstname, c.lastname, c.company, i.ref, i.totalprice
                FROM customer AS c
                INNER JOIN invoice AS i ON c.id = i.customer_id
            ]]>
        </argument>
    </extractor>

    <transformer>
        <class>BiSight\Etl\Transformer\NullTransformer</class>
    </transformer>

    <loader>
        <class>BiSight\Etl\Loader\PdoLoader</class>
        <argument name="dbname">my_output_dbname</argument>
        <argument name="tablename">flat_invoices</argument>
    </loader>
</job>

This job will simply merge the customer and invoice information into a flat invoice table.

Flat tables can be used more easily in Business Intelligence tools, such as BiSight Portal, Pentaho, BIRT, Tableau, etc.

Running job files:

There's a command-line utility to execute job files:

bisight-etl etl:run my/jobfile.xml

Executing multiple job files

You can wrap 1 or more job elements into a jobs element, to run multiple jobs in one command, in sequence.

You can also use XInclude to include multiple job files into a single jobs file. Here's a simple example:

<?xml version="1.0" ?>
<jobs xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="first_job.xml" />
    <xi:include href="second_job.xml" />
    <job>
        <!-- extractor, transformer, loader configs here -->
    </job>
</jobs>

These can then all be executed in sequence using the etl:run command described earlier.

Variables in jobfiles

It can be practical to use variables in your jobfiles. For example for basepaths, or dbnames.

You can do that like this:

<?xml version="1.0" ?>
<job name="Invoice information">
    <extractor>
        <class>BiSight\Etl\Extractor\PdoExtractor</class>
        <argument name="dbname">{{dbname}}</argument>
        <!-- etc... -->

Then you can provide the variable on the cli to the job runner:

bisight-etl etl:run my/jobfile.xml dbname=exampledb

You can use and pass as many variables as you want.