ETL Process Management

Test the Extraction Operation

From the Extraction Operation detail page you can test the data extraction by pressing the test operation button:

OperationTestButton.png

Spago4Q then executes the extraction operation and lists the Generic Items produced by the extractor code; no Data Interface mappings or mapping scripts are executed. Use this list to check that every expected field is actually extracted and that each field holds the expected value.
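When a test run returns many Generic Items, checking them by eye is tedious. The sketch below assumes the listed items have been copied into Python as dictionaries mapping field names to values; the `missing_fields` helper and the sample field names are hypothetical and not part of Spago4Q.

```python
def missing_fields(items, expected):
    """Return, per item index, the expected fields that are absent or empty."""
    report = {}
    for index, item in enumerate(items):
        # A field counts as missing if the key is absent or its value is empty.
        absent = [name for name in expected if item.get(name) in (None, "")]
        if absent:
            report[index] = absent
    return report


# Hypothetical sample: two items as they might appear in the test list.
items = [
    {"PROJECT": "alpha", "ISSUE_ID": "42", "STATUS": "open"},
    {"PROJECT": "beta", "ISSUE_ID": "", "STATUS": "closed"},
]
print(missing_fields(items, ["PROJECT", "ISSUE_ID", "STATUS"]))  # {1: ['ISSUE_ID']}
```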

Execute the Extraction Process

To run an Extraction Process manually, open the processes list

Extractors -> Extraction Processes

and click the "Select" button of the desired process to open its detail page.

3.3.6.b.png

Click the "Save and Execute" button to start the process. If the extraction process ends without errors, the extracted data are available in the Data Interface fact table.

Schedule the Extraction Process

To schedule the process:

  1. Open the process detail page by clicking the "Select" button of the Extraction Process you want to schedule.
  2. Select a periodicity; the extraction process is then added to the scheduler.
periodicity-selection.png

Backup and Rejected data archives

For every Extraction Operation you can choose whether to save, on the local filesystem, an archive of all the data extracted by each execution (the Backup Archive). Likewise, you can choose to archive the Generic Items that did not pass the custom data filter or the default data type conversion during the Data Interface mapping; this second archive is controlled by the Rejected Archive flag.

save-archive-rejected.png
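The split between mapped items and rejected items can be pictured as follows. This is a minimal sketch, not Spago4Q's implementation: the filter function, the target types and the `partition_items` helper are hypothetical. An item is rejected when it fails the custom filter or when one of its fields cannot be converted to its target type.

```python
from typing import Callable


def partition_items(
    items: list[dict[str, str]],
    custom_filter: Callable[[dict[str, str]], bool],
    target_types: dict[str, Callable[[str], object]],
) -> tuple[list[dict[str, object]], list[dict[str, str]]]:
    """Split raw Generic Items into (converted, rejected) lists (hypothetical model)."""
    converted, rejected = [], []
    for item in items:
        if not custom_filter(item):
            rejected.append(item)  # failed the custom data filter
            continue
        try:
            # Fields without an explicit target type are kept as strings.
            converted.append(
                {name: target_types.get(name, str)(value) for name, value in item.items()}
            )
        except ValueError:
            rejected.append(item)  # failed the data type conversion
    return converted, rejected


# Hypothetical example: keep only items with a project and convert EFFORT to float.
raw = [
    {"PROJECT": "alpha", "EFFORT": "3.5"},
    {"PROJECT": "", "EFFORT": "1"},
    {"PROJECT": "beta", "EFFORT": "n/a"},
]
ok, bad = partition_items(raw, lambda i: i["PROJECT"] != "", {"EFFORT": float})
print(len(ok), len(bad))  # 1 2
```

In this picture the Backup Archive would hold everything in `raw`, while the Rejected Archive would hold the items in `bad`.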

Both archives (backup and rejected) are XML files based on the GenericItem structure, so they can also be loaded through the manual Data Import user interface.

<GENERICITEMS>
  <GENERICITEM>
    <FIELD1> ... </FIELD1>
    <FIELD2> ... </FIELD2>
    ...
    <FIELDN> ... </FIELDN>
  </GENERICITEM>
</GENERICITEMS>
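A file in this layout can be produced with Python's standard `xml.etree.ElementTree` module, for example to prepare test data for the Data Import page. This is a sketch: the helper name and the sample fields are hypothetical, and it assumes that field names are valid XML element names and that whitespace between elements is not significant to the importer.

```python
import xml.etree.ElementTree as ET


def to_generic_items_xml(items: list[dict[str, str]]) -> str:
    """Serialize items to the GENERICITEMS/GENERICITEM/<field> layout (hypothetical helper)."""
    root = ET.Element("GENERICITEMS")
    for item in items:
        element = ET.SubElement(root, "GENERICITEM")
        for name, value in item.items():
            # Each field becomes a child element named after the field.
            ET.SubElement(element, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_generic_items_xml([{"PROJECT": "alpha", "EFFORT": "3.5"}]))
# <GENERICITEMS><GENERICITEM><PROJECT>alpha</PROJECT><EFFORT>3.5</EFFORT></GENERICITEM></GENERICITEMS>
```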

If these archives are enabled, you can download them from the extraction log interface. The server folder where the archives are saved is configured in the spago4q.xml configuration file.

Monitor the execution results

This feature lets you check whether each Extraction Operation execution ended correctly and how many items were retrieved, rejected and inserted into the data warehouse.

select-log.png

For each operation execution, a value of true in the completed column indicates that the operation ended correctly.

log-list.png

The two icons at the end of each row download the Backup Archive and the Rejected Archive, if these archives are enabled for the operation.

Data Import

Data items can also be loaded manually from a user interface.

data-import.png

The data must be provided in XML format (optionally zipped) and must follow this structure:

<GENERICITEMS>
  <GENERICITEM>
    <FIELD1> ... </FIELD1>
    <FIELD2> ... </FIELD2>
    ...
    <FIELDN> ... </FIELDN>
  </GENERICITEM>
</GENERICITEMS>
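Before uploading a file, it can help to check locally that it follows this structure. The sketch below uses only the Python standard library; the function name is hypothetical, and it assumes that a zipped upload contains a single XML file, which the text above does not specify.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_generic_items(data: bytes) -> list[dict[str, str]]:
    """Parse GENERICITEMS XML from plain XML bytes or a zip archive (hypothetical helper)."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the archive holds a single XML file.
            data = archive.read(archive.namelist()[0])
    root = ET.fromstring(data)
    if root.tag != "GENERICITEMS":
        raise ValueError(f"unexpected root element: {root.tag}")
    # One dictionary per GENERICITEM, mapping field element names to their text.
    return [
        {field.tag: field.text or "" for field in item}
        for item in root.iter("GENERICITEM")
    ]


# Usage: the same content, once as plain XML and once zipped in memory.
sample = b"<GENERICITEMS><GENERICITEM><PROJECT>alpha</PROJECT></GENERICITEM></GENERICITEMS>"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("items.xml", sample)
print(read_generic_items(sample) == read_generic_items(buffer.getvalue()))  # True
```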

Selecting an Extraction Operation means that the data read from the XML file are mapped to the Data Interface defined on that operation, according to its mapping rules. With the manual Data Import you can also set the date (and time) at which the data are stored in the data warehouse; this lets you, for example, simulate a sequence of data loads executed on different, subsequent dates.


Creator: oltolina on 2010/05/31 11:40
This wiki is licensed under a Creative Commons 2.0 license