ETL Process configuration

ETL Process configuration

One of the main features of Spago4Q is the capability to do ETL operations. ETL operations are:

  • Extract data from outside sources
  • Transforming it to fit operation needs
  • Loading it into the end target
They can be configure and realize following these steps:

  • Configure the data source
  • Configure and create the target database
  • Define the extraction and transformation process
We assume that you login as Spago4Q administrator pointing the browser to:

URL
http://localhost:8080/SpagoBI

Login as:

UserPassword
s4qadmins4qadmin

The Main Page is presented, from the "Main Menu" select the "Extractors" item and than click on :

  • Source Types to create and configure the data source
  • Extraction process to create and configure the extraction process
  • Data Interfaces to create an interface
  • Scripts to create a transformation scripts
  • Import Export to import and export configurations from a Spago4Q installation to another one

How to configure the Data Source

The section exlplains how to configure the data source. The data source is the place where will be performed the extraction. To create the data source you have to create and configure:

  • The source type: where to specify the type of data source (XML, Data Base, etc.) and configure the engine (Java class) that extract the data.
  • The data source.
  • The data source parameters: specify the parameter to access the data source.
In this guide, as example, a Data Base data source will be created and configured.

Create a Source Type

From the "Main Menu" select the "Extractors" item and than click on the Source Types menu item and you will enter into the list of existing Source Types.
In order to create a new Source Type click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:
3.1.1.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Source Type and it identifies it.
  • Description: describe the Source Type
  • Extractor class: it is the name of the Java class that perform the extraction. In this version you can use the following extractor classes:
Extractor class it.eng.spago4q.extractors.db.DbExtractor This class extracts data from a Database using a SQL-92 query
Data Source parametersUSERThe username that has the access to the Database
PASSWORDThe password of the user
URLThe url that identifies the Database
DRIVERThe driver to perform the queries
Operation ParametersQUERYThe query to do when spago4Q execute an extraction

Enter the required information and click on the "Save" button save.png. Than click on the "Back" button back.png and you will see the new Source Type just inserted in Source Types list.

3.1.1.c.png

Create a Data Source

In Spago4Q a Data Source is an instance of a Source Type with its own parameter. To create a Data Source you have to perform these operations:

  • Click on the menu item "Source Types" and you will enter into the list of existing Source Types.
  • Click on the "View Data Sources" button check.png and you will enter into the list of existing Data Sources that use a specific Source Type.
3.1.2.a.png

In order to create a new Data Source click on the "Insert" button insert.pngin the top right corner of the page and you will enter in a form like the following one:

3.1.2.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Data Source and it identifies it.
  • Description: Describe the Data Source.
Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Data Source just inserted in Data Sources list.

Create Data Source parameters

To perform an extraction the Extraction class (that is a field of a Source Type) needs same parameters that describes how the extractor can connect to the Data Source. A Data Source is an instance of a Source Type with its own parameters. To specify the parameters of a Data Source you have to do these operations:

  • Click on the menu item "Source Types" and you will enter into the list of existing Source Types.
  • Click on the "View Data Sources" button check.png and you will enter into the list of existing Data Sources that use a specific Source Type.
  • Click on the "View Data Sources Parameters" button check.png and you will enter into the list of existing Data Source Parameters of a specific Data Source.
3.1.3.a.png

In order to create a new Data Source Parameter click on the "Insert" button insert.pngin the top right corner of the page and you will enter in a form like the following one:

3.1.3.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Data Source Parameter and it identifies it.
  • Value: is the value of the field.
  • Cript: if it is selected, when you click on the "Save" button save.png, the value will be stored and encoded in Spago4Q platform.
Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Data Source Parameter just inserted in Data Source Parameters list.

To configure a Data Source that implements a Source Type that has it.eng.spago4q.extractors.db.DbExtractor as extractor class you have to insert the following parameters:

  • USER
  • PASSWORD
  • URL
  • DRIVER

How to configure and create the Data Interface

The section explains how to configure the Data Interfaces that are the data structure (fact tables) inside the target database that are used to store the extracted data. To create the fact table for a Data Interface you have to follow these steps:

  • Create a Data Interface where you define the name of the fact table.
  • Create the Data Interface field. An interface field represents a column of the fact table.
This guide describes how to create an Interface type that represent this table:

FT_PROJECT_MAN
Field Type Key
PRJ VARCHAR YES
AC DOUBLE NO
PV DOUBLE NO
EV DOUBLE NO

Create an Interface Type

An Interface Type represent a fact table that will be create in the database. This table will be populated with the data extracted from the extraction process. To create an Interface Type you have to do these operations:

  • from the "Main Menu" select the "Extractors" item and than Click on the menu item "Interfaces" and you will enter into the list of existing Interface Types.

In order to create a new Interface Type click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

3.2.1.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Interface Type and it identifies it.
  • Description: describe the interface type
  • Table Name: it is the name of the fact table to create. When you create a fact table, a prefix like "FT_" will be automatically added.
In the example explained in the figure a fact table named "FT_PROJECT_MAN" will be create.

Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Interface Type just inserted in Interface Type list.

3.2.1.c.png

Create interface fields

To replicate the table explained before in the example you have to add to the Interface Type these fields:

PRJ STRING
AC DOUBLE
PV DOUBLE
EV DOUBLE

To Associate same field to an Interface type you have to do these operations:

  • Click on the menu item "Interfaces" and you will enter into the list of existing Interface Types.
  • Select the interface type that you want to add the new fields (in the example select "PROJECT_MANAGMENT") and click the "View Interface Fields" button check.png and you will enter into the list of Interface Fields of a specific Interface Type.
In order to create a new Interface field click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

In this form you can insert these fields:

  • Name (mandatory): it is the label of the Interface Field and it identifies it. It represents the name of a column of the fact table specify by the Interface Type.
  • Field Type: it is the type of the column that will be created in the fact table. The type that you can select are:
    • String
    • Integer
    • Double
    • Boolean (0,1)
    • Date (Specify in the format of the Database where the fact table will be created)
  • Key: specify if this column identify a row of the Data base
  • Sensible: If it is checked if an extraction produce two rows with the same values as key and different values as sensible field produce the insertion of two rows. If an extraction of two rows with the same values as key and same value as sensible field produce the insertion of only one row.
Enter the required information and click on the "Save" button save.png . Than click the "Back" button back.png and you will see the new Interface field just inserted in Interface fields list.

If you want to replicate the example explained before you have to insert the following interface field:

Name Field Type Sensible Key
PRJ STRING false true
AC DOUBLE true false
PV DOUBLE true false
EV DOUBLE true false

Domains value

Sometime an extraction can produce a column of a finite set of values. In the following example the extraction produced this data:

ID proposal approved/rejected
1 proposal1 rejected
2 proposal2 approved
3 proposal3 approved

The column "approved/rejected" can have only this set of value : {approved, rejected}. To optimize the quantity of data that will store from an extraction, Spago4Q has the capability to define a Set of value that can be repeated in an extraction. When an extraction will be performed Spago4Q replace this value with the identifier of the value that has be specified in the Domain Value. To associate a Domain Value to an interface field you have to do this operations:

  • Click on the menu item "Interfaces" and you will enter into the list of existing Interface Types.
  • Select the interface type and click the "View Interface Fields" button check.png and you will enter into the list of Interface Fields of a specific Interface Type.
3.2.3.a.png

Select the interface field that you want to associated a Domain Value and click the "View Interface Fields" button check.png and you will enter into the list of Interface Fields of a specific Interface Type.

3.2.3.b.png

In order to create a new Domain Value click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

  • In this form you can insert this field:
    • Value: identify one of the value in the set.
3.2.3.c.png

Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Domain Value just inserted in Interface fields list. To complete the example insert two Domain Value:

  • approved
  • rejected
When you execute an extraction that produce the value show before, the column "approved/rejected", if it is associated with an interface field that has the domain value just inserted will be produce this result:

ID proposal approved/rejected
1 proposal1 {ID Domain Value rejected}
2 proposal2 {ID Domain Value approved}
3 proposal3 {ID Domain Value approved}

Where {ID Domain Value rejected/approved} will be the identifier of the data set with the value rejected/approved.

Create the fact table

After you defined the Interface Type and the Interface Fields that represent the Fact Table you can create the fact table in the Data Base. To create the Fact Table in the Data Base you have to do these operations:

  • Click on the menu item "Interfaces" and you will enter into the list of existing Interface Types.
  • Click the "Select" button select.png of the interface type that represents the fact table that you want to create and you will enter in a form like the following one:
3.2.4.a.png

To create the fact table click the "Create Fact Table" button tableAdd.gif. If you check in the database you will find a table named as "FT_"+ the name specify in the "Table Name" field of the Interface Type with the field specify in the Interface Fields associated of this Interface Type.

3.2.4.b.png

If you want to add or modify the fields of an interface type you have to delete the fact table and recreate it. To Delete a fact table click the "Drop Fact Table" button tableDelete.gif(all the data in the fact table will be erased).

How to Define the extraction and transformation process

An extraction process is a set of operations. Every operation extracts, transforms, and loads data from a data source to a fact table. To define an extraction process you have to do this operations:

  • Define an extraction process.
  • Define and associate same operations to the extraction process.
  • Define and associate same parameter to an operation.
  • Map the data extracted from the data source to the column of the fact table.
  • Execute the extraction process.

Define an extraction process

To create an Extraction Process you have to do these operations:

  • from the "Main Menu" select the "Extractors" item and than click on the menu item "Extraction Process" and you will enter into the list of existing extraction processes.

Click on the "View Operations" button check.png and you will enter into the list of existing Operations associated to a specific Extraction Process:

3.3.1.b.png

In order to create a new Extraction Process click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

3.3.1.c.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Extraction Process and it identifies it.
  • Description: describe the Extraction Process
  • Coordinator class: Specify the Java Class that performs the extraction operations. By default is it.eng.spago4q.extractors.DefaultCoordinator.
  • Periodicity: In this field you can select the frequency when the extraction process will be executed. If you select a periodicity automatically the extraction process will be scheduled. If you want to remove the extraction process from the scheduler set this field to empty.
Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Extraction Process just inserted in Extraction Processes list.

3.3.1.d.png

Define and associate same operations to the extraction process

An Extraction Process will be executed same operations. In an operation you can specify:

  • What you can extract from a Data source
  • How to map and transform the data extracted from a Data Source in an Interface Type (Fact table)
This guide explains how to create an Operation that perform a query in a Data Base (defined in a Data Source) and map the value extracted to the fact table (defined in a Interface Type). The first thing to do is create and associate an operation to an Extraction process. To create and associate an Operation to an extraction process you have to follow these steps:

  • click on the menu item "Extraction Process" and enter into the list of existing extraction processes.
  • click on the to enter the operations list for that extraction process
3.3.2.a.png

In order to create a new operation click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

3.3.2.b.png

In this form you can insert this fields:

  • Name (mandatory): it is the label of the Operation and it identifies it.
  • Description: describe the operation.
  • Data Source(mandatory): Using this combo you can associate a Data Source to the Operation.
  • Interface Type (mandatory): Using this combo you can associate an Interface type to the Operation.
In this example the operation named PRJ_MANAGMENT_OPERATION will extract from the S4Q_DS_DATA_BASE and put the data to the fact table described from the PROJECT_MANAGMENT Interface Type. Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Operation just inserted in Operations list.

Define and associate same parameters to an operation

To define what the operation has to extract you have to define same parameters that describe what the extractor class (Defined in Source Type and execute from the operation) has to return. The extractor class described in this guide in section X needs this operation parameters:

Extractor class it.eng.spago4q.extractors.db.DbExtractor This class extracts data from a Database using a SQL-92 query
Operation Parameters QUERY The query to do when spago4Q execute an extraction

To create an Operation Parameter you have to do these operations:

  1. Click on the menu item "Extraction Process" and enter into the list of existing extraction processes.
  2. Click on the "View Operations" button check.png and you will enter into the list of existing Operations associated to a specific Extraction Process.
  3. Click on the "View Operation Parameters" button check.png and you will enter into the list of existing Operation Parameters associated to a specific Operation.
3.3.3.a.png

In order to create a new operation parameter click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Operation Parameter and it identifies it. This name will be defined by the documentation of the extractor class. (in this example the name will be QUERY)
  • Value: specify the value of the operation parameter.
  • Cript: if it is checked the data will be saved and cripted in the Data Base.
In this example the parameter define a query that select all the fields from a table named DT_PROJECT_MAN in the Database defined by the Data Source named S4Q_DS_DATA_BASE. Enter the required information and click on the "Save" button save.png. Than click the "Back" button back.png and you will see the new Operation Parameter just inserted in Operation Parameters list.

Test Operation

TODO

Map the data extracted from the data source to the column of the fact table

When an operation is executed it produced same data that will be insert in the fact table defined in the Interface Type. To Map the data extracted to the field of an Interface Type you have to do these operations:

  1. Click on the menu item "Extraction Process" and enter into the list of existing extraction processes.
  2. Click on the "View Operations" button check.png and you will enter into the list of existing Operations associated to a specific Extraction Process.
  3. Click on the "View Operation Field" button operationField.gif and you will enter into the list of existing Operation Field associated to a specific Operation.
3.3.4.a.png

In order to create a new operation field click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

3.3.4.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the field extracted.
  • Script: The combo defines a script that has the ability to transform a data extract from the data source before the insertion in the fact table.
  • Interface Field: it is the field of the Interface Type (the column of a Fact Table) where to put the data extracted from the operation.
Enter the required information and click on the "Save" button save.png . Than click the "Back" button back.png and you will see the new Operation Field just inserted in Operation Parameters list.

Create a Transformation Script

Sometime the data produced from the extraction process are incompatible with the format of the fact table. Sometime the data produced from the extraction process has to be transform before the insertion in the fact table. To perform these operations Spago4q has the capability to define a script and associated it to an operation field (as explained before). To create a script you have to do these operations:

*from the "Main Menu" select the "Extractors" item and than click on the menu item "Script" and enter into the list of existing Script.

In order to create a new Script click on the "Insert" button insert.png in the top right corner of the page and you will enter in a form like the following one:

3.3.5.b.png

In this form you can insert the fields:

  • Name (mandatory): it is the label of the Script.
  • Description: it describes the script
  • Script Type: The combo defines the type of the script:
    • Groovy: if you select this value you will insert a groovy script in the "Script" field. The groovy script will be implemented the "it.eng.spago4q.extractors.script.IMappingScript" interface.
    • Date Converter: If you select this value you will insert in the "Script" field the date format of the value extracted. This type of script is used to convert a date in the format compatible with the one used by the fact table.
For example if you want to convert a boolean value to 1 if the value is true and 0 otherwise you have to insert this value in the form:

name BooleanConverter
description Convert true to 1 0 oderwise
Script type Groovy
Script import it.eng.spago4q.extractors.script.IMappingScript;


public class Tester implements IMappingScript {
public String execute(String value) { String toReturn = "0"; if (value == "true") toReturn = "1"; return toReturn; }
Object o = new Tester();

If you want to test the script you can click on "Save and Test" button test.png. When you click this button should be appear a pop up like this

3.3.5.c.png

Insert an input value (like "true") and click on the "OK" button. You should see at the bottom of the form this information:

3.3.5.d.png

If the script produces the result that you expected click on the "Save" button save.png. Than click the "Back" button back.png and you will see the list of the scripts.

How to Monitor an extraction Configuration

you When you've finished the configuration of an extraction process (Data Sources, Interface Type and Operations), before the execution of the extraction process, you can check the configuration using a report that summarize all the configuration objects and parameters. To generate this report you have to do these operations:

Click on the menu item "Extraction Process" and enter into the list of existing extraction processes.

3.3.6.a.png

Click on the "Detail Process Report" button document.png and you will see a pdf report that summarizes all the configuration of the extraction process.

How to Export an extraction Configuration

From the "Main Menu" select the "Extractors" item and than "Import Export"

Main Menu -> Extractors -> Import Export

  • On the left side of the page insert a name for the package (Es: extractionConfiguration)
  • flag the operation(s) name you want to export
  • click on the image (up-right of the export frame)
  • save the package

Creator: oltolina on 2010/05/31 11:29
This wiki is licensed under a Creative Commons 2.0 license
XWiki Enterprise 2.7.33694 - Documentation
Spago4Q