Saturday, March 5, 2016

Recipes of Data Warehouse and Business Intelligence (Use case 02) - How to load a data file from network with control file

Analysis of the use case 2

This use case shows how to configure a data file that is present in a network folder (obviously it must be visible to the server that contains the Data Warehouse). Furthermore this file is coupled to the corresponding control file that, in order to be managed by MEF, must have the same name as the data file unlike its extension.
This situation typically occurs for two reasons.

1 - the feeding system can not (or do not want to do it) send the data file to the DWH
2 - the feeding system generates a data file that is used by multiple applications.


In this case we have take the data file (moving it or just copying it) and carry it to the
reception folder called rcv .
The control file contains no further information. His mere presence certifies that the file date is OK. We shall see, in a next use case, also in the case where the control file must contain the number of rows in the data file. This setting is preferable, as it allows us greater control on the consistency of the data file .
As in use case 01 follow the indications contained in the read_me file. Here I will describe in detail the sequence of operations.

Changes to MEF repository

In order to properly handle the move (or copy) of a data file from a network folder
to rcv folder, you can access to the Operating System using java functionality.
The original package mef_java was then modified to be able to execute
Operating System commands on both Linux and Windows. Consequently they were made of small interventions even on packages and mef mef_sta to use the new Java function.

Another intervention was necessary to also configure the information
"FIELDS TERMINATED BY" associated the Oracle external table. Ovviously, the data file can be constructed on very different systems between them, so some
settings can be valid on Windows but not on Linux and vice versa. The only thing
what we can do is to make configurable the feature of the external table and try to change the value if there are any problems.
In this use case, for example, the same data file, on Windows 64 had to be set as "0x 0d0a" but on Linux and Windows 32 as "NEWFILE". If the load gives error, try changing the value.

Before to start with configuration and run, do the work indicated in the read_me.txt to prepare the MEF environment.

Configuration of the data file

The data file used for this demonstration is that of the financial regulated markets
,downloadable from ESMA site (http://mifiddatabase.esma.europa.eu). For the purposes of demonstration, the network folder is instead a local folder that is initialized automatically after the installation of the MEF. 

You can safely try the use case with a your network folder, for example '\\ w2886038.bim.local \ shared'.
In this case, you must move the data file and the control file under the
'\\ W2886038.bim.local \ shared' folder and you have to change the pointing of the Oracle directory as follows:


CREATE OR REPLACE DIRECTORY
DWH_ESM_M_REGMAR_SRC AS
'\\w2886038.bim.local\shared';


The io_regmar.txt contains the setting for the data file. Set your correct value according to your Operating System. 

Configuration of the structure of the data file

The configuration is contained in the regmar.csv. There is nothing else to do. You have done all the work neeeded to process the data file. Connect to Oracle and run the configuration.

sqlplus etl/etl
SQL> @sta_conf_io REGMAR



Pre o post processing

Nothing to do.

Load of the data file

To load the data file, you must put it and its control file under the folder pointed from the DWH_ESM_M_REGMAR_SRC directory.