iEntry 10th Anniversary News Articles

Data Transformation

Filed under: General — admin at 5:11 pm on Saturday, July 26, 2008

An amount of data transformation will almost certainly need to be performed. For example, the source may supply more fields than are required; these fields will need to be stripped during the load. The source fields may be in formats different from that of the RDBMS you are using, requiring the format to be changed. There is often a requirement for a single field to be broken into two, or for multiple fields to be compacted into one. Extra fields, such as a sequence number to be used as a unique identifier, may need to be added. Other fields may need to be derived from some combination of the supplied data and data already in the data warehouse. If more complex data mapping or transformation is required, you will need programs to carry out the transformations. You may need to look at using a copy management or transformation tool.

If the data is being transferred directly via a gateway product into the database rather than by file, the load manager can use the gateway product itself to check how many records are to be transferred. It can then check at the data warehouse end that -the correct number of records have been transferred. However it is done, these checks must be implemented to ensure that all the correct data makes it from source to data warehouse. You also need to ensure that no data gets loaded twice.

Load Manager in a Warehouse

Filed under: General — admin at 5:08 pm on Saturday, July 19, 2008

The load manager is responsible for any data transformation required and for the loading of data into the database. These responsibilities can be summarized as data source interaction, data transformation and data load. The depth and complexity of the requirements placed on the load manager by each of these areas vary radically from data warehouse to data warehouse. This means that the load manager can vary from the simplicity of a few simple load scripts to an enormously involved development involving data transformation tools, database to database gateways and complex load programs.

It is not uncommon for the load manager to start out simple and to grow in complexity as data sources and user requirements are added. It is also possible for the development to go the other way, as data sources merge or are replaced. In particular, if the organization has multiple data warehouses, one can often become a greatly simplified source of data for another.

If a copy management tool is being used to transfer the data from the source system, this task becomes the responsibility of the copy management tool. In this case you need to ensure that the tool is carrying out the relevant checks. For example, you need to ensure that the tool verifies that the whole of a day’s data gets transferred, and not just part of it.

Data Warehouse Process Managers

Filed under: General — admin at 5:06 pm on Friday, July 4, 2008

The data warehouse process managers are pieces of software responsible for the flow, maintenance and upkeep of the data, both into and out of the data warehouse database. There are three different data warehouse process managers – load manager, warehouse manager and the query manager. Each process manager has its own responsibilities and complexities, and there are potential overlaps between the responsibilities of each process. It will help to avoid any ambiguities if we draw some strict boundaries within which each process will work.

The other useful feature of having a database-aware backup recovery manager is that the different parts of the database can be addressed by their database names, and not by the often abbreviated and unreadable file names. One last point worth mentioning; ideally, the backup software used should work across clusters or MPP platforms if that is the environment being used. The backup recovery manager should also be capable of this.