IKM Incremental Update



The Incremental Update strategy is used to integrate data in the target table by comparing the records of the flow with existing records in the target according to a set of columns called the "update key". Records that have the same update key are updated when their associated data is not the same. Those that don't yet exist in the target are inserted. This strategy is often used for dimension tables when there is no need to keep track of the records that have changed.
The challenge with such IKMs is to use set-oriented SQL based programming to perform all operations rather than using a row-by-row approach that often leads to performance issues. The most common method to build such strategies often relies on the integration table ("I$") which stores the transformed source sets. This method is described below:
  1. Drop (if it exists) and create the integration table in the staging area. This is created with the same columns as the target table so that it can be passed to the CKM for flow control. It also contains an IND_UPDATE column that is used to flag the records that should be inserted ("I") and those that should be updated ("U").
  2. Transform and insert data in the loading table from the sources and loading tables using a single INSERT/SELECT statement. The IND_UPDATE column is set by default to "I".
  3. Recycle the rejected records from the previous run to the integration table if the RECYCLE_ERROR KM option is selected.
  4. Call the CKM for flow control. The CKM will evaluate every constraint defined for the target table on the integration table data. It will create an error table and insert the erroneous records into this table. It will also remove erroneous records from the integration table.
  5. Update the integration table to set the IND_UPDATE flag to "U" for all the records that have the same update key values as the target ones. Therefore, records that already exist in the target will have a "U" flag. This step is usually an UPDATE/SELECT statement.
  6. Update the integration table again to set the IND_UPDATE column to "N" for all records that are already flagged as "U" and for which the column values are exactly the same as the target ones. As these flow records match exactly the target records, they don't need to be used to update the target data.
a.      After this step, the integration table is ready for applying the changes to the target as it contains records that are flagged:
    1. "I": these records should be inserted into the target.
    2. "U": these records should be used to update the target.
    3. "N": these records already exist in the target and should be ignored.
  1. Update the target with records from the integration table that are flagged "U". Note that the update statement is typically executed prior to the INSERT statement to minimize the volume of data manipulated.
  2. Insert records in the integration table that are flagged "I" into the target.
  3. Commit the transaction.
  4. Drop the temporary integration table.
Optimization
This approach can be optimized depending on the underlying database. The following examples illustrate such optimizations:
  • With Teradata, it may be more efficient to use a left outer join between the flow data and the target table to populate the integration table with the IND_UPDATE column already set properly.
  • With Oracle, it may be more efficient in some cases to use a MERGE INTO statement on the target table instead of an UPDATE then INSERT.
Update Key
The update key should always be unique. In most cases, the primary key will be used as an update key. The primary key cannot be used, however, when it is automatically calculated using an increment such as an identity column, a rank function, or a sequence. In this case an update key based on columns present in the source must be used.
Comparing Nulls
When comparing data values to determine what should not be updated, the join between the integration table and the target table is expressed on each column as follow:
<target_table>.ColumnN = <loading_table>.ColumnN or (<target_table> is null and <loading_table>.ColumnN is null)
This is done to allow comparison between null values, so that a null value matches another null value. A more elegant way of writing it would be to use the coalesce function. Therefore the WHERE predicate could be written this way:
<%=odiRef.getColList("","coalesce(" + odiRef.getTable("L", "INT_NAME", "A") + ".[COL_NAME], 0) = coalesce(T.[COL_NAME], 0)", " \nand\t", "", "((UPD and !TRG) and !UK) ")%>
Column-Level Insert/Update Behavior
Columns updated by the UPDATE statement are not the same as the ones used in the INSERT statement. The UPDATE statement uses selector "UPD and not UK" to filter only mappings marked as "Update" in the interface and that do not belong to the update key. The INSERT statement uses selector "INS" to retrieve mappings marked as "insert" in the interface.
Transaction
It is important that the UPDATE and the INSERT statements on the target belong to the same transaction. Should any of them fail, no data will be inserted or updated in the target.

No comments:

Post a Comment