IBM Books

DB2 Replication Guide and Reference


Data Transformation Requirements

Source data most likely has to be subsetted, transformed, or enhanced as part of the replication process, particularly in the support of decision support or data warehousing. This section sorts these requirements into those that are easily fulfilled by using the Control Center and those that require direct manipulation of the control tables.

Basic Data Enhancement

This section explains the following basic capabilities that the Control Center supports.

Choices of Target Table Types

You can use the following types of target tables:

User copy target tables
These are the most common target tables, and are copies of the replication source without an overhead timestamp column. These tables require a primary key.

Point-in-time target tables
These are plain copies of source tables with timestamp columns added to specify when updates were made. These tables require a primary key.

Aggregate tables
These tables are used to track data and changes to data. These tables are built when rows are added or appended to the table over time.

Queries with column functions (AVG, MAX, MIN, SUM, COUNT) can range over many thousands of qualifying rows, yet they can return very compact, easy-to-understand results. By adding a timestamp and aggregating the results of these calculations, you can track broad trends, while still retaining the benefits of data reduction and compact storage.

There are two kinds of aggregate tables:

Base aggregate tables
In these tables, calculations are made against the source table. This type of history is useful for tracking broad indicators that have relatively low volatility. Base aggregate tables can include calculations against stable base data; calculations are made according to the subscription frequency criteria. Use base aggregate tables if you need to summarize your source table contents on a regular basis. For example, you can use a base aggregate table to track a daily inventory of items from a source table.

Change aggregate tables
In these tables, calculations are made against change data, not base data. Each calculation ranges over the recent changes since the time of the last calculation. You can track insert, update, and delete operations collectively or individually, determined by the filtering predicates that you specify. If there are no recent updates, calculation is deferred, creating very compact storage. Use these tables if you need to summarize the results of the changes made between each Apply program refresh operation. You can use change aggregate tables on their own, where daily summaries of change events are required, or as input to an application that must apply the summaries to a cumulative table.

Noncondensed CCD target tables
These tables are used for history tables or audit trail tables. They can contain changes only, or they can include the initial state of the source table.

Condensed CCD target tables
These target tables can be used as an alternative to a point-in-time copy when it is important to keep the last state of any record, (that is, no records are to be deleted). A delete is processed as an update, with the IBMSNAP_OPERATION column acting as the logical delete flag.

Updateable replica target tables
These target tables can be used as sources for copying data back to the replication source table and are used in update-anywhere scenarios

Subsetting Targets

IBM Replication supports both vertical and horizontal subsetting of the source table. This means that you can specify that only a subset of the replication source table columns and rows are replicated to the target table, rather than all of the columns and rows:

Column subsetting
In some replication scenarios, you might not want to replicate all columns into the CD table. With vertical subsetting, you can define the CD table to have fewer columns than your base table. However, CD tables must contain sufficient key data for point-in-time copies, which are maintained with key-qualified predicates. Omitting columns from the target is appropriate if the column data types are not supported. Vertical subsetting is available for all tables except replica tables.

You can define vertical subsetting either:

Row subsetting
You can split a single source table into various target tables based on the source column contents (for example, department number or region code) or you can specify that only certain source rows be replicated. Use the advanced subscription options to define a WHERE clause when you define the subscripton. All target table types support row subsetting. You might need to define a view replication source to bring together all the columns needed to define the row subset.

Target table primary keys are assumed to be invariable. If one or more of the columns defined in the predicate are updated, you must specify replication logical partitioning key support when you define the replication source. Replication logical partitioning key support defines updates as DELETE and INSERT statements.

Source Views

Using the Control Center, you can define a join as a replication source. The joins can include only tables defined as replication sources. If the replication sources defined in the join have CD (or CCD) tables, a CD view is created from the replication sources' CD tables. The Capture program (or the Apply program, for CCD sources) maintains the control information for the joined replication sources and the CD views in the source server control tables.

Join views fill many requirements, both for denormalizing (restructuring) copies in data warehouse scenarios, enabling easier querying of copied data, and also for addressing the routing problem, sometimes called the database partitioning problem in distributed computing scenarios. For example, knowing where to send a bank account update may require a join of the account table with the customer table, in order to know which branch of the bank the customer deals with. Typically, production databases are normalized so that the geographic details, such as branch-number, are not stored redundantly throughout the production database. Views are also useful when you need to specify predicates that exceed the capacity of the ASN.IBMSNAP_SUBS_MEMBR.PREDICATES control table column. You can choose to manage your subsetting predicates through views rather than in the subscription meta data.

IBM Replication supports the following types of view definitions:

Before and After Images

Both before and after images can be defined in replication sources and subscriptions. A before-image column is a copy of a column before it is updated. DB2 logs both the before-image and after-image columns of the table for each change to the table. Before images do not make sense with a base aggregate target table type. All other target table types can make use of before-image columns.

Column Renaming

Source site column names can be renamed for point-in-time and user-copy target table types.

Computed Columns

New columns can be derived from the existing source columns on the basis of SQL expressions. New columns can be aggregate functions such as COUNT or SUM in the case of aggregate target table types, or simple derivations in the case of all other target table types.

User-defined functions for DB2 Universal Database can be specified by using the computed column facility in the Control Center.

Before and After Run-Time Processing

You can define run-time processing statements using SQL statements and stored procedures before or after the Apply program processes the replication subscription. This feature is useful for pruning CCD tables and controlling the sequence in which replication subscriptions are processed. The run-time processing statements can be run at the source server before the replication subscription is processed, and at both the source and target servers after the replication subscription is processed. For example, you can transform source data before replicating the data or target data after it is copied.

The stored procedures use the SQL CALL statement, newly supported by IBM Replication, without parameters. The run-time procedures are executed together in a single unit-of-work. You can also define acceptable SQLSTATEs for each processing statement.

Depending on the DB2 platform, logic can be invoked through the SQL before and after processing:

Advanced Data Enhancement

The transformations listed below require manipulation of the IBM Replication control tables outside of the Control Center. The techniques required to implement these transformations are not discussed in this book. For planning purposes, when you find that you will use many of these techniques in the replication scenario, you should allow additional implementation time.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]