Source data most likely has to be subsetted, transformed, or enhanced as part of the replication process, particularly in the support of decision support or data warehousing. This section sorts these requirements into those that are easily fulfilled by using the Control Center and those that require direct manipulation of the control tables.
This section explains the following basic capabilities that the Control Center supports.
You can use the following types of target tables:
Queries with column functions (AVG, MAX, MIN, SUM, COUNT) can range over many thousands of qualifying rows, yet they can return very compact, easy-to-understand results. By adding a timestamp and aggregating the results of these calculations, you can track broad trends, while still retaining the benefits of data reduction and compact storage.
There are two kinds of aggregate tables:
IBM Replication supports both vertical and horizontal subsetting of the source table. This means that you can specify that only a subset of the replication source table columns and rows are replicated to the target table, rather than all of the columns and rows:
You can define vertical subsetting either:
Target table primary keys are assumed to be invariable. If one or more of the columns defined in the predicate are updated, you must specify replication logical partitioning key support when you define the replication source. Replication logical partitioning key support defines updates as DELETE and INSERT statements.
Using the Control Center, you can define a join as a replication source. The joins can include only tables defined as replication sources. If the replication sources defined in the join have CD (or CCD) tables, a CD view is created from the replication sources' CD tables. The Capture program (or the Apply program, for CCD sources) maintains the control information for the joined replication sources and the CD views in the source server control tables.
Join views fill many requirements, both for denormalizing (restructuring) copies in data warehouse scenarios, enabling easier querying of copied data, and also for addressing the routing problem, sometimes called the database partitioning problem in distributed computing scenarios. For example, knowing where to send a bank account update may require a join of the account table with the customer table, in order to know which branch of the bank the customer deals with. Typically, production databases are normalized so that the geographic details, such as branch-number, are not stored redundantly throughout the production database. Views are also useful when you need to specify predicates that exceed the capacity of the ASN.IBMSNAP_SUBS_MEMBR.PREDICATES control table column. You can choose to manage your subsetting predicates through views rather than in the subscription meta data.
IBM Replication supports the following types of view definitions:
Both before and after images can be defined in replication sources and subscriptions. A before-image column is a copy of a column before it is updated. DB2 logs both the before-image and after-image columns of the table for each change to the table. Before images do not make sense with a base aggregate target table type. All other target table types can make use of before-image columns.
Source site column names can be renamed for point-in-time and user-copy target table types.
New columns can be derived from the existing source columns on the basis of SQL expressions. New columns can be aggregate functions such as COUNT or SUM in the case of aggregate target table types, or simple derivations in the case of all other target table types.
User-defined functions for DB2 Universal Database can be specified by using the computed column facility in the Control Center.
You can define run-time processing statements using SQL statements and stored procedures before or after the Apply program processes the replication subscription. This feature is useful for pruning CCD tables and controlling the sequence in which replication subscriptions are processed. The run-time processing statements can be run at the source server before the replication subscription is processed, and at both the source and target servers after the replication subscription is processed. For example, you can transform source data before replicating the data or target data after it is copied.
The stored procedures use the SQL CALL statement, newly supported by IBM Replication, without parameters. The run-time procedures are executed together in a single unit-of-work. You can also define acceptable SQLSTATEs for each processing statement.
Depending on the DB2 platform, logic can be invoked through the SQL before and after processing:
The transformations listed below require manipulation of the IBM Replication control tables outside of the Control Center. The techniques required to implement these transformations are not discussed in this book. For planning purposes, when you find that you will use many of these techniques in the replication scenario, you should allow additional implementation time.
When the target table is maintained by different servers or different replication subscriptions, a subset view ensures that the DELETE statement generated before a full-refresh is copied to the appropriate horizontal fragment. The Apply program is almost always driven by SQL operations at the source server and in addition, generates an unqualified DELETE statement at the target server before applying a full-refresh answer set. By defining a subset view over the target table, and defining the replication subscription target to be the view, you restrict the DELETE statement to the appropriate horizontal fragment. That way, a full refresh of data from the source does not affect the information from the other sources.
Although IBM Replication might be able to handle these individually, there are some extreme situations where it is not useful to attempt replication before some manual data cleansing is done.
General rule: If the data transformation can be expressed by using SQL, IBM Replication will almost certainly be able to execute it. The SQL is introduced either as part of the SELECT statement that does the copying or as part of the user-specified SQL before and after statements. This SQL might include procedural logic by including database triggers, user-defined functions, and stored procedure calls as described in Basic Data Enhancement.
IBM Replication can also be supplemented with procedural code in user-written programs. For example, a program could be scheduled to manipulate data in the CD or CCD tables before execution of the Apply program. Or, if the changes are so radical that the data cannot be changed in place, you could go as far as writing a program to read the data changes from the CD or CCD tables and then applying them to the target tables yourself. (See Defining SQL Statements or CALL Procedures for the Replication Subscription to learn about running SQL and Call procedures with replication subscriptions.)