DB2 Replication Guide and Reference

Staging Changed Data

One of the advantages of IBM Replication is that it allows you to stage changed data: the Capture program captures changes to a source table only once and inserts change data rows into a change data (CD) table, and the Apply program then pulls the changes from the CD table. If PRUNE is enabled, the Capture program also automatically prunes change data rows from CD tables when they are no longer needed; it does not, however, prune change data from consistent change data (CCD) tables.

CD and CCD Tables

A CD table receives an arbitrary number of uncondensed change data rows from the Capture program. The CD table has no knowledge of transaction boundaries, or of whether the transactions issuing the updates are committed, incomplete, or in flight. The Apply program joins the CD tables with the unit-of-work (UOW) table to determine which committed changes to apply to copies. Uncommitted changes are eventually pruned, depending on the retention limit that you define in the Capture program tuning parameters control table.

Rows in a CD table reflect changes that are equivalent, if not identical, to the original operational updates, and they can include uncommitted and incomplete changes.
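As a rough illustration of how the UOW join separates committed changes from in-flight ones, the following Python sketch uses an in-memory SQLite database to stand in for a CD table and a UOW table. The table names (cd_table, uow_table) and the reduced column set (uowid, intentseq, operation, commitseq, key1, col1) are hypothetical simplifications; the real DB2 control tables carry more columns, and the Apply program does more than this single join.

```python
import sqlite3


def committed_changes(conn: sqlite3.Connection) -> list:
    """Return CD rows whose unit of work has a matching UOW row (that is, committed)."""
    # Inner join: CD rows with no UOW row (uncommitted or in flight) drop out.
    return conn.execute(
        """
        SELECT u.commitseq, c.intentseq, c.operation, c.key1, c.col1
        FROM cd_table AS c
        JOIN uow_table AS u ON u.uowid = c.uowid
        ORDER BY u.commitseq, c.intentseq
        """
    ).fetchall()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-ins for a CD table and the UOW table.
    conn.execute(
        "CREATE TABLE cd_table (uowid TEXT, intentseq INTEGER, operation TEXT, key1 INTEGER, col1 TEXT)"
    )
    conn.execute("CREATE TABLE uow_table (uowid TEXT, commitseq INTEGER)")
    conn.executemany(
        "INSERT INTO cd_table VALUES (?, ?, ?, ?, ?)",
        [
            ("U1", 1, "U", 425, "X"),  # committed
            ("U2", 2, "U", 425, "Y"),  # in flight: no UOW row yet
            ("U3", 3, "I", 426, "Q"),  # committed
        ],
    )
    conn.executemany("INSERT INTO uow_table VALUES (?, ?)", [("U1", 100), ("U3", 101)])
    rows = committed_changes(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The row for unit of work U2 stays in the CD table but is not returned, which is the behavior the paragraph above describes for uncommitted changes.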

CCD staging tables are copies, defined in much the same way as point-in-time copies. They are the join of the CD and UOW tables and contain only committed change data. Figure 8 shows which columns of the UOW table are included in the CCD table.

Figure 8. The CCD Table. This table is the join of the UOW table at the source server and the CD table for the replication source table.

Table 7 shows the options available when you define these staging tables, and which of them the Control Center sets up for you.

Table 7. CCD Staging Table Attributes

CCD Is Local To | Is CCD Complete? | Is CCD Condensed? | How Can CCD Be Used in This Configuration?
Source table | Y¹ | Y | The CCD table is redundant with the local user table, and is therefore not usable as a history.
Remote copies | Y² | Y | As a remote staging table, where history retention is not required.
Source table | N² | Y | As a local staging table, which provides a stable source for synchronizing "fan out" copies.
Remote copies | N¹ | Y | By advanced users, who need to write their own apply programs to maintain indexed files or foreign DBMS tables. This configuration cannot support initialization of new point-in-time copies.
Source table | Y¹ | N | As a local history table.
Remote copies | Y² | N | As a remote history table.
Source table | N² | N | As a general-purpose, local staging table. When used with the original user table, this configuration can support copies that are complete histories. This table contains all change data for a given interval, but is not a complete history without the user table.
Remote copies | N¹ | N | By advanced users, whose application requirements are for change data only. This configuration cannot support initialization of point-in-time copies.

¹Table and definitions must be set up outside of the Control Center.

²The Control Center creates the table and sets up definitions.
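The footnote markers in Table 7 follow a simple pattern: the Control Center creates the table (footnote 2) when the CCD table is remote and complete, or local and noncomplete; every other combination (footnote 1) must be set up outside the Control Center. The Python sketch below encodes the table as a lookup so the rule can be checked mechanically. The type, dictionary, and function names are hypothetical and are not part of any Control Center interface.

```python
from typing import NamedTuple


class CcdOption(NamedTuple):
    local_to_source: bool  # True: CCD at the source table; False: at the remote copies
    complete: bool
    condensed: bool


# Usage column of Table 7, keyed by (local_to_source, complete, condensed).
USAGE = {
    CcdOption(True, True, True): "Redundant with the local user table; not usable as a history",
    CcdOption(False, True, True): "Remote staging table where history retention is not required",
    CcdOption(True, False, True): "Local staging table; stable source for fan-out copies",
    CcdOption(False, False, True): "Advanced users with their own apply programs; no new point-in-time copies",
    CcdOption(True, True, False): "Local history table",
    CcdOption(False, True, False): "Remote history table",
    CcdOption(True, False, False): "General-purpose local staging table",
    CcdOption(False, False, False): "Advanced users needing change data only; no point-in-time copies",
}


def control_center_sets_up(option: CcdOption) -> bool:
    """True for footnote 2 rows: locality and completeness differ
    (remote and complete, or local and noncomplete)."""
    return option.local_to_source != option.complete
```

Condensing is irrelevant to the footnote rule, which is why `control_center_sets_up` ignores the `condensed` field.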

Staging Tables

CCD staging tables hold captured changes from insert, update, or delete operations against a base table. CCD staging tables can be local to, or remote from, the original source.

A complete staging table contains every row of interest from the original source. A condensed staging table contains only the most current value for each row. Because condensed CCD tables do not have the same potential for unlimited growth as CD tables, they do not usually require pruning. Noncondensed CCD tables keep a row for every change, so they can serve as histories; as Table 7 shows, a noncondensed CCD table is a complete history only if it is also complete.

The Apply program inserts rows into noncondensed CCD tables; for condensed CCD tables, the Apply program updates rows already in the table. The Apply program uses CD and CCD tables as sources when maintaining replication subscriptions for point-in-time, change aggregate, user copy, and CCD target tables.
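The difference between the two kinds of CCD target, and why a condensed table stays bounded by the number of keys rather than the number of changes, can be sketched in a few lines of Python. The in-memory classes below are hypothetical stand-ins for DB2 tables; the single-character operation codes and the integer commit sequence are simplifications assumed for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    key: int              # primary key of the source row
    operation: str        # simplified codes: "I" insert, "U" update, "D" delete
    value: Optional[str]  # row value after the change (None for a delete)
    commitseq: int        # simplified stand-in for the commit sequence


@dataclass
class NoncondensedCCD:
    rows: List[Change] = field(default_factory=list)

    def apply(self, change: Change) -> None:
        # Every change is inserted as a new row, so the history of each key is kept.
        self.rows.append(change)


@dataclass
class CondensedCCD:
    rows: Dict[int, Change] = field(default_factory=dict)

    def apply(self, change: Change) -> None:
        # One row per key: a later change overwrites the existing row for that key,
        # or adds a row if the key is new. A delete would be kept as a row whose
        # operation is "D" (an assumption of this sketch).
        self.rows[change.key] = change


def demo():
    changes = [
        Change(425, "U", "X", 1),
        Change(425, "U", "Y", 2),
        Change(425, "U", "Z", 3),
        Change(426, "I", "Q", 4),
    ]
    history, latest = NoncondensedCCD(), CondensedCCD()
    for c in changes:
        history.apply(c)
        latest.apply(c)
    return history, latest
```

After `demo()`, the noncondensed table holds all four changes, while the condensed table holds two rows, one per key, with key 425 showing only its latest value.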

Benefits of Staging Data

You can use staging tables to:

Maintain complete histories of data changes.
Design generalized information and data warehouses.
Support a variety of data delivery configurations.
Minimize change data and unit-of-work join processing.
Condense "hot spot" updates before transmitting data.

For example, you can use staging tables as part of a replication scenario that includes data from IMS and other sources. IBM's DataPropagator NonRelational can deliver IMS change data into a CCD staging table. You can then define the CCD table as a replication source.

With staging tables, you can set up sophisticated distribution networks and balance your workload across multiple DBMSs. You can copy DB2 for MVS changes to a DB2 Universal Database database, and then have, for example, 50 other replication subscriptions referring only to the staging tables. In this way, the DB2 for MVS database needs to maintain only one set of copies directly, although dozens of sets of copies are maintained indirectly.

Using Internal, Local, and Remote CCD Tables

When you create the first CCD table in a replication subscription for a source table and store that CCD table at the source server, it is called an internal CCD table. Internal CCD tables for DB2 sources are created by joining the unit-of-work and CD tables locally; they serve as a local cache of committed change data.

If you create a second replication subscription with a CCD target table against that source table, and also store that CCD table at the source server, it is called a local CCD table; only the first CCD table at the source server is the internal CCD table. Figure 9 shows an example of an internal or local CCD table that replicates changes to two target tables.

Figure 9. An Internal or Local CCD Table. The internal or local CCD table is located at the source server.

A CCD table at a server other than the source server is called a remote CCD table. A remote CCD table can be located at the target server or at an intermediate staging server, as shown in Figure 10.
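The naming rules for internal, local, and remote CCD tables reduce to a small decision: where the CCD table lives, and whether it is the first CCD table at the source server. The following Python sketch states that decision explicitly. The function name, its parameters, and the server names in the usage note are hypothetical.

```python
from enum import Enum


class CcdKind(Enum):
    INTERNAL = "internal"
    LOCAL = "local"
    REMOTE = "remote"


def classify_ccd(ccd_server: str, source_server: str, existing_ccds_at_source: int) -> CcdKind:
    """Classify a new CCD target table for a given replication source.

    existing_ccds_at_source counts the CCD tables for this source that were
    already defined at the source server before this one.
    """
    if ccd_server != source_server:
        # Any CCD table away from the source server is remote,
        # whether it sits at the target server or at a staging server.
        return CcdKind.REMOTE
    # Only the first CCD table at the source server is the internal CCD.
    return CcdKind.INTERNAL if existing_ccds_at_source == 0 else CcdKind.LOCAL
```

For example, with a hypothetical source server SRCDB, the first CCD table placed at SRCDB is internal, a second one at SRCDB is local, and one placed at a staging server STAGEDB is remote.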

Figure 10. A Remote CCD Table. The remote CCD table is located at an intermediate staging server.

When you subscribe to a source table, the Control Center automatically defines certain target tables as replication sources for further copying; this is known as auto-registration. However, internal CCD tables are auto-registered differently from other CCD tables. When an internal CCD table is auto-registered, its replication source information is stored in the CCD_OWNER and CCD_TABLE columns of the source table's row in the register control table. Staging tables created in subsequent replication subscriptions are defined as replication source tables in their own right.

This difference in auto-registration means that you cannot create replication subscriptions against internal CCD tables. Instead, you create replication subscriptions against the replication source table, and the Apply program follows the hierarchy described in "How the Apply Program Selects a Source Table" to select the table from which to copy. In most cases, the Apply program chooses the CCD table associated with the source table.
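A heavily simplified Python sketch of the common case follows: the subscription names the registered source table, and if that table's register row records an internal CCD table in CCD_OWNER and CCD_TABLE, changes are read from it; otherwise they come from the CD table (joined with the UOW table). This does not reproduce the full selection hierarchy referenced above. The RegisterRow class, the cd_owner and cd_table fields, and the owner and table names in the test are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterRow:
    """Hypothetical, reduced view of a source table's row in the register control table."""
    source_owner: str
    source_table: str
    cd_owner: str
    cd_table: str
    ccd_owner: Optional[str] = None  # filled in when an internal CCD is auto-registered
    ccd_table: Optional[str] = None


def change_source_for(reg: RegisterRow) -> str:
    """Pick the table to read changes from for a subscription on reg's source table.

    Simplified: prefer the internal CCD recorded in the register row;
    otherwise fall back to the CD table (which Apply joins with the UOW table).
    """
    if reg.ccd_owner and reg.ccd_table:
        return f"{reg.ccd_owner}.{reg.ccd_table}"
    return f"{reg.cd_owner}.{reg.cd_table}"
```

Note that the subscription itself always refers to the source table; the internal CCD table appears only as a redirection recorded in the source table's register row.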

External Data Sources

Changes captured by applications or other system tools, such as DataPropagator NonRelational, can also be defined as sources for replication subscriptions. The external data source must provide a complete CCD table, and that CCD table must be kept up to date by the application. For example, if an IMS segment is the source, DataPropagator NonRelational updates the DB2 CCD table; if the source is not IMS, you need another program to update the CCD table. You can then use the Control Center to define the CCD table as a surrogate replication source table. The CCD table can be stored and defined as a replication source in any supported database, and you can then define replication subscriptions against it regardless of whether the original transaction updates occurred in an IMS or a DB2 database.

Transaction-Based versus Transaction-Consistent Replication: Using Internal CCD Tables to Reduce Network Load

IBM Replication supports both transaction-based replication (replication of every update issued by every transaction) and transaction-consistent replication (replication of just the net results of recent activity).

The following example illustrates the difference between the two types:

Transaction 1: UPDATE table1 SET col1 = 'X' WHERE key1 = 425;
               UPDATE table2 SET col2 = 'B' WHERE key2 = 425;
Transaction 2: UPDATE table1 SET col1 = 'Y' WHERE key1 = 425;
Transaction 3: UPDATE table1 SET col1 = 'Z' WHERE key1 = 425;

In transaction-based replication, all four updates in the three transactions are replicated. In transaction-consistent replication, only the second update in Transaction 1 and the update in Transaction 3 are replicated, because together they produce the same final values.
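The arithmetic of the example can be checked with a short Python sketch: transaction-based replication forwards every update, while transaction-consistent replication keeps only the last update for each (table, key) pair, preserving commit order. The Update record and the function names are hypothetical, and keying on the row rather than the column is a simplification that matches this example, where each update touches one column.

```python
from typing import List, NamedTuple


class Update(NamedTuple):
    txn: int
    table: str
    key: int
    column: str
    value: str


# The four updates from the example above, in commit order.
UPDATES = [
    Update(1, "table1", 425, "col1", "X"),
    Update(1, "table2", 425, "col2", "B"),
    Update(2, "table1", 425, "col1", "Y"),
    Update(3, "table1", 425, "col1", "Z"),
]


def transaction_based(updates: List[Update]) -> List[Update]:
    # Every update of every transaction is replicated.
    return list(updates)


def transaction_consistent(updates: List[Update]) -> List[Update]:
    # Walk backwards so the first update seen for a row is its latest one,
    # then restore commit order for the survivors.
    seen = set()
    kept = []
    for u in reversed(updates):
        row = (u.table, u.key)
        if row not in seen:
            seen.add(row)
            kept.append(u)
    kept.reverse()
    return kept
```

`transaction_based(UPDATES)` returns all four updates, while `transaction_consistent(UPDATES)` returns two: the table2 update from Transaction 1 and the table1 update from Transaction 3.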

Transaction-based replication is necessary in update-anywhere scenarios.

Where the intermediate updates are not needed, transaction-consistent replication is more efficient than transaction-based replication, because it produces the same final change data results while replicating fewer updates. This reduces network load and can increase the availability of the target table.

You can implement transaction-consistent replication by using the internal CCD table model. In this model, outbound queues are condensed before replication, keeping only the latest captured value for each row. The Apply program condenses the queues by copying noncondensed change data already in the CD table into an internal (local) condensed CCD table. This CCD is then used as the source for replicating changes to the target table. Figure 9 shows the configuration of an internal or local CCD table.

How External CCD Tables Are Refreshed (Cascade CCD Full Refresh)

When an Apply program refreshes an external CCD table, it deletes all of the rows in the pruning control table that are associated with that CCD table. The absence of these rows signals a full refresh and alerts the Apply program that replication subscriptions based on the external CCD table must also be fully refreshed.

The Apply program keeps track of replication subscriptions based on the external CCD table in the following way:

If the refreshed CCD table contains rows, the Apply program updates the value in the CCD_OLD_SYNCHPOINT column of the corresponding CCD row in the CD control table. The value is set to the minimum commit sequence value of the refresh rows applied to the CCD table.
If the CCD table is empty as a result of the refresh, the CCD_OLD_SYNCHPOINT value is set to binary zeros. Because you are responsible for pruning the CCD table, you must also maintain the CCD_OLD_SYNCHPOINT value after you prune it.
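The two rules in the list above can be expressed as a small Python sketch. It assumes, for illustration only, that commit sequence values are 10-byte binary strings compared bytewise; the function names are hypothetical and do not correspond to any DB2 interface.

```python
from typing import Sequence

# Assumption of this sketch: commit sequence values are 10-byte binary strings.
COMMITSEQ_LEN = 10
BINARY_ZEROS = bytes(COMMITSEQ_LEN)


def ccd_old_synchpoint(refresh_commitseqs: Sequence[bytes]) -> bytes:
    """Value recorded in CCD_OLD_SYNCHPOINT after a full refresh of an external CCD.

    The minimum commit sequence of the refresh rows applied to the CCD table,
    or binary zeros if the refresh left the CCD table empty.
    """
    if not refresh_commitseqs:
        return BINARY_ZEROS
    return min(refresh_commitseqs)


def needs_cascade_refresh(pruning_rows_for_ccd: int) -> bool:
    """Subscriptions based on the CCD table must be fully refreshed when the
    pruning control rows for that CCD table are missing (deleted by the refresh)."""
    return pruning_rows_for_ccd == 0
```

Because `min` compares `bytes` values lexicographically, the sketch treats the smallest byte string as the earliest commit, which holds only under the fixed-length assumption stated above.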

Developing a Data Warehouse with CCD Tables

You can define any number of replication subscriptions for CCD tables that refer to a source table. You can also define replication subscriptions that refer to CCD tables located remotely from the original source table; in this sense, remote CCD tables become surrogate source tables. This makes it possible to build distribution databases or warehouses that serve as sources for all other copies: changes captured on your operational systems need to be replicated only once to the warehouse database, where they are applied to CCD tables. These CCD staging tables can then serve as surrogate source tables for all other replication subscription definitions.
