Saturday, December 14, 2013

OGG-01973 The redo record indicates data loss on object <OBJECT,ID>

While tinkering with GoldenGate 11.2.1.0.X on an Oracle Database for a recent project, I actually managed to break it by my own mistakes. In my successful attempt to setup DDL replication between Source and Target, I tried to created a table on the source with the expectation that it would be ready on the target shortly. Instead, what I found was an ABENDED extract.

Let’s walk through the setup, steps, and error message.

Configuration

A single Extract writing to trail files that are ready by a single Replicat - both on the same host but, with different source and target databases. The E_GG Extract captures transactions from 2 schemas from the SKY database and the replicat R_GG applies them on the target database (WATER).
Steps

1. Check GoldenGate process status
[oracle@blue ggate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.13 17435036 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_131002.1206_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Oct 3 2013 02:39:46

Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.

GGSCI (blue.color) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING 
EXTRACT RUNNING E_GG 00:00:00 00:00:14 
REPLICAT RUNNING R_GG 00:25:39 00:23:27
2. All looks good. Create a table on the source schema being replicated.
[oracle@blue ~]$ sqlplus day/oracle

SQL*Plus: Release 11.2.0.3.0 Production on Mon Dec 9 14:05:00 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table breakgg as select * from dba_tables;

Table created. 
3. Check the Extracts status. ABENDED?
[oracle@blue ggate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.13 17435036 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_131002.1206_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Oct 3 2013 02:42:32

Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.

GGSCI (blue.color) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING 
EXTRACT ABENDED E_GG 00:00:00 00:00:14 
REPLICAT RUNNING R_GG 00:28:39 00:25:27
4. What does the report file tell us?
Source Context :
SourceModule : [er.redo.ora.sr]
SourceID : [/scratch/aime1/adestore/views/aime1_adc4150267/oggcore/OpenSys/src/app/er/redo/oracle/redoorasr.c]
SourceFunction : [get_subrec]
SourceLine : [6377]
ThreadBacktrace : [11] elements
: [/u01/app/ggate/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7ffdba940f3e]]
: [/u01/app/ggate/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2cc) [0x7ffdba939c8c]]
: [/u01/app/ggate/libgglog.so(_MSG_ERR_ORACLE_REDO_DATA_LOSS(CSourceContext*, unsigned int, CMessageFactory::MessageDisposition)
+0x2f) [0x7ffdba9185f1]]
: [/u01/app/ggate/extract() [0x701d69]]
: [/u01/app/ggate/extract(REDOORASR_get_all_subrecs(int, objectpool*, unsigned char*, subrec_info_t*, redo_thread_t*, log_contex
t_t*)+0x6c) [0x701e8c]]
: [/u01/app/ggate/extract(REDO_read(char*, int*, long*, char*, short*, int, char*)+0x7e4) [0x6a4308]]
: [/u01/app/ggate/extract(READ_EXTRACT_RECORD(chkpt_context_t*, short, char*, __std_rec_hdr*, int*, int*, long*, short, short, c
har*, short*, char*, short*, long*, char*)+0x19b) [0x4f9bbb]]
: [/u01/app/ggate/extract(process_extract_loop()+0x625) [0x565975]]
: [/u01/app/ggate/extract(main+0x5e6) [0x578166]]
: [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d1e41ed1d]]
: [/u01/app/ggate/extract(__gxx_personality_v0+0x3a2) [0x4f18ba]]

2013-12-09 14:05:30 ERROR OGG-01973 The redo record indicates data loss on object 68,687.

***********************************************************************
* ** Run Time Statistics ** *
***********************************************************************


Report at 2013-12-09 14:05:30 (activity since 2013-12-09 13:41:25)

Output to ./dirdat/lt:

 

DDL replication statistics:

Operations: 0
Mapped operations: 0
Unmapped operations: 0
Other operations: 0
Excluded operations: 0


No records extracted.


2013-12-09 14:05:30 ERROR OGG-01668 PROCESS ABENDING.
5. OGG-01973? Well, this just raises too many questions. A quick search on MOS yields only a single note “OGG v11.2 Extract amends with “OGG-01965 The redo record indicates data loss on object <object id>” or "OGG-01973 The redo record indicates data loss on object <object id>" (Doc ID 1483431.1)". In it, there’s a note that the table can not be in NOLOGGING mode. But wait, this a new table and by default LOGGING is turned on.

6. What was I missing? I looked through the report file again, and found this message.
2013-12-09 13:41:25  WARNING OGG-01970  Database is not in archiving mode. Extract may be forced to fetch LONG columns and may miss transaction data if th

e online logs are recycled.

2013-12-09 13:41:25  WARNING OGG-01972  Extract will fetch all LONG columns because archiving mode is disabled.

2013-12-09 13:41:25  WARNING OGG-01423  No valid default archive log destination directory found for thread 1.
7. Ok, is archived logging turned on? Ouch, that is probably it!
SQL> archive log list

Database log mode      No Archive Mode

Automatic archival      Disabled

Archive destination      USE_DB_RECOVERY_FILE_DEST

Oldest online log sequence     52

Current log sequence      54
8. Okay, lets turn it on.
SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup mount;

ORACLE instance started.

 

Total System Global Area  814227456 bytes

Fixed Size   2232760 bytes

Variable Size 297799240 bytes

Database Buffers 511705088 bytes

Redo Buffers   2490368 bytes

Database mounted.

SQL> alter database archivelog;

Database altered.

SQL> alter database open;

Database altered.
9. And try to create the table again. By the way, I was not able to skip the particular transaction in the Extract, and the only way around it (for the moment) was a rebuild of the extract and pump.

10. Once I did that, along with enabling archive log mode, DDL replication worked as expected!

In conclusion, follow the steps to pre-requisite checks to the letter i.e. database must have minimum supplemental logging enabled, check point table etc. Otherwise, you are simply inviting trouble! RTM my friends :) I hope you found this post helpful. I’ll have to spend more time to figure out how to skip a “bad” transaction in the Extract and blog about it at a later date.

Cheers!

No comments :

Post a Comment