Friday, March 18, 2011

ORA-08103: object no longer exists

From the docs:
ORA-08103: object no longer exists
Cause: The object has been deleted by another user since the operation began, or a prior incomplete recovery restored the database to a point in time during the deletion of the object.
Action: Action: Delete the object if this is the result of an incomplete recovery.

It started innocently enough.



Weird, I can see it in the schema browser.



I can select it from the schema browser and view the table properties.



I can issue a SELECT *



Strangely, the Count Rows just fails silently.

Enough of this GUI crap.

From the server itself, but first, the specs:
Host: Ubuntu 10.10 Desktop
RAM: 24 GB
CPU: Dual Xeon QuadCore something or another
VirtualBox: 4.0.4 r70112

Guest: Oracle Enterprise Linux 5.5 (Carthage)
RAM: 12 GB
Processors: 4
Shared Folders: 3
Guest Additions installed.
Database: 11gR2

Describe the table:
sql>desc eligibility
Name Null? Type
----------------------- -------- ----------------
FILE_ID NUMBER
RECORD_ID NUMBER(38)
RECIPIENT_ID VARCHAR2(12)
MONTH_OF NUMBER(4)
DISTRICT_CODE VARCHAR2(2)
COUNTY VARCHAR2(2)
ELIGIBILITY_FLAG VARCHAR2(1)
HMO_FLAG VARCHAR2(1)
NURSING_HOME_FLAG VARCHAR2(1)
MEDIPASS_FLAG VARCHAR2(1)
AGE NUMBER(38)
PROGRAM_CODE VARCHAR2(4)
GENDER VARCHAR2(1)
Good.

SELECT COUNT(*)
sql>SELECT COUNT(*) FROM eligibility;
SELECT COUNT(*) FROM eligibility
*
ERROR at line 1:
ORA-08103: object no longer exists
OK.
sql>SELECT * FROM eligibility WHERE rownum < 11;

FILE_ID RECORD_ID RECIPIENT_ID MONTH_OF DI CO E H N M AGE PROG G
---------- ---------- ------------ ---------- -- -- - - - - ---------- ---- -
52 1892424743 ACF9ASfEJrVw 1 07 48 6 0 0 0 27 MM P F
52 1892424744 ACF9ASfEJrVw 2 07 48 6 0 0 0 27 MM P F
52 1892424745 ACF9ASfEJrVw 3 07 48 6 0 0 0 28 MM P F
52 1892424746 ACF9ASfEJrVw 4 07 48 6 0 0 0 28 MM P F
52 1892424747 ACF9ASfEJrVw 5 07 48 1 0 0 0 28 MA R F
52 1892424748 ACF9ASfEJrVw 6 07 48 1 0 0 0 28 MA R F
52 1892424749 ACF9ASfEJrVw 7 07 48 1 0 0 0 28 MA R F
52 1892424750 ACF9ASfEJrVw 8 07 48 1 0 0 0 28 MA R F
52 1892424751 ACF9ASfEJrVw 9 07 48 1 0 0 0 28 MA R F
52 1892424752 ACF9ASfEJrVw 10 07 48 1 0 0 0 28 MA R F

10 rows selected.
Hah. You are there, sort of. Try to fool it.
sql>SELECT COUNT(*) FROM ( SELECT * FROM eligibility );
SELECT COUNT(*) FROM ( SELECT * FROM eligibility )
*
ERROR at line 1:
ORA-08103: object no longer exists
Barnacles.

If I remember correctly, this table took about 10 hours to load. Sometime later, I had another corrupted block, fortunately it was on a small table so I dropped it, the tablespace and the associated data file. Next time I restarted the server, I'm told to run fsck. Not good.

I run it with the -c switch. I can get back into it. Now this.

This is a DW type environment. ARCHIVELOG is not turned on. There is no backup.

So, should I drop and reload the dang thing? Or is it somehow recoverable?

Help. Winning answer will get a t-shirt.

Update 03/18/2011 10:46 PM EST
I tried Gary's method from below first, as it seemed the easiest to test out. There was a single bitmap index on the table, so I booted it:
DROP INDEX bmp_fileid_eligibility;
Now the test:
SELECT COUNT(*) FROM eligibility;
Waiting...
ORA-01578: ORACLE data block corrupted (file # 9, block # 2440127)
ORA-01110: data file 9: '/u01/app/oracle/oradata/TESTING/medicaid_01_04.dbf'
01578. 00000 - "ORACLE data block corrupted (file # %s, block # %s)"
*Cause: The data block indicated was corrupted, mostly due to software
errors.
*Action: Try to restore the segment containing the block indicated. This
may involve dropping the segment and recreating it. If there
is a trace file, report the errors in it to your ORACLE
representative.
Great. Same situation as last time. I ended up rebuilding the entire VM. I've never had these kinds of problems before...perhaps there is something wrong with my system setup or possibly how I setup Linux? I don't know. Sadly, I currently lack the skills necessary to figure it out. More importantly, I lack the time.

Following Martin's suggestion from last time, I have determined that the corrupt block belongs to the table and not the index (duh, I just dropped it and got that error). Next up, Joel's suggestion, CTAS.

Update 03/19/2011 1:46 AM EST
No go with CTAS
Error starting at line 1 in command:
CREATE TABLE elig
COMPRESS BASIC
NOLOGGING
AS
SELECT *
FROM eligibility
Error at Command Line:6 Column:5
Error report:
SQL Error: ORA-01578: ORACLE data block corrupted (file # 9, block # 2440127)
ORA-01110: data file 9: '/u01/app/oracle/oradata/TESTING/medicaid_01_04.dbf'
01578. 00000 - "ORACLE data block corrupted (file # %s, block # %s)"
*Cause: The data block indicated was corrupted, mostly due to software
errors.
*Action: Try to restore the segment containing the block indicated. This
may involve dropping the segment and recreating it. If there
is a trace file, report the errors in it to your ORACLE
representative.
Update 03/19/2011 3:46 AM EST
After reviewing my notes, I decided to drop and reload. This wasn't the large table that I have, it was only 10 GB or so of compressed data. I was able to drop the load time by 67% by moving the parallel hint from the INSERT section to the SELECT section. Originally it looked like this:
INSERT /*+ append parallel( e, 8 ) */ INTO claims
SELECT
col1,
...,
col10
FROM ext_claims;

COMMIT;
Changing it to this sped things up dramatically
INSERT /*+ append */ INTO claims
SELECT /*+ parallel( e, 8 ) */
col1,
...,
col10
FROM ext_claims;
After it was complete this time...
SQL>SHUTDOWN IMMEDIATE;
SQL>STARTUP MOUNT;
SQL>host

[oracle@medicaid ~]$ rman target=sys/testing@testing

Recovery Manager: Release 11.2.0.1.0 - Production on Sat Mar 19 04:30:46 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: TESTING (DBID=89312249)


RMAN> backup database;

Starting backup at 19-MAR-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=91 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
...
...
...
input datafile file number=00001 name=/u01/app/oracle/oradata/TESTING/system01.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/TESTING/sysaux01.dbf
input datafile file number=00005 name=/u01/app/oracle/oradata/TESTING/example01.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/TESTING/undotbs01.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/TESTING/users01.dbf
channel ORA_DISK_1: starting piece 1 at 19-MAR-11
channel ORA_DISK_1: finished piece 1 at 19-MAR-11
piece handle=/u01/app/oracle/product/11.2.0/dbhome/dbs/07m7j6h6_1_1 tag=TAG20110319T043341 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 01:05:26
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 19-MAR-11
channel ORA_DISK_1: finished piece 1 at 19-MAR-11
piece handle=/u01/app/oracle/product/11.2.0/dbhome/dbs/08m7jabs_1_1 tag=TAG20110319T043341 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 19-MAR-11
Now I have a good backup. Will continue to do this after anything major changes.

Still need to figure out what is happening, Gary's suggestion below to check out the BLOCK related init parameters will be a start.

5 comments:

EscVector said...

I'd say depends, but... In this case, if there's any question, load again. It may take as long to figure it out only to find you can't recover anyway. Just my quantitatively eased 3 cents.

Joel Garry said...

Can you create table as SELECT * FROM eligibility WHERE rownum < 11;

Are there indices on the table?

If the ctas doesn't work, how about setting a delimiter, spooling select * to a file, and using sqlloader?

If you find you are dying on a particular bogus block, there may be some way to force that to be skipped. (dbms_repair, skipping bad blocks, some other stuff I can't think of because I am trained to run for the train).

SydOracle said...

I'd suspect an unusable or invalid index on a not null column (or a bitmap index on any column). The count(*) would use the index, but a where rownum query wouldn't.

Try dropping any indexes on it. Or a full_scan hint. Or ALTER SESSION SET SKIP_UNUSABLE_INDEXES=TRUE

SydOracle said...

Have you considered
DB_BLOCK_CHECKING and DB_BLOCK_CHECKSUM next time ?

http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams039.htm

http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams040.htm#CHDDCEIC

They might at least indicate if the issue comes from memory corruption (picked up when DBWR writes block to disk from memory) or disk corruption (picked up when DBWR reads block from disk).

Some form of RAID would (in theory) help in the latter.

But then IANADBA (equivalent of "I am not a lawyer").

oraclenerd said...

@gary

those are definitely on the list for next time...hopefully i'll have more time than i do now.

muchas gracias.