Monday, February 21, 2011

ORA-01578: ORACLE data block corrupted (file # 7, block # 42061)

I've been loading the raw data into my sandbox for my little side project (Florida Medicaid reporting database).

Since much of this is throw away, I am not terribly concerned with repeatable processes, just get the data in there.

I have about 100 files to load, ~250 GBs worth. I've created a spreadsheet with the file names, the associated external table, whether it has been loaded or not and the data loaded. This is just for my own sanity, as I do this when time permits and I don't want to lose track of my progress.

Using the external table name and file name, I've created a formula to build my ALTER statements (ALTER TABLE <table_name> LOCATION ( '<file_name>' );). Each file takes 3 to 5 minutes to load.

The other morning, around 3 AM, I got to the 10th file and received a communication error with the server. Being so late, I didn't bother with researching the issue, I just retired.

The next day, I attempted to look into the issue, first I wanted to see which files were loaded.
SELECT COUNT(*)
FROM orig_claim;

SELECT COUNT(*) FROM orig_claim
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 42061)
ORA-01110: data file 7: '/u01/app/oracle/oradata/TESTING/medicaid_02.dbf'
Uh oh.

As I usually do, I posted that error message to Twitter.

Twitter = Awesomesauce Part 983



Here are some answers to the questions above:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>SELECT
instance_name,
host_name,
version,
status,
archiver
FROM v$instance;

INSTANCE_NAME HOST_NAME VERSION STATUS ARCHIVE
---------------- ---------- ----------------- ------------ -------
TESTING oracle 11.2.0.1.0 OPEN STOPPED

SQL> SELECT
dbid,
name,
log_mode
FROM v$database;

DBID NAME LOG_MODE
---------- --------- ------------
86530622 TESTING NOARCHIVELOG
Also, since I just started loading data, I had not performed a backup yet.

Other possibly pertinent information:
- Using COMPRESS BASIC
- Just before I had the communication error, I had enabled PARALLEL DML for the session and set it to 8 as a hint in the SQL statement
- INSERT /*+ APPEND */ from an external table into a staging table
- Oracle Enterprise Linux 5.5 (Carthage)
- 6 GB of RAM allocated to Oracle
- 4 CPUs allocated to this VirtualBox virtual machine
SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 100939 1 0 CHECKSUM
7 42061 1 0 CHECKSUM
I tried using RMAN (BLOCKRECOVER DATAFILE 7 BLOCK 42061;), but naturally it had nothing to recover from. I had pondered using DBMS_REPAIR, but I was exactly sure of the implications, i.e. would I have to reload specific data files.



From Martin Berger:
An answer to his question below, what is the corrupted object?
SELECT 
tablespace_name,
segment_type,
owner,
segment_name
FROM dba_extents
WHERE file_id = 7
AND 42061 BETWEEN block_id AND block_id + blocks - 1;
It's a table. The ORIG_CLAIM table to be exact. That's the only core table I have begun to load.

Dropping and recreating it won't be terribly difficult, way better than rebuilding the entire VM/DB.



After identifying that a specific table had a corrupt block, I dropped it per Martin's guidance.

Since I wanted to keep the original (raw) data along with the, dimensional model, I decided to create a separate tablespace to house the raw data. I did that and began to load the data.

That's when my new joy happened:
ORA-01578: ORACLE data block corrupted (file # 8, block # 26031)
At first, at I was just going to drop and recreate the database. Issued the SHUTDOWN command then started it back up in RESTRICTED mode, STARTUP NOMOUNT EXCLUSIVE RESTRICTED, then dropped the thing. After a bit more thinking, I just decided to rebuild the entire VM.

The good news there, I get to mess around with Oracle Enterprise Linux 6. The bad news, more time.

6 comments:

Stewart Bryson said...

On another note... an external table can have multiple files associated with it, so you don't have to alter it to change the files. Just alter the external table to contain all the files, and do a single select against it.

oraclenerd said...

I might have known that, but obviously forgot. Here's the link to the documentation that describes it.

It appears that it is just a comma separated list, cool. Unfortunately, I'm updating the FILE_ID for each file loaded (want to be able to easily find the source after it has all been moved around), so I can't do that, unless...

Martin Berger said...

ok, I still stick to Note:28814.1.
Let's find the object in trouble:


SELECT tablespace_name,
segment_type,
owner,
segment_name
FROM dba_extents
WHERE file_id = 7
AND 42061 BETWEEN block_id AND block_id + blocks - 1


If the segment_type is a table, just drop that particular table and rel-load it.
If it's and index, rebuild it (not online).
If it's something else ... le's think about it later.

have fun!

oraclenerd said...

@Martin

Added the results of that query to the post, it's a table.

So you're saying I don't have to rebuild the entire VM or database? Correct?

Martin Berger said...

I see no reason why you should do more work than necessary (lazy people are good in tuning).
Just drop the object and re-load it.
If such an error re-occurs, it might be worth investigating it.
I am just somewhat irritated by the fact of a block corruption itself. Can we trust your full stack? We will see in the future.

oraclenerd said...

That's excellent news sir, thank you.

I'll drop and begin the load process again. Hopefully this post will die it's natural death. :)