ORACLENERD: rman

Showing posts with label rman.

Monday, March 21, 2011

ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

All I seem to post are my issues...well, technical issues.

On Friday I encountered ORA-08103: object no longer exists. The result there, I had to re-re-re-re-re-load the files. Once I did that, I realized that the file name had actual meaning...In the table itself was a column called MONTH which was relative to the fiscal year (July - June). Guess where fiscal year was? Yup, the file name. Ugh.

So, as soon as I got the data loaded, I did this:

RMAN>BACKUP DATABASE;

I didn't want to go through the corrupted block issue for a 3rd time. After some more work, moving data around, I checked out the RMAN docs and decided to perform an incremental backup. Well, I'm still learning; a level 0 incremental copies every used block, so I apparently did the whole thing again.

RMAN>BACKUP DATABASE INCREMENTAL LEVEL 0;

OK, now I'm running out of space, I have 500 GB allocated to this VM.

I read through the RMAN docs some more, BACKUP BACKUPSET. Yeah, backing up your backup. Inception.

I wanted to save this to my Shared Folders that now auto-mount (as of VirtualBox 4.0).

RMAN> BACKUP BACKUPSET 4
2>   FORMAT '/media/sf_files/oradata/TESTING/20110320_%u'
3>   DELETE INPUT;

Starting backup at 20-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=13, stamp=746274912, piece=1
channel ORA_DISK_1: starting piece 1 at 20-MAR-11
channel ORA_DISK_1: backup piece /u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/20/2011 21:36:53
ORA-19504: failed to create file "/media/sf_files/oradata/TESTING/20110320_0dm7mg30"
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 3
ORA-19600: input file is backup piece  (/u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1)
ORA-19601: output file is backup piece  (/media/sf_files/oradata/TESTING/20110320_0dm7mg30)

Checking the Google Machine took me to Kevin Closson's post, "Mount Options for Oracle over NFS. It's All About the Port." Those options are the following:

rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
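If you want to see which of those options a given mount string is missing, here is a minimal sketch. It only compares two comma-separated option strings; the sample string at the bottom is made up, and the function names are mine. It does not claim to reproduce whatever check Oracle runs internally, and some options (bg, for one) may not be echoed back by the OS mount table at all.

```python
# Options quoted in the post above, taken as-is.
REQUIRED = "rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0"


def parse_options(option_string):
    """Turn 'rw,vers=3' into {'rw': None, 'vers': '3'}."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value if value else None
    return options


def missing_options(actual, required=REQUIRED):
    """Return required options that are absent from, or set differently in, `actual`."""
    have = parse_options(actual)
    want = parse_options(required)
    return sorted(
        key if value is None else f"{key}={value}"
        for key, value in want.items()
        if key not in have or have[key] != value
    )


if __name__ == "__main__":
    # Hypothetical option string for a mount that lacks most of the list.
    print(missing_options("rw,nodev,relatime"))
```

Anything it prints is a candidate to add to the -o list of the mount command.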

OK, I'm still new to Linux and now my Shared Folders auto-mount, WTF do I do? More Google Machine goodness took me to a post on the OTN Forums with the exact command I needed to run (which is of course, why I am writing this up now...).

[root@medicaid media]# mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 -t nfs \
    oraclenerd:/media/data_2/files/oradata/TESTING /media/test

See that little trick I picked up? I used oraclenerd:/media/data_2/... instead of relying on the VirtualBox magic. Nice.

No errors, I then checked out the permissions:

[root@medicaid media]# ls -la
total 36
drwxr-xr-x  6 root root   4096 Mar 21 01:37 .
drwxr-xr-x 28 root root   4096 Mar 18 02:13 ..
-rw-r--r--  1 root root      0 Mar 18 02:13 .hal-mtab
drwxrwx---  1 root vboxsf 4096 Mar  4 04:18 sf_data
drwxrwx---  1 root vboxsf 8192 Mar  8 13:59 sf_exp
drwxr-xr-x  1 root vboxsf 4096 Mar 20 21:18 sf_files
drwxr-xr-x  2 root root   4096 Mar 21 01:37 test

OK, let's change those:

[root@medicaid media]# chown oracle:dba test/ -R
[root@medicaid media]# ls -la
total 36
drwxr-xr-x  6 root   root   4096 Mar 21 01:37 .
drwxr-xr-x 28 root   root   4096 Mar 18 02:13 ..
-rw-r--r--  1 root   root      0 Mar 18 02:13 .hal-mtab
drwxrwx---  1 root   vboxsf 4096 Mar  4 04:18 sf_data
drwxrwx---  1 root   vboxsf 8192 Mar  8 13:59 sf_exp
drwxr-xr-x  1 root   vboxsf 4096 Mar 20 21:18 sf_files
drwxr-xr-x  2 oracle dba    4096 Mar 21 01:37 test

Sweet. Now, run the RMAN backup of a backup again:

RMAN> BACKUP BACKUPSET 4
2>   FORMAT '/media/test/2011_%u'
3>   DELETE INPUT;

Starting backup at 21-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=13, stamp=746274912, piece=1
channel ORA_DISK_1: starting piece 1 at 21-MAR-11
channel ORA_DISK_1: backup piece /u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1

No errors...and it seems to be running fine. If it breaks, I'll update.

Update: 03/21/2011 02:41 AM

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/21/2011 02:39:39
ORA-19502: write error on file "/media/test/2011_0dm7mg30", block number 10173824 (block size=8192)
ORA-27072: File I/O error
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 4
Additional information: 10173824
Additional information: 630784
ORA-19502: write error on file "/media/test/2011_0dm7mg30", block number 10173824 (block size=8192)

Barnacles.
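For a rough sense of how far the copy got before it died, multiply the block number in ORA-19502 by the block size. This is a back-of-the-envelope sketch, assuming the block number counts from the start of the output file in units of the reported block size; whether the first block is 0 or 1 makes no difference at this scale.

```python
BLOCK_SIZE = 8192        # bytes, from "block size=8192" in ORA-19502
FAILED_BLOCK = 10173824  # block number reported in ORA-19502


def offset_bytes(block_number, block_size):
    """Approximate byte offset of a block within the output file."""
    return block_number * block_size


if __name__ == "__main__":
    offset = offset_bytes(FAILED_BLOCK, BLOCK_SIZE)
    # Prints: 83343966208 bytes, about 77.6 GiB
    print(f"{offset} bytes, about {offset / 2**30:.1f} GiB")
```

So the write had gotten roughly 77.6 GiB into the piece when the I/O error hit.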

Monday, February 21, 2011

ORA-01578: ORACLE data block corrupted (file # 7, block # 42061)

I've been loading the raw data into my sandbox for my little side project (Florida Medicaid reporting database).

Since much of this is throw away, I am not terribly concerned with repeatable processes, just get the data in there.

I have about 100 files to load, ~250 GBs worth. I've created a spreadsheet with the file names, the associated external table, whether it has been loaded or not and the data loaded. This is just for my own sanity, as I do this when time permits and I don't want to lose track of my progress.

Using the external table name and file name, I've created a formula to build my ALTER statements (ALTER TABLE <table_name> LOCATION ( '<file_name>' );). Each file takes 3 to 5 minutes to load.
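The same thing a spreadsheet formula does can be sketched in a few lines of Python. The table and file names below are placeholders, not my real ones, and the function name is just something I made up for illustration.

```python
def alter_location(table_name, file_name):
    """Build the ALTER statement that points an external table at a file."""
    escaped = file_name.replace("'", "''")  # double any single quotes for SQL
    return f"ALTER TABLE {table_name} LOCATION ( '{escaped}' );"


if __name__ == "__main__":
    # Hypothetical rows standing in for the spreadsheet: (external table, file name).
    rows = [
        ("ext_claim", "claims_01.txt"),
        ("ext_claim", "claims_02.txt"),
    ]
    for table_name, file_name in rows:
        print(alter_location(table_name, file_name))
```

Each printed line is one statement to run before the next load.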

The other morning, around 3 AM, I got to the 10th file and received a communication error with the server. Being so late, I didn't bother with researching the issue, I just retired.

The next day, I attempted to look into the issue, first I wanted to see which files were loaded.

SELECT COUNT(*)
FROM orig_claim;

SELECT COUNT(*) FROM orig_claim
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 42061)
ORA-01110: data file 7: '/u01/app/oracle/oradata/TESTING/medicaid_02.dbf'

Uh oh.

As I usually do, I posted that error message to Twitter.

Twitter = Awesomesauce Part 983

Here are some answers to the questions that came back:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>SELECT 
  instance_name, 
  host_name, 
  version, 
  status, 
  archiver 
FROM v$instance;

INSTANCE_NAME    HOST_NAME  VERSION           STATUS       ARCHIVE
---------------- ---------- ----------------- ------------ -------
TESTING          oracle     11.2.0.1.0        OPEN         STOPPED

SQL> SELECT 
  dbid, 
  name, 
  log_mode 
FROM v$database;

      DBID NAME      LOG_MODE
---------- --------- ------------
  86530622 TESTING   NOARCHIVELOG

Also, since I just started loading data, I had not performed a backup yet.

Other possibly pertinent information:
- Using COMPRESS BASIC
- Just before I had the communication error, I had enabled PARALLEL DML for the session and set it to 8 as a hint in the SQL statement
- INSERT /*+ APPEND */ from an external table into a staging table
- Oracle Enterprise Linux 5.5 (Carthage)
- 6 GB of RAM allocated to Oracle
- 4 CPUs allocated to this VirtualBox virtual machine

SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         4     100939          1                  0 CHECKSUM
         7      42061          1                  0 CHECKSUM

I tried using RMAN (BLOCKRECOVER DATAFILE 7 BLOCK 42061;), but naturally it had nothing to recover from. I had pondered using DBMS_REPAIR, but I wasn't exactly sure of the implications, i.e. would I have to reload specific data files.

From Martin Berger:
An answer to his question below, what is the corrupted object?

SELECT 
  tablespace_name, 
  segment_type,
  owner, 
  segment_name 
FROM dba_extents 
WHERE file_id = 7 
  AND 42061 BETWEEN block_id AND block_id + blocks - 1;

It's a table. The ORIG_CLAIM table to be exact. That's the only core table I have begun to load.
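The BETWEEN in that query is just a range check: an extent starts at block_id and covers blocks consecutive blocks, so its last block is block_id + blocks - 1. Here is a small sketch of the same logic outside the database. The extents are made up for illustration; only the file and block numbers come from the error message.

```python
from collections import namedtuple

Extent = namedtuple("Extent", "segment_name file_id block_id blocks")


def find_segment(extents, file_id, block):
    """Return the segment whose extent covers (file_id, block), or None."""
    for ext in extents:
        first = ext.block_id
        last = ext.block_id + ext.blocks - 1  # last block is inclusive
        if ext.file_id == file_id and first <= block <= last:
            return ext.segment_name
    return None


if __name__ == "__main__":
    # Made-up extents; the real ones live in dba_extents.
    extents = [
        Extent("SOME_INDEX", 7, 41000, 1024),  # covers blocks 41000..42023
        Extent("ORIG_CLAIM", 7, 42024, 128),   # covers blocks 42024..42151
    ]
    print(find_segment(extents, 7, 42061))
```

Forget the - 1 and a block sitting right at the start of the next extent gets pinned on the wrong segment.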

Dropping and recreating it won't be terribly difficult, way better than rebuilding the entire VM/DB.

After identifying that a specific table had a corrupt block, I dropped it per Martin's guidance.

Since I wanted to keep the original (raw) data along with the dimensional model, I decided to create a separate tablespace to house the raw data. I did that and began to load the data.

That's when my new joy happened:

ORA-01578: ORACLE data block corrupted (file # 8, block # 26031)

At first, I was just going to drop and recreate the database. I issued the SHUTDOWN command, started it back up in RESTRICTED mode (STARTUP NOMOUNT EXCLUSIVE RESTRICTED), then dropped the thing. After a bit more thinking, I just decided to rebuild the entire VM.

The good news there, I get to mess around with Oracle Enterprise Linux 6. The bad news, more time.

Wednesday, September 23, 2009

Testing: FLASHBACK, Data Pump and RMAN

I've been having loads of fun the last few days: Learning By Breaking, Learning By...Breaking?, IMPDP and REMAP_TABLESPACE and finally How To: Clean Your Schema (which needs a little work).

All of this in an effort to test my migration script with changes in the hundreds. A quick recap:

1. Import metadata from production environment from 4 schemas. This includes mapping all the tablespaces from production to USER in my sandbox.
2. Create restore point.
3. Run build script.
4. Check for errors, fix errors
5. Flashback database to pre-deployment state.
6. Rinse and repeat as necessary.

If you have read any of those other posts, you'll know that:

a. I'm an idiot.
b. I like to guess.
c. I performed my first "recovery."
d. I learned the basics of Data Pump.
e. I like to break things.

For #1 above, import metadata, I've learned some hard lessons. I've had to repeat this step a number of times because I've either corrupted my database or dropped the restore point before flashing back.

And just now, a colleague of mine helped me out with another problem. See, the import process was extremely slow. Part of the reason (I think) I corrupted the silly thing was because I was mucking around at a level I don't quite understand...storage.

I just sent this pic to him:

I was like, "WTF? Why do I need a 33GB datafile for metadata?"

Him: "Ummm...idiot...it preallocates the space which is based on production."

Me: <crickets>

Me: "How can I change the storage characteristics?" (I sent him the link to the Data Pump docs.)

Me: "How about this?" (Link to the TRANSFORM clause)

Aha...TRANSFORM has 4 options:
1. SEGMENT_ATTRIBUTES (Y, Default, to keep them, N to toss 'em)
2. STORAGE (Y, Default, to keep them, N to toss them)
3. OID
4. PCTSPACE
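To make that concrete, here is a rough sketch of how an import command line with one of those transforms might be put together. The dump file, directory and tablespace names are all hypothetical, the CONTENT=METADATA_ONLY setting is my assumption about a metadata-only import, and the helper itself is just for illustration (credentials and schema lists are left out). Check the Data Pump docs before trusting any of it.

```python
def impdp_args(dumpfile, directory, transform_name, transform_value, remap=None):
    """Build an impdp argument list with a single TRANSFORM setting."""
    args = [
        "impdp",
        f"DIRECTORY={directory}",
        f"DUMPFILE={dumpfile}",
        "CONTENT=METADATA_ONLY",  # assumption: objects only, no rows
    ]
    if remap is not None:
        source, target = remap
        args.append(f"REMAP_TABLESPACE={source}:{target}")
    # transform_name:value, e.g. STORAGE:N to drop the storage clauses
    args.append(f"TRANSFORM={transform_name}:{transform_value}")
    return args


if __name__ == "__main__":
    # Hypothetical names throughout.
    args = impdp_args(
        "prod_meta.dmp", "DATA_PUMP_DIR", "STORAGE", "N",
        remap=("PROD_DATA", "USERS"),
    )
    print(" ".join(args))
```

With STORAGE set to N the storage clauses from production are dropped, which should keep the import from preallocating production-sized space.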

Where does RMAN fit into all of this? I'm not really sure. Last night I issued

RECOVER DATABASE;

and it worked perfectly. Now, once I get an import completed, I take a backup of the tablespace. (I've since created a separate tablespace for the 2 largest schemas being imported.) That way, if I drop the restore point before flashing back, I should be able to restore it back to its original state.

One of these days I'll get around to virtualizing all of this. I imagine that has to be easier, import the data, take a snapshot, run the script, fix, revert to previous snapshot. If I did that though, I wouldn't get to play (learn) with all of these cool tools.

Tuesday, September 22, 2009

Learning By...Breaking?

I caught ~~Chen~~ Gwen mocking me the other day on Twitter.

That was in response to Learning By Breaking.

Come on. Give me a break.

Now if I were a big DBA (not a little dba), I would not be so haphazard in my approach. This is a sandbox after all.

Naturally, I was at it again today.

I performed my very first recovery today. That is if you consider

ALTER DATABASE RECOVER;

a recovery. To make it worse, I guessed at the syntax.

Just a reminder as to what I am doing. I am testing my migration script (a couple of hundred DDL/DML and code changes) in an object-only copy of production in my own private idaho...sandbox. I run the script, find the errors (usually order of execution related), fix the script(s), flashback the database and then rerun. To make my life somewhat easier, I created a script that performs the necessary commands to flashback the database.

ALTER DATABASE CLOSE;

FLASHBACK DATABASE TO RESTORE POINT PRE_DEPLOY;

SHUTDOWN;

STARTUP;

ALTER DATABASE OPEN RESETLOGS;

Then I just run @flashback after disconnecting from my other session.

Somewhere along the way I believe I have managed to destroy my database. How did I do that? Well, I don't know. I do know I have accidentally dropped the restore point (2 times) prior to flashing back. When that happens, I clean up the database with one of 2 methods:
1. Use my handy-dandy clean up script to remove all the objects
2. DROP USER test CASCADE;

That's followed by a full import (via Data Pump) of the 2 necessary schemas.

So tonight I went to flashback and...well, something went really wrong. Something about I needed to recover the system datafile (I didn't save that one unfortunately).

ALTER DATABASE RECOVER?

Nope.

ERROR at line 1:
ORA-00279: change 9235581 generated at 09/22/2009 21:24:09 needed for thread 1
ORA-00289: suggestion : FRA\TESTING\ARCHIVELOG\2009_09_22\O1_MF_1_23_%U_.ARC
ORA-00280: change 9235581 for thread 1 is in sequence #23

Sweet...Oracle's giving me a suggestion. What does that mean? (Me guessing again):

ALTER DATABASE RECOVER ARCHIVELOG 'FRA\TESTING\ARCHIVELOG\2009_09_22\O1_MF_1_23_%U_.arc
*
ERROR at line 1:
ORA-00905: missing keyword

ALTER DATABASE RECOVER LOGFILE 'FRA\TESTING\ARCHIVELOG\2009_09_22\O1_MF_1_23_%U_.ARC
'
*
ERROR at line 1:
ORA-00308: cannot open archived log 'FRA\TESTING\ARCHIVELOG\2009_09_22\O1_MF_1_23_%U
_.ARC'
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.

That second one looked closer, right? Meanwhile, I began a backup of the archived redo logs.

BACKUP ARCHIVELOG ALL;

I wasn't really sure what that was going to do, but I needed some space.

Meanwhile, back in SQL*Plus I keep trying different commands.

SQL>ALTER DATABASE RECOVER TABLESPACE SYSTEM;
ALTER DATABASE RECOVER TABLESPACE SYSTEM
*
ERROR at line 1:
ORA-00275: media recovery has already been started

SQL>ALTER DATABASE RECOVER TABLESPACE USERS;
ALTER DATABASE RECOVER TABLESPACE USERS
*
ERROR at line 1:
ORA-00275: media recovery has already been started

SQL>ALTER DATABASE RECOVER;
ALTER DATABASE RECOVER
*
ERROR at line 1:
ORA-00275: media recovery has already been started

SQL>ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01156: recovery in progress may need access to files

I'm pretty sure I picked those up in a Google search (in other words, I didn't bother to click through). Out of frustration, I killed the instance (net stop oracleservicetesting) and tried to bring it back up.

STARTUP MOUNT;

SQL>RECOVER DATAFILE 'E:\ORACLE\PRODUCT\10.2.0\ORADATA\TESTING\SYSTEM01.DBF';
ORA-00283: recovery session canceled due to errors
ORA-38798: Cannot perform partial database recovery
ORA-38797: Full database recovery required after a database has been flashed back

For some unknown reason, I went back into RMAN (the session I didn't save to cut and paste) and issued

RECOVER DATABASE;

It's doing something...

Did I get it?

Lo and behold it worked. I was able to get back into my precious sandbox.

What's the moral of the story? I have no idea. Like I said last time, I learned something...I just don't know what it is yet.

Wait, I did learn one thing...Oracle is a pretty incredible piece of software if the likes of me can go in, muck it up, and it still comes back to life. That, my friends, is pretty impressive.