Tuesday, March 23, 2010

1Z0-052 - Moving Data


Describe and use methods to move data (Directory objects, SQL*Loader, External Tables)
  • Data Pump
  • SQL*Loader
  • External Tables
  • APEX
  • SQL Developer
  • Toad
  • SQL*Plus (COPY)
  • Etc.
Explain the general architecture of Oracle Data Pump
The following was found in this great white paper (pdf): Data Pump in Oracle® Database 11g: Foundation for Ultra High-Speed Data Movement Utilities.
  • Master Table - This is a table created in the schema of the user running a Data Pump job. It is a directory that maintains all details about the job: the current state of every object being exported or imported, the locations of those objects in the dumpfile set, the user-supplied parameters for the job, the status of every worker process, the current set of dump files, restart information and so on.

    During a file-based export job, the master table is built during execution and written to the dumpfile set as the last step. Conversely, loading the master table into the current user's schema is the first step of a file-based import operation, so that the master table can be used to sequence the creation of all objects imported.

    The user of the master table is the key to the ability of Data Pump to restart a job in the event of a planned or unplanned job stoppage. Because it maintains the status of every object to be processed by the job, Data Pump knows which objects were currently being worked on, and whether or not those objects were successfully completed.
  • Process Structure - A Data Pump job comprises several processes. These processes are described in the order of their creation.
    • Client Process - This is the process that makes calls to the Data Pump API.
    • Shadow Process - This is the standard Oracle shadow (or foreground) process created when a client logs in to the Oracle Database. The shadow services Data Pump API requests. Upon receipt of a DBMS_DATAPUMP.OPEN request, the shadow process created the job, which consists primarily of creating the master table, the AdvancedQueuing (AQ) queues used for communication among the various processes, and the master control process. Once a job is running, the main task of the shadow process consists of servicing GET_STATUS requests from the client. If the client detaches, the shadow process also goes away.
    • Master Control Process (MCP) - the MCP controls the execution and sequencing of a Data Pump job. There is one MCP per Data Pump job, maintaining the job state, job description, restart, and dumpfile information in the master table. A job is divided into various phases of metadata and data unloading or loading, and the MCP hands out work request to the worker processes appropriate for the current phase. The bulk of MCP processing is performed in this work dispatch loop. The MCP lalso performs central file management duties, maintaining the active dumpfile list and handing out file pieces as request by processes unloading data or metadata. An MCP has a process name of the form: <instance>_DMnn_<pid>.
    • Worker Process - Upon receipt of a START_JOB request, the MCP creates worker processes as needed, according to the value of the PARALLEL parameter. The worker processes perform the tasks requests by the MCP (primarily unloading and loading of metadata and data), and maintain the object rows that make up the bulk of the master table. As database objects are unloaded or loaded, these rows are written and updated with the current status of these objects: pending, completed, failed, and so on. The worker processes also maintain type completion rows, which describe the type of object currently being worked on: tables, indexes, views and so on. The types completion rows are used during restart. A worker process has a name of the form: "*DWnn*".
    • Parallel Query (PQ) Process - If the External Tables data access method is chosen for loading or unloading a table or partition, some parallel query processes are created by the worker process that was given the load or unload assignment, and the worker process then acts as the query coordinator. The are standard parallel execution slaves that exploit the parallel execution architecture of Oracle Database, and enable intra-partition loading and unloading. In Real Application Clusters (RAC), parallel query processes may be created on an instance other than that on which the Data Pump job was initiated. All other processes described thus far are created on that initial instance.
  • Data Movement
    • In order of speed (top most being the fastest)
      1. Data File Copying (transportable tablespaces)
      2. Direct Path load and unload
      3. External Tables
      4. Conventional Path
  • Metadata Movement - The Metadata API (DBMS_METADATA) is used by worker processes for all metadata unloading and loading. The Metadata API extracts object definitions from the database, and writes them to the dumpfile set as XML documents. This allows great flexibility to apply XML XSLTs when created the DDL at import time. The COMPRESSION parameter can be used to decrease the size of metadata written during a Data Pump export job.
  • Interprocess Communication - uses AQ.
  • File Management - Each worker and parallel query process makes local process requests to the file manager to allocate space, read a file chunk, write to a buffer, or update progress statistics. The local file manager determines if the request can be handled locally and if not, forwards it to the MCP using the command and control queue. Reading file chunks and updating file statistics in the master table are handled locally. Writing to a buffer is typically handled locally, but may result in a request to the MCP for more space.
  • Directory Management
Use Data Pump Export and Import to move data between Oracle databases

Done it.


No comments: