
Comprehensive notes covering architecture, topology, Designer, Knowledge Modules, SCD, monitoring, load plans, Big Data integration, and deployment for Oracle Data Integrator 12c.
Oracle Data Integrator (ODI) 12c is a comprehensive data integration platform that implements the E-LT (Extract, Load, Transform) paradigm, pushing transformations to the target database for maximum performance. Its architecture consists of several key components:
Traditional ETL (Extract, Transform, Load) tools extract data, transform it in a middle-tier engine, and then load it into the target. This approach can be inefficient for large volumes because transformations occur outside the database.
ODI adopts an ELT (Extract, Load, Transform) approach, where data is first loaded into a staging area (often the target database) and transformations are performed using native database SQL. This leverages the power of the target database engine, minimizing data movement and significantly improving performance.
ODI's declarative design allows developers to define the logical transformation rules without specifying how they are executed. The Knowledge Modules (KMs) handle the physical implementation, automatically generating optimized SQL code based on the source and target databases. This reduces development effort and ensures optimal performance across heterogeneous environments.
Installation Steps:
# RCU command line example
$ORACLE_HOME/bin/rcu -silent \
-createRepository \
-databaseType ORACLE \
-connectString localhost:1521:orcl \
-dbUser SYS \
-dbPassword sys_password \
-schemaPrefix ODI \
-components ODI
Connecting ODI Studio to Repositories:
The Topology navigator in ODI Studio is used to define the infrastructure:
-- Example: Defining an Oracle Data Server in Topology -- Physical Architecture -> Data Servers -> New Data Server -- Name: ORCL_DEV -- Technology: Oracle -- JDBC Driver: oracle.jdbc.OracleDriver -- JDBC URL: jdbc:oracle:thin:@localhost:1521:orcl -- User: odi_user -- Password: odi_password -- Logical Architecture -> Logical Schema -> Map to Physical Schema
Contexts are used to manage environment-specific variables:
Languages define the default language for messages and prompts. OD I supports multiple languages for multi-lingual deployments.
Schemas represent the actual database schemas (users) that contain the data. They are linked to logical schemas via contexts.
ODI supports a wide range of data sources: Oracle, SQL Server, MySQL, IBM DB2, Teradata, flat files (CSV, fixed-width), XML, JSON, and more. Connections are managed through the Topology navigator. For file systems, you define a "File" data server with a directory path and optionally a file mask.
-- Example: File Data Server for flat files -- Name: FILE_IN -- Technology: File -- Directory: /data/inbound/ -- File Mask: *.csv -- Logical Schema: FILE_SCHEMA -- Context: DEV points to /data/inbound/, PROD to /prod/data/inbound/
ODI Studio has four main navigators:
In the Designer navigator, you organize work into Projects. A project contains Folders that group related objects: interfaces, packages, load plans, variables, etc. Models are created for each data source (database schema or file structure). Models contain Datastores (tables, views, files). Reverse-engineering a model imports the metadata from the source into ODI.
-- Steps to create a Model: -- 1. In Designer, right-click on Models and select New Model. -- 2. Provide a name (e.g., "SALES_ODS"), select the Logical Schema (e.g., "SALES_ODS_SCHEMA"). -- 3. Click on Reverse Engineer to import tables, columns, and relationships. -- 4. Choose the objects to reverse engineer (tables, views, synonyms).
Reverse engineering is the process of reading metadata from the database into ODI. You can select which tables, views, and synonyms to import. After reverse engineering, datastores appear under the model. You can modify datastore properties, add filters, change column definitions, and define primary/foreign keys.
An Interface in ODI is a data flow that extracts data from one or more sources, transforms it, and loads it into a target. The graphical flow editor allows you to drag and drop datastores, define joins, filters, and transformations, and specify the target table.
-- Creating a simple Interface: -- 1. In Designer, right-click on Interfaces and select New Interface. -- 2. Give it a name (e.g., "LOAD_CUSTOMERS"). -- 3. Drag source datastore(s) from the Model into the Source panel. -- 4. Drag the target datastore into the Target panel. -- 5. Draw a link between source and target columns (mapping). -- 6. Add any filters or transformations in the expression editor. -- 7. Select the appropriate Knowledge Modules (LKM, IKM, CKM). -- 8. Save and execute.
KMs are reusable code templates that define how ODI executes specific tasks. There are several types:
LKM defines how data is extracted from the source and loaded into a staging table. For example, LKM SQL to SQL uses a database link to pull data from a remote database into the staging area.
IKM defines how data from the staging table is integrated into the target. Common IKMs include:
Choosing the right KM is critical for performance and functionality. ODI provides a wide range of built-in KMs, and you can also create custom KMs.
Check Knowledge Module (CKM) is used to validate data against defined constraints (e.g., not null, unique, foreign key). You can use CKM at different stages:
Errors can be logged to error tables, and you can define actions for handling rejected records.
-- Example: Enabling Flow Control -- In the Interface properties, set the CKM to "CKM SQL". -- Under the Flow Control tab, select the checks to perform. -- Options: check primary keys, referential integrity, data types, etc. -- Errors will be stored in the error table defined in the model.
Change Data Capture (CDC) allows you to capture only changed data from the source. ODI supports CDC through Journalization (using database triggers or log-based capture). Key steps:
-- Enabling Journalization -- 1. Right-click on the source datastore and select "Journalizing". -- 2. Choose the journalizing mode (Simple or Consistent). -- 3. For Simple mode, specify the timestamp or version column. -- 4. ODI will create necessary objects (journal tables, triggers).
SCD is a common data warehousing pattern for maintaining historical data. ODI supports Type 1, Type 2, and Type 3 SCD via dedicated KMs.
IKM SCD is specifically designed for Type 2 SCD. It requires a dimension table with start and end date columns, an active flag, and optionally a surrogate key. The KM handles updating the current row and inserting new rows.
IKM Incremental Update can be adapted for Type 1 SCD by setting appropriate update and insert columns. It performs a merge (upsert) operation.
-- Configuring IKM SCD for Type 2 -- 1. In the Interface, select the target datastore (dimension table). -- 2. Choose IKM SCD (e.g., IKM Oracle SCD). -- 3. Define the SCD columns: natural key, start date, end date, active flag. -- 4. Map the source columns to target columns. -- 5. In the KM options, set the behavior for updates and inserts.
ODI interfaces support complex mappings involving multiple source tables with inner/outer joins, aggregations, and filtering. You can also use subqueries and expressions within the flow. For multi-target inserts, you can use multi-table insert KMs.
The expression editor in ODI allows you to write SQL functions, case statements, and custom transformations. You can use ODI's built-in functions (e.g., SYSDATE, NVL, DECODE) or database-specific functions.
-- Defining a variable: -- 1. In Designer, right-click on Variables and select New Variable. -- 2. Name: VAR_LOAD_DATE, Type: Date, Default Value: SYSDATE. -- Using the variable in an interface: set a column to #VAR_LOAD_DATE.
ODI can handle heterogeneous sources in a single interface using the Multi-Source Mapping feature. You can join data from an Oracle table with a CSV file and load into a SQL Server table. The LKM handles the extraction from each source.
Use Sub-interfaces or Procedures to encapsulate reusable logic. You can also create template interfaces and copy them. Packages allow you to orchestrate multiple interfaces in a sequence.
ODI can enforce referential integrity during data loading. Use CKM to check foreign keys and prevent orphan records. You can also define constraints in the model and use the "Control" option in the interface.
The Operator navigator is the central console for monitoring ODI executions. It displays sessions, tasks, and execution logs. You can:
When you execute an interface, ODI goes through several phases:
Each phase can produce logs and errors that you can view in the Operator.
In the Operator, you can click on a running session to see its live progress. You can also view the generated SQL code for each step, which is useful for debugging.
You can use Conditional Steps and Error Handlers in Packages to handle failures. For example, if an interface fails, you can send an email notification or run a recovery procedure.
Use CKM's "Distinct" check to eliminate duplicates. You can also create custom check conditions in the model to enforce business rules.
ODI can store rejected records in error tables. You can then query these tables to report on data quality issues. The error table structure is generated by the CKM.
A Load Plan is a high-level orchestration object that sequences multiple packages, interfaces, and procedures. It provides features like parallel execution, conditional branching, and restartability.
ODI Agents can be scheduled to run load plans or packages at specific times using the built-in scheduler or integration with enterprise schedulers like Oracle Scheduler, Control-M, or Autosys.
You can also use the ODI Console (web interface) for monitoring and scheduling.
You can define variables at the project or load plan level that are passed at runtime. Context switching allows you to run the same load plan with different physical resources (e.g., DEV vs PROD).
Load plans support restart — if a step fails, you can restart from the failure point after fixing the issue. You can also configure rollback options (e.g., using Oracle Flashback or transaction rollback).
Use the Smart Export and Smart Import utilities to move ODI objects between environments. You can export a project to an XML file, transfer it, and import it into another repository. This is essential for version control and promotion.
# Export a project odiparameters.bat -export -path "C:\Projects\ODI_SALES.xml" -project "SALES" # Import a project odiparameters.bat -import -file "C:\Projects\ODI_SALES.xml"
ODI projects can be stored in XML format and committed to version control systems. Teams can work on different branches and merge changes.
ODI 12c provides connectors for Hadoop and Spark. You can:
Supported KMs include LKM Hive, IKM Hive, and LKM Spark.
-- Example: Using LKM Hive to load data from a file to Hive -- 1. Create a Model for Hive datastore. -- 2. In Interface, source is a file (File model), target is a Hive table. -- 3. Choose LKM Hive (to load into Hive staging). -- 4. Choose IKM Hive (to integrate into target Hive table).
Oracle Big Data Connectors allow ODI to leverage Hadoop's distributed processing. Advanced KMs include IKM Spark for in-memory transformations and LKM Spark for distributed loading.
ODI supports real-time integration through:
You can create interfaces that process CDC data and load incrementally.
Use Smart Export/Import to promote projects. Manage environment-specific configurations using contexts and logical schemas. Use separate repositories for each environment.
OdiLite is a lightweight runtime environment for executing ODI jobs without the full Studio. It can be embedded in applications. Standalone agents can be installed on any machine to execute jobs.
Project Objective: Design and implement a full data integration pipeline for a real-world data warehousing scenario using Oracle ODI 12c.
Deliverables:
📚 References: