Oracle ODI 12c — Detailed Course Notes

Comprehensive notes covering architecture, topology, Designer, Knowledge Modules, SCD, monitoring, load plans, Big Data integration, and deployment for Oracle Data Integrator 12c.

📑 Table of Contents

Week 1 — Architecture & Topology Management
Week 2 — Designer: Interfaces, Mappings & Knowledge Modules
Week 3 — Advanced ETL/ELT: SCD & Integration
Week 4 — Operator, Monitoring & Error Handling
Week 5 — Load Plans, Scheduling & Production Automation
Week 6 — Big Data Integration, Deployment & Capstone

Week 1 — Architecture & Topology Management

Overview of Oracle Data Integrator 12c Architecture

Oracle Data Integrator (ODI) 12c is a comprehensive data integration platform that implements the E-LT (Extract, Load, Transform) paradigm, pushing transformations to the target database for maximum performance. Its architecture consists of several key components:

ODI Studio — the graphical development environment for designing, managing, and monitoring data integration projects. It includes four navigators: Designer, Operator, Topology, and Security.
Repositories — metadata stores that hold all ODI objects. There are two types:
- Master Repository — stores security, topology, and versioning information. There is one Master Repository per ODI installation.
- Work Repository — stores project metadata (models, interfaces, packages, load plans) and execution logs. There can be multiple Work Repositories per Master Repository.
ODI Agent — a Java-based service that orchestrates and executes integration jobs (sessions). Agents can be standalone or deployed in Java EE containers.
Knowledge Modules (KMs) — reusable code templates that define how data is extracted, loaded, and transformed. KMs are the building blocks of ODI's declarative design.
Topology — defines the physical and logical infrastructure (data servers, schemas, contexts, agents) used in ODI.

💡 Key Concept: ODI's declarative design separates the "what" (logical mappings) from the "how" (physical execution). This allows the same interface to be executed on different databases with optimal performance.

Understanding ETL vs. ELT with ODI's Declarative Design

Traditional ETL (Extract, Transform, Load) tools extract data, transform it in a middle-tier engine, and then load it into the target. This approach can be inefficient for large volumes because transformations occur outside the database.

ODI adopts an ELT (Extract, Load, Transform) approach, where data is first loaded into a staging area (often the target database) and transformations are performed using native database SQL. This leverages the power of the target database engine, minimizing data movement and significantly improving performance.

ODI's declarative design allows developers to define the logical transformation rules without specifying how they are executed. The Knowledge Modules (KMs) handle the physical implementation, automatically generating optimized SQL code based on the source and target databases. This reduces development effort and ensures optimal performance across heterogeneous environments.
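The E-LT flow described above can be sketched in a few lines. This is a minimal illustration, not code that ODI generates: SQLite stands in for the target database, and the table names (stg_orders, sales_by_customer) are hypothetical.

```python
import sqlite3

# SQLite stands in for the target database; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE sales_by_customer (customer TEXT PRIMARY KEY, total REAL);
""")

# Extract + Load: copy the source rows into the staging table unchanged.
source_rows = [(1, "acme", 100.0), (2, "acme", 50.0), (3, "globex", 75.0)]
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# Transform: a single set-based statement executed by the database engine,
# so no row passes through a middle-tier transformation server.
conn.execute("""
    INSERT INTO sales_by_customer (customer, total)
    SELECT UPPER(customer), SUM(amount)
    FROM stg_orders
    GROUP BY UPPER(customer)
""")

totals = conn.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
print(totals)
```

The transformation is expressed as one SQL statement, which is the part a Knowledge Module would generate from the declarative rules.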

Installing and Configuring ODI Studio, Master Repository, and Work Repository

Installation Steps:

Install ODI 12c software using the Oracle Universal Installer or the Quick Start installer.
Install a supported database (Oracle, MySQL, etc.) for the repositories.
Identify a database account with SYSDBA (or equivalent DBA) privileges; RCU uses it to create the repository schemas.
Run the Repository Creation Utility (RCU) to create the repository schema, its tables, and the initial metadata for the Master and Work Repositories.
Configure ODI Studio to connect to the repositories using the Master and Work Repository connections.
Set up the ODI Agent(s) for scheduling and execution.

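# RCU command line example (silent mode)
# Passwords are not passed as flags; RCU reads them from standard input,
# here redirected from a file (rcu_passwords.txt is a placeholder name).
$ORACLE_HOME/oracle_common/bin/rcu -silent \
    -createRepository \
    -databaseType ORACLE \
    -connectString localhost:1521:orcl \
    -dbUser SYS \
    -dbRole SYSDBA \
    -schemaPrefix ODI \
    -component STB \
    -component ODI \
    -f < rcu_passwords.txt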

Connecting ODI Studio to Repositories:

Launch ODI Studio and select "Connect to Repository".
Enter connection details: login, password, host, port, service name.
Select the Master Repository and the Work Repository.
Test the connection and save it.

Setting up the Topology: Physical and Logical Architectures, Data Servers, and Agents

The Topology navigator in ODI Studio is used to define the infrastructure:

Physical Architecture — defines the actual data servers (databases, file systems, applications) and their connection details (host, port, schema, credentials).
Logical Architecture — provides an abstraction layer that maps physical servers to logical names. This enables environment portability (e.g., development, test, production).
Data Servers — represent database instances, flat files, XML sources, etc. Each data server has a technology type (Oracle, SQL Server, File, etc.) and connection properties.
Agents — define the execution environment for ODI jobs. You can configure multiple agents for load balancing and failover.
Contexts — allow separation of configuration parameters (e.g., connection strings, directory paths) for different environments (DEV, QA, PROD).

-- Example: Defining an Oracle Data Server in Topology
-- Physical Architecture -> Data Servers -> New Data Server
-- Name: ORCL_DEV
-- Technology: Oracle
-- JDBC Driver: oracle.jdbc.OracleDriver
-- JDBC URL: jdbc:oracle:thin:@localhost:1521:orcl
-- User: odi_user
-- Password: odi_password
-- Logical Architecture -> Logical Schema -> Map to Physical Schema

Configuring Contexts, Languages, and Schemas for Multi-Environment Deployments

Contexts are used to manage environment-specific variables:

Create a context for each environment (e.g., DEV, UAT, PROD).
Assign appropriate physical schemas to logical schemas within each context.
Use context-specific variables (e.g., file paths, connection strings).
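The way a context resolves a logical schema to a physical schema can be pictured as a lookup keyed by both names. The sketch below is only a mental model, not an ODI API; the schema names and the `resolve` and `qualify` helpers are hypothetical.

```python
# Hypothetical topology: one logical schema mapped to a physical schema per context.
TOPOLOGY = {
    ("SALES_ODS_SCHEMA", "DEV"): "SALES_ODS_DEV",
    ("SALES_ODS_SCHEMA", "PROD"): "SALES_ODS",
}


def resolve(logical_schema: str, context: str) -> str:
    """Return the physical schema mapped to a logical schema in a context."""
    try:
        return TOPOLOGY[(logical_schema, context)]
    except KeyError:
        raise LookupError(
            f"No physical schema mapped for {logical_schema} in context {context}"
        ) from None


def qualify(table: str, logical_schema: str, context: str) -> str:
    """Build a physical table name; the design only names the logical schema."""
    return f"{resolve(logical_schema, context)}.{table}"


print(qualify("CUSTOMERS", "SALES_ODS_SCHEMA", "DEV"))
print(qualify("CUSTOMERS", "SALES_ODS_SCHEMA", "PROD"))
```

Because the design refers only to SALES_ODS_SCHEMA, switching the context at execution time is enough to point the same object at a different environment. A missing mapping fails fast instead of silently using the wrong schema.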

Languages define the programming and query languages, along with their keywords and functions, that are available to each technology when you edit expressions at design time.

Schemas represent the actual database schemas (users) that contain the data. They are linked to logical schemas via contexts.

Managing Database Connections and File Systems

ODI supports a wide range of data sources: Oracle, SQL Server, MySQL, IBM DB2, Teradata, flat files (CSV, fixed-width), XML, JSON, and more. Connections are managed through the Topology navigator. For file systems, you define a "File" data server with a directory path and optionally a file mask.

-- Example: File Data Server for flat files
-- Name: FILE_IN
-- Technology: File
-- Directory: /data/inbound/
-- File Mask: *.csv
-- Logical Schema: FILE_SCHEMA
-- Context: DEV points to /data/inbound/, PROD to /prod/data/inbound/

Navigating the ODI Studio Interface

ODI Studio has four main navigators:

Designer — for creating projects, models, interfaces, packages, and load plans.
Operator — for monitoring job executions, viewing logs, and managing sessions.
Topology — for defining infrastructure (data servers, agents, contexts).
Security — for managing users, roles, and permissions.

Best Practices for Repository Backup, Versioning, and Environment Setup

Backup repositories regularly using standard database backup tools (RMAN, expdp, etc.).
Use version control (Git/SVN) for ODI objects (interfaces, packages, load plans). Export projects as XML and commit to version control.
Standardize naming conventions for projects, folders, interfaces, variables, etc.
Keep development, test, and production environments isolated using contexts to manage differences.
Document dependencies and data lineage for each interface.

Week 2 — Designer: Interfaces, Mappings & Knowledge Modules

Creating Projects, Folders, and Models

In the Designer navigator, you organize work into Projects. A project contains Folders that group related objects: interfaces, packages, load plans, variables, etc. Models are created for each data source (database schema or file structure). Models contain Datastores (tables, views, files). Reverse-engineering a model imports the metadata from the source into ODI.

-- Steps to create a Model:
-- 1. In Designer, right-click on Models and select New Model.
-- 2. Provide a name (e.g., "SALES_ODS"), select the Logical Schema (e.g., "SALES_ODS_SCHEMA").
-- 3. Click on Reverse Engineer to import tables, columns, and relationships.
-- 4. Choose the objects to reverse engineer (tables, views, synonyms).

Reverse-Engineering Database Schemas and Working with Datastores

Reverse engineering is the process of reading metadata from the database into ODI. You can select which tables, views, and synonyms to import. After reverse engineering, datastores appear under the model. You can modify datastore properties, add filters, change column definitions, and define primary/foreign keys.
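Conceptually, reverse engineering reads the database catalog and turns it into model metadata. The following sketch shows the idea against SQLite's catalog; it is an analogy rather than ODI's implementation, and the `reverse_engineer` function and the dictionary layout are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    )
""")


def reverse_engineer(conn):
    """Read table and column metadata from the SQLite catalog into a dict."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [
            {"name": c[1], "type": c[2], "mandatory": bool(c[3]), "pk": bool(c[5])}
            for c in columns
        ]
    return model


model = reverse_engineer(conn)
for column in model["customers"]:
    print(column)
```

Each dictionary plays the role of a datastore column: once the metadata is captured, it can be edited (keys, mandatory flags) independently of the source database.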

Building Basic and Intermediate Interfaces (Mappings)

An Interface (renamed Mapping in ODI 12c; these notes use the two terms interchangeably) is a data flow that extracts data from one or more sources, transforms it, and loads it into a target. The graphical flow editor lets you drag and drop datastores, define joins, filters, and transformations, and specify the target table.

-- Creating a simple Interface:
-- 1. In Designer, right-click on Interfaces and select New Interface.
-- 2. Give it a name (e.g., "LOAD_CUSTOMERS").
-- 3. Drag source datastore(s) from the Model into the Source panel.
-- 4. Drag the target datastore into the Target panel.
-- 5. Draw a link between source and target columns (mapping).
-- 6. Add any filters or transformations in the expression editor.
-- 7. Select the appropriate Knowledge Modules (LKM, IKM, CKM).
-- 8. Save and execute.

Understanding Knowledge Modules (KMs)

KMs are reusable code templates that define how ODI executes specific tasks. There are several types:

Loading Knowledge Module (LKM) — extracts data from a source and loads it into the staging area (e.g., LKM SQL to SQL, LKM File to SQL).
Integration Knowledge Module (IKM) — integrates data from the staging area into the target (e.g., IKM SQL Control Append, IKM SQL Incremental Update).
Check Knowledge Module (CKM) — validates data against declared constraints and isolates rejected rows (e.g., CKM SQL).
Service Knowledge Module (SKM) — generates data services that expose datastores as web services.
Reverse-engineering (RKM) and Journalizing (JKM) Knowledge Modules — customize metadata import and change data capture, respectively.

Deep Dive into IKM and LKM

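LKM defines how data is extracted from the source and loaded into a staging table. For example, LKM SQL to SQL reads the source over JDBC and writes the rows into the staging area through the agent, so it works between any two SQL databases. Technology-specific LKMs such as LKM Oracle to Oracle (DBLINK) use a database link instead.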

IKM defines how data from the staging table is integrated into the target. Common IKMs include:

IKM SQL Control Append — inserts new rows into the target, applying constraints.
IKM SQL Incremental Update — performs merge (update/insert) operations.
IKM Oracle Multi Table Insert — loads data into multiple target tables in one pass (Oracle targets).

Choosing the right KM is critical for performance and functionality. ODI provides a wide range of built-in KMs, and you can also create custom KMs.
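The difference between an append strategy and an incremental update strategy is easiest to see in SQL. The sketch below imitates the incremental update pattern (update rows whose key exists, insert the rest) using SQLite; it is not the code generated by any shipped IKM, and the table names, including the i_customers integration table, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    -- Hypothetical integration (flow) table holding the incoming rows.
    CREATE TABLE i_customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customers VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome');
    INSERT INTO i_customers VALUES (2, 'Bob', 'Paris'), (3, 'Cy', 'Lima');
""")


def incremental_update(conn):
    """Update rows whose key already exists in the target, then insert the rest."""
    with conn:  # both statements commit together or roll back together
        conn.execute("""
            UPDATE customers
            SET name = (SELECT i.name FROM i_customers i
                        WHERE i.customer_id = customers.customer_id),
                city = (SELECT i.city FROM i_customers i
                        WHERE i.customer_id = customers.customer_id)
            WHERE customer_id IN (SELECT customer_id FROM i_customers)
        """)
        conn.execute("""
            INSERT INTO customers (customer_id, name, city)
            SELECT i.customer_id, i.name, i.city
            FROM i_customers i
            WHERE NOT EXISTS (SELECT 1 FROM customers c
                              WHERE c.customer_id = i.customer_id)
        """)


incremental_update(conn)
rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(rows)
```

An append strategy would run only the INSERT. The update key (customer_id here) is what decides whether an incoming row updates or inserts.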

Configuring CKM for Data Quality and Error Handling

Check Knowledge Module (CKM) is used to validate data against defined constraints (e.g., not null, unique, foreign key). You can use CKM at different stages:

Static Control — checks existing data in the target for quality issues.
Flow Control — checks data during the integration flow before loading.

Errors can be logged to error tables, and you can define actions for handling rejected records.

-- Example: Enabling Flow Control
-- In the Interface properties, set the CKM to "CKM SQL".
-- Under the Flow Control tab, select the checks to perform.
-- Options: check primary keys, referential integrity, data types, etc.
-- Rejected rows are written to the error table that the CKM creates for the target datastore (E$_ prefix by default).
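Flow control can be pictured as a filter between the integration table and the target: rows that violate a constraint are copied to an error table and removed from the flow before the insert. The sketch below uses SQLite and hypothetical tables (i_customers, e_customers) to show that pattern. A real CKM generates technology-specific code and records more detail per rejected row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (region_id INTEGER PRIMARY KEY);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            name TEXT NOT NULL, region_id INTEGER);
    CREATE TABLE i_customers (customer_id INTEGER, name TEXT, region_id INTEGER);
    CREATE TABLE e_customers (err_mess TEXT, customer_id INTEGER,
                              name TEXT, region_id INTEGER);
    INSERT INTO regions VALUES (10), (20);
    INSERT INTO i_customers VALUES (1, 'Ann', 10), (2, NULL, 20), (3, 'Cy', 99);
""")

# Each check pairs an error message with a predicate that selects violating rows.
CHECKS = [
    ("NAME is mandatory", "name IS NULL"),
    ("REGION_ID not found in REGIONS",
     "region_id NOT IN (SELECT region_id FROM regions)"),
]


def flow_control(conn):
    """Isolate violating rows in the error table, then load the clean rows."""
    with conn:
        for message, predicate in CHECKS:
            conn.execute(
                "INSERT INTO e_customers "
                "SELECT ?, customer_id, name, region_id "
                f"FROM i_customers WHERE {predicate}",
                (message,),
            )
            conn.execute(f"DELETE FROM i_customers WHERE {predicate}")
        conn.execute("INSERT INTO customers SELECT * FROM i_customers")


flow_control(conn)
loaded = conn.execute("SELECT customer_id FROM customers").fetchall()
rejected = conn.execute(
    "SELECT customer_id, err_mess FROM e_customers ORDER BY customer_id"
).fetchall()
print(loaded, rejected)
```

Static control would run the same predicates against the data already in the target table instead of the incoming flow.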

Implementing Incremental Data Loading Strategies (CDC and Journalization)

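Change Data Capture (CDC) lets you process only the rows that changed in the source. ODI implements CDC through Journalizing, driven by a Journalizing Knowledge Module (JKM) that captures changes with database triggers or by reading the database logs. Key steps:

Select a JKM and a journalizing mode for the model: Simple (changes are tracked per datastore, with no consistency guarantee between datastores) or Consistent Set (changes to a group of related datastores are tracked together so that referential integrity is preserved).
Add the source datastores to CDC and register at least one subscriber.
Start the journal so that ODI creates the journalizing infrastructure.
In the Interface, select "Journalized Data Only" on the source datastore so that only changed rows are read.

-- Enabling Journalization
-- 1. Open the model, go to the Journalizing tab, and choose the mode (Simple or Consistent Set) and a JKM.
-- 2. Right-click the source datastore and select Changed Data Capture > Add to CDC.
-- 3. Add a subscriber (Changed Data Capture > Subscriber > Subscribe).
-- 4. Start the journal (Changed Data Capture > Start Journal). ODI creates the necessary objects (journal tables, views, triggers).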
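Trigger-based journalizing can be illustrated with a small stand-in: triggers record the key of every changed row in a journal table, and a consumer reads the latest change per key and then purges what it consumed. The sketch below uses SQLite; the journal table layout, the trigger names, and the `consume_changes` helper are simplifications chosen for illustration, not the objects ODI creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- Journal table: one row per change, flagged I (insert/update) or D (delete).
    CREATE TABLE j_customers (jrn_flag TEXT, customer_id INTEGER);

    CREATE TRIGGER t_customers_ins AFTER INSERT ON customers
    BEGIN INSERT INTO j_customers VALUES ('I', NEW.customer_id); END;
    CREATE TRIGGER t_customers_upd AFTER UPDATE ON customers
    BEGIN INSERT INTO j_customers VALUES ('I', NEW.customer_id); END;
    CREATE TRIGGER t_customers_del AFTER DELETE ON customers
    BEGIN INSERT INTO j_customers VALUES ('D', OLD.customer_id); END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO customers VALUES (2, 'Bob')")
conn.execute("UPDATE customers SET name = 'Bobby' WHERE customer_id = 2")
conn.execute("DELETE FROM customers WHERE customer_id = 1")


def consume_changes(conn):
    """Return the latest change per key, then purge the consumed journal rows."""
    changes = conn.execute("""
        SELECT j.customer_id, j.jrn_flag, c.name
        FROM j_customers j
        LEFT JOIN customers c ON c.customer_id = j.customer_id
        WHERE j.rowid = (SELECT MAX(rowid) FROM j_customers
                         WHERE customer_id = j.customer_id)
        ORDER BY j.customer_id
    """).fetchall()
    conn.execute("DELETE FROM j_customers")
    return changes


changes = consume_changes(conn)
print(changes)
```

Only keys are journalized; the current column values are fetched by joining back to the source table, which is why a deleted row comes back with no name.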

Performance Tuning of Interfaces Using Optimized ELT Techniques

Push transformations to the database — use database functions, joins, and aggregations in the source or target database.
Use appropriate KMs — choose KMs that leverage database-specific optimizations (e.g., bulk load, array fetch).
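Minimize staging — place the staging area on the target server so that data crosses the network only once; when a source and the staging area share a data server, no LKM is needed for that source.
Tune batch sizes and parallelism — the data server's Array Fetch Size and Batch Update Size control how many rows move per network round trip, while real parallelism comes from running independent steps concurrently (e.g., parallel steps in a load plan).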
Monitor execution — use the Operator to identify bottlenecks.

Week 3 — Advanced ETL/ELT: SCD & Integration

Implementing Slowly Changing Dimensions (SCD) in ODI

SCD is a common data warehousing pattern for tracking changes to dimension attributes. ODI provides a dedicated IKM for Type 2; Types 1 and 3 are implemented with general-purpose IKMs.

Type 1 SCD — overwrites old values with new ones. Use IKM SQL Incremental Update (IKM SQL Control Append only inserts rows, so it cannot overwrite existing values).
Type 2 SCD — maintains full history by creating new rows with effective dates. Use IKM SCD (Slowly Changing Dimension), e.g., IKM Oracle Slowly Changing Dimension.
Type 3 SCD — maintains limited history by adding previous-value columns. Use IKM SQL Incremental Update with custom mapping logic.

Using IKM SCD and IKM Incremental Update

IKM SCD is specifically designed for Type 2 SCD. It requires a dimension table with start and end date columns, an active flag, and optionally a surrogate key. The KM handles updating the current row and inserting new rows.

IKM Incremental Update can be adapted for Type 1 SCD by setting appropriate update and insert columns. It performs a merge (upsert) operation.

-- Configuring IKM SCD for Type 2
-- 1. In the Interface, select the target datastore (dimension table).
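-- 2. Choose the SCD IKM for the target technology (e.g., IKM Oracle Slowly Changing Dimension).
-- 3. In the model, set the SCD Behavior of each target column: Surrogate Key, Natural Key, Overwrite on Change, Add Row on Change, Current Record Flag, Starting Timestamp, or Ending Timestamp.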
-- 4. Map the source columns to target columns.
-- 5. In the KM options, set the behavior for updates and inserts.
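The Type 2 logic comes down to two set-based statements: close the current version of every changed row, then insert a new current version. The sketch below shows that pattern in SQLite under simplifying assumptions: a single tracked attribute (city), text dates, and hypothetical table and column names. It is not the code produced by the SCD IKM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_sk  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  INTEGER,                            -- natural key
        city         TEXT,                               -- add row on change
        start_date   TEXT,
        end_date     TEXT,
        current_flag INTEGER
    );
    CREATE TABLE stg_customer (customer_id INTEGER, city TEXT);
    INSERT INTO dim_customer (customer_id, city, start_date, end_date, current_flag)
    VALUES (1, 'Oslo', '2024-01-01', '9999-12-31', 1);
    INSERT INTO stg_customer VALUES (1, 'Rome'), (2, 'Lima');
""")


def apply_scd2(conn, load_date):
    with conn:
        # Step 1: close the current version of rows whose tracked value changed.
        conn.execute("""
            UPDATE dim_customer
            SET end_date = ?, current_flag = 0
            WHERE current_flag = 1
              AND EXISTS (SELECT 1 FROM stg_customer s
                          WHERE s.customer_id = dim_customer.customer_id
                            AND s.city <> dim_customer.city)
        """, (load_date,))
        # Step 2: insert a new current version for changed and brand-new keys.
        conn.execute("""
            INSERT INTO dim_customer
                (customer_id, city, start_date, end_date, current_flag)
            SELECT s.customer_id, s.city, ?, '9999-12-31', 1
            FROM stg_customer s
            WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                              WHERE d.customer_id = s.customer_id
                                AND d.current_flag = 1)
        """, (load_date,))


apply_scd2(conn, "2024-06-01")
history = conn.execute("""
    SELECT customer_id, city, start_date, end_date, current_flag
    FROM dim_customer ORDER BY customer_id, start_date
""").fetchall()
for row in history:
    print(row)
```

Running `apply_scd2` a second time with the same staging data changes nothing, because every staged row then matches its current dimension row.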

Handling Complex Source-Target Mappings and Multi-Table Joins

ODI interfaces support complex mappings involving multiple source tables with inner/outer joins, aggregations, and filtering. You can also use subqueries and expressions within the flow. For multi-target inserts, you can use multi-table insert KMs.

Implementing Expressions, Filters, and Transformations

The expression editor in ODI lets you write SQL expressions, CASE statements, and custom transformations. Expressions are executed by the database that runs the step, so you can use that database's native functions (e.g., SYSDATE, NVL, and DECODE on Oracle) as well as ODI user functions.

Working with Variables, Sequences, and User-Defined Functions

Variables — store values that can be used across interfaces, packages, and load plans. They can be global or project-level.
Sequences — generate unique numbers, often for surrogate keys.
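User-Defined Functions — reusable, parameterized expressions defined in ODI (globally or per project) with one implementation per technology. Functions created in the database can also be called directly from ODI expressions.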

-- Defining a variable:
-- 1. In Designer, right-click on Variables and select New Variable.
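-- 2. Name: VAR_LOAD_DATE, Type: Date.
-- 3. To load the current date, set a refresh query (e.g., SELECT SYSDATE FROM DUAL) on the Refreshing tab; the Default Value field holds a literal value only.
-- Using the variable in an interface: reference it with its scope prefix, e.g., #SALES.VAR_LOAD_DATE for project code SALES, or #GLOBAL.VAR_LOAD_DATE for a global variable.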

Integrating Multiple Data Sources (Relational, Flat Files, XML, JSON)

ODI can combine heterogeneous sources in a single interface. For example, you can join data from an Oracle table with a CSV file and load the result into a SQL Server table. One LKM per source moves that source's data into the staging area, where the join runs.

Building Reusable Mappings and Modular ETL Components

Use Reusable Mappings (ODI 12c) or Procedures to encapsulate reusable logic. You can also create template interfaces and copy them. Packages allow you to orchestrate multiple interfaces and procedures in a sequence.

Implementing Referential Integrity and Constraint Validation

ODI can enforce referential integrity during data loading. Use CKM to check foreign keys and prevent orphan records. You can also define constraints in the model and use the "Control" option in the interface.

Week 4 — Operator, Monitoring & Error Handling

Navigating the Operator Navigator

The Operator navigator is the central console for monitoring ODI executions. It displays sessions, tasks, and execution logs. You can:

View all sessions (past and running).
Drill down into each session to see steps, tasks, and messages.
Filter by date, status, or object.
View detailed error messages and execution statistics.

Understanding the Execution Lifecycle: Design, Flow, and Data Control Phases

When you execute an interface, ODI goes through several phases:

Design phase — checks the interface consistency and generates the execution plan.
Flow phase — executes the LKM (loading) and IKM (integration) steps.
Data Control phase — runs the CKM (check) for data quality.

Each phase can produce logs and errors that you can view in the Operator.

Monitoring Sessions, Steps, and Tasks in Real-Time

In the Operator, you can click on a running session to see its live progress. You can also view the generated SQL code for each step, which is useful for debugging.

Managing Error Logging, Debugging, and Troubleshooting Failed Sessions

Error Logs — ODI logs errors in the session logs. You can view them in the Operator.
Debugging — use the "Debug" option in the interface to run it step-by-step.
Troubleshooting — check the generated SQL, verify database permissions, and ensure data types match.

Implementing Custom Error Handlers and Recovery Mechanisms

You can use Conditional Steps and Error Handlers in Packages to handle failures. For example, if an interface fails, you can send an email notification or run a recovery procedure.
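The success and failure paths of a package can be modelled as a loop that stops at the first failing step and hands control to a handler. The sketch below is a plain-Python analogy, not ODI code; the step names and the `notify` handler (a stand-in for an email or recovery procedure) are hypothetical.

```python
def run_package(steps, on_failure):
    """Run steps in order; on the first failure, call the handler and stop."""
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:   # failure path of the step
            on_failure(name, exc)
            return completed, name
        completed.append(name)     # success path: continue with the next step
    return completed, None


notifications = []


def notify(step_name, exc):
    # Stand-in for sending an email or launching a recovery procedure.
    notifications.append(f"{step_name} failed: {exc}")


def load_orders():
    raise RuntimeError("unique constraint violated")


steps = [
    ("LOAD_CUSTOMERS", lambda: None),
    ("LOAD_ORDERS", load_orders),
    ("REFRESH_AGGREGATES", lambda: None),
]
completed, failed = run_package(steps, notify)
print(completed, failed, notifications)
```

Steps after the failing one never run, which mirrors a package whose failure path leads to a notification step rather than to the next load.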

Optimizing Memory, Temporary Tables, and Bulk-Loading Performance

Memory settings — adjust the JVM heap size for the ODI Agent.
Temporary tables — ODI uses staging tables for intermediate storage. Use appropriate KMs that minimize staging.
Bulk loading — use KMs that support bulk load options (e.g., Oracle SQL*Loader).

Implementing Data Quality and Record Deduplication Using CKM

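A CKM does not remove duplicates by itself. Instead, declare a primary or alternate key on the target datastore and enable flow control, so that rows violating the key are isolated in the error table. To collapse identical rows before loading, use the distinct option of the integration step. You can also create custom check conditions in the model to enforce business rules.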

Creating Exception Tables and Reporting Bad Records

ODI can store rejected records in error tables. You can then query these tables to report on data quality issues. The error table structure is generated by the CKM.
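A data quality report is usually a simple aggregate over the error table. The sketch below assumes a hypothetical error table named e_customers with a constraint name and an error message per rejected row. The real column set depends on the CKM, so treat the layout as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical error table; a real one is created by the CKM.
    CREATE TABLE e_customers (err_mess TEXT, cons_name TEXT, customer_id INTEGER);
    INSERT INTO e_customers VALUES
        ('NAME is mandatory',   'NN_NAME',      2),
        ('Duplicate key',       'PK_CUSTOMERS', 5),
        ('Duplicate key',       'PK_CUSTOMERS', 5),
        ('REGION_ID not found', 'FK_REGION',    3);
""")

# Count rejected rows per violated constraint, worst offenders first.
report = conn.execute("""
    SELECT cons_name, err_mess, COUNT(*) AS rejected_rows
    FROM e_customers
    GROUP BY cons_name, err_mess
    ORDER BY rejected_rows DESC, cons_name
""").fetchall()

for cons_name, err_mess, rejected_rows in report:
    print(f"{cons_name:<14} {err_mess:<22} {rejected_rows}")
```

Grouping by constraint shows which rule rejects the most rows, which is usually the first question when a load starts producing errors.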

Week 5 — Load Plans, Scheduling & Production Automation

Designing Load Plans for End-to-End Integration Orchestration

A Load Plan is a high-level orchestration object that organizes scenarios (the compiled, executable form of packages, interfaces, and procedures) into steps. It provides features like parallel execution, conditional branching, exception handling, and restartability.

Defining Sequences, Conditionals, and Parallel Execution

Sequences — steps executed one after another.
Parallel execution — steps executed concurrently (uses multiple threads).
Conditionals — Case steps that choose a branch based on the value of a variable; failures of earlier steps are handled separately through exception steps.
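The three step types compose into a tree. The sketch below models that tree with plain Python callables and a thread pool; it is an analogy rather than ODI's engine. The `scenario`, `serial`, `parallel`, and `case` helpers are hypothetical, and the conditional here branches on a variable's value, which is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def scenario(name, log):
    """Leaf step: stands in for running one scenario."""
    def run():
        log.append(name)
    return run


def serial(*steps):
    """Run child steps one after another."""
    def run():
        for step in steps:
            step()
    return run


def parallel(*steps):
    """Run child steps concurrently and wait for all of them."""
    def run():
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            futures = [pool.submit(step) for step in steps]
            for future in futures:
                future.result()  # re-raise a child failure, if any
    return run


def case(variable_value, branches, default):
    """Pick one branch from the value of a variable."""
    return branches.get(variable_value, default)


log = []
plan = serial(
    parallel(scenario("LOAD_DIM_CUSTOMER", log), scenario("LOAD_DIM_PRODUCT", log)),
    case("FULL",
         {"FULL": scenario("LOAD_FACT_FULL", log)},
         scenario("LOAD_FACT_INCREMENTAL", log)),
)
plan()
print(log)
```

The two dimension loads may finish in either order, but the fact load always starts after both, because the serial step waits for the parallel step to complete.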

Configuring ODI Agents for Scheduling and Automating Job Execution

ODI Agents include a built-in scheduler that starts load plans or scenarios at defined times. Alternatively, an enterprise scheduler such as Oracle Scheduler, Control-M, or AutoSys can trigger them.

You can also use the ODI Console (web interface) for monitoring and scheduling.

Integrating ODI with Enterprise Schedulers

Use the ODI Command Line (startcmd.bat/sh) to invoke loads.
Call the ODI Agent from shell scripts or scheduled tasks.
Use REST APIs (in ODI 12c) for programmatic control.
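When an enterprise scheduler launches a scenario, it typically runs a command line built from the scenario name, version, and context. The sketch below only builds such a command and does not execute it. It assumes the startscen.sh script shipped with the agent and the argument order scenario name, version, context. The installation path, agent instance name, and variable name are hypothetical, so check the script's usage text in your own installation.

```python
import shlex


def build_startscen_command(agent_bin, instance, scenario, version, context,
                            variables=None):
    """Build the argument list for startscen.sh without running it."""
    argv = [
        f"{agent_bin}/startscen.sh",
        f"-INSTANCE={instance}",  # assumed: agent instance name
        scenario,
        version,
        context,
    ]
    # Startup variables are passed as NAME=value pairs after the context.
    for name, value in (variables or {}).items():
        argv.append(f"{name}={value}")
    return argv


argv = build_startscen_command(
    agent_bin="/u01/odi/domain/bin",       # hypothetical path
    instance="OracleDIAgent1",             # hypothetical instance name
    scenario="LOAD_CUSTOMERS",
    version="001",
    context="PROD",
    variables={"SALES.VAR_LOAD_DATE": "2024-06-01"},
)
print(shlex.join(argv))
```

Keeping the command as a list rather than a string avoids quoting problems when it is later passed to `subprocess.run` or written into a scheduler job definition.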

Managing Runtime Parameters, Variables, and Dynamic Context Switching

You can define variables at the project or load plan level that are passed at runtime. Context switching allows you to run the same load plan with different physical resources (e.g., DEV vs PROD).

Implementing Rollback, Restart, and Recovery Strategies

Load plans support restart — if a step fails, you can restart from the failure point after fixing the issue. You can also configure rollback options (e.g., using Oracle Flashback or transaction rollback).
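Restarting from the failure point means remembering which steps already completed and skipping them on the next run. The sketch below shows that bookkeeping in plain Python; the `run_plan` function, the step names, and the in-memory `state` dictionary are hypothetical stand-ins for what the repository records about a run.

```python
def run_plan(steps, state):
    """Run steps in order, skipping those already recorded as DONE in state."""
    for name, action in steps:
        if state.get(name) == "DONE":
            continue  # completed in an earlier run: do not repeat it
        try:
            action()
        except Exception:
            state[name] = "ERROR"
            return False
        state[name] = "DONE"
    return True


calls = []
fail_once = {"armed": True}


def load_fact():
    calls.append("LOAD_FACT")
    if fail_once["armed"]:
        fail_once["armed"] = False  # simulate the issue being fixed afterwards
        raise RuntimeError("tablespace full")


steps = [
    ("LOAD_DIM", lambda: calls.append("LOAD_DIM")),
    ("LOAD_FACT", load_fact),
    ("BUILD_AGG", lambda: calls.append("BUILD_AGG")),
]

state = {}
first_ok = run_plan(steps, state)   # fails at LOAD_FACT
second_ok = run_plan(steps, state)  # restart: resumes at LOAD_FACT
print(first_ok, second_ok, calls)
```

LOAD_DIM runs only once across both attempts. For this to be safe, a restarted step must be able to run again without duplicating data, which is where the rollback options come in.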

Exporting and Importing Projects Using Smart Export/Import and XML Archives

Use the Smart Export and Smart Import utilities to move ODI objects between environments. You can export a project to an XML file, transfer it, and import it into another repository. This is essential for version control and promotion.

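# Export a project with the OdiExportObject tool (the agent instance name and the object ID 1001 are placeholders)
startcmd.cmd -INSTANCE=OracleDIAgent1 OdiExportObject -CLASS_NAME=SnpProject -I_OBJECT=1001 -FILE_NAME="C:\Projects\ODI_SALES.xml" -RECURSIVE_EXPORT=yes

# Import a project with the OdiImportObject tool
startcmd.cmd -INSTANCE=OracleDIAgent1 OdiImportObject -FILE_NAME="C:\Projects\ODI_SALES.xml" -IMPORT_MODE=SYNONYM_INSERT_UPDATE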

Version Control Integration with Git/SVN for Collaborative Development

ODI projects can be stored in XML format and committed to version control systems. Teams can work on different branches and merge changes.

Week 6 — Big Data Integration, Deployment & Capstone

Integrating ODI with Big Data Ecosystems: Hadoop (Hive, HDFS) and Spark

ODI 12c provides connectors for Hadoop and Spark. You can:

Load files into Hive tables using a file-to-Hive KM (e.g., LKM File to Hive LOAD DATA).
Transform data using Hive or Spark SQL.
Use Oracle Big Data Connectors for seamless integration.

KM names depend on the ODI 12c release; examples include LKM File to Hive LOAD DATA, IKM Hive Control Append, and the Spark KMs.

-- Example: Loading data from a file into Hive
-- 1. Create a Model for the Hive datastore.
-- 2. In the Interface, the source is a file (File model) and the target is a Hive table.
-- 3. Choose a file-to-Hive LKM (e.g., LKM File to Hive LOAD DATA) to load into Hive staging.
-- 4. Choose a Hive IKM (e.g., IKM Hive Control Append) to integrate into the target Hive table.

Using Oracle Big Data Connectors and Advanced KMs

Oracle Big Data Connectors allow ODI to leverage Hadoop's distributed processing. Spark KMs let a mapping run as a Spark job, so joins and aggregations execute in memory across the cluster.

Implementing Real-Time Data Integration Using CDC and Web Services

ODI supports real-time integration through:

Change Data Capture (CDC) — using journalization to capture changes in near real-time.
Web Services — ODI can call external web services from a package (OdiInvokeWebService tool) and expose datastores as data services generated by Service KMs (SKM).

You can create interfaces that process CDC data and load incrementally.

Deploying ODI Projects Across Development, Test, and Production Environments

Use Smart Export/Import to promote projects. Manage environment-specific configurations using contexts and logical schemas. Use separate repositories for each environment.

Using OdiLite and Standalone Agents for Lightweight Deployments

OdiLite is a lightweight runtime environment for executing ODI jobs without the full Studio. It can be embedded in applications. Standalone agents can be installed on any machine to execute jobs.

Managing Security, Credential Management, and Encryption

Use the Security navigator to define users and roles.
Store passwords in ODI Credential Stores (encrypted).
Use SSL for secure connections to databases and agents.

Performance Monitoring, Capacity Planning, and High-Availability Configurations

Monitoring — use the Operator and ODI Console to track execution times and identify bottlenecks.
Capacity planning — monitor resource usage (CPU, memory, I/O) to plan for scaling.
High availability — deploy multiple agents in a cluster with load balancing and failover.

Capstone Project

Project Objective: Design and implement a full data integration pipeline for a real-world data warehousing scenario using Oracle ODI 12c.

Deliverables:

Extract data from multiple source systems (relational, flat files, XML).
Implement Type 2 SCD for dimension tables.
Load fact tables with incremental data (CDC).
Implement data quality checks using CKM.
Create a load plan to orchestrate the entire pipeline.
Schedule the load plan using an ODI Agent.
Deploy the project to a production environment using contexts.
Document the solution and monitor execution.

🎯 Capstone Goal: Build a complete enterprise data warehouse integration pipeline from source extraction to target loading with full CDC and scheduling, demonstrating mastery of ODI 12c.
