Oracle Machine Learning 26ai — Detailed Course Notes

Comprehensive notes covering OML4SQL, OML4Py, supervised and unsupervised learning, AI Vector Search, ONNX integration, and MLOps for Oracle AI Database 26ai (Release 23.26.1).

📑 Table of Contents

Week 1 — Introduction to OML 26ai & OML4SQL
Week 2 — OML4Py (Python) & Advanced Data Preparation
Week 3 — Supervised Learning: Classification & Regression with XGBoost
Week 4 — Unsupervised Learning, Time Series & Feature Extraction
Week 5 — Advanced Features: AI Vector Search & ONNX Integration
Week 6 — MLOps, REST Integration & Capstone Project

Week 1 — Introduction to OML 26ai & OML4SQL Foundations

Overview of Oracle Machine Learning (OML) Architecture

Oracle Machine Learning (OML) is a comprehensive, in-database machine learning platform embedded within Oracle AI Database 26ai. Unlike traditional ML frameworks that require data to be extracted from the database and moved to external environments, OML brings the algorithms directly to the data. This architecture eliminates the latency, security risks, and scalability constraints associated with data movement.

The OML architecture consists of several key components:

OML4SQL — enables building ML models using SQL and PL/SQL, accessible to SQL developers and analysts.
OML4Py — provides Python interfaces for data scientists, offering seamless integration with popular Python data science libraries.
OML4R — provides R interfaces for statistical computing and machine learning.
OML Notebooks — collaborative Jupyter-like notebooks for interactive data science.
OML Services — REST APIs for deploying and managing models.
OML AutoML — automated machine learning for model selection and hyperparameter tuning.

💡 Key Advantage: In-database ML eliminates data movement, reduces latency, ensures data security, and leverages database parallelism for scalable model training on massive datasets.

Advantages of In-Database Machine Learning

Traditional ML frameworks (scikit-learn, TensorFlow, PyTorch) typically require extracting data from the database into external environments. This approach introduces several challenges:

Data movement overhead — copying large datasets is time-consuming and network-intensive.
Security risks — moving sensitive data outside the database increases exposure.
Resource contention — external environments compete for compute resources.
Scalability limitations — external environments struggle with petabyte-scale data.

OML addresses these challenges by embedding ML algorithms directly into the database kernel, leveraging Oracle's parallel execution engine and advanced query optimization.

Introduction to OML4SQL

OML4SQL (formerly Oracle Data Mining) allows data scientists and SQL developers to build, evaluate, and deploy machine learning models using SQL and PL/SQL. This is particularly valuable for organizations where SQL expertise is more prevalent than Python or R.

Key OML4SQL capabilities:

Over 30 built-in ML algorithms including classification, regression, clustering, feature extraction, anomaly detection, and time series.
Model building using DBMS_DATA_MINING package.
Model scoring using SQL functions like PREDICTION, PREDICTION_PROBABILITY, and CLUSTER_ID.
Automatic data preparation (ADP) for handling missing values, outliers, and transformations.
Partitioned models for building separate models on different data segments.
Model lineage tracking via data dictionary views.

-- Build a classification model using OML4SQL
BEGIN
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'CUSTOMER_CHURN_MODEL',
        mining_function     => 'CLASSIFICATION',
        data_table_name     => 'CUSTOMER_DATA',
        case_id_column_name => 'CUSTOMER_ID',
        target_column_name  => 'CHURN_FLAG',
        settings_table_name => 'MODEL_SETTINGS'
    );
END;
/

-- Score new data using the model
SELECT CUSTOMER_ID,
       PREDICTION(CUSTOMER_CHURN_MODEL USING *) AS PREDICTED_CHURN,
       PREDICTION_PROBABILITY(CUSTOMER_CHURN_MODEL USING *) AS CHURN_PROBABILITY
FROM NEW_CUSTOMER_DATA;

Setting Up OML Notebooks

OML Notebooks provide a collaborative, Jupyter-like environment for developing, sharing, and deploying machine learning workflows. They support SQL, Python, R, and Markdown cells within a single notebook.

Key features:

Interactive development — write and execute code in cells.
Visualizations — integrate with popular plotting libraries.
Version control — track changes and collaborate with team members.
Scheduled execution — automate notebook runs for regular reporting.
Connection management — connect to Autonomous Database and on-premises databases.

Navigating the OML User Interface

The OML UI provides a comprehensive interface for managing the complete ML lifecycle:

Notebooks — create, organize, and execute notebooks.
Models — view, evaluate, and deploy trained models.
Jobs — schedule and monitor automated workflows.
Data — browse and explore database tables and views.
Algorithms — explore available ML algorithms and their parameters.

Data Exploration and Preparation Using OML4SQL

Data quality is critical for ML success. OML4SQL provides functions for data exploration and preparation directly in SQL:

-- Explore data distribution
SELECT CHURN_FLAG, COUNT(*) AS RECORD_COUNT,
       AVG(AGE) AS AVG_AGE,
       AVG(INCOME) AS AVG_INCOME
FROM CUSTOMER_DATA
GROUP BY CHURN_FLAG;

-- Identify missing values
SELECT COUNT(*) - COUNT(AGE) AS MISSING_AGE,
       COUNT(*) - COUNT(INCOME) AS MISSING_INCOME,
       COUNT(*) - COUNT(TENURE) AS MISSING_TENURE
FROM CUSTOMER_DATA;

-- Data preparation with transformations
CREATE OR REPLACE VIEW PREPARED_CUSTOMER_DATA AS
SELECT CUSTOMER_ID,
       CHURN_FLAG,
       NVL(AGE, 0) AS AGE,
       NVL(INCOME, 0) AS INCOME,
       NVL(TENURE, 0) AS TENURE,
       CASE GENDER WHEN 'M' THEN 1 ELSE 0 END AS GENDER_BINARY
FROM CUSTOMER_DATA;

Model Lineage Tracking

Oracle AI Database 26ai introduces enhanced model lineage tracking. The BUILD_SOURCE column in data dictionary views now records the exact query used to train a model, providing full traceability.

-- Query model lineage information
SELECT MODEL_NAME,
       BUILD_SOURCE,  -- Contains the exact training query
       CREATION_DATE,
       ALGORITHM_NAME,
       MINING_FUNCTION
FROM ALL_MINING_MODELS
WHERE MODEL_NAME = 'CUSTOMER_CHURN_MODEL';

Best Practices for Secure and Scalable In-Database ML

Use data views — restrict access to sensitive columns using database views.
Leverage automatic data preparation (ADP) to simplify preprocessing.
Implement model versioning — maintain multiple versions and track performance.
Use partitioning for large datasets to improve performance.
Monitor model performance using OML's built-in evaluation metrics.
Implement role-based access — restrict model creation, viewing, and scoring permissions.

Week 2 — OML4Py (Python) & Advanced Data Preparation Python

Introduction to OML4Py

OML4Py is Oracle's Python interface for in-database machine learning. It allows data scientists to leverage the Python ecosystem (pandas, NumPy, scikit-learn, etc.) while executing ML algorithms directly inside the Oracle AI Database.

Key architecture components:

Proxy Objects — lightweight Python objects that represent database tables, views, and models, enabling seamless interaction without materializing data.
Embedded Python Execution — Python code can be executed inside the database engine, leveraging database parallelism.
OML4Py API — Pythonic interface for building and deploying ML models.
Data Frames — Python objects that represent database tables and support familiar pandas-like operations.

🐍 Pythonic Data Science in the Database: OML4Py enables data scientists to work with familiar Python syntax while keeping data secure inside the database.

Connecting OML4Py to Oracle AI Database

Connecting OML4Py to the database is straightforward using Oracle's cx_Oracle driver and the OML4Py connection manager.

# Import OML4Py library
import oml

# Connect to Oracle AI Database
oml.connect(
    user="my_username",
    password="my_password",
    dsn="my_database:1521/service_name"
)

# Verify connection
print(oml.version())

Leveraging the VECTOR Data Type AI

Oracle AI Database 26ai introduces the native VECTOR data type, which OML4Py fully supports. This enables similarity search, feature representation, and embedding-based analytics directly from Python.

# Create a table with a VECTOR column from Python
import oml
import numpy as np

# Generate random embeddings
embeddings = np.random.rand(100, 128).astype(np.float32)

# Create a proxy DataFrame
df = oml.DataFrame({
    'product_id': list(range(100)),
    'embedding': list(embeddings)
})

# Store in database
df.create_table('product_embeddings', overwrite=True)

# Query similar products
similar = df.cbind(
    oml.similarity(
        df['embedding'], 
        df['embedding'][0], 
        metric='cosine'
    )
).sort(2, ascending=False).head(10)

Advanced Data Preparation with OML4Py

Data preparation is critical for model quality. OML4Py provides comprehensive data preparation capabilities at scale.

Handling High-Cardinality Categorical Features

The ODMS_EXPLOSION_MIN_SUPP setting controls the minimum support threshold for feature explosion in high-cardinality categorical variables. This helps manage memory usage and model interpretability.

# Configure data preparation settings
settings = {
    'ODMS_EXPLOSION_MIN_SUPP': 0.01,  # Minimum support threshold (1%)
    'ODMS_MISSING_VALUE_HANDLING': 'AUTO',
    'ODMS_OUTLIER_HANDLING': 'AUTO'
}

# Prepare data with transformations
from oml import OMLDataFrame
df = oml.sync(oml.dataframe('CUSTOMER_DATA'))

# Apply transformations
df_prepared = df.transform(
    oml.transform.scale(['age', 'income', 'tenure']),
    oml.transform.encode_one_hot(['category', 'region']),
    oml.transform.fillna('all', 0)
)

Scaling and Normalization

OML4Py provides built-in scaling functions for standardizing features:

from oml import transform

# Standard scaling (z-score)
df_scaled = df.transform(
    transform.scale(['age', 'income', 'tenure'])
)

# Min-max scaling
df_normalized = df.transform(
    transform.minmax_scale(['age', 'income', 'tenure'])
)

Data Partitioning

Splitting data into training, testing, and validation sets is essential for unbiased model evaluation.

# Split data using OML4Py
train_data, test_data, valid_data = df.split(
    train_fraction=0.6,
    test_fraction=0.2,
    valid_fraction=0.2,
    random_state=42
)

print(f"Training: {train_data.shape[0]} rows")
print(f"Testing: {test_data.shape[0]} rows")
print(f"Validation: {valid_data.shape[0]} rows")

Handling Missing Data and Outliers

OML4Py provides comprehensive tools for handling missing values and outliers at scale:

# Detect and handle missing values
missing_counts = train_data.isnull().sum()
print("Missing values per column:", missing_counts)

# Fill missing values
df_filled = train_data.fillna({
    'age': train_data['age'].mean(),
    'income': train_data['income'].median(),
    'category': 'UNKNOWN'
})

# Outlier detection and handling
from oml import outlier_detection
outliers = outlier_detection.z_score(df_filled, threshold=3.0)
df_clean = df_filled[~outliers]  # Remove outliers

Performance Benchmarking and Optimization

OML4Py leverages database parallelism for high-performance data processing. Best practices include:

Use database-native operations — push computations to the database where possible.
Limit data transfer — use proxy objects instead of materializing data in Python.
Use batch processing — process data in chunks for large datasets.
Monitor execution — use database performance views to identify bottlenecks.

# Benchmark data preparation
import time
start = time.time()

# Perform operations in the database
df_prepared = df.transform(
    transform.scale(['age', 'income']),
    transform.encode_one_hot(['category', 'region'])
)

end = time.time()
print(f"Data preparation completed in {end - start:.2f} seconds")

Week 3 — Supervised Learning: Classification & Regression with XGBoost Supervised

Building Supervised Learning Models Using OML4SQL and OML4Py

Supervised learning is the most common ML paradigm, where models learn patterns from labelled data to predict outcomes for new, unseen data. OML provides extensive support for both classification (predicting discrete categories) and regression (predicting continuous values).

Classification Algorithms

OML supports multiple classification algorithms, each with different strengths:

Logistic RegressionSimple, interpretable linear model for binary classification. Fast training and inference.

Decision TreesInterpretable tree-based models that handle both numerical and categorical features.

Random ForestEnsemble of decision trees with excellent performance and built-in feature importance.

XGBoostExtreme Gradient Boosting — state-of-the-art performance with regularization and handling of missing values.

# OML4SQL: Build a Random Forest model
BEGIN
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'RF_CHURN_MODEL',
        mining_function     => 'CLASSIFICATION',
        data_table_name     => 'PREPARED_CUSTOMER_DATA',
        case_id_column_name => 'CUSTOMER_ID',
        target_column_name  => 'CHURN_FLAG',
        settings_table_name => 'RF_SETTINGS'
    );
END;
/

# OML4Py: Build an XGBoost model
from oml import OMLModel

# Create XGBoost classifier
xgb_model = OMLModel(
    algorithm='xgboost',
    task='classification',
    target='churn_flag'
)

# Train the model
xgb_model.fit(train_data)

# Make predictions
predictions = xgb_model.predict(test_data)

Deep Dive into XGBoost Focus

XGBoost (Extreme Gradient Boosting) is a powerful ensemble algorithm that has become a cornerstone of modern machine learning. It offers several advantages:

Regularization — L1 and L2 regularization reduce overfitting.
Handling missing values — automatically learns the best direction to handle missing values.
Built-in cross-validation — integrated cross-validation during training.
Tree pruning — greedy and depth-first pruning for efficient trees.
Parallelization — uses all CPU cores for faster training.

# OML4Py: XGBoost with hyperparameter tuning
from oml import OMLModel, GridSearchCV

# Define XGBoost model
xgb = OMLModel(
    algorithm='xgboost',
    task='classification',
    target='churn_flag'
)

# Parameter grid for tuning
param_grid = {
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.3],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 1.0]
}

# Grid search with cross-validation
grid_search = GridSearchCV(
    estimator=xgb,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy'
)

grid_search.fit(train_data)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_}")

Model Evaluation Metrics

Evaluating model performance is critical for selecting the best model and understanding its limitations.

Classification Metrics

Accuracy — proportion of correct predictions.
Precision — proportion of true positives among positive predictions.
Recall (Sensitivity) — proportion of true positives captured.
F1-Score — harmonic mean of precision and recall.
ROC-AUC — area under the ROC curve, measuring discrimination ability.
Confusion Matrix — detailed breakdown of predictions vs actuals.

# OML4SQL: Evaluate model
SELECT * FROM TABLE(
    DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX(
        model_name => 'CUSTOMER_CHURN_MODEL'
    )
);

# OML4Py: Evaluate model
from oml import metrics

predictions = xgb_model.predict(test_data)
confusion = metrics.confusion_matrix(test_data['churn_flag'], predictions)
print("Confusion Matrix:\n", confusion)

accuracy = metrics.accuracy_score(test_data['churn_flag'], predictions)
precision = metrics.precision_score(test_data['churn_flag'], predictions)
recall = metrics.recall_score(test_data['churn_flag'], predictions)
f1 = metrics.f1_score(test_data['churn_flag'], predictions)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

Regression Metrics

RMSE (Root Mean Squared Error) — penalizes large errors more heavily.
MAE (Mean Absolute Error) — average absolute difference between predictions and actuals.
R² — proportion of variance explained by the model.
MAPE (Mean Absolute Percentage Error) — relative error metric.

Model Explainability

Understanding why models make certain predictions is increasingly important for regulatory compliance and business trust. OML provides built-in explainability features:

# OML4Py: Feature importance
feature_importance = xgb_model.feature_importance()
print("Feature Importance:\n", feature_importance)

# Explain a single prediction
explanation = xgb_model.explain(test_data.iloc[0])
print("Explanation for row 0:\n", explanation)

Managing Model Persistence and Versioning

Models are stored as first-class database objects, allowing for versioning, backup, and sharing:

# List all models
SELECT MODEL_NAME, ALGORITHM_NAME, CREATION_DATE, MINING_FUNCTION
FROM ALL_MINING_MODELS
ORDER BY CREATION_DATE DESC;

# Export model to a file
DECLARE
    model_blob BLOB;
BEGIN
    model_blob := DBMS_DATA_MINING.EXPORT_MODEL('CUSTOMER_CHURN_MODEL');
    -- Store model_blob in a file or table
END;
/

# Import a model from a file
BEGIN
    DBMS_DATA_MINING.IMPORT_MODEL(
        model_name => 'CUSTOMER_CHURN_MODEL_V2',
        model_blob => :blob_data
    );
END;
/

Week 4 — Unsupervised Learning, Time Series & Feature Extraction Unsupervised

Clustering Algorithms

Clustering is an unsupervised learning technique that groups similar data points together. It's widely used for customer segmentation, anomaly detection, and exploratory data analysis.

K-Means Clustering

The most popular clustering algorithm, K-Means partitions data into K clusters based on distance to cluster centroids.

# OML4SQL: Build a K-Means model
BEGIN
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'CUSTOMER_SEGMENT_MODEL',
        mining_function     => 'CLUSTERING',
        data_table_name     => 'CUSTOMER_DATA',
        case_id_column_name => 'CUSTOMER_ID',
        settings_table_name => 'KMEANS_SETTINGS'
    );
END;
/

# OML4Py: K-Means clustering
from oml import OMLModel

kmeans = OMLModel(
    algorithm='kmeans',
    task='clustering',
    n_clusters=5
)

kmeans.fit(train_data)

# Assign cluster IDs
cluster_assignments = kmeans.predict(test_data)

# View cluster centers
cluster_centers = kmeans.cluster_centers_
print("Cluster Centers:\n", cluster_centers)

Hierarchical Clustering

Hierarchical clustering builds a tree of clusters, providing a hierarchical view of data relationships. It's useful for smaller datasets where interpretability is key.

Anomaly Detection

Anomaly detection identifies data points that deviate significantly from normal behaviour. OML provides several algorithms for this purpose:

One-Class SVM — learns a boundary around normal data.
Isolation Forest — isolates anomalies by random partitioning.
Local Outlier Factor (LOF) — identifies outliers based on local density.

# OML4Py: Anomaly detection with Isolation Forest
from oml import OMLModel

iso_forest = OMLModel(
    algorithm='isolation_forest',
    contamination=0.01,
    random_state=42
)

iso_forest.fit(train_data)

# Predict anomalies (1 = anomaly, -1 = normal)
anomaly_scores = iso_forest.predict(test_data)
anomalies = test_data[anomaly_scores == 1]
print(f"Found {anomalies.shape[0]} anomalies")

Feature Extraction with NMF Feature Extraction

Non-negative Matrix Factorization (NMF) is a dimensionality reduction technique that decomposes a matrix into two lower-rank matrices with non-negative entries. It's particularly useful for:

Topic modelling in text data
Feature extraction for image data
Reducing high-dimensional data to manageable dimensions

# OML4SQL: NMF for feature extraction
BEGIN
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'FEATURE_EXTRACTION_MODEL',
        mining_function     => 'FEATURE_EXTRACTION',
        data_table_name     => 'TEXT_EMBEDDINGS',
        case_id_column_name => 'DOCUMENT_ID',
        settings_table_name => 'NMF_SETTINGS'
    );
END;
/

Time Series Forecasting with ESM

The Exponential Smoothing Method (ESM) in OML provides automatic time series forecasting with the following key features:

Automatic model selection — automatically chooses the best ESM variant (additive, multiplicative, damped).
Automatic hyperparameter tuning — selects optimal smoothing parameters.
Seasonal decomposition — handles seasonal patterns automatically.
Confidence intervals — provides prediction intervals for forecasts.

# OML4Py: Time series forecasting with ESM
from oml import OMLModel

esm = OMLModel(
    algorithm='exponential_smoothing',
    task='time_series',
    target='sales',
    time_column='date',
    seasonal_period=12  # Monthly seasonality
)

esm.fit(sales_data)

# Generate forecasts
forecast = esm.forecast(horizon=12)  # 12-month forecast
print("Forecast:\n", forecast)

# Plot forecast
esm.plot_forecast()

Explicit Semantic Analysis (ESA) for Text

ESA is a text analysis technique that maps documents to dense vector representations (doc2vec embeddings) based on their semantic content. Key enhancements in 26ai include:

Dense projection generation — produces dense document vectors.
Improved classification accuracy — ESA embeddings improve text classification.
Integration with ML models — ESA outputs can be used as features in other ML algorithms.

# OML4SQL: ESA for text analysis
BEGIN
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'TEXT_ANALYSIS_MODEL',
        mining_function     => 'FEATURE_EXTRACTION',
        data_table_name     => 'DOCUMENTS',
        case_id_column_name => 'DOCUMENT_ID',
        settings_table_name => 'ESA_SETTINGS'
    );
END;
/

# Use ESA embeddings for classification
SELECT DOCUMENT_ID,
       PREDICTION(CATEGORY_MODEL USING *) AS PREDICTED_CATEGORY
FROM DOCUMENTS_WITH_EMBEDDINGS;

Case Studies

Customer Segmentation

Using K-Means clustering to identify customer segments based on purchasing behaviour, demographics, and engagement metrics. This enables targeted marketing campaigns and personalized recommendations.

Demand Forecasting

Using ESM and other time series models to forecast product demand, enabling better inventory management, production planning, and supply chain optimization.

Week 5 — Advanced Features: AI Vector Search & ONNX Integration Advanced AI

AI Vector Search in Depth Core

AI Vector Search is a breakthrough capability in Oracle AI Database 26ai that enables semantic similarity search across unstructured data. It's powered by the native VECTOR data type and advanced indexing techniques.

The VECTOR Data Type

Vectors represent data as mathematical embeddings that capture semantic meaning. They can be:

Dense vectors — continuous, high-dimensional representations.
Sparse vectors — efficient representations with mostly zero values.
Binary vectors — efficient representations for certain use cases.

Generating Embeddings with OML4Py

# Generate embeddings from text
from oml import embedding

# Text to embedding
text = "This is a sample product description for similarity search"
embedding_vector = embedding.generate(text, model='all-MiniLM-L6-v2')

# Store embedding in database
df = oml.DataFrame({
    'product_id': [101],
    'description': [text],
    'embedding': [embedding_vector]
})
df.create_table('product_embeddings', overwrite=True)

Combined Semantic + Relational + Graph Queries

The power of AI Vector Search lies in combining it with traditional relational predicates and graph queries in a single SQL statement.

-- Hybrid query: Vector similarity + relational filter + graph relationships
SELECT p.product_id, p.product_name,
       VECTOR_DISTANCE(p.embedding, :search_vector) AS similarity
FROM products p
WHERE p.price BETWEEN 100 AND 500
  AND p.category = 'Electronics'
  AND EXISTS (
      SELECT 1 FROM product_relationships pr
      WHERE pr.product_id = p.product_id
        AND pr.relationship_type = 'COMPATIBLE'
  )
ORDER BY similarity
FETCH FIRST 10 ROWS ONLY;

Vector Indexes for Performance

Oracle 26ai supports multiple vector index types for high-performance similarity search:

HNSW (Hierarchical Navigable Small World) — high accuracy, low latency.
IVF (Inverted File Index) — scalable for large vector datasets.
Hybrid Vector Index — combines vector search with relational predicates.
Scalar Quantized HNSW — reduced memory footprint.

-- Create an HNSW vector index
CREATE VECTOR INDEX products_vec_idx ON products(embedding)
ORGANIZATION HNSW
DISTANCE COSINE
WITH TARGET ACCURACY 95;

-- Performance tuning with concurrent DML
ALTER VECTOR INDEX products_vec_idx
REBUILD PARAMETERS ('optimize_for_concurrent_dml=true');

ONNX Integration Cross-Platform

ONNX (Open Neural Network Exchange) is an open standard for representing machine learning models. Oracle AI Database 26ai provides first-class support for ONNX models.

Exporting OML Models to ONNX

# OML4Py: Export model to ONNX format
xgb_model.export_onnx('xgboost_churn_model.onnx')

# Import ONNX model into the database
BEGIN
    DBMS_VECTOR.IMPORT_ONNX_MODEL(
        model_name => 'ONNX_XGBOOST_MODEL',
        file_name  => 'xgboost_churn_model.onnx'
    );
END;
/

In-Memory Sharing of External Initializers

Oracle 26ai introduces in-memory sharing of external initializers for ONNX models, reducing memory usage and improving scalability for concurrent inference workloads.

-- Deploy ONNX model with in-memory sharing
BEGIN
    DBMS_VECTOR.DEPLOY_ONNX_MODEL(
        model_name       => 'ONNX_XGBOOST_MODEL',
        memory_share     => TRUE,
        max_concurrent   => 100
    );
END;
/

Real-Time Scoring with ONNX Models

# Score using ONNX model inside the database
SELECT product_id,
       ONNX_SCORE('ONNX_XGBOOST_MODEL', features) AS predicted_price
FROM products
WHERE product_id IN (101, 102, 103);

Performance Tuning for Vector Operations

AI Smart Scan (Exadata) — parallelized vector scans at memory speed.
Elastic Vector Memory Management — automatic tuning of vector memory pools.
Distributed HNSW and Local HNSW — RAC environment support.
Vector In-Lists — efficient processing of large SQL IN-lists.

-- Monitor vector index performance
SELECT INDEX_NAME, INDEX_TYPE, STATUS,
       VECTOR_MEMORY_USAGE_MB,
       VECTOR_INDEX_SIZE_MB
FROM V$VECTOR_INDEX_STATUS;

-- Optimize vector memory
ALTER SYSTEM SET VECTOR_MEMORY_SIZE = 4G;

Week 6 — MLOps, REST Integration & Capstone Project MLOps

Understanding the Machine Learning Lifecycle

MLOps (Machine Learning Operations) is the practice of applying DevOps principles to machine learning. The ML lifecycle includes:

Data preparation — collecting, cleaning, and transforming data.
Model training — building and tuning models.
Model evaluation — assessing model performance.
Model deployment — making models available for inference.
Model monitoring — tracking performance over time.
Model retraining — updating models with new data.
Model governance — managing compliance and lineage.

🔄 Continuous ML: OML provides a complete MLOps platform, enabling end-to-end automation of the ML lifecycle within the database.

Model Management and Lineage Tracking

Oracle AI Database 26ai provides comprehensive model management capabilities:

-- Track model lineage
SELECT MODEL_NAME,
       ALGORITHM_NAME,
       MINING_FUNCTION,
       BUILD_SOURCE,  -- Training query used
       CREATION_DATE,
       LAST_MODIFIED_DATE
FROM ALL_MINING_MODELS
WHERE MODEL_NAME LIKE 'CUSTOMER%';

-- Monitor model usage
SELECT MODEL_NAME,
       LAST_SCORED_DATE,
       SCORE_COUNT,
       AVG_SCORE_TIME_MS
FROM DBA_MODEL_USAGE
ORDER BY LAST_SCORED_DATE DESC;

Deploying Models as REST Endpoints

OML Services provides REST endpoints for model inference, enabling integration with web applications and microservices.

# Deploy model as a REST endpoint
from oml import OMLEndpoint

endpoint = OMLEndpoint(
    model='CUSTOMER_CHURN_MODEL',
    endpoint_name='churn_prediction',
    description='Real-time churn prediction'
)

endpoint.deploy()

# Model is now available at:
# https://your-database-server/oml/api/v1/predict/churn_prediction

# Test the endpoint
response = endpoint.test({
    'age': 35,
    'income': 75000,
    'tenure': 24,
    'category': 'Electronics'
})
print(f"Prediction: {response['prediction']}")

Integrating with Oracle Analytics Cloud and BI Tools

OML models can be integrated with Oracle Analytics Cloud and third-party BI tools for visualization and decision support:

SQL scoring — use PREDICTION functions in BI dashboards.
REST APIs — integrate with any application via REST.
OAC integration — visualise predictions and model performance in Oracle Analytics Cloud.

Monitoring Model Drift and Performance Degradation

Models degrade over time as data distributions change. Monitoring for drift is essential for maintaining model accuracy.

# Monitor model performance over time
CREATE TABLE model_performance_history AS
SELECT SYSDATE AS evaluation_date,
       model_name,
       accuracy,
       precision,
       recall,
       f1_score,
       data_version
FROM model_evaluation;

# Check for performance degradation
SELECT evaluation_date, accuracy,
       LAG(accuracy) OVER (ORDER BY evaluation_date) AS prev_accuracy,
       (accuracy - LAG(accuracy) OVER (ORDER BY evaluation_date)) AS change
FROM model_performance_history
WHERE model_name = 'CUSTOMER_CHURN_MODEL'
ORDER BY evaluation_date DESC
FETCH FIRST 30 ROWS ONLY;

Automating Retraining Pipelines

Oracle Scheduler and OML automation APIs enable automated retraining pipelines:

# Create a scheduled job for model retraining
BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'RETRAIN_CHURN_MODEL_JOB',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN retrain_churn_model(); END;',
        start_date      => SYSDATE,
        repeat_interval => 'FREQ=MONTHLY; INTERVAL=1',
        enabled         => TRUE,
        comments        => 'Monthly retraining of churn model'
    );
END;
/

# OML4Py: Automated retraining script
from oml import OMLModel, OMLDataFrame

def retrain_churn_model():
    # Load new data
    new_data = OMLDataFrame('CUSTOMER_DATA_NEW')
    
    # Retrain model
    model = OMLModel('CUSTOMER_CHURN_MODEL')
    model.retrain(new_data)
    
    # Evaluate new model
    metrics = model.evaluate(validation_data)
    
    # Log performance
    log_performance(metrics)
    
    # Deploy if performance improves
    if metrics['accuracy'] > 0.85:
        model.deploy()
    else:
        model.rollback()

Security and Access Control for ML Models

Securing ML models and data is critical for enterprise deployments:

Row-level security — restrict access to data used for training.
Column-level security — hide sensitive columns from data scientists.
Model privileges — restrict who can create, view, score, or delete models.
Audit trails — track all model access and modifications.
SQL Firewall — protect against unauthorized SQL execution.

# Grant model privileges
GRANT CREATE MINING MODEL TO data_scientist_role;
GRANT SELECT ON CUSTOMER_CHURN_MODEL TO analyst_role;
GRANT EXECUTE ON customer_churn_pkg TO app_user;

# Revoke access
REVOKE SELECT ON CUSTOMER_CHURN_MODEL FROM analyst_role;

Capstone Project

Project Objective: Develop and deploy a complete machine learning solution using OML4SQL and OML4Py, including data preparation, model training, ONNX export, and REST API deployment for a real-world business problem.

Project Deliverables:

Problem definition — clearly defined business problem (classification, forecasting, or recommendation).
Data preparation — comprehensive data cleaning, transformation, and feature engineering.
Model development — train and compare multiple ML models using OML4SQL and OML4Py.
Hyperparameter tuning — optimize model performance using grid search or AutoML.
Model evaluation — comprehensive evaluation with appropriate metrics.
Model export — export the best model to ONNX format.
REST deployment — deploy the model as a REST endpoint.
MLOps setup — implement monitoring and retraining automation.
Documentation — comprehensive documentation of the entire solution.

🎯 Capstone Goal: Build a production-ready ML solution demonstrating mastery of Oracle Machine Learning 26ai, from data preparation to automated deployment and monitoring.

📚 References:

Oracle Machine Learning for SQL User's Guide, 26ai — docs.oracle.com
Oracle Machine Learning for Python User's Guide, 26ai — docs.oracle.com
Oracle AI Vector Search User's Guide — docs.oracle.com
Oracle AI Database New Features Guide, 26ai — docs.oracle.com
Oracle SQL Firewall User's Guide — docs.oracle.com
Oracle-Base Articles — oracle-base.com

All documentation links are for Oracle AI Database 26ai (Release 23.26.1).