This project documents my experience building an end-to-end fraud detection ML pipeline and systematically mapping every component to concrete security threat categories. The scenario: DataVault Corp acquired a startup with an AI-powered fraud detection system, and as their Security Engineer, I assessed the security posture of the inherited ML system before production integration.
Project Architecture
The ML pipeline uses the following tools and components:
- Python 3.11 — primary programming language for the ML pipeline
- scikit-learn — RandomForestClassifier for fraud detection model training
- pandas / numpy — dataset generation, exploration, and manipulation
- Flask — local REST inference endpoint serving model predictions
- joblib — model artifact serialization and deserialization
- SHA-256 hashing — artifact integrity verification
- fpdf2 — programmatic PDF generation for the security assessment report
Step 1: Environment Setup
Installed Python ML dependencies locally and immediately pinned all package versions to
requirements.txt.
Unpinned dependencies are a supply chain risk — a malicious update to a popular ML
library could compromise model training or inference.
```powershell
# Install ML packages
python -m pip install scikit-learn pandas numpy flask joblib

# Verify all packages import successfully
python -c "import sklearn, pandas, numpy, flask, joblib; print('All packages OK')"

# Pin dependency versions (supply chain security control)
python -m pip freeze > requirements.txt
```
Step 2: Synthetic Fraud Dataset
Generated a 10,000-transaction synthetic dataset with a realistic 5% fraud rate. Using synthetic data during development eliminates PII exposure risk — a security best practice. Fraudulent transactions were designed with distinct patterns: higher amounts (~$300), late-night hours (0–4 AM), elevated transaction frequency (~15/day), and 80% foreign origin.
```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Legitimate transactions (95%) — normal spending patterns
legit = pd.DataFrame({
    'amount': np.random.normal(50, 25, 9500).clip(1, 500),
    'hour_of_day': np.random.randint(6, 23, 9500),
    'transactions_24h': np.random.poisson(3, 9500),
    'foreign_transaction': np.random.binomial(1, 0.05, 9500),
    'label': 0
})

# Fraudulent transactions (5%) — anomalous patterns
fraud = pd.DataFrame({
    'amount': np.random.normal(300, 100, 500).clip(1, 2000),
    'hour_of_day': np.random.randint(0, 5, 500),
    'transactions_24h': np.random.poisson(15, 500),
    'foreign_transaction': np.random.binomial(1, 0.8, 500),
    'label': 1
})

# Combine and shuffle into the final 10,000-row dataset
df = pd.concat([legit, fraud]).sample(frac=1, random_state=42).reset_index(drop=True)
```
The resulting 95/5 class imbalance is handled with `class_weight='balanced'` during training.
Step 3: Model Training
Split the dataset 80/20 (train/test) with stratification to preserve the fraud ratio, applied feature scaling via StandardScaler, and trained a RandomForestClassifier with 100 estimators and balanced class weighting to handle the imbalanced dataset.
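A minimal sketch of the split-and-scale step described above, assuming `df` is the combined dataset from Step 2:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns='label')
y = df['label']

# Stratified 80/20 split preserves the 5% fraud ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only to avoid test-set leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```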
```python
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',  # handles the 95/5 class imbalance
    random_state=42,
    n_jobs=-1                 # use all CPU cores
)
model.fit(X_train_scaled, y_train)

# Save model artifacts
joblib.dump(model, 'fraud_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
```
Step 4: Model Evaluation
The model achieved perfect scores on the synthetic dataset — 100% precision, recall, and ROC-AUC. While expected given the clearly separated synthetic patterns, a perfect score on real-world data would be a red flag for overfitting (the model memorizing training data rather than learning generalizable patterns).
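The output below came from scikit-learn's standard metrics; a minimal sketch of the evaluation code, assuming the test split from Step 3:

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]

print(classification_report(y_test, y_pred, target_names=['Legitimate', 'Fraud']))
print(confusion_matrix(y_test, y_pred))
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_prob):.4f}")
```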
```text
=== Classification Report ===
              precision    recall  f1-score   support
  Legitimate       1.00      1.00      1.00      1900
       Fraud       1.00      1.00      1.00       100

=== Confusion Matrix ===
True Negatives  (legit correctly identified): 1900
False Positives (legit flagged as fraud):        0
False Negatives (fraud missed):                  0
True Positives  (fraud correctly caught):      100

ROC-AUC Score: 1.0000
```
Step 5: Artifact Integrity Verification
Generated SHA-256 hashes for both model artifacts at training time and locked file permissions to read-only. Before loading a model in production, the hash must be verified to detect tampering — an attacker with write access to model storage could replace `fraud_model.pkl` with a backdoored version that misclassifies specific transactions.
```powershell
# Generate SHA-256 hashes
Get-FileHash fraud_model.pkl -Algorithm SHA256
Get-FileHash scaler.pkl -Algorithm SHA256

# Lock artifacts to read-only
Set-ItemProperty fraud_model.pkl -Name IsReadOnly -Value $true
Set-ItemProperty scaler.pkl -Name IsReadOnly -Value $true
```
Model artifacts (`.pkl` files) are serialized Python objects. A maliciously crafted pickle file can execute arbitrary code when loaded with `joblib.load()`. Never load model artifacts from untrusted sources without integrity verification.
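Pairing the hashes with a load-time check closes the loop. A minimal sketch of a verify-before-load guard, assuming the known-good hash is stored out of band (the placeholder below stands in for the real value):

```python
import hashlib
import joblib

def load_verified(path: str, expected_sha256: str):
    """Deserialize a model artifact only if its SHA-256 matches the recorded hash."""
    with open(path, 'rb') as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_sha256.lower():
        raise RuntimeError(f"Integrity check failed for {path}: got {actual}")
    return joblib.load(path)

# Hash recorded at training time, retrieved from a trusted store
model = load_verified('fraud_model.pkl', expected_sha256='<known-good-hash>')
```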
Step 6: Flask Inference API
Built a local REST API with Flask serving two endpoints: `/health` for availability checks and `/predict` for fraud classification. The API loads the model artifacts at startup, validates incoming JSON fields, scales the input features, and returns the prediction with a fraud probability score.
```python
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
FEATURE_NAMES = ['amount', 'hour_of_day', 'transactions_24h', 'foreign_transaction']

# Load model artifacts once at startup
model = joblib.load('fraud_model.pkl')
scaler = joblib.load('scaler.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    # Validate required fields
    missing = [f for f in FEATURE_NAMES if f not in data]
    if missing:
        return jsonify({"error": f"Missing fields: {missing}"}), 400

    features = np.array([[data[f] for f in FEATURE_NAMES]])
    features_scaled = scaler.transform(features)
    prediction = model.predict(features_scaled)[0]
    probability = model.predict_proba(features_scaled)[0][1]

    return jsonify({
        "prediction": int(prediction),
        "label": "FRAUD" if prediction == 1 else "LEGITIMATE",
        "fraud_probability": round(float(probability), 4)
    })
```
```powershell
# Health check
Invoke-RestMethod http://127.0.0.1:5000/health

# Legitimate transaction
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:5000/predict `
    -ContentType "application/json" `
    -Body '{"amount":45,"hour_of_day":14,"transactions_24h":2,"foreign_transaction":0}'

# Suspicious transaction
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:5000/predict `
    -ContentType "application/json" `
    -Body '{"amount":850,"hour_of_day":2,"transactions_24h":22,"foreign_transaction":1}'
```
Step 7: Decision Boundary Probing Attack
With the unsecured API running, I simulated one of the most common real-world attacks against deployed ML models — decision boundary probing. Starting with a high-confidence fraud transaction ($800, 2 AM, 20 transactions, foreign card), I systematically reduced the amount to discover the exact threshold where the model's prediction flips from FRAUD to LEGITIMATE.
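The probe itself was a short loop against the unauthenticated endpoint. A sketch of the approach, assuming the `requests` library and the local API from Step 6:

```python
import requests

# High-confidence fraud profile; only the amount varies
BASE = {'hour_of_day': 2, 'transactions_24h': 20, 'foreign_transaction': 1}

for amount in [800, 600, 400, 300, 250, 200, 150, 100]:
    resp = requests.post('http://127.0.0.1:5000/predict',
                         json={**BASE, 'amount': amount}).json()
    print(f"Amount: ${amount}   Probability: {resp['fraud_probability']:.2f}   "
          f"Label: {resp['label']}")
```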
```text
# Probing with decreasing amounts (other features held constant)
Amount: $800   Probability: 0.97   Label: FRAUD
Amount: $600   Probability: 0.93   Label: FRAUD
Amount: $400   Probability: 0.85   Label: FRAUD
Amount: $300   Probability: 0.72   Label: FRAUD
Amount: $250   Probability: 0.59   Label: FRAUD
Amount: $200   Probability: 0.42   Label: LEGITIMATE
Amount: $150   Probability: 0.29   Label: LEGITIMATE
Amount: $100   Probability: 0.15   Label: LEGITIMATE
```
The prediction flipped between $250 (FRAUD) and $200 (LEGITIMATE), placing the decision boundary near $225. An attacker now knows that any transaction under roughly $225 with these parameters evades detection. They could split a $900 fraud into five $180 transactions, each safely below the threshold, and bypass the model entirely. This attack succeeded with only 8 queries.
The root cause: the API returns `fraud_probability` in the response. Without probability feedback, boundary probing requires orders of magnitude more queries and becomes detectable via rate limiting.
Step 8: Remediation — API Key Auth + Probability Suppression
Implemented two controls to close the vulnerabilities identified during the probing attack: API key authentication to block unauthorized access, and probability suppression to eliminate the feedback loop that made boundary probing trivial.
```python
import functools
from flask import request, jsonify

# API key authentication decorator
VALID_API_KEYS = {'datavault-prod-key-2026', 'datavault-staging-key-2026'}

def require_api_key(f):
    @functools.wraps(f)
    def decorated(*args, **kwargs):
        key = request.headers.get('X-API-Key', '')
        if key not in VALID_API_KEYS:
            return jsonify({"error": "Invalid or missing API key"}), 401
        return f(*args, **kwargs)
    return decorated

@app.route('/predict', methods=['POST'])
@require_api_key
def predict():
    # ... validation and prediction logic as in Step 6 ...

    # Secure: return label only, log probability server-side
    app.logger.info(f"prob={prob:.4f}")
    return jsonify({
        "prediction": int(pred),
        "label": "FRAUD" if pred == 1 else "LEGITIMATE"
    })
```
```powershell
# Without API key — blocked with 401
Invoke-RestMethod -Uri http://127.0.0.1:5001/predict -Method POST `
    -Body '{"amount":500,"hour_of_day":2,"transactions_24h":20,"foreign_transaction":1}' `
    -ContentType "application/json"
# => Blocked: Unauthorized

# With valid API key — succeeds, no probability in response
$headers = @{ "X-API-Key" = "datavault-prod-key-2026" }
Invoke-RestMethod -Uri http://127.0.0.1:5001/predict -Method POST `
    -Body '{"amount":500,"hour_of_day":2,"transactions_24h":20,"foreign_transaction":1}' `
    -ContentType "application/json" -Headers $headers
# => prediction: 1
#    label: FRAUD
#    (no fraud_probability field)
```
Step 9: Artifact Tampering Detection
Demonstrated that model artifacts can be tampered with and that SHA-256 hashing detects the modification. I hashed the original model, appended a simulated payload to the .pkl file, then re-hashed and compared — the mismatch was immediately detected.
```powershell
# Hash the clean model
$originalHash = (Get-FileHash fraud_model.pkl -Algorithm SHA256).Hash
# Original SHA-256: A3F8C2E1D9B0...

# Simulate tampering
[System.IO.File]::AppendAllText("$PWD\fraud_model.pkl", "TAMPERED_PAYLOAD")

# Verify hash changed
$tamperedHash = (Get-FileHash fraud_model.pkl -Algorithm SHA256).Hash
if ($tamperedHash -eq $originalHash) {
    Write-Host "MATCH — artifact integrity verified"
} else {
    Write-Host "MISMATCH — ARTIFACT HAS BEEN TAMPERED WITH"
}
```
Step 10: ML Pipeline Security Threat Map
The core deliverable of this module — a comprehensive mapping of every pipeline stage to its attack surface, threat category, example attack, and recommended control.
| Pipeline Stage | Threat | OWASP ML | Example Attack | Control |
|---|---|---|---|---|
| Data Collection | Data Poisoning | ML03 | Inject mislabeled transactions to reduce recall | Data provenance logging, anomaly detection on ingested data |
| Data Storage | PII Exposure | ML08 | Attacker reads training data containing real card numbers | Encryption at rest, IAM least privilege, access logging |
| Feature Engineering | Data Tampering | ML03 | Modify scaler.pkl to shift decision boundary | Artifact integrity hashing, read-only permissions |
| Model Training | Supply Chain | ML05 | Malicious scikit-learn version exfiltrates training data | Pinned dependencies, SBOM, isolated training environment |
| Model Artifact | Model Theft | ML04 | Steal .pkl to perform offline model inversion | Encryption, access controls, integrity verification |
| Inference API | Model Extraction | ML04 | Query API 10,000+ times to reconstruct model | Authentication, rate limiting, query logging |
| Inference API | Evasion | ML06 | Craft transaction just below fraud threshold | Input validation, confidence thresholds, monitoring |
| API Response | Output Abuse | ML06 | Use fraud_probability to calibrate bypass attempts | Return binary label only, suppress probability |
Security Assessment Findings
Generated a professional PDF security assessment using fpdf2, documenting 6 findings with OWASP ML Top 10 mappings. The report follows the standard pentest finding format: severity, component, evidence, risk, and remediation.
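The report generation code isn't reproduced in full here; a minimal fpdf2 sketch of the findings-page pattern, with the finding text shortened for illustration:

```python
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", style="B", size=14)
pdf.cell(0, 10, "DataVault Corp: ML Security Assessment")
pdf.ln(12)

# One block per finding: severity, component, evidence, risk, remediation
pdf.set_font("Helvetica", size=11)
pdf.multi_cell(0, 6,
    "[CRIT-1] Inference API has no authentication (ML04)\n"
    "Evidence: 10 unauthenticated requests all returned valid predictions.")
pdf.output("security_assessment.pdf")
```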
Critical
- [CRIT-1] Inference API has no authentication (ML04) — Sent 10 unauthenticated requests; all returned valid predictions. Any network-accessible client can query the model, enabling extraction. Remediated: API key authentication implemented and verified.
- [CRIT-2] Fraud probability exposed in API response (ML06) — Decision boundary probing succeeded with 8 queries, revealing the fraud threshold at ~$225. Attacker can calibrate fraudulent transactions below detection threshold. Remediated: probability suppressed from response, logged server-side only.
High
- [HIGH-1] No rate limiting on /predict endpoint (ML04) — Sent 500 automated queries in under 10 seconds; all succeeded, making model extraction trivial. Fix: Implement per-IP rate limits (100 req/min) and alert on the uniform query distributions characteristic of automated probing; see the sketch after this list.
- [HIGH-2] No artifact integrity verification at load time (ML05) — Appended payload to fraud_model.pkl; Flask loaded it without integrity check. Attacker can substitute a backdoored model. Fix: Verify SHA-256 hash before joblib.load(), set read-only permissions.
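A minimal sketch of the per-IP rate limit recommended in HIGH-1. It assumes the third-party Flask-Limiter extension; any equivalent gateway or middleware control would work:

```python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Cap every client IP at 100 requests per minute across all endpoints
limiter = Limiter(key_func=get_remote_address, app=app,
                  default_limits=["100 per minute"])

@app.route('/predict', methods=['POST'])
@limiter.limit("100 per minute")  # explicit cap on the sensitive endpoint
def predict():
    ...
```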
Medium
- [MED-1] No input range validation (ML06) — Accepted amount=-500 and transactions_24h=99999 without rejection. Malformed inputs may cause undefined model behavior. Fix: Validate input ranges and reject out-of-bounds values with HTTP 422; a sketch follows this list.
- [MED-2] No inference query logging (ML09) — No logs generated for 500+ test queries. Probing left no forensic trail. Fix: Log all requests with timestamp, source IP, features, and prediction. Integrate with SIEM.
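A sketch of the range validation proposed in MED-1. The bounds are illustrative assumptions, not values taken from the assessment:

```python
from flask import jsonify

# Illustrative bounds: real limits would come from business rules
FEATURE_RANGES = {
    'amount': (0.01, 100_000),
    'hour_of_day': (0, 23),
    'transactions_24h': (0, 500),
    'foreign_transaction': (0, 1),
}

def validate_ranges(data: dict):
    """Return a 422 error response for any out-of-bounds feature, else None."""
    for name, (low, high) in FEATURE_RANGES.items():
        value = data.get(name)
        if value is None or not (low <= value <= high):
            return jsonify({"error": f"{name} out of range [{low}, {high}]"}), 422
    return None
```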
Security Controls Implemented
- API key authentication — unauthenticated requests rejected with 401; prevents unauthorized model access and extraction
- Probability suppression — fraud_probability removed from API response; logged server-side only to eliminate adversary feedback loop
- SHA-256 artifact hashing — integrity verification detects model tampering at rest or in transit
- Dependency pinning — `requirements.txt` locks all package versions to prevent supply chain attacks
- Synthetic training data — eliminates PII exposure risk during development
- Input validation — API rejects requests missing required feature fields
- Debug mode disabled — Flask does not leak stack traces or internal errors
- Balanced class weighting — reduces model vulnerability to class-imbalance exploitation
Lessons Learned
- Perfect scores on synthetic data are expected, not impressive — The clearly separated patterns make classification trivial. Real-world data is messy, and 100% accuracy would indicate overfitting (the model memorizing answers rather than learning patterns).
- Every pipeline stage is an attack surface — Security is not just about the API endpoint. Data poisoning, supply chain attacks, and model theft target the training pipeline long before inference begins.
- Decision boundary probing is trivially easy with probability feedback — With only 8 queries, I mapped the exact fraud threshold. Removing probability from the response forces attackers to make thousands of blind queries, making the attack detectable via rate limiting.
- Exposing confidence scores aids attackers — Returning `fraud_probability` in the API response gives adversaries a precise feedback loop to probe the decision boundary and craft evasion inputs.
- Pickle files are executable code — Model artifacts serialized with joblib/pickle can execute arbitrary code when deserialized. Never load a .pkl from an untrusted source without verifying its integrity hash first.
- The remediation loop matters — Finding a vulnerability is not enough. Implementing the fix and verifying the attack is blocked completes the security assessment cycle: identify → remediate → verify.
- ML models need the same controls as production APIs — Authentication, rate limiting, input validation, and logging are not optional for inference endpoints.
References
- scikit-learn Documentation — RandomForestClassifier
- OWASP — Machine Learning Security Top 10
- MITRE ATLAS — Adversarial Threat Landscape for AI Systems
- Flask Documentation — Quickstart
- Python Documentation — hashlib (SHA-256)
- NIST AI Risk Management Framework