This project documents my experience building an end-to-end fraud detection ML pipeline and systematically mapping every component to concrete security threat categories. The scenario: DataVault Corp acquired a startup with an AI-powered fraud detection system, and as their Security Engineer, I assessed the security posture of the inherited ML system before production integration.
Project Architecture
The ML pipeline uses the following tools and components:
- Python 3.11 — primary programming language for the ML pipeline
- scikit-learn — RandomForestClassifier for fraud detection model training
- pandas / numpy — dataset generation, exploration, and manipulation
- Flask — local REST inference endpoint serving model predictions
- joblib — model artifact serialization and deserialization
- SHA-256 hashing — artifact integrity verification
- fpdf2 — programmatic PDF generation for the security assessment report
Step 1: Environment Setup
Installed Python ML dependencies locally and immediately pinned all package versions to
requirements.txt.
Unpinned dependencies are a supply chain risk — a malicious update to a popular ML
library could compromise model training or inference.
```powershell
# Install ML packages
python -m pip install scikit-learn pandas numpy flask joblib

# Verify all packages import successfully
python -c "import sklearn, pandas, numpy, flask, joblib; print('All packages OK')"

# Pin dependency versions (supply chain security control)
python -m pip freeze > requirements.txt
```
Step 2: Synthetic Fraud Dataset
Generated a 10,000-transaction synthetic dataset with a realistic 5% fraud rate. Using synthetic data during development eliminates PII exposure risk — a security best practice. Fraudulent transactions were designed with distinct patterns: higher amounts (~$300), late-night hours (0–4 AM), elevated transaction frequency (~15/day), and 80% foreign origin.
```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Legitimate transactions (95%) — normal spending patterns
legit = pd.DataFrame({
    'amount': np.random.normal(50, 25, 9500).clip(1, 500),
    'hour_of_day': np.random.randint(6, 23, 9500),
    'transactions_24h': np.random.poisson(3, 9500),
    'foreign_transaction': np.random.binomial(1, 0.05, 9500),
    'label': 0
})

# Fraudulent transactions (5%) — anomalous patterns
fraud = pd.DataFrame({
    'amount': np.random.normal(300, 100, 500).clip(1, 2000),
    'hour_of_day': np.random.randint(0, 5, 500),
    'transactions_24h': np.random.poisson(15, 500),
    'foreign_transaction': np.random.binomial(1, 0.8, 500),
    'label': 1
})

# Combine and shuffle into the final 10,000-row dataset
df = pd.concat([legit, fraud]).sample(frac=1, random_state=42).reset_index(drop=True)
```
The resulting 95/5 class imbalance is handled with `class_weight='balanced'` during training.
Step 3: Model Training
Split the dataset 80/20 (train/test) with stratification to preserve the fraud ratio, applied feature scaling via StandardScaler, and trained a RandomForestClassifier with 100 estimators and balanced class weighting to handle the imbalanced dataset.
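A minimal sketch of the split-and-scale step described above, assuming `df` is the combined dataset from Step 2:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns='label')
y = df['label']

# Stratified 80/20 split preserves the 5% fraud ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only to avoid test-set leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```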
```python
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',  # handles the 95/5 class imbalance
    random_state=42,
    n_jobs=-1                 # use all CPU cores
)
model.fit(X_train_scaled, y_train)

# Save model artifacts
joblib.dump(model, 'fraud_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
```
Step 4: Model Evaluation
The model achieved perfect scores on the synthetic dataset — 100% precision, recall, and ROC-AUC. While expected given the clearly separated synthetic patterns, a perfect score on real-world data would be a red flag for overfitting (the model memorizing training data rather than learning generalizable patterns).
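The output below came from scikit-learn's standard metrics; a minimal sketch of the evaluation code, assuming the test split from Step 3:

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]

print(classification_report(y_test, y_pred, target_names=['Legitimate', 'Fraud']))
print(confusion_matrix(y_test, y_pred))
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_prob):.4f}")
```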
```text
=== Classification Report ===
              precision    recall  f1-score   support
  Legitimate       1.00      1.00      1.00      1900
       Fraud       1.00      1.00      1.00       100

=== Confusion Matrix ===
True Negatives  (legit correctly identified): 1900
False Positives (legit flagged as fraud):        0
False Negatives (fraud missed):                  0
True Positives  (fraud correctly caught):      100

ROC-AUC Score: 1.0000
```
Step 5: Artifact Integrity Verification
Generated SHA-256 hashes for both model artifacts at training time and locked file permissions to read-only. Before loading a model in production, the hash must be verified to detect tampering — an attacker with write access to model storage could replace `fraud_model.pkl` with a backdoored version that misclassifies specific transactions.
```powershell
# Generate SHA-256 hashes
Get-FileHash fraud_model.pkl -Algorithm SHA256
Get-FileHash scaler.pkl -Algorithm SHA256

# Lock artifacts to read-only
Set-ItemProperty fraud_model.pkl -Name IsReadOnly -Value $true
Set-ItemProperty scaler.pkl -Name IsReadOnly -Value $true
```
Model artifacts (`.pkl` files) are serialized Python objects. A maliciously crafted pickle file can execute arbitrary code when loaded with `joblib.load()`. Never load model artifacts from untrusted sources without integrity verification.
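Pairing the hashes with a load-time check closes the loop. A minimal sketch of a verify-before-load guard, assuming the known-good hash is stored out of band (the placeholder below stands in for the real value):

```python
import hashlib
import joblib

def load_verified(path: str, expected_sha256: str):
    """Deserialize a model artifact only if its SHA-256 matches the recorded hash."""
    with open(path, 'rb') as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_sha256.lower():
        raise RuntimeError(f"Integrity check failed for {path}: got {actual}")
    return joblib.load(path)

# Hash recorded at training time, retrieved from a trusted store
model = load_verified('fraud_model.pkl', expected_sha256='<known-good-hash>')
```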
Step 6: Flask Inference API
Built a local REST API with Flask serving two endpoints: `/health` for availability checks and `/predict` for fraud classification. The API loads the model artifacts at startup, validates incoming JSON fields, scales the input features, and returns the prediction with a fraud probability score.
```python
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
FEATURE_NAMES = ['amount', 'hour_of_day', 'transactions_24h', 'foreign_transaction']

# Load model artifacts once at startup
model = joblib.load('fraud_model.pkl')
scaler = joblib.load('scaler.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    # Validate required fields
    missing = [f for f in FEATURE_NAMES if f not in data]
    if missing:
        return jsonify({"error": f"Missing fields: {missing}"}), 400

    features = np.array([[data[f] for f in FEATURE_NAMES]])
    features_scaled = scaler.transform(features)
    prediction = model.predict(features_scaled)[0]
    probability = model.predict_proba(features_scaled)[0][1]

    return jsonify({
        "prediction": int(prediction),
        "label": "FRAUD" if prediction == 1 else "LEGITIMATE",
        "fraud_probability": round(float(probability), 4)
    })
```
```powershell
# Health check
Invoke-RestMethod http://127.0.0.1:5000/health

# Legitimate transaction
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:5000/predict `
    -ContentType "application/json" `
    -Body '{"amount":45,"hour_of_day":14,"transactions_24h":2,"foreign_transaction":0}'

# Suspicious transaction
Invoke-RestMethod -Method Post -Uri http://127.0.0.1:5000/predict `
    -ContentType "application/json" `
    -Body '{"amount":850,"hour_of_day":2,"transactions_24h":22,"foreign_transaction":1}'
```
Step 7: Decision Boundary Probing Attack
With the unsecured API running, I simulated one of the most common real-world attacks against deployed ML models — decision boundary probing. Starting with a high-confidence fraud transaction ($800, 2 AM, 20 transactions, foreign card), I systematically reduced the amount to discover the exact threshold where the model's prediction flips from FRAUD to LEGITIMATE.
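The probe itself was a short loop against the unauthenticated endpoint. A sketch of the approach, assuming the `requests` library and the local API from Step 6:

```python
import requests

# High-confidence fraud profile; only the amount varies
BASE = {'hour_of_day': 2, 'transactions_24h': 20, 'foreign_transaction': 1}

for amount in [800, 600, 400, 300, 250, 200, 150, 100]:
    resp = requests.post('http://127.0.0.1:5000/predict',
                         json={**BASE, 'amount': amount}).json()
    print(f"Amount: ${amount}   Probability: {resp['fraud_probability']:.2f}   "
          f"Label: {resp['label']}")
```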
```text
# Probing with decreasing amounts (other features held constant)
Amount: $800   Probability: 0.97   Label: FRAUD
Amount: $600   Probability: 0.93   Label: FRAUD
Amount: $400   Probability: 0.85   Label: FRAUD
Amount: $300   Probability: 0.72   Label: FRAUD
Amount: $250   Probability: 0.59   Label: FRAUD
Amount: $200   Probability: 0.42   Label: LEGITIMATE
Amount: $150   Probability: 0.29   Label: LEGITIMATE
Amount: $100   Probability: 0.15   Label: LEGITIMATE
```
The prediction flipped between $250 (FRAUD) and $200 (LEGITIMATE), placing the decision boundary near $225. An attacker now knows that any transaction under roughly $225 with these parameters evades detection. They could split a $900 fraud into five $180 transactions, each safely below the threshold, and bypass the model entirely. This attack succeeded with only 8 queries.
The root cause: the API returns `fraud_probability` in the response. Without probability feedback, boundary probing requires orders of magnitude more queries and becomes detectable via rate limiting.
Step 8: Remediation — API Key Auth + Probability Suppression
Implemented two controls to close the vulnerabilities identified during the probing attack: API key authentication to block unauthorized access, and probability suppression to eliminate the feedback loop that made boundary probing trivial.
```python
import functools
from flask import request, jsonify

# API key authentication decorator
VALID_API_KEYS = {'datavault-prod-key-2026', 'datavault-staging-key-2026'}

def require_api_key(f):
    @functools.wraps(f)
    def decorated(*args, **kwargs):
        key = request.headers.get('X-API-Key', '')
        if key not in VALID_API_KEYS:
            return jsonify({"error": "Invalid or missing API key"}), 401
        return f(*args, **kwargs)
    return decorated

@app.route('/predict', methods=['POST'])
@require_api_key
def predict():
    # ... validation and prediction logic as in Step 6 ...

    # Secure: return label only, log probability server-side
    app.logger.info(f"prob={prob:.4f}")
    return jsonify({
        "prediction": int(pred),
        "label": "FRAUD" if pred == 1 else "LEGITIMATE"
    })
```
```powershell
# Without API key — blocked with 401
Invoke-RestMethod -Uri http://127.0.0.1:5001/predict -Method POST `
    -Body '{"amount":500,"hour_of_day":2,"transactions_24h":20,"foreign_transaction":1}' `
    -ContentType "application/json"
# => Blocked: Unauthorized

# With valid API key — succeeds, no probability in response
$headers = @{ "X-API-Key" = "datavault-prod-key-2026" }
Invoke-RestMethod -Uri http://127.0.0.1:5001/predict -Method POST `
    -Body '{"amount":500,"hour_of_day":2,"transactions_24h":20,"foreign_transaction":1}' `
    -ContentType "application/json" -Headers $headers
# => prediction: 1
#    label: FRAUD
#    (no fraud_probability field)
```
Step 9: Artifact Tampering Detection
Demonstrated that model artifacts can be tampered with and that SHA-256 hashing detects the modification. I hashed the original model, appended a simulated payload to the .pkl file, then re-hashed and compared — the mismatch was immediately detected.
```powershell
# Hash the clean model
$originalHash = (Get-FileHash fraud_model.pkl -Algorithm SHA256).Hash
# Original SHA-256: A3F8C2E1D9B0...

# Simulate tampering
[System.IO.File]::AppendAllText("$PWD\fraud_model.pkl", "TAMPERED_PAYLOAD")

# Verify hash changed
$tamperedHash = (Get-FileHash fraud_model.pkl -Algorithm SHA256).Hash
if ($tamperedHash -eq $originalHash) {
    Write-Host "MATCH — artifact integrity verified"
} else {
    Write-Host "MISMATCH — ARTIFACT HAS BEEN TAMPERED WITH"
}
```
Step 10: ML Pipeline Security Threat Map
The core deliverable of this module — a comprehensive mapping of every pipeline stage to its attack surface, threat category, example attack, and recommended control.
| Pipeline Stage | Threat | OWASP ML | Example Attack | Control |
|---|---|---|---|---|
| Data Collection | Data Poisoning | ML03 | Inject mislabeled transactions to reduce recall | Data provenance logging, anomaly detection on ingested data |
| Data Storage | PII Exposure | ML08 | Attacker reads training data containing real card numbers | Encryption at rest, IAM least privilege, access logging |
| Feature Engineering | Data Tampering | ML03 | Modify scaler.pkl to shift decision boundary | Artifact integrity hashing, read-only permissions |
| Model Training | Supply Chain | ML05 | Malicious scikit-learn version exfiltrates training data | Pinned dependencies, SBOM, isolated training environment |
| Model Artifact | Model Theft | ML04 | Steal .pkl to perform offline model inversion | Encryption, access controls, integrity verification |
| Inference API | Model Extraction | ML04 | Query API 10,000+ times to reconstruct model | Authentication, rate limiting, query logging |
| Inference API | Evasion | ML06 | Craft transaction just below fraud threshold | Input validation, confidence thresholds, monitoring |
| API Response | Output Abuse | ML06 | Use fraud_probability to calibrate bypass attempts | Return binary label only, suppress probability |
Security Assessment Findings
Generated a professional PDF security assessment using fpdf2, documenting 6 findings with OWASP ML Top 10 mappings. The report follows the standard pentest finding format: severity, component, evidence, risk, and remediation.
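The report generation code isn't reproduced in full here; a minimal fpdf2 sketch of the findings-page pattern, with the finding text shortened for illustration:

```python
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", style="B", size=14)
pdf.cell(0, 10, "DataVault Corp: ML Security Assessment")
pdf.ln(12)

# One block per finding: severity, component, evidence, risk, remediation
pdf.set_font("Helvetica", size=11)
pdf.multi_cell(0, 6,
    "[CRIT-1] Inference API has no authentication (ML04)\n"
    "Evidence: 10 unauthenticated requests all returned valid predictions.")
pdf.output("security_assessment.pdf")
```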
Critical
- [CRIT-1] Inference API has no authentication (ML04) — Sent 10 unauthenticated requests; all returned valid predictions. Any network-accessible client can query the model, enabling extraction. Remediated: API key authentication implemented and verified.
- [CRIT-2] Fraud probability exposed in API response (ML06) — Decision boundary probing succeeded with 8 queries, revealing the fraud threshold at ~$225. Attacker can calibrate fraudulent transactions below detection threshold. Remediated: probability suppressed from response, logged server-side only.
High
- [HIGH-1] No rate limiting on /predict endpoint (ML04) — Sent 500 automated queries in under 10 seconds; all succeeded, making model extraction trivial. Fix: Implement per-IP rate limits (100 req/min) and alert on the uniform query distributions characteristic of automated probing; see the sketch after this list.
- [HIGH-2] No artifact integrity verification at load time (ML05) — Appended payload to fraud_model.pkl; Flask loaded it without integrity check. Attacker can substitute a backdoored model. Fix: Verify SHA-256 hash before joblib.load(), set read-only permissions.
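A minimal sketch of the per-IP rate limit recommended in HIGH-1. It assumes the third-party Flask-Limiter extension; any equivalent gateway or middleware control would work:

```python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Cap every client IP at 100 requests per minute across all endpoints
limiter = Limiter(key_func=get_remote_address, app=app,
                  default_limits=["100 per minute"])

@app.route('/predict', methods=['POST'])
@limiter.limit("100 per minute")  # explicit cap on the sensitive endpoint
def predict():
    ...
```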
Medium
- [MED-1] No input range validation (ML06) — Accepted amount=-500 and transactions_24h=99999 without rejection. Malformed inputs may cause undefined model behavior. Fix: Validate input ranges and reject out-of-bounds values with HTTP 422; a sketch follows this list.
- [MED-2] No inference query logging (ML09) — No logs generated for 500+ test queries. Probing left no forensic trail. Fix: Log all requests with timestamp, source IP, features, and prediction. Integrate with SIEM.
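A sketch of the range validation proposed in MED-1. The bounds are illustrative assumptions, not values taken from the assessment:

```python
from flask import jsonify

# Illustrative bounds: real limits would come from business rules
FEATURE_RANGES = {
    'amount': (0.01, 100_000),
    'hour_of_day': (0, 23),
    'transactions_24h': (0, 500),
    'foreign_transaction': (0, 1),
}

def validate_ranges(data: dict):
    """Return a 422 error response for any out-of-bounds feature, else None."""
    for name, (low, high) in FEATURE_RANGES.items():
        value = data.get(name)
        if value is None or not (low <= value <= high):
            return jsonify({"error": f"{name} out of range [{low}, {high}]"}), 422
    return None
```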
Security Controls Implemented
- API key authentication — unauthenticated requests rejected with 401; prevents unauthorized model access and extraction
- Probability suppression — fraud_probability removed from API response; logged server-side only to eliminate adversary feedback loop
- SHA-256 artifact hashing — integrity verification detects model tampering at rest or in transit
- Dependency pinning — `requirements.txt` locks all package versions to prevent supply chain attacks
- Synthetic training data — eliminates PII exposure risk during development
- Input validation — API rejects requests missing required feature fields
- Debug mode disabled — Flask does not leak stack traces or internal errors
- Balanced class weighting — reduces model vulnerability to class-imbalance exploitation
Lessons Learned
- Perfect scores on synthetic data are expected, not impressive — The clearly separated patterns make classification trivial. Real-world data is messy, and 100% accuracy would indicate overfitting (the model memorizing answers rather than learning patterns).
- Every pipeline stage is an attack surface — Security is not just about the API endpoint. Data poisoning, supply chain attacks, and model theft target the training pipeline long before inference begins.
- Decision boundary probing is trivially easy with probability feedback — With only 8 queries, I mapped the exact fraud threshold. Removing probability from the response forces attackers to make thousands of blind queries, making the attack detectable via rate limiting.
- Exposing confidence scores aids attackers — Returning `fraud_probability` in the API response gives adversaries a precise feedback loop to probe the decision boundary and craft evasion inputs.
- Pickle files are executable code — Model artifacts serialized with joblib/pickle can execute arbitrary code when deserialized. Never load a .pkl from an untrusted source without verifying its integrity hash first.
- The remediation loop matters — Finding a vulnerability is not enough. Implementing the fix and verifying the attack is blocked completes the security assessment cycle: identify → remediate → verify.
- ML models need the same controls as production APIs — Authentication, rate limiting, input validation, and logging are not optional for inference endpoints.
References
- scikit-learn Documentation — RandomForestClassifier
- OWASP — Machine Learning Security Top 10
- MITRE ATLAS — Adversarial Threat Landscape for AI Systems
- Flask Documentation — Quickstart
- Python Documentation — hashlib (SHA-256)
- NIST AI Risk Management Framework