This module adds an AI/ML detection layer on top of a production Zero Trust Architecture. The core problem: CloudWatch alarms fire on static thresholds, which means a misconfigured service and a real credential-theft attack look identical. SOC analysts waste hours manually correlating OPA decision logs to distinguish the two. This project automates that triage with an Isolation Forest model that learns what "normal" looks like and flags behavioral deviations, then hands anomalies to Amazon Bedrock Claude for structured incident analysis.
Architecture Overview
The AI layer reads from the existing ZTA stack but never modifies it. OPA decision logs feed into the Isolation Forest for scoring. When anomalies exceed a threshold, Bedrock Claude generates a SOC-ready incident report. CloudWatch receives custom metrics for dashboarding and alarming. This "observe-only" design means the AI layer adds detection capability without introducing new attack surface.
Feature Engineering
The detector extracts 7 features from each OPA decision log entry. The key insight is that each feature captures a different dimension of "normal" behavior, so deviations along any axis signal a potential threat.
```python
def _extract_features(self, entry):
    # 7 features per OPA decision log entry
    path = entry["input"]["path"]
    return [
        entry["timestamp"].hour,                 # hour_of_day (0-23)
        self.method_enc.transform(               # method (GET/POST/PUT/DELETE)
            [entry["input"]["method"]])[0],
        self.role_enc.transform(                 # role (analyst/admin/service)
            [entry["input"]["role"]])[0],
        1 if entry["result"] else 0,             # result (allowed=1, denied=0)
        int(hashlib.md5(                         # ip_hash (normalized MD5)
            entry["input"]["source_ip"].encode()
        ).hexdigest()[:8], 16) / 0xFFFFFFFF,
        len(path),                               # path_length (URL length)
        1 if path in KNOWN_PATHS else 0,         # is_known_path (whitelist check)
    ]
```
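The ip_hash feature deserves a note: MD5 is used here purely as a deterministic mapping into [0, 1], not for security. A standalone sketch of this one feature:

```python
import hashlib

def ip_hash(source_ip: str) -> float:
    """Map an IP string to a stable float in [0, 1].

    The first 8 hex digits of the MD5 digest form a 32-bit integer,
    normalized by 0xFFFFFFFF: the same IP always lands on the same
    point, while previously unseen IPs scatter across the range.
    """
    return int(hashlib.md5(source_ip.encode()).hexdigest()[:8], 16) / 0xFFFFFFFF

# Stable: repeated calls for one address give one feature value.
a = ip_hash("10.0.1.15")
b = ip_hash("10.0.1.15")
c = ip_hash("203.0.113.7")
```

Because the mapping is stable, an entry from an IP the model never saw during training shifts this feature to an unfamiliar value, which is exactly what the credential-theft category relies on.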
Anomaly Detection: Isolation Forest
Isolation Forest is an unsupervised algorithm that isolates anomalies by randomly partitioning feature space. Normal points require many splits to isolate; anomalies require few. The model trains on 3,500 "normal" OPA log entries and assigns each new entry an anomaly score. Entries scoring below the contamination threshold are flagged.
```python
from sklearn.ensemble import IsolationForest

self.model = IsolationForest(
    n_estimators=200,     # 200 isolation trees
    contamination=0.07,   # expect ~7% anomalous data
    random_state=42,      # reproducible results
    n_jobs=-1,            # use all CPU cores
)

# Train on historical normal traffic
self.model.fit(feature_matrix)
# Baseline scores: min=-0.03, max=0.29, mean=0.18
```
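Scoring works the same way regardless of the feature source. A minimal sketch with synthetic 7-dimensional vectors standing in for the real OPA-derived features: `decision_function()` returns higher values for more normal points, and the contamination setting places the cut-off so roughly 7% of training data scores below zero; `predict()` collapses this to -1 (anomaly) or 1 (normal).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Stand-in for the 3,500 baseline log entries (7 features each)
normal = rng.normal(loc=0.0, scale=1.0, size=(3500, 7))

model = IsolationForest(n_estimators=200, contamination=0.07,
                        random_state=42, n_jobs=-1).fit(normal)

# Five typical entries plus one far outside the training distribution
new_entries = np.vstack([rng.normal(size=(5, 7)),
                         np.full((1, 7), 8.0)])

scores = model.decision_function(new_entries)  # higher = more normal
flags = model.predict(new_entries)             # -1 = anomaly, 1 = normal
```

In the pipeline, the rows flagged -1 are the entries handed to the Bedrock analysis stage.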
Anomaly Categories Detected
- Privilege escalation — analyst issuing DELETE requests outside their role
- Credential theft — valid user from an unknown IP address
- Unusual hours — 3 AM access from a 9-to-5 user
- Burst denials — rapid-fire denied requests (brute-force scanning)
- Suspicious admin — admin from a new IP at an unusual hour
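Simple post-hoc rules can attach one of these category labels to a flagged entry. A hypothetical sketch — the field names, role values, and working-hours window below are illustrative, not taken from the project (burst denials are rate-based and would need a separate sliding-window counter):

```python
# Hypothetical labeler: keys and thresholds are illustrative only.
WORK_HOURS = range(8, 18)

def categorize(entry: dict, known_ips: set) -> str:
    new_ip = entry["source_ip"] not in known_ips
    off_hours = entry["hour"] not in WORK_HOURS
    if entry["role"] == "analyst" and entry["method"] == "DELETE":
        return "privilege_escalation"   # method outside the role's norm
    if entry["role"] == "admin" and new_ip and off_hours:
        return "suspicious_admin"       # new IP at an unusual hour
    if new_ip:
        return "credential_theft"       # valid user, unknown IP
    if off_hours:
        return "unusual_hours"          # e.g. 3 AM from a 9-to-5 user
    return "uncategorized"
```

Labels like these make the Bedrock prompt more specific than a bare anomaly score.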
Bedrock Claude Incident Response
When the pipeline detects 3+ anomalies in a scoring window, it packages the flagged entries and sends them to Claude 3 Haiku via Amazon Bedrock with a structured SOC analyst system prompt. Claude returns a formatted incident report covering severity, root cause, affected assets, recommended actions, and indicators of compromise.
```python
import json
import boto3

self.bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = self.bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "system": "You are a Senior SOC Analyst specializing in Zero Trust...",
        "messages": [{"role": "user", "content": anomaly_summary}],
        "max_tokens": 1500,
        "temperature": 0.1,  # low temperature for consistent analysis
    }),
)
```
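The `invoke_model` response body is a stream of JSON in the Anthropic Messages format, with the generated text under `content[0].text`. A sketch of the parsing step, run here against a hand-built sample payload rather than a live Bedrock call:

```python
import json

def extract_report(raw_body: bytes) -> str:
    """Pull the analyst report text out of a Messages-API response body."""
    payload = json.loads(raw_body)
    return "".join(block["text"] for block in payload["content"]
                   if block.get("type") == "text")

# In the pipeline this would be: extract_report(response["body"].read())
sample = json.dumps({
    "content": [{"type": "text", "text": "SEVERITY: HIGH\nRoot cause: ..."}],
    "stop_reason": "end_turn",
}).encode()
```

Joining all text blocks (rather than indexing `content[0]`) keeps the parser robust if a response ever carries more than one block.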
CloudWatch Integration
Two custom metrics are pushed to the FedSecure/ZeroTrust namespace: AnomalyScore (count of anomalies) and AnomalyRate (percentage of entries flagged). A CloudWatch alarm fires when AnomalyScore exceeds 3 in a 5-minute window, which can trigger SNS notifications to the on-call team.
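The metric push reduces to a single `put_metric_data` call. A sketch that separates payload construction (testable offline) from the boto3 call (names such as `build_metric_payload` are mine, not the project's):

```python
# import boto3  # only needed for the live put_metric_data call below

def build_metric_payload(anomaly_count: int, total: int) -> dict:
    """Shape the two custom metrics for cloudwatch.put_metric_data()."""
    rate = 100.0 * anomaly_count / total if total else 0.0
    return {
        "Namespace": "FedSecure/ZeroTrust",
        "MetricData": [
            {"MetricName": "AnomalyScore", "Value": float(anomaly_count), "Unit": "Count"},
            {"MetricName": "AnomalyRate", "Value": rate, "Unit": "Percent"},
        ],
    }

# cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
# cloudwatch.put_metric_data(**build_metric_payload(anomaly_count=4, total=50))
```

Keeping the payload builder pure makes the alarm-threshold logic easy to unit-test without AWS credentials.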
Pipeline Orchestration
The zt_ai_pipeline.py script chains all stages: load the trained model, score recent OPA logs, push metrics to CloudWatch, and, if anomalies cross the threshold, trigger Bedrock Claude analysis and generate a timestamped PDF incident report. A single command runs the full detect-to-report workflow.
Security Assessment
A self-assessment of the AI layer identifies 5 findings (2 Medium, 3 Low) covering synthetic training data risk, missing drift detection, lack of human-in-the-loop review, limited feature set, and on-demand (vs. event-driven) pipeline execution. Each finding includes severity, impact, and recommended remediation.
Technologies
- Python, scikit-learn (Isolation Forest)
- Amazon Bedrock (Claude 3 Haiku) via boto3
- Open Policy Agent (OPA) decision logs
- Amazon CloudWatch custom metrics and alarms
- fpdf2 for PDF incident reports
Lessons Learned
- Contamination tuning is everything. An initial contamination of 0.005 detected only 7% of anomalies. Incrementally testing 0.01, 0.02, 0.05, 0.06, and 0.07 found the sweet spot: 80%+ detection without flooding the dashboard with false positives.
- Feature engineering beats model complexity. Seven well-chosen features from raw JSON logs outperformed more complex approaches. The IP hash and is_known_path features alone caught credential theft and reconnaissance.
- Bedrock Claude needs low temperature for SOC work. Temperature 0.1 produces consistent, structured reports. Higher values introduced creative but unreliable severity assessments.
- fpdf2 chokes on Unicode. Em dashes and smart quotes in Claude's output caused latin-1 encoding errors. Sanitizing Unicode to ASCII before PDF rendering fixed it.
- WSL pip is externally managed. Python packages must be installed in a venv or with --break-system-packages. PowerShell pip worked without this restriction.
- The AI layer must be read-only. It reads OPA logs and writes to CloudWatch/PDF, but never modifies the ZTA pipeline. This "observe-only" constraint prevents the AI from becoming a new attack surface.
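The Unicode fix from the fpdf2 lesson can be sketched as a small sanitizer (this assumes the built-in latin-1 core fonts; loading a Unicode TTF font would be the alternative):

```python
import unicodedata

def to_latin1_safe(text: str) -> str:
    """Replace characters fpdf2's latin-1 core fonts cannot encode.

    Common typographic characters get explicit ASCII stand-ins;
    NFKD normalization then decomposes accents, and anything still
    outside latin-1 is dropped rather than raising UnicodeEncodeError.
    """
    replacements = {"\u2014": "--", "\u2013": "-",   # em/en dash
                    "\u2018": "'", "\u2019": "'",    # smart single quotes
                    "\u201c": '"', "\u201d": '"'}    # smart double quotes
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("latin-1", "ignore").decode("latin-1")
```

Running Claude's output through this before `pdf.multi_cell()` keeps the report readable instead of crashing the render.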
References
- scikit-learn Isolation Forest documentation
- Amazon Bedrock Claude API reference
- NIST SP 800-207 Zero Trust Architecture
- 2024 Snowflake breach analysis — credential theft via anomalous bulk exports
- CloudWatch custom metrics and alarms documentation