Security Threats in AI-Native Operating Systems
An Empirical Study Using Privilege-Escalated LLM Agents
Traditional operating system security models assume deterministic, predictable software. An LLM agent violates that assumption fundamentally — its behavior is probabilistic, context-dependent, and shaped by natural language inputs that are difficult to sanitize or predict.
We chose an operating system as the research environment because it represents the broadest possible integration surface. If threats can be characterized and mitigated at the OS level, the findings should carry over to narrower integration contexts.
Three Escalation Stages
User-Level
Standard access, no sudo. Establishes the baseline threat surface.
Sudo-Enabled
Full root control. The LLM can modify anything on the system.
Web-Enabled
Sudo plus internet access. Enables data exfiltration and remote prompt injection.
Threat Taxonomy
The threats below were identified empirically while designing, building, and operating JARVIS OS. Each was observed through direct system operation and is addressed by one or more architectural mitigations.
Malicious MCP Servers
Third-party MCP servers from community repositories may contain malicious code. The LLM has no mechanism to verify server integrity before granting access. To the model, a tool with a helpful description and malicious code is indistinguishable from a benign one.
Mitigation: a community-vetted, AUR-style registry with code review, declared capabilities, and tool description accuracy checks.
Prompt Injection
External content (web pages, documents, tool outputs) can override the AI's instructions. Without structural separation of the instruction and data planes, the model cannot detect the attack.
Mitigation: Cryptographic Boundary Protocol. Provenance nonces separate the instruction plane from the data plane.
Misleading MCP Server Usage
MCP servers with ambiguous or misleading descriptions cause the LLM to invoke tools in unintended ways, potentially performing destructive operations the user never requested.
Mitigation: registry vetting plus a structured tool schema with accurate capability declarations.
Unauthorized Sudo Requests via MCP
The LLM autonomously escalates privileges by requesting sudo access through MCP tool calls without explicit user authorization, bypassing standard privilege separation.
Mitigation: TLA (Threat Level Access) system plus PolicyKit enforcement. Escalation requires out-of-band confirmation.
Sudo Capability Exploitation
Once granted sudo access, the LLM chains multiple privileged operations beyond the scope of the original request, creating compounding security risks that exceed what the user authorized.
Mitigation: TLA plus goal-scoped confirmation. Sudo access expires on task completion.
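The idea that sudo access ends with the task can be sketched as a context manager. This is a minimal illustration under stated assumptions, not the JARVIS OS implementation: `SudoGrant`, `goal_scoped_sudo`, and the explicit set of allowed operations are hypothetical names introduced here.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SudoGrant:
    """Hypothetical record of one user-confirmed privileged task."""
    goal: str
    allowed_ops: frozenset
    active: bool = True
    log: list = field(default_factory=list)

    def run(self, op: str) -> str:
        # Refuse anything after expiry or outside the confirmed scope.
        if not self.active:
            raise PermissionError(f"grant for {self.goal!r} has expired")
        if op not in self.allowed_ops:
            raise PermissionError(f"{op!r} is outside the scope of {self.goal!r}")
        self.log.append(op)  # audit trail of what actually ran
        return f"ran {op}"


@contextmanager
def goal_scoped_sudo(goal: str, allowed_ops):
    """Yield a grant that is valid only inside the `with` block."""
    grant = SudoGrant(goal, frozenset(allowed_ops))
    try:
        yield grant
    finally:
        # The grant ends with the task, whether it succeeded or failed.
        grant.active = False
```

Inside `with goal_scoped_sudo("install nginx", {"apt-get install nginx"}) as grant:` the call `grant.run("apt-get install nginx")` succeeds, while any other operation, or any call made after the block exits, raises `PermissionError`. This blocks the chaining pattern described above, because a later privileged step needs a fresh confirmation.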
Bloated Context
The AI doesn't disobey security constraints — it forgets them. Context window saturation silently drops earlier instructions. No warning, no error. The boundaries you set vanish, and the agent keeps operating as if they never existed.
Mitigation: dispatch's rolling signal window (last 20 entries), contextor's retention-based pruning, and high-priority preservation of security constraints.
Context Saturation as a Security Threat
Unlike the other threats — which relate to tool misuse or privilege escalation — Bloated Context is a property of how LLMs process information.
Context window saturation silently drops security constraints that were explicitly stated earlier in the conversation. There is no warning, no error, no acknowledgment that the constraint was lost. The model proceeds as if the restriction was never given.
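The failure mode can be reproduced with a toy model of context management. The sketch below assumes one simple overflow policy, keeping the newest messages that fit a budget; real LLM stacks differ in how they handle overflow. Character count stands in for tokens, and `truncate_oldest_first` is a hypothetical helper.

```python
def truncate_oldest_first(messages, budget):
    """Keep the newest messages whose combined length fits the budget.

    Length in characters stands in for tokens (an assumption for brevity).
    """
    kept, used = [], 0
    for msg in reversed(messages):
        if used + len(msg) > budget:
            break  # everything older than this point is dropped silently
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))


# The security constraint is stated once, at the start of the conversation.
history = ["CONSTRAINT: never write outside /home/user"]
history += [f"tool output {i}: " + "x" * 40 for i in range(10)]

window = truncate_oldest_first(history, budget=300)
constraint_survived = any(m.startswith("CONSTRAINT") for m in window)
print(len(window), constraint_survived)
```

Nothing in this code raises or logs when the constraint falls out of the window. The caller receives a shorter list and carries on.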
To our knowledge, this is the first identification of context window saturation as a discrete security threat rather than a reliability problem. Any system that relies on conversational context to enforce security policy is exposed to it.
Architectural Mitigations
Cryptographic Boundary Protocol
dispatch generates a six-character provenance nonce (Splitmix64) for each completed MCP task. Successful output is stored out-of-band — it never enters the LLM's context unless explicitly retrieved. This structurally separates the instruction plane from the data plane.
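A minimal sketch of the nonce-and-store pattern follows. The SplitMix64 step uses the generator's standard published constants. Everything else is an assumption made for illustration: the base-36 alphabet, the `make_nonce` and `OutOfBandStore` names, and the seed handling. SplitMix64 is a general-purpose, non-cryptographic generator, so how dispatch seeds it matters; the sketch leaves the seed to the caller.

```python
import string

MASK64 = (1 << 64) - 1
ALPHABET = string.ascii_uppercase + string.digits  # assumed 36-symbol encoding


class SplitMix64:
    """SplitMix64 generator with its standard constants."""

    def __init__(self, seed: int):
        self.state = seed & MASK64

    def next(self) -> int:
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


def make_nonce(rng: SplitMix64, length: int = 6) -> str:
    """Encode one 64-bit draw as `length` characters from ALPHABET."""
    value = rng.next()
    chars = []
    for _ in range(length):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(chars)


class OutOfBandStore:
    """Holds tool output outside the prompt; the model sees only the nonce."""

    def __init__(self, rng: SplitMix64):
        self._rng = rng
        self._results = {}

    def complete(self, output: str) -> str:
        # Store the raw output and hand back only an opaque reference.
        nonce = make_nonce(self._rng)
        self._results[nonce] = output
        return nonce

    def retrieve(self, nonce: str) -> str:
        # Explicit retrieval is the only path from stored data to the caller.
        return self._results[nonce]
```

With this shape, a web page containing "ignore previous instructions" is stored under a six-character reference and reaches the model only if something calls `retrieve` deliberately. The sketch does not handle nonce collisions, which a real store would need to.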
TLA (Threat Level Access) System
Dynamic, context-aware privilege model: Guest → User → Elevated → Sudo → Kernel. Enforced at the OS level. Every tool invocation is evaluated against the current TLA level. Escalation requires explicit out-of-band user confirmation — it cannot be triggered by model output alone.
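The level ordering and the confirmation rule can be sketched as follows. The five level names come from the text above. The `REQUIRED` table, the tool names, and the `confirm` callback are hypothetical; the callback stands in for a channel the model cannot write to, such as a PolicyKit prompt.

```python
from enum import IntEnum


class TLA(IntEnum):
    GUEST = 0
    USER = 1
    ELEVATED = 2
    SUDO = 3
    KERNEL = 4


# Hypothetical mapping from tool name to the minimum level it needs.
REQUIRED = {"read_file": TLA.USER, "install_package": TLA.SUDO}


class Session:
    def __init__(self, level: TLA = TLA.USER):
        self.level = level

    def authorize(self, tool: str) -> bool:
        """Allow a tool call only if the current level covers it."""
        # Unknown tools require the highest level (deny by default).
        required = REQUIRED.get(tool, TLA.KERNEL)
        return self.level >= required

    def escalate(self, target: TLA, confirm) -> bool:
        """Raise the level only if the out-of-band `confirm` callback agrees.

        Model output never reaches this decision; only `confirm` does.
        """
        if target > self.level and confirm(self.level, target):
            self.level = target
            return True
        return False
```

A session at `TLA.USER` passes `authorize("read_file")` and fails `authorize("install_package")` until `escalate(TLA.SUDO, confirm)` returns `True`. Because `escalate` takes its decision from the callback rather than from any text the model produced, a tool call alone cannot raise the level.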
Community-Vetted MCP Registry
AUR-style review model. Third-party MCP servers pass community review of their code, declared capabilities, and tool description accuracy before being listed. Malicious or deceptive servers are filtered out before tool search can discover them.
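One way to picture the listing gate is as a comparison between what a server declares and what reviewers find in its code. The sketch below is an assumption about how such a check could be recorded, not a description of the actual registry. `Submission`, `listable`, `search`, and the approval threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    name: str
    declared: frozenset   # capabilities the server's manifest claims
    observed: frozenset   # capabilities reviewers found in the code
    approvals: int        # number of reviewer sign-offs


def listable(sub: Submission, min_approvals: int = 2) -> bool:
    """List a server only if its code stays within what it declares."""
    undeclared = sub.observed - sub.declared
    return not undeclared and sub.approvals >= min_approvals


def search(registry, query: str):
    """Tool search only ever sees servers that passed review."""
    return [s.name for s in registry if listable(s) and query in s.name]
```

A server that declares only `fs.read` but was observed opening network connections has a non-empty `undeclared` set, so it never appears in `search` results regardless of how helpful its description sounds.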
Bloated Context Mitigation
dispatch's bounded rolling signal window (last 20 entries per wakeup) + contextor's retention-based pruning + high-priority preservation of security-critical constraints across context refreshes.
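The combination of a bounded window and preserved constraints can be sketched with a fixed-length deque. The window size of 20 comes from the text. `ContextBuilder` and its methods are hypothetical names, and contextor's retention-based pruning is not modeled here.

```python
from collections import deque


class ContextBuilder:
    def __init__(self, window: int = 20):
        self.pinned = []                     # security constraints, never evicted
        self.signals = deque(maxlen=window)  # oldest entries fall off automatically

    def pin(self, constraint: str) -> None:
        self.pinned.append(constraint)

    def record(self, signal: str) -> None:
        self.signals.append(signal)

    def build(self) -> list:
        """Context for the next wakeup: constraints first, then recent signals."""
        return self.pinned + list(self.signals)
```

After one pinned constraint and 100 recorded signals, `build()` returns 21 entries: the constraint followed by the 20 most recent signals. The constraint survives because it lives outside the structure that is allowed to overflow.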
Four Contributions
A taxonomy of empirically identified security threats specific to privilege-escalated LLM agents, including Bloated Context, which is, to our knowledge, the first identification of context window saturation as a discrete security threat.
Architectural mitigations for each threat class, implemented and verified against source code in the JARVIS OS platform.
JARVIS OS itself — a fully functional, bootable, open-source AI-native OS released as a research and development platform for the community.
A documented MCP tool-description architecture, independently developed in October–November 2025, predating its appearance in commercial deployments.
Publications
SURCA 2026
Poster presentation at Washington State University's Showcase for Undergraduate Research and Creative Activities. Winner of the Gray Grant research award.
Full Paper
Security Threats in AI-Native Operating Systems: An Empirical Study Using Privilege-Escalated LLM Agents. Pre-publication manuscript available on request.
"Traditional OS security models are fundamentally inadequate for probabilistic AI agents. The JARVIS OS project provides both the platform and the empirical evidence to demonstrate this."
— Yakup Atahanov & Toufic Majdaleni, WSU Everett