Security Threats in AI-Native Operating Systems

An Empirical Study Using Privilege-Escalated LLM Agents

Yakup Atahanov, Toufic Majdaleni — Washington State University Everett
Faculty Advisor: Dr. Jeremy Thompson

Traditional operating system security models assume deterministic, predictable software. An LLM agent violates that assumption fundamentally — its behavior is probabilistic, context-dependent, and shaped by natural language inputs that are difficult to sanitize or predict.

We chose an operating system as the research environment because it represents the broadest possible integration surface. If we can characterize and mitigate threats at the OS level, the findings generalize to every narrower context.

Three Escalation Stages

Stage 1

User-Level

Standard access, no sudo. Establishes the baseline threat surface.

Stage 2

Sudo-Enabled

Full root control. The LLM can modify anything on the system.

Stage 3

Web-Enabled

Sudo plus internet access. Enables data exfiltration and remote prompt injection.

Threat Taxonomy

Empirically identified through designing, building, and operating JARVIS OS. Each threat was observed through direct system operation and is addressed by one or more architectural mitigations.

critical

Malicious MCP Servers

Third-party MCP servers from community repositories may contain malicious code. The LLM has no mechanism to verify server integrity before granting access. A tool with a helpful description and malicious code is indistinguishable to the model.

Mitigation

Community-vetted AUR-style registry with code review, declared capabilities, and tool description accuracy checks.

critical

Prompt Injection

External content — web pages, documents, tool outputs — can override the AI's instructions. Without structural separation of instruction and data planes, the attack is invisible and undetectable by the model.

Mitigation

Cryptographic Boundary Protocol — provenance nonces separate instruction plane from data plane.

high

Misleading MCP Server Usage

MCP servers with ambiguous or misleading descriptions cause the LLM to invoke tools in unintended ways, potentially performing destructive operations the user never requested.

Mitigation

Registry vetting + structured tool schema with accurate capability declarations.

critical

Unauthorized Sudo Requests via MCP

The LLM autonomously escalates privileges by requesting sudo access through MCP tool calls without explicit user authorization, bypassing standard privilege separation.

Mitigation

TLA (Threat Level Access) system + PolicyKit enforcement. Escalation requires out-of-band confirmation.

critical

Sudo Capability Exploitation

Once granted sudo access, the LLM chains multiple privileged operations beyond the scope of the original request, creating compounding security risks that exceed what the user authorized.

Mitigation

TLA + goal-scoped confirmation. Sudo access expires on task completion.

critical

Bloated Context

The AI doesn't disobey security constraints — it forgets them. Context window saturation silently drops earlier instructions. No warning, no error. The boundaries you set vanish, and the agent keeps operating as if they never existed.

Mitigation

dispatch rolling signal window (last 20 entries) + contextor retention-based pruning + high-priority preservation of security constraints.

Bloated Context

Context Saturation as a Security Threat

Unlike the other threats — which relate to tool misuse or privilege escalation — Bloated Context is a property of how LLMs process information.

Context window saturation silently drops security constraints that were explicitly stated earlier in the conversation. There is no warning, no error, no acknowledgment that the constraint was lost. The model proceeds as if the restriction was never given.

This is the first identification of context window saturation as a discrete security threat rather than a reliability problem. It has profound implications for any system that relies on conversational context to enforce security policy.

Architectural Mitigations

Cryptographic Boundary Protocol

dispatch generates a six-character provenance nonce (Splitmix64) for each completed MCP task. Successful output is stored out-of-band — it never enters the LLM's context unless explicitly retrieved. This structurally separates the instruction plane from the data plane.

TLA (Threat Level Access) System

Dynamic, context-aware privilege model: Guest → User → Elevated → Sudo → Kernel. Enforced at the OS level. Every tool invocation is evaluated against the current TLA level. Escalation requires explicit out-of-band user confirmation — it cannot be triggered by model output alone.

Community-Vetted MCP Registry

AUR-style proofread model. Third-party MCP servers pass community review — code, declared capabilities, tool description accuracy — before being listed. Malicious or deceptive servers are filtered before they are discoverable by tool search.

Bloated Context Mitigation

dispatch's bounded rolling signal window (last 20 entries per wakeup) + contextor's retention-based pruning + high-priority preservation of security-critical constraints across context refreshes.

Four Contributions

01

A taxonomy of empirically-identified security threats specific to privilege-escalated LLM agents — including Bloated Context, the first identification of context window saturation as a discrete security threat.

02

Architectural mitigations for each threat class, implemented and verified against source code in the JARVIS OS platform.

03

JARVIS OS itself — a fully functional, bootable, open-source AI-native OS released as a research and development platform for the community.

04

A documented MCP tool-description architecture, independently developed in October–November 2025, predating its appearance in commercial deployments.

Publications

SURCA 2026

Gray Grant Winner

Poster presentation at Washington State University's Showcase for Undergraduate Research and Creative Activities. Winner of the Gray Grant research award.

Full Paper

Security Threats in AI-Native Operating Systems: An Empirical Study Using Privilege-Escalated LLM Agents. Pre-publication manuscript available on request.

"Traditional OS security models are fundamentally inadequate for probabilistic AI agents. The JARVIS OS project provides both the platform and the empirical evidence to demonstrate this."

— Yakup Atahanov & Toufic Majdaleni, WSU Everett