Technical Architecture and Implementation Specification
This document describes the architecture of MALV, an approach for building AI-powered applications with built-in security, intelligent cost optimization, and edge-native deployment.
Core Technical Features:
MALV handles the infrastructure layer that would typically require 12-18 months and a dedicated team to build, allowing developers to focus on application-specific logic rather than foundational systems.
Building with MALV means working with three distinct architectural layers: the client layer providing user interfaces, the application layer containing domain-specific business logic, and the infrastructure layer providing shared services for security, storage, and deployment. This separation enables independent scaling and deployment of components while maintaining system-wide consistency through well-defined interfaces.
The client layer provides user interfaces for interacting with the system. Two primary implementations exist: a web-based client built with Vite and served as a static single-page application, and a command-line interface for terminal-based interaction. Both clients communicate exclusively with the orchestrator application and load application metadata from the Apps CDN.
Applications are independent Cloudflare Workers implementing specific business capabilities. Each application exposes a set of tools (discrete functions) that can be invoked by the orchestrator. Applications are stateless; all persistent state is managed through the storage service. The orchestrator application holds special status as the coordination point for AI-powered planning and execution.
Infrastructure services provide shared capabilities required by all applications. The token service handles cryptographic signing, the storage service enforces permissions and proxies R2 operations, the event service coordinates publish/subscribe messaging, the hub service manages deployment, and the Apps CDN distributes application assets.
| Component | Technology | Purpose |
|---|---|---|
| Runtime | Cloudflare Workers (V8 Isolates) | Serverless execution environment |
| Storage | R2 Object Storage | S3-compatible object storage with zero egress |
| Cryptography | Ed25519 | High-speed public-key signatures |
| Language | TypeScript (strict mode) | Type-safe application development |
| Build System | Rollup, Vite | Module bundling and optimization |
| AI Models | Claude Sonnet 4.5, GPT-5 | Language model inference |
This section illustrates a complete request lifecycle, from user query through semantic filtering, token-aware planning, tool execution, and response streaming. The flow demonstrates how the architecture's components coordinate to deliver sub-second response initiation while processing complex multi-step operations.
| Phase | Latency | Cost Impact | Key Optimization |
|---|---|---|---|
| Semantic Filtering | ~10ms | 90% reduction | Cosine similarity on cached embeddings |
| Token Signing | <1ms | Negligible | Ed25519 performance + key caching |
| Permission Verification | <0.5ms | Negligible | 3-phase validation with early exit |
| Tool Execution | ~200ms/tool | Variable | Parallel execution when possible |
| AI Planning | ~350ms | 85% of total cost | Smaller context from filtering |
| Streaming Updates | 0ms (async) | None | Server-Sent Events (SSE) |
The security architecture employs Ed25519 public-key cryptography for token signing and verification. Applications authenticate requests using signed JWT tokens that embed permission grants. A three-phase validation system ensures that storage operations are authorized before execution. Automatic key rotation occurs every 30 days, maintaining a rolling window of valid keys for token verification.
Ed25519 was selected for its performance characteristics and security properties. Signatures are generated in under 1ms on edge hardware, and verification completes in under 0.5ms. The algorithm provides 128-bit security with 32-byte public keys and 64-byte signatures, significantly smaller than RSA equivalents.
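Using Node's built-in `crypto` module, the sign/verify flow can be sketched as follows. The payload shape and key handling are illustrative only, not MALV's actual token format:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Illustrative sketch: sign and verify a compact token payload with Ed25519.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const payload = Buffer.from(JSON.stringify({ sub: "user123", exp: 1700000000 }));

// Ed25519 signs the message directly, so the digest argument is null.
const signature = sign(null, payload, privateKey); // 64-byte signature

const ok = verify(null, payload, publicKey, signature);
console.log(signature.length, ok); // 64 true
```

Note the signature is always 64 bytes and the public key 32 bytes, matching the size advantages over RSA described above.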
Storage access requests undergo validation in three sequential phases, with each phase providing progressively granular control.
New Ed25519 keypairs are generated automatically every 30 days. The token service maintains four active private keys and publishes five public keys (current plus four historical). This overlap window ensures that tokens signed immediately before rotation remain valid during their lifetime. Old keys are archived to R2 but removed from active use.
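The rolling window can be sketched as a small data structure. The `KeyRing` class and its method names are hypothetical; only the four-private/five-public window sizes come from the text:

```typescript
// Hypothetical sketch of the rolling key window: four active private keys,
// five published public keys (current plus four historical). Keys are
// represented by opaque ids; real entries would hold Ed25519 key material.
interface KeyRecord { id: string; createdAt: number }

class KeyRing {
  private keys: KeyRecord[] = []; // newest first

  rotate(id: string, now: number): KeyRecord[] {
    this.keys.unshift({ id, createdAt: now });
    // Everything beyond the five-key verification window is archived
    // (in MALV, archived to R2 and removed from active use).
    return this.keys.splice(5);
  }

  get signingKeys(): KeyRecord[] { return this.keys.slice(0, 4); }
  get publishedKeys(): KeyRecord[] { return this.keys.slice(0, 5); }
}

const ring = new KeyRing();
for (let month = 0; month < 7; month++) ring.rotate(`key-${month}`, month * 30);

console.log(ring.signingKeys.length, ring.publishedKeys.length); // 4 5
console.log(ring.publishedKeys[0].id); // "key-6"
```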
The event system enables asynchronous, loosely-coupled communication between applications through a publish/subscribe pattern. Applications declare events they can emit, and other applications subscribe to receive those events. The event service manages subscription lifecycle, coordinates webhook setup and teardown, and delivers events to subscribers in parallel.
Event source applications implement three handler functions for each event type; these manage the subscription lifecycle, including webhook setup and teardown.
Subscriptions are grouped by deterministic keys generated from token payloads. For example, a Gmail event subscription might generate a key from the Google user ID in the authentication token. This ensures that each user's subscription is independent, enabling per-user webhook configuration and isolated event streams.
The event service implements idempotent subscription operations. Multiple subscription requests with identical parameters result in a single stored subscription. The start handler is invoked exactly once for each unique key, even if multiple subscribers register simultaneously. This property simplifies subscription logic and prevents resource leaks.
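A minimal sketch of this idempotency property, assuming a hypothetical `SubscriptionRegistry` with an injected start handler (all names are illustrative):

```typescript
// Sketch: subscriptions grouped by deterministic keys, with the start
// handler invoked exactly once per unique key.
type StartHandler = (key: string) => void;

class SubscriptionRegistry {
  private subs = new Map<string, Set<string>>(); // key -> subscriber ids
  constructor(private startHandler: StartHandler) {}

  // Deterministic key derived from the token payload (e.g. a Google user id).
  keyFor(payload: { googleUserId: string }): string {
    return `gmail:${payload.googleUserId}`;
  }

  subscribe(key: string, subscriberId: string): void {
    if (!this.subs.has(key)) {
      this.subs.set(key, new Set());
      this.startHandler(key); // invoked exactly once per unique key
    }
    this.subs.get(key)!.add(subscriberId);
  }
}

let starts = 0;
const registry = new SubscriptionRegistry(() => { starts++; });
const key = registry.keyFor({ googleUserId: "g-123" });

// Duplicate registrations collapse into a single stored subscription.
registry.subscribe(key, "app-a");
registry.subscribe(key, "app-a");
registry.subscribe(key, "app-b");
console.log(starts); // 1
```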
The hub service provides a centralized deployment gateway that eliminates the need for application developers to possess Cloudflare credentials or understand deployment mechanics. Applications are packaged as multipart form uploads containing compiled JavaScript, configuration files, and assets. The hub service processes these uploads, generates semantic embeddings for tool discovery, and orchestrates deployment to Cloudflare's edge network.
Applications are packaged as multipart/form-data with the following components:
- `metadata`: JSON object containing app ID, version, domain, and description
- `worker_script`: Compiled JavaScript bundle for Worker execution
- `tools.json`: Tool definitions with schemas and capability requirements
- `tokens.json`: Token type definitions with permission patterns
- `events.json`: Event definitions (optional)
- `examples/*.json`: Usage examples for semantic discovery (optional)

The hub service generates 512-dimensional embeddings using OpenAI's text-embedding-3-small model for three categories of content: tool definitions, application descriptions, and usage examples.
These embeddings enable the orchestrator to perform semantic similarity search during tool discovery, filtering irrelevant capabilities and reducing context size by approximately 90%.
The hub service communicates with Cloudflare's Workers API to deploy applications. This includes uploading the compiled script, configuring environment bindings (R2 buckets, KV namespaces, secrets), and creating routes if custom domains are specified. The hub maintains Cloudflare API credentials, insulating application developers from infrastructure complexity.
The orchestrator application coordinates AI-powered planning and execution. When a user submits a query, the orchestrator performs semantic filtering to identify relevant tools, analyzes token requirements to determine tool availability, uses a two-phase decision process for efficient tool selection and input generation, executes the resulting plan, and streams updates to the client in real-time.
Tool discovery employs cosine similarity between the embedded user query and the embedded tool descriptions. Tools with similarity scores below 0.3 are excluded from the prompt. This filtering typically removes 80-90% of available tools, substantially reducing prompt size and associated API costs while maintaining high recall for relevant capabilities.
The filtering algorithm also considers application-level embeddings and usage example embeddings. If an application's description is semantically relevant, all of its tools receive a score boost. Similarly, if a usage example matches the query, the tools referenced in that example are prioritized.
Rather than asking the AI to simultaneously select tools and generate their inputs, MALV separates these concerns into two phases for improved accuracy and cost efficiency:
- Phase 1 (tool selection): the AI chooses which tools to invoke from compact tool names and descriptions, without seeing full input schemas.
- Phase 2 (input generation): tools with complex schemas (`$defs`, `$ref`, `anyOf`) receive individual AI calls executed in parallel, ensuring focused attention on intricate type structures. Simple tools are batched into a single call for efficiency. Full JSON schemas are provided only in this phase.

Schema complexity is scored based on structural features. Tools scoring above the threshold (e.g., those with recursive types or union schemas) are isolated to prevent malformed inputs:
| Schema Feature | Complexity Impact | Handling |
|---|---|---|
| Simple properties | Low | Batched with other simple tools |
| Nested objects | Medium | May be batched if total score low |
| `$defs` / `$ref` | High | Individual AI call |
| `anyOf` unions | High | Individual AI call |
| Recursive types | Very High | Individual AI call with focused context |
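A hypothetical scorer following this scheme; the features scored come from the table, but the weights and the isolation threshold are assumptions:

```typescript
// Assumed weights: $defs/$ref and anyOf/oneOf score high, nested objects
// add a medium amount. Tools above the threshold get individual AI calls.
function complexityScore(schema: unknown, depth = 0): number {
  if (schema === null || typeof schema !== "object") return 0;
  const s = schema as Record<string, unknown>;
  let score = 0;
  if ("$defs" in s || "$ref" in s) score += 10;   // high
  if ("anyOf" in s || "oneOf" in s) score += 10;  // high
  if (depth > 0 && "properties" in s) score += 3; // nested object: medium
  if ("properties" in s) {
    for (const child of Object.values(s.properties as Record<string, unknown>)) {
      score += complexityScore(child, depth + 1);
    }
  }
  return score;
}

const ISOLATION_THRESHOLD = 8; // assumed value

const simple = { type: "object", properties: { name: { type: "string" } } };
const union = { anyOf: [{ type: "string" }, { type: "number" }] };

console.log(complexityScore(simple) <= ISOLATION_THRESHOLD); // true: batched
console.log(complexityScore(union) > ISOLATION_THRESHOLD);   // true: individual call
```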
Tools declare required tokens in their definitions. The orchestrator analyzes available tokens (provided by the client) and categorizes tools as available or unavailable. Available tools are presented normally in the AI prompt. Unavailable tools are presented with instructions on how to unlock them, typically by invoking an authentication tool. This enables the AI to automatically guide users through authentication flows when necessary.
Tool execution results stream to the client using Server-Sent Events (SSE). The orchestrator emits events for: tool decisions (when the AI selects tools to invoke), tool execution start/end (with streaming logs), token creation (when tools generate new authentication tokens), and final response generation. This provides real-time visibility into system behavior and improves perceived performance.
The combination of semantic filtering and two-phase planning compounds cost savings:
At $3 per million input tokens (Claude Sonnet 4.5 pricing), these optimizations reduce per-request cost from ~$0.06 (naive approach) to ~$0.002, representing a 30x improvement.
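The arithmetic behind these figures, with assumed token counts chosen to match the stated costs:

```typescript
// Token counts are assumptions; only the $3/1M price and the
// $0.06 -> $0.002 endpoints come from the text.
const PRICE_PER_TOKEN = 3 / 1_000_000;

const naiveTokens = 20_000;  // every tool schema in one prompt (assumed)
const optimizedTokens = 667; // after ~90% filtering + two-phase planning (assumed)

const naiveCost = naiveTokens * PRICE_PER_TOKEN;
const optimizedCost = optimizedTokens * PRICE_PER_TOKEN;

console.log(naiveCost.toFixed(3));                   // "0.060"
console.log((naiveCost / optimizedCost).toFixed(0)); // "30"
```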
Unlike traditional AI approaches that limit outputs to text responses, MALV implements a sophisticated object rendering system for rich data visualization. Objects are persistent, renderable data entities created by tools and stored in R2. Each object can be visualized through custom renderers with full lifecycle management, enabling interactive dashboards, data tables, research boards, and other rich UI components.
(Diagram: a tool creates an object via `object.set()`; the object persists in an R2 bucket; renderers are built with capabilities and loaded from `objects/{type}/web.js`.)
Objects follow a reference-based architecture where metadata stores pointers rather than actual data. When a tool creates a table object, it stores a `tableId` reference in the object metadata. The renderer then uses this reference to fetch the actual data through tool calls. This separation keeps object metadata lightweight, lets renderers fetch current data on demand, and allows the underlying data to change without rewriting the object.
Each object type can define custom renderers for different environments. Web renderers are ES modules loaded dynamically from the Apps CDN. CLI renderers produce formatted terminal output. Renderers receive structured parameters including object info, capabilities, and lifecycle hooks.
```typescript
// objects/{type}/web.ts - Web Renderer Signature
export default async function web(
  info: { id: string; name: string; metadata: ObjectMetadata },
  capabilities: { callTool, storage, ai },
  lifecycle: { onDataUpdated, onUnmount }
): Promise<HTMLElement>
```
Renderers can subscribe to lifecycle events for reactive updates:
- `onDataUpdated(callback)`: invoked when the object's underlying data changes, so the renderer can refresh its view
- `onUnmount(callback)`: invoked when the renderer is removed, allowing cleanup of timers and subscriptions
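For contrast with the web signature, a hypothetical CLI renderer might follow the `tableId` reference and emit formatted terminal output. All parameter shapes and the `getTableRows` tool are assumptions:

```typescript
// Sketch of a CLI renderer: fetch row data via a tool call and return
// a formatted string instead of an HTMLElement.
interface ObjectInfo { id: string; name: string; metadata: { tableId?: string } }

async function cli(
  info: ObjectInfo,
  capabilities: { callTool: (tool: string, input: unknown) => Promise<unknown> },
): Promise<string> {
  // Follow the reference in metadata to fetch the actual rows.
  const rows = (await capabilities.callTool("getTableRows", {
    tableId: info.metadata.tableId,
  })) as string[][];
  return [`== ${info.name} ==`, ...rows.map((r) => r.join(" | "))].join("\n");
}

// Usage with a stubbed capability:
const output = await cli(
  { id: "obj_1", name: "Stock Levels", metadata: { tableId: "tbl_1" } },
  { callTool: async () => [["Widget A", "12"], ["Widget B", "40"]] },
);
console.log(output);
```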
Objects can be stored in applications other than the one that created them, enabling team-wide sharing. The storage configuration in objects.json specifies the target app and path template:
```json
{
  "storage": {
    "inApp": "@malv/auth",
    "path": "/teams/<token.team>/objects/",
    "tokenType": "account",
    "tokenFromApp": "@malv/auth"
  }
}
```
This configuration stores objects under the team's namespace in the auth app, making them accessible to all team members regardless of which app created them.
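Path templates like `/teams/<token.team>/objects/` imply a substitution step, which might look like the following sketch (the `resolvePath` helper is hypothetical):

```typescript
// Substitute <tokenType.field> placeholders from token payloads.
function resolvePath(
  template: string,
  tokens: Record<string, Record<string, string>>,
): string {
  return template.replace(/<(\w+)\.(\w+)>/g, (_, tokenType, field) => {
    const value = tokens[tokenType]?.[field];
    if (value === undefined) throw new Error(`missing ${tokenType}.${field}`);
    return value;
  });
}

const resolved = resolvePath("/teams/<token.team>/objects/", {
  token: { team: "team456" },
});
console.log(resolved); // "/teams/team456/objects/"
```

Failing loudly on a missing field (rather than leaving the placeholder in place) prevents writes from silently landing in a literal `<token.team>` directory.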
Traditional AI assistants react to explicit user requests. MALV's perception system enables proactive intelligence by defining contextual conditions that help the AI understand user state and suggest relevant actions. Apps declare "perceptions" that match token presence and storage values, surfacing suggested tasks when conditions are met.
(Diagram: token conditions (`exists` / `absent`) and storage conditions (`equals`, `isEmpty`, `exists`) are evaluated; matched perceptions are injected into the AI prompt as context.)
Perceptions are defined in perception/*.json files within each app. Each perception specifies conditions that must be met and the contextual insight to provide when matched:
```json
{
  "tokens": {
    "exists": { "@malv/inventory": "warehouse" },
    "absent": { "@malv/inventory": "product" }
  },
  "storage": {
    "@malv/inventory": {
      "/teams/<warehouse.teamId>/warehouses/<warehouse.warehouseId>/config.json": {
        "lowStockThreshold": { "operator": "equals", "value": null }
      }
    }
  },
  "perception": "User has a warehouse configured but hasn't set the low stock threshold",
  "tasks": ["Help user define low stock threshold", "Suggest reorder policies"]
}
```
Two types of conditions enable precise state matching:
- Token conditions match on the presence (`exists`) or absence (`absent`) of token types. They are fast to evaluate, requiring only a check of the client-provided token list. Use these to gate perceptions by authentication state or resource selection.
- Storage conditions compare values at storage paths, with path templates substituted from token payloads (e.g., `<warehouse.teamId>`). Storage conditions enable state-aware perceptions like "warehouse exists but threshold is not set."

Storage conditions support multiple comparison operators for flexible matching:
| Operator | Description | Example Use Case |
|---|---|---|
| `equals` | Exact value match (including null) | Check if objective is unset |
| `notEquals` | Value differs from specified | Check if status changed from draft |
| `exists` | Field is present (any value) | Check if configuration exists |
| `notExists` | Field is missing entirely | Detect unconfigured resources |
| `isEmpty` | Array is empty or string is blank | Check if no products added |
| `isNotEmpty` | Array has items or string has content | Check if inventory has items |
When perceptions match, their suggested tasks are presented to the AI as actionable next steps. This transforms the assistant from purely reactive to contextually proactive:
AI sees in prompt:
```
Current Context:
User has warehouse "West Coast Distribution Center" but the low stock threshold is not configured.
Suggested Actions: Help user define low stock threshold, Suggest reorder policies
```
Beyond semantic tool discovery, MALV extends vector-based search to storage data itself. When applications write to storage, embeddings are automatically generated in the background, enabling AI-powered discovery of relevant data across all applications. This transforms storage from a simple key-value system into an intelligent data layer.
(Diagram: writes via `storage.put(path, data)` enter a background queue that generates embeddings and indexes them with security keys; searches via `/search?q=...&types=storage` run semantic matching, filter results by security keys, and fetch full data with `storage.get(path)`.)
When apps configure storage paths for searchability, the infrastructure automatically processes writes through an embedding pipeline:
An embedding model (`bge-base-en-v1.5`) creates 768-dimensional vectors from the written content, which are indexed along with the item's security key.

This process runs asynchronously, ensuring write latency is not affected by embedding computation.
Semantic search respects the same permission model as direct storage access. Each indexed item includes security keys derived from token payloads. Search requests specify which security keys the user holds, and results are filtered to include only accessible data:
```
// Search request with security keys
GET /search?q=project%20goals&types=storage&securityKeys=["account:user123","team:team456"]

// Only returns data where indexed securityKey matches one of the provided keys
```
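The filtering step itself reduces to a set-membership check over the caller's security keys. The record shapes below are assumptions:

```typescript
// Filter semantically matched items down to those the caller may access.
interface IndexedItem { path: string; securityKey: string; similarity: number }

function filterBySecurityKeys(items: IndexedItem[], held: string[]): IndexedItem[] {
  const allowed = new Set(held);
  return items.filter((i) => allowed.has(i.securityKey));
}

const items = [
  { path: "/teams/team456/goals.json", securityKey: "team:team456", similarity: 0.91 },
  { path: "/teams/team999/goals.json", securityKey: "team:team999", similarity: 0.88 },
];
const visible = filterBySecurityKeys(items, ["account:user123", "team:team456"]);
console.log(visible.map((i) => i.path)); // ["/teams/team456/goals.json"]
```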
Semantic storage search operates across application boundaries, enabling powerful cross-app queries. An inventory app can discover relevant data from an orders app, or a reporting tool can find metrics from multiple data sources.
Search queries can specify location constraints to scope results to specific apps or paths. This enables targeted searches within a project namespace or across a specific application's data:
```
// Search within a specific warehouse
GET /search?q=stock&types=storage&locations={"@malv/inventory":"/teams/team1/warehouses/wh1"}

// Search across all inventory data
GET /search?q=products&types=storage&locations={"@malv/inventory":"*"}
```
Search results include the storage path, source application, similarity score, and a content preview. The AI or application can then fetch the full data using standard storage operations:
```json
{
  "query": "low stock alerts",
  "types": ["storage"],
  "results": [
    {
      "type": "storage",
      "appName": "@malv/inventory",
      "path": "/teams/team1/warehouses/wh1/products/prod_001.json",
      "preview": "Widget A stock level below threshold, 12 units remaining...",
      "similarity": 0.89,
      "securityKey": "f8d2c9b1a5f3e7d9"
    }
  ]
}
```
Long conversations accumulate "behavioral gravity" - directive language, persuasive framing, and emotional tone that can bias AI responses when reflecting on history. MALV's background summarization system strips this gravity by converting messages into neutral, factual summaries that compress context while preserving essential information.
When conversation history is passed to an AI, the language patterns in that history influence future responses. Phrases like "I really need," "this is critical," or "you should" create implicit pressure. Over long conversations, this pressure accumulates and biases the AI toward the tone and directives of earlier messages rather than the current request.
MALV applies different summarization strategies based on message role: user messages are rewritten as neutral third-person summaries, while assistant messages are condensed into first-person summaries.
Summarization runs as a background job, queued after each response completes. This ensures summarization latency never impacts user-facing response time:
```typescript
// Queued automatically after respond tool completes
await capabilities.tool.queue('@malv/orchestrator', 'summarize', {
  conversationId,
  assistantMessagePath
}, {
  maxBatchTimeout: 5000, // Wait up to 5s for more items
  maxBatch: 5 // Process up to 5 messages together
});
```
Summaries enable efficient context compression for long conversations. When conversation history approaches token limits, the orchestrator can substitute summaries for older messages:
| Message Type | Original Tokens | Summary Tokens | Compression |
|---|---|---|---|
| User message (avg) | ~150 | ~40 | 73% |
| Assistant message (avg) | ~400 | ~80 | 80% |
| Tool results | ~200 | ~50 | 75% |
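Applying these averages to a hypothetical 25-message history shows the aggregate effect (the message counts are assumptions):

```typescript
// Per-message averages from the compression table.
const original = { user: 150, assistant: 400, tool: 200 };
const summary = { user: 40, assistant: 80, tool: 50 };

// A hypothetical history: 10 user turns, 10 assistant turns, 5 tool results.
const counts = { user: 10, assistant: 10, tool: 5 };

let before = 0, after = 0;
for (const role of ["user", "assistant", "tool"] as const) {
  before += counts[role] * original[role];
  after += counts[role] * summary[role];
}
console.log(before, after); // 6500 1450
console.log(`${Math.round((1 - after / before) * 100)}% compression`); // "78% compression"
```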
Summaries are stored alongside original messages, enabling flexible retrieval:
```
/conversations/{id}/messages/
├── msg_001.json          # Original user message
├── msg_001-summary.json  # Third-person summary
├── msg_002.json          # Original assistant message
├── msg_002-summary.json  # First-person summary
└── ...
```
| Metric | Value | Notes |
|---|---|---|
| Token Signing Latency | < 1ms | Ed25519 signature generation |
| Token Verification Latency | < 0.5ms | Ed25519 signature verification |
| Cold Start Time | < 50ms | V8 isolate initialization |
| Global Latency (P50) | < 50ms | Edge network proximity |
| Embedding Generation | ~100ms | Per application at publish time |
| Semantic Filtering | < 10ms | Cosine similarity computation |
| Parameter | Value |
|---|---|
| Signature Algorithm | Ed25519 (128-bit security) |
| Public Key Size | 32 bytes |
| Signature Size | 64 bytes |
| Key Rotation Period | 30 days |
| Active Private Keys | 4 |
| Published Public Keys | 5 (current + 4 historical) |
| Token Format | JWT (compact serialization) |
| Resource | Limit | Notes |
|---|---|---|
| Worker CPU Time | 50ms (free), 15s (paid) | Per invocation |
| Worker Memory | 128 MB | Per isolate |
| R2 Object Size | 5 TB | Single object maximum |
| Request Rate | Unlimited | Auto-scaling |
| Storage Capacity | Unlimited | R2 bucket capacity |
| Parameter | Value |
|---|---|
| Model | text-embedding-3-small |
| Dimensions | 512 |
| Distance Metric | Cosine similarity |
| Relevance Threshold | 0.3 |
| Cost per Embedding | $0.00002 per 1K tokens |
The MALV architecture provides significant technical benefits through its design choices. By handling infrastructure concerns at the architecture level, developers can focus on application logic while benefiting from optimizations that would otherwise require substantial engineering effort to implement.
| Component | Building from Scratch | Using MALV | Difference |
|---|---|---|---|
| Infrastructure Engineering | 2-3 senior engineers (dedicated) | 0 (handled by MALV) | Eliminated |
| Time to First Deployment | 12-18 months | 2-4 weeks | ~95% reduction |
| Security Maintenance | Ongoing engineering overhead | Automated (key rotation, permissions) | Automated |
Embedding-based filtering reduces token usage by filtering irrelevant tools before they reach the LLM. This optimization requires sophisticated embedding generation, caching, and similarity scoring infrastructure.
Ed25519 signature verification with automatic key rotation provides strong security guarantees without developer overhead. This approach requires cryptography expertise and key management infrastructure to implement correctly.
Running on Cloudflare Workers eliminates traditional infrastructure concerns: no containers to manage, no orchestration complexity, no idle resources consuming budget, and no data egress fees.
Three-phase permission validation with template substitution enables secure multi-tenancy through JSON configuration rather than custom code. This eliminates a common source of security vulnerabilities.
The following table compares MALV's architectural choices against common alternatives:
| Capability | Common Approach | MALV Approach |
|---|---|---|
| Tool Discovery | Send all tools to LLM | Semantic filtering (90% reduction) |
| Authentication | API keys or JWT | Ed25519 signatures with auto-rotation |
| Permissions | Manual ACLs per tool | Declarative templates, centrally enforced |
| Deployment | Docker + Kubernetes | Edge Workers (zero config, global scale) |
| Tool Integration | Custom per integration | Standardized tool interface |
| Event System | Manual webhook management | Automatic lifecycle (start/stop handlers) |
| Resource Usage | Always-on containers | Pay-per-request execution |
Building equivalent infrastructure from scratch typically requires a dedicated infrastructure team and 12-18 months of engineering before the first deployment.
The MALV architecture handles this infrastructure, allowing developers to focus on building application-specific functionality.