MALV: A Better Way to Build AI Products

Technical Architecture and Implementation Specification

Version 2.1 | Internal Technical Documentation

Overview

This document describes the architecture of MALV, an approach for building AI-powered applications with built-in security, intelligent cost optimization, and edge-native deployment.

Core Technical Features:

MALV handles the infrastructure layer that would typically require 12-18 months and a dedicated team to build, allowing developers to focus on application-specific logic rather than foundational systems.

Technical Foundation: Built entirely on Cloudflare Workers (V8 isolates), R2 object storage, and Ed25519 cryptography. TypeScript with strict typing throughout. Supports Claude Sonnet 4.5 and GPT-5 with automatic fallback.

System Architecture

Building with MALV means working with three distinct architectural layers: the client layer providing user interfaces, the application layer containing domain-specific business logic, and the infrastructure layer providing shared services for security, storage, and deployment. This separation enables independent scaling and deployment of components while maintaining system-wide consistency through well-defined interfaces.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000','primaryBorderColor':'#1976d2','lineColor':'#1976d2','secondaryColor':'#fff3e0','tertiaryColor':'#f3e5f5'}}}%% graph TB subgraph CLIENT["CLIENT LAYER
User Interfaces"] WEB["🌐 Web Client
Vite SPA + IndexedDB"] CLI["💻 CLI Client
Terminal Interface"] end subgraph APPS["APPLICATION LAYER
Cloudflare Workers - Domain Logic"] ORCH["🧠 Orchestrator
AI Planning & Semantic Filtering"] CONV["💬 Conversation
Message History & Objects"] AUTH["🔐 Authentication
OAuth Integration"] GMAIL["📧 Gmail
Email Integration"] TABLES["📊 Tables
Structured Data"] CUSTOM["⚙️ Custom Apps
Extensible Architecture"] end subgraph INFRA["INFRASTRUCTURE LAYER
Cloudflare Workers - Platform Services"] TOKEN["🔑 Token Service
Ed25519 Signing
<1ms latency
"] STORAGE["💾 Storage Service
3-Phase Permissions
<0.5ms verification
"] EVENT["📡 Event Service
Pub/Sub Management
Automatic Lifecycle
"] HUB["🚀 Hub Service
Deployment Gateway
Embedding Generation
"] CDN["📦 Apps CDN
Asset Distribution
300+ Edge Locations
"] end subgraph DATA["DATA LAYER
R2 Object Storage"] R2[("☁️ R2 Bucket
Unlimited Storage
Zero Egress Fees
")] end WEB -->|REST + SSE| ORCH CLI -->|REST + SSE| ORCH ORCH -->|Tool Invocation| CONV ORCH -->|Tool Invocation| AUTH ORCH -->|Tool Invocation| GMAIL ORCH -->|Tool Invocation| TABLES ORCH -->|Tool Invocation| CUSTOM CONV -->|Sign Tokens| TOKEN AUTH -->|Sign Tokens| TOKEN GMAIL -->|Sign Tokens| TOKEN CONV -->|+ Signed Token| STORAGE AUTH -->|+ Signed Token| STORAGE GMAIL -->|+ Signed Token| STORAGE GMAIL -->|Publish Events| EVENT TABLES -->|Subscribe| EVENT TOKEN -.->|Keys & Data| R2 STORAGE -.->|Proxy| R2 EVENT -.->|Subscriptions| R2 HUB -.->|Publish| R2 CDN -.->|Serve| R2 classDef clientStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000 classDef appStyle fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000 classDef infraStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000 classDef dataStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:4px,color:#000 class WEB,CLI clientStyle class ORCH,CONV,AUTH,GMAIL,TABLES,CUSTOM appStyle class TOKEN,STORAGE,EVENT,HUB,CDN infraStyle class R2 dataStyle
Layered Architecture: Clear separation between user-facing clients, domain-specific applications, reusable infrastructure services, and persistent storage. All components run on Cloudflare's global edge network.

Layer Responsibilities

Client Layer

The client layer provides user interfaces for interacting with the system. Two primary implementations exist: a web-based client built with Vite and served as a static single-page application, and a command-line interface for terminal-based interaction. Both clients communicate exclusively with the orchestrator application and load application metadata from the Apps CDN.

Application Layer

Applications are independent Cloudflare Workers implementing specific business capabilities. Each application exposes a set of tools (discrete functions) that can be invoked by the orchestrator. Applications are stateless; all persistent state is managed through the storage service. The orchestrator application holds special status as the coordination point for AI-powered planning and execution.

Infrastructure Layer

Infrastructure services provide shared capabilities required by all applications. The token service handles cryptographic signing, the storage service enforces permissions and proxies R2 operations, the event service coordinates publish/subscribe messaging, the hub service manages deployment, and the Apps CDN distributes application assets.

Key Architectural Properties

  • Stateless Execution: All Workers are stateless, enabling unlimited horizontal scaling
  • Declarative Security: Permissions declared in JSON schemas, automatically enforced by infrastructure
  • Service Isolation: Each application runs in isolated V8 environments with no shared memory
  • Edge Deployment: All components run on Cloudflare's global network (300+ locations)

Technology Stack

Component | Technology | Purpose
Runtime | Cloudflare Workers (V8 Isolates) | Serverless execution environment
Storage | R2 Object Storage | S3-compatible object storage with zero egress
Cryptography | Ed25519 | High-speed public-key signatures
Language | TypeScript (strict mode) | Type-safe application development
Build System | Rollup, Vite | Module bundling and optimization
AI Models | Claude Sonnet 4.5, GPT-5 | Language model inference

End-to-End Request Flow

This section illustrates a complete request lifecycle, from user query through semantic filtering, token-aware planning, tool execution, and response streaming. The flow demonstrates how the architecture's components coordinate to deliver sub-second response initiation while processing complex multi-step operations.

%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%% sequenceDiagram autonumber participant User as 👤 User
Web Client participant Orch as 🧠 Orchestrator
AI Coordinator participant CDN as 📦 Apps CDN
Tool Metadata participant OpenAI as 🤖 OpenAI
Embeddings participant Claude as 🎯 Claude Sonnet 4.5
Planning & Execution participant Gmail as 📧 Gmail App
Tool Execution participant Token as 🔑 Token Service
Crypto Signing participant Storage as 💾 Storage Service
Permissions participant R2 as ☁️ R2
Data Store rect rgb(230, 240, 255) note over User,R2: Phase 1: Query Preprocessing & Semantic Filtering User->>Orch: POST /plan
"Create a table from my latest emails" Orch->>CDN: GET /manifest.json CDN-->>Orch: List of 15 apps loop Load App Tools Orch->>CDN: GET /apps/{id}/tools.json CDN-->>Orch: Tool definitions + embeddings end note over Orch: Total: 87 tools loaded Orch->>OpenAI: Embed query (512-dim vector) OpenAI-->>Orch: Query embedding note over Orch: Cosine similarity filtering
87 tools → 8 tools (90% reduction) end rect rgb(240, 255, 240) note over User,R2: Phase 2: Token Analysis & Tool Availability note over Orch: Analyze required tokens:
gmail_access ✓ (available)
table_access ✓ (available) note over Orch: Build AI prompt with:
• 8 filtered tools (available)
• 2 usage examples (relevant)
• User's auth context end rect rgb(255, 245, 230) note over User,R2: Phase 3: AI Planning & Decision Orch->>Claude: createMessage()
system: tools + examples
user: query Claude-->>Orch: Tool decision: list_emails, create_table Orch->>User: SSE: toolDecision event end rect rgb(255, 240, 245) note over User,R2: Phase 4: Tool Execution - list_emails Orch->>Gmail: POST /execute
tool: list_emails
input: {maxResults: 10} Gmail->>Token: POST /sign-token
type: gmail_storage Token->>R2: Load private key R2-->>Token: Ed25519 key Token-->>Gmail: Signed token (<1ms) Gmail->>Storage: GET /emails + token Storage->>Token: Verify signature Token-->>Storage: Valid ✓ Storage->>R2: Fetch emails R2-->>Storage: Email data Storage-->>Gmail: Authorized data Gmail-->>Orch: {emails: [...10 items]} Orch->>User: SSE: toolRun complete end rect rgb(245, 240, 255) note over User,R2: Phase 5: Tool Execution - create_table note over Orch: Pass emails from previous tool Orch->>Gmail: POST /execute
tool: create_table
input: {data: emails, columns: [...]} note over Gmail: Create table object Gmail->>Storage: PUT /tables/{id} + token Storage-->>Gmail: Success Gmail-->>Orch: {tableId: "tbl_123", rows: 10} Orch->>User: SSE: toolRun complete end rect rgb(250, 250, 250) note over User,R2: Phase 6: Response Generation Orch->>Claude: Generate response with results Claude-->>Orch: "Created table with 10 emails..." Orch->>User: SSE: response event note over User: Display table with
custom renderer end note over User,R2: Total Time: ~800ms
Semantic Filtering: 10ms | Token Signing: <1ms | Tool Execution: ~400ms | LLM: ~350ms
Complete Request Lifecycle: Demonstrates semantic filtering (87→8 tools), cryptographic token signing (<1ms), permission verification, multi-tool coordination, and real-time streaming updates via Server-Sent Events.

Performance Breakdown

Phase | Latency | Cost Impact | Key Optimization
Semantic Filtering | ~10ms | 90% reduction | Cosine similarity on cached embeddings
Token Signing | <1ms | Negligible | Ed25519 performance + key caching
Permission Verification | <0.5ms | Negligible | 3-phase validation with early exit
Tool Execution | ~200ms/tool | Variable | Parallel execution when possible
AI Planning | ~350ms | 85% of total cost | Smaller context from filtering
Streaming Updates | 0ms (async) | None | Server-Sent Events (SSE)

Request Flow Optimizations

  • Early Streaming: User sees "thinking" state within 100ms, maintaining engagement
  • Smart Caching: Tool embeddings and metadata cached at edge, zero fetch latency
  • Minimal Token Generation: Only requested operations require token signing, not tool loading
  • Context Reduction: 90% fewer tokens sent to LLM = 10x cost savings per request
  • Parallel Execution: Independent tools execute concurrently when dependencies allow

Cryptographic Security Model

The security architecture employs Ed25519 public-key cryptography for token signing and verification. Applications authenticate requests using signed JWT tokens that embed permission grants. A three-phase validation system ensures that storage operations are authorized before execution. Automatic key rotation occurs every 30 days, maintaining a rolling window of valid keys for token verification.

sequenceDiagram participant App as Application participant TokenSvc as Token Service participant R2 as R2 Storage participant StorageSvc as Storage Service Note over App,StorageSvc: Phase 1: Token Acquisition App->>TokenSvc: POST /sign-token
{appSecret, tokenType, payload} TokenSvc->>TokenSvc: Verify App Secret TokenSvc->>R2: Fetch tokens.json R2-->>TokenSvc: Token Schema TokenSvc->>TokenSvc: Apply Template Substitution TokenSvc->>R2: Load Private Key (kid) R2-->>TokenSvc: Ed25519 Private Key TokenSvc->>TokenSvc: Sign JWT with Ed25519 TokenSvc-->>App: Signed Token
{token, payload, signature, kid} Note over App,StorageSvc: Phase 2: Token Verification App->>StorageSvc: Storage Operation + Token StorageSvc->>StorageSvc: Extract kid from Token StorageSvc->>TokenSvc: GET /get-public-keys?kid={kid} TokenSvc->>R2: Load Public Keys R2-->>TokenSvc: Ed25519 Public Keys TokenSvc-->>StorageSvc: Public Key for kid StorageSvc->>StorageSvc: Verify Signature Note over StorageSvc: Phase 3: Permission Validation StorageSvc->>StorageSvc: Layer 1: Check App Secret StorageSvc->>StorageSvc: Layer 2: Check Token Permissions StorageSvc->>R2: Load Token Schema R2-->>StorageSvc: Declarative Permissions StorageSvc->>StorageSvc: Layer 3: Validate Against Schema alt Authorized StorageSvc->>R2: Execute Operation R2-->>StorageSvc: Result StorageSvc-->>App: Success else Unauthorized StorageSvc-->>App: 403 Forbidden end
Token signing and verification sequence with three-phase permission validation

Ed25519 Cryptographic Primitives

Ed25519 was selected for its performance characteristics and security properties. Signatures are generated in under 1ms on edge hardware, and verification completes in under 0.5ms. The algorithm provides 128-bit security with 32-byte public keys and 64-byte signatures, significantly smaller than RSA equivalents.
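
The signing path can be sketched with Node's built-in crypto module; this is an illustrative stand-in for the token service, which signs full JWTs carrying a kid header rather than raw payloads:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generate an Ed25519 keypair (the token service loads its keys from R2 instead).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// A stand-in token payload for illustration.
const payload = Buffer.from(JSON.stringify({ app: "gmail", exp: 1735689600 }));

// Ed25519 has no separate digest step, so the algorithm argument is null.
const signature = sign(null, payload, privateKey); // 64-byte signature
const valid = verify(null, payload, publicKey, signature);
```

The 64-byte signature and boolean verification result match the size and speed properties described above.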

Three-Phase Permission Validation

Storage access requests undergo validation in three sequential phases, each providing progressively finer-grained control:

Phase 1: Application Secret Validation
Applications presenting a valid application secret are granted full access to their own storage namespace. This enables applications to manage their internal state without requiring per-operation token signing.
Phase 2: Embedded Token Permissions
Tokens may embed explicit permission grants as arrays of path patterns. The storage service checks whether the requested operation path matches any embedded permission. This phase enables cross-application access with explicit grants.
Phase 3: Declarative Schema Validation
If embedded permissions are not present, the storage service loads the token schema from the application's published configuration and validates the operation against declaratively specified permissions. This provides a fallback validation mechanism and enables permission updates without reissuing tokens.
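
The three phases above can be sketched as a single authorization check. The names, glob-style path patterns, and secret handling here are assumptions for illustration, not the storage service's actual code:

```typescript
type StorageRequest = {
  app: string;
  appSecret?: string;          // Phase 1: application secret
  tokenPermissions?: string[]; // Phase 2: embedded grants, e.g. "/emails/*"
  path: string;
};

const escapeRe = (s: string) => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");

// Match a glob-style pattern ("*" wildcard) against a storage path.
const matches = (pattern: string, path: string): boolean =>
  new RegExp("^" + pattern.split("*").map(escapeRe).join(".*") + "$").test(path);

function authorize(
  req: StorageRequest,
  validSecret: string,
  schemaPermissions: string[], // Phase 3: declaratively published permissions
): boolean {
  // Phase 1: a valid app secret grants access within the app's own namespace.
  if (req.appSecret === validSecret && req.path.startsWith(`/${req.app}/`)) return true;
  // Phase 2: explicit permission grants embedded in the token.
  if (req.tokenPermissions?.some(p => matches(p, req.path))) return true;
  // Phase 3: fall back to the declarative token schema.
  return schemaPermissions.some(p => matches(p, req.path));
}
```

Ordering the cheap checks first is what enables the early-exit behavior noted in the performance table.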

Key Rotation Protocol

New Ed25519 keypairs are generated automatically every 30 days. The token service maintains four active private keys and publishes five public keys (current plus four historical). This overlap window ensures that tokens signed immediately before rotation remain valid during their lifetime. Old keys are archived to R2 but removed from active use.
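
A compressed sketch of that rolling window, assuming key IDs are ordered newest-first; real rotation also archives retired keys to R2:

```typescript
// After each 30-day rotation: the 4 newest private keys stay active for signing,
// and 5 public keys (current + four historical) remain published for verification.
function rotate(kids: string[], newKid: string): { signing: string[]; published: string[] } {
  const ordered = [newKid, ...kids]; // newest first
  return {
    signing: ordered.slice(0, 4),
    published: ordered.slice(0, 5),
  };
}
```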

Security Guarantees

  • Cryptographic Integrity: All tokens cryptographically signed, tampering detectable
  • Zero-Trust Architecture: Every operation validated, no implicit trust
  • Principle of Least Privilege: Tokens grant minimum required permissions
  • Bounded Key Exposure: 30-day rotation limits the window in which a compromised key remains usable

Event-Driven Communication

The event system enables asynchronous, loosely-coupled communication between applications through a publish/subscribe pattern. Applications declare events they can emit, and other applications subscribe to receive those events. The event service manages subscription lifecycle, coordinates webhook setup and teardown, and delivers events to subscribers in parallel.

sequenceDiagram participant Sub as Subscriber App participant EventSvc as Event Service participant R2 as R2 Storage participant Source as Source App participant External as External Service Note over Sub,External: Phase 1: Subscription Sub->>EventSvc: POST /subscribe EventSvc->>EventSvc: Generate Key from Tokens EventSvc->>R2: Check Existing Subscriptions R2-->>EventSvc: No Active Subscribers EventSvc->>R2: Store Subscription EventSvc->>Source: Invoke start.ts Handler Source->>External: Setup Webhook External-->>Source: Webhook Configured Source->>R2: Store Listener State Source-->>EventSvc: Success EventSvc-->>Sub: {key, isFirstSubscriber: true} Note over Sub,External: Phase 2: Event Delivery External->>Source: Webhook Notification Source->>Source: Invoke handler.ts Source->>EventSvc: POST /send EventSvc->>EventSvc: Generate Key EventSvc->>R2: List Subscriptions for Key R2-->>EventSvc: [Subscriber List] par Parallel Delivery EventSvc->>Sub: POST /execute (Tool Invocation) Sub->>Sub: Process Event Sub-->>EventSvc: Success end EventSvc-->>Source: {delivered: 1, failed: 0} Note over Sub,External: Phase 3: Unsubscription Sub->>EventSvc: POST /unsubscribe EventSvc->>R2: Delete Subscription EventSvc->>R2: Check Remaining Subscribers R2-->>EventSvc: No More Subscribers EventSvc->>Source: Invoke stop.ts Handler Source->>External: Delete Webhook Source->>R2: Clean Listener State Source-->>EventSvc: Success EventSvc-->>Sub: {isLastSubscriber: true}
Event lifecycle from subscription through delivery to cleanup

Event Handlers

Event source applications implement three handler functions for each event type:

start.ts
Invoked when the first subscriber registers interest in an event. Typically configures webhooks with external services and stores listener state. Receives a unique key identifying the subscription group and token payloads for authentication with external services.
handler.ts
Invoked by external webhooks when events occur. Processes the webhook payload, fetches additional data if needed, and calls the event service to deliver events to subscribers. May be invoked repeatedly for a single subscription.
stop.ts
Invoked when the last subscriber unsubscribes. Responsible for cleaning up webhooks and removing listener state. Receives the subscription key for identifying which listener to clean up.

Key-Based Multi-Tenancy

Subscriptions are grouped by deterministic keys generated from token payloads. For example, a Gmail event subscription might generate a key from the Google user ID in the authentication token. This ensures that each user's subscription is independent, enabling per-user webhook configuration and isolated event streams.
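
A hypothetical sketch of that key derivation; the hashing scheme and field canonicalization are assumptions, not the event service's actual algorithm:

```typescript
import { createHash } from "node:crypto";

// Derive a deterministic subscription key from an event name and token payload
// fields, so identical payloads always group into the same subscription.
function subscriptionKey(event: string, payload: Record<string, string>): string {
  // Canonicalize by sorting keys so field order never changes the result.
  const canonical = Object.keys(payload).sort().map(k => `${k}=${payload[k]}`).join("&");
  return createHash("sha256").update(`${event}:${canonical}`).digest("hex").slice(0, 16);
}
```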

Idempotency Guarantees

The event service implements idempotent subscription operations. Multiple subscription requests with identical parameters result in a single stored subscription. The start handler is invoked exactly once for each unique key, even if multiple subscribers register simultaneously. This property simplifies subscription logic and prevents resource leaks.
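
The idempotency behavior can be sketched with an in-memory map (the real service persists subscriptions in R2):

```typescript
function createSubscriptions() {
  const byKey = new Map<string, Set<string>>();
  return {
    subscribe(key: string, subscriber: string): { isFirstSubscriber: boolean } {
      let set = byKey.get(key);
      const isFirst = !set;
      if (!set) byKey.set(key, (set = new Set()));
      set.add(subscriber); // duplicate requests collapse into one stored entry
      return { isFirstSubscriber: isFirst };
    },
    unsubscribe(key: string, subscriber: string): { isLastSubscriber: boolean } {
      const set = byKey.get(key);
      set?.delete(subscriber);
      const isLast = !!set && set.size === 0;
      if (isLast) byKey.delete(key); // last subscriber gone -> stop handler fires
      return { isLastSubscriber: isLast };
    },
  };
}
```

The isFirstSubscriber / isLastSubscriber flags are what trigger the start.ts and stop.ts handlers described above.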

Event System Properties

  • Automatic Lifecycle: Start and stop handlers called automatically based on subscription state
  • Parallel Delivery: Events delivered to all subscribers concurrently
  • Token-Scoped: Subscriptions isolated by token payload values
  • Delivery Guarantees: At-least-once delivery with failure isolation

Deployment Infrastructure

The hub service provides a centralized deployment gateway that eliminates the need for application developers to possess Cloudflare credentials or understand deployment mechanics. Applications are packaged as multipart form uploads containing compiled JavaScript, configuration files, and assets. The hub service processes these uploads, generates semantic embeddings for tool discovery, and orchestrates deployment to Cloudflare's edge network.

flowchart TB DEV[Developer: yarn run publish] subgraph "Build Process" COMPILE[Rollup Compilation] BUNDLE[Bundle Assets] FORM[Create Multipart Form] end subgraph "Hub Service" RECEIVE[Parse Upload] VALIDATE[Validate Metadata] subgraph "Embedding Generation" EMB_APP[Embed App Description] EMB_TOOL[Embed Tool Descriptions] EMB_EX[Embed Usage Examples] STORE_EMB[Store embeddings.json] end UPLOAD[Upload to R2] DEPLOY[Deploy via Cloudflare API] CONFIG[Configure Environment] ROUTE[Create Routes] end subgraph "Cloudflare" WORKER[Worker Deployed] EDGE[Edge Network] end DEV --> COMPILE COMPILE --> BUNDLE BUNDLE --> FORM FORM --> RECEIVE RECEIVE --> VALIDATE VALIDATE --> EMB_APP EMB_APP --> EMB_TOOL EMB_TOOL --> EMB_EX EMB_EX --> STORE_EMB STORE_EMB --> UPLOAD UPLOAD --> DEPLOY DEPLOY --> CONFIG CONFIG --> ROUTE ROUTE --> WORKER WORKER --> EDGE
Deployment pipeline from local build through hub processing to edge deployment

Multipart Upload Format

Applications are packaged as multipart/form-data containing the compiled Worker script, the application's JSON configuration files (manifest, tool definitions, token schemas, object definitions, and perception files), and static assets such as object renderers.

Embedding Generation

The hub service generates 512-dimensional embeddings using OpenAI's text-embedding-3-small model for three categories of content:

  1. Application Description: A single vector representing the application's overall purpose
  2. Tool Descriptions: One vector per tool combining the tool name and description
  3. Usage Examples: One vector per example combining the user query and execution plan

These embeddings enable the orchestrator to perform semantic similarity search during tool discovery, filtering irrelevant capabilities and reducing context size by approximately 90%.

Cloudflare API Integration

The hub service communicates with Cloudflare's Workers API to deploy applications. This includes uploading the compiled script, configuring environment bindings (R2 buckets, KV namespaces, secrets), and creating routes if custom domains are specified. The hub maintains Cloudflare API credentials, insulating application developers from infrastructure complexity.

Deployment Advantages

  • Zero Configuration: Developers require no Cloudflare account or credentials
  • Automated Optimization: Embedding generation happens automatically at publish time
  • Immutable Deployments: Each publish creates a new immutable deployment
  • Global Distribution: Applications automatically replicated to 300+ edge locations

AI Orchestration and Tool Discovery

The orchestrator application coordinates AI-powered planning and execution. When a user submits a query, the orchestrator performs semantic filtering to identify relevant tools, analyzes token requirements to determine tool availability, uses a two-phase decision process for efficient tool selection and input generation, executes the resulting plan, and streams updates to the client in real-time.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#fff3e0','primaryTextColor':'#000','primaryBorderColor':'#f57c00','fontSize':'13px'}}}%% flowchart TB START["User Query
'Create table from emails'"] subgraph DISCOVERY["TOOL DISCOVERY (~20ms)"] LOAD["Load from CDN
87 tools across 15 apps"] EMBED["Embed Query + Calculate Similarity"] REDUCE["Result: 8 relevant tools
90% reduction"] end subgraph TOKEN["TOKEN ANALYSIS (~5ms)"] CHECK["Check Required Tokens"] AVAIL["Available Tools"] UNAVAIL["Unavailable + Prerequisites"] end subgraph PLANNING["TWO-PHASE AI PLANNING"] direction TB subgraph PHASE1["Phase 1: Tool Selection (~150ms)"] P1_PROMPT["Simplified Prompt
descriptions only, no schemas"] P1_SELECT["AI selects WHICH tools
list_emails, create_table"] end subgraph PHASE2["Phase 2: Input Generation (~200ms)"] P2_SPLIT{"Schema
Complexity?"} P2_COMPLEX["Complex Tools
Individual AI calls
(parallel execution)"] P2_SIMPLE["Simple Tools
Batched AI call"] P2_INPUTS["Generated Inputs
Full JSON with values"] end end subgraph EXECUTION["EXECUTION (~400ms)"] EXEC["Execute Tools in Sequence
Stream progress via SSE"] end RESPOND["RESPONSE (~100ms)
AI generates summary"] START --> LOAD LOAD --> EMBED EMBED --> REDUCE REDUCE --> CHECK CHECK --> AVAIL CHECK --> UNAVAIL AVAIL --> P1_PROMPT UNAVAIL --> P1_PROMPT P1_PROMPT --> P1_SELECT P1_SELECT --> P2_SPLIT P2_SPLIT -->|"$defs, anyOf"| P2_COMPLEX P2_SPLIT -->|"Simple"| P2_SIMPLE P2_COMPLEX --> P2_INPUTS P2_SIMPLE --> P2_INPUTS P2_INPUTS --> EXEC EXEC --> RESPOND classDef discoveryStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px classDef tokenStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px classDef phase1Style fill:#fff3e0,stroke:#f57c00,stroke-width:2px classDef phase2Style fill:#e8f5e9,stroke:#388e3c,stroke-width:2px classDef execStyle fill:#ffebee,stroke:#c62828,stroke-width:2px class LOAD,EMBED,REDUCE discoveryStyle class CHECK,AVAIL,UNAVAIL tokenStyle class P1_PROMPT,P1_SELECT phase1Style class P2_SPLIT,P2_COMPLEX,P2_SIMPLE,P2_INPUTS phase2Style class EXEC,RESPOND execStyle
Intelligent Orchestration Pipeline: Semantic filtering achieves 90% tool reduction, followed by two-phase AI planning that separates tool selection from input generation. Complex schemas receive individual attention while simple schemas are batched for efficiency.

Semantic Tool Filtering

Tool discovery employs cosine similarity between the embedded user query and the embedded tool descriptions. Tools with similarity scores below 0.3 are excluded from the prompt. This filtering typically removes 80-90% of available tools, substantially reducing prompt size and associated API costs while maintaining high recall for relevant capabilities.

The filtering algorithm also considers application-level embeddings and usage example embeddings. If an application's description is semantically relevant, all of its tools receive a score boost. Similarly, if a usage example matches the query, the tools referenced in that example are prioritized.
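
A minimal sketch of the similarity filter, using the 0.3 threshold from the text; `filterTools` and its shapes are illustrative names, and the app/example score boosts are omitted:

```typescript
type Tool = { name: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep only tools whose description embedding is close enough to the query.
function filterTools(query: number[], tools: Tool[], threshold = 0.3): Tool[] {
  return tools.filter(t => cosine(query, t.embedding) >= threshold);
}
```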

Two-Phase Tool Decision

Rather than asking the AI to simultaneously select tools and generate their inputs, MALV separates these concerns into two phases for improved accuracy and cost efficiency:

Phase 1: Tool Selection
The AI receives a simplified prompt showing tool descriptions without detailed input schemas. This focused context allows the AI to match user intent to tool capabilities without distraction from schema complexity. The result is a list of selected tools (app + name pairs) and a "next step" observation providing strategic direction.
Phase 2: Input Generation
Selected tools are categorized by schema complexity. Tools with complex schemas ($defs, $ref, anyOf) receive individual AI calls executed in parallel, ensuring focused attention on intricate type structures. Simple tools are batched into a single call for efficiency. Full JSON schemas are provided only in this phase.

Complexity-Aware Batching

Schema complexity is scored based on structural features. Tools scoring above the threshold (e.g., those with recursive types or union schemas) are isolated to prevent malformed inputs:

Schema Feature | Complexity Impact | Handling
Simple properties | Low | Batched with other simple tools
Nested objects | Medium | May be batched if total score low
$defs / $ref | High | Individual AI call
anyOf unions | High | Individual AI call
Recursive types | Very High | Individual AI call with focused context
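
A sketch of how such a score might be computed; the weights and the string-scan heuristic are assumptions for illustration, not the actual scoring code:

```typescript
// Score a JSON schema's structural complexity from the features in the table above.
function complexityScore(schema: unknown): number {
  const s = JSON.stringify(schema);
  let score = 0;
  if (s.includes('"$defs"') || s.includes('"$ref"')) score += 10; // references/recursion
  if (s.includes('"anyOf"')) score += 10;                         // union schemas
  // Nested objects: each "properties" key beyond the first adds a little.
  score += Math.max(0, (s.match(/"properties"/g)?.length ?? 0) - 1) * 3;
  return score;
}

// Tools at or above the threshold get an individual Phase 2 AI call.
const needsIndividualCall = (schema: unknown): boolean => complexityScore(schema) >= 10;
```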

Token-Aware Tool Presentation

Tools declare required tokens in their definitions. The orchestrator analyzes available tokens (provided by the client) and categorizes tools as available or unavailable. Available tools are presented normally in the AI prompt. Unavailable tools are presented with instructions on how to unlock them, typically by invoking an authentication tool. This enables the AI to automatically guide users through authentication flows when necessary.

Streaming Execution

Tool execution results stream to the client using Server-Sent Events (SSE). The orchestrator emits events for: tool decisions (when the AI selects tools to invoke), tool execution start/end (with streaming logs), token creation (when tools generate new authentication tokens), and final response generation. This provides real-time visibility into system behavior and improves perceived performance.
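
The SSE wire format for these events can be sketched as follows; the event names come from the list above, while payload shapes are assumed:

```typescript
// Format one Server-Sent Events frame: an event name plus a JSON data line,
// terminated by a blank line per the SSE protocol.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```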

Cost Optimization

The combination of semantic filtering and two-phase planning compounds cost savings.

At $3 per million input tokens (Claude Sonnet 4.5 pricing), these optimizations reduce per-request cost from ~$0.06 (naive approach) to ~$0.002, representing a 30x improvement.

Orchestration Properties

  • Dynamic Tool Loading: Tools loaded on-demand from CDN, enabling hot updates
  • Contextual Filtering: Only relevant tools included in prompts
  • Two-Phase Planning: Separation of selection and input generation improves accuracy
  • Complexity-Aware: Complex schemas receive focused AI attention
  • Transparent Authentication: AI guides users through auth flows automatically
  • Real-Time Feedback: Streaming updates provide execution visibility

Object Rendering System

Unlike traditional AI approaches that limit outputs to text responses, MALV implements a sophisticated object rendering system for rich data visualization. Objects are persistent, renderable data entities created by tools and stored in R2. Each object can be visualized through custom renderers with full lifecycle management, enabling interactive dashboards, data tables, research boards, and other rich UI components.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000','primaryBorderColor':'#1976d2','lineColor':'#1976d2','secondaryColor':'#fff3e0','tertiaryColor':'#f3e5f5'}}}%% flowchart TB subgraph TOOL["Tool Execution"] CREATE["Tool calls
object.set()"] end subgraph STORAGE["Persistent Storage"] R2[("R2 Bucket
Objects stored by type/id")] META["Object Metadata
id, type, name, references"] end subgraph CLIENT["Client Layer"] TABS["Tab Bar UI
Object navigation"] subgraph RENDERER["Dynamic Renderer Loading"] LOAD["Load from CDN
objects/{type}/web.js"] CAPS["Build Capabilities
storage, tool, ai access"] LIFE["Lifecycle Hooks
onDataUpdated, onUnmount"] end DISPLAY["Rich Visualization
Tables, Charts, Boards"] end CREATE -->|"Store metadata"| R2 R2 --> META META -->|"Tab appears"| TABS TABS -->|"User selects"| LOAD LOAD --> CAPS CAPS --> LIFE LIFE --> DISPLAY classDef toolStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px classDef storageStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px classDef clientStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px class CREATE toolStyle class R2,META storageStyle class TABS,LOAD,CAPS,LIFE,DISPLAY clientStyle
Object Lifecycle: Tools create objects with metadata references, objects appear as tabs in the UI, and custom renderers load dynamically from the CDN with full capability access.

Object Architecture

Objects follow a reference-based architecture where metadata stores pointers rather than actual data. When a tool creates a table object, it stores a tableId reference in the object metadata. The renderer then uses this reference to fetch the actual data through tool calls. This separation keeps object metadata lightweight, lets the underlying data change without rewriting the object, and routes every data access through permission-checked tool calls.

Custom Renderers

Each object type can define custom renderers for different environments. Web renderers are ES modules loaded dynamically from the Apps CDN. CLI renderers produce formatted terminal output. Renderers receive structured parameters including object info, capabilities, and lifecycle hooks.

// objects/{type}/web.ts - Web Renderer Signature
export default async function web(
  info: { id: string; name: string; metadata: ObjectMetadata },
  capabilities: { callTool, storage, ai },
  lifecycle: { onDataUpdated, onUnmount }
): Promise<HTMLElement>
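
A CLI renderer counterpart might look like this sketch; the formatting choices are illustrative, not the actual CLI implementation:

```typescript
// Render rows of an object as a plain terminal table: name, header row, data rows.
function renderCli(name: string, rows: Array<Record<string, string>>): string {
  if (rows.length === 0) return `${name}: (empty)`;
  const cols = Object.keys(rows[0]);
  const header = cols.join(" | ");
  const lines = rows.map(r => cols.map(c => r[c]).join(" | "));
  return [name, header, ...lines].join("\n");
}
```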

Lifecycle Hooks

Renderers can subscribe to lifecycle events for reactive updates:

onDataUpdated(callback)
Invoked when the underlying object data changes, enabling live updates without page refresh. The callback receives the new metadata, allowing renderers to re-fetch and re-render efficiently.
onUnmount(callback)
Invoked when the object tab is closed or the user navigates away. Enables cleanup of subscriptions, WebSocket connections, or other resources held by the renderer.
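
A host-side sketch of how these two hooks could be wired; the hook names come from the text, but the implementation details are hypothetical:

```typescript
type ObjectMeta = { version: number };

function createLifecycle() {
  const updateCbs: Array<(m: ObjectMeta) => void> = [];
  const unmountCbs: Array<() => void> = [];
  return {
    // Handed to the renderer for registration:
    onDataUpdated: (cb: (m: ObjectMeta) => void) => { updateCbs.push(cb); },
    onUnmount: (cb: () => void) => { unmountCbs.push(cb); },
    // Called by the host when data changes or the tab closes:
    emitUpdate: (m: ObjectMeta) => updateCbs.forEach(cb => cb(m)),
    emitUnmount: () => unmountCbs.forEach(cb => cb()),
  };
}
```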

Cross-App Object Storage

Objects can be stored in applications other than the one that created them, enabling team-wide sharing. The storage configuration in objects.json specifies the target app and path template:

{
  "storage": {
    "inApp": "@malv/auth",
    "path": "/teams/<token.team>/objects/",
    "tokenType": "account",
    "tokenFromApp": "@malv/auth"
  }
}

This configuration stores objects under the team's namespace in the auth app, making them accessible to all team members regardless of which app created them.
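
Resolving templates like `<token.team>` against token payload values can be sketched as follows; the error handling is an assumption:

```typescript
// Substitute <key> placeholders in a path template with values from a context map.
function resolvePath(template: string, ctx: Record<string, string>): string {
  return template.replace(/<([^>]+)>/g, (_, key: string) => {
    const v = ctx[key];
    if (v === undefined) throw new Error(`Missing template value: ${key}`);
    return v;
  });
}
```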

Object System Properties

  • Persistent Visualization: Objects survive sessions and appear across page reloads
  • Dynamic Loading: Renderers loaded on-demand from CDN, enabling hot updates
  • Full Capabilities: Renderers can call tools, access storage, and invoke AI
  • Team Sharing: Cross-app storage enables collaborative object access
  • Reactive Updates: Lifecycle hooks enable live data synchronization

Perceptions System

Traditional AI assistants react to explicit user requests. MALV's perception system enables proactive intelligence by defining contextual conditions that help the AI understand user state and suggest relevant actions. Apps declare "perceptions" that match token presence and storage values, surfacing suggested tasks when conditions are met.

[Diagram: user context (tokens such as account and warehouse; storage data such as warehouse.lowStockThreshold = null) feeds perception evaluation — perception/*.json files are fetched from all apps, token conditions (exists/absent) and storage conditions (equals, isEmpty, exists) are matched, and matched perceptions inject context and suggested tasks into the AI prompt.]
Perception Pipeline: The orchestrator evaluates perception conditions against user tokens and storage data, injecting matched perceptions into the AI prompt for context-aware responses.

Perception Definition

Perceptions are defined in perception/*.json files within each app. Each perception specifies conditions that must be met and the contextual insight to provide when matched:

{
  "tokens": {
    "exists": { "@malv/inventory": "warehouse" },
    "absent": { "@malv/inventory": "product" }
  },
  "storage": {
    "@malv/inventory": {
      "/teams/<warehouse.teamId>/warehouses/<warehouse.warehouseId>/config.json": {
        "lowStockThreshold": { "operator": "equals", "value": null }
      }
    }
  },
  "perception": "User has a warehouse configured but hasn't set the low stock threshold",
  "tasks": ["Help user define low stock threshold", "Suggest reorder policies"]
}

Condition Types

Two types of conditions enable precise state matching:

Token Conditions
Check for token presence (exists) or absence (absent). Token conditions are fast to evaluate as they only require checking the client-provided token list. Use these to gate perceptions by authentication state or resource selection.
Storage Conditions
Query actual data values in storage. Paths support template substitution using token payload fields (e.g., <warehouse.teamId>). Storage conditions enable state-aware perceptions like "warehouse exists but threshold is not set."
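Token conditions reduce to set lookups against the client-provided token list. A sketch, assuming the list maps app names to held token types (the helper and its shape are illustrative):

```typescript
// Evaluate "exists"/"absent" token conditions against the client's token list.
type TokenList = Record<string, string[]>; // app name -> token types held

interface TokenConditions {
  exists?: Record<string, string>; // app -> token type that must be present
  absent?: Record<string, string>; // app -> token type that must NOT be present
}

function matchTokenConditions(cond: TokenConditions, held: TokenList): boolean {
  for (const [app, type] of Object.entries(cond.exists ?? {})) {
    if (!held[app]?.includes(type)) return false;
  }
  for (const [app, type] of Object.entries(cond.absent ?? {})) {
    if (held[app]?.includes(type)) return false;
  }
  return true;
}

// Mirrors the example perception: warehouse selected, no product selected.
const matched = matchTokenConditions(
  {
    exists: { "@malv/inventory": "warehouse" },
    absent: { "@malv/inventory": "product" },
  },
  { "@malv/inventory": ["warehouse"], "@malv/auth": ["account"] },
);
// matched === true
```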

Storage Operators

Storage conditions support multiple comparison operators for flexible matching:

| Operator | Description | Example Use Case |
|---|---|---|
| equals | Exact value match (including null) | Check if objective is unset |
| notEquals | Value differs from specified | Check if status changed from draft |
| exists | Field is present (any value) | Check if configuration exists |
| notExists | Field is missing entirely | Detect unconfigured resources |
| isEmpty | Array is empty or string is blank | Check if no products added |
| isNotEmpty | Array has items or string has content | Check if inventory has items |
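The operators above could be implemented roughly as follows. This is a sketch; MALV's evaluator may differ in edge cases, such as how `exists` treats `undefined`:

```typescript
// Sketch of the six storage comparison operators from the table above.
type Operator =
  | "equals"
  | "notEquals"
  | "exists"
  | "notExists"
  | "isEmpty"
  | "isNotEmpty";

function evaluate(op: Operator, actual: unknown, expected?: unknown): boolean {
  switch (op) {
    case "equals":
      return actual === expected; // exact match, including null
    case "notEquals":
      return actual !== expected;
    case "exists":
      return actual !== undefined; // present with any value, even null
    case "notExists":
      return actual === undefined; // field missing entirely
    case "isEmpty":
      return Array.isArray(actual) ? actual.length === 0 : actual === "";
    case "isNotEmpty":
      return Array.isArray(actual)
        ? actual.length > 0
        : typeof actual === "string" && actual.length > 0;
  }
}
```

Note how `equals` with `value: null` differs from `exists`: a `lowStockThreshold` explicitly set to `null` matches the first but also satisfies the second.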

Proactive Suggestions

When perceptions match, their suggested tasks are presented to the AI as actionable next steps. This transforms the assistant from purely reactive to contextually proactive:

AI sees in prompt:

Current Context:

User has warehouse "West Coast Distribution Center" but the low stock threshold is not configured.

Suggested Actions: Help user define low stock threshold, Suggest reorder policies

Contextual Prompt Injection: Matched perceptions inject context and suggestions into the AI prompt, enabling proactive guidance.

Perception System Properties

  • Declarative Conditions: Define rules in JSON, no custom code required
  • State-Aware: Storage queries enable deep context understanding
  • Cross-App: Perceptions evaluated across all apps with valid tokens
  • Template Paths: Dynamic path resolution from token payloads
  • Proactive Intelligence: AI suggests actions before user asks

Semantic Storage Search

Beyond semantic tool discovery, MALV extends vector-based search to storage data itself. When applications write to storage, embeddings are automatically generated in the background, enabling AI-powered discovery of relevant data across all applications. This transforms storage from a simple key-value system into an intelligent data layer.

[Diagram: dual-path semantic storage. Write path: a tool calls storage.put(path, data), a background queue asynchronously generates an embedding (bge-base-en-v1.5) and stores it in the search index with path and securityKey. Search path: the AI issues /search?q=...&types=storage, results are filtered by security keys, ranked by similarity, and matched data is fetched via storage.get(path).]
Dual-Path Architecture: Write operations queue background embedding generation. Search operations filter by security keys and return ranked results for data retrieval.

Automatic Embedding Generation

When apps configure storage paths for searchability, the infrastructure automatically processes writes through an embedding pipeline:

  1. Write Interception: Storage service detects writes to searchable paths
  2. Content Extraction: Relevant text content extracted from JSON data
  3. Embedding Generation: HuggingFace model (bge-base-en-v1.5) creates 768-dimensional vectors
  4. Index Storage: Embeddings stored with path and security metadata

This process runs asynchronously, ensuring write latency is not affected by embedding computation.
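The four steps can be sketched as a write path that resolves before any embedding work runs. The queue, stand-in embedder, and helper names below are illustrative; MALV's internals are not shown in this document:

```typescript
// Write-path sketch: put() resolves immediately; embedding is queued.
type IndexEntry = { path: string; embedding: number[]; securityKey: string };

const index: IndexEntry[] = [];
const queue: Array<() => Promise<void>> = [];

// Stand-in embedder; the real pipeline uses bge-base-en-v1.5 (768 dims).
async function embed(text: string): Promise<number[]> {
  return Array.from(
    { length: 4 },
    (_, i) => (text.charCodeAt(i % text.length) % 10) / 10,
  );
}

async function put(
  path: string,
  data: object,
  securityKey: string,
): Promise<void> {
  // 1. Persist the object (omitted here).
  // 2. Queue embedding generation WITHOUT awaiting it, so write latency
  //    is unaffected by embedding computation.
  queue.push(async () => {
    const embedding = await embed(JSON.stringify(data));
    index.push({ path, embedding, securityKey });
  });
}

// Background worker drains the queue later.
async function drainQueue(): Promise<void> {
  for (const job of queue.splice(0)) await job();
}
```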

Security Key Filtering

Semantic search respects the same permission model as direct storage access. Each indexed item includes security keys derived from token payloads. Search requests specify which security keys the user holds, and results are filtered to include only accessible data:

// Search request with security keys
GET /search?q=project%20goals&types=storage&securityKeys=["account:user123","team:team456"]

// Only returns data where indexed securityKey matches one of the provided keys
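Result filtering reduces to an intersection check between the caller's keys and each indexed item's key. A sketch (the key strings follow the `account:`/`team:` shapes from the request above; the helper name is illustrative):

```typescript
// Filter indexed search hits to those whose securityKey the caller holds.
type Hit = { path: string; similarity: number; securityKey: string };

function filterBySecurityKeys(hits: Hit[], heldKeys: string[]): Hit[] {
  const held = new Set(heldKeys);
  return hits.filter((h) => held.has(h.securityKey));
}

const visible = filterBySecurityKeys(
  [
    { path: "/teams/team456/goals.json", similarity: 0.91, securityKey: "team:team456" },
    { path: "/teams/team999/goals.json", similarity: 0.88, securityKey: "team:team999" },
  ],
  ["account:user123", "team:team456"],
);
// visible contains only the team456 entry
```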

Cross-App Data Discovery

Semantic storage search operates across application boundaries, enabling powerful cross-app queries. An inventory app can discover relevant data from an orders app, or a reporting tool can find metrics from multiple data sources:

[Diagram: the query "Find low stock items" returns ranked results from multiple apps — @malv/inventory product records (similarity 0.89), @malv/orders purchase orders (0.82), and @malv/suppliers supplier contracts (0.76).]
Cross-App Discovery: A single semantic query returns ranked results from multiple applications, unified by similarity scoring.

Location-Based Filtering

Search queries can specify location constraints to scope results to specific apps or paths. This enables targeted searches within a project namespace or across a specific application's data:

// Search within a specific warehouse
GET /search?q=stock&types=storage&locations={"@malv/inventory":"/teams/team1/warehouses/wh1"}

// Search across all inventory data
GET /search?q=products&types=storage&locations={"@malv/inventory":"*"}
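The two requests above suggest a matching rule: `"*"` scopes to the whole app, while any other value acts as a path prefix. A sketch of that rule — the exact semantics are not spelled out in this document, so treat this as an assumption:

```typescript
// Check whether an indexed item falls inside the requested locations.
// "*" matches any path within the app; otherwise a path-prefix match applies.
type Locations = Record<string, string>;

function inLocations(
  appName: string,
  path: string,
  locations: Locations,
): boolean {
  const scope = locations[appName];
  if (scope === undefined) return false; // app not requested at all
  if (scope === "*") return true; // entire app is in scope
  return path.startsWith(scope); // path-prefix scoping
}

const inWarehouse = inLocations(
  "@malv/inventory",
  "/teams/team1/warehouses/wh1/products/p1.json",
  { "@malv/inventory": "/teams/team1/warehouses/wh1" },
);
const otherApp = inLocations("@malv/orders", "/orders/o1.json", {
  "@malv/inventory": "*",
});
// inWarehouse === true, otherApp === false
```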

Search Response Format

Search results include the storage path, source application, similarity score, and a content preview. The AI or application can then fetch the full data using standard storage operations:

{
  "query": "low stock alerts",
  "types": ["storage"],
  "results": [
    {
      "type": "storage",
      "appName": "@malv/inventory",
      "path": "/teams/team1/warehouses/wh1/products/prod_001.json",
      "preview": "Widget A stock level below threshold, 12 units remaining...",
      "similarity": 0.89,
      "securityKey": "f8d2c9b1a5f3e7d9"
    }
  ]
}

Semantic Storage Properties

  • Automatic Indexing: No manual embedding calls, infrastructure handles indexing
  • Permission-Aware: Security keys ensure users only find accessible data
  • Cross-App Intelligence: Single query spans all applications
  • Location Scoping: Filter by app or path for targeted searches
  • Zero Write Latency: Async embedding preserves storage performance

Message Summarization

Long conversations accumulate "behavioral gravity": directive language, persuasive framing, and emotional tone that can bias AI responses when the AI reflects on history. MALV's background summarization system strips this gravity by converting messages into neutral, factual summaries that compress context while preserving essential information.

[Diagram: original messages with directive language (e.g., "I really need to know which products are running low in the warehouse!") are queued after each response and summarized by a fast model in batches. The user message becomes a third-person summary ("He wanted to know which products were running low in the warehouse"); the assistant message becomes a first-person summary ("I offered to set up low stock alerts and asked about threshold preferences").]
Behavioral Gravity Removal: Original messages with directive language are transformed into neutral summaries. User messages become third-person observations; assistant messages become first-person event logs.

The Problem: Behavioral Gravity

When conversation history is passed to an AI, the language patterns in that history influence future responses. Phrases like "I really need," "this is critical," or "you should" create implicit pressure. Over long conversations, this pressure accumulates, biasing the AI toward mirroring the urgency and framing of earlier messages rather than responding to the current request on its own terms.

Summarization Strategy

MALV applies different summarization strategies based on message role:

User Messages: Third-Person Past Tense
Converts user statements into observations about what the user wanted or did. This removes urgency and emotional framing while preserving intent. "I desperately need help with X" becomes "He wanted help with X."
Assistant Messages: First-Person Past Tense
Converts assistant statements into factual logs of actions taken. This removes persuasive language and hedging. "Want me to help you with X? I could also do Y..." becomes "I offered to help with X and mentioned Y as an option."
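The role-dependent strategy amounts to a prompt-selection step before the summarization call. A sketch — the instruction wording below is illustrative, not MALV's actual prompt:

```typescript
// Pick a summarization instruction based on message role.
// Wording is illustrative; the real prompts are not shown in this document.
type Role = "user" | "assistant";

function summarizationInstruction(role: Role): string {
  return role === "user"
    ? "Rewrite as a neutral third-person, past-tense observation of what the " +
        "user wanted or did. Drop urgency and emotional framing; keep intent."
    : "Rewrite as a neutral first-person, past-tense log of actions taken. " +
        "Drop persuasive language and hedging; keep the facts.";
}
```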

Background Processing

Summarization runs as a background job, queued after each response completes. This ensures summarization latency never impacts user-facing response time:

// Queued automatically after respond tool completes
await capabilities.tool.queue('@malv/orchestrator', 'summarize', {
  conversationId,
  assistantMessagePath
}, {
  maxBatchTimeout: 5000,  // Wait up to 5s for more items
  maxBatch: 5             // Process up to 5 messages together
});

Context Compression

Summaries enable efficient context compression for long conversations. When conversation history approaches token limits, the orchestrator can substitute summaries for older messages:

| Message Type | Original Tokens | Summary Tokens | Compression |
|---|---|---|---|
| User message (avg) | ~150 | ~40 | 73% |
| Assistant message (avg) | ~400 | ~80 | 80% |
| Tool results | ~200 | ~50 | 75% |
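One plausible substitution policy: keep the most recent messages verbatim and swap older ones for their summaries, oldest first, until the history fits the budget. The policy and thresholds below are illustrative, not MALV's documented algorithm:

```typescript
// Substitute summaries for older messages when history exceeds a token budget.
type Message = {
  text: string;
  summary: string;
  tokens: number;
  summaryTokens: number;
};

function compressHistory(
  messages: Message[],
  budget: number,
  keepRecent = 2, // most recent messages always stay verbatim
): string[] {
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  const out = messages.map((m) => m.text);
  // Walk oldest-first, stopping once the history fits the budget.
  for (let i = 0; i < messages.length - keepRecent && total > budget; i++) {
    total -= messages[i].tokens - messages[i].summaryTokens;
    out[i] = messages[i].summary;
  }
  return out;
}

const history: Message[] = [
  { text: "long user msg", summary: "he asked about stock", tokens: 150, summaryTokens: 40 },
  { text: "long asst msg", summary: "I listed low items", tokens: 400, summaryTokens: 80 },
  { text: "recent user msg", summary: "he asked for alerts", tokens: 150, summaryTokens: 40 },
];
const compressed = compressHistory(history, 500);
// Oldest message is summarized; the two most recent remain verbatim.
```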

Storage Structure

Summaries are stored alongside original messages, enabling flexible retrieval:

/conversations/{id}/messages/
├── msg_001.json           # Original user message
├── msg_001-summary.json   # Third-person summary
├── msg_002.json           # Original assistant message
├── msg_002-summary.json   # First-person summary
└── ...
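Given the layout above, the summary path derives mechanically from the message path. A sketch (the helper name is illustrative):

```typescript
// Derive the -summary sibling path from a message path, per the layout above.
function summaryPathFor(messagePath: string): string {
  if (!messagePath.endsWith(".json")) {
    throw new Error(`Expected a .json message path: ${messagePath}`);
  }
  return messagePath.replace(/\.json$/, "-summary.json");
}

const summaryPath = summaryPathFor("/conversations/conv_1/messages/msg_002.json");
// summaryPath === "/conversations/conv_1/messages/msg_002-summary.json"
```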

Summarization Properties

  • Bias Removal: Strips emotional and directive language from history
  • Perspective Shift: Third-person for users, first-person for assistant
  • Zero Latency Impact: Background processing after response delivery
  • Context Compression: 70-80% token reduction for long conversations
  • Batched Execution: Multiple messages processed together for efficiency

Technical Specifications

Performance Characteristics

| Metric | Value | Notes |
|---|---|---|
| Token Signing Latency | < 1ms | Ed25519 signature generation |
| Token Verification Latency | < 0.5ms | Ed25519 signature verification |
| Cold Start Time | < 50ms | V8 isolate initialization |
| Global Latency (P50) | < 50ms | Edge network proximity |
| Embedding Generation | ~100ms | Per application at publish time |
| Semantic Filtering | < 10ms | Cosine similarity computation |

Security Parameters

| Parameter | Value |
|---|---|
| Signature Algorithm | Ed25519 (128-bit security) |
| Public Key Size | 32 bytes |
| Signature Size | 64 bytes |
| Key Rotation Period | 30 days |
| Active Private Keys | 4 |
| Published Public Keys | 5 (current + 4 historical) |
| Token Format | JWT (compact serialization) |
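Publishing five public keys (current plus four historical) implies verification tries each published key, so tokens signed before a rotation still verify. A sketch using Node's built-in Ed25519 support — the key-lookup logic is illustrative, and the JWT framing is omitted:

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Verify a signature against any of the published public keys
// (current + historical), so rotation doesn't invalidate older tokens.
function verifyAgainstPublished(
  payload: Buffer,
  signature: Buffer,
  publishedKeys: KeyObject[],
): boolean {
  return publishedKeys.some((key) => verify(null, payload, key, signature));
}

// Simulate a rotation: a token signed with the previous key still verifies.
const previous = generateKeyPairSync("ed25519");
const current = generateKeyPairSync("ed25519");
const payload = Buffer.from('{"sub":"user123","team":"team456"}');
const signature = sign(null, payload, previous.privateKey); // 64 bytes

const stillValid = verifyAgainstPublished(payload, signature, [
  current.publicKey,
  previous.publicKey,
]);
```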

Scalability Limits

| Resource | Limit | Notes |
|---|---|---|
| Worker CPU Time | 50ms (free), 15s (paid) | Per invocation |
| Worker Memory | 128 MB | Per isolate |
| R2 Object Size | 5 TB | Single object maximum |
| Request Rate | Unlimited | Auto-scaling |
| Storage Capacity | Unlimited | R2 bucket capacity |

Embedding Configuration

| Parameter | Value |
|---|---|
| Model | text-embedding-3-small |
| Dimensions | 512 |
| Distance Metric | Cosine similarity |
| Relevance Threshold | 0.3 |
| Cost per Embedding | $0.00002 per 1K tokens |
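Relevance filtering is a cosine-similarity comparison against the 0.3 threshold from the table above. A sketch (small vectors stand in for the real 512-dimensional embeddings):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const RELEVANCE_THRESHOLD = 0.3; // from the Embedding Configuration table
const relevant =
  cosineSimilarity([1, 0, 1], [1, 0.2, 0.9]) >= RELEVANCE_THRESHOLD;
// relevant === true: the vectors point in nearly the same direction
```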

Development Environment

TypeScript Version
5.x with strict mode enabled
Build Tools
Rollup 4.x for Workers, Vite 5.x for web clients
Runtime Environment
Node.js 20.x for development, V8 isolates for production
Package Manager
Yarn 4.x with workspace support
Testing Framework
Jest 29.x with TypeScript integration
Code Quality
ESLint 8.x, Prettier 3.x

Architectural Benefits

The MALV architecture provides significant technical benefits through its design choices. By handling infrastructure concerns at the architecture level, developers can focus on application logic while benefiting from optimizations that would otherwise require substantial engineering effort to implement.

[Diagram: building from scratch takes 12-18 months across sequential phases — infrastructure setup (3-4 months), security implementation (2-3 months), tool system (2-3 months), AI integration (2-3 months), deployment pipeline (1-2 months), and monitoring & ops (2-3 months). Using MALV takes days to weeks: define tools (1-3 days), implement logic (1-2 weeks), deploy (minutes).]
Development Time Comparison: The MALV architecture handles infrastructure concerns, allowing developers to focus on application-specific logic rather than foundational systems.

Infrastructure Overhead Comparison

| Component | Building from Scratch | Using MALV | Difference |
|---|---|---|---|
| Infrastructure Engineering | 2-3 senior engineers (dedicated) | 0 (handled by MALV) | Eliminated |
| Time to First Deployment | 12-18 months | 2-4 weeks | ~95% reduction |
| Security Maintenance | Ongoing engineering overhead | Automated (key rotation, permissions) | Automated |

Key Technical Benefits

1. Semantic Tool Discovery

Embedding-based filtering removes irrelevant tools before they reach the LLM, cutting token usage substantially. This optimization requires sophisticated embedding generation, caching, and similarity scoring infrastructure.

  • Implementation Complexity: Requires vector embeddings, cosine similarity computation, and intelligent caching
  • Token Reduction: 90% fewer tokens sent to the LLM per request
  • Accuracy Improvement: Smaller context windows reduce noise and improve AI response quality

2. Cryptographic Security Model

Ed25519 signature verification with automatic key rotation provides strong security guarantees without developer overhead. This approach requires cryptography expertise and key management infrastructure to implement correctly.

  • Implementation Complexity: Requires cryptography expertise, key management, and distributed verification
  • Enterprise Ready: Meets requirements for financial services and healthcare compliance
  • Auditability: Every operation is cryptographically signed for traceability

3. Edge-Native Architecture

Running on Cloudflare Workers eliminates traditional infrastructure concerns: no containers to manage, no orchestration complexity, no idle resources consuming budget, and no data egress fees.

  • Implementation Complexity: Requires adapting to edge constraints (no Node.js APIs, no filesystem, limited compute time)
  • Global Distribution: Automatic deployment to 300+ locations
  • Scaling: Auto-scaling with zero configuration or warm-up time

4. Declarative Permission System

Three-phase permission validation with template substitution enables secure multi-tenancy through JSON configuration rather than custom code. This eliminates a common source of security vulnerabilities.

  • Implementation Complexity: Requires centralized policy engine, path normalization, and token schema validation
  • Developer Experience: Permissions declared in JSON, automatically enforced
  • Compliance: Permission checks logged centrally for audit reporting

Architecture Comparison

The following table compares MALV's architectural choices against common alternatives:

| Capability | Common Approach | MALV Approach |
|---|---|---|
| Tool Discovery | Send all tools to LLM | Semantic filtering (90% reduction) |
| Authentication | API keys or JWT | Ed25519 signatures with auto-rotation |
| Permissions | Manual ACLs per tool | Declarative templates, centrally enforced |
| Deployment | Docker + Kubernetes | Edge Workers (zero config, global scale) |
| Tool Integration | Custom per integration | Standardized tool interface |
| Event System | Manual webhook management | Automatic lifecycle (start/stop handlers) |
| Resource Usage | Always-on containers | Pay-per-request execution |

Summary: Infrastructure Requirements

Building equivalent infrastructure from scratch typically requires:

  • 12-18 months of engineering time
  • 3-5 senior engineers with distributed systems expertise
  • Deep expertise in cryptography, embeddings, edge computing, and AI orchestration
  • Ongoing maintenance for security updates, performance optimization, and infrastructure evolution

The MALV architecture handles this infrastructure, allowing developers to focus on building application-specific functionality.