bkataru

Building AI Infrastructure for a Post-Framework World


2025 was the year I stopped treating AI infrastructure like a prototype problem. The field matured fast—we went from barely having tools to drowning in incompatible ones. Turns out trading "not enough options" for "too many fragmented options" isn't actually an upgrade.

The Fragmentation Problem

Every major AI framework built its own protocol stack. Anthropic's MCP, OpenAI's function calling, LangChain's abstractions—each one solves similar problems while creating vendor lock-in, protocol silos, and guaranteed rewrites down the line.

What I learned building these projects: we need infrastructure that outlives any single framework. Better primitives, not more abstractions.

Technical Achievements: A Year in Review

1. PocketFlow-Zig: Flow-Based Agent Framework

Repository: PocketFlow-Zig

Started as a side project, ended up becoming the official Zig implementation of PocketFlow—a minimalist framework for building LLM-powered workflows. The design goal was explicit: build just enough framework to let agents construct other agents.

PocketFlow now runs across multiple platforms as a base layer for agentic workflows.
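The flow-based model is easy to sketch: each node does one unit of work against shared state and returns an action string, and the flow uses that action to pick the successor node. A minimal Rust illustration of that idea (this is a sketch of the pattern, not PocketFlow-Zig's actual API):

```rust
use std::collections::HashMap;

// A node does one unit of work on shared state and returns an
// "action" string that selects the next node in the flow.
type Node = fn(&mut Vec<String>) -> &'static str;

fn plan(log: &mut Vec<String>) -> &'static str {
    log.push("plan".to_string());
    "execute"
}

fn execute(log: &mut Vec<String>) -> &'static str {
    log.push("execute".to_string());
    "done"
}

// Walk the flow: run the current node, then follow the edge keyed by
// (node, action). No matching edge means the flow is finished.
fn run_flow(
    start: &'static str,
    nodes: &HashMap<&'static str, Node>,
    edges: &HashMap<(&'static str, &'static str), &'static str>,
    log: &mut Vec<String>,
) {
    let mut current = start;
    loop {
        let action = nodes[current](log);
        match edges.get(&(current, action)) {
            Some(&next) => current = next,
            None => break,
        }
    }
}

fn demo() -> Vec<String> {
    let mut nodes: HashMap<&'static str, Node> = HashMap::new();
    nodes.insert("plan", plan);
    nodes.insert("execute", execute);
    let mut edges = HashMap::new();
    edges.insert(("plan", "execute"), "execute");
    let mut log = Vec::new();
    run_flow("plan", &nodes, &edges, &mut log);
    log
}

fn main() {
    println!("{:?}", demo());
}
```

The appeal of the model is that the graph is data: an agent can assemble nodes and edges at runtime, which is what "agents constructing other agents" comes down to.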

2. TOON-Zig: Efficient Token Serialization

Repository: toon-zig

TOON (Token Oriented Object Notation) is a data serialization format optimized for LLM consumption. I built the Zig implementation from scratch; getting spec compliance while keeping performance high took some careful engineering.
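The core idea is easy to show: for a uniform array of records, declare the keys once in a header and emit one compact row per record, instead of repeating every key for every object the way JSON does. A simplified sketch of that tabular layout (see the TOON spec for the real grammar; `to_toon_table` is a hypothetical helper, not toon-zig's API):

```rust
// Simplified sketch of TOON-style tabular serialization: the header
// carries the array name, row count, and field names once; each row
// then only carries values. JSON would repeat every key per record.
fn to_toon_table(name: &str, keys: &[&str], rows: &[Vec<String>]) -> String {
    let mut out = format!("{}[{}]{{{}}}:\n", name, rows.len(), keys.join(","));
    for row in rows {
        out.push_str("  ");
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let rows = vec![
        vec!["1".to_string(), "Alice".to_string()],
        vec!["2".to_string(), "Bob".to_string()],
    ];
    print!("{}", to_toon_table("users", &["id", "name"], &rows));
}
```

For large uniform arrays this is where the token savings come from: key names are paid for once instead of once per record.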

3. HF-Hub-Zig: HuggingFace Integration Layer

Repository: hf-hub-zig

Zig needed native HuggingFace Hub API support, especially for GGUF model management. So I built it.

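What makes a native client practical is that file downloads from the Hub follow a predictable URL scheme: a repo id, a revision, and a filename resolve to a single GET against the `resolve` endpoint. A small sketch of that scheme (the repo and file names below are just examples, not part of hf-hub-zig's API):

```rust
// Build a HuggingFace Hub "resolve" URL for a file in a model repo.
// Scheme: https://huggingface.co/{repo_id}/resolve/{revision}/{filename}
fn resolve_url(repo_id: &str, revision: &str, filename: &str) -> String {
    format!(
        "https://huggingface.co/{}/resolve/{}/{}",
        repo_id, revision, filename
    )
}

fn main() {
    // Example repo and file names for illustration only.
    let url = resolve_url(
        "TheBloke/Llama-2-7B-GGUF",
        "main",
        "llama-2-7b.Q4_K_M.gguf",
    );
    println!("{}", url);
}
```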

4. Zenmap: Cross-Platform Memory Mapping

Repository: zenmap

A single-file Zig library for memory mapping large files, built specifically for GGUF model handling.

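The hard part of cross-platform memory mapping is that every OS exposes it differently: `mmap` on POSIX, `CreateFileMapping`/`MapViewOfFile` on Windows. The sketch below only shows the API shape such a single-file library might expose, with a plain file read standing in for a real mapping so the sketch stays dependency-free; the type and method names are hypothetical, not zenmap's actual API:

```rust
use std::fs::File;
use std::io::{self, Read};

// API-shape sketch: a mapped file exposes its contents as a byte slice.
// A real zenmap-style library would back this with mmap(2) on POSIX and
// CreateFileMapping/MapViewOfFile on Windows, so a multi-gigabyte GGUF
// file is paged in lazily instead of copied into memory as done here.
struct Mapped {
    data: Vec<u8>,
}

impl Mapped {
    fn open(path: &str) -> io::Result<Mapped> {
        let mut buf = Vec::new();
        File::open(path)?.read_to_end(&mut buf)?;
        Ok(Mapped { data: buf })
    }

    fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("zenmap_demo.bin");
    std::fs::write(&path, b"GGUF")?; // GGUF files start with this magic
    let m = Mapped::open(path.to_str().unwrap())?;
    println!("magic = {:?}", &m.as_bytes()[..4]);
    std::fs::remove_file(&path)?;
    Ok(())
}
```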

5. Igllama: Zig-Based Ollama Alternative

Repository: igllama

A Zig-based alternative to Ollama. What makes it different: it uses the Zig build system's embedded Clang toolchain to compile and run llama.cpp, so no separately installed C or C++ toolchain is required.

Production Work at Dirmacs

I've been working with Suprabhat Rapolu and the team at Dirmacs Global Services to ship production AI infrastructure in Rust:

A.R.E.S: Production Agent Framework

Repository: ares

A production agentic chatbot library built in Rust.

Current Status: Actively dogfooding this in pilot projects. Design goals keep evolving.

Daedra: Web Research MCP Server

Repository: daedra

High-performance DuckDuckGo-powered web search MCP server written in Rust. Gives AI assistants web search and page fetching as tools.

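MCP traffic is JSON-RPC 2.0, so a client invokes a server-side tool by sending a `tools/call` request. A sketch of what such a request looks like on the wire (the `web_search` tool name and `query` argument are illustrative placeholders, not daedra's actual tool schema):

```rust
// Build a JSON-RPC 2.0 "tools/call" request, the message shape MCP
// clients use to invoke a server-side tool. The tool name and the
// argument key here are illustrative placeholders.
fn tools_call_request(id: u64, tool: &str, query: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{\"query\":\"{}\"}}}}}}",
        id, tool, query
    )
}

fn main() {
    println!("{}", tools_call_request(1, "web_search", "zig language"));
}
```

A production server would of course build this with a real JSON library rather than string formatting; the point is only the message shape.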

Lancor: LLaMA.cpp Client Library

Repository: lancor

A Rust client library for llama.cpp's OpenAI-compatible API server.

Goal: Simple, straightforward integration with existing Rust AI workflows.
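Because llama.cpp's server speaks the OpenAI chat-completions wire format, a client mostly needs to build the right JSON and POST it to `/v1/chat/completions`. A sketch of a minimal request body (the model name is a placeholder, and this is not lancor's actual API surface):

```rust
// Minimal chat-completions request body as accepted by llama.cpp's
// OpenAI-compatible server at POST /v1/chat/completions.
// The model name is a placeholder for illustration.
fn chat_request(model: &str, user_msg: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        model, user_msg
    )
}

fn main() {
    println!("{}", chat_request("llama-3-8b", "Hello"));
}
```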

What Ties This Together

All these projects share a common thread: explicit control over system boundaries. In a world where frameworks come and go, infrastructure needs to:

  1. Outlive any single vendor - Protocol-level interop
  2. Show explicit costs - No magic, no hidden abstractions
  3. Stay fast - Zero-cost abstractions where possible
  4. Run everywhere - Build once, deploy anywhere

What's Next for 2026

The foundation is built. Now comes production hardening.

The goal stays the same: build infrastructure that lets developers construct robust AI systems without getting locked into any framework.


All projects are MIT licensed and open source. Pull requests and technical discussions welcome.

