If you’re writing async Python, you’ve probably blocked the event loop without knowing it. Your code runs. Your tests pass. But in production, p90 latencies spike and timeouts appear seemingly at random. The culprit? Synchronous code hiding inside your async def functions. Python’s asyncio is cooperative. When you await something, you’re yielding control back to the event loop so other tasks can run. But if you call synchronous code, even accidentally, the entire event loop freezes. Every other coroutine waits. Every concurrent request hangs. Every other user gets blocked. ...
The OpenAI Compatibility Paradox
The promise of a standardized interface for LLMs via OpenAI-compatible endpoints is compelling. In theory, it allows for a plug-and-play architecture where switching models is as trivial as changing a base_url. In practice, this compatibility is often an illusion. I’ve spent the past year building a multi-provider LLM backend, and the pattern is always the same: things work for basic text generation, then break the moment you need production-critical features. ...
Streaming Partial JSON from LLMs in Go
The Problem

LLMs stream JSON token by token. Your structured output arrives as:

{"project": {"name": "Mo
{"project": {"name": "Mobile App", "status": "in_prog
{"project": {"name": "Mobile App", "status": "in_progress"}, "tasks": [{"title": "UI Redes
...

Standard encoding/json fails on every chunk except the last:

json.Unmarshal([]byte(`{"project": {"name": "Mo`), &result) // error: unexpected end of JSON input

This was recently highlighted by swyx as a #1 or #2 performance issue in AI applications. You’re forced to wait for the complete response before showing anything to users, which negates the entire point of streaming with JSON mode or structured output. ...
Jina 🤝 Linkerd
Today, let’s discuss a way to debug your Executors & Flows using the Linkerd service mesh.

What is a Service Mesh?

From Linkerd docs

A service mesh is a tool for adding observability, security, and reliability features to “cloud native” applications by transparently inserting this functionality at the platform layer rather than the application layer.

A few of the commonly used service meshes are Linkerd, Istio, and Consul.

Why Linkerd?

Significantly lighter than competitors.
Basic request tracing with no additional config.
Telemetry and monitoring that expose golden metrics.
The simplicity of its usage is mind-blowing. Just annotate the K8s Deployment, and you’re done.

How does Linkerd work?

While this greatly understates the magic Linkerd brings, here are the major components. Feel free to go through the architecture in the official docs. ...
Jina ❤️ Serverless
What is Jina?

From the Docs

Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud.

What is serverless?

From Wikipedia

Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers.

Also, “Serverless” is a misnomer in the sense that servers are still used by cloud service providers to execute code for developers. However, developers of serverless applications are not concerned with capacity planning, configuration, management, maintenance, fault tolerance, or scaling of containers, VMs, or physical servers. ...