Annotations and LLMs
Notes and ideas on annotations and LLMs: using annotations in conjunction with LLM dev tooling, as well as generating annotation processors with LLMs.
I never write about caches and caching, so I thought I'd put together some basics on LLM caching. Covers inference caching and prompt caching.
Notes on the TensorZero LLM gateway. Covers templates, schemas, feedback, retries, evals, DICL, MIPRO, and model-prompt-inference optimization.
Some basics on Ollama. Includes details on quantization, vector DBs, model storage, the model format and Modelfiles.
A comparison of the OpenAI service offering with that of Anthropic. Includes context windows, rate limits and model optimization.
My notes on the design of Anthropic's APIs and some general design considerations for provider-based APIs and SDKs. Covers rate limiting, service tiers, SSE flow and some of the REST API endpoints.
Notes on the evolution of Java Unsafe and off-heap memory access. Touches on Unsafe, the FFM API and the Agrona DirectBuffer.
Some notes on false sharing and cache line padding.
Some notes on the basics of CPU caches. Covers locality, write policies, hierarchy and inclusion policies.
Notes on "CAPO: Cost Aware Prompt Optimization" (June 2025) from the Munich Center for Machine Learning.