Annotations and LLMs
Notes and ideas on annotations and LLMs: using annotations in conjunction with LLM dev tooling, as well as generating annotation processors with LLMs.
I rarely write about caches and caching, so I thought I'd cover some basics on LLM caching. Covers inference and prompt caching.
Notes on the TensorZero LLM gateway. Covers templates, schemas, feedback, retries, evals, DICL, MIPRO, and model/prompt/inference optimization.
Some basics on Ollama. Includes details on quantization, vector DBs, model storage, model formats and Modelfiles.
Comparisons of OpenAI's service offering with Anthropic's. Includes context windows, rate limits and model optimization.
My notes on the design of Anthropic's APIs and some general design considerations for provider-based APIs and SDKs. Covers rate limiting, service tiers, SSE flow and some of the REST API endpoints.
Notes on "CAPO: Cost Aware Prompt Optimization" (June 2025) from the Munich Center for Machine Learning.
Notes on AI/LLM guardrails and safety patterns from a book on "Agentic Design Patterns" by one of Google's Distinguished Engineers, Antonio Gulli.
Notes on workflows and agents from Anthropic's course on Claude. Covers the evaluator-optimizer pattern, chaining, routing and parallelization.
Notes on Claude Code and Computer Use from Anthropic's course on Claude. Covers workflow patterns.