Annotations and LLMs
Notes and ideas on annotations and LLMs: using annotations in conjunction with LLM dev tooling, as well as generating annotation processors with LLMs.
I never write about caches and caching, so I thought I'd put together some basics on LLM caching. Covers inference caching and prompt caching.
Notes on the TensorZero LLM gateway. Covers templates, schemas, feedback, retries, evals, DICL, MIPRO, and model-prompt-inference optimization.
Some basics on Ollama. Includes details on quantization, vector DBs, model storage, the model format and Modelfiles.
A comparison of the OpenAI service offering with that of Anthropic. Includes context windows, rate limits and model optimization.
My notes on the design of Anthropic's APIs and some general design considerations for provider-based APIs and SDKs. Covers rate limiting, service tiers, SSE flow and some of the REST API endpoints.
Notes on the evolution of Java Unsafe and off-heap memory access. Touches on Unsafe, the FFM API and the Agrona DirectBuffer.
Some notes on false sharing and cache line padding.
Some notes on the basics of CPU caches. Covers locality, write policies, hierarchy and inclusion policies.
Notes on "CAPO: Cost Aware Prompt Optimization" (June 2025) from the Munich Center for Machine Learning.