LLM Caching
I never write about caches and caching, so I thought I'd cover some basics of LLM caching: inference caching and prompt caching.
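As a quick illustration of the simplest form of inference (response) caching, here's a minimal sketch that memoizes completions keyed by the prompt and sampling parameters. Everything in it (the cache layout, `cached_completion`, and the stand-in `fake_llm_call`) is an illustrative assumption, not any particular library's API.

```python
import hashlib
import json

# Tiny in-memory response cache keyed by prompt + sampling parameters.
_cache: dict[str, str] = {}

def _cache_key(prompt: str, model: str, temperature: float) -> str:
    # Include model and temperature in the key so a cached completion
    # is only reused for an identical request.
    payload = json.dumps(
        {"prompt": prompt, "model": model, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def fake_llm_call(prompt: str, model: str, temperature: float) -> str:
    # Stand-in for a real (slow, expensive) inference request.
    return f"completion for: {prompt[:40]}"

def cached_completion(prompt: str, model: str = "example-model",
                      temperature: float = 0.0) -> str:
    key = _cache_key(prompt, model, temperature)
    if key in _cache:
        return _cache[key]  # cache hit: skip inference entirely
    response = fake_llm_call(prompt, model, temperature)
    _cache[key] = response  # cache miss: run inference, then store
    return response

if __name__ == "__main__":
    print(cached_completion("What is LLM caching?"))  # miss: runs inference
    print(cached_completion("What is LLM caching?"))  # hit: returns cached result
```

Prompt caching in the provider-side sense (reusing KV-cache state for a shared prompt prefix) operates at a different layer inside the inference engine and isn't shown here.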