Prompt Caching Playground
Optimize your prompt structure and observe cache hits. Caching engages at ≥ 1024 prompt tokens, with cacheable chunks in 128-token increments (e.g., 1024, 1152, 1280, …).
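A small sketch of that arithmetic, assuming exactly the rule stated above (nothing below 1024 tokens is cached, and the cacheable prefix grows in 128-token steps); the helper name is illustrative:

    def cacheable_prefix_tokens(prompt_tokens: int) -> int:
        """Longest prompt prefix eligible for caching under the rule above."""
        if prompt_tokens < 1024:
            return 0  # below the threshold, nothing is cached
        # round down to 1024 plus a whole number of 128-token steps
        return 1024 + ((prompt_tokens - 1024) // 128) * 128

    # 1023 -> 0, 1024 -> 1024, 1200 -> 1152, 1500 -> 1408
    for n in (1023, 1024, 1200, 1500):
        print(n, "->", cacheable_prefix_tokens(n))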
Setup
Endpoint: Chat Completions API, or Responses API (supports prompt_cache_key; see the sketch after the scenario list below).
Model: gpt-4o-mini, gpt-4o, or gpt-4.1-mini.
Load demo scenario:
A) Static instructions + Doc A + Q1
A2) Same instructions + Doc A + Q2
B) Same instructions + Doc B + Q1
Controls: Load, Repeat instructions, Repeat doc.
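A minimal sketch of what scenarios A and A2 are meant to show, using the OpenAI Python SDK against the Responses API; the document text, question strings, and prompt_cache_key value are placeholders, and the shared instructions + Doc A prefix must exceed roughly 1024 tokens before any cached tokens are reported:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    INSTRUCTIONS = "You are a careful analyst. Answer strictly from the document."
    DOC_A = "..."  # placeholder: a long document shared by scenarios A and A2

    def ask(question: str) -> None:
        resp = client.responses.create(
            model="gpt-4o-mini",
            instructions=INSTRUCTIONS,
            input=f"{DOC_A}\n\nQuestion: {question}",
            prompt_cache_key="playground-doc-a",  # steer requests sharing this prefix to the same cache
        )
        usage = resp.usage
        print(question[:20], "| input:", usage.input_tokens,
              "| cached:", usage.input_tokens_details.cached_tokens)

    ask("Q1: What is the main finding?")  # scenario A: first call, expect cached = 0
    ask("Q2: List the key assumptions.")  # scenario A2: same prefix, expect a cache hit
    # Scenario B swaps in Doc B: the prefix diverges right after the instructions,
    # so little or none of the prompt is served from cache.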
Prompt Parts
Initial Instructions (static; put these first for best cache hit rate)
Document Context (semi-static; keep above user input)
User Question (dynamic; place last)
Actions: Send, Send Variant (new Q), Clear Log, Copy Log.
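How the three parts above map onto a request: a sketch against the Chat Completions API, with the static instructions first, the document next, and the user question last; the strings are placeholders, and "Send Variant (new Q)" corresponds to changing only the final message:

    from openai import OpenAI

    client = OpenAI()

    instructions = "..."  # static: identical across runs, placed first
    document = "..."      # semi-static: changes rarely, kept above the user input
    question = "..."      # dynamic: changes every run, placed last

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": instructions},           # stable prefix starts here
            {"role": "user", "content": f"Document:\n{document}"},
            {"role": "user", "content": f"Question: {question}"},  # only this varies per run
        ],
    )
    print(resp.choices[0].message.content)
    print("cached prompt tokens:", resp.usage.prompt_tokens_details.cached_tokens)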
Result
Each run reports cache status, latency, and timestamp, plus Prompt Tokens, Cached Tokens, Cache Hit %, Completion Tokens, Total Tokens, Model, Endpoint, and prompt_cache_key, followed by the model's response text.
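Cache Hit % can be recomputed from the usage block of any response; a minimal sketch, assuming the Chat Completions usage field names (the summarize_usage helper is hypothetical, just mirroring the panel above):

    def summarize_usage(usage) -> dict:
        """Reduce a Chat Completions usage object to the metrics shown above."""
        prompt = usage.prompt_tokens
        cached = usage.prompt_tokens_details.cached_tokens
        return {
            "prompt_tokens": prompt,
            "cached_tokens": cached,
            "cache_hit_pct": round(100 * cached / prompt, 1) if prompt else 0.0,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
        }

    # e.g. print(summarize_usage(resp.usage)) after a chat.completions.create call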
Raw JSON
Shows the most recent request JSON and the most recent response JSON.
Run Log