Chronicle
- Provides a proxy that can be called in place of the LLM provider's normal URL. It passes the request to the provider and returns the response. Chronicle can be embedded directly into a Rust application or run as a standalone server.
Task List
Up Next
- Write documentation
- Basic UI for visualizing runs, steps, etc.
Soon
- Option to skip logging model messages
- Python client
- This should work both as a normal client and as a wrapper around other common clients such as the OpenAI SDK. It needs to be compatible with tools like instructor, DSPy, etc.
- Add fixture tests for Fireworks
- Add fixture tests for Together
- Add fixture tests for Ollama
- Add fixture tests for Anyscale
- Add fixture tests for DeepInfra
- When waiting to retry, detect if the request has disconnected and cancel it
- Add test for JSON response_format
Later/Maybe
- Function to validate configuration and return a list of errors
- API Management
- Endpoints to manage aliases
- Endpoints to manage API keys
- Watch and reload config file
- Request Management
- Allow passing through user-agent header
- Simple Request caching
- Cache responses based on provider/model/messages
- Support in-memory, disk, database, Redis
- Fetch secrets from AWS secrets store
- When looping around providers with retries, omit providers that had an unrecoverable error.
- Option for provider-level or model-level rate limiting
- So that new incoming requests will automatically wait or fall back
- Monitor error rates from providers and auto-switch
- This can be built into the alias system perhaps
- New Providers
- New Provider: Cohere
- Very different API than the others. It’s different enough from a feature perspective that I’m not sure it’s worth translating between request formats.
- New Provider: OpenRouter
- New Provider: Google Gemini
- OctoAI
- Lepton
- Specific Provider Upgrades
- Claude: support "is_error" flag in tool results
- Claude: support images in tool results
- Other Modalities
- Support binary upload APIs like Deepgram as well
- Logging
- Send logged data to arbitrary HTTP endpoint
- This should be done in a way that the data can be sent to something like Kafka, Elasticsearch, or Clickhouse using only configuration, with no custom code
- Send logged data to S3
- As JSON files? As Parquet?
- Figure out what to do about large data
- Saving inputs and outputs is useful but can start taking up a lot of space. Ideally records would be placed in cloud storage. Still need to figure out the formats and whether (and how) to make things queryable before they land in storage.
- Analysis
- Visualize by arbitrary metadata
- Ability to create database indexes on arbitrary metadata even in JSON fields
- Price Tracking
- Associate each provider and its calls with a pricing plan
- Fetch and update prices for each provider
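One of the Request Management items above proposes omitting providers that had an unrecoverable error when looping around providers with retries. A minimal sketch of that idea follows. The `Target`, `AttemptError`, and `call_with_fallback` names are hypothetical and are not taken from Chronicle's code, and the sketch leaves out the backoff delay that a real retry loop would have between rounds.

```rust
use std::collections::HashSet;

/// One entry in an alias: which provider and model to try.
/// Hypothetical type for this sketch, not Chronicle's actual definition.
#[derive(Debug, Clone, PartialEq)]
pub struct Target {
    pub provider: String,
    pub model: String,
}

/// How a single attempt can fail (hypothetical classification).
#[derive(Debug, Clone, PartialEq)]
pub enum AttemptError {
    /// Worth trying this provider again later, e.g. a rate limit or timeout.
    Retryable(String),
    /// Not worth trying this provider again for this request, e.g. a bad API key.
    Unrecoverable(String),
}

/// Loop around `targets` up to `max_rounds` times, skipping any provider that
/// has already returned an unrecoverable error during this request.
pub fn call_with_fallback<F>(
    targets: &[Target],
    max_rounds: usize,
    mut attempt: F,
) -> Result<String, AttemptError>
where
    F: FnMut(&Target) -> Result<String, AttemptError>,
{
    // Providers that should not be tried again for this request.
    let mut dead: HashSet<&str> = HashSet::new();
    let mut last_err = AttemptError::Unrecoverable("no targets configured".to_string());

    for _ in 0..max_rounds {
        let mut tried_any = false;
        for target in targets {
            if dead.contains(target.provider.as_str()) {
                continue;
            }
            tried_any = true;
            match attempt(target) {
                Ok(response) => return Ok(response),
                Err(err) => {
                    if matches!(err, AttemptError::Unrecoverable(_)) {
                        dead.insert(target.provider.as_str());
                    }
                    last_err = err;
                }
            }
        }
        // Every provider has been ruled out, so further rounds cannot succeed.
        if !tried_any {
            break;
        }
    }
    Err(last_err)
}
```

The `attempt` closure stands in for the actual provider call. Tracking dead providers per request, rather than globally, keeps this separate from the error-rate monitoring idea listed further down.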
Done
- Support prompt caching — Aug 25th, 2024
- Support tools with Ollama
- New Provider: AWS Bedrock — Jul 8th, 2024
- Run and Step tracing — Jun 30th, 2024
- Add Mistral support — Jun 19th, 2024
- Streaming with Groq and Ollama — Jun 1st, 2024
- Enhance test suite with real-world cases — May 31st, 2024
- This uses streaming and non-streaming responses from various provider types, for both regular text and tool calls.
- Streaming with Claude — May 31st, 2024
- Streaming support — May 31st, 2024
- "Simple" API can build for Postgres
- Dropped the "full web app" version of the API. This will come back at some later time.
- Anthropic now supports "required" tool mode
- Recover from Groq function calling failure
- Endpoint for generic event logging — May 8th, 2024
- Take the same metadata that we use for LLM calls, store them in a different table with just an event type and data json blob.
- Support tool use fields — May 1st, 2024
- Simpler API server that supports SQLite — Apr 30th, 2024
- This will just use the built-in proxy tables, but is better for simpler use since it writes to SQLite
- Autoload config files from the XDG directories
- JavaScript client — Apr 29th, 2024
- This comes with a Chronicle-specific client, and can also redirect clients such as the OpenAI SDK using a custom fetch function.
- Submit request metadata (org/user/workflow id) via HTTP headers — Apr 29th, 2024
- By default, the API should allow everything without authorization
- Do this by not only setting up a default user, but also using it as the anonymous fallback
- Testing — Apr 26th, 2024
- For API mode, add data tables as Filigree models instead of using the built-in tables
- When multiple providers are in use, keep retrying even on normally un-retryable errors
- Allow configuring fallback provider and model on retry. — Apr 24th, 2024
- This is part of the model alias configuration. Basically instead of a single provider and model there's an array of provider/model/apikey tuples
- Support model/provider aliases — Apr 23rd, 2024
- Support api keys — Apr 23rd, 2024
- These can only be referenced by aliases
- Save metadata into SQLite or Postgres — Apr 22nd, 2024
- Load model and provider definitions from a configuration file — Apr 22nd, 2024
- Store and load model and provider definitions from the database — Apr 22nd, 2024
- Configurable user agent for HTTP client — Apr 21st, 2024
- Link requests to internal users/orgs/projects — Apr 21st, 2024
- Configurable timeout — Apr 20th, 2024
- Common format chat messages and responses — Apr 19th, 2024
- Automatic retry with rate-limit support — Apr 19th, 2024
- Endpoint that proxies the call — Apr 19th, 2024
- Send all relevant metadata as Otel traces — Apr 19th, 2024
- Probably take some code from Promptbox and change that to use this as a library, since it already has some of the needed functionality
- Maintain a price sheet with input/output token price per provider and model
- Each price sheet entry has an active flag
- When prices are updated for a model, add a new entry and mark it active
- In the future have a scraper or other mechanism of getting latest price data for each model
- Support multiple methods of output:
- Record in a Postgres table
- Output OpenTelemetry
- Consider allowing metadata such as org and user ID to be sent in a cookie or in HTTP headers in addition to the body. Not sure how useful this is though.
- For each entry, record:
- Org ID
- User ID
- Run ID (ID linking related prompt calls together)
- Workflow Name
- Workflow Step
- Arbitrary other metadata
- endpoint called
- provider and model used
- input text
- output text
- input token count
- output token count
- which price sheet row was used
- response time
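The price sheet and per-entry fields listed above could be modeled roughly as follows. This is a sketch under stated assumptions, not Chronicle's actual schema: all type, field, and function names are hypothetical, prices are assumed to be per million tokens (the notes do not fix a unit), and it assumes that activating a new price deactivates the previous entry for the same provider and model.

```rust
/// One row of the price sheet. The per-million-token unit is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct PriceSheetEntry {
    pub id: u64,
    pub provider: String,
    pub model: String,
    pub input_price_per_mtok: f64,
    pub output_price_per_mtok: f64,
    pub active: bool,
}

/// Add a new price for a provider/model and mark it active. Older entries for
/// the same provider/model are kept but deactivated (assumed behavior).
pub fn update_price(sheet: &mut Vec<PriceSheetEntry>, mut new_entry: PriceSheetEntry) {
    for entry in sheet.iter_mut() {
        if entry.provider == new_entry.provider && entry.model == new_entry.model {
            entry.active = false;
        }
    }
    new_entry.active = true;
    sheet.push(new_entry);
}

/// Find the currently active price for a provider/model, if any.
pub fn active_price<'a>(
    sheet: &'a [PriceSheetEntry],
    provider: &str,
    model: &str,
) -> Option<&'a PriceSheetEntry> {
    sheet
        .iter()
        .find(|e| e.active && e.provider == provider && e.model == model)
}

/// The fields recorded for each proxied call, following the list above.
#[derive(Debug, Clone)]
pub struct LogEntry {
    pub org_id: Option<String>,
    pub user_id: Option<String>,
    /// ID linking related prompt calls together.
    pub run_id: Option<String>,
    pub workflow_name: Option<String>,
    pub workflow_step: Option<String>,
    /// Arbitrary other metadata as key/value pairs.
    pub extra_metadata: Vec<(String, String)>,
    pub endpoint: String,
    pub provider: String,
    pub model: String,
    pub input_text: String,
    pub output_text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    /// Which price sheet row was used, so old calls keep their original price.
    pub price_sheet_id: Option<u64>,
    pub response_time_ms: u64,
}

/// Compute the cost of a call from the price sheet row it references.
pub fn cost(entry: &LogEntry, sheet: &[PriceSheetEntry]) -> Option<f64> {
    let row = sheet.iter().find(|p| Some(p.id) == entry.price_sheet_id)?;
    let total = entry.input_tokens as f64 * row.input_price_per_mtok
        + entry.output_tokens as f64 * row.output_price_per_mtok;
    Some(total / 1_000_000.0)
}
```

Storing the price sheet row ID on each entry, rather than the price itself, means a later price update does not change the computed cost of earlier calls.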