Chronicle

  • Provides a proxy that can be called in place of a provider's normal URL; it forwards each request to the LLM provider and returns the response. Chronicle can be embedded directly into a Rust application or run as a standalone server.
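
As a concrete sketch, a call to a standalone Chronicle server might look like the snippet below. The port, the `/chat` path, and the metadata field names are assumptions for illustration; only the shape (a common-format chat body plus org/user/workflow metadata) comes from these notes.

```rust
// Hypothetical call to a locally running Chronicle proxy. The port, path,
// and metadata field names are invented for illustration.
// Requires reqwest (features = ["blocking", "json"]) and serde_json.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        // Common-format chat request, similar to the OpenAI schema
        "model": "gpt-4o",
        "messages": [
            { "role": "user", "content": "Summarize this ticket." }
        ],
        // Metadata linking the call to internal users/orgs/workflows
        "metadata": {
            "organization_id": "org_123",
            "user_id": "user_456",
            "run_id": "run_789",
            "workflow_name": "ticket-summarizer",
            "workflow_step": "summarize"
        }
    });

    let response: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:9782/chat")
        .json(&body)
        .send()?
        .error_for_status()?
        .json()?;

    println!("{response:#}");
    Ok(())
}
```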
  • Task List

    • Up Next

      • Support tool use fields
    • Soon

      • JavaScript client
        • This should be both a normal client and have the ability to wrap other common clients such as the OpenAI SDK
      • Python client
        • This should be both a normal client and have the ability to wrap other common clients such as the OpenAI SDK
      • Easy way to set up clients from JS and Python to call the proxy with appropriate headers
        • This is most useful in concert with other libraries that make LLM calls, like dspy and instructor
      • Extra providers
        • Fireworks
        • Together
    • Later/Maybe

      • Send logged data to arbitrary HTTP endpoint
        • This should be done in a way that it can be sent to something like Elasticsearch without custom code
      • Global rate limiting
      • When looping around providers with retries, omit providers that already returned an unrecoverable error.
      • Analysis
        • visualize by arbitrary metadata
        • Ability to create database indexes on arbitrary metadata even in JSON fields
      • Price Tracking
        • Associate each provider and its calls with a pricing plan
        • Fetch and update prices for each provider
      • Support binary upload APIs like Deepgram as well
      • Support streaming responses
      • Submit request metadata (org/user/workflow id) via HTTP headers or cookies?
    • Done

      • API should default to allowing everything without authorization
        • Do this by not only setting up a default user, but also adding it as the anonymous fallback
      • Testing — Apr 26th, 2024
      • For API mode, add data tables as Filigree models instead of using the built-in tables
      • When multiple providers are in use, keep retrying even on normally un-retryable errors
      • Allow configuring fallback provider and model on retry. — Apr 24th, 2024
        • This is part of the model alias configuration. Basically, instead of a single provider and model, an alias holds an array of provider/model/API key tuples (see the sketch after this task list)
      • Support model/provider aliases — Apr 23rd, 2024
      • Support api keys — Apr 23rd, 2024
        • These can only be referenced by aliases
      • Save metadata into SQLite or Postgres — Apr 22nd, 2024
      • Load model and provider definitions from a configuration file — Apr 22nd, 2024
      • Store and load model and provider definitions from the database — Apr 22nd, 2024
      • Configurable user agent for HTTP client — Apr 21st, 2024
      • Link requests to internal users/orgs/projects — Apr 21st, 2024
      • Configurable timeout — Apr 20th, 2024
      • Common format chat messages and responses — Apr 19th, 2024
      • Automatic retry with rate-limit support — Apr 19th, 2024
      • Endpoint that proxies the call — Apr 19th, 2024
      • Send all relevant metadata as Otel traces — Apr 19th, 2024
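
To make the alias and fallback items above concrete, here is a minimal sketch of what the alias configuration could deserialize into and how the retry loop could walk it. All type and field names are illustrative, and `try_call` is a hypothetical placeholder; skipping providers that already failed unrecoverably is the Later/Maybe refinement rather than current behavior.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct ModelAlias {
    name: String,
    /// Tried in order: instead of a single provider and model, an alias
    /// holds an array of provider/model/API key entries.
    targets: Vec<AliasTarget>,
}

#[derive(Deserialize)]
struct AliasTarget {
    provider: String,
    model: String,
    api_key: Option<String>,
}

enum CallError {
    /// e.g. a rate limit or timeout; worth trying again later
    Retryable,
    /// e.g. a request this provider will never accept; skip it from now on
    Unrecoverable,
}

fn call_with_fallback(alias: &ModelAlias, max_passes: usize) -> Result<String, ()> {
    // Providers that already failed unrecoverably (the Later/Maybe item).
    let mut dead = vec![false; alias.targets.len()];
    for _pass in 0..max_passes {
        for (i, target) in alias.targets.iter().enumerate() {
            if dead[i] {
                continue;
            }
            match try_call(target) {
                Ok(response) => return Ok(response),
                // With multiple providers in use, keep going even on errors
                // that would normally not be retried.
                Err(CallError::Unrecoverable) => dead[i] = true,
                Err(CallError::Retryable) => {}
            }
        }
    }
    Err(())
}

fn try_call(target: &AliasTarget) -> Result<String, CallError> {
    // Placeholder for the actual provider request.
    unimplemented!("call {}/{}", target.provider, target.model)
}
```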
  • Probably take some code from Promptbox, which already has some of the needed functionality, and change it to use this as a library
  • Maintain a price sheet with input/output token price per provider and model
    • Each price sheet entry has an active flag
    • When prices are updated for a model, add a new entry and mark it active
    • In the future have a scraper or other mechanism of getting latest price data for each model
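
A minimal sketch of the price sheet idea, with invented names: updating a price appends a new active row rather than mutating the old one, so past calls can keep referencing the row that was in effect when they ran.

```rust
// Sketch of the price sheet: one row per provider/model/price revision,
// plus an `active` flag. All names are illustrative.
struct PriceSheetEntry {
    id: u64,
    provider: String,
    model: String,
    input_price_per_1k_tokens: f64,
    output_price_per_1k_tokens: f64,
    active: bool,
}

/// When prices change, deactivate the current entry and append a new active
/// one instead of mutating in place, so historical requests still reference
/// the price sheet row that was in effect when they ran.
fn update_price(sheet: &mut Vec<PriceSheetEntry>, mut new_entry: PriceSheetEntry) {
    for entry in sheet.iter_mut() {
        if entry.provider == new_entry.provider && entry.model == new_entry.model {
            entry.active = false;
        }
    }
    new_entry.active = true;
    sheet.push(new_entry);
}
```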
  • Support multiple methods of output:
    • Record in a postgres table
    • Output OpenTelemetry
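
For the OpenTelemetry side, one plausible shape is a span per proxied request, here sketched with the `tracing` crate and an OTel-compatible subscriber such as `tracing-opentelemetry`; the field names are invented.

```rust
use tracing::field::Empty;

fn proxy_call() {
    let span = tracing::info_span!(
        "llm_request",
        organization_id = "org_123",
        workflow_name = "ticket-summarizer",
        provider = "openai",
        model = "gpt-4o",
        // Token counts are filled in once the provider responds
        input_tokens = Empty,
        output_tokens = Empty,
    );
    let _guard = span.enter();

    // ... proxy the request to the provider here ...

    span.record("input_tokens", 152_u64);
    span.record("output_tokens", 87_u64);
}
```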
  • Consider allowing metadata such as org and user ID to be sent in a cookie or in HTTP headers, in addition to the body. Not sure how useful this is, though.
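
If that route is taken, the client side might look like this; the header names are invented for illustration.

```rust
// Same chat body as before, with the linking metadata moved into
// hypothetical HTTP headers instead of the JSON body.
fn call_with_header_metadata(
    body: &serde_json::Value,
) -> reqwest::Result<reqwest::blocking::Response> {
    reqwest::blocking::Client::new()
        .post("http://localhost:9782/chat")
        .header("X-Chronicle-Organization-Id", "org_123")
        .header("X-Chronicle-User-Id", "user_456")
        .header("X-Chronicle-Workflow-Name", "ticket-summarizer")
        .json(body)
        .send()
}
```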
  • For each entry, record:
    • Org ID
    • User ID
    • Run ID (ID linking related prompt calls together)
    • Workflow Name
    • Workflow Step
    • Arbitrary other metadata
    • endpoint called
    • provider and model used
    • input text
    • output text
    • input token count
    • output token count
    • which price sheet row was used
    • response time
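
Collected into a struct, one logged entry might look like this; the field names are invented to mirror the list above.

```rust
use std::time::Duration;

/// Sketch of a single logged request, one field per item in the list above.
struct RequestLogEntry {
    organization_id: String,
    user_id: String,
    /// Links related prompt calls together
    run_id: String,
    workflow_name: String,
    workflow_step: String,
    /// Arbitrary other metadata
    extra_metadata: serde_json::Value,
    /// Endpoint called
    endpoint: String,
    provider: String,
    model: String,
    input: String,
    output: String,
    input_tokens: u32,
    output_tokens: u32,
    /// Which price sheet row was in effect for this call
    price_sheet_entry_id: u64,
    response_time: Duration,
}
```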

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.