Chronicle

  • Provides a proxy that can be called in place of the LLM provider's normal URL. It passes the request on to the provider and returns the response. Chronicle can be embedded directly into a Rust application or run as a standalone server.
  • Task List

    • Up Next

      • Write documentation
      • Basic UI for visualizing runs, steps, etc.
    • Soon

      • Option to skip logging model messages
      • Python client
        • This should work as a normal client and also be able to wrap other common clients such as the OpenAI SDK. It needs to be compatible with tools like instructor, DSPy, etc.
      • Add fixture tests for Fireworks
      • Add fixture tests for Together
      • Add fixture tests for Ollama
      • Add fixture tests for Anyscale
      • Add fixture tests for DeepInfra
      • When waiting to retry, detect if the request has disconnected and cancel it
      • Add test for JSON response_format
    • Later/Maybe

      • Function to validate configuration and return a list of errors
      • API Management
        • Endpoints to manage aliases
        • Endpoints to manage API keys
      • Watch and reload config file
      • Request Management
        • Allow passing through user-agent header
        • Simple Request caching
          • Cache responses based on provider/model/messages
          • Support in-memory, disk, database, Redis
        • Fetch secrets from AWS secrets store
        • When looping around providers with retries, omit providers that had an unrecoverable error
        • Option for provider-level or model-level rate limiting
          • So that new requests coming in will automatically wait or fallback
        • Monitor error rates from providers and auto-switch
          • This can be built into the alias system perhaps
      • New Providers
        • New Provider: Cohere
          • The API is very different from the others, and different enough from a feature perspective that I’m not sure it’s worth translating between request formats.
        • New Provider: OpenRouter
        • New Provider: Google Gemini
        • OctoAI
        • Lepton
      • Specific Provider Upgrades
        • Claude: support "is_error" flag in tool results
        • Claude: support images in tool results
      • Other Modalities
        • Support binary upload APIs like Deepgram as well
      • Logging
        • Send logged data to arbitrary HTTP endpoint
          • This should be done in a way that it can be sent to something like Kafka, Elasticsearch, or Clickhouse using only the configuration, with no custom code
        • Send logged data to S3
          • As JSON files? As Parquet?
        • Figure out what to do about large data
          • Saving inputs and outputs is useful but can start taking up a lot of space. Ideally we can have something that places records in cloud storage; I just need to figure out the formats, and if/how to make things queryable before they land in storage.
      • Analysis
        • Visualize by arbitrary metadata
        • Ability to create database indexes on arbitrary metadata even in JSON fields
      • Price Tracking
        • Associate each provider and its calls with a pricing plan
        • Fetch and update prices for each provider
    • Done

      • Support prompt caching — Aug 25th, 2024
      • Support tools with Ollama
      • New Provider: AWS Bedrock — Jul 8th, 2024
      • Run and Step tracing — Jun 30th, 2024
      • Add Mistral support — Jun 19th, 2024
      • Streaming with Groq and Ollama — Jun 1st, 2024
      • Enhance test suite with real-world cases — May 31st, 2024
        • This uses streaming and non-streaming responses from various provider types, for both regular text and tool calls.
      • Streaming with Claude — May 31st, 2024
      • Streaming support — May 31st, 2024
      • "Simple" API can build for Postgres
        • Dropped the "full web app" version of the API. This will come back at some later time
      • Anthropic now supports "required" tool mode
      • Recover from Groq function calling failure
      • Endpoint for generic event logging — May 8th, 2024
        • Take the same metadata that we use for LLM calls and store it in a different table with just an event type and a data JSON blob.
      • Support tool use fields — May 1st, 2024
      • Simpler API server that supports SQLite — Apr 30th, 2024
        • This will just use the built-in proxy tables, but is better for simpler use since it writes to SQLite
        • Autoload config files from the XDG directories
      • JavaScript client — Apr 29th, 2024
        • This comes with a Chronicle-specific client, and can also redirect clients such as the OpenAI SDK using a custom fetch function.
      • Submit request metadata (org/user/workflow id) via HTTP headers — Apr 29th, 2024
      • API should default to allowing everything without authorization
        • Do this by not only setting up a default user, but also adding it as the anonymous fallback
      • Testing — Apr 26th, 2024
      • For API mode, add data tables as Filigree models instead of using the built-in tables
      • When multiple providers are in use, keep retrying even on normally un-retryable errors
      • Allow configuring fallback provider and model on retry. — Apr 24th, 2024
        • This is part of the model alias configuration. Basically, instead of a single provider and model, there's an array of provider/model/apikey tuples.
      • Support model/provider aliases — Apr 23rd, 2024
      • Support api keys — Apr 23rd, 2024
        • These can only be referenced by aliases
      • Save metadata into SQLite or Postgres — Apr 22nd, 2024
      • Load model and provider definitions from a configuration file — Apr 22nd, 2024
      • Store and load model and provider definitions from the database — Apr 22nd, 2024
      • Configurable user agent for HTTP client — Apr 21st, 2024
      • Link requests to internal users/orgs/projects — Apr 21st, 2024
      • Configurable timeout — Apr 20th, 2024
      • Common format chat messages and responses — Apr 19th, 2024
      • Automatic retry with rate-limit support — Apr 19th, 2024
      • Endpoint that proxies the call — Apr 19th, 2024
      • Send all relevant metadata as Otel traces — Apr 19th, 2024
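The alias fallback from the Done list (an alias holding an array of provider/model/apikey tuples, with retries that keep going even on normally un-retryable errors when multiple providers are in use) could be sketched roughly like this. This is not Chronicle's actual code: `AliasTarget`, `CallError`, and `call_with_fallback` are hypothetical names, and cycling through the targets in order is an assumption about how the retry loop picks the next one.

```rust
/// One entry in an alias: which provider and model to call, and which
/// named API key to use. (Hypothetical shape, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub struct AliasTarget {
    pub provider: String,
    pub model: String,
    pub api_key_name: Option<String>,
}

/// Error returned by a single attempt against one target.
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    /// Worth retrying, e.g. a rate limit or a timeout.
    Retryable(String),
    /// Normally not worth retrying against the same provider.
    Fatal(String),
}

/// Try the targets in order, wrapping around, until one call succeeds or
/// `max_attempts` is used up. On success, returns the index of the target
/// that answered along with its response.
pub fn call_with_fallback<F>(
    targets: &[AliasTarget],
    max_attempts: usize,
    mut call: F,
) -> Result<(usize, String), CallError>
where
    F: FnMut(&AliasTarget) -> Result<String, CallError>,
{
    if targets.is_empty() {
        return Err(CallError::Fatal("alias has no targets".to_string()));
    }
    let mut last_err = CallError::Fatal("no attempts were made".to_string());
    for attempt in 0..max_attempts {
        let index = attempt % targets.len();
        match call(&targets[index]) {
            Ok(response) => return Ok((index, response)),
            // With a single target, a fatal error ends the loop early.
            Err(err @ CallError::Fatal(_)) if targets.len() == 1 => return Err(err),
            // Otherwise remember the error and move on to the next target.
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

The closure stands in for the real provider call, so the same loop can be exercised in tests with canned failures. A real version would also wait between attempts when a provider reports a rate limit, which this sketch leaves out.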
  • Probably take some code from Promptbox, since it already has some of the needed functionality, and then change Promptbox to use this as a library
  • Maintain a price sheet with input/output token price per provider and model
    • Each price sheet entry has an active flag
    • When prices are updated for a model, add a new entry and mark it active
    • In the future, have a scraper or other mechanism for getting the latest price data for each model
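As a rough sketch of the price sheet idea above: every update appends a new row and marks it active, while older rows stay around so earlier log entries can still point at the price they were billed under. The type and method names here are hypothetical, the per-million-token unit is an assumption (the notes don't fix a unit), and clearing the active flag on the previous row is my reading of "mark it active" rather than something stated outright.

```rust
/// One row in the price sheet. Prices are per million tokens, which is an
/// assumption made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct PriceEntry {
    pub id: u64,
    pub provider: String,
    pub model: String,
    pub input_price_per_mtok: f64,
    pub output_price_per_mtok: f64,
    pub active: bool,
}

/// In-memory stand-in for the price sheet table.
#[derive(Debug, Default)]
pub struct PriceSheet {
    entries: Vec<PriceEntry>,
    next_id: u64,
}

impl PriceSheet {
    /// Record a new price for a provider/model pair and return the new row's id.
    /// Existing rows for the same pair are kept but lose their active flag.
    pub fn update_price(&mut self, provider: &str, model: &str, input: f64, output: f64) -> u64 {
        for entry in self.entries.iter_mut() {
            if entry.provider == provider && entry.model == model {
                entry.active = false;
            }
        }
        self.next_id += 1;
        let id = self.next_id;
        self.entries.push(PriceEntry {
            id,
            provider: provider.to_string(),
            model: model.to_string(),
            input_price_per_mtok: input,
            output_price_per_mtok: output,
            active: true,
        });
        id
    }

    /// The row that new requests for this provider/model should be priced with.
    pub fn active_entry(&self, provider: &str, model: &str) -> Option<&PriceEntry> {
        self.entries
            .iter()
            .find(|e| e.active && e.provider == provider && e.model == model)
    }

    /// Look up any row, active or not, by the id stored on a log entry.
    pub fn entry_by_id(&self, id: u64) -> Option<&PriceEntry> {
        self.entries.iter().find(|e| e.id == id)
    }
}
```

Keeping old rows instead of overwriting them means a historical cost report doesn't change when a provider changes its prices.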
  • Support multiple methods of output:
    • Record in a postgres table
    • Output OpenTelemetry
  • Consider allowing metadata such as org and user ID to be sent in a cookie or in HTTP headers in addition to the body. Not sure how useful this is though.
  • For each entry, record:
    • Org ID
    • User ID
    • Run ID (ID linking related prompt calls together)
    • Workflow Name
    • Workflow Step
    • Arbitrary other metadata
    • endpoint called
    • provider and model used
    • input text
    • output text
    • input token count
    • output token count
    • which price sheet row was used
    • response time
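The per-entry fields listed above map fairly directly onto a record type. Here is a minimal sketch, assuming string IDs, a flat string-to-string map for the arbitrary metadata, and prices expressed per million tokens. All names are hypothetical and the real tables may well differ.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// One logged proxy call, following the field list in the notes.
/// Field names and types are assumptions made for this sketch.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ProxyLogEntry {
    pub org_id: Option<String>,
    pub user_id: Option<String>,
    /// ID linking related prompt calls together.
    pub run_id: Option<String>,
    pub workflow_name: Option<String>,
    pub workflow_step: Option<String>,
    /// Arbitrary other metadata, kept as simple key/value pairs here.
    pub metadata: BTreeMap<String, String>,
    pub endpoint: String,
    pub provider: String,
    pub model: String,
    pub input_text: String,
    pub output_text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    /// Which price sheet row was used, if any.
    pub price_sheet_row: Option<u64>,
    pub response_time: Duration,
}

impl ProxyLogEntry {
    /// Input plus output tokens for this call.
    pub fn total_tokens(&self) -> u64 {
        self.input_tokens + self.output_tokens
    }

    /// Cost of this call given the per-million-token prices from the
    /// referenced price sheet row, in whatever currency the sheet uses.
    pub fn estimated_cost(&self, input_price_per_mtok: f64, output_price_per_mtok: f64) -> f64 {
        let input = self.input_tokens as f64 / 1_000_000.0 * input_price_per_mtok;
        let output = self.output_tokens as f64 / 1_000_000.0 * output_price_per_mtok;
        input + output
    }
}
```

Storing the price sheet row on the entry, rather than a computed cost, lets the cost be recalculated later if a price turns out to have been entered wrong.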

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.