Chronicle

  • Provides a proxy that can be called in place of a provider's normal URL; it forwards each request to the LLM provider and returns the response. Chronicle can be embedded directly into a Rust application or run as a standalone server.
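
As a concrete sketch, a call to a standalone Chronicle server might look like the snippet below. The port, the `/chat` path, and the metadata field names are assumptions for illustration; only the shape (a common-format chat body plus org/user/workflow metadata) comes from these notes.

```rust
// Hypothetical call to a locally running Chronicle proxy. The port, path,
// and metadata field names are invented for illustration.
// Requires reqwest (features = ["blocking", "json"]) and serde_json.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        // Common-format chat request, similar to the OpenAI schema
        "model": "gpt-4o",
        "messages": [
            { "role": "user", "content": "Summarize this ticket." }
        ],
        // Metadata linking the call to internal users/orgs/workflows
        "metadata": {
            "organization_id": "org_123",
            "user_id": "user_456",
            "run_id": "run_789",
            "workflow_name": "ticket-summarizer",
            "workflow_step": "summarize"
        }
    });

    let response: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:9782/chat")
        .json(&body)
        .send()?
        .error_for_status()?
        .json()?;

    println!("{response:#}");
    Ok(())
}
```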
  • Task List

    • Up Next

      • Support tool use fields
    • Soon

      • JavaScript client
        • This should be both a normal client and have the ability to wrap other common clients such as the OpenAI SDK
      • Python client
        • This should be both a normal client and have the ability to wrap other common clients such as the OpenAI SDK
      • Easy way to set up clients from JS and Python to call the proxy with appropriate headers
        • This is most useful in concert with other libraries that make LLM calls, like dspy and instructor
      • Extra providers
        • Fireworks
        • Together
    • Later/Maybe

      • Send logged data to arbitrary HTTP endpoint
        • This should be done in a way that it can be sent to something like Elasticsearch without custom code
      • Global rate limiting
      • When looping around providers with retries, omit providers that already returned an unrecoverable error.
      • Analysis
        • visualize by arbitrary metadata
        • Ability to create database indexes on arbitrary metadata even in JSON fields
      • Price Tracking
        • Associate each provider and its calls with a pricing plan
        • Fetch and update prices for each provider
      • Support binary upload APIs like Deepgram as well
      • Support streaming responses
      • Submit request metadata (org/user/workflow id) via HTTP headers or cookies?
    • Done

      • API should default to allowing everything without authorization
        • Do this by not only setting up a default user, but also adding it as the anonymous fallback
      • Testing — Apr 26th, 2024
      • For API mode, add data tables as Filigree models instead of using the built-in tables
      • When multiple providers are in use, keep retrying even on normally un-retryable errors
      • Allow configuring fallback provider and model on retry. — Apr 24th, 2024
        • This is part of the model alias configuration. Basically, instead of a single provider and model, an alias holds an array of provider/model/API key tuples (see the sketch after this task list)
      • Support model/provider aliases — Apr 23rd, 2024
      • Support api keys — Apr 23rd, 2024
        • These can only be referenced by aliases
      • Save metadata into SQLite or Postgres — Apr 22nd, 2024
      • Load model and provider definitions from a configuration file — Apr 22nd, 2024
      • Store and load model and provider definitions from the database — Apr 22nd, 2024
      • Configurable user agent for HTTP client — Apr 21st, 2024
      • Link requests to internal users/orgs/projects — Apr 21st, 2024
      • Configurable timeout — Apr 20th, 2024
      • Common format chat messages and responses — Apr 19th, 2024
      • Automatic retry with rate-limit support — Apr 19th, 2024
      • Endpoint that proxies the call — Apr 19th, 2024
      • Send all relevant metadata as Otel traces — Apr 19th, 2024
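
To make the alias and fallback items above concrete, here is a minimal sketch of what the alias configuration could deserialize into and how the retry loop could walk it. All type and field names are illustrative, and `try_call` is a hypothetical placeholder; skipping providers that already failed unrecoverably is the Later/Maybe refinement rather than current behavior.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct ModelAlias {
    name: String,
    /// Tried in order: instead of a single provider and model, an alias
    /// holds an array of provider/model/API key entries.
    targets: Vec<AliasTarget>,
}

#[derive(Deserialize)]
struct AliasTarget {
    provider: String,
    model: String,
    api_key: Option<String>,
}

enum CallError {
    /// e.g. a rate limit or timeout; worth trying again later
    Retryable,
    /// e.g. a request this provider will never accept; skip it from now on
    Unrecoverable,
}

fn call_with_fallback(alias: &ModelAlias, max_passes: usize) -> Result<String, ()> {
    // Providers that already failed unrecoverably (the Later/Maybe item).
    let mut dead = vec![false; alias.targets.len()];
    for _pass in 0..max_passes {
        for (i, target) in alias.targets.iter().enumerate() {
            if dead[i] {
                continue;
            }
            match try_call(target) {
                Ok(response) => return Ok(response),
                // With multiple providers in use, keep going even on errors
                // that would normally not be retried.
                Err(CallError::Unrecoverable) => dead[i] = true,
                Err(CallError::Retryable) => {}
            }
        }
    }
    Err(())
}

fn try_call(target: &AliasTarget) -> Result<String, CallError> {
    // Placeholder for the actual provider request.
    unimplemented!("call {}/{}", target.provider, target.model)
}
```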
  • Probably take some code from Promptbox, which already has some of the needed functionality, and change it to use this as a library
  • Maintain a price sheet with input/output token price per provider and model
    • Each price sheet entry has an active flag
    • When prices are updated for a model, add a new entry and mark it active
    • In the future have a scraper or other mechanism of getting latest price data for each model
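
A minimal sketch of the price sheet idea, with invented names: updating a price appends a new active row rather than mutating the old one, so past calls can keep referencing the row that was in effect when they ran.

```rust
// Sketch of the price sheet: one row per provider/model/price revision,
// plus an `active` flag. All names are illustrative.
struct PriceSheetEntry {
    id: u64,
    provider: String,
    model: String,
    input_price_per_1k_tokens: f64,
    output_price_per_1k_tokens: f64,
    active: bool,
}

/// When prices change, deactivate the current entry and append a new active
/// one instead of mutating in place, so historical requests still reference
/// the price sheet row that was in effect when they ran.
fn update_price(sheet: &mut Vec<PriceSheetEntry>, mut new_entry: PriceSheetEntry) {
    for entry in sheet.iter_mut() {
        if entry.provider == new_entry.provider && entry.model == new_entry.model {
            entry.active = false;
        }
    }
    new_entry.active = true;
    sheet.push(new_entry);
}
```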
  • Support multiple methods of output:
    • Record in a postgres table
    • Output OpenTelemetry
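
For the OpenTelemetry side, one plausible shape is a span per proxied request, here sketched with the `tracing` crate and an OTel-compatible subscriber such as `tracing-opentelemetry`; the field names are invented.

```rust
use tracing::field::Empty;

fn proxy_call() {
    let span = tracing::info_span!(
        "llm_request",
        organization_id = "org_123",
        workflow_name = "ticket-summarizer",
        provider = "openai",
        model = "gpt-4o",
        // Token counts are filled in once the provider responds
        input_tokens = Empty,
        output_tokens = Empty,
    );
    let _guard = span.enter();

    // ... proxy the request to the provider here ...

    span.record("input_tokens", 152_u64);
    span.record("output_tokens", 87_u64);
}
```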
  • Consider allowing metadata such as org and user ID to be sent in a cookie or in HTTP headers, in addition to the body. Not sure how useful this is, though.
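
If that route is taken, the client side might look like this; the header names are invented for illustration.

```rust
// Same chat body as before, with the linking metadata moved into
// hypothetical HTTP headers instead of the JSON body.
fn call_with_header_metadata(
    body: &serde_json::Value,
) -> reqwest::Result<reqwest::blocking::Response> {
    reqwest::blocking::Client::new()
        .post("http://localhost:9782/chat")
        .header("X-Chronicle-Organization-Id", "org_123")
        .header("X-Chronicle-User-Id", "user_456")
        .header("X-Chronicle-Workflow-Name", "ticket-summarizer")
        .json(body)
        .send()
}
```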
  • For each entry, record:
    • Org ID
    • User ID
    • Run ID (ID linking related prompt calls together)
    • Workflow Name
    • Workflow Step
    • Arbitrary other metadata
    • endpoint called
    • provider and model used
    • input text
    • output text
    • input token count
    • output token count
    • which price sheet row was used
    • response time
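
Collected into a struct, one logged entry might look like this; the field names are invented to mirror the list above.

```rust
use std::time::Duration;

/// Sketch of a single logged request, one field per item in the list above.
struct RequestLogEntry {
    organization_id: String,
    user_id: String,
    /// Links related prompt calls together
    run_id: String,
    workflow_name: String,
    workflow_step: String,
    /// Arbitrary other metadata
    extra_metadata: serde_json::Value,
    /// Endpoint called
    endpoint: String,
    provider: String,
    model: String,
    input: String,
    output: String,
    input_tokens: u32,
    output_tokens: u32,
    /// Which price sheet row was in effect for this call
    price_sheet_entry_id: u64,
    response_time: Duration,
}
```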

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.