Data Retrieval Application Manager

Written
  • This is a planned project, not started yet.
  • This will be a library for OLAP-style data analysis servers
    • Handle loading databases, database aliases, and so on
    • Expose HTTP endpoints
    • Eventually, add clustering and sharding features too
  • Features
    • Add database from S3
      • Each database is a directory in cloud storage
    • Add database with only some metadata downloaded and the rest on S3?
    • Delete database from disk
    • Get stats on loaded databases
    • Database aliases
      • Get aliases
      • Add alias
      • Delete alias
    • Query code can create endpoints to run queries against the database.
    • Automatic backup and restore of configuration
  • Features V2 - Clustering
    • Sync aliases across all members of the cluster
    • With one command, have all members of the cluster download a database.
    • Database sharding
      • Large data is distributed across shards
      • Some global data could also be replicated to all database servers
      • A server can query which shards are on which cluster members.
      • Analysis code can run against all the shards on a server (or specific shards) and optionally run a reduce step across those shards.
      • Requires a smart client that can request the analysis on all the appropriate server instances and shards. V3 will make this better.
    • Ensure that the cluster as a whole has at least N copies of a database.
      • Update this when cluster members join/leave.
    • Track which cluster members have which databases
    • Query for which cluster members contain a database.
    • When cluster members join, bring them up to date, possibly by reclustering.
  • Features V3 - Transparent clustering
    • Queries can be sent to a single member, which will handle sending the appropriate queries to the other servers and reducing the partial results together.
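
Since nothing is built yet, here is a rough sketch of how aliases, the "which members contain a database" query, and a sharded analysis with a reduce step might fit together. Every name in it (Server, Cluster, resolve, members_with, run) is hypothetical and made up for illustration. It runs in a single process with in-memory lists standing in for shards, and it assumes each shard lives on exactly one server, so replicas are not deduplicated.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


@dataclass
class Server:
    """Hypothetical cluster member: database name -> shard id -> rows."""
    name: str
    shards: Dict[str, Dict[int, List[int]]] = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical in-process stand-in for a cluster of servers."""
    servers: List[Server]
    aliases: Dict[str, str] = field(default_factory=dict)

    def resolve(self, name: str) -> str:
        # An alias maps to a database name; other names pass through unchanged.
        return self.aliases.get(name, name)

    def members_with(self, database: str) -> List[str]:
        # Which cluster members hold at least one shard of this database?
        db = self.resolve(database)
        return [s.name for s in self.servers if db in s.shards]

    def run(
        self,
        database: str,
        analyze: Callable[[List[int]], T],
        combine: Callable[[T, T], T],
        initial: T,
    ) -> T:
        # Run the analysis on every shard, then reduce the partial results.
        # Assumes no shard is replicated; replicas would be counted twice.
        db = self.resolve(database)
        partials = [
            analyze(rows)
            for server in self.servers
            for _, rows in sorted(server.shards.get(db, {}).items())
        ]
        return reduce(combine, partials, initial)


cluster = Cluster(
    servers=[
        Server("a", {"sales_2024": {0: [1, 2, 3], 1: [4, 5]}}),
        Server("b", {"sales_2024": {2: [6]}}),
        Server("c", {}),
    ],
    aliases={"sales": "sales_2024"},
)

# Sum every row across all shards, addressed through the alias.
total = cluster.run("sales", analyze=sum, combine=lambda x, y: x + y, initial=0)
holders = cluster.members_with("sales")
```

In the V2 design the loop over servers would live in the smart client and each analyze call would be an HTTP request to one member; in V3 the same loop would move into whichever member receives the query.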

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.