Data Retrieval Application Manager

Written
  • This is a planned project, not started yet.
  • This will be a library for OLAP-style data analysis servers
    • Handle loading databases, database aliases, and so on
    • Expose HTTP endpoints
    • Eventually, add clustering and sharding features too
  • Features
    • Add database from S3
      • Each database is a directory in cloud storage
    • Open question: add a database with only some metadata downloaded, leaving the rest on S3?
    • Delete database from disk
    • Get stats on loaded databases
    • Database aliases
      • Get aliases
      • Add alias
      • Delete alias
    • Query code can create endpoints to run queries against the database.
    • Automatic backup and restore of configuration
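
The alias and configuration features above could be sketched roughly as follows. Nothing here is settled design: the class name AliasRegistry, its method names, and the choice of a local JSON file for the saved configuration are all assumptions made for illustration.

```python
import json
from pathlib import Path


class AliasRegistry:
    """Hypothetical sketch: maps alias names to database names, saved as JSON."""

    def __init__(self, config_path):
        self.config_path = Path(config_path)
        self.aliases = {}
        if self.config_path.exists():
            # Restore the configuration written by a previous run.
            self.aliases = json.loads(self.config_path.read_text())

    def _save(self):
        # Write to a temporary file and rename it over the old one, so a
        # crash mid-write cannot leave a half-written configuration behind.
        tmp = self.config_path.with_name(self.config_path.name + ".tmp")
        tmp.write_text(json.dumps(self.aliases, indent=2, sort_keys=True))
        tmp.replace(self.config_path)

    def add_alias(self, alias, database):
        self.aliases[alias] = database
        self._save()

    def delete_alias(self, alias):
        # Deleting an unknown alias is treated as a no-op.
        self.aliases.pop(alias, None)
        self._save()

    def get_aliases(self):
        # Return a copy so callers cannot mutate the registry directly.
        return dict(self.aliases)

    def resolve(self, name):
        # A name that is not an alias is assumed to be a database name.
        return self.aliases.get(name, name)
```

Saving after every change keeps the on-disk copy current, which is one simple way to get "automatic" backup; restoring is then just constructing the registry from the same path.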
  • Features V2 - Clustering
    • Sync aliases across all members of the cluster
    • With one command, have all members of the cluster download a database.
    • Database sharding
      • Large data is distributed between shards
      • Some global data is replicated to all database servers
      • A server can query which shards are on which cluster members.
      • Analysis code can run against all the shards on a server (or specific shards) and optionally run a reduce step across those shards.
      • Requires a smart client that can request the analysis on all the appropriate server instances and shards. V3 will make this better.
    • Ensure that the cluster as a whole has at least N copies of a database.
      • Update this when cluster members join/leave.
    • Track which cluster members have which databases
    • Query for which cluster members contain a database.
    • When cluster members join, bring them up to date, possibly by reclustering.
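
One way the "at least N copies" and "which members have which databases" items might work is sketched below. It assumes the cluster state is available as a plain mapping from member name to the set of databases that member holds; the function names and the "least-loaded member first" placement rule are illustrative assumptions, not a decided design.

```python
def members_with(database, holdings):
    """Return the sorted names of cluster members that hold `database`."""
    return sorted(m for m, dbs in holdings.items() if database in dbs)


def plan_replication(holdings, min_copies):
    """Return {database: [members that should download it]}.

    `holdings` maps member name -> set of database names on that member.
    Databases that already have `min_copies` copies are left out of the plan.
    """
    all_dbs = set().union(*holdings.values())
    # Track how many databases each member holds, counting planned downloads.
    load = {m: len(dbs) for m, dbs in holdings.items()}
    plan = {}
    for db in sorted(all_dbs):
        have = members_with(db, holdings)
        missing = min_copies - len(have)
        if missing <= 0:
            continue
        # Prefer the members holding the fewest databases; break ties by name.
        candidates = sorted(
            (m for m in holdings if m not in have),
            key=lambda m: (load[m], m),
        )
        chosen = candidates[:missing]
        for m in chosen:
            load[m] += 1
        if chosen:
            plan[db] = chosen
    return plan
```

Rerunning plan_replication whenever a member joins or leaves would cover the "update this when cluster members join/leave" item: the plan is recomputed from the current holdings each time.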
  • Features V3 - Transparent clustering
    • Queries can be sent to a single member, which will forward the appropriate queries to the other servers and reduce their partial results together.
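
The per-shard analysis with a reduce step (V2) and the single-member coordinator (V3) can be illustrated with a small in-process sketch. It is only a sketch under stated assumptions: cluster members are modelled as a dict from member name to a list of shards, a thread stands in for what would be an HTTP request to each member, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_on_shards(shards, analyze, combine):
    """Run `analyze` on each shard of one server, then fold the results."""
    return reduce(combine, (analyze(shard) for shard in shards))


def scatter_gather(members, analyze, combine):
    """Run the analysis on every member and reduce across members.

    `members` maps member name -> list of shards held by that member.
    At least one member must hold a shard, otherwise there is nothing
    to reduce and functools.reduce raises TypeError.
    """
    # Members with no shards have no partial result to contribute.
    shard_lists = [shards for shards in members.values() if shards]
    with ThreadPoolExecutor() as pool:
        # In a real cluster, each of these calls would go over the network.
        partials = list(
            pool.map(lambda s: run_on_shards(s, analyze, combine), shard_lists)
        )
    return reduce(combine, partials)


# Example: each shard is a list of numbers and the query is a grand total.
members = {"a": [[1, 2], [3]], "b": [[4, 5, 6]], "c": []}
total = scatter_gather(members, analyze=sum, combine=lambda x, y: x + y)
```

In V2 the smart client would play the role of scatter_gather; in V3 the member that receives the query would. Using the same combine function within a server and across servers only works when the reduce step is associative, as a sum is.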