Data Retrieval Application Manager

Written 2023-04-24

This is a planned project, not started yet.
This will be a library for OLAP-style data analysis servers. It will:
- Handle loading databases, database aliases, and so on
- Expose HTTP endpoints
- Eventually, add clustering and sharding features too
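
A minimal sketch of how "expose HTTP endpoints" might look from the point of view of query code, assuming a decorator-based registry. EndpointRegistry, endpoint, and dispatch are hypothetical names, not a settled API, and the actual HTTP server layer is left out.

```python
import json


class EndpointRegistry:
    """Maps URL paths to query handlers (hypothetical design, not a final API)."""

    def __init__(self):
        self._handlers = {}

    def endpoint(self, path):
        """Decorator that registers a handler function under a path."""
        def register(func):
            self._handlers[path] = func
            return func
        return register

    def dispatch(self, path, params):
        """Return (status, JSON body) for a request to `path`."""
        handler = self._handlers.get(path)
        if handler is None:
            return 404, json.dumps({"error": "unknown endpoint"})
        # Request parameters are passed to the handler as keyword arguments.
        return 200, json.dumps(handler(**params))


registry = EndpointRegistry()


@registry.endpoint("/describe")
def describe(database):
    # Placeholder query: a real handler would read from the loaded database.
    return {"database": database}


status, body = registry.dispatch("/describe", {"database": "sales"})
```

An HTTP layer would only need to parse the request into a path and parameters and call dispatch.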

Features
- Add database from S3
  - Each database is a directory in cloud storage
- Add database with only some metadata downloaded and the rest on S3?
- Delete database from disk
- Get stats on loaded databases
- Database aliases
  - Get aliases
  - Add alias
  - Delete alias
- Query code can create endpoints to run queries against the database.
- Automatic backup and restore of configuration
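
The alias operations above (get, add, delete) could be sketched as a small in-memory registry. This assumes an alias is just a name that maps to one database name; AliasRegistry and its methods are hypothetical, and persistence (the backup and restore item) is not shown.

```python
class AliasRegistry:
    """In-memory alias table: alias name -> database name (hypothetical design)."""

    def __init__(self):
        self._aliases = {}

    def add(self, alias, database):
        # Adding an existing alias repoints it to the new database.
        self._aliases[alias] = database

    def delete(self, alias):
        # Raises KeyError if the alias does not exist.
        del self._aliases[alias]

    def get_all(self):
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._aliases)

    def resolve(self, name):
        # A name that is not a known alias is treated as a database name.
        return self._aliases.get(name, name)


aliases = AliasRegistry()
aliases.add("latest", "sales-2023-04")
database = aliases.resolve("latest")
```

Repointing "latest" at a newer database would then switch queries over without clients changing the name they use.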

Features V2 - Clustering
- Sync aliases across all members of the cluster
- With one command, have all members of the cluster download a database.
- Database sharding
  - Large data would be distributed between shards.
  - Some global data could also be replicated to all database servers.
  - A server can query which shards are on which cluster members.
  - Analysis code can run against all the shards on a server (or specific shards) and optionally run a reduce step across those shards.
  - Requires a smart client that can request the analysis on all the appropriate server instances and shards. V3 will make this better.
- Ensure that the cluster as a whole has at least N copies of a database.
  - Update this when cluster members join/leave.
- Track which cluster members have which databases
- Query for which cluster members contain a database.
- When cluster members join, bring them up to date, possibly by reclustering.
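
The "at least N copies" rule could be sketched as a planning function that is re-run whenever members join or leave. This is only one possible policy (least-loaded member first), and plan_downloads, holders, members, and min_copies are hypothetical names.

```python
def plan_downloads(holders, members, min_copies):
    """Return {database: [members that should download it]}.

    holders maps a database name to the set of members that have it.
    members is the set of currently live cluster members.
    """
    # Count how many databases each live member already holds.
    load = {m: 0 for m in members}
    for have in holders.values():
        for m in have & members:
            load[m] += 1

    plan = {}
    for db in sorted(holders):
        live = holders[db] & members  # ignore copies on departed members
        missing = min_copies - len(live)
        if missing <= 0:
            continue
        # Prefer the least-loaded members; break ties by name for determinism.
        candidates = sorted(members - live, key=lambda m: (load[m], m))
        chosen = candidates[:missing]
        for m in chosen:
            load[m] += 1
        plan[db] = chosen
    return plan


holders = {"sales": {"a"}, "logs": {"a", "b"}}
plan = plan_downloads(holders, {"a", "b", "c"}, min_copies=2)
```

If fewer members are available than copies are missing, the plan simply lists every candidate, so the cluster stays under-replicated until more members join.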

Features V3 - Transparent clustering
- Queries can be sent to a single member, which will handle sending the appropriate queries to the other servers and reducing the results together.
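
A rough sketch of that scatter and reduce flow, under the assumption that the coordinating member knows which shards live on which member. scatter_gather, run_on_member, and combine are hypothetical names; run_on_member stands in for whatever request is actually sent to another server, and combine is assumed to be associative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def scatter_gather(shard_locations, run_on_member, combine, initial):
    """Run a query on every member that holds shards, then reduce the results.

    shard_locations maps a member name to the list of shard ids it holds.
    run_on_member(member, shards) returns that member's partial result.
    combine(x, y) merges two partial results; initial is the empty result.
    """
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(run_on_member, member, shards)
            for member, shards in shard_locations.items()
            if shards  # skip members that hold no relevant shards
        ]
        partials = [f.result() for f in futures]
    return reduce(combine, partials, initial)


# Stand-in data: each shard is a list of numbers and the query is a sum.
shard_data = {0: [1, 2], 1: [3], 2: [4, 5]}
locations = {"a": [0, 1], "b": [2]}


def local_sum(member, shards):
    # A real implementation would send a request to `member` here.
    return sum(sum(shard_data[s]) for s in shards)


total = scatter_gather(locations, local_sum, lambda x, y: x + y, 0)
```

The same function would serve the V2 smart client; V3 only moves the call from the client onto whichever member receives the query.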