Data Retrieval Application Manager
- This is a planned project; work has not started yet.
- This will be a library for OLAP-style data analysis servers
- Handles loading databases, managing database aliases, and related bookkeeping
- Expose HTTP endpoints
- Eventually, add clustering and sharding features too
- Features
- Add database from S3
- Each database is a directory in cloud storage
- Add database with only some metadata downloaded and the rest on S3?
- Delete database from disk
- Get stats on loaded databases
- Database aliases
- Get aliases
- Add alias
- Delete alias
- Query code can create endpoints to run queries against the database.
- Automatic backup and restore of configuration
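A minimal sketch of how the database and alias features listed above could fit together. Nothing here is existing project code: the class name `DatabaseRegistry`, its methods, and the choice to drop aliases when their database is deleted are all assumptions made for illustration. Downloading from S3, stats, and persistence are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatabaseRegistry:
    """Hypothetical in-memory registry of loaded databases and their aliases."""

    # database name -> local directory (assumes one directory per database)
    databases: Dict[str, str] = field(default_factory=dict)
    # alias -> database name
    aliases: Dict[str, str] = field(default_factory=dict)

    def add_database(self, name: str, path: str) -> None:
        self.databases[name] = path

    def delete_database(self, name: str) -> None:
        # Forget the database and drop every alias that pointed at it
        # (an assumed policy; the notes do not say what should happen).
        del self.databases[name]
        self.aliases = {a: db for a, db in self.aliases.items() if db != name}

    def add_alias(self, alias: str, name: str) -> None:
        if name not in self.databases:
            raise KeyError(f"unknown database: {name}")
        self.aliases[alias] = name

    def delete_alias(self, alias: str) -> None:
        del self.aliases[alias]

    def get_aliases(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self.aliases)

    def resolve(self, name_or_alias: str) -> str:
        """Return the local directory for a database name or an alias."""
        name = self.aliases.get(name_or_alias, name_or_alias)
        return self.databases[name]
```

With this shape, query endpoints would only ever call `resolve`, so an alias can be repointed at a newer database without touching query code.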
- Features V2 - Clustering
- Sync aliases across all members of the cluster
- With one command, have all members of the cluster download a database.
- Database sharding
- Large data is distributed between shards
- Some global data can also be replicated to all database servers
- A server can query which shards are on which cluster members.
- Analysis code can run against all the shards on a server (or specific shards) and optionally run a reduce step across those shards.
- Requires a smart client that can request the analysis on all the appropriate server instances and shards. V3 (transparent clustering) is meant to improve on this.
- Ensure that the cluster as a whole has at least N copies of a database.
- Re-check the copy count when cluster members join or leave.
- Track which cluster members have which databases
- Query for which cluster members contain a database.
- When cluster members join, bring them up to date, possibly by reclustering.
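The tracking and "at least N copies" items above can be sketched as two small functions over a placement map. This is an illustration under assumptions, not a design decision: the `placement` structure (member name to set of database names) and both function names are hypothetical, and a database held by no member at all would have to be detected from a separate catalog.

```python
from typing import Dict, List, Set


def members_with(placement: Dict[str, Set[str]], database: str) -> List[str]:
    """Return the cluster members that hold `database`, sorted by name."""
    return sorted(m for m, dbs in placement.items() if database in dbs)


def under_replicated(placement: Dict[str, Set[str]], min_copies: int) -> Dict[str, int]:
    """Map each database with fewer than `min_copies` copies to how many copies are missing."""
    counts: Dict[str, int] = {}
    for dbs in placement.values():
        for db in dbs:
            counts[db] = counts.get(db, 0) + 1
    return {db: min_copies - n for db, n in counts.items() if n < min_copies}


# Hypothetical three-member cluster.
placement = {
    "node-a": {"events", "users"},
    "node-b": {"events"},
    "node-c": set(),
}
deficit_before = under_replicated(placement, min_copies=2)

# When a member leaves, recompute the deficit to decide what to re-download.
del placement["node-b"]
deficit_after = under_replicated(placement, min_copies=2)
```

Recomputing from the full placement map on every join or leave is the simplest option; incremental bookkeeping would only matter for very large clusters.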
- Features V3 - Transparent clustering
- Queries can be sent to a single member, which will handle sending the appropriate queries to the other servers and reducing the partial results together.
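A rough sketch of the scatter-gather flow described for V2 and V3: analysis code runs per shard, an optional reduce step folds the per-shard results, and a coordinating member folds the per-server results once more. Everything is simulated in one process. The names `run_on_shards` and `coordinate`, and the representation of a shard as a plain list, are assumptions for illustration; a real implementation would send requests to the other members over HTTP.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_on_shards(
    shards: Dict[str, List[Any]],
    analyze: Callable[[List[Any]], Any],
    combine: Optional[Callable[[Any, Any], Any]] = None,
    only: Optional[Iterable[str]] = None,
) -> Any:
    """Run `analyze` on every shard of one server (or just the shards in `only`).

    Without `combine`, return a dict of per-shard results.
    With `combine`, fold the per-shard results into a single value.
    """
    selected = list(shards) if only is None else list(only)
    partials = [analyze(shards[shard_id]) for shard_id in selected]
    if combine is None:
        return dict(zip(selected, partials))
    if not partials:
        raise ValueError("no shards selected, nothing to reduce")
    return reduce(combine, partials)


def coordinate(
    servers: Dict[str, Dict[str, List[Any]]],
    analyze: Callable[[List[Any]], Any],
    combine: Callable[[Any, Any], Any],
) -> Any:
    """Stand-in for the V3 coordinating member: fan out to every server, then reduce again."""
    per_server = [run_on_shards(shards, analyze, combine) for shards in servers.values()]
    return reduce(combine, per_server)


# Hypothetical cluster: two servers holding three shards of integers.
servers = {
    "node-a": {"s0": [1, 2, 3], "s1": [4]},
    "node-b": {"s2": [5, 6]},
}
total = coordinate(servers, analyze=sum, combine=lambda a, b: a + b)
```

Reusing the same `combine` at both levels only works when it is associative, as sums, counts, minima, and maxima are. An average, for example, would need to carry (sum, count) pairs through the reduce step.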