flux-capacitor

Without using any costly database, this solution complements Amazon Kinesis with the following capabilities:

Long-term archival of records.
Making both current and archived records accessible to SQL-based exploration and analysis.
Replay of archived records:
- which supports key-value compaction so only the last record for a key is replayed.
- which supports bounded replay (one needn’t replay the full archive).
- which supports filtered replay (only replay records matching some criteria).
- which supports annotating records as they are replayed in order to alter consumer behavior, such as to force overwrite.
- which, with consumer cooperation, provides some definition of eventual consistency with respect to records that arrive on a stream concurrently with a replay operation, without requiring this solution to mediate the flow of the stream.
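
As an illustration of key-value compaction only (replay is not implemented yet; see Project Status), the sketch below assumes a hypothetical archive format of one record per line, with the key and the payload separated by a tab. The function name compact_last_per_key and the record layout are assumptions made for this example, not part of the project.

```shell
# Keep only the last record seen for each key, emitted in the order in which
# those last records appeared. Reads tab-separated "key<TAB>payload" lines on stdin.
# ASSUMPTION: this line format is hypothetical; the real archive format is undecided.
compact_last_per_key() {
  awk -F '\t' '
    { last[$1] = $0; pos[$1] = NR }          # remember the latest line and its position per key
    END {
      for (k in last) order[pos[k]] = k      # index the surviving keys by position
      for (i = 1; i <= NR; i++)
        if (i in order) print last[order[i]]
    }'
}

# Usage: the first sku-1 record is superseded by the later one.
printf 'sku-1\t{"qty":5}\nsku-2\t{"qty":1}\nsku-1\t{"qty":7}\n' | compact_last_per_key
# sku-2	{"qty":1}
# sku-1	{"qty":7}
```

Holding every surviving record in memory is acceptable for a sketch; a real replay over a large archive would likely need a different strategy.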

Project Status

In active development for use at CommerceHub.
Capable of using SQL to search a stream archive and a live stream.
Stream archival capability to come next.
Message replay capability to follow.

Assumptions and Applicability Constraints

This is mostly an integration project, light on actual software. The AWS CLI will be used, and is assumed to be installed and configured.
This will probably be more of an ephemeral tool than a service, but the archival portion will have to run at least once every 24 hours (the default Kinesis record retention period) so that no records expire before they are archived.
The initial implementation might only support JSON records, but further contributions should be able to remove that as a requirement.
The initial implementation might only support a single Kinesis stream, but further contributions should be able to remove that as a requirement.
Data and cluster security are currently left to the user.
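
The 24-hour constraint above could be expressed as a guard that a scheduler (cron, for example) evaluates before each run. This is a minimal sketch, assuming timestamps in epoch seconds; the function name archive_is_due and the 23-hour default margin are hypothetical, and the archival process itself is still to be designed.

```shell
# Succeed (exit status 0) when at least max_age seconds have passed since the
# last archival run. ASSUMPTION: the 23-hour default is an illustrative safety
# margin below the 24-hour expiration, not a value taken from this project.
archive_is_due() {
  last_run=$1
  now=$2
  max_age=${3:-82800}   # 23 hours, in seconds
  [ $(( now - last_run )) -ge "$max_age" ]
}

# Example call, with "now" taken from the system clock:
#   archive_is_due "$last_run_epoch" "$(date +%s)" && echo "archive now"
```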

Technical Goals

Configure and launch a process (TBD, there are many options) to archive blocks of Amazon Kinesis records to Amazon S3 before they expire, possibly via Amazon EMRFS.
Launch an Amazon EMR cluster including the Hive application.
Deploy Apache Drill to the cluster.
Configure Apache Drill to read archived records from Amazon S3, possibly via EMRFS.
Configure Amazon EMR Hive to expose an Amazon Kinesis stream as an externally-stored table.
Configure the Amazon EMR Hive Metastore for consumption by Apache Drill.
Configure Apache Drill to read from Amazon Kinesis via Amazon EMR Hive.
To the greatest extent possible without storing another copy of the data, provide a unified and de-duplicated view spanning current and archived Amazon Kinesis records.
(TBD) Provide a basic UI or API to initiate search and replay operations, and monitor progress.

Prerequisites

Bash shell installed at /bin/bash
AWS CLI installed and configured with your credentials and default region (you can run aws configure to do so interactively)
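
A quick way to confirm both prerequisites before starting is sketched below. The helper name missing_prereqs is an assumption introduced for this example; it is not one of the project's scripts.

```shell
# Print each prerequisite that is not available; print nothing when all are present.
# Arguments containing a slash are treated as paths to executables,
# all others as command names looked up on PATH.
missing_prereqs() {
  for item in "$@"; do
    case "$item" in
      */*) [ -x "$item" ] || printf '%s\n' "$item" ;;
      *)   command -v "$item" >/dev/null 2>&1 || printf '%s\n' "$item" ;;
    esac
  done
}

# Example call for this project's prerequisites:
#   missing_prereqs /bin/bash aws
```

If nothing is printed, running aws configure list shows which credentials and default region the AWS CLI will actually use.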

Getting Started

Create a config file. Either:
- Make a copy of conf/defaults.conf and edit the copy, or
- Create a new file that will contain only overrides, and import the defaults by following the directions at the top of conf/defaults.conf.
Run ./upload-resources <config file>
Run ./launch-cluster <config file> and note the cluster-id that is printed to stdout; future commands will require it.
Run ./wait-until-ready <cluster-id>
Run ./forward-local-ports <cluster-id> <private-key-file>
- As with any new SSH host, you will have to accept an authenticity warning the first time you connect to a cluster.
- Once it's forwarding, this process will not exit, nor print any output.
Run ./terminate-clusters <cluster-id> when done to avoid recurring charges.
For additional advanced operations, explore the emr subcommand of the AWS CLI.
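
The steps above can be strung together roughly as follows. This is a sketch, not a script shipped with the project: the helper names run, start_cluster, and use_cluster and the DRY_RUN variable are assumptions introduced here, while the ./ commands are the ones listed above.

```shell
# Echo each command; execute it only when DRY_RUN is not set to 1.
run() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Steps that need only the config file. Note the cluster-id that
# launch-cluster prints to stdout; the later steps require it.
start_cluster() {
  run ./upload-resources "$1"
  run ./launch-cluster "$1"
}

# Steps that need the cluster-id ($1) and the private key file ($2).
use_cluster() {
  run ./wait-until-ready "$1"
  run ./forward-local-ports "$1" "$2"   # does not return while forwarding
}
```

With DRY_RUN=1 set, start_cluster my.conf only prints the two commands it would run, which is a cheap way to check arguments. When finished, interrupt the port forwarding, run ./terminate-clusters <cluster-id>, and consider aws emr list-clusters --active to confirm that nothing is still running.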


aheiss1/flux-capacitor
