Home
Kraken is the collector for the Mashable Data Pipeline.
.-' `'.
/ \\
| ;
| | ___.--,
_.._ |0) ~ (0) | _.---'`__.-( (_.
__.--'`_.. '.__.\\ '--. \\_.-' ,.--'` `""`
( ,.--'` ',__ /./; ;, '.__.'` __
_`) ) .---.__.' / | |\\ \\__..--"" """--.,_
`---' .'.''-._.-'`_./ /\\ '. \\ _.-~~~````~~~-._`-.__.'
------------.' _.-'-| |-\\ \\ '.-----------------------
/ .' \\ \\ '. '-._)
/ / \\ \\ `=.__`~-.
/ / `) ) / / `"".`\\
, _.-'.' / / ( ( / /
`--~` .-'.' '.'. | (
( (` ) ) '-;
'-; (-'
Kraken has the following dependencies:
- Ruby 2.1+
- Redis 3.0+
- Kafka (Confluent Platform - not needed if running in standalone mode)
If Ruby is not installed on your system, you can install it with the commands listed below. To verify your Ruby installation, run ruby -v in your terminal.
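The version printed by ruby -v can also be checked against the 2.1+ requirement from inside Ruby. The following is a minimal sketch; the constant MINIMUM_RUBY and the helper ruby_new_enough? are illustrative names, not part of Kraken.

```ruby
# Minimum Ruby version listed in Kraken's dependencies.
MINIMUM_RUBY = Gem::Version.new("2.1")

# Returns true when the given version string satisfies the minimum.
# Defaults to the version of the interpreter running this script.
def ruby_new_enough?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

if ruby_new_enough?
  puts "Ruby #{RUBY_VERSION} meets the 2.1+ requirement"
else
  puts "Ruby #{RUBY_VERSION} is too old; install 2.1 or newer"
end
```

Gem::Version compares version segments numerically, which avoids the pitfalls of comparing version strings character by character.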
Debian GNU/Linux and Ubuntu use the apt package manager. You can use it like this:
sudo apt-get install ruby-full
CentOS, Fedora, and RHEL use the yum package manager. You can use it like this:
sudo yum install ruby
On OS X El Capitan, Yosemite, Mavericks, and macOS Sierra, Ruby 2.0 is included. OS X Mountain Lion, Lion, and Snow Leopard ship with Ruby 1.8.7. Both are older than the Ruby 2.1+ that Kraken requires, so you will need a newer version. You can install one with Homebrew as described here, or skip this step and move to Install RVM to install a version of Ruby with the Ruby Version Manager.
Many people on OS X use Homebrew as a package manager. To install Homebrew, run the following command:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
It is really easy to get a newer version of Ruby using Homebrew:
brew install ruby
RVM stands for Ruby Version Manager. It allows you to install multiple versions of Ruby on your machine. Run through the installation steps found on the RVM website.
You can then install any version of Ruby, for example:
rvm install 2.1.1
To switch to Ruby 2.1.1, run rvm use 2.1.1, replacing 2.1.1 with whichever version you installed.
If you have a Mac, you can skip this step.
RubyGems is Ruby's package manager; it lets us install libraries for our projects.
On Debian GNU/Linux and Ubuntu:
sudo apt-get install rubygems
On CentOS, Fedora, and RHEL:
sudo yum install rubygems
To test your installation, run gem in your terminal.
Bundler provides a consistent environment for Ruby projects by tracking and installing the exact gems and versions that are needed. It resolves dependencies of our libraries and current project.
Install Bundler using RubyGems:
gem install bundler
Rake is a Make-like program implemented in Ruby. Tasks and dependencies are specified in standard Ruby syntax. It allows us to automate some tasks, such as bootstrapping Kraken by simply running rake in the terminal.
gem install rake
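To illustrate how Rake expresses tasks and their dependencies in Ruby, here is a minimal sketch of a chained default task. The task names are hypothetical and this is not Kraken's actual Rakefile; each task only records the command it stands for, where a real Rakefile would run it with sh.

```ruby
require "rake"
extend Rake::DSL # makes `task` available outside a Rakefile

STEPS = [] # records the order in which the tasks run

# Hypothetical task names; each one stands in for a shell command.
task :install_bundler do
  STEPS << "gem install bundler"
end

# `bundle` depends on `install_bundler`, so Rake runs that first.
task bundle: :install_bundler do
  STEPS << "bundle"
end

task spec: :bundle do
  STEPS << "STANDALONE=1 bin/rspec"
end

# Running `rake` with no arguments invokes the default task.
task default: :spec

Rake::Task[:default].invoke
puts STEPS
```

Because each task names its prerequisite, invoking the default task is enough to run the whole chain in order.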
Redis is an in-memory data structure store, used as a database, cache, and message broker. Kraken uses Redis to schedule our scripts, similar to the cron jobs that execute on MashStat.
The best way to install Redis is to compile it from source:
wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
sudo make install
This will compile Redis and install it to your /usr/local/bin/ directory.
A more detailed installation document can be found on the Redis website.
Kraken is this repository. First, clone this repository with git.
git clone git@github.com:mashable/kraken.git
Open up a terminal session and navigate to the kraken directory.
cd kraken
Ensure you have installed Ruby, RVM, Bundler, RubyGems, Rake, and Redis before continuing.
To "bootstrap" Kraken (install its dependencies and run the test suite), run the rake command in your terminal. You'll see output similar to the following:
gem install bundler
Successfully installed bundler-1.15.4
Parsing documentation for bundler-1.15.4
Done installing documentation for bundler after 4 seconds
1 gem installed
bundle
....
Using avromatic 0.27.0
Using twitter 6.1.0
Bundle complete! 38 Gemfile dependencies, 111 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
STANDALONE=1 bin/rspec
/Users/banderson/.rvm/gems/ruby-2.4.1/gems/avro-1.8.2/lib/avro/schema.rb:350: warning: constant ::Fixnum is deprecated
/Users/banderson/.rvm/gems/ruby-2.4.1/gems/avro-1.8.2/lib/avro/schema.rb:350: warning: constant ::Fixnum is deprecated
....................
Finished in 1.28 seconds (files took 2.22 seconds to load)
20 examples, 0 failures
If all tests pass with zero failures, Kraken is installed with all of its standalone dependencies.
Kraken comes packaged with a command-line interface called Soles. Soles lets us check the status of Kraken jobs, run jobs outside of their scheduled time from a console, configure the Kafka cluster, and run Kraken in daemon mode.
Standalone mode is intended for development and testing. The Kafka cluster setup, by contrast, is an exact replica of what is running on AWS. The following commands are the ones generally used in development mode.
Note: you must start a Redis server before running any commands. To start Redis, run redis-server in your terminal.
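Before running any commands, it can help to confirm that something is listening where Redis is expected. The sketch below assumes Redis's default port 6379 on the local machine; redis_reachable? is an illustrative helper, not part of Kraken, and it only checks that the port accepts TCP connections, not that the listener is actually Redis.

```ruby
require "socket"

# Returns true if something accepts TCP connections on host:port.
# 6379 is Redis's default port; adjust it if your server uses another.
def redis_reachable?(host = "127.0.0.1", port = 6379)
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

if redis_reachable?
  puts "A server is listening on 127.0.0.1:6379"
else
  puts "Nothing is listening on 127.0.0.1:6379; start redis-server first"
end
```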
If you are familiar with Ruby, you will write unit tests for your Workers (import scripts). Running rake spec will execute your tests in standalone mode.
Running ./bin/kraken console will open an interactive Ruby command line with all of our worker definitions loaded. This allows us to run Ruby code and our workers outside of their scheduled times. To run a worker, open a terminal and run the following:
$ ./bin/kraken console
2.4.1 :001 > ::Bitly::UrlExpanderWorker.new.perform("http://bit.ly/KG1k4t")
=> Some job output
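The console call above works because a worker is an ordinary Ruby object: calling perform on a new instance runs the job immediately, in the current process, instead of waiting for its schedule. The sketch below uses a hypothetical Example::HostExtractorWorker to show the shape; it is not one of Kraken's workers, and a real worker would presumably also hook into Sidekiq, which is left out here so the example runs on its own.

```ruby
require "uri"

module Example
  # Hypothetical worker used only for illustration.
  class HostExtractorWorker
    # Takes a URL string and returns its host name.
    def perform(url)
      URI.parse(url).host
    end
  end
end

# Same calling pattern as in the console session: instantiate, then perform.
result = Example::HostExtractorWorker.new.perform("http://bit.ly/KG1k4t")
puts result
```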
Running ./bin/kraken app web spins up a web application on port 8080 for viewing the Sidekiq queue (jobs that are pending or currently executing). Once it is running, open the Sidekiq web interface on port 8080 in your browser.
TODO: Docker installation, running the container, configure Kraken.
Kraken is configured using YAML, a structured data format for configuring a program. Kraken's main configuration file is located in config/config.yml.
There is a section in config.yml called common. This is where you add any new tokens needed by an API. You can also include various IDs, e.g. an Instagram page ID required for the GraphAPI.
If we add the following to our config.yml file:
common:
  instagram:
    page_id: 12345
we can access the value throughout Kraken by running
Kraken.config.value("instagram.page_id")
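To make the dotted-key lookup concrete, here is a minimal sketch of how a key such as "instagram.page_id" can be resolved against nested YAML. It is not Kraken's implementation: config_value is a hypothetical helper, and the sketch assumes that keys are looked up relative to the common section.

```ruby
require "yaml"

# The same fragment as in config.yml above.
CONFIG_YAML = <<YAML
common:
  instagram:
    page_id: 12345
YAML

# Walks a dotted key such as "instagram.page_id" through nested hashes.
# Returns nil if any segment along the way is missing.
def config_value(settings, dotted_key)
  dotted_key.split(".").reduce(settings) do |node, key|
    node.is_a?(Hash) ? node[key] : nil
  end
end

# Assumption: lookups start inside the `common` section.
settings = YAML.safe_load(CONFIG_YAML).fetch("common")
puts config_value(settings, "instagram.page_id")
```

Returning nil for a missing segment keeps the lookup from raising when a token has not been configured yet; a stricter variant could raise instead.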