Chris Heald edited this page Oct 25, 2017 · 4 revisions

Kraken

Kraken is the collector for the Mashable Data Pipeline.

                     .-'   `'.
                    /         \\
                    |         ;
                    |         |           ___.--,
           _.._     |0) ~ (0) |    _.---'`__.-( (_.
    __.--'`_.. '.__.\\    '--. \\_.-' ,.--'`     `""`
   ( ,.--'`   ',__ /./;   ;, '.__.'`    __
   _`) )  .---.__.' / |   |\\   \\__..--""  """--.,_
  `---' .'.''-._.-'`_./  /\\ '.  \\ _.-~~~````~~~-._`-.__.'
------------.' _.-'-|  |-\\  \\  '.-----------------------
            / .'     \\  \\   '. '-._)
           / /        \\  \\    `=.__`~-.
          / /          `) )    / / `"".`\\
    , _.-'.'           / /    ( (     / /
     `--~`          .-'.'      '.'.  | (
                   ( (`          ) )  '-;
                    '-;         (-'

Installation

Kraken has the following dependencies:

  • Ruby 2.1+
  • Redis 3.0+
  • Kafka (Confluent Platform; not needed when running in standalone mode)

Installing Ruby

If Ruby is not already installed on your system, you can install it with one of the commands below. To verify your Ruby installation, run ruby -v in your terminal.

apt (Debian or Ubuntu)

Debian GNU/Linux and Ubuntu use the apt package manager. You can use it like this:

sudo apt-get install ruby-full

yum (CentOS, Fedora, or RHEL)

CentOS, Fedora, and RHEL use the yum package manager. You can use it like this:

sudo yum install ruby

Homebrew (OS X)

On OS X Mavericks, Yosemite, El Capitan, and macOS Sierra, the bundled system Ruby is 2.0. OS X Snow Leopard, Lion, and Mountain Lion ship with Ruby 1.8.7. Neither meets Kraken's Ruby 2.1+ requirement. Install a newer Ruby with Homebrew as described next, or skip this step and move to Install RVM to install Ruby with the Ruby Version Manager.

Many people on OS X use Homebrew as a package manager. To install Homebrew, run the following command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install a newer version of Ruby with Homebrew:

brew install ruby

Install RVM

RVM (Ruby Version Manager) lets you install multiple versions of Ruby on your machine. Follow the installation steps on the RVM website.

Once RVM is installed, you can install any Ruby version. For example:

rvm install 2.1.1

To switch to Ruby 2.1.1, run rvm use 2.1.1, replacing 2.1.1 with any version you installed.
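Once Ruby is installed, you can confirm that the active interpreter meets Kraken's 2.1+ minimum from Ruby itself. This is a minimal sketch. The helper ruby_version_ok? is hypothetical and not part of Kraken. It uses Gem::Version because a plain string comparison would rank "2.10" below "2.9".

```ruby
# Hypothetical helper (not part of Kraken): compares a Ruby version string
# against Kraken's documented minimum of Ruby 2.1.
MINIMUM_RUBY = Gem::Version.new("2.1")

def ruby_version_ok?(version = RUBY_VERSION)
  # Gem::Version compares numeric segments, so "2.10.0" > "2.9.0",
  # which a plain string comparison would get wrong.
  Gem::Version.new(version) >= MINIMUM_RUBY
end

status = ruby_version_ok? ? "OK" : "too old for Kraken (needs 2.1+)"
puts "Ruby #{RUBY_VERSION}: #{status}"
```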

Install RubyGems

If you have a Mac, you can skip this step.

RubyGems is Ruby's package manager; it installs libraries (gems) for our projects. Ruby 1.9 and later ship with RubyGems, so it may already be present.

With apt-get

sudo apt-get install rubygems

With yum

sudo yum install rubygems

To test your installation, run gem --version in your terminal.
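You can also check for installed gems from Ruby through the RubyGems API instead of shelling out to gem. This minimal sketch reports the RubyGems version and whether Bundler and Rake, two other gems Kraken needs, are installed. The helper gem_installed? is hypothetical, not a Kraken or RubyGems method. Gem::Specification.find_all_by_name is the real RubyGems call it wraps.

```ruby
# Hypothetical helper (not part of Kraken): reports whether any version of a
# gem is installed, using RubyGems' own API rather than running `gem`.
def gem_installed?(name)
  # find_all_by_name returns every installed version of the gem (possibly none).
  Gem::Specification.find_all_by_name(name).any?
end

puts "RubyGems #{Gem::VERSION}"
%w[bundler rake].each do |name|
  puts "#{name}: #{gem_installed?(name) ? 'installed' : 'missing'}"
end
```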

Install Bundler

Bundler provides a consistent environment for Ruby projects by tracking and installing the exact gems and versions that are needed. It resolves dependencies of our libraries and current project.

Install Bundler with RubyGems:

gem install bundler

Install Rake

Rake is a Make-like program implemented in Ruby; tasks and their dependencies are written in standard Ruby syntax. It lets us automate tasks such as bootstrapping Kraken, which runs when you simply type rake in the terminal.

gem install rake
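Rake's task-and-dependency model can be shown in a few lines. This is a hypothetical sketch, not Kraken's actual Rakefile, and its task names are invented. It follows the same order as Kraken's bootstrap output: gem install bundler, then bundle, then STANDALONE=1 bin/rspec. To keep the sketch safe to run, each step is recorded in an array rather than executed. A real Rakefile would run each command with Rake's sh helper.

```ruby
require "rake"

# Hypothetical bootstrap chain (not Kraken's real Rakefile). A real Rakefile
# would run each command with `sh`; here the commands are only recorded.
BOOTSTRAP_STEPS = []

task :install_bundler do
  BOOTSTRAP_STEPS << "gem install bundler"
end

# Each task names its prerequisites; Rake runs them first, once each.
task bundle: :install_bundler do
  BOOTSTRAP_STEPS << "bundle"
end

task spec: :bundle do
  BOOTSTRAP_STEPS << "STANDALONE=1 bin/rspec"
end

# Running plain `rake` invokes the :default task.
task default: :spec

Rake::Task[:default].invoke
puts BOOTSTRAP_STEPS
```

Because Rake tracks which tasks have already run, invoking :default a second time in the same process does nothing.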

Install Redis

Redis is an in-memory data structure store used as a database, cache, and message broker. Kraken uses Redis to schedule our scripts, similar to the cron jobs executed on MashStat.
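A common pattern for Redis-backed scheduling uses a sorted set in which each job's score is the time it should run. Sidekiq, for example, stores jobs scheduled for the future this way. A poller then periodically removes every entry whose score is at or before "now" and enqueues it. This page does not say whether Kraken's scheduler works exactly like this. The sketch below models the pattern in plain Ruby with no Redis connection. The ScheduleSet class and the job names are hypothetical.

```ruby
# Hypothetical in-memory model of a Redis sorted set used as a job schedule.
# A real implementation would use ZADD, ZRANGEBYSCORE and ZREM against Redis.
class ScheduleSet
  def initialize
    @entries = [] # [score (epoch seconds), job] pairs, kept sorted by score
  end

  # Like ZADD: insert a job scored by the time it should run.
  def add(run_at, job)
    @entries << [run_at.to_f, job]
    @entries.sort_by!(&:first)
  end

  # Like ZRANGEBYSCORE -inf <now> followed by ZREM: remove and return due jobs.
  def pop_due(now = Time.now.to_f)
    due, @entries = @entries.partition { |score, _job| score <= now }
    due.map(&:last)
  end

  def size
    @entries.size
  end
end

schedule = ScheduleSet.new
t0 = 1_000_000.0
schedule.add(t0 + 60, "instagram_import")
schedule.add(t0 + 5, "url_expander")
p schedule.pop_due(t0 + 10) # prints ["url_expander"]
```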

The recommended way to install Redis is to compile it from source:

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
sudo make install

This compiles Redis and installs its binaries (such as redis-server and redis-cli) to /usr/local/bin/.

A more detailed installation guide can be found on the Redis website.

Install Kraken

Kraken is this repository. First, clone this repository with git.

git clone git@github.com:mashable/kraken.git

Open up a terminal session and navigate to the kraken directory.

cd kraken

Ensure you have installed Ruby, RVM, Bundler, RubyGems, Rake, and Redis before continuing.

To bootstrap Kraken (install its dependencies and run the test suite), run rake in your terminal. You should see output similar to the following:
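A quick preflight check can confirm that the required command-line tools are on your PATH. This is a hypothetical sketch, not part of Kraken. It assumes a Unix-like system, and the list of commands is an assumption based on the dependencies above. RVM is left out because it is usually loaded as a shell function and is optional if you installed Ruby another way.

```ruby
# Hypothetical preflight check (not part of Kraken): looks for each required
# command-line tool on PATH before bootstrapping. Assumes a Unix-like system.
REQUIRED_COMMANDS = %w[ruby gem bundle rake redis-server].freeze

# Returns the full path of an executable named `cmd`, or nil if none is found.
def find_executable_on_path(cmd, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the subset of `commands` that could not be found on `path`.
def missing_commands(commands = REQUIRED_COMMANDS, path = ENV.fetch("PATH", ""))
  commands.reject { |cmd| find_executable_on_path(cmd, path) }
end

missing = missing_commands
if missing.empty?
  puts "All prerequisites found."
else
  puts "Missing: #{missing.join(', ')}"
end
```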

gem install bundler
Successfully installed bundler-1.15.4
Parsing documentation for bundler-1.15.4
Done installing documentation for bundler after 4 seconds
1 gem installed
bundle

....

Using avromatic 0.27.0
Using twitter 6.1.0
Bundle complete! 38 Gemfile dependencies, 111 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

STANDALONE=1 bin/rspec
/Users/banderson/.rvm/gems/ruby-2.4.1/gems/avro-1.8.2/lib/avro/schema.rb:350: warning: constant ::Fixnum is deprecated
/Users/banderson/.rvm/gems/ruby-2.4.1/gems/avro-1.8.2/lib/avro/schema.rb:350: warning: constant ::Fixnum is deprecated
....................

Finished in 1.28 seconds (files took 2.22 seconds to load)
20 examples, 0 failures

If the test suite reports 0 failures, Kraken is installed with all of its standalone dependencies.

Running Kraken

Kraken comes packaged with a command-line interface called Soles. Soles lets us check the status of Kraken jobs, run jobs outside their scheduled time from a console, configure the Kafka cluster, and run Kraken in daemon mode.

Standalone Mode

Standalone mode is intended for development and testing. The Kafka cluster setup, by contrast, is an exact replica of what runs on AWS. These are the commands you will generally use in development:

Note: you must start a Redis server before running any of these commands. To start Redis, run redis-server in your terminal.
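To check from Ruby that Redis is actually accepting connections, you can send a PING and expect the reply +PONG. This minimal sketch uses only the standard socket library and the Redis wire protocol (RESP), so no Redis gem is needed. The helper redis_running? is hypothetical, not part of Kraken. It assumes Redis is on the default host and port, 127.0.0.1:6379.

```ruby
require "socket"

# Hypothetical helper (not part of Kraken): sends a RESP-encoded PING and
# expects the simple-string reply "+PONG". Returns false on any connection error.
def redis_running?(host = "127.0.0.1", port = 6379, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("*1\r\n$4\r\nPING\r\n") # RESP array with one bulk string: PING
    next false unless IO.select([sock], nil, nil, timeout) # no reply in time
    sock.gets == "+PONG\r\n"
  end
rescue SystemCallError, SocketError, IOError
  false
end

puts redis_running? ? "Redis is up" : "Redis is not reachable; start it with redis-server"
```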

rake spec

You will write unit tests in Ruby for your workers (import scripts). Running rake spec executes those tests in standalone mode.
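Workers are easiest to unit-test when their external calls can be swapped out. This sketch shows a hypothetical worker (ExampleExpanderWorker is not a real Kraken class) that takes its URL lookup as an injectable callable. A test can then call perform with a stub and never touch the network. Real Kraken workers probably include Sidekiq::Worker and are tested with RSpec. Both are left out here so the sketch runs with only the standard library.

```ruby
require "net/http"
require "uri"

# Hypothetical worker (not a real Kraken class). A production worker would
# typically `include Sidekiq::Worker`; that is omitted so no gems are needed.
class ExampleExpanderWorker
  # `resolver` is any callable that maps a short URL to its expanded URL.
  def initialize(resolver: method(:follow_redirect))
    @resolver = resolver
  end

  def perform(short_url)
    { short: short_url, expanded: @resolver.call(short_url) }
  end

  private

  # Default resolver: issue a GET request and return the Location header if
  # the server redirects, otherwise the original URL.
  def follow_redirect(url)
    response = Net::HTTP.get_response(URI(url))
    response["location"] || url
  end
end

# In a unit test, pass a stub instead of hitting the network.
stub = ->(url) { url == "http://bit.ly/KG1k4t" ? "https://example.com/article" : url }
p ExampleExpanderWorker.new(resolver: stub).perform("http://bit.ly/KG1k4t")
```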

./bin/kraken console

Running ./bin/kraken console opens an interactive Ruby console with all of our worker definitions loaded. This lets us run Ruby code or our workers outside their scheduled times. To run a worker, open a terminal and run the following:

$ ./bin/kraken console
2.4.1 :001 > ::Bitly::UrlExpanderWorker.new.perform("http://bit.ly/KG1k4t")
=> Some job output

./bin/kraken app web

Starts a web application on port 8080 for viewing the Sidekiq queue (jobs that are pending or currently executing). After running ./bin/kraken app web, open the Sidekiq web interface in your browser (on a local machine, http://localhost:8080).

Kafka Cluster

TODO: Docker installation, running the container, configure Kraken.

How to Configure Kraken

Kraken is configured with YAML, a human-readable structured data format commonly used for configuration files. Kraken's main configuration file is config/config.yml.

Adding Configurations

config.yml contains a section called common. This is where you add any new tokens that an API requires. You can also add IDs there, e.g. the Instagram page ID required by the Graph API.

Accessing Configuration Values

For example, suppose we add the following to config.yml:

common:
  instagram:
    page_id: 12345

We can then read the value anywhere in Kraken with:

Kraken.config.value("instagram.page_id")
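As a rough illustration of how a dotted key such as "instagram.page_id" can map onto nested YAML, here is a minimal sketch using Ruby's standard YAML library. It is not Kraken's implementation of Kraken.config. The SketchConfig class is hypothetical. So is its merge rule: keys under common are treated as top-level values, and an optional environment section such as development overrides them.

```ruby
require "yaml"

# Hypothetical stand-in for Kraken.config (not Kraken's real implementation).
# Assumption: keys under `common` are visible at the top level, and an
# optional environment section (e.g. "development") overrides them.
class SketchConfig
  def initialize(yaml_text, env: nil)
    raw = YAML.safe_load(yaml_text) || {}
    common = raw["common"] || {}
    overrides = env ? (raw[env] || {}) : {}
    @data = deep_merge(common, overrides)
  end

  # "instagram.page_id" -> @data["instagram"]["page_id"]; nil if any part is missing.
  def value(dotted_key)
    dotted_key.split(".").reduce(@data) do |node, key|
      node.is_a?(Hash) ? node[key] : nil
    end
  end

  private

  # Recursively merge two hashes, with values from `override` winning.
  def deep_merge(base, override)
    base.merge(override) do |_key, old, new|
      old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
    end
  end
end

config = SketchConfig.new("common:\n  instagram:\n    page_id: 12345\n")
puts config.value("instagram.page_id") # prints 12345
```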