Update

Please see the following blog posts for the latest updates:

ETL Language Showdown - Sept. 2014
ETL Language Showdown Part 2 - Now with Python - May 2015

ETL Language Showdown

This repo implements the same MapReduce-style ETL (Extract-Transform-Load) task in multiple languages to compare language productivity, terseness and readability. The performance comparisons should not be taken seriously; if anything, they reflect my skill set in each language more than the languages' performance capabilities.

The Task

Count the tweets whose message mentions 'knicks' and bucket the counts by neighborhood of origin. Each line of the ~1GB dataset, sampled below, is tab-separated and holds a leading numeric field, the tweet's NYC neighborhood, its borough and the message. It can be downloaded here.

91	west-brighton	Brooklyn	Uhhh
121	turtle-bay-east-midtown	Manhattan	Say anything
175	morningside-heights	Manhattan	It feels half-cheating half-fulfilling to cite myself.
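To make the task concrete, here is a minimal Python sketch of the map and reduce steps. It assumes the four tab-separated columns shown in the sample and a case-insensitive match on 'knicks'; neither the column meanings nor the case rule are stated above. The names chunked, map_chunk, reduce_counts and count_knicks are hypothetical and are not taken from the repo's implementations.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Assumption: a case-insensitive substring match on "knicks".
KNICKS = re.compile("knicks", re.IGNORECASE)


def chunked(lines, size):
    """Group an iterable of lines into lists of at most `size` lines."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def map_chunk(lines):
    """Map step: count matching tweets per neighborhood in one chunk."""
    counts = Counter()
    for line in lines:
        # Assumed layout: numeric field, neighborhood, borough, message.
        fields = line.rstrip("\n").split("\t", 3)
        if len(fields) < 4:
            continue  # skip malformed lines
        _, neighborhood, _, message = fields
        if KNICKS.search(message):
            counts[neighborhood] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counters into one total."""
    total = Counter()
    for part in partials:
        total.update(part)  # adds counts rather than replacing them
    return total


def count_knicks(path, processes=4, chunk_size=10_000):
    """Fan chunks out to a process pool, then reduce the partial counts."""
    with open(path, encoding="utf-8") as f, Pool(processes) as pool:
        partials = pool.imap_unordered(map_chunk, chunked(f, chunk_size))
        return reduce_counts(partials)
```

Calling count_knicks("tweets.tsv") (a hypothetical path) returns a Counter keyed by neighborhood. On platforms that start worker processes with spawn (macOS and Windows), call it from under an `if __name__ == "__main__":` guard.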

Initial Assumption

These tasks are not run on Hadoop, but they do run concurrently. Performance numbers are moot since the CPU mostly sits idle waiting on disk IO.
UPDATE: Boy, was the IO-bound assumption wrong.
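One rough way to check an IO-bound assumption, sketched in Python with hypothetical helper names: time a raw binary read of the file against a read that also decodes, splits and searches every line (here with a lowercase substring check, an assumption). If the second time is much larger than the first, the job is spending its time on CPU work rather than waiting on disk. Run each more than once, since the OS page cache makes repeat reads faster.

```python
import time


def time_read_only(path, block_size=1 << 20):
    """Time reading the file in raw binary blocks, with no parsing."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while block := f.read(block_size):
            total += len(block)
    return time.perf_counter() - start, total


def time_read_and_match(path, needle="knicks"):
    """Time reading, decoding and splitting each line plus a substring check."""
    start = time.perf_counter()
    matches = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            # Assumed layout: numeric field, neighborhood, borough, message.
            fields = line.rstrip("\n").split("\t", 3)
            if len(fields) == 4 and needle in fields[3].lower():
                matches += 1
    return time.perf_counter() - start, matches
```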

The Languages

Ruby 2.2.2
Golang 1.4.2 - Imperative
Scala 2.11.4 - Both Imperative and Functional
Elixir 1.0.4 - Functional
Python 3

Scala

The Scala implementation uses Akka (supervisors and actors).

Results

Implementation	Time
Ruby w/ Celluloid (bound by the Global Interpreter Lock, single core)	43.7s
JRuby w/ Celluloid	15.8s
Ruby w/ grosser/parallel (not GNU Parallel)	10.9s
Python w/ Pool	11.7s
Elixir	21.8s
Scala	8.8s
Scala w/ substring match (regex skipped for performance analysis)	8.3s
Golang	32.8s
Golang w/ substring match (regex skipped for performance analysis)	7.8s
Node w/ Cluster	TODO
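The Scala and Golang rows labeled "substring" replace the regex match with a plain substring check. The two strategies can be sketched in Python as follows, assuming case-insensitive matching in both (the original does not say); the function names are hypothetical. The sketch only shows that both give the same answer and how one might time them; which is faster depends on the language and runtime, and the numbers in the table come from the Scala and Golang implementations, not from this code.

```python
import re
import timeit

# Assumption: matching is case-insensitive in both strategies.
PATTERN = re.compile("knicks", re.IGNORECASE)


def matches_regex(message):
    """Regex strategy: search the message with a precompiled pattern."""
    return PATTERN.search(message) is not None


def matches_substring(message):
    """Substring strategy: lowercase once, then a plain containment check."""
    return "knicks" in message.lower()


def compare(messages, number=100):
    """Return the total seconds each strategy spends over `number` passes."""
    timings = {}
    for name, fn in (("regex", matches_regex), ("substring", matches_substring)):
        timings[name] = timeit.timeit(
            lambda fn=fn: [fn(m) for m in messages], number=number
        )
    return timings
```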
