Restore titan-hbase support in hadoop2 branch #159

Open · wants to merge 67 commits into base: hadoop2

Changes from all commits (67 commits)
b4c2054
Update doc submodule pointer to latest wiki
dalaro Oct 17, 2013
bf00616
Merge branch 'master' of github.com:thinkaurelius/faunus
dalaro Oct 24, 2013
93e0988
added more useful properties for reduce phase handling.
okram Nov 21, 2013
dad1699
Corrected edgeSerializer invocation
mbroecheler Nov 22, 2013
62b9abe
Merge branch 'master' of https://github.com/thinkaurelius/faunus
mbroecheler Nov 22, 2013
9415c61
Merge branch 'master' of github.com:thinkaurelius/faunus
dalaro Nov 22, 2013
9868205
Update CHANGELOG with Cassandra partitioner change
dalaro Nov 22, 2013
b5f99f1
Update wiki submodule pointer to latest
dalaro Nov 22, 2013
14a1dd2
Set release date for 0.4.1 to today
dalaro Nov 24, 2013
268026a
Update doc commit pointer
dalaro Nov 24, 2013
dbe769c
Change titan.version 0.4.1-SNAPSHOT to 0.4.1
dalaro Nov 24, 2013
8924cb3
Update doc commit pointer
dalaro Nov 24, 2013
06f0bb7
BlueprintsGraphOutputMapReduce is now a MapReduce and a Map. This ens…
okram Nov 24, 2013
4b09cc9
added more properties for bulk loader and a counter for BlueprintsGra…
okram Nov 24, 2013
cf84573
updated CHANGELOG.textile.
okram Nov 24, 2013
4df2bc7
updated CHANGELOG.textile.
okram Nov 24, 2013
002c274
Merge branch 'master' of github.com:thinkaurelius/faunus
dalaro Nov 24, 2013
fa2e1d8
Update doc commit pointer
dalaro Nov 24, 2013
3b3fad2
[maven-release-plugin] prepare release 0.4.1
dalaro Nov 24, 2013
84d17c2
[maven-release-plugin] prepare for next development iteration
dalaro Nov 24, 2013
2bbc129
updated BlueprintsGraphOutputMapReduce javadoc
okram Nov 25, 2013
589e2d7
minor tweaks to bulk loader code.
okram Nov 25, 2013
4b7da22
added titan elasticsearch dependency.
okram Dec 11, 2013
c20c94a
updated CHANGELOG.textile
okram Dec 11, 2013
107e79a
set <titan.version> in pom.xml to be 0.4.2-SNAPSHOT for staging for D…
okram Jan 8, 2014
6db187a
Changing from Titan 0.4.2-SNAPSHOT to 0.4.2
dalaro Jan 8, 2014
b3e8e5a
Update doc submodule commit pointer
dalaro Jan 8, 2014
7b037ea
Update doc submodule commit pointer
dalaro Jan 8, 2014
a34d2f7
[maven-release-plugin] prepare for next development iteration
dalaro Jan 8, 2014
74a44b3
[maven-release-plugin] prepare release 0.4.2
dalaro Jan 8, 2014
d45c183
Set titan.version to 0.4.3-SNAPSHOT
dalaro Jan 9, 2014
cb61640
Update CHANGELOG.textile
dalaro Jan 9, 2014
ca414e9
Merge tag '0.4.2' into hadoop2
Mar 12, 2014
aeb7d74
Restore titan-hbase input/output format support
Mar 12, 2014
8159b3e
Re-enable many of the FaunusCompiler tests
Mar 12, 2014
9706b73
Convert hadoop-client version into a property
Nov 14, 2013
e33d0f6
Throw an informative error message when RDF input params are not prop…
joshsh Mar 20, 2014
d781b6a
added incremental bulk loading of edges to BlueprintsGraphOutputMapRe…
okram Apr 15, 2014
5b5a842
minor tweak to BlueprintsScript.groovy example.
okram Apr 15, 2014
997e442
Changelog tweaks
joshsh Apr 15, 2014
87fb236
minor tweak to BlueprintsScript.groovy example.
okram Apr 15, 2014
3cd2731
added @dkuppitz BlueprintsScript.
okram Apr 16, 2014
008c06a
CHANGELOG stuff.
okram Apr 16, 2014
b0dbddd
Fixed initialization of Kryo in Faunus based on recent changes in Titan.
mbroecheler Apr 17, 2014
57659b5
remove aduna repository.
okram Apr 17, 2014
afc07fb
Avoid conflicts with reserved tokens when importing RDF graphs using …
joshsh Apr 19, 2014
8225829
Set 0.4.3 release date to Apr 21, 2014
dalaro Apr 19, 2014
624cd1a
Switch from Titan 0.4.3-SNAPSHOT to just 0.4.3
dalaro Apr 19, 2014
4803b36
Update doc submodule commit pointer
dalaro Apr 19, 2014
60e4f5e
Modify pom to stop uploading zip/tars to Sonatype
dalaro Apr 19, 2014
7a5d3d2
Update maven-release-plugin and git-scm version
dalaro Apr 19, 2014
c014710
Different attempt to exclude tar/zip from sonatype
dalaro Apr 19, 2014
a3c36ec
[maven-release-plugin] prepare release 0.4.3
dalaro Apr 19, 2014
cd1230b
[maven-release-plugin] prepare for next development iteration
dalaro Apr 19, 2014
fd8c3b8
gh-pages-update.sh script tweaks
dalaro Apr 19, 2014
e953886
CHANGELOG update and README.
okram Apr 19, 2014
b891a41
Update Titan dep to 0.4.4-SNAPSHOT
dalaro Apr 21, 2014
6e32489
Merge branch 'master' of github.com:thinkaurelius/faunus
dalaro Apr 21, 2014
a72d259
Update doc submodule commit pointer
dalaro Apr 22, 2014
a133d80
Update doc submodule commit pointer
dalaro Apr 22, 2014
6c7650a
gh-pages-update.sh tweaks from 0.4.3 release
dalaro Apr 22, 2014
81f3d94
Changelog entry for 0.4.4
dalaro Apr 22, 2014
81e92be
Update doc submodule commit pointer
dalaro Apr 22, 2014
8dc20d7
Update titan version to 0.4.4
dalaro Apr 22, 2014
f7f7a11
[maven-release-plugin] prepare release 0.4.4
dalaro Apr 22, 2014
0d0f5d5
Merge commit 'f7f7a11dabc60a931c65da8b092ea6be6eac3f75' into hadoop2
May 12, 2014
4d00589
Update the version name
May 12, 2014
47 changes: 44 additions & 3 deletions CHANGELOG.textile
@@ -5,17 +5,58 @@ Faunus: Graph Analytics Engine

h2. Faunus 0.x.y

h3. Version 0.4.1 (NOT OFFICIALLY RELEASED YET)
h3. Version 0.4.4 (Apr 22, 2014)

```xml
<dependency>
<groupId>com.thinkaurelius.faunus</groupId>
<artifactId>faunus</artifactId>
<version>0.4.1-SNAPSHOT</version>
<version>0.4.4</version>
</dependency>
```

Incremented Titan version

h3. Version 0.4.3 (Apr 21, 2014)

```xml
<dependency>
<groupId>com.thinkaurelius.faunus</groupId>
<artifactId>faunus</artifactId>
<version>0.4.3</version>
</dependency>
```

* Added error handling for invalid RDF parameters
* Added support for incremental edge loading with @BlueprintsGraphOutputMapReduce@
* Bumped to support Titan 0.4.3

h3. Version 0.4.2 (Jan 9, 2014)

```xml
<dependency>
<groupId>com.thinkaurelius.faunus</groupId>
<artifactId>faunus</artifactId>
<version>0.4.2</version>
</dependency>
```

* Added Titan ElasticSearch dependency so it's available in the Hadoop job jar
* Bumped to support Titan 0.4.2

h3. Version 0.4.1 (Nov 24, 2013)

```xml
<dependency>
<groupId>com.thinkaurelius.faunus</groupId>
<artifactId>faunus</artifactId>
<version>0.4.1</version>
</dependency>
```

* Fixed a severe bug in @Configuration@ entry orderings and @MapSequence@
* Changed default Cassandra partitioner from Random to Murmur3
* Broke @BlueprintsGraphOutputMapReduce@ into a MapReduce and then a Map (speeds up edge loading)

h3. Version 0.4.0 (Oct 16, 2013)

@@ -40,7 +81,7 @@ h3. Version 0.4.0 (Oct 16, 2013)
* Added support for @has(key)@ and @hasNot(key)@
* Migrated from @Query.Compare@ to @Compare@ with Blueprints 2.4.0
* The variables @hdfs@ and @local@ are available to @gremlin.sh -e@
* Remove @SequenceFile@ migration model via Faunus (unsustainable)
* Removed @SequenceFile@ migration model via Faunus (unsustainable)

==<hr/>==

2 changes: 1 addition & 1 deletion README.textile
@@ -10,7 +10,7 @@ h2. Features
*** "Apache HBase":http://hbase.apache.org/
** "Rexster":http://rexster.tinkerpop.com fronted graph databases
** "GraphSON":https://github.com/tinkerpop/blueprints/wiki/GraphSON-Reader-and-Writer-Library text format stored in HDFS
** EdgeList multi-relational text format stored in HDFS
** EdgeList multi-relational text format stored in HDFS
*** "RDF":http://www.w3.org/RDF/ text formats stored in HDFS
** Hadoop binary "sequence files":http://wiki.apache.org/hadoop/SequenceFile stored in HDFS
** User defined import/export "scripts":https://github.com/thinkaurelius/faunus/wiki/Script-Format
3 changes: 3 additions & 0 deletions bin/faunus.properties
@@ -23,3 +23,6 @@ faunus.output.location.overwrite=true
# mapred.reduce.tasks=3
# mapred.job.reuse.jvm.num.tasks=-1
# mapred.task.timeout=5400000
# mapred.reduce.parallel.copies=50
# io.sort.factor=100
# io.sort.mb=200
1 change: 1 addition & 0 deletions bin/titan-cassandra-output.properties
@@ -17,6 +17,7 @@ faunus.graph.output.titan.infer-schema=true
# faunus.graph.output.blueprints.script-file=BlueprintsScript.groovy
# controls size of transaction
mapred.max.split.size=5242880
# mapred.reduce.tasks=10
mapred.job.reuse.jvm.num.tasks=-1

faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
1 change: 1 addition & 0 deletions bin/titan-hbase-output.properties
@@ -15,6 +15,7 @@ faunus.graph.output.titan.infer-schema=true
# faunus.graph.output.blueprints.script-file=BlueprintsScript.groovy
# controls size of transaction
mapred.max.split.size=5242880
# mapred.reduce.tasks=10
mapred.job.reuse.jvm.num.tasks=-1

faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
36 changes: 32 additions & 4 deletions data/BlueprintsScript.groovy
@@ -1,18 +1,22 @@
import com.thinkaurelius.faunus.FaunusEdge
import com.thinkaurelius.faunus.FaunusVertex
import com.tinkerpop.blueprints.Edge
import com.tinkerpop.blueprints.Graph
import com.tinkerpop.blueprints.Vertex
import com.tinkerpop.gremlin.java.GremlinPipeline
import org.apache.hadoop.mapreduce.Mapper

import static com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.Counters.*
import static com.thinkaurelius.faunus.formats.BlueprintsGraphOutputMapReduce.LOGGER

/**
* This script is used to determine vertex uniqueness within a pre-existing graph.
* If the vertex already exists in the graph, return it.
* Else, if the vertex does not already exist, create it and return it.
* Any arbitrary function can be implemented, but the one here implements an index lookup on a unique key.
* This script is used to determine vertex and edge uniqueness within a pre-existing graph.
* If the vertex/edge already exists in the graph, return it.
* Else, if the vertex/edge does not already exist, create it and return it.
* Any arbitrary function can be implemented. The two examples provided are typical scenarios.
*
* @author Marko A. Rodriguez (http://markorodriguez.com)
* @author Daniel Kuppitz (daniel at thinkaurelius.com)
*/
def Vertex getOrCreateVertex(final FaunusVertex faunusVertex, final Graph graph, final Mapper.Context context) {
final String uniqueKey = "name";
@@ -33,9 +37,33 @@ def Vertex getOrCreateVertex(final FaunusVertex faunusVertex, final Graph graph,
context.getCounter(VERTICES_WRITTEN).increment(1l);
}

// if vertex existed or not, add all the properties of the faunusVertex to the blueprintsVertex
for (final String property : faunusVertex.getPropertyKeys()) {
blueprintsVertex.setProperty(property, faunusVertex.getProperty(property));
context.getCounter(VERTEX_PROPERTIES_WRITTEN).increment(1l);
}
return blueprintsVertex;
}

def Edge getOrCreateEdge(final FaunusEdge faunusEdge, final Vertex blueprintsOutVertex, final Vertex blueprintsInVertex, final Graph graph, final Mapper.Context context) {
final String edgeLabel = faunusEdge.getLabel();
final GremlinPipeline blueprintsEdgePipe = blueprintsOutVertex.outE(edgeLabel).as("e").inV().retain([blueprintsInVertex]).range(0, 1).back("e")
final Edge blueprintsEdge;

if (blueprintsEdgePipe.hasNext()) {
blueprintsEdge = blueprintsEdgePipe.next();
if (blueprintsEdgePipe.hasNext()) {
LOGGER.error("There's more than one edge labeled '" + edgeLabel + "' between vertex #" + blueprintsOutVertex.getId() + " and vertex #" + blueprintsInVertex.getId());
}
} else {
blueprintsEdge = graph.addEdge(null, blueprintsOutVertex, blueprintsInVertex, edgeLabel);
context.getCounter(EDGES_WRITTEN).increment(1l);
}

// if edge existed or not, add all the properties of the faunusEdge to the blueprintsEdge
for (final String key : faunusEdge.getPropertyKeys()) {
blueprintsEdge.setProperty(key, faunusEdge.getProperty(key));
context.getCounter(EDGE_PROPERTIES_WRITTEN).increment(1l);
}
return blueprintsEdge;
}
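As context for the merge functions above: the bulk loader picks this script up through the faunus.graph.output.blueprints.script-file property that the bin/titan-*-output.properties files in this change set reference (commented out by default). A minimal sketch of wiring it in, using only property names and values that already appear in those files; any additional settings a particular job needs are not shown here:

```properties
# sketch: enable the custom vertex/edge merge logic defined in BlueprintsScript.groovy
faunus.graph.output.titan.infer-schema=true
faunus.graph.output.blueprints.script-file=BlueprintsScript.groovy
# controls size of transaction
mapred.max.split.size=5242880
```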
2 changes: 1 addition & 1 deletion doc
Submodule doc updated from 27469c to 42c374
113 changes: 78 additions & 35 deletions pom.xml
@@ -9,7 +9,7 @@
</parent>
<groupId>com.thinkaurelius.faunus</groupId>
<artifactId>faunus</artifactId>
<version>0.4.1-SNAPSHOT</version>
<version>0.4.4-hadoop2</version>
<packaging>jar</packaging>
<url>http://thinkaurelius.github.com/faunus/</url>
<name>Faunus: Graph Analytics Engine</name>
@@ -19,8 +19,9 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<sesame.version>2.6.10</sesame.version>
<tinkerpop.version>2.4.0</tinkerpop.version>
<titan.version>0.4.1-SNAPSHOT</titan.version>
<titan.version>0.4.4-hadoop2</titan.version>
<slf4j.version>1.7.5</slf4j.version>
<hadoop.version>2.2.0</hadoop.version>
</properties>
<developers>
<developer>
@@ -52,8 +53,8 @@
<connection>scm:git:git@github.com:thinkaurelius/faunus.git</connection>
<developerConnection>scm:git:git@github.com:thinkaurelius/faunus.git</developerConnection>
<url>git@github.com:thinkaurelius/faunus.git</url>
<tag>HEAD</tag>
</scm>
<tag>0.4.4</tag>
</scm>
<dependencyManagement>
<dependencies>
<dependency>
@@ -92,7 +93,7 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.2.0</version>
<version>${hadoop.version}</version>
</dependency>
<!-- TITAN -->
<dependency>
@@ -105,11 +106,16 @@
<artifactId>titan-cassandra</artifactId>
<version>${titan.version}</version>
</dependency>
<!--<dependency>
<dependency>
<groupId>com.thinkaurelius.titan</groupId>
<artifactId>titan-hbase</artifactId>
<version>${titan.version}</version>
</dependency>-->
</dependency>
<dependency>
<groupId>com.thinkaurelius.titan</groupId>
<artifactId>titan-es</artifactId>
<version>${titan.version}</version>
</dependency>
<!-- RDF PARSING -->
<dependency>
<groupId>org.openrdf.sesame</groupId>
@@ -162,13 +168,6 @@
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>aduna-repo</id>
<name>Aduna repository</name>
<url>http://repo.aduna-software.org/maven2/releases</url>
</repository>
</repositories>
<build>
<!-- Used during release:perform to upload to S3 -->
<extensions>
@@ -302,7 +301,14 @@
</plugin>
<plugin>
<artifactId>maven-release-plugin</artifactId>
<version>2.4.1</version>
<version>2.5</version>
<dependencies>
<dependency>
<groupId>org.apache.maven.scm</groupId>
<artifactId>maven-scm-provider-gitexe</artifactId>
<version>1.9</version>
</dependency>
</dependencies>
<configuration>
<goals>deploy</goals>
<pushChanges>false</pushChanges>
@@ -366,6 +372,33 @@
</executions>
</plugin>

<!--
The distribution descriptor includes files written to the build
directory by both the standalone and hadoop-job descriptors.
That's why the execution for the distribution descriptor comes last.
-->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>assemble-zip-and-tar</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<attach>false</attach>
<finalName>${project.artifactId}</finalName>
<tarLongFileMode>gnu</tarLongFileMode>
<descriptors>
<descriptor>src/assembly/distribution.xml</descriptor>
</descriptors>
</configuration>
</execution>
</executions>
</plugin>

<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
@@ -434,33 +467,43 @@
<workingDirectory>${project.basedir}</workingDirectory>
</configuration>
</execution>
</executions>
</plugin>

<!--
The distribution descriptor includes files written to the build
directory by both the standalone and hadoop-job descriptors.
That's why the execution for the distribution descriptor comes last.
-->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<!-- GPG plugin can't sign a file. It can only either:
* sign all artifacts attached to the project or
* sign and deploy a file
-->
<execution>
<id>build-attached-assemblies</id>
<id>sign-zip</id>
<phase>package</phase>
<goals>
<goal>single</goal>
<goal>exec</goal>
</goals>
<configuration>
<attach>true</attach>
<finalName>${project.artifactId}</finalName>
<tarLongFileMode>gnu</tarLongFileMode>
<descriptors>
<descriptor>src/assembly/distribution.xml</descriptor>
</descriptors>
<executable>gpg</executable>
<arguments>
<argument>--detach-sign</argument>
<argument>--armor</argument>
<argument>${project.artifactId}-${project.version}.zip</argument>
</arguments>
<workingDirectory>${project.build.directory}</workingDirectory>
</configuration>
</execution>
<execution>
<id>sign-tar</id>
<phase>package</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<executable>gpg</executable>
<arguments>
<argument>--detach-sign</argument>
<argument>--armor</argument>
<argument>${project.artifactId}-${project.version}.tar.bz2</argument>
</arguments>
<workingDirectory>${project.build.directory}</workingDirectory>
</configuration>
</execution>

</executions>
</plugin>

2 changes: 1 addition & 1 deletion src/main/java/com/thinkaurelius/faunus/FaunusElement.java
@@ -30,7 +30,7 @@ public abstract class FaunusElement implements Element, WritableComparable<Faunu
WritableComparator.define(FaunusElement.class, new Comparator());
}

protected static final KryoSerializer serialize = new KryoSerializer(true);
protected static final KryoSerializer serialize = new KryoSerializer();

protected static final Map<String, String> TYPE_MAP = new HashMap<String, String>() {
@Override