Wednesday, July 1, 2015

BigInsights 4 - Getting Giraph Running


Apache Giraph is an implementation of the Bulk Synchronous Parallel programming model designed to work on top of Apache Hadoop. I wanted to try it on my BigInsights 4.0.0.1 install, but it wasn't clear how to make it work because one of the steps in the Giraph Quick Start guide involves installing a particular version of Hadoop. This is what I did. (I'm assuming you already have an install of BigInsights 4; if not, you can try this in the Quick Start VM.)

Download Giraph

As BigInsights 4 supports YARN, I downloaded the Giraph 1.1.0 binary archive for Hadoop 2 from one of the mirrors.




I got giraph-dist-1.1.0-hadoop2-bin.tar.gz

Run An Example

Giraph comes with a set of example programs. The Quick Start guide describes how to run the SimpleShortestPaths example, but it assumes you are running from a local build of the source code, which produces an example jar that bundles all of the dependencies. As far as I can tell, this jar is not provided in the binary distributions. It probably wouldn't be much use if it were, because the bundled dependencies are at different versions from the jars used in BigInsights 4, which are based on the ODP configuration.

So I constructed a new run command to take account of the jars available with BigInsights 4:

export GH=/home/slaws/graph/giraph-1.1.0-hadoop2-for-hadoop-2.5.1
export GL=${GH}/lib
export HADOOP_CLASSPATH=${GH}/giraph-core-1.1.0-hadoop2.jar:${GH}/giraph-examples-1.1.0-hadoop2.jar:${GL}/*

export LIBJARS=${GH}/giraph-core-1.1.0-hadoop2.jar,${GH}/giraph-examples-1.1.0-hadoop2.jar,${GL}/netty-3.6.2.Final.jar,${GL}/netty-all-4.0.14.Final.jar,${GL}/typetools-0.2.1.jar,${GL}/metrics-core-2.2.0.jar,${GL}/metrics-core-3.0.0.jar,${GL}/json-20090211.jar,${GL}/fastutil-6.5.4.jar,${GL}/base64-2.3.8.jar





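# remove the output directory left by any previous run first
# (-rm -r is needed because -rmdir only deletes empty directories)
hadoop fs -rm -r /tmp/graph/giraph-qs/shortestpaths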

# submit a job via YARN to run the Giraph example
hadoop jar ${GH}/giraph-examples-1.1.0-hadoop2.jar org.apache.giraph.GiraphRunner \
  -libjars ${LIBJARS} \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /tmp/graph/giraph-qs/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /tmp/graph/giraph-qs/shortestpaths \
  -w 1 \
  -ca giraph.SplitMasterWorker=false
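
The job reads its input from /tmp/graph/giraph-qs/tiny_graph.txt in HDFS, so that file has to exist before the command is submitted. Here is a minimal sketch of how it could be put in place, assuming the five-vertex sample graph from the Giraph Quick Start guide is used. The local scratch path is my own choice for illustration, and the upload step is skipped when no hadoop client is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: create the Quick Start sample graph locally and, when a hadoop
# client is available, copy it to the HDFS path that the job reads from.
set -eu

INPUT_DIR=/tmp/graph/giraph-qs                  # HDFS directory used by -vip
LOCAL_GRAPH="${TMPDIR:-/tmp}/tiny_graph.txt"    # assumed local scratch location

# One vertex per line: [vertex_id, vertex_value, [[target_id, edge_weight], ...]]
cat > "${LOCAL_GRAPH}" <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF

if command -v hadoop >/dev/null 2>&1; then
    # Create the HDFS directory if needed and overwrite any older copy
    hadoop fs -mkdir -p "${INPUT_DIR}"
    hadoop fs -put -f "${LOCAL_GRAPH}" "${INPUT_DIR}/tiny_graph.txt"
else
    echo "hadoop client not found; skipping upload" >&2
fi
```

Once the job has finished, something like hadoop fs -cat /tmp/graph/giraph-qs/shortestpaths/part-* should print one line per vertex with its ID and computed distance. The exact part file names may differ on your install.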


The differences compared to the original command are as follows:

The export HADOOP_CLASSPATH line adds enough jars for the driver program to operate.

The export LIBJARS line creates a comma-separated list with just enough jars to run the example while not overlapping with the jar versions in BigInsights 4. This list is passed to the hadoop jar line using the -libjars parameter, which specifies the jars that are copied to the compute nodes for processing.

The -ca giraph.SplitMasterWorker=false parameter just allows me to run the example on BigInsights 4 installed on a single node.
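
The LIBJARS list is long and easy to mistype, and a jar that is missing from the distribution only shows up as a failure once the job is running. As a rough sketch of one way to guard against that, the helper below joins jar paths with commas and skips any that don't exist, printing a warning for each. The function name join_libjars is hypothetical and not part of Giraph or Hadoop; the paths are the ones from this post.

```shell
#!/usr/bin/env bash
# join_libjars PATH...: print the given jar paths joined by commas,
# skipping (with a warning on stderr) any path that is not a regular file.
# This is an illustrative helper, not something shipped with Giraph or Hadoop.
join_libjars() {
    local out="" jar
    for jar in "$@"; do
        if [ -f "${jar}" ]; then
            # Prefix a comma only when the list already has an entry
            out="${out:+${out},}${jar}"
        else
            echo "warning: ${jar} not found, skipping" >&2
        fi
    done
    printf '%s\n' "${out}"
}

# Example use with the directories from this post
GH=/home/slaws/graph/giraph-1.1.0-hadoop2-for-hadoop-2.5.1
GL=${GH}/lib
LIBJARS="$(join_libjars \
    "${GH}/giraph-core-1.1.0-hadoop2.jar" \
    "${GH}/giraph-examples-1.1.0-hadoop2.jar" \
    "${GL}/netty-3.6.2.Final.jar" \
    "${GL}/netty-all-4.0.14.Final.jar" \
    "${GL}/typetools-0.2.1.jar" \
    "${GL}/metrics-core-2.2.0.jar" \
    "${GL}/metrics-core-3.0.0.jar" \
    "${GL}/json-20090211.jar" \
    "${GL}/fastutil-6.5.4.jar" \
    "${GL}/base64-2.3.8.jar")"
export LIBJARS
```

Any warning printed here means the list differs from the one I used, so it is worth checking the contents of the lib directory before submitting the job.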

After getting the example working, I went on to write an example of my own. There are some points that weren't obvious to me initially, so I'll write about those in a separate post.




