3. Build the Cluster

1. Requirements

We need an SD Card with Lubuntu, Hadoop and Spark installed.

2. Build the Cluster

2.1 Clone the SD Card

Shut down your Orange Pi with sudo shutdown 0 and remove the SD Card. If you are using Linux you can be happy, since cloning an SD Card is super simple: read the card into an image with dd if=/dev/sdcard1 of=~/sdimage and write it to the new card with dd if=~/sdimage of=/dev/sdcard2. If you are a Windows user you can again use Win32 Disk Imager: instead of writing, we now read the data from the SD Card to some location, then simply write the saved image to the new SD Card as described in Section 1.
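A slightly more explicit sketch of the Linux route, assuming the card shows up as /dev/sdb (check with lsblk first; the device name, block size and image path are assumptions):

# read the old card into an image file
sudo dd if=/dev/sdb of=~/sdimage.img bs=4M status=progress
sync
# insert the new card, confirm it got the same device name, then write the image back
sudo dd if=~/sdimage.img of=/dev/sdb bs=4M status=progress
sync

The bs=4M option only speeds up the copy, and status=progress shows how far dd has come.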

2.2 Configure the Nodes

2.2.1 Set unique static IP Addresses

Because the nodes need to communicate with each other, we first have to set a static IP address for each node. For example, if we want a node to have 192.168.1.101, open sudo nano /etc/network/interfaces, comment out source-directory /etc/network/interfaces.d and insert

auto eth0
iface eth0 inet static
address 192.168.1.101
netmask 255.255.255.0
broadcast 192.168.1.255
gateway 192.168.1.1
dns-nameservers 192.168.1.1
For this example, we will use 192.168.1.101 as master and 192.168.1.102 as slave.
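The file on the slave looks the same except for the address line; a minimal sketch, assuming the gateway and DNS settings above also apply to the slave:

auto eth0
iface eth0 inet static
address 192.168.1.102
netmask 255.255.255.0
broadcast 192.168.1.255
gateway 192.168.1.1
dns-nameservers 192.168.1.1

Reboot (or restart the networking service) so the new addresses take effect.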

Next we map the node hostnames to their IP addresses so the machines can reach each other by name. Open sudo nano /etc/hosts and add

192.168.1.101 master
192.168.1.102 slave-1

To avoid IPv6-related error messages, we also disable IPv6 by opening sudo nano /etc/sysctl.conf and adding
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
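The sysctl settings only take effect after a reboot or after reloading them explicitly:

sudo sysctl -p

You can verify the result with cat /proc/sys/net/ipv6/conf/all/disable_ipv6, which should print 1.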

2.2.2 Establish SSH Connections

Because we do not want to enter passwords each time the cluster boots, we first establish passwordless SSH connections between the nodes.
First generate a key on the master and authorize it for the master itself with ssh-keygen -t rsa -P "" and cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys, test it by connecting to the local machine with ssh localhost, and then establish the connection between master and slave with ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@192.168.1.102 or ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave-1.
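A quick way to check that the keys are in place (the hostnames are the ones defined in /etc/hosts above):

ssh hduser@localhost hostname
ssh hduser@slave-1 hostname

Both commands should print the respective hostname without asking for a password.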

2.2.3 Configure Hadoop (ALL Nodes)

First we need to decide where to store the data:

sudo mkdir -p /usr/local/hadoop/tmp
sudo chown hduser:hadoop /usr/local/hadoop/tmp
sudo mkdir -p /usr/local/hadoop/name
sudo chown hduser:hadoop /usr/local/hadoop/name/
sudo mkdir -p /usr/local/hadoop/data
sudo chown hduser:hadoop /usr/local/hadoop/data

Hint:
If you want to build a real cluster you might think about using an external drive; in that case place these folders on the hard drive rather than on the rather small SD Card, as sketched below.
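A minimal sketch of that variant, assuming the external drive shows up as /dev/sda1 and is mounted at /media/hdd (both names are assumptions, check with lsblk):

sudo mkdir -p /media/hdd
sudo mount /dev/sda1 /media/hdd
# create the Hadoop folders on the external drive instead of the SD Card
sudo mkdir -p /media/hdd/hadoop/tmp /media/hdd/hadoop/name /media/hdd/hadoop/data
sudo chown -R hduser:hadoop /media/hdd/hadoop

The paths in core-site.xml and hdfs-site.xml below then have to point to these folders, and the mount should be added to /etc/fstab so it survives a reboot.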

Switch to the Hadoop user with su hduser, go to the configuration directory with cd $HADOOP_CONF_DIR, open nano core-site.xml and replace the configuration block with

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>

Next, open nano hdfs-site.xml and replace the configuration block with

<configuration>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/name/</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
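Since a single misplaced tag will keep Hadoop from starting, it can be worth validating the edited files. A small optional check, assuming the libxml2 command line tools are installed (sudo apt-get install libxml2-utils):

xmllint --noout core-site.xml
xmllint --noout hdfs-site.xml

Each command stays silent if the file is well-formed and prints the offending line otherwise.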

The default value of dfs.replication is 3; however, since we are building a cluster with two nodes, we use 2.
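Note that dfs.replication only applies to files written after the change; files already stored in HDFS keep their old replication factor. Once the cluster is running it can be adjusted explicitly, for example (the path is just an illustration):

hdfs dfs -setrep -w 2 /user/hduser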

Change back to the Hadoop user with su hduser on the master node and format the namenode:
hdfs namenode -format
Next, start HDFS:

cd $HADOOP_HOME/sbin
start-dfs.sh

To check your installation go to http://192.168.1.101:50070; if the NameNode web interface comes up, you succeeded.
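You can also check from the command line as hduser with the standard Hadoop tools:

jps
hdfs dfsadmin -report

jps should list a NameNode on the master and a DataNode on every node that stores data, and the dfsadmin report should show the expected number of live datanodes.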

3. Configure Spark

First we need to grant some folder permissions on all nodes as the orangepi user with sudo chown hduser:hadoop ./spark-2.1.0-bin-hadoop2.7 and sudo chmod 750 ./spark-2.1.0-bin-hadoop2.7. Next we need to edit the configuration files on the master node in the /opt/spark-2.1.0-bin-hadoop2.7/conf folder: run sudo cp slaves.template slaves and add the lines

192.168.1.101
192.168.1.102

Then, after sudo cp spark-env.sh.template spark-env.sh, we need to add the line SPARK_MASTER_HOST=192.168.1.101 to spark-env.sh (sudo nano spark-env.sh in the same conf folder).
To start the Spark master and workers, execute $SPARK_HOME/sbin/start-all.sh.
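A quick smoke test from the master, assuming SPARK_HOME points to /opt/spark-2.1.0-bin-hadoop2.7 and the examples jar carries the usual name for this build (adjust the jar name if yours differs):

$SPARK_HOME/bin/spark-submit --master spark://192.168.1.101:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 100

The job should end with a line like "Pi is roughly 3.14..." and appear as a completed application in the web UI described below.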

To check your installation go to http://192.168.1.101:8080; if everything is correct you should see the Spark master web UI listing your workers. Congratulations, you have successfully set up a Hadoop/Spark cluster. In the next section we will install some software and carry out the first analysis.
