Installing Apache Spark

First things first: you need a recent Java, so install the Oracle JDK 8 packages.

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
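
Once that completes, verify the JDK is on your PATH. On this machine it reports build 1.8.0_66 (the same build the Scala banner shows below); yours may differ:

$ java -version
java version "1.8.0_66"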

Next, install Scala. The latest version as of this writing is 2.11.7, but I ran into a brick wall with that version (see the notes below), so I'm using 2.10.6 instead.

$ wget http://www.scala-lang.org/files/archive/scala-2.10.6.deb
$ sudo dpkg -i scala-2.10.6.deb

$ scala
Welcome to Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions to have them evaluated.
Type :help for more information.

scala> :q

$ sudo apt-get install git
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0.tgz
$ tar xvf spark-1.6.0.tgz
$ rm spark-1.6.0.tgz
$ cd spark-1.6.0/
$ build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -DskipTests clean package

Notes:
:q exits the Scala shell.
Spark does not yet support its JDBC component for Scala 2.11, which is one reason to stick with Scala 2.10.6 for now.
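
With the build done, a quick smoke test is to launch the bundled Scala shell from the spark-1.6.0 directory and run a trivial job (a minimal sketch; the REPL banner and result numbering will differ on your machine):

$ bin/spark-shell
scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
res0: Long = 500
scala> :q

The bundled SparkPi example gives a fuller end-to-end check: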

./run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/13 06:56:35 INFO SparkContext: Running Spark version 1.6.0
16/02/13 06:56:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/13 06:56:35 WARN Utils: Your hostname, .... resolves to a loopback address: 127.0.1.1; using 192.168.1.140 instead (on interface wlan0)
16/02/13 06:56:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/13 06:56:35 INFO SecurityManager: Changing view acls to: ...
16/02/13 06:56:35 INFO SecurityManager: Changing modify acls to: ....
16/02/13 06:56:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(...); users with modify permissions: Set(...)
16/02/13 06:56:36 INFO Utils: Successfully started service 'sparkDriver' on port 34966.
16/02/13 06:56:36 INFO Slf4jLogger: Slf4jLogger started
16/02/13 06:56:36 INFO Remoting: Starting remoting
16/02/13 06:56:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.140:49553]
16/02/13 06:56:36 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 49553.
16/02/13 06:56:36 INFO SparkEnv: Registering MapOutputTracker
16/02/13 06:56:36 INFO SparkEnv: Registering BlockManagerMaster
16/02/13 06:56:36 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1e81dc84-91ff-4503-ab3e-7fa8adbac78e
16/02/13 06:56:36 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/02/13 06:56:36 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/13 06:56:36 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/13 06:56:36 INFO SparkUI: Started SparkUI at http://192.168.1.140:4040
16/02/13 06:56:36 INFO HttpFileServer: HTTP File server directory is /tmp/spark-29f5d77e-a0ea-4986-95fb-6d3b9104c18f/httpd-6774316e-0973-43c7-b58d-3cc0a1991f95
16/02/13 06:56:36 INFO HttpServer: Starting HTTP Server
16/02/13 06:56:36 INFO Utils: Successfully started service 'HTTP file server' on port 42625.
16/02/13 06:56:37 INFO SparkContext: Added JAR file:/home/.../spark/spark-1.6.0/examples/target/scala-2.10/spark-examples-1.6.0-hadoop2.6.4.jar at http://192.168.1.140:42625/jars/spark-examples-1.6.0-hadoop2.6.4.jar with timestamp 1455368197141
16/02/13 06:56:37 INFO Executor: Starting executor ID driver on host localhost
16/02/13 06:56:37 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58239.
16/02/13 06:56:37 INFO NettyBlockTransferService: Server created on 58239
16/02/13 06:56:37 INFO BlockManagerMaster: Trying to register BlockManager
16/02/13 06:56:37 INFO BlockManagerMasterEndpoint: Registering block manager localhost:58239 with 511.1 MB RAM, BlockManagerId(driver, localhost, 58239)
16/02/13 06:56:37 INFO BlockManagerMaster: Registered BlockManager
16/02/13 06:56:37 INFO SparkContext: Starting job: reduce at SparkPi.scala:36
16/02/13 06:56:37 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 2 output partitions
16/02/13 06:56:37 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
16/02/13 06:56:37 INFO DAGScheduler: Parents of final stage: List()
16/02/13 06:56:37 INFO DAGScheduler: Missing parents: List()
16/02/13 06:56:37 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
16/02/13 06:56:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1888.0 B)
16/02/13 06:56:38 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 3.0 KB)
16/02/13 06:56:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:58239 (size: 1202.0 B, free: 511.1 MB)
16/02/13 06:56:38 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/02/13 06:56:38 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
16/02/13 06:56:38 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/02/13 06:56:38 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2156 bytes)
16/02/13 06:56:38 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2156 bytes)
16/02/13 06:56:38 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/02/13 06:56:38 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/02/13 06:56:38 INFO Executor: Fetching http://192.168.1.140:42625/jars/spark-examples-1.6.0-hadoop2.6.4.jar with timestamp 1455368197141
16/02/13 06:56:38 INFO Utils: Fetching http://192.168.1.140:42625/jars/spark-examples-1.6.0-hadoop2.6.4.jar to /tmp/spark-29f5d77e-a0ea-4986-95fb-6d3b9104c18f/userFiles-8a089d63-89cb-4bf4-a116-b1dcfb15e6c1/fetchFileTemp3377846363451941090.tmp
16/02/13 06:56:38 INFO Executor: Adding file:/tmp/spark-29f5d77e-a0ea-4986-95fb-6d3b9104c18f/userFiles-8a089d63-89cb-4bf4-a116-b1dcfb15e6c1/spark-examples-1.6.0-hadoop2.6.4.jar to class loader
16/02/13 06:56:38 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1031 bytes result sent to driver
16/02/13 06:56:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1031 bytes result sent to driver
16/02/13 06:56:38 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 762 ms on localhost (1/2)
16/02/13 06:56:38 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 736 ms on localhost (2/2)
16/02/13 06:56:38 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/13 06:56:38 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 0.777 s
16/02/13 06:56:38 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 1.003757 s

Pi is roughly 3.13756 <<-- There it is!


16/02/13 06:56:38 INFO SparkUI: Stopped Spark web UI at http://192.168.1.140:4040
16/02/13 06:56:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/02/13 06:56:38 INFO MemoryStore: MemoryStore cleared
16/02/13 06:56:38 INFO BlockManager: BlockManager stopped
16/02/13 06:56:38 INFO BlockManagerMaster: BlockManagerMaster stopped
16/02/13 06:56:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/02/13 06:56:38 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/02/13 06:56:38 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/02/13 06:56:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/02/13 06:56:39 INFO SparkContext: Successfully stopped SparkContext
16/02/13 06:56:39 INFO ShutdownHookManager: Shutdown hook called
16/02/13 06:56:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-29f5d77e-a0ea-4986-95fb-6d3b9104c18f
16/02/13 06:56:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-29f5d77e-a0ea-4986-95fb-6d3b9104c18f/httpd-6774316e-0973-43c7-b58d-3cc0a1991f95
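
Note the two WARN lines near the top of that output: the hostname resolves to a loopback address, so Spark picked 192.168.1.140 on wlan0 by itself. To pin the bind address rather than let Spark guess, export SPARK_LOCAL_IP before launching (substitute your own interface's address):

$ export SPARK_LOCAL_IP=192.168.1.140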

Finally, install SBT. The most recent version as of this writing is 0.13.9.

echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install sbt
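
With SBT in place you can package standalone Spark applications. A minimal build.sbt for the versions installed above might look like the sketch below; the project name is arbitrary, and spark-core is the Spark core artifact published on Maven Central:

name := "spark-test"

version := "0.1"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"

The %% operator tells SBT to append the Scala binary version to the artifact name, so this resolves to spark-core_2.10, matching the Scala 2.10.6 install above.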
