Spark Java HelloWorld

1. First, download and install Eclipse Mars for Ubuntu 15 (pretty straightforward from here).

2. Create a Maven project in Eclipse. This is also straightforward.

3. Adding the Spark dependency (in pom.xml)

    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.1</version>
      <scope>provided</scope>
    </dependency>

4. Hello World Java

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

/**
 * Hello world!
 *
 */

public class App {
  public static void main(String[] args) {

    // Local mode
    SparkConf sparkConf = new SparkConf().setAppName("HelloWorld").setMaster("local");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    String[] arr = new String[] { "A1", "B2", "C3", "D4", "F5" };
    List<String> inputList = Arrays.asList(arr);
    JavaRDD<String> inputRDD = ctx.parallelize(inputList);
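    inputRDD.foreach(new VoidFunction<String>() {
      @Override
      public void call(String input) throws Exception {
        // In local mode the executor shares this JVM, so the output lands on this console
        System.out.println(input);
      }
    });

    // Shut down the context so the driver releases its ports and temp directories
    ctx.stop();
  }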
}
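
The anonymous VoidFunction class above is the pre-Java 8 way of passing a callback. As far as I know, Spark 1.0 and later also accept Java 8 lambdas here, so on Java 8 the call could probably be shortened to something like inputRDD.foreach(s -> System.out.println(s)). The sketch below shows why the two forms are interchangeable, without needing Spark on the classpath. Note that the VoidFunction interface and the forEach helper in it are hypothetical stand-ins written for this example; they only mimic the shape of Spark's interface and of JavaRDD.foreach, they are not Spark code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VoidFunctionSketch {

  // Hypothetical stand-in mirroring the shape of Spark's VoidFunction:
  // one abstract method that takes an element and returns nothing.
  interface VoidFunction<T> {
    void call(T t) throws Exception;
  }

  // Hypothetical helper playing the role of JavaRDD.foreach on a plain list.
  static <T> void forEach(List<T> items, VoidFunction<T> f) throws Exception {
    for (T item : items) {
      f.call(item);
    }
  }

  // Anonymous-class style, as used in the Hello World above.
  public static List<String> runAnonymous(List<String> input) throws Exception {
    final List<String> seen = new ArrayList<>();
    forEach(input, new VoidFunction<String>() {
      @Override
      public void call(String s) {
        seen.add(s);
      }
    });
    return seen;
  }

  // Lambda style: the compiler maps the lambda onto the single abstract method.
  public static List<String> runLambda(List<String> input) throws Exception {
    List<String> seen = new ArrayList<>();
    forEach(input, s -> seen.add(s));
    return seen;
  }

  public static void main(String[] args) throws Exception {
    List<String> input = Arrays.asList("A1", "B2", "C3", "D4", "F5");
    System.out.println(runAnonymous(input)); // prints [A1, B2, C3, D4, F5]
    System.out.println(runLambda(input));    // prints [A1, B2, C3, D4, F5]
  }
}
```

Both methods hand the same elements to the callback in the same order; the lambda is just a shorter way to implement an interface that has a single method.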

5. Output

16/02/13 07:27:07 WARN util.Utils: Your hostname, vichu-Lenovo-Z50-70 resolves to a loopback address: 127.0.1.1; using 192.168.1.140 instead (on interface wlan0)
16/02/13 07:27:07 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/13 07:27:07 INFO spark.SecurityManager: Changing view acls to: vichu
16/02/13 07:27:07 INFO spark.SecurityManager: Changing modify acls to: vichu
16/02/13 07:27:07 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vichu); users with modify permissions: Set(vichu)
16/02/13 07:27:07 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/02/13 07:27:07 INFO Remoting: Starting remoting
16/02/13 07:27:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.140:39886]
16/02/13 07:27:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.140:39886]
16/02/13 07:27:07 INFO util.Utils: Successfully started service 'sparkDriver' on port 39886.
16/02/13 07:27:07 INFO spark.SparkEnv: Registering MapOutputTracker
16/02/13 07:27:07 INFO spark.SparkEnv: Registering BlockManagerMaster
16/02/13 07:27:07 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20160213072707-5f39
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 56037.
16/02/13 07:27:08 INFO network.ConnectionManager: Bound socket to port 56037 with id = ConnectionManagerId(192.168.1.140,56037)
16/02/13 07:27:08 INFO storage.MemoryStore: MemoryStore started with capacity 945.8 MB
16/02/13 07:27:08 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/02/13 07:27:08 INFO storage.BlockManagerMasterActor: Registering block manager 192.168.1.140:56037 with 945.8 MB RAM, BlockManagerId(<driver>, 192.168.1.140, 56037, 0)
16/02/13 07:27:08 INFO storage.BlockManagerMaster: Registered BlockManager
16/02/13 07:27:08 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-e86643a4-7d63-4b1b-9b3c-95178861aa1e
16/02/13 07:27:08 INFO spark.HttpServer: Starting HTTP Server
16/02/13 07:27:08 INFO server.Server: jetty-8.1.14.v20131031
16/02/13 07:27:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44346
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'HTTP file server' on port 44346.
16/02/13 07:27:08 INFO server.Server: jetty-8.1.14.v20131031
16/02/13 07:27:08 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/02/13 07:27:08 INFO ui.SparkUI: Started SparkUI at http://192.168.1.140:4040
16/02/13 07:27:08 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.140:39886/user/HeartbeatReceiver
16/02/13 07:27:08 INFO spark.SparkContext: Starting job: foreach at App.java:24
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Got job 0 (foreach at App.java:24) with 1 output partitions (allowLocal=false)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Final stage: Stage 0(foreach at App.java:24)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Missing parents: List()
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:23), which has no missing parents
16/02/13 07:27:08 INFO storage.MemoryStore: ensureFreeSpace(1504) called with curMem=0, maxMem=991753666
16/02/13 07:27:08 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1504.0 B, free 945.8 MB)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:23)
16/02/13 07:27:08 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/02/13 07:27:08 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1224 bytes)
16/02/13 07:27:08 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
A1
B2
C3
D4
F5
16/02/13 07:27:08 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 585 bytes result sent to driver
16/02/13 07:27:08 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 39 ms on localhost (1/1)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Stage 0 (foreach at App.java:24) finished in 0.054 s
16/02/13 07:27:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/13 07:27:08 INFO spark.SparkContext: Job finished: foreach at App.java:24, took 0.240135089 s
