1. First, download and install Eclipse Mars for Ubuntu 15 (pretty straightforward from here)
2. Create a Maven project in Eclipse. Straightforward; if you prefer the terminal, see the sketch below.
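As a hedged alternative to the Eclipse wizard, the stock quickstart archetype generates an equivalent project skeleton (the com.example / spark-hello-world coordinates are placeholders, not from the original post); it can then be pulled into Eclipse via the Existing Maven Projects importer:

mvn archetype:generate -DgroupId=com.example -DartifactId=spark-hello-world \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false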
3. Add the Spark dependency to pom.xml
<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
    <scope>provided</scope>
</dependency>
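For context, here is a minimal pom.xml this dependency would sit in; this is a sketch, not the post's actual file. The coordinates are placeholders, and pinning the compiler plugin to Java 1.7 is an assumption typical of the Spark 1.1.x era:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Placeholder coordinates; use whatever your project wizard generated -->
  <groupId>com.example</groupId>
  <artifactId>spark-hello-world</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Assumed: Java 7 source/target, typical for Spark 1.1.x -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

The provided scope keeps the Spark jar out of your packaged artifact (spark-submit supplies it on a real cluster), while Eclipse's Maven integration still puts it on the classpath when you run the class locally.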
4. Hello World in Java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

/**
 * Hello world!
 */
public class App {
    public static void main(String[] args) {
        // Local mode
        SparkConf sparkConf = new SparkConf().setAppName("HelloWorld").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        String[] arr = new String[] { "A1", "B2", "C3", "D4", "F5" };
        List<String> inputList = Arrays.asList(arr);

        // Distribute the local list as an RDD
        JavaRDD<String> inputRDD = ctx.parallelize(inputList);

        // Print each element; the function runs on the executors, which in
        // local mode is this same JVM, so the output lands in our console
        inputRDD.foreach(new VoidFunction<String>() {
            public void call(String input) throws Exception {
                System.out.println(input);
            }
        });
    }
}
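One caveat worth knowing: foreach with println only shows up in your console here because the master is local. On a real cluster those println calls execute on remote executors. A minimal hedged variant (not from the original post; class name and the "item:" prefix are made up for illustration) that uses map plus collect() to bring the results back to the driver, where printing is guaranteed to be local:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class AppCollect {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HelloWorldCollect").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        JavaRDD<String> inputRDD = ctx.parallelize(Arrays.asList("A1", "B2", "C3", "D4", "F5"));

        // Transform on the executors...
        JavaRDD<String> tagged = inputRDD.map(new Function<String, String>() {
            public String call(String input) throws Exception {
                return "item:" + input;
            }
        });

        // ...then pull the results back and print them in the driver JVM
        for (String s : tagged.collect()) {
            System.out.println(s);
        }

        ctx.stop();
    }
}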
5. Output
16/02/13 07:27:07 WARN util.Utils: Your hostname, vichu-Lenovo-Z50-70 resolves to a loopback address: 127.0.1.1; using 192.168.1.140 instead (on interface wlan0)
16/02/13 07:27:07 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/13 07:27:07 INFO spark.SecurityManager: Changing view acls to: vichu
16/02/13 07:27:07 INFO spark.SecurityManager: Changing modify acls to: vichu
16/02/13 07:27:07 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vichu); users with modify permissions: Set(vichu)
16/02/13 07:27:07 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/02/13 07:27:07 INFO Remoting: Starting remoting
16/02/13 07:27:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.140:39886]
16/02/13 07:27:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.140:39886]
16/02/13 07:27:07 INFO util.Utils: Successfully started service 'sparkDriver' on port 39886.
16/02/13 07:27:07 INFO spark.SparkEnv: Registering MapOutputTracker
16/02/13 07:27:07 INFO spark.SparkEnv: Registering BlockManagerMaster
16/02/13 07:27:07 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20160213072707-5f39
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 56037.
16/02/13 07:27:08 INFO network.ConnectionManager: Bound socket to port 56037 with id = ConnectionManagerId(192.168.1.140,56037)
16/02/13 07:27:08 INFO storage.MemoryStore: MemoryStore started with capacity 945.8 MB
16/02/13 07:27:08 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/02/13 07:27:08 INFO storage.BlockManagerMasterActor: Registering block manager 192.168.1.140:56037 with 945.8 MB RAM, BlockManagerId(<driver>, 192.168.1.140, 56037, 0)
16/02/13 07:27:08 INFO storage.BlockManagerMaster: Registered BlockManager
16/02/13 07:27:08 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-e86643a4-7d63-4b1b-9b3c-95178861aa1e
16/02/13 07:27:08 INFO spark.HttpServer: Starting HTTP Server
16/02/13 07:27:08 INFO server.Server: jetty-8.1.14.v20131031
16/02/13 07:27:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44346
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'HTTP file server' on port 44346.
16/02/13 07:27:08 INFO server.Server: jetty-8.1.14.v20131031
16/02/13 07:27:08 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/02/13 07:27:08 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/02/13 07:27:08 INFO ui.SparkUI: Started SparkUI at http://192.168.1.140:4040
16/02/13 07:27:08 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.140:39886/user/HeartbeatReceiver
16/02/13 07:27:08 INFO spark.SparkContext: Starting job: foreach at App.java:24
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Got job 0 (foreach at App.java:24) with 1 output partitions (allowLocal=false)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Final stage: Stage 0(foreach at App.java:24)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Missing parents: List()
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:23), which has no missing parents
16/02/13 07:27:08 INFO storage.MemoryStore: ensureFreeSpace(1504) called with curMem=0, maxMem=991753666
16/02/13 07:27:08 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1504.0 B, free 945.8 MB)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:23)
16/02/13 07:27:08 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/02/13 07:27:08 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1224 bytes)
16/02/13 07:27:08 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
A1
B2
C3
D4
F5
16/02/13 07:27:08 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 585 bytes result sent to driver
16/02/13 07:27:08 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 39 ms on localhost (1/1)
16/02/13 07:27:08 INFO scheduler.DAGScheduler: Stage 0 (foreach at App.java:24) finished in 0.054 s
16/02/13 07:27:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/13 07:27:08 INFO spark.SparkContext: Job finished: foreach at App.java:24, took 0.240135089 s
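Note the A1 through F5 lines buried in the middle: that is the program's actual output; everything else is Spark's INFO logging. A minimal sketch, assuming you want to quiet that chatter: drop a log4j.properties under src/main/resources (Spark 1.x uses log4j 1.2 and picks it up from the classpath), modeled on Spark's own log4j.properties.template but with the root level raised to WARN:

# Raise the root level so only warnings and errors appear
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

With this in place, rerunning App should show little more than the two hostname warnings and the five printed values.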