I've installed Spark and its components locally and I'm able to execute PySpark code in Jupyter, IPython and via spark-submit; however, I'm receiving the following warnings:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 07:54:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
The .py file executes, but should I be worried about these warnings? I don't want to get deep into writing code only to find later that it won't run. For reference, PySpark is installed locally. Here's the code:
test.txt:
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
test.py:
import pyspark
sc = pyspark.SparkContext.getOrCreate()
# sc = pyspark.SparkContext(master='local[*]') # or 'local[2]' ?
lines = sc.textFile("test.txt")
llist = lines.collect()
for line in llist:
    print(line)
print("SparkContext version:\t", sc.version) # return SparkContext version
print("python version:\t", sc.pythonVer) # return python version
print("master URL:\t", sc.master) # master URL to connect to
print("path where spark is installed on worker nodes:\t", sc.sparkHome) # path where spark is installed on worker nodes
print("name of spark user running SparkContext:\t", sc.sparkUser()) # name of spark user running SparkContext
Environment variables:
export SPARK_HOME=/Users/ayubk/spark-3.0.1-bin-hadoop3.2
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
bash terminal:
$ spark-3.0.1-bin-hadoop3.2/bin/spark-submit test.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 08:00:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/27 08:00:01 INFO SparkContext: Running Spark version 3.0.1
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO ResourceUtils: Resources for spark.driver:
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO SparkContext: Submitted application: test.py
20/12/27 08:00:01 INFO SecurityManager: Changing view acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing view acls groups to:
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls groups to:
20/12/27 08:00:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ayubk); groups with view permissions: Set(); users with modify permissions: Set(ayubk); groups with modify permissions: Set()
20/12/27 08:00:02 INFO Utils: Successfully started service 'sparkDriver' on port 51254.
20/12/27 08:00:02 INFO SparkEnv: Registering MapOutputTracker
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMaster
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/27 08:00:02 INFO DiskBlockManager: Created local directory at /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/blockmgr-a99e3df1-6d15-4158-8e09-568910c2b045
20/12/27 08:00:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/12/27 08:00:02 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/27 08:00:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/12/27 08:00:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
20/12/27 08:00:02 INFO Executor: Starting executor ID driver on host 192.168.1.101
20/12/27 08:00:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51255.
20/12/27 08:00:02 INFO NettyBlockTransferService: Server created on 192.168.1.101:51255
20/12/27 08:00:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/12/27 08:00:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:51255 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 175.8 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.1 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:51255 (size: 27.1 KiB, free: 434.4 MiB)
20/12/27 08:00:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
20/12/27 08:00:04 INFO FileInputFormat: Total input files to process : 1
20/12/27 08:00:04 INFO SparkContext: Starting job: collect at /Users/ayubk/test.py:9
20/12/27 08:00:04 INFO DAGScheduler: Got job 0 (collect at /Users/ayubk/test.py:9) with 2 output partitions
20/12/27 08:00:04 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /Users/ayubk/test.py:9)
20/12/27 08:00:04 INFO DAGScheduler: Parents of final stage: List()
20/12/27 08:00:04 INFO DAGScheduler: Missing parents: List()
20/12/27 08:00:04 INFO DAGScheduler: Submitting ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0), which has no missing parents
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.101:51255 (size: 2.3 KiB, free: 434.4 MiB)
20/12/27 08:00:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1223
20/12/27 08:00:04 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
20/12/27 08:00:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/12/27 08:00:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.101, executor driver, partition 0, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.101, executor driver, partition 1, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/12/27 08:00:04 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:52+52
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:0+52
20/12/27 08:00:04 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 956 bytes result sent to driver
20/12/27 08:00:04 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1003 bytes result sent to driver
20/12/27 08:00:04 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 156 ms on 192.168.1.101 (executor driver) (1/2)
20/12/27 08:00:04 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 142 ms on 192.168.1.101 (executor driver) (2/2)
20/12/27 08:00:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/12/27 08:00:04 INFO DAGScheduler: ResultStage 0 (collect at /Users/ayubk/test.py:9) finished in 0.241 s
20/12/27 08:00:04 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/12/27 08:00:04 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/12/27 08:00:04 INFO DAGScheduler: Job 0 finished: collect at /Users/ayubk/test.py:9, took 0.296115 s
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
SparkContext version: 3.0.1
python version: 3.7
master URL: local[*]
path where spark is installed on worker nodes: None
name of spark user running SparkContext: ayubk
20/12/27 08:00:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/27 08:00:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
20/12/27 08:00:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/12/27 08:00:04 INFO MemoryStore: MemoryStore cleared
20/12/27 08:00:04 INFO BlockManager: BlockManager stopped
20/12/27 08:00:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/27 08:00:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/12/27 08:00:04 INFO SparkContext: Successfully stopped SparkContext
20/12/27 08:00:04 INFO ShutdownHookManager: Shutdown hook called
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-76d186fb-cf42-4898-92db-050a73f9fcb7
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c/pyspark-ee1fe6ab-a27f-4be6-b8d8-06594704da12
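Side note: to cut down the INFO output above, my understanding is that you can lower the log level by creating conf/log4j.properties from the template Spark ships with (this shouldn't affect the reflective-access warnings, which are printed by the JVM itself before Spark's logging starts):
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
# then edit that file and change "log4j.rootCategory=INFO, console" to "log4j.rootCategory=WARN, console"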
Edit: I tried to install Java 8:
brew update
brew tap adoptopenjdk/openjdk
brew search jdk
brew install --cask adoptopenjdk8
However, when I type java -version, I'm still getting this:
openjdk version "13" 2019-09-17
OpenJDK Runtime Environment (build 13+33)
OpenJDK 64-Bit Server VM (build 13+33, mixed mode, sharing)
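For what it's worth, on macOS the installed JDKs can be listed with the built-in java_home helper, which should show whether the AdoptOpenJDK 8 cask actually landed alongside the existing JDK 13 (macOS-specific):
/usr/libexec/java_home -V   # lists every installed JVM with its version and path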
Install Java 8 instead of Java 11 (or, in your case, Java 13); Java versions newer than 8 are known to produce this sort of warning with Spark.
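If the cask from your edit did install, the remaining step is usually just pointing JAVA_HOME at it, since otherwise macOS keeps the highest installed JDK (13 here) as the default. A minimal sketch, assuming the adoptopenjdk8 cask is present:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)   # resolve the path of the Java 8 install
java -version                                       # should now report a 1.8.0_x build
Adding that export next to the SPARK_HOME lines in your shell profile should make spark-submit pick up Java 8 as well.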