
pyspark: how to show current directory?

Hi, I'm using pyspark interactively. I think I'm failing to load a LOCAL file correctly.

How do I check the current directory, so that I can browse to it and take a look at the actual file?

Or is the default directory the one where pyspark is located? Thanks

asked Nov 07 '25 by YJZ

2 Answers

You can't load a local file unless the same file exists on all workers under the same path. For example, if you want to read data.csv in Spark, copy the file to all workers under the same path (say /tmp/data.csv). Then you can use sc.textFile("file:///tmp/data.csv") to create an RDD, as in the sketch below.
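A minimal sketch, assuming the interactive pyspark shell (where sc is already defined) and that /tmp/data.csv has been copied to that exact path on the driver and on every worker:

# sc is provided by the pyspark shell; /tmp/data.csv must exist on every node
rdd = sc.textFile("file:///tmp/data.csv")
print(rdd.take(5))   # peek at the first few lines to confirm the file was picked up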

The current working directory is the folder from which you started pyspark. You can start pyspark using ipython and run the pwd command to check the working directory. [Set PYSPARK_DRIVER_PYTHON=/path/to/ipython in spark-env.sh to use ipython.]

answered Nov 10 '25 by user3343061

import os

# Print the driver's current working directory
cwd = os.getcwd()
print(cwd)
answered Nov 10 '25 by Amrutha Ajayakumar


