 

How to read specific column in pyspark?

I am new to pyspark. I want to read specific columns from an input file. I know how to do this in pandas:

df=pd.read_csv('file.csv',usecols=[0,1,2])

Is there any functionality similar to this in pyspark?

asked Dec 14 '25 by Mohamed Thasin ah

1 Answer

You can read the file as a text RDD, split each line, and use map to keep only the columns you need:

from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("ReadCSV")
sc = SparkContext(conf=conf)
sqlctx = SQLContext(sc)  # needed so that RDD.toDF() is available

# Read the file as plain text, split each line on the delimiter,
# and keep only the first and fourth fields
df = sc.textFile("te2.csv") \
    .map(lambda line: line.split(";")) \
    .map(lambda line: (line[0], line[3])) \
    .toDF()
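
If you are on Spark 2.x or later, the DataFrame reader is a simpler route. Here is a minimal sketch assuming the same semicolon-delimited file used above (te2.csv), selecting columns by position the way pandas' usecols=[0,1,2] does:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadCSV").getOrCreate()

# With no header row, Spark names the columns _c0, _c1, ...
df = spark.read.csv("te2.csv", sep=";", inferSchema=True)

# Select columns by position, mirroring pandas' usecols=[0, 1, 2]
subset = df.select(df.columns[0], df.columns[1], df.columns[2])
subset.show()

This avoids manual line splitting, and inferSchema gives you typed columns instead of strings.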
answered Dec 16 '25 by zlidime


