
How to get the input file name of a record in a Spark DataFrame?

I am creating a DataFrame in Spark by loading tab-separated files from S3. I need the input file name of each record in the DataFrame for further processing. I tried:

dataframe.select(inputFileName())

But I am getting a null value for input_file_name. Can somebody please help me solve this issue?

asked Oct 11 '16 04:10 by ab_


1 Answer

You can create a new column on the data frame using withColumn and input_file_name():

dataframe.withColumn("input_file", input_file_name())
answered Dec 04 '22 03:12 by Psidom