PySpark: TypeError: 'str' object is not callable in dataframe operations

Tags: python, pyspark

I am reading files from a folder in a loop and creating a DataFrame from each one. However, I am getting this strange error: TypeError: 'str' object is not callable. Here is the code:

for yr in range (2014,2018):
  cat_bank_yr = sqlCtx.read.csv(cat_bank_path+str(yr)+'_'+h1+'bank.csv000',sep='|',schema=schema)
  cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger"))))
  cat_bank_yr=cat_bank_yr.withColumn("category",trim(lower(col("category"))))

The code runs for one iteration and then stops at the line

cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger")))) 

with the above error.

Can anyone help out?

asked Dec 13 '22 at 10:12 by pnv



1 Answer

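Your code looks fine. If the error really happens on the line you mention, you have probably accidentally overwritten one of the PySpark functions (col, trim or lower) with a string, for example with an assignment such as lower = "category" somewhere else in your script.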
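To see how rebinding a function name to a string produces exactly this error, and how one call can succeed while a later one fails, here is a minimal reproduction in plain Python. It is only a sketch: the `lower` defined here is a hypothetical stand-in for `pyspark.sql.functions.lower`, so it runs without Spark, and the accidental assignment is simulated.

```python
# Hypothetical stand-in for pyspark.sql.functions.lower (no Spark needed).
def lower(s):
    return s.lower()


results = []
errors = []

for yr in range(2014, 2018):
    try:
        # Works on the first pass, while `lower` still refers to the function.
        results.append(lower(f"Ledger_{yr}"))
    except TypeError as exc:
        # On the next pass `lower` is a str, so calling it raises TypeError.
        errors.append((yr, str(exc)))
        break
    # Simulated accidental rebinding, e.g. reusing the name for a column label.
    lower = "cat_ledger"

print(results)  # ['ledger_2014']
print(errors)   # [(2015, "'str' object is not callable")]
```

Python resolves a global name each time it is used, so every call made after the assignment sees the string instead of the function.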

To check this, put the following line directly above your for loop and see whether the code runs without an error now:

from pyspark.sql.functions import col, trim, lower
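A related habit that makes this kind of shadowing much less likely is to import the functions module under a short alias, as in `from pyspark.sql import functions as F`, and write `F.trim(F.lower(F.col("cat_ledger")))`. A later assignment to a bare name such as `lower` then cannot affect those calls. The sketch below shows the idea without Spark; `F` here is a hypothetical stand-in namespace built from `str` methods, not the real PySpark module.

```python
import types

# Hypothetical stand-in for `from pyspark.sql import functions as F`,
# built from str methods so this runs without Spark.
F = types.SimpleNamespace(lower=str.lower, trim=str.strip)

# Reusing a bare name as a string no longer matters...
lower = "cat_ledger"

# ...because the functions are always looked up through the F namespace.
cleaned = F.trim(F.lower("  Ledger  "))
print(cleaned)  # ledger
```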

Alternatively, double-check whether the code really stops on the line you said, or check whether col, trim and lower are what you expect them to be by evaluating them (without calling them) in your interpreter or notebook:

col

should display something like the following (the exact output depends on your PySpark version):

<function pyspark.sql.functions._create_function.<locals>._(col)>
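Instead of inspecting each name by eye, you can check them programmatically with the built-in `callable()`. The helper below, `find_non_callables`, is a hypothetical name introduced for illustration; `col` and `trim` are simple stand-ins and the rebinding of `lower` is simulated so the sketch runs without Spark.

```python
def find_non_callables(namespace, names):
    """Return {name: type name} for every name that is missing or not callable."""
    problems = {}
    for name in names:
        obj = namespace.get(name)  # None if the name is not defined at all
        if not callable(obj):
            problems[name] = type(obj).__name__
    return problems


# Hypothetical stand-ins for the PySpark functions, so this runs without Spark.
def col(name):
    return name


def trim(value):
    return value


lower = "category"  # simulated accidental rebinding

report = find_non_callables(globals(), ["col", "trim", "lower"])
print(report)  # {'lower': 'str'}
```

In a script or notebook where the loop runs at module level, calling `find_non_callables(globals(), ["col", "trim", "lower"])` at the start of each iteration would show in which iteration one of the names stops being a function.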

answered Jan 03 '23 at 04:01 by Thomas
