I am reading files from a folder in a loop and creating dataframes from these.
However, I am getting this weird error: TypeError: 'str' object is not callable.
Please find the code here:
for yr in range(2014, 2018):
    cat_bank_yr = sqlCtx.read.csv(cat_bank_path + str(yr) + '_' + h1 + 'bank.csv000', sep='|', schema=schema)
    cat_bank_yr = cat_bank_yr.withColumn("cat_ledger", trim(lower(col("cat_ledger"))))
    cat_bank_yr = cat_bank_yr.withColumn("category", trim(lower(col("category"))))
The code runs for one iteration and then stops at the line
cat_bank_yr = cat_bank_yr.withColumn("cat_ledger", trim(lower(col("cat_ledger"))))
with the above error.
Can anyone help out?
Your code looks fine. If the error really occurs on the line you mention, you have probably accidentally overwritten one of the PySpark functions (col, trim, or lower) with a string somewhere earlier in your code.
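For illustration, this kind of shadowing is easy to reproduce. This is a minimal, made-up sketch, not your actual code:

from pyspark.sql.functions import trim

# Somewhere else in the notebook the name is accidentally reused for a string ...
trim = "  some value  "

# ... and the next use of trim(...) tries to call that string instead of the function:
trim("anything")   # TypeError: 'str' object is not callable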
To check this, put the following line directly above your for loop and see whether the code runs without an error now:
from pyspark.sql.functions import col, trim, lower
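With that import in place, the start of your snippet would look like this (cat_bank_path, h1 and schema as in your code):

from pyspark.sql.functions import col, trim, lower

for yr in range(2014, 2018):
    cat_bank_yr = sqlCtx.read.csv(cat_bank_path + str(yr) + '_' + h1 + 'bank.csv000', sep='|', schema=schema)
    cat_bank_yr = cat_bank_yr.withColumn("cat_ledger", trim(lower(col("cat_ledger"))))
    cat_bank_yr = cat_bank_yr.withColumn("category", trim(lower(col("category"))))

Re-importing right before the loop re-binds the names just before they are used, so an earlier stray assignment elsewhere in the notebook can no longer shadow them.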
Alternatively, double-check whether the code really stops on the line you mentioned, or check whether col, trim, lower are what you expect them to be by evaluating them, for example:
col
should return
<function pyspark.sql.functions._create_function.<locals>._(col)>
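If you are not in a notebook, a quick way to check all three names at once is a sketch like the following (the exact function repr depends on your PySpark version):

print(col, trim, lower)                                   # each should print a <function ...> repr
print(callable(col), callable(trim), callable(lower))     # should print: True True True

If any of these prints a plain string (or False), that name has been shadowed, and re-importing it as shown above fixes the error.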