
PySpark SQL: AttributeError: 'NoneType' object has no attribute 'join'

def main(inputs, output):

    sdf = spark.read.csv(inputs, schema=observation_schema)
    sdf.registerTempTable('filtertable')

    result = spark.sql("""
    SELECT * FROM filtertable WHERE qflag IS NULL
    """).show()

    temp_max = spark.sql(""" SELECT date, station, value FROM filtertable WHERE (observation = 'TMAX')""").show()
    temp_min = spark.sql(""" SELECT date, station, value FROM filtertable WHERE (observation = 'TMIN')""").show()

    result = temp_max.join(temp_min, condition1).select(temp_max('date'), temp_max('station'), ((temp_max('TMAX')-temp_min('TMIN'))/10)).alias('Range'))

Error:

Traceback (most recent call last):
  File "/Users/syedikram/Documents/temp_range_sql.py", line 96, in <module>
    main(inputs, output)
  File "/Users/syedikram/Documents/temp_range_sql.py", line 52, in main
    result = temp_max.join(temp_min, condition1).select(temp_max('date'), temp_max('station'), ((temp_max('TMAX')-temp_min('TMIN')/10)).alias('Range'))
AttributeError: 'NoneType' object has no attribute 'join'

Performing the join operation gives me a NoneType object error. Searching online didn't help, as there is little documentation for PySpark SQL. What am I doing wrong here?

Syed Ikram asked Jan 01 '23 18:01



1 Answer

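Remove the .show() calls from the temp_max and temp_min assignments. show() only prints the DataFrame to the console and returns None, so temp_max and temp_min end up holding None instead of DataFrames (hence AttributeError: 'NoneType' object has no attribute 'join'). If you still want to inspect the data, call show() as a separate statement after the assignment.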
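As a minimal sketch of that fix (not the asker's exact script), the function below keeps the DataFrames returned by spark.sql() and joins them. Several things here are assumptions: the function name temperature_range is hypothetical, spark is taken to be an existing SparkSession, schema stands in for the observation_schema that the question does not show, the undefined condition1 is replaced by a join on date and station, and value is aliased to tmax and tmin so the two sides can be told apart.

```python
def temperature_range(spark, inputs, schema):
    """Return a DataFrame of temperature ranges per date and station.

    Assumptions: `spark` is an existing SparkSession, and `schema` is the
    `observation_schema` from the question (its definition is not shown).
    """
    sdf = spark.read.csv(inputs, schema=schema)
    # createOrReplaceTempView is the non-deprecated form of registerTempTable.
    sdf.createOrReplaceTempView("filtertable")

    # No .show() here: spark.sql() returns a DataFrame, .show() returns None.
    temp_max = spark.sql(
        "SELECT date, station, value AS tmax FROM filtertable "
        "WHERE observation = 'TMAX'"
    )
    temp_min = spark.sql(
        "SELECT date, station, value AS tmin FROM filtertable "
        "WHERE observation = 'TMIN'"
    )

    # Assumed join condition: same date and same station. Passing a list of
    # column names keeps only one copy of each join column in the result.
    joined = temp_max.join(temp_min, ["date", "station"])

    # Columns are read with square brackets; a DataFrame is not callable.
    return joined.select(
        "date",
        "station",
        ((joined["tmax"] - joined["tmin"]) / 10).alias("range"),
    )
```

To look at the result, print it in its own statement, for example temperature_range(spark, inputs, observation_schema).show(), rather than assigning the return value of show() to a variable.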

pansen answered May 02 '23 12:05
