I have the code below. Essentially, I'm trying to generate some new columns from the values in existing ones, and then save the dataframe with the new columns as a table in the cluster. Sorry, I'm still new to pyspark.
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
from pyspark.sql.functions import udf, array
from pyspark.sql.types import DecimalType
import numpy as np
import math
df = sqlContext.sql('select * from db.mytable')
angle_av = udf(lambda (x, y): -10 if x == 0 else math.atan2(y/x)*180/np.pi, DecimalType(20,10))
df = df.withColumn('a_v_angle', angle_av(array('a_v_real', 'a_v_imag')))
df.createOrReplaceTempView('temp')
sqlContext.sql('create table new_table as select * from temp')
These operations don't actually produce any errors. I then attempt to store the df as a table and get the following error (I'm guessing because this is when the operations are actually executed):
File "/usr/hdp/current/spark2-client/python/pyspark/worker.py", line 171, in main
process()
File "/usr/hdp/current/spark2-client/python/pyspark/worker.py", line 166, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/hdp/current/spark2-client/python/pyspark/worker.py", line 103, in <lambda>
func = lambda _, it: map(mapper, it)
File "<string>", line 1, in <lambda>
File "/usr/hdp/current/spark2-client/python/pyspark/worker.py", line 70, in <lambda>
return lambda *a: f(*a)
File "<stdin>", line 14, in <lambda>
TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
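The timing of the failure is expected: Spark evaluates transformations lazily, so the UDF only actually runs when an action (here, the CREATE TABLE) forces it. For debugging, forcing a small action right after the withColumn call surfaces the same failure immediately, for example:

df.select('a_v_angle').limit(5).collect()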
This happens because the input values are null / None. The function should check its inputs and proceed accordingly:

if x == 0 or x is None

or just

if not x

(not x is true for both None and 0, so the single check covers both cases)
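A minimal sketch of a corrected UDF along these lines (safe_angle is a hypothetical name; this keeps the original -10 sentinel for x == 0, and note that math.atan2 takes y and x as two separate arguments, so the original math.atan2(y/x) call would fail even on non-null rows):

from pyspark.sql.functions import udf, array
from pyspark.sql.types import DecimalType
from decimal import Decimal
import math

def safe_angle(arr):
    # Explicitly unpack the two array elements; tuple parameters in a
    # lambda, as in lambda (x, y), are Python 2 only
    x, y = arr[0], arr[1]
    # Guard against nulls before any arithmetic; unguarded None values
    # are what caused the NoneType division error above
    if x is None or y is None:
        return None
    if x == 0:
        return Decimal(-10)  # keep the original -10 sentinel for x == 0
    # atan2 takes two arguments; round to 10 places so the result
    # fits DecimalType(20, 10)
    return Decimal(str(round(math.atan2(y, x) * 180 / math.pi, 10)))

angle_av = udf(safe_angle, DecimalType(20, 10))
df = df.withColumn('a_v_angle', angle_av(array('a_v_real', 'a_v_imag')))

If the -10 sentinel isn't actually required, the Python UDF can be dropped entirely in favor of Spark's built-in atan2 and degrees column functions (toDegrees on Spark < 2.1), which handle nulls natively (null in, null out) and avoid the Python serialization overhead:

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# atan2 is well defined for x == 0, so no sentinel is needed;
# null inputs simply produce null outputs
df = df.withColumn(
    'a_v_angle',
    F.degrees(F.atan2('a_v_imag', 'a_v_real')).cast(DecimalType(20, 10))
)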