I am trying to create a UDF that takes an array value from a column and returns an array containing only the unique elements. Please see the code below, written against Spark (version 1.6.1):
def uniq_array(col_array):
    x = np.unique(col_array)
    return x

uniq_array_udf = udf(uniq_array, ArrayType())
However, I am continuously running into the error:
TypeError: __init__() takes at least 2 arguments (1 given)
Can anyone please help me resolve this error?
Thanks!
1) When we use UDFs, we lose the optimizations Spark applies to our DataFrame/Dataset: to Spark's optimizer, a UDF is a black box. Consider, as an example, a common optimization applied when reading data from a database or from columnar-format files such as Parquet: predicate pushdown.
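As a minimal sketch of the difference (assuming a local Spark 2.x runtime and a hypothetical `/data/people.parquet` file; the column and function names are illustrative), compare a native column predicate, which Spark can push down to the Parquet reader, with an equivalent UDF, which forces Spark to read and deserialize every row first:

```python
def is_adult(age):
    # Pure-Python predicate; once wrapped in a UDF, Spark's optimizer
    # cannot see inside it, so no predicate pushdown is possible.
    return age is not None and age > 21

def demo():
    # Illustrative only: requires a Spark runtime and the Parquet file above.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.read.parquet("/data/people.parquet")  # hypothetical path

    # Native column expression: the filter can be pushed down to the reader.
    native = df.filter(col("age") > 21)

    # UDF-based filter: a black box, so every row is read and deserialized
    # before the Python function runs.
    is_adult_udf = udf(is_adult, BooleanType())
    opaque = df.filter(is_adult_udf(col("age")))
```

Comparing `native.explain()` and `opaque.explain()` would show the pushed filter only in the native plan.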
You define a new UDF by passing a Scala function as the input parameter of the udf function, which accepts Scala functions of up to 10 input parameters. You can register UDFs for use in SQL-based query expressions via UDFRegistration (available through the SparkSession.udf attribute).
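A short PySpark sketch of that registration path (the SparkSession.udf API is Spark 2.x; on Spark 1.6, as in the question, the equivalent is sqlContext.registerFunction; the function name here is illustrative):

```python
def strlen(s):
    # Plain Python function to expose to SQL; None-safe.
    return len(s) if s is not None else 0

def demo():
    # Illustrative only: requires a Spark runtime.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # spark.udf is the UDFRegistration mentioned above.
    spark.udf.register("strlen", strlen, IntegerType())

    # The registered name is now usable in SQL query expressions.
    spark.sql("SELECT strlen('hello')").show()
```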
For ArrayType, the element type of the array must also be specified, e.g.:
def uniq_array(col_array):
    x = np.unique(col_array)
    return x

uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
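One related wrinkle worth noting: np.unique returns a numpy.ndarray of NumPy scalar types, which Spark may fail to serialize into an ArrayType column. A common workaround, sketched below under that assumption, is to convert the result to a plain Python list with .tolist():

```python
import numpy as np

def uniq_array(col_array):
    # np.unique returns a sorted numpy.ndarray; .tolist() converts both the
    # container and its NumPy scalar elements to plain Python types, which
    # Spark can serialize into an ArrayType(IntegerType()) column.
    return np.unique(col_array).tolist()

def demo():
    # Illustrative only: requires a Spark runtime.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
```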