I have just started using Databricks/PySpark. I'm using Python with Spark 2.1. I have uploaded data to a table consisting of a single column of strings. I want to apply a mapping function to each element in the column. I load the table into a DataFrame:
df = spark.table("mynewtable")
The only approach I found suggested was to convert it to an RDD, apply the mapping function there, and then convert back to a DataFrame to show the data. But this throws a "job aborted due to stage failure" error:
df2 = df.select("_c0").rdd.flatMap(lambda x: x.append("anything")).toDF()
All I want to do is apply some sort of map function to the data in my table (for example, append something to each string in the column, or split on a character), then put the result back into a DataFrame so I can .show() or display it.
You cannot use flatMap, because it will flatten the Row. You cannot use append, because tuple and Row have no append method, and append (where present on a collection) is executed for its side effects and returns None.
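A plain-Python sketch of why the original attempt fails: PySpark's Row is a tuple subclass, so it has no append method, and even list.append mutates in place and returns None (which is why the lambda produced nothing useful):

```python
row = ("some string",)              # a Row behaves like a tuple
has_append = hasattr(row, "append")  # False: tuples have no append method

lst = ["some string"]
result = lst.append("anything")      # append mutates lst in place...
# ...and returns None, so lambda x: x.append(...) yields None for every record
```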
I would use withColumn:

from pyspark.sql.functions import lit

df.withColumn("foo", lit("anything"))
but map should work as well:

df.select("_c0").rdd.map(lambda x: x + ("anything", )).toDF()
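To see what the function passed to map is doing, here is the same lambda applied outside Spark; each incoming Row behaves like a one-element tuple, and concatenating a one-element tuple appends a new field:

```python
# The function given to rdd.map receives each Row (a tuple subclass);
# tuple concatenation produces a new two-field tuple.
append_anything = lambda x: x + ("anything",)

out = append_anything(("some string",))  # simulating a single-column Row
```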
Edit (given the comment):

You probably want a udf:

from pyspark.sql.functions import udf

def iplookup(s):
    return ... # Some lookup logic

iplookup_udf = udf(iplookup)

df.withColumn("foo", iplookup_udf("_c0"))

The default return type is StringType, so if you want something else you should adjust it.
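For instance, if the lookup logic returned an integer, you would pass an explicit IntegerType as the second argument to udf. A minimal sketch, using a hypothetical as_int_length function as a stand-in for the real lookup logic (the Spark-side wiring is shown in comments since it needs a running session):

```python
def as_int_length(s):
    # hypothetical stand-in for the lookup logic: returns an int, not a string
    return len(s) if s is not None else None

# Wrapping it as a UDF with an explicit, non-default return type:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import IntegerType
# length_udf = udf(as_int_length, IntegerType())
# df.withColumn("foo", length_udf("_c0"))
```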