I want to know how to map values in a specific column in a dataframe.
I have a dataframe which looks like:
df = sc.parallelize([('india','japan'),('usa','uruguay')]).toDF(['col1','col2'])
+-----+-------+
| col1| col2|
+-----+-------+
|india| japan|
| usa|uruguay|
+-----+-------+
I have a dictionary (as an RDD of key-value pairs) from which I want to map the values.
dicts = sc.parallelize([('india','ind'), ('usa','us'),('japan','jpn'),('uruguay','urg')])
The output I want is:
+-----+-------+--------+--------+
| col1| col2|col1_map|col2_map|
+-----+-------+--------+--------+
|india| japan| ind| jpn|
| usa|uruguay| us| urg|
+-----+-------+--------+--------+
I have tried using the lookup function, but it doesn't work: it throws a SPARK-5063 error. This is the approach that failed:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def map_val(x):
    return dicts.lookup(x)[0]

myfun = udf(lambda x: map_val(x), StringType())

df = df.withColumn('col1_map', myfun('col1'))  # doesn't work
df = df.withColumn('col2_map', myfun('col2'))  # doesn't work
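This fails because of SPARK-5063: an RDD such as dicts cannot be referenced from inside a UDF (or any other transformation); RDD operations like lookup() can only be invoked on the driver. One workaround, shown here as a minimal sketch and assuming the mapping is small enough to collect to the driver, is to turn the RDD into a plain dict and broadcast it:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Collect the small mapping RDD to the driver as a plain dict and broadcast it,
# so the UDF closes over ordinary Python data instead of an RDD.
mapping_bc = sc.broadcast(dicts.collectAsMap())

map_val = udf(lambda x: mapping_bc.value.get(x), StringType())

df = df.withColumn('col1_map', map_val(col('col1'))) \
       .withColumn('col2_map', map_val(col('col2')))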
Solution: the PySpark SQL function create_map() builds a MapType column from a list of key/value columns. Here it is combined with lit() to turn a plain Python dict into a map expression that can be indexed with a column.
I think the easier way is just to use a simple dictionary and df.withColumn:
from itertools import chain
from pyspark.sql.functions import create_map, lit

simple_dict = {'india': 'ind', 'usa': 'us', 'japan': 'jpn', 'uruguay': 'urg'}

# Build a literal map expression: create_map(lit('india'), lit('ind'), ...)
mapping_expr = create_map([lit(x) for x in chain(*simple_dict.items())])

df = df.withColumn('col1_map', mapping_expr[df['col1']]) \
       .withColumn('col2_map', mapping_expr[df['col2']])

df.show(truncate=False)
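A join-based alternative, sketched here under the assumption that the pairs in dicts can be turned into a small lookup DataFrame, avoids embedding the whole mapping in a literal expression and scales better when the dictionary is large (the joins do change the column order slightly):

# Build a two-column lookup DataFrame from the same key/value pairs.
lookup = dicts.toDF(['key', 'value'])

# Join the lookup table once per column to be mapped.
df = df \
    .join(lookup.withColumnRenamed('key', 'col1').withColumnRenamed('value', 'col1_map'),
          on='col1', how='left') \
    .join(lookup.withColumnRenamed('key', 'col2').withColumnRenamed('value', 'col2_map'),
          on='col2', how='left')

df.show(truncate=False)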