I have a pyspark dataframe with two columns:
[Row(zip_code='58542', dma='MIN'),
Row(zip_code='58701', dma='MIN'),
Row(zip_code='57632', dma='MIN'),
Row(zip_code='58734', dma='MIN')]
How can I make a key:value pair out of the data inside the columns?
e.g.:
{
"58542":"MIN",
"58701:"MIN",
etc..
}
I would like to avoid using collect for performance reasons. I've tried a few things but can't seem to get just the values.
One option is to go through pandas. Convert the PySpark DataFrame to a pandas DataFrame with df.toPandas(), which returns a pandas DataFrame with the same content; then build a pandas Series with the key column as the index and the other column as the values, and call to_dict() on that Series. Note that calling to_dict() on the whole DataFrame with the default orient returns the nested format {column -> {index -> value}}, which is not the flat mapping you want here, so use the Series form.
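A minimal sketch of that approach, assuming df is the DataFrame from the question (variable names here are just illustrative):

# toPandas() pulls the whole DataFrame to the driver, just like collect(),
# so this only suits data that fits in driver memory.
pdf = df.toPandas()
zip_to_dma = pdf.set_index('zip_code')['dma'].to_dict()
# {'58542': 'MIN', '58701': 'MIN', '57632': 'MIN', '58734': 'MIN'}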
You can simply do this (note the column is named zip_code, not zipcode, and it's better not to shadow the built-in dict):
zip_dict = {row['zip_code']: row['dma'] for row in df.collect()}
print(zip_dict)
#{'58542': 'MIN', '58701': 'MIN', '57632': 'MIN', '58734': 'MIN'}
Keep in mind this uses collect(), so the whole result lands on the driver.
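If the worry about collect() is driver memory specifically, toLocalIterator() streams partitions to the driver one at a time instead of materializing everything at once (the resulting dict still lives on the driver, of course):

# Same comprehension, but rows arrive partition by partition.
zip_dict = {row['zip_code']: row['dma'] for row in df.toLocalIterator()}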
You can avoid using a udf here by using pyspark.sql.functions.struct and pyspark.sql.functions.to_json (Spark version 2.1 and above):
import pyspark.sql.functions as f
from pyspark.sql import Row
data = [
Row(zip_code='58542', dma='MIN'),
Row(zip_code='58701', dma='MIN'),
Row(zip_code='57632', dma='MIN'),
Row(zip_code='58734', dma='MIN')
]
df = spark.createDataFrame(data)
df.withColumn("json", f.to_json(f.struct("dma", "zip_code"))).show(truncate=False)
#+---+--------+--------------------------------+
#|dma|zip_code|json |
#+---+--------+--------------------------------+
#|MIN|58542 |{"dma":"MIN","zip_code":"58542"}|
#|MIN|58701 |{"dma":"MIN","zip_code":"58701"}|
#|MIN|57632 |{"dma":"MIN","zip_code":"57632"}|
#|MIN|58734 |{"dma":"MIN","zip_code":"58734"}|
#+---+--------+--------------------------------+
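Since each row of the json column is now a self-contained JSON string, one way to keep everything distributed is to write that column straight out with the text writer instead of collecting. A small sketch; the output path is just a placeholder:

json_df = df.withColumn("json", f.to_json(f.struct("dma", "zip_code")))
# One JSON object per line; nothing is pulled back to the driver.
json_df.select("json").write.mode("overwrite").text("/tmp/zip_dma_json")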
If you instead wanted the zip_code to be the key, you can create a MapType directly using pyspark.sql.functions.create_map:
df.withColumn("json", f.create_map(["zip_code", "dma"])).show(truncate=False)
#+---+--------+-----------------+
#|dma|zip_code|json |
#+---+--------+-----------------+
#|MIN|58542 |Map(58542 -> MIN)|
#|MIN|58701 |Map(58701 -> MIN)|
#|MIN|57632 |Map(57632 -> MIN)|
#|MIN|58734 |Map(58734 -> MIN)|
#+---+--------+-----------------+
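You can also combine the two, wrapping the map in to_json() so each row becomes exactly the {zip_code: dma} fragment from the question; a quick sketch:

df.select(
    f.to_json(f.create_map("zip_code", "dma")).alias("json")
).show(truncate=False)
#+---------------+
#|json           |
#+---------------+
#|{"58542":"MIN"}|
#|{"58701":"MIN"}|
#|{"57632":"MIN"}|
#|{"58734":"MIN"}|
#+---------------+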
There is one more way to convert your DataFrame into a dict: through the underlying RDD. Since a dictionary is itself a collection of key-value pairs, you first map the rows into a key-value pair RDD, because collectAsMap() is only defined on pair RDDs.
data = spark.createDataFrame([
    Row(zip_code='58542', dma='MIN'),
    Row(zip_code='58701', dma='MIN'),
    Row(zip_code='57632', dma='MIN'),
    Row(zip_code='58734', dma='MIN')
])
>>> data.show()
+---+--------+
|dma|zip_code|
+---+--------+
|MIN| 58542|
|MIN| 58701|
|MIN| 57632|
|MIN| 58734|
+---+--------+
newrdd = data.rdd
# Use field names rather than positional indexes so the mapping
# does not depend on column order.
keypair_rdd = newrdd.map(lambda row: (row['zip_code'], row['dma']))
Once you have a key-value pair RDD, simply use collectAsMap() to convert it into a dictionary. Note that collectAsMap(), like collect(), brings the full result back to the driver.
>>> zip_dma = keypair_rdd.collectAsMap()
>>> print(zip_dma)
{'58542': 'MIN', '57632': 'MIN', '58734': 'MIN', '58701': 'MIN'}
>>> list(zip_dma.keys())
['58542', '57632', '58734', '58701']
>>> zip_dma.get('58542')
'MIN'
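On Spark 2.4+ there is also a DataFrame-only variant of this idea, sketched below: aggregate every (zip_code, dma) pair into a single MapType value with map_from_entries over collect_list, then fetch just that one value. The final dict still ends up on the driver (any Python dict must), but the aggregation itself runs in Spark; map_df and zip_to_dma are just illustrative names.

import pyspark.sql.functions as f

# Spark 2.4+: fold all (zip_code, dma) pairs into one MapType value.
map_df = data.agg(
    f.map_from_entries(
        f.collect_list(f.struct("zip_code", "dma"))
    ).alias("zip_to_dma")
)
zip_to_dma = map_df.first()["zip_to_dma"]
# {'58542': 'MIN', '58701': 'MIN', '57632': 'MIN', '58734': 'MIN'}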