PySpark - create DataFrame grouping columns into a map-type structure

My DataFrame has the following structure:

-------------------------
| Brand | type | amount |
-------------------------
|   B   |  a   |   10   |
|   B   |  b   |   20   |
|   C   |  c   |   30   |
-------------------------

I want to reduce the number of rows by grouping type and amount into a single column of type Map, so that Brand becomes unique and MAP_type_AMOUNT holds a key/value pair for each type/amount combination.

I think Spark SQL might have some functions to help with this, or do I have to get the RDD behind the DataFrame and do my "own" conversion to a map type?

Expected:

---------------------------
| Brand | MAP_type_AMOUNT |
---------------------------
|   B   | {a: 10, b: 20}  |
|   C   | {c: 30}         |
---------------------------
Alg_D asked Aug 06 '17

People also ask

How do I create a column map in Spark?

We can create a map column using the createMapType() function on the DataTypes class. This method takes two arguments, keyType and valueType, and both must be types that extend DataType. The referenced snippet creates a "mapCol" object of type MapType with keys and values of String type.
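createMapType() belongs to the Java/Scala DataTypes class; in PySpark the equivalent is constructing MapType directly. A minimal sketch of a schema with a "mapCol" map column of String keys and values (the column names here are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, MapType

spark = SparkSession.builder.getOrCreate()

# "mapCol" holds String keys and String values, the Python analogue
# of DataTypes.createMapType(StringType, StringType) on the JVM side.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("mapCol", MapType(StringType(), StringType()), True),
])

df = spark.createDataFrame([("B", {"a": "10"})], schema)
df.printSchema()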

How do you define a map type in PySpark?

To create a PySpark MapType, use the MapType() constructor. MapType key points: the first parameter, keyType, specifies the type of the map's keys; the second parameter, valueType, specifies the type of its values.
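A short sketch of the constructor; the optional third parameter, valueContainsNull, defaults to True:

from pyspark.sql.types import MapType, StringType, IntegerType

# First param keyType, second param valueType,
# optional third param valueContainsNull (defaults to True).
map_type = MapType(StringType(), IntegerType())

print(map_type.keyType)            # type of the map's keys
print(map_type.valueType)          # type of the map's values
print(map_type.valueContainsNull)  # True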

How do I combine columns in Spark data frame?

Spark SQL functions provide concat() to concatenate two or more DataFrame columns into a single column. It can also take columns of different data types and concatenate them into a single column; for example, it supports String, Int, Boolean, and also arrays.
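A quick sketch of concat() joining a String column and an Int column (concat_ws() is the variant that inserts a separator); the example DataFrame is made up for illustration:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("B", 10)], ["Brand", "Amount"])

# concat() joins the values end-to-end; the Int column is implicitly
# cast to string when mixed with string arguments.
df.select(F.concat(df.Brand, F.lit("_"), df.Amount).alias("joined")).show()
# +------+
# |joined|
# +------+
# |  B_10|
# +------+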


1 Answer

Slight improvement to Prem's answer (sorry I can't comment yet)

Use func.create_map instead of func.struct; see the documentation.

import pyspark.sql.functions as func

# Build the example DataFrame (assumes an active SparkContext `sc`).
df = sc.parallelize([('B', 'a', 10), ('B', 'b', 20),
                     ('C', 'c', 30)]).toDF(['Brand', 'Type', 'Amount'])

# Wrap each (Type, Amount) pair in a single-entry map, then collect
# the maps into a list per Brand.
df_converted = df.groupBy("Brand").agg(
    func.collect_list(func.create_map(func.col("Type"),
                                      func.col("Amount"))).alias("MAP_type_AMOUNT"))

print(df_converted.collect())

Output:

[Row(Brand='B', MAP_type_AMOUNT=[{'a': 10}, {'b': 20}]),
 Row(Brand='C', MAP_type_AMOUNT=[{'c': 30}])]
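
Note that this produces a list of single-entry maps per Brand rather than the one merged map shown in the question's expected output. On Spark 2.4+ the group can be collapsed into a single map; a sketch using map_from_entries():

# Assumes Spark >= 2.4, where map_from_entries() is available:
# collect the (Type, Amount) pairs as structs, then turn the list into one map.
df_merged = df.groupBy("Brand").agg(
    func.map_from_entries(
        func.collect_list(func.struct("Type", "Amount"))
    ).alias("MAP_type_AMOUNT"))

print(df_merged.collect())
# [Row(Brand='B', MAP_type_AMOUNT={'a': 10, 'b': 20}),
#  Row(Brand='C', MAP_type_AMOUNT={'c': 30})]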
osbon123 answered Oct 23 '22