pyspark - create DataFrame Grouping columns in map type structure

My DataFrame has the following structure:

| Brand | type |  amount|
|  B   |   a  |   10   |
|  B   |   b  |   20   |
|  C   |   c  |   30   |

I want to reduce the number of rows by grouping type and amount into a single column of type Map, so that Brand is unique and MAP_type_AMOUNT holds a key/value pair for each type/amount combination.

I think Spark SQL might have some functions to help with this, or do I have to get the RDD behind the DataFrame and write my "own" conversion to a map type?


    | Brand | MAP_type_AMOUNT |
    |  B    | {a: 10, b: 20}  |
    |  C    | {c: 30}         |
1 Answer

Slight improvement to Prem's answer (sorry I can't comment yet)

Use func.create_map instead of func.struct. See the documentation.

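import pyspark.sql.functions as func
# `sc` is the SparkContext that the pyspark shell provides.
df = sc.parallelize([('B','a',10),('B','b',20),
                     ('C','c',30)]).toDF(['Brand','type','amount'])

# Build a one-entry map per row, then collect those maps for each Brand.
df_converted = df.groupBy("Brand").\
    agg(func.collect_list(func.create_map(func.col("type"),
        func.col("amount"))).alias("MAP_type_AMOUNT"))

print(df_converted.collect())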


[Row(Brand=u'B', MAP_type_AMOUNT=[{u'a': 10}, {u'b': 20}]),
 Row(Brand=u'C', MAP_type_AMOUNT=[{u'c': 30}])]
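
Note that the result above is a list of one-entry maps per Brand, not the single map the question asks for. If the collected result is small enough to handle on the driver, one way to finish the job is to merge the maps in plain Python after collect(). This is only a sketch: `collected` is a hypothetical stand-in for the collected rows, written as plain tuples so it runs without a Spark session, and `merge_maps` is a helper name I made up.

```python
# Plain-Python sketch: merge each brand's list of one-entry maps into one dict.
# `collected` is a hypothetical stand-in for df_converted.collect(), written as
# (Brand, MAP_type_AMOUNT) tuples so the snippet runs without a Spark session.
collected = [
    ("B", [{"a": 10}, {"b": 20}]),
    ("C", [{"c": 30}]),
]


def merge_maps(maps):
    """Fold a list of dicts into one dict; later keys overwrite earlier ones."""
    merged = {}
    for m in maps:
        merged.update(m)
    return merged


# One merged map per brand, keyed by the brand name.
brand_to_map = {brand: merge_maps(maps) for brand, maps in collected}
print(brand_to_map)
```

With real Row objects the same idea would read `{row.Brand: merge_maps(row.MAP_type_AMOUNT) for row in df_converted.collect()}`. If the same type appears twice for a brand, the later amount wins, so check whether that is acceptable for your data. On Spark 2.4 or later, `func.map_from_entries(func.collect_list(func.struct("type", "amount")))` should build the single map inside Spark instead, though I have not run that against this data.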
