
How to create new DataFrame with dict

Tags: pyspark

I have a dict, like:

cMap = {"k1" : "v1", "k2" : "v1", "k3" : "v2", "k4" : "v2"}

and one DataFrame A, like:

+---+
|key|
+---+
| k1|
| k2|
| k3|
| k4|
+---+

DataFrame A can be created with this code:

# Each row must be a tuple; ('k1') without the trailing comma is just a string.
data = [('k1',),
    ('k2',),
    ('k3',),
    ('k4',)]
A = spark.createDataFrame(data, ['key'])

I want to get a new DataFrame with one boolean column per dict value, like:

+---+----------+----------+
|key|   v1     |    v2    |
+---+----------+----------+
| k1|true      |false     |
| k2|true      |false     |
| k3|false     |true      |
| k4|false     |true      |
+---+----------+----------+

Any suggestions would be appreciated, thanks!

Ivan Lee asked May 03 '17 05:05



1 Answer

I just wanted to contribute a different and possibly easier way to solve this.

In my code I first convert the dict to a pandas DataFrame, which I find much easier, and then pass that pandas DataFrame directly to spark.createDataFrame to get a Spark DataFrame.

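import pandas as pd

# Dict of columns: each key is a column name, each list holds that column's values.
data = {'visitor': ['foo', 'bar', 'jelmer'],
        'A': [0, 1, 0],
        'B': [1, 0, 1],
        'C': [1, 0, 0]}

df = pd.DataFrame(data)
# spark is assumed to be an existing SparkSession.
ddf = spark.createDataFrame(df)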

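Output (from an older pandas version, which sorted the dict keys alphabetically; recent versions keep insertion order, so visitor would come first):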
+---+---+---+-------+
|  A|  B|  C|visitor|
+---+---+---+-------+
|  0|  1|  1|    foo|
|  1|  0|  0|    bar|
|  0|  1|  0| jelmer|
+---+---+---+-------+
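
Applied to the cMap from the question, the same "build the table locally, then hand it to Spark" idea might look like the sketch below. It is only one possible approach, not something confirmed in the thread: the helper names one_hot_rows and to_spark are hypothetical, spark is assumed to be an existing SparkSession, and the rows are built in plain Python rather than pandas so the first part runs without Spark installed.

```python
# Mapping from the question: key -> value label.
cMap = {"k1": "v1", "k2": "v1", "k3": "v2", "k4": "v2"}


def one_hot_rows(mapping):
    """Return (columns, rows) with one boolean column per distinct dict value.

    Hypothetical helper for illustration, not part of any library.
    """
    labels = sorted(set(mapping.values()))  # e.g. ['v1', 'v2']
    columns = ["key"] + labels
    # For each key, emit True in the column matching its value, False elsewhere.
    rows = [
        (key,) + tuple(value == label for label in labels)
        for key, value in mapping.items()
    ]
    return columns, rows


def to_spark(spark, columns, rows):
    """Build a Spark DataFrame from the rows.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.createDataFrame(rows, columns)


columns, rows = one_hot_rows(cMap)
for row in rows:
    print(row)
```

Calling to_spark(spark, columns, rows) should then give a DataFrame with the columns key, v1 and v2. This sketch takes the keys from cMap itself; if DataFrame A may contain keys that are missing from the dict, joining A to the result on key (for example A.join(flags, on='key', how='left')) is one way to keep A's rows.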
J. Offenberg answered Oct 01 '22 19:10