Use dictionary data to append data to pandas dataframe

I have a dataframe df and two separate dictionaries. Both dictionaries have the same keys but different values. In dict_1, each key maps to a list of unique ids that appear in the id column of df. I want to use the ids in dict_1 to look up the corresponding values in dict_2 and add those values to df as new columns.

Example of dataframe df:

col_1    col_2    id   col_3
 100      500     a1    478
 785      400     a1    490
 ...      ...     a1    ...
 ...      ...     a2    ...
 ...      ...     a2    ...
 ...      ...     a2    ...
 ...      ...     a3    ...
 ...      ...     a3    ...
 ...      ...     a3    ...
 ...      ...     a4    ...
 ...      ...     a4    ...
 ...      ...     a4    ...

Example of dict_1:

1:['a1', 'a3'],
2:['a2', 'a4'],
3:[...],
4:[...],
5:[...],
.

Example of dict_2:

1:[0, 1],
2:[1, 1],
3:[...],
4:[...],
5:[...],
.

I'm trying to add the data from dict_2 to the main df, using the ids from dict_1 to match rows. In other words, the 2 values (or n values) in each list of dict_2 should become 2 columns (or n columns) in df.

Resultant df:

col_1    col_2    id   col_3   new_col_1   new_col_2 
 100      500     a1    478        0           1
 785      400     a1    490        0           1
 ...      ...     a1    ...        0           1
 ...      ...     a2    ...        1           1
 ...      ...     a2    ...        1           1
 ...      ...     a2    ...        1           1
 ...      ...     a3    ...        0           1
 ...      ...     a3    ...        0           1
 ...      ...     a3    ...        0           1
 ...      ...     a4    ...        1           1
 ...      ...     a4    ...        1           1
 ...      ...     a4    ...        1           1
user9996043 asked May 26 '20 16:05

4 Answers

IIUC, the keys in your two dictionaries are aligned. One way is to create a dataframe with an id column containing the values from dict_1, plus 2 columns (in this case, but it can be more) built from the values in dict_2, aligned on the same key. Then merge on id to bring the result back into df.

import pandas as pd

# The two dictionaries. Note that in dict_2 I added an extra element to the list for key 2
# to show that this works for any number of columns.
dict_1 = {1:['a1', 'a3'],2:['a2', 'a4'],}
dict_2 = {1:[0,1],2:[1,1,2]}

# create a dataframe from dict_2; there may be an easier way, but I can't find it
df_2 = pd.concat([pd.Series(vals, name=key) 
                  for key, vals in dict_2.items()], axis=1).T
print(df_2) # the index holds the keys, and the columns are the future new_col_x
     0    1    2
1  0.0  1.0  NaN
2  1.0  1.0  2.0

# concat with dict_1 once the values in its lists have been exploded;
# this is just a print to see what it's doing
print (pd.concat([pd.Series(dict_1, name='id').explode(),df_2], axis=1))
   id    0    1    2
1  a1  0.0  1.0  NaN
1  a3  0.0  1.0  NaN
2  a2  1.0  1.0  2.0
2  a4  1.0  1.0  2.0

# reuse the previous concat, rename the columns, and merge the result into df
df = df.merge(pd.concat([pd.Series(dict_1, name='id').explode(),df_2], axis=1)
                .rename(columns=lambda x: f'new_col_{x+1}' 
                                          if isinstance(x, int) else x), 
              on='id', how='left')

and you get

print (df)
   col_1 col_2  id col_3  new_col_1  new_col_2  new_col_3
0    100   500  a1   478        0.0        1.0        NaN
1    785   400  a1   490        0.0        1.0        NaN
2    ...   ...  a1   ...        0.0        1.0        NaN
3    ...   ...  a2   ...        1.0        1.0        2.0
4    ...   ...  a2   ...        1.0        1.0        2.0
5    ...   ...  a2   ...        1.0        1.0        2.0
6    ...   ...  a3   ...        0.0        1.0        NaN
7    ...   ...  a3   ...        0.0        1.0        NaN
8    ...   ...  a3   ...        0.0        1.0        NaN
9    ...   ...  a4   ...        1.0        1.0        2.0
10   ...   ...  a4   ...        1.0        1.0        2.0
11   ...   ...  a4   ...        1.0        1.0        2.0
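
For readers who want to run this idea end to end, here is a minimal sketch under some assumptions: the small df and its col_1 values are made up for illustration, and the lookup table is built with DataFrame.from_dict and join rather than the concat calls above. It should not be read as the exact code from this answer.

```python
import pandas as pd

# Toy stand-in for df; the rows and col_1 values are invented for illustration.
df = pd.DataFrame({
    "col_1": [100, 785, 10, 20],
    "id": ["a1", "a1", "a2", "a3"],
})
dict_1 = {1: ["a1", "a3"], 2: ["a2", "a4"]}
dict_2 = {1: [0, 1], 2: [1, 1, 2]}

# One row per key, one column per list position; shorter lists are padded with NaN.
values = pd.DataFrame.from_dict(dict_2, orient="index")
values.columns = [f"new_col_{i + 1}" for i in range(values.shape[1])]

# One row per id, indexed by the key that the id belongs to.
ids = pd.Series(dict_1, name="id").explode().to_frame()

# Attach the value columns to each id through the shared key, then merge into df.
lookup = ids.join(values)
result = df.merge(lookup, on="id", how="left")
print(result)
```

Using how="left" keeps every row of df, so an id that is missing from dict_1 would simply get NaN in the new columns.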
Ben.T answered Oct 08 '22 11:10


Let us try explode with map. The code below maps each id in df to its key in dict_1:

s=pd.Series(dict_1).explode().reset_index()
s.columns=[1,2]
df['new_1']=df.id.map(dict(zip(s[2],s[1])))

#s=pd.Series(dict_2).explode().reset_index()
#s.columns=[1,2]
#df['new_2']=df.id.map(dict(zip(s[2],s[1])))
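
One possible way to carry the map idea through to the dict_2 values is sketched below. It is an illustration, not part of the original answer: the toy df is made up, the id-to-key lookup is built with a plain dict comprehension instead of explode, and it assumes every id in df appears in dict_1.

```python
import pandas as pd

# Toy frame for illustration; every id here is assumed to appear in dict_1.
df = pd.DataFrame({"id": ["a1", "a1", "a2", "a3", "a4"]})
dict_1 = {1: ["a1", "a3"], 2: ["a2", "a4"]}
dict_2 = {1: [0, 1], 2: [1, 1]}

# Invert dict_1 so that each id points at its key.
id_to_key = {id_: key for key, ids in dict_1.items() for id_ in ids}

# Map each row's id to its key, then map the key to its dict_2 list.
keys = df["id"].map(id_to_key)
lists = keys.map(dict_2)

# Expand the lists into columns aligned with df's index and attach them.
new_cols = pd.DataFrame(lists.tolist(), index=df.index)
new_cols.columns = [f"new_col_{i + 1}" for i in new_cols.columns]
out = pd.concat([df, new_cols], axis=1)
print(out)
```

The two-step map avoids a merge entirely, which can be convenient when df has a custom index that should be preserved.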
BENY answered Oct 08 '22 11:10


Assume the lists in dict_2 hold n values and you want to construct n new columns in df, for example:

dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}

Use a dict comprehension to construct a new dictionary, keyed by id, from dict_2 and dict_1, and use it to construct a new dataframe with orient='index'. Chain rename and add_prefix to name the new columns. Finally, merge it back to df with the options left_on='id', right_index=True.

key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}

df_add = (pd.DataFrame.from_dict(key_dict, orient='index')
                      .rename(lambda x: int(x)+1, axis=1).add_prefix('newcol_'))
    
df_final = df.merge(df_add, left_on='id', right_index=True)

Out[33]:
   col_1 col_2  id col_3  newcol_1  newcol_2  newcol_3  newcol_4
0    100   500  a1   478         0         1       NaN       NaN
1    785   400  a1   490         0         1       NaN       NaN
2    ...   ...  a1   ...         0         1       NaN       NaN
3    ...   ...  a2   ...         1         1       6.0       9.0
4    ...   ...  a2   ...         1         1       6.0       9.0
5    ...   ...  a2   ...         1         1       6.0       9.0
6    ...   ...  a3   ...         0         1       NaN       NaN
7    ...   ...  a3   ...         0         1       NaN       NaN
8    ...   ...  a3   ...         0         1       NaN       NaN
9    ...   ...  a4   ...         1         1       6.0       9.0
10   ...   ...  a4   ...         1         1       6.0       9.0
11   ...   ...  a4   ...         1         1       6.0       9.0
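
One detail worth checking when running this: merge defaults to how='inner', so rows of df whose id is not found in dict_1 are dropped. The sketch below illustrates the difference on a made-up df in which the id 'a9' is deliberately absent from dict_1; the data is an assumption for illustration only.

```python
import pandas as pd

# Toy frame; "a9" is deliberately missing from dict_1.
df = pd.DataFrame({"id": ["a1", "a2", "a9"], "col_1": [100, 785, 50]})
dict_1 = {1: ["a1", "a3"], 2: ["a2", "a4"]}
dict_2 = {1: [0, 1], 2: [1, 1, 6, 9]}

# Same construction as above: one dict_2 list per id.
key_dict = {x: v for k, v in dict_2.items() for x in dict_1[k]}
df_add = (pd.DataFrame.from_dict(key_dict, orient="index")
            .rename(lambda x: int(x) + 1, axis=1)
            .add_prefix("newcol_"))

# Default inner join: the "a9" row disappears.
inner = df.merge(df_add, left_on="id", right_index=True)

# Left join: the "a9" row is kept, with NaN in the new columns.
left = df.merge(df_add, left_on="id", right_index=True, how="left")
print(left)
```

Whether dropping or keeping unmatched rows is preferable depends on the data; how='left' is the safer choice if df must keep its original length.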
Andy L. answered Oct 08 '22 11:10


Construct a DataFrame that combines both of the dicts along the keys. Use the DataFrame.from_dict constructor and pandas will deal with the alignment on keys.

Then use wide_to_long to reshape it so that each 'id' in dict_1 gets linked with all of the columns in dict_2. Then this is a simple merge to join back to the original.

Sample Data

dict_1 = {1: ['a1', 'a3'], 2: ['a2', 'a4']}
dict_2 = {1: [0, 1], 2: [1, 1, 2]} 

Code

df1 = pd.concat([pd.DataFrame.from_dict(dict_1, orient='index').add_prefix('id'),
                 pd.DataFrame.from_dict(dict_2, orient='index').add_prefix('new_col')], axis=1)
#  id0 id1  new_col0  new_col1  new_col2
#1  a1  a3         0         1       NaN
#2  a2  a4         1         1       2.0

df1 = (pd.wide_to_long(df1, i=[x for x in df1.columns if 'new_col' in x],
                       j='will_drop', stubnames=['id'])
         .reset_index().drop(columns='will_drop'))
#   new_col0  new_col1  new_col2  id
#0         0         1       NaN  a1
#1         0         1       NaN  a3
#2         1         1       2.0  a2
#3         1         1       2.0  a4

df = df.merge(df1, how='left')

   col_1 col_2  id col_3  new_col0  new_col1  new_col2
0    100   500  a1   478         0         1       NaN
1    785   400  a1   490         0         1       NaN
2    ...   ...  a1   ...         0         1       NaN
3    ...   ...  a2   ...         1         1       2.0
4    ...   ...  a2   ...         1         1       2.0
5    ...   ...  a2   ...         1         1       2.0
6    ...   ...  a3   ...         0         1       NaN
7    ...   ...  a3   ...         0         1       NaN
8    ...   ...  a3   ...         0         1       NaN
9    ...   ...  a4   ...         1         1       2.0
10   ...   ...  a4   ...         1         1       2.0
11   ...   ...  a4   ...         1         1       2.0
ALollz answered Oct 08 '22 11:10
