
Create a dataframe from a dictionary so that every key/value pair becomes a row

I have a dictionary whose keys are patient IDs and whose values are the same for every key: [1, 2, 3], meaning each patient will visit the clinic 3 times. How can I convert it to a dataframe in which every (key, value) pair becomes its own row?

Dictionary:

patients = ['Patient01', 'patient02', 'patient03']
visits = [1,2,3]
dictionary = {k:visits for k in patients}

output:

{'Patient01': [1, 2, 3],
 'patient02': [1, 2, 3],
 'patient03': [1, 2, 3]}

I tried

pd.DataFrame.from_dict(dictionary, orient = 'index')

but the output is

            0   1   2
patient02   1   2   3
patient03   1   2   3
Patient01   1   2   3

and what I want is like this:

          visit_num
Patient01  1
Patient01  2
Patient01  3
patient02  1
patient02  2
patient02  3
patient03  1
patient03  2
patient03  3

Karen Liu asked Nov 30 '22 08:11


2 Answers

Use DataFrame.stack() on the dataframe you created:

df = pd.DataFrame.from_dict(dictionary, orient = 'index')

new_df = df.stack().reset_index(level=1, drop=True).to_frame(name='visit_num')

>>> new_df
           visit_num
Patient01          1
Patient01          2
Patient01          3
patient02          1
patient02          2
patient02          3
patient03          1
patient03          2
patient03          3

Note of explanation:

df.stack() does most of the work here: it takes your original df

           0  1  2
Patient01  1  2  3
patient02  1  2  3
patient03  1  2  3

and turns it into the following multi-indexed pandas.Series:

Patient01  0    1
           1    2
           2    3
patient02  0    1
           1    2
           2    3
patient03  0    1
           1    2
           2    3

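The rest of the line is there to get it into a nice dataframe format: .reset_index(level=1, drop=True) discards the inner index level (the old column labels 0, 1, 2), and .to_frame(name='visit_num') turns the resulting Series into a one-column dataframe.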
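
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It reuses the names from the question (patients, visits, dictionary); the intermediate name stacked is only introduced here for illustration and is not part of the original one-liner.

```python
import pandas as pd

# Same data as in the question
patients = ['Patient01', 'patient02', 'patient03']
visits = [1, 2, 3]
dictionary = {k: visits for k in patients}

# One row per patient, one column per visit position (0, 1, 2)
df = pd.DataFrame.from_dict(dictionary, orient='index')

# stack() moves the column labels into an inner index level,
# giving a Series indexed by (patient, column label)
stacked = df.stack()

# Drop the inner level and turn the Series into a one-column dataframe
new_df = stacked.reset_index(level=1, drop=True).to_frame(name='visit_num')

print(new_df)
```

Splitting the chain into two statements makes it easy to inspect stacked and see the two-level index before it is flattened.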


sacuL answered Dec 01 '22 21:12


Use melt:

df = pd.DataFrame.from_dict(dictionary, orient = 'index')
df.reset_index()\
  .melt('index',value_name='visit_num')\
  .drop('variable', axis=1)\
  .sort_values('index')  # optional: sort so that each patient's rows are grouped together

Output:

       index  visit_num
1  Patient01          1
4  Patient01          2
7  Patient01          3
2  patient02          1
5  patient02          2
8  patient02          3
0  patient03          1
3  patient03          2
6  patient03          3
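
If you want the melt result in exactly the shape asked for in the question (patient IDs as the index, a single visit_num column), something along these lines should work. This is only a sketch: the names long_df, the sort on both columns, and the final set_index / rename_axis steps are my additions for illustration, not part of the snippet above.

```python
import pandas as pd

# Same data as in the question
patients = ['Patient01', 'patient02', 'patient03']
visits = [1, 2, 3]
dictionary = {k: visits for k in patients}

df = pd.DataFrame.from_dict(dictionary, orient='index')

long_df = (
    df.reset_index()                                 # patient IDs become a column named 'index'
      .melt(id_vars='index', value_name='visit_num') # one row per (patient, visit)
      .drop(columns='variable')                      # old column labels 0, 1, 2 are not needed
      .sort_values(['index', 'visit_num'])           # group by patient, visits in order
      .set_index('index')                            # assumption: patient IDs wanted as the index
      .rename_axis(None)                             # drop the leftover index name
)

print(long_df)
```

Sorting on both 'index' and 'visit_num' keeps the row order deterministic instead of relying on how ties are broken when sorting on 'index' alone.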

Scott Boston answered Dec 01 '22 21:12