Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to join a multi-index series to a single index dataframe with Pandas?

Consider the following single index DataFrame:

      energy    fat
1      2000      28
2      1900      17
3      2200      30
4      1750      15
5      1800      18
6      1600      12

I also have a multindex Series:

1  vitamin-c    0.0004
   vitamin-a    0.0150
2  vitamin-c    0.0030
3  vitamin-d    1.2000
   vitamin-e    1.0007
   vitamin-c    1.2020
4  vitamin-a    0.0780
5  vitamin-b    0.9650
6  vitamin-e    1.9801
   vitamin-c    1.0011

How can I join the two so the result looks like this:

      energy    fat          vitamins
1      2000      28     vitamin-c    0.0004
                        vitamin-a    0.0150
2      1900      17     vitamin-c    0.0030
3      2200      30     vitamin-d    1.2000
                        vitamin-e    1.0007
                        vitamin-c    1.2020
4      1750      15     vitamin-a    0.0780
5      1800      18     vitamin-b    0.9650
6      1600      12     vitamin-e    1.9801
                        vitamin-c    1.0011

I tried df.join(series, how = 'inner') but all I got is the following error message:

"ValueError: cannot join with no level specified and no overlapping names"

Can someone please explain me what im doing wrong here and how i can achieve the combination of the two ? Thank you !

like image 519
solub Avatar asked Jan 03 '23 06:01

solub


2 Answers

Option 1
I don't suggest moving things into the index that shouldn't be there.
That said, you can use pd.DataFrame.join if your index levels are appropriately named, or rather they match so pandas knows what to join on.

df.rename_axis('ord').join(s.rename_axis(['ord', 'vit']).rename('val'))

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011

In a couple more lines to add readability

s = s.rename_axis(['ord', 'vit']).rename('val')
df = df.rename_axis('ord')

df.join(s)

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011

Option 2
We can also use pd.concat with loc and pd.Index.get_level_values

pd.concat(
    [df.loc[s.index.get_level_values(0)].set_index(s.index), s.rename('val')],
    axis=1
)

             energy  fat     val
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011
like image 163
piRSquared Avatar answered Jan 13 '23 13:01

piRSquared


If you add names to the index/multiindex you can use a join:

In [11]: df
Out[11]:
   energy  fat
n
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12

In [12]: df2
Out[12]:
                val
n vitamin
1 vitamin-c  0.0004
  vitamin-a  0.0150
2 vitamin-c  0.0030
3 vitamin-d  1.2000
  vitamin-e  1.0007
  vitamin-c  1.2020
4 vitamin-a  0.0780
5 vitamin-b  0.9650
6 vitamin-e  1.9801
  vitamin-c  1.0011

In [13]: df.join(df2)
Out[13]:
             energy  fat     val
n vitamin
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011

Note: Do this by setting the .index.names:

In [21]: df.index.names = ["n"]  # or .name = "n"

In [22]: df2.index.names = ["n", "vitamin"]
like image 33
Andy Hayden Avatar answered Jan 13 '23 13:01

Andy Hayden