
Pandas DataFrame.add() -- ignore missing columns

I have the following two DataFrames:

>>> history
              above below
asn   country
12345 US          5     4
      MX          6     3
54321 MX          4     5
>>> current
              above below
asn   country
12345 MX          1     0
54321 MX          0     1
      US          1     0

I keep a running count of the "above" and "below" values in the history DataFrame like so:

>>> history = history.add(current, fill_value=0)
>>> history
               above  below
asn   country              
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0

This works as long as there are no extra columns in the current DataFrame. However, when I add an extra column:

>>> current
              above below cruft
asn   country
12345 MX          1     0   999
54321 MX          0     1   999
      US          1     0   999

I get the following:

>>> history = history.add(current, fill_value=0)
>>> history
               above  below cruft
asn   country              
12345 MX         7.0    3.0 999.0
      US         5.0    4.0   NaN
54321 MX         4.0    6.0 999.0
      US         1.0    0.0 999.0

I want this extra column to be ignored, since it's not present in both DataFrames. The desired output is just:

>>> history
               above  below
asn   country              
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0
stevendesu asked Feb 28 '18 22:02


3 Answers

In [27]: history.add(current, fill_value=0)[history.columns]
Out[27]:
               above  below
asn   country
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0
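
For anyone who wants to run this, here is a minimal sketch that rebuilds the question's frames and applies the same column selection. The `pd.MultiIndex.from_tuples` construction is an assumption, since the question does not show how the frames were created.

```python
import pandas as pd

# Assumed reconstruction of the question's data; the original setup code is not shown.
history = pd.DataFrame(
    {"above": [5, 6, 4], "below": [4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "US"), (12345, "MX"), (54321, "MX")], names=["asn", "country"]
    ),
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "MX"), (54321, "MX"), (54321, "US")], names=["asn", "country"]
    ),
)

# add() aligns on the union of columns, so "cruft" shows up in the sum;
# selecting history's own columns afterwards drops it again.
result = history.add(current, fill_value=0)[history.columns]
print(result.sort_index())
```

Because the right-hand side is evaluated before assignment, writing `history = history.add(current, fill_value=0)[history.columns]` still selects the columns `history` had before the update.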
MaxU - stop WAR against UA answered Sep 29 '22 14:09


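Another way: concatenate with join='inner', which keeps only the columns common to both frames, then sum the rows that share an index. (sum(level=...) was removed in pandas 2.0, so group by the index levels instead.)

pd.concat([history, current], join='inner', axis=0).groupby(level=[0, 1]).sum()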
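
A self-contained sketch of the concat approach on the question's data, written with `groupby(level=...).sum()`. The frame construction is an assumed reconstruction, not the asker's actual setup.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
history = pd.DataFrame(
    {"above": [5, 6, 4], "below": [4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "US"), (12345, "MX"), (54321, "MX")], names=["asn", "country"]
    ),
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "MX"), (54321, "MX"), (54321, "US")], names=["asn", "country"]
    ),
)

# join="inner" keeps only columns present in both frames, so "cruft" is dropped
# before any arithmetic happens.
stacked = pd.concat([history, current], join="inner", axis=0)

# Rows with the same (asn, country) key are summed together.
result = stacked.groupby(level=["asn", "country"]).sum()
print(result)
```

Unlike `add(..., fill_value=0)`, nothing is filled with a float here, so integer inputs should stay integers instead of becoming 7.0, 3.0, and so on.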
BENY answered Sep 29 '22 14:09


You can first specify a list of columns you need in your final output:

cols_to_return = ["above", "below"]
history = history[cols_to_return].add(current[cols_to_return], fill_value=0)

Specifying the columns beforehand helps you track what you are doing and debug future issues.
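
One possible way to package this, as a sketch: a small helper that fails loudly if a requested column is absent from either frame, so a typo cannot silently produce a column of NaN. The name `accumulate` and the frame construction are hypothetical, introduced only for illustration.

```python
import pandas as pd


def accumulate(history, current, cols):
    """Add `current` into `history`, restricted to the explicitly listed columns.

    Hypothetical helper: raises KeyError if a column is missing from either frame.
    """
    missing = [c for c in cols if c not in history.columns or c not in current.columns]
    if missing:
        raise KeyError(f"columns missing from one of the frames: {missing}")
    return history[cols].add(current[cols], fill_value=0)


# Assumed reconstruction of the question's data.
history = pd.DataFrame(
    {"above": [5, 6, 4], "below": [4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "US"), (12345, "MX"), (54321, "MX")], names=["asn", "country"]
    ),
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "MX"), (54321, "MX"), (54321, "US")], names=["asn", "country"]
    ),
)

cols_to_return = ["above", "below"]
updated = accumulate(history, current, cols_to_return)
print(updated.sort_index())
```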

TYZ answered Sep 29 '22 13:09