I have the following two DataFrames:
>>> history
               above  below
asn   country
12345 US           5      4
      MX           6      3
54321 MX           4      5
>>> current
               above  below
asn   country
12345 MX           1      0
54321 MX           0      1
      US           1      0
I keep a running count of the "above" and "below" values in the history
DataFrame like so:
>>> history = history.add(current, fill_value=0)
>>> history
               above  below
asn   country
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0
This works as long as there are no extra columns in the current
DataFrame. However, when I add an extra column:
>>> current
               above  below  cruft
asn   country
12345 MX           1      0    999
54321 MX           0      1    999
      US           1      0    999
I get the following:
>>> history = history.add(current, fill_value=0)
>>> history
               above  below  cruft
asn   country
12345 MX         7.0    3.0  999.0
      US         5.0    4.0    NaN
54321 MX         4.0    6.0  999.0
      US         1.0    0.0  999.0
I want this extra column to be ignored, since it's not present in both DataFrames. The desired output is just:
>>> history
               above  below
asn   country
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0
You can add as before and then keep only the columns that history had originally:
In [27]: history.add(current, fill_value=0)[history.columns]
Out[27]:
               above  below
asn   country
12345 MX         7.0    3.0
      US         5.0    4.0
54321 MX         4.0    6.0
      US         1.0    0.0
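For anyone who wants to reproduce this, here is a minimal self-contained sketch. How the frames are built (MultiIndex.from_tuples and the variable names idx_h, idx_c, result) is my own assumption, since the question only shows the printed output.

```python
import pandas as pd

# Rebuild the frames from the question (construction is assumed).
idx_h = pd.MultiIndex.from_tuples(
    [(12345, "US"), (12345, "MX"), (54321, "MX")], names=["asn", "country"]
)
history = pd.DataFrame({"above": [5, 6, 4], "below": [4, 3, 5]}, index=idx_h)

idx_c = pd.MultiIndex.from_tuples(
    [(12345, "MX"), (54321, "MX"), (54321, "US")], names=["asn", "country"]
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=idx_c,
)

# Add with fill_value=0, then keep only the columns history started with.
# The extra "cruft" column is created by add() and dropped by the selection.
result = history.add(current, fill_value=0)[history.columns]
print(result)
```

Note that the column selection has to use the columns of history from before the reassignment, otherwise cruft would already be part of it on the next run.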
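Another way: concatenate with an inner join on the columns, then sum per index level. Note that .sum(level=[0, 1]) was removed in pandas 2.0, so use groupby instead:
pd.concat([history, current], join='inner', axis=0).groupby(level=[0, 1]).sum()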
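Written out with the frames from the question, the concat approach might look like the sketch below. The frame construction and the names combined and totals are assumptions for illustration. It uses groupby(level=...).sum(), which should behave the same as the older sum(level=...) spelling on pandas versions that still had it.

```python
import pandas as pd

names = ["asn", "country"]
history = pd.DataFrame(
    {"above": [5, 6, 4], "below": [4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "US"), (12345, "MX"), (54321, "MX")], names=names
    ),
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "MX"), (54321, "MX"), (54321, "US")], names=names
    ),
)

# join="inner" keeps only the columns present in both frames,
# so "cruft" is dropped before any arithmetic happens.
combined = pd.concat([history, current], join="inner", axis=0)

# Rows that share the same (asn, country) key are summed together.
totals = combined.groupby(level=names).sum()
print(totals)
```

One difference from the add() approach: since no NaN is ever introduced, the counts stay integers instead of being converted to floats.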
You can first specify a list of columns you need in your final output:
cols_to_return = ["above", "below"]
history = history[cols_to_return].add(current[cols_to_return], fill_value=0)
Specifying the columns beforehand helps you keep track of what you are doing and makes future issues easier to debug.
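Since the question is about a running count, the explicit-column version can be wrapped in a small helper and called once per batch. This is only a sketch: the helper name accumulate, the constant COLS, and the way the frames are built are all hypothetical and not from the question.

```python
import pandas as pd

COLS = ["above", "below"]  # the only columns we want to track


def accumulate(history, current, cols=COLS):
    """Hypothetical helper: add current to history using only the given columns."""
    return history[cols].add(current[cols], fill_value=0)


names = ["asn", "country"]
history = pd.DataFrame(
    {"above": [5, 6, 4], "below": [4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "US"), (12345, "MX"), (54321, "MX")], names=names
    ),
)
current = pd.DataFrame(
    {"above": [1, 0, 1], "below": [0, 1, 0], "cruft": [999, 999, 999]},
    index=pd.MultiIndex.from_tuples(
        [(12345, "MX"), (54321, "MX"), (54321, "US")], names=names
    ),
)

# Apply the same batch twice to show that repeated updates stay clean.
for _ in range(2):
    history = accumulate(history, current)

print(history)
```

Because cruft is dropped from current before the addition, it never makes it into history, no matter how many batches are applied.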