I have the following two DataFrames:
import pandas as pd
df = pd.DataFrame([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
index = [0, 0.25, 0.50, 0.75, 1],
columns = [0, 0.25, 0.50, 0.75, 1])
df_cross = pd.DataFrame([[0.0, 0.25],
[0.0, 0.75],
[0.5, 1]],
columns = ['indexes_to_keep',
'cols_to_keep'])
df
:
0.00 0.25 0.50 0.75 1.00
0.00 0 0 0 0 0
0.25 0 0 0 0 0
0.50 0 0 0 0 0
0.75 0 0 0 0 0
1.00 0 0 0 0 0
df_cross
:
indexes_to_keep cols_to_keep
0 0.0 0.25
1 0.0 0.75
2 0.5 1.00
In the df
I have my storaged data, and the df_cross contains the indexes and columns that I want to keep the values. The values in df
which the index and columns do not match with any row of df_cross
I want to replace by a string (for example "NaN").
The expected output is:
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0 NaN 0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
Thanks in advance.
get_value() function is used to quickly retrieve the single value in the data frame at the passed column and index. The input to the function is the row label and the column label.
You can easily replace a value in pandas data frames by just specifying its column and its index. Having the dataframe above, we will replace some of its values. We are using the loc function of pandas. The first variable is the index of the value we want to replace and the second is its column.
By default, specifying a new column with set_index() deletes the original index. If the argument append is set to True , the specified column will be added as a new level index.
Pandas does not support setting elements with arrays of coordinates. You would need to use numpy:
# integer locs
rows = df.index.get_indexer(df_cross.indexes_to_keep)
cols = df.columns.get_indexer(df_cross.cols_to_keep)
# where we want to keep the data
mask = np.full(df.shape, False)
mask[rows, cols] = True
df[:] = df.where(mask)
Another way, with just Pandas, to create mask
is:
mask = (df_cross.assign(val=True)
.set_index(['indexes_to_keep', 'cols_to_keep'])
['val'].unstack(fill_value=False)
)
Output:
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0.0 NaN 0.0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0.0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
Let us try crosstab
on df_cross
, then use where
to mask the values
s = pd.crosstab(*df_cross.values.T)
df.where(s == 1)
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0.0 NaN 0.0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0.0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
PS: pd.crosstab(*df_cross.values.T)
is just a syntactical shortcut and is effectively equivalent to using pd.crosstab(df.indexes_to_keep, df.cols_to_keep)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With