Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How keep a value in a DataFrame using the values of another DataFrame as indexes and columns reference (and replace the others)?

I have the following two DataFrames:

import pandas as pd

df = pd.DataFrame([[0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0]],
                  index = [0, 0.25, 0.50, 0.75, 1],
                  columns = [0, 0.25, 0.50, 0.75, 1])

df_cross = pd.DataFrame([[0.0, 0.25],
                         [0.0, 0.75],
                         [0.5, 1]],
                        columns = ['indexes_to_keep',
                                   'cols_to_keep'])

df:

      0.00  0.25  0.50  0.75  1.00
0.00     0     0     0     0     0
0.25     0     0     0     0     0
0.50     0     0     0     0     0
0.75     0     0     0     0     0
1.00     0     0     0     0     0

df_cross:

   indexes_to_keep  cols_to_keep
0              0.0          0.25
1              0.0          0.75
2              0.5          1.00

In the df I have my storaged data, and the df_cross contains the indexes and columns that I want to keep the values. The values in df which the index and columns do not match with any row of df_cross I want to replace by a string (for example "NaN").

The expected output is:

     0.00 0.25 0.50 0.75 1.00
0.00  NaN    0  NaN    0  NaN
0.25  NaN  NaN  NaN  NaN  NaN
0.50  NaN  NaN  NaN  NaN    0
0.75  NaN  NaN  NaN  NaN  NaN
1.00  NaN  NaN  NaN  NaN  NaN

Thanks in advance.

like image 379
Romero_91 Avatar asked Nov 03 '21 03:11

Romero_91


People also ask

How do I get the value of a column in a DataFrame using an index?

get_value() function is used to quickly retrieve the single value in the data frame at the passed column and index. The input to the function is the row label and the column label.

How do you replace a value in a DataFrame using an index?

You can easily replace a value in pandas data frames by just specifying its column and its index. Having the dataframe above, we will replace some of its values. We are using the loc function of pandas. The first variable is the index of the value we want to replace and the second is its column.

What happens to a DataFrame when you assign one of the columns as the new index?

By default, specifying a new column with set_index() deletes the original index. If the argument append is set to True , the specified column will be added as a new level index.


Video Answer


2 Answers

Pandas does not support setting elements with arrays of coordinates. You would need to use numpy:

# integer locs
rows = df.index.get_indexer(df_cross.indexes_to_keep)
cols = df.columns.get_indexer(df_cross.cols_to_keep)

# where we want to keep the data
mask = np.full(df.shape, False)
mask[rows, cols] = True

df[:] = df.where(mask)

Another way, with just Pandas, to create mask is:

mask = (df_cross.assign(val=True)
          .set_index(['indexes_to_keep', 'cols_to_keep'])
          ['val'].unstack(fill_value=False)
       )

Output:

      0.00  0.25  0.50  0.75  1.00
0.00   NaN   0.0   NaN   0.0   NaN
0.25   NaN   NaN   NaN   NaN   NaN
0.50   NaN   NaN   NaN   NaN   0.0
0.75   NaN   NaN   NaN   NaN   NaN
1.00   NaN   NaN   NaN   NaN   NaN
like image 81
Quang Hoang Avatar answered Oct 20 '22 03:10

Quang Hoang


Let us try crosstab on df_cross, then use where to mask the values

s = pd.crosstab(*df_cross.values.T)
df.where(s == 1)

      0.00  0.25  0.50  0.75  1.00
0.00   NaN   0.0   NaN   0.0   NaN
0.25   NaN   NaN   NaN   NaN   NaN
0.50   NaN   NaN   NaN   NaN   0.0
0.75   NaN   NaN   NaN   NaN   NaN
1.00   NaN   NaN   NaN   NaN   NaN

PS: pd.crosstab(*df_cross.values.T) is just a syntactical shortcut and is effectively equivalent to using pd.crosstab(df.indexes_to_keep, df.cols_to_keep)

like image 5
Shubham Sharma Avatar answered Oct 20 '22 04:10

Shubham Sharma