How to keep values in a DataFrame using the values of another DataFrame as index and column references (and replace the others)?

Tags: python, replace, pandas, dataframe

I have the following two DataFrames:

import pandas as pd

df = pd.DataFrame([[0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0]],
                  index = [0, 0.25, 0.50, 0.75, 1],
                  columns = [0, 0.25, 0.50, 0.75, 1])

df_cross = pd.DataFrame([[0.0, 0.25],
                         [0.0, 0.75],
                         [0.5, 1]],
                        columns = ['indexes_to_keep',
                                   'cols_to_keep'])

df:

      0.00  0.25  0.50  0.75  1.00
0.00     0     0     0     0     0
0.25     0     0     0     0     0
0.50     0     0     0     0     0
0.75     0     0     0     0     0
1.00     0     0     0     0     0

df_cross:

   indexes_to_keep  cols_to_keep
0              0.0          0.25
1              0.0          0.75
2              0.5          1.00

df holds my stored data, and df_cross lists the index and column labels of the values I want to keep. Every value in df whose (index, column) pair does not match any row of df_cross should be replaced by a string (for example "NaN").

The expected output is:

     0.00 0.25 0.50 0.75 1.00
0.00  NaN    0  NaN    0  NaN
0.25  NaN  NaN  NaN  NaN  NaN
0.50  NaN  NaN  NaN  NaN    0
0.75  NaN  NaN  NaN  NaN  NaN
1.00  NaN  NaN  NaN  NaN  NaN

Thanks in advance.

asked Nov 03 '21 03:11

Romero_91

2 Answers

Pandas does not support setting elements from arrays of (row, column) coordinates. You would need to drop down to NumPy and build a boolean mask:

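import numpy as np

# integer positions of the labels to keep
# (get_indexer returns -1 for any label it cannot find)
rows = df.index.get_indexer(df_cross.indexes_to_keep)
cols = df.columns.get_indexer(df_cross.cols_to_keep)

# boolean mask: True where we want to keep the data
mask = np.full(df.shape, False)
mask[rows, cols] = True

# reassign instead of writing in place: NaN turns the integer columns into floats
df = df.where(mask)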
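The question asks for a string placeholder rather than a real NaN. A minimal sketch of that variant follows, assuming the same df and df_cross as in the question; the names found and result are only illustrative. It also filters out the -1 positions that get_indexer returns for labels missing from df, which would otherwise silently select the last row or column.

```python
import numpy as np
import pandas as pd

labels = [0, 0.25, 0.50, 0.75, 1]
df = pd.DataFrame(0, index=labels, columns=labels)
df_cross = pd.DataFrame([[0.0, 0.25], [0.0, 0.75], [0.5, 1]],
                        columns=['indexes_to_keep', 'cols_to_keep'])

# integer positions of the labels to keep
rows = df.index.get_indexer(df_cross['indexes_to_keep'])
cols = df.columns.get_indexer(df_cross['cols_to_keep'])

# get_indexer marks unknown labels with -1; drop those pairs
found = (rows >= 0) & (cols >= 0)

mask = np.zeros(df.shape, dtype=bool)
mask[rows[found], cols[found]] = True

# second argument of where() is the replacement for masked-out cells
result = df.where(mask, "NaN")
```

Because the replacement is a string, the columns of result end up with object dtype; pass nothing as the second argument if a numeric NaN is preferable.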

Alternatively, mask can be built with pandas alone:

mask = (df_cross.assign(val=True)
          .set_index(['indexes_to_keep', 'cols_to_keep'])
          ['val'].unstack(fill_value=False)
       )

Output:

      0.00  0.25  0.50  0.75  1.00
0.00   NaN   0.0   NaN   0.0   NaN
0.25   NaN   NaN   NaN   NaN   NaN
0.50   NaN   NaN   NaN   NaN   0.0
0.75   NaN   NaN   NaN   NaN   NaN
1.00   NaN   NaN   NaN   NaN   NaN
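The unstacked mask only contains the labels that occur in df_cross, so where has to align it against df. As a sketch, assuming the data from the question, the mask can be expanded to the full shape of df explicitly before it is applied; the reindex and astype steps are additions for illustration, not part of the answer above.

```python
import pandas as pd

labels = [0, 0.25, 0.50, 0.75, 1]
df = pd.DataFrame(0, index=labels, columns=labels)
df_cross = pd.DataFrame([[0.0, 0.25], [0.0, 0.75], [0.5, 1]],
                        columns=['indexes_to_keep', 'cols_to_keep'])

mask = (df_cross.assign(val=True)
        .set_index(['indexes_to_keep', 'cols_to_keep'])['val']
        .unstack(fill_value=False)
        # expand to every row and column label of df
        .reindex(index=df.index, columns=df.columns, fill_value=False)
        .astype(bool))

result = df.where(mask)
```

Note that unstack raises an error if df_cross contains the same (index, column) pair twice, so duplicates would need to be dropped first.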

answered Oct 20 '22 03:10

Quang Hoang

Let us try crosstab on df_cross, then use where to mask the values:

s = pd.crosstab(*df_cross.values.T)
df.where(s == 1)

      0.00  0.25  0.50  0.75  1.00
0.00   NaN   0.0   NaN   0.0   NaN
0.25   NaN   NaN   NaN   NaN   NaN
0.50   NaN   NaN   NaN   NaN   0.0
0.75   NaN   NaN   NaN   NaN   NaN
1.00   NaN   NaN   NaN   NaN   NaN

PS: pd.crosstab(*df_cross.values.T) is just a syntactic shortcut and is effectively equivalent to pd.crosstab(df_cross.indexes_to_keep, df_cross.cols_to_keep).
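To make the equivalence concrete, here is a small sketch, assuming the data from the question, that builds the table both ways and applies it. The names s_short, s_named, mask, and result are illustrative. Testing the counts with > 0 instead of == 1 is a deliberate change so that a pair listed more than once in df_cross is still kept.

```python
import pandas as pd

labels = [0, 0.25, 0.50, 0.75, 1]
df = pd.DataFrame(0, index=labels, columns=labels)
df_cross = pd.DataFrame([[0.0, 0.25], [0.0, 0.75], [0.5, 1]],
                        columns=['indexes_to_keep', 'cols_to_keep'])

# shortcut: unpack the two columns of df_cross as positional arguments
s_short = pd.crosstab(*df_cross.values.T)

# explicit form: pass the two columns by name
s_named = pd.crosstab(df_cross['indexes_to_keep'], df_cross['cols_to_keep'])

# expand the counts to the full shape of df, then turn them into a mask
mask = s_named.reindex(index=df.index, columns=df.columns, fill_value=0) > 0

result = df.where(mask)
```

The two tables differ only in their axis names; the counts themselves are identical.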

answered Oct 20 '22 04:10

Shubham Sharma
