Subsetting pandas dataframe and retain original size

Tags:

python

pandas

dataframe

I am trying to subset a DataFrame, but I want the new DataFrame to have the same size as the original one.
The input, the current output, and the expected output are shown below.

import pandas as pd

df_input = pd.DataFrame([[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]], columns=["A", "B", "C", "D", "E"])

df_output = pd.DataFrame(df_input.iloc[1:2, :])

df_expected_output = pd.DataFrame([[0, 0, 0, 0, 0], [2, 1, 4, 7, 6], [0, 0, 0, 0, 0]], columns=["A", "B", "C", "D", "E"])

Please suggest the way forward.


asked Nov 30 '18 18:11

Abhishek Kulkarni

3 Answers

After subsetting, restore the original index with reindex. The rows that were dropped come back filled with NaN, which you can replace with 0 via fillna. Because NaN is a float, the columns are upcast to float, so convert everything back to int with astype.

 df_input.iloc[1:2,:].reindex(df_input.index).fillna(0).astype(int)
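To see what each step of that chain does, here is a minimal sketch that splits it into intermediate variables (the names subset, restored, filled and filled_direct are only for illustration). It also shows reindex with its fill_value parameter, which fills the re-added rows directly and so should make the fillna/astype round trip unnecessary.

```python
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)

# Step 1: the subset keeps only the row labelled 1
subset = df_input.iloc[1:2, :]

# Step 2: reindex brings back labels 0 and 2; their values are NaN
restored = subset.reindex(df_input.index)

# Step 3: replace NaN with 0, then cast the float columns back to int
filled = restored.fillna(0).astype(int)

# Alternative: ask reindex to fill the re-added rows with 0 itself
filled_direct = subset.reindex(df_input.index, fill_value=0)

print(filled)
print(filled_direct)
```

Both filled and filled_direct should hold the same values as df_expected_output from the question.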


answered Nov 14 '22 22:11

Kyle

Setup

df = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,0]], columns=["A", "B","C","D","E"])
output = df.iloc[1:2,:]

You can create a mask and use multiplication:

m = df.index.isin(output.index)
m[:, None] * df

   A  B  C  D  E
0  0  0  0  0  0
1  2  1  4  7  6
2  0  0  0  0  0
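The multiplication works because True/False behave as 1/0, and m[:, None] has shape (3, 1), so each row is scaled by its own flag. Here is a sketch of the same idea using DataFrame.mul with axis=0, which accepts the 1-D mask without reshaping (the variable names are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)
output = df.iloc[1:2, :]

# Boolean NumPy array: True where the row label survived the subset
m = df.index.isin(output.index)

# Adding an axis turns the flat mask into a column of shape (3, 1)
mask_shape = m[:, None].shape

# axis=0 lines the mask up with the rows, so row i is multiplied by m[i]
masked = df.mul(m, axis=0)

print(mask_shape)
print(masked)
```

Note that this only suits numeric columns, and a NaN or inf in a masked-out row would not turn into 0, since NaN * 0 is still NaN.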

answered Nov 14 '22 23:11

user3483203

I would use where + between

df_input.where(df_input.index.to_series().between(1,1),other=0)
Out[611]: 
   A  B  C  D  E
0  0  0  0  0  0
1  2  1  4  7  6
2  0  0  0  0  0
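For context, where keeps the values on rows where the condition is True and substitutes other everywhere else, and between(1, 1) is inclusive on both ends, so it selects only index 1. If the rows to keep do not form a contiguous range, a sketch with isin in place of between might look like this (the lists of labels to keep are assumptions for illustration):

```python
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)

# Index labels as a Series, so row-wise conditions can be built on them
labels = df_input.index.to_series()

# Same selection as between(1, 1): keep row 1, zero out the rest
only_row_1 = df_input.where(labels.isin([1]), other=0)

# Non-contiguous selection: keep rows 0 and 2, zero out row 1
rows_0_and_2 = df_input.where(labels.isin([0, 2]), other=0)

print(only_row_1)
print(rows_0_and_2)
```

Unlike the multiplication approach, where replaces values outright, so it does not depend on the columns being numeric.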

answered Nov 14 '22 21:11

BENY
