Subsetting pandas dataframe and retain original size

I am trying to subset a dataframe, but I want the result to keep the same shape as the original, with the rows that were not selected filled with zeros.
The input, my current output, and the expected output are below.

import pandas as pd

df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,0]], columns=["A", "B","C","D","E"])

df_output=pd.DataFrame(df_input.iloc[1:2,:])

df_expected_output=pd.DataFrame([[0,0,0,0,0], [2,1,4,7,6], [0,0,0,0,0]], columns=["A", "B","C","D","E"])  

Please suggest the way forward.

Abhishek Kulkarni asked Nov 30 '18 18:11




3 Answers

After subsetting, restore the original index with reindex. The rows that were dropped come back filled with NaN, which you can replace with 0 via fillna. Because NaN is a float, the columns are upcast to float along the way, so convert everything back to int with astype.

 df_input.iloc[1:2,:].reindex(df_input.index).fillna(0).astype(int)
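
To make the three steps easier to follow, here is a minimal sketch that splits the one-liner into its parts. The intermediate names (subset, restored, result, result_direct) are only illustrative and are not part of the original answer. The last line is a variant that, as far as I know, gives the same values by passing fill_value to reindex, so the detour through NaN should not be needed.

```python
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)

# Step 1: the subset keeps only the row labelled 1.
subset = df_input.iloc[1:2, :]

# Step 2: reindex brings back labels 0 and 2 as all-NaN rows.
restored = subset.reindex(df_input.index)

# Step 3: replace NaN with 0 and cast back to integers.
result = restored.fillna(0).astype(int)

# Variant (illustrative): let reindex fill the missing rows itself.
result_direct = subset.reindex(df_input.index, fill_value=0)

print(result)
```

Both result and result_direct keep the three-row shape of df_input, with zeros everywhere except row 1.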
Kyle answered Nov 14 '22 22:11



Setup

import pandas as pd

df = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,0]], columns=["A", "B","C","D","E"])
output = df.iloc[1:2,:]

You can create a boolean mask of the rows to keep and multiply the dataframe by it, so that every row not in the subset becomes 0:

m = df.index.isin(output.index)
m[:, None] * df

   A  B  C  D  E
0  0  0  0  0  0
1  2  1  4  7  6
2  0  0  0  0  0
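
In case the m[:, None] step looks opaque, here is a small sketch of what it does. The name column_mask is only illustrative, and the approach assumes every column is numeric, since it relies on False and True behaving as 0 and 1 in the product.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)
output = df.iloc[1:2, :]

# One boolean per row of df: True where the row label is in the subset.
m = df.index.isin(output.index)

# m has shape (3,); m[:, None] has shape (3, 1), so each flag is
# broadcast across all five columns of its row.
column_mask = m[:, None]

# False acts as 0 and True as 1 in the product.
result = column_mask * df
print(result)
```

Without the extra axis, a length-3 mask could not be lined up against the five columns, which is why the reshape to a single column is needed.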
user3483203 answered Nov 14 '22 23:11



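I would use where + between. where keeps the values in rows where the condition is True and replaces everything else with other; between(1,1) is inclusive on both ends, so it selects only the row labelled 1.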

df_input.where(df_input.index.to_series().between(1,1),other=0)
Out[611]: 
   A  B  C  D  E
0  0  0  0  0  0
1  2  1  4  7  6
2  0  0  0  0  0
BENY answered Nov 14 '22 21:11
