I am trying to subset a dataframe, but I want the new dataframe to have the same size as the original dataframe.
I am attaching the input, the output and the expected output.
df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,0]], columns=["A", "B","C","D","E"])
df_output=pd.DataFrame(df_input.iloc[1:2,:])
df_expected_output=pd.DataFrame([[0,0,0,0,0], [2,1,4,7,6], [0,0,0,0,0]], columns=["A", "B","C","D","E"])
Please suggest the way forward.
After you subset, restore the original index with reindex. This sets all the values in the new rows to NaN, which you can replace with 0 via fillna. Since NaN is a float type, you can convert everything back to int with astype.
df_input.iloc[1:2,:].reindex(df_input.index).fillna(0).astype(int)
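As a variation on the chain above, reindex also accepts a fill_value argument, which fills the new rows directly and so avoids the round trip through float. A minimal sketch, assuming the df_input frame from the question (the name result is only illustrative):

```python
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)

# Keep row 1, then restore the original index.
# Rows missing from the subset are filled with 0 instead of NaN.
result = df_input.iloc[1:2, :].reindex(df_input.index, fill_value=0)
print(result)
```

Because no NaN is ever introduced, the columns should keep their integer dtype and the final astype(int) is not needed.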
Setup
df = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,0]], columns=["A", "B","C","D","E"])
output = df.iloc[1:2,:]
You can create a mask and use multiplication:
m = df.index.isin(output.index)
m[:, None] * df
A B C D E
0 0 0 0 0 0
1 2 1 4 7 6
2 0 0 0 0 0
I would use where + between:
df_input.where(df_input.index.to_series().between(1,1),other=0)
Out[611]:
A B C D E
0 0 0 0 0 0
1 2 1 4 7 6
2 0 0 0 0 0
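The same idea can be extended to an arbitrary set of rows rather than a contiguous range. The sketch below builds the condition with Index.isin instead of between; the names subset, keep and result are hypothetical and only serve the illustration. Since where replaces values rather than doing arithmetic, it does not rely on the columns being numeric.

```python
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, 0]],
    columns=["A", "B", "C", "D", "E"],
)
subset = df_input.iloc[1:2, :]

# Boolean Series aligned to the original index:
# True for rows whose label appears in the subset.
keep = pd.Series(df_input.index.isin(subset.index), index=df_input.index)

# Rows where keep is False are replaced by 0; the shape is unchanged.
result = df_input.where(keep, other=0)
print(result)
```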