Pandas dataframe: How to set values after an index to 0

I have a Pandas dataframe in which each row contains a name followed by many numeric columns. For each row I compute a specific column index (it is different for every row), and I want to set all the values after that index in that row to 0.

So, I tried out a few things and have the below working code:

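import numpy as np
n = len(df)  # number of rows
for i in range(n):
    # position of the column whose label equals this row's 'match_this_value'
    index = np.where(df.columns == df['match_this_value'][i])[0].item()
    df.iloc[i, index] = df['take_this_value'][i].day
    # zero out every column to the right of the matched one
    df.iloc[i, (index+1):] = 0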

However, this takes a long time because my dataset is very large: the runtime is about 70 seconds on a sample, and my full dataset is much larger. Is there a faster way to do this? Furthermore, is there a better way to do this manipulation without looping over each row?


EDIT: Sorry, I should have specified how the index is calculated. For each row, I use np.where to compare all of the column labels of the dataframe against the value in one specific column and find the match, something like:

index = np.where(df.columns == df['match_this_value'][i])[0].item()
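If the column labels are unique, the same lookup can be done for every row at once with Index.get_indexer, which returns the position of each value in the index (or -1 when there is no match). This is a minimal sketch on a small hypothetical frame (the labels c0, c1, c2 are made up for illustration). Note the failure modes differ: get_indexer raises on duplicate column labels, whereas np.where(...)[0].item() raises when a row matches zero or several columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: c0..c2 stand in for the numeric columns.
df = pd.DataFrame({
    "c0": [1, 1, 1],
    "c1": [2, 2, 2],
    "c2": [3, 3, 3],
    "match_this_value": ["c2", "c0", "c1"],
})

# Per-row lookup as in the question: one scan of the column labels per row.
loop_idx = [
    np.where(df.columns == df["match_this_value"][i])[0].item()
    for i in range(len(df))
]

# All rows at once: one vectorised lookup; -1 would mean "no such column".
vec_idx = df.columns.get_indexer(df["match_this_value"])

print(loop_idx)          # [2, 0, 1]
print(vec_idx.tolist())  # [2, 0, 1]
```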

Once I have this index, I set the value in that column to a value taken from another column of the df (the day of take_this_value) and then set every column after it to 0, which is exactly what the loop above does.
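A vectorised version of the whole loop could look like the sketch below. It assumes, hypothetically, that you pass in the list of numeric columns to rewrite, that take_this_value is a datetime64 column (the loop uses .day), and that every value in match_this_value is one of those column labels. Unlike the loop, it only touches the listed value columns, so helper columns such as match_this_value and take_this_value are never zeroed even when they sit to the right of the match. The function name fill_and_zero is made up for this example.

```python
import numpy as np
import pandas as pd


def fill_and_zero(df, value_cols, match_col="match_this_value",
                  take_col="take_this_value"):
    """Sketch: for each row, write the day of take_col into the value column
    named by match_col and set every value column to its right to 0."""
    cols = pd.Index(value_cols)
    # Position of each row's matching column within value_cols (-1 = no match).
    pos = cols.get_indexer(df[match_col])
    if (pos < 0).any():
        raise KeyError("some rows in match_col name no column in value_cols")

    values = df[list(cols)].to_numpy(copy=True)
    rows = np.arange(len(df))

    # Broadcast (ncols,) column positions against (nrows, 1) match positions:
    # True strictly to the right of each row's match.
    after = np.arange(len(cols)) > pos[:, None]
    values[after] = 0
    # Write the day of the month into the matched cell of every row at once.
    values[rows, pos] = df[take_col].dt.day.to_numpy()

    out = df.copy()  # leave the input frame untouched
    out[list(cols)] = values
    return out


df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "c0": [1.0, 1.0, 1.0],
    "c1": [2.0, 2.0, 2.0],
    "c2": [3.0, 3.0, 3.0],
    "match_this_value": ["c0", "c2", "c1"],
    "take_this_value": pd.to_datetime(["2019-06-21", "2019-06-05", "2019-01-30"]),
})
out = fill_and_zero(df, ["c0", "c1", "c2"])
print(out[["c0", "c1", "c2"]].to_numpy().tolist())
# [[21.0, 0.0, 0.0], [1.0, 2.0, 5.0], [1.0, 30.0, 0.0]]
```

Working on one NumPy array and assigning it back once avoids the per-row df.iloc calls, which is usually where a loop like the one above spends its time.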
Mat R asked Jun 21 '19 12:06




1 Answer

You could do:


import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4, 4), columns=list('ABCD'))

#           A         B         C         D
# 0  0.750017  0.582230  1.411253 -0.379428
# 1 -0.747129  1.800677 -1.243459 -0.098760
# 2 -0.742997 -0.035036  1.012052 -0.767602
# 3 -0.694679  1.013968 -1.000412  0.752191

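# one random cut-off column position per row (a stand-in for your own indexes)
indexes = np.random.choice(range(df.shape[1]), df.shape[0])
# array([0, 3, 1, 1])
# every row of df_indexes is [0, 1, 2, 3], i.e. the column positions
df_indexes = np.tile(range(df.shape[1]), (df.shape[0], 1))
# True wherever a column lies to the right of that row's cut-off
df[df_indexes>indexes[:, None]] = 0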
print(df) 
#           A         B         C        D
# 0  0.750017  0.000000  0.000000  0.00000
# 1 -0.747129  1.800677 -1.243459 -0.09876
# 2 -0.742997 -0.035036  0.000000  0.00000
# 3 -0.694679  1.013968  0.000000  0.00000

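Here the boolean mask df_indexes > indexes[:, None] is True for every cell whose column position lies to the right of that row's index, so assigning 0 through it zeroes exactly those cells. Replace indexes with your own per-row indexes.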
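The np.tile call is not strictly needed: NumPy broadcasting can compare a 1-D array of column positions against a column of per-row cut-offs directly and produce the same mask. A small sketch with a deterministic frame (the values 1 to 16 and the fixed indexes array are made up so the result is predictable):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1.0, 17.0).reshape(4, 4), columns=list("ABCD"))
indexes = np.array([0, 3, 1, 1])  # last column position to keep in each row

# (4,) positions vs (4, 1) cut-offs broadcast to a (4, 4) boolean mask,
# identical to the np.tile version but without building the tiled array.
mask = np.arange(df.shape[1]) > indexes[:, None]
tiled = np.tile(range(df.shape[1]), (df.shape[0], 1)) > indexes[:, None]
assert np.array_equal(mask, tiled)

df[mask] = 0
print(df.to_numpy().tolist())
# [[1.0, 0.0, 0.0, 0.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 0.0, 0.0], [13.0, 14.0, 0.0, 0.0]]
```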

Ayoub ZAROU answered Nov 15 '22 00:11