Can I set dataframe values without using iterrows()?

Original DataSet

In [2]: import pandas as pd
   ...: 
   ...: # Original DataSet
   ...: d = {'A': [1,1,1,1,2,2,2,2,3],
   ...:      'B': ['a','a','a','x','b','b','b','x','c'],
   ...:      'C': [11,22,33,44,55,66,77,88,99],}
   ...: 
   ...: df = pd.DataFrame(d)
   ...: df

Out[2]: 
   A  B   C
0  1  a  11
1  1  a  22
2  1  a  33
3  1  x  44
4  2  b  55
5  2  b  66
6  2  b  77
7  2  x  88
8  3  c  99

Given a dataframe, I would like a flexible, efficient way to reset specific values based on certain conditions in two columns.

Conditions:

  • for any row where column B has the value 'x',
  • set that row's value in column C to the value of column C in the next row.

Desired Outcome

Out[3]: 
   A  B   C
0  1  a  11
1  1  a  22
2  1  a  33
3  1  x  55
4  2  b  55
5  2  b  66
6  2  b  77
7  2  x  99
8  3  c  99

I learned I can accomplish this using iterrows() (see below),

# Code that produces the above outcome
for idx, x_row in df[df['B'] == 'x'].iterrows():
    df.loc[idx, 'C'] = df.loc[idx+1, 'C']
df

but I need to do this many times, and I understand iterrows() is slow. Are there better pandas-y, broadcasting-like ways of getting the desired outcome more efficiently?

pylang asked Jul 06 '15 06:07


1 Answer

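This should do what you want. It uses a single .loc assignment rather than chained indexing (df.C[...] = ...), which can raise SettingWithCopyWarning and may not modify df at all:

df.loc[df['B'] == 'x', 'C'] = df['C'].shift(-1)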
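
To make the mechanics explicit, here is a minimal sketch of the same idea broken into steps, run against the DataFrame from the question. The names mask and next_c are only illustrative. It assumes the last row never has 'x' in B; if it did, that row would receive NaN (there is no next row) and column C would be upcast to float.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 2, 3],
    'B': ['a', 'a', 'a', 'x', 'b', 'b', 'b', 'x', 'c'],
    'C': [11, 22, 33, 44, 55, 66, 77, 88, 99],
})

# Boolean mask selecting the rows to overwrite (index 3 and 7 here).
mask = df['B'] == 'x'

# shift(-1) moves every value of C up by one row, so each position holds
# the value that was originally in the next row; the last position is NaN.
next_c = df['C'].shift(-1)

# One vectorized .loc assignment: only the masked rows are written, and
# the right-hand side is aligned to them by index label.
df.loc[mask, 'C'] = next_c

print(df)
```

Because the shift is positional while the assignment aligns on labels, this does not depend on the index being 0..n-1 the way the idx+1 lookup in the iterrows() loop does.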
maxymoo answered Oct 06 '22 01:10