
python: using .iterrows() to create columns

Tags:

python

pandas

I am trying to use a loop function to create a matrix of whether a product was seen in a particular week.

Each row in the df (representing a product) has a close_date (the date the product closed) and a week_diff (the number of weeks the product was listed).

import pandas
mydata = [{'subid' : 'A', 'Close_date_wk': 25, 'week_diff':3},
          {'subid' : 'B', 'Close_date_wk': 26, 'week_diff':2},
          {'subid' : 'C', 'Close_date_wk': 27, 'week_diff':2},]
df = pandas.DataFrame(mydata)

My goal is to see how many alternative products were listed for each product in each week of its date range.

I have set up the following loop:

for index, row in df.iterrows():
    i = 0
    max_range = row['Close_date_wk']
    min_range = int(row['Close_date_wk'] - row['week_diff'])
    for i in range(min_range,max_range):
        col_head = 'job_week_'  +  str(i)
        row[col_head] = 1

Can you please help explain why the "row[col_head] = 1" line neither adds a column nor adds a value to that column for that row?

For example, if:

row A has date range 1,2,3
row B has date range 2,3
row C has date range 3,4,5

then ideally I would like to end up with

row A has 0 alternative products in week 1
          1 alternative products in week 2
          2 alternative products in week 3
row B has 1 alternative products in week 2
          2 alternative products in week 3
etc.
citydreams asked Jul 16 '15 15:07




1 Answer

You can't add a new column to the df by assigning to row here. Instead, assign on the original df, for example with .loc (.iloc can only set positions that already exist, and .ix has since been removed from pandas). Example:

In [29]:
df = pd.DataFrame(columns=list('abc'), data = np.random.randn(5,3))
df

Out[29]:
          a         b         c
0 -1.525011  0.778190 -1.010391
1  0.619824  0.790439 -0.692568
2  1.272323  1.620728  0.192169
3  0.193523  0.070921  1.067544
4  0.057110 -1.007442  1.706704

In [30]:
for index,row in df.iterrows():
    df.loc[index,'d'] = np.random.randint(0, 10)
df

Out[30]:
          a         b         c  d
0 -1.525011  0.778190 -1.010391  9
1  0.619824  0.790439 -0.692568  9
2  1.272323  1.620728  0.192169  1
3  0.193523  0.070921  1.067544  0
4  0.057110 -1.007442  1.706704  9

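You can modify existing values through row in this example, although pandas does not guarantee this in general (iterrows() may hand you a copy):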

In [31]:
# reset the df by slicing
df = df[list('abc')]
for index,row in df.iterrows():
    row['b'] = np.random.randint(0, 10)
df

Out[31]:
          a  b         c
0 -1.525011  8 -1.010391
1  0.619824  2 -0.692568
2  1.272323  8  0.192169
3  0.193523  2  1.067544
4  0.057110  3  1.706704
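Whether a write through row reaches the df depends on the dtypes. The pandas documentation warns that iterrows() can yield a copy rather than a view, in which case the write is silently lost. A minimal sketch of that case, assuming a small mixed-dtype frame whose values are made up for illustration:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one string column, one integer column.
df = pd.DataFrame({"subid": ["A", "B"], "week_diff": [3, 2]})
before = df.copy()

# With mixed dtypes each row is built as a fresh Series,
# so assigning to it does not touch df.
for index, row in df.iterrows():
    row["week_diff"] = 99

unchanged = df.equals(before)
print(unchanged)

# Writing through the DataFrame itself does update it.
for index, row in df.iterrows():
    df.at[index, "week_diff"] = 99

print(df["week_diff"].tolist())
```

Addressing the cell through df (with .at or .loc) avoids relying on whether row happens to be a view.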

But adding a new column using row won't work:

In [35]:
df = df[list('abc')]
for index,row in df.iterrows():
    row['d'] = np.random.randint(0,10)
df

Out[35]:
          a  b         c
0 -1.525011  8 -1.010391
1  0.619824  2 -0.692568
2  1.272323  8  0.192169
3  0.193523  2  1.067544
4  0.057110  3  1.706704
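Applied to the data from the question, the same idea might look like the sketch below. It is one possible approach rather than the only one. The names week_cols, listed and alternatives are introduced here for illustration, and the last step assumes that "alternative products" means the other products listed in the same week.

```python
import pandas as pd

mydata = [
    {"subid": "A", "Close_date_wk": 25, "week_diff": 3},
    {"subid": "B", "Close_date_wk": 26, "week_diff": 2},
    {"subid": "C", "Close_date_wk": 27, "week_diff": 2},
]
df = pd.DataFrame(mydata)

for index, row in df.iterrows():
    max_range = int(row["Close_date_wk"])
    min_range = max_range - int(row["week_diff"])
    for week in range(min_range, max_range):
        # Assign on df, not on row, so the column is actually created.
        df.loc[index, f"job_week_{week}"] = 1

# Cells that were never set are NaN; treat them as "not listed".
week_cols = [c for c in df.columns if c.startswith("job_week_")]
df[week_cols] = df[week_cols].fillna(0).astype(int)

# Assumed interpretation: alternatives = other products listed that week.
listed = df.set_index("subid")[week_cols]
alternatives = listed.mul(listed.sum() - 1, axis="columns").where(listed == 1)

print(listed)
print(alternatives)
```

In alternatives, a NaN marks a week in which the product itself was not listed, which keeps it distinct from a week with zero alternatives.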
EdChum answered Sep 23 '22 09:09