python: using .iterrows() to create columns

Tags:

python

pandas

I am trying to use a loop function to create a matrix of whether a product was seen in a particular week.

Each row in the df (representing a product) has a Close_date_wk (the week in which the product closed) and a week_diff (the number of weeks the product was listed).

    import pandas

    mydata = [{'subid': 'A', 'Close_date_wk': 25, 'week_diff': 3},
              {'subid': 'B', 'Close_date_wk': 26, 'week_diff': 2},
              {'subid': 'C', 'Close_date_wk': 27, 'week_diff': 2}]
    df = pandas.DataFrame(mydata)

My goal is to see how many alternative products were listed for each product in each week of its date range.

I have set up the following loop:

    for index, row in df.iterrows():
        i = 0
        max_range = row['Close_date_wk']
        min_range = int(row['Close_date_wk'] - row['week_diff'])
        for i in range(min_range, max_range):
            col_head = 'job_week_' + str(i)
            row[col_head] = 1

Can you please help explain why the "row[col_head] = 1" line neither adds a column nor adds a value to that column for that row?

For example, if:

    row A has date range 1,2,3
    row B has date range 2,3
    row C has date range 3,4,5

then ideally I would like to end up with

    row A has 0 alternative products in week 1
              1 alternative products in week 2
              2 alternative products in week 3
    row B has 1 alternative products in week 2
              2 alternative products in week 3
    etc.


asked Jul 16 '15 15:07

citydreams

1 Answer

You can't mutate the df through row to add a new column here. Instead, refer to the original df and assign with an indexer such as .loc (.ix, which older pandas versions also offered, has since been removed). For example:

    In [29]:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(columns=list('abc'), data=np.random.randn(5, 3))
    df

    Out[29]:
              a         b         c
    0 -1.525011  0.778190 -1.010391
    1  0.619824  0.790439 -0.692568
    2  1.272323  1.620728  0.192169
    3  0.193523  0.070921  1.067544
    4  0.057110 -1.007442  1.706704

    In [30]:
    # assign through df.loc, not through row
    for index, row in df.iterrows():
        df.loc[index, 'd'] = np.random.randint(0, 10)
    df

    Out[30]:
              a         b         c  d
    0 -1.525011  0.778190 -1.010391  9
    1  0.619824  0.790439 -0.692568  9
    2  1.272323  1.620728  0.192169  1
    3  0.193523  0.070921  1.067544  0
    4  0.057110 -1.007442  1.706704  9
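Applied to the data from the question, the same idea might look like the sketch below. It keeps the question's loop and only changes the assignment to go through df.loc. The final fillna step and the week_cols helper name are my additions, on the assumption that weeks in which a product was not listed should show 0 rather than NaN.

```python
import pandas as pd

# Data from the question
mydata = [
    {'subid': 'A', 'Close_date_wk': 25, 'week_diff': 3},
    {'subid': 'B', 'Close_date_wk': 26, 'week_diff': 2},
    {'subid': 'C', 'Close_date_wk': 27, 'week_diff': 2},
]
df = pd.DataFrame(mydata)

for index, row in df.iterrows():
    max_range = int(row['Close_date_wk'])
    min_range = max_range - int(row['week_diff'])
    for week in range(min_range, max_range):
        # Write through df, not row; the column is created on first use
        df.loc[index, 'job_week_' + str(week)] = 1

# Assumption: weeks in which a product was not listed should be 0, not NaN
week_cols = [c for c in df.columns if c.startswith('job_week_')]
df[week_cols] = df[week_cols].fillna(0).astype(int)

print(df)
```

Each product ends up with a 1 in every job_week_ column it was listed in and a 0 elsewhere.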

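You can modify values in existing columns through row, although this relies on row being a view of the df, which pandas does not guarantee (for example, it has no effect when the columns have mixed dtypes):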

    In [31]:
    # reset the df by slicing
    df = df[list('abc')]
    for index, row in df.iterrows():
        row['b'] = np.random.randint(0, 10)
    df

    Out[31]:
              a  b         c
    0 -1.525011  8 -1.010391
    1  0.619824  2 -0.692568
    2  1.272323  8  0.192169
    3  0.193523  2  1.067544
    4  0.057110  3  1.706704
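A caveat worth checking on your own data: in the example above every column is float. With mixed dtypes, as in the question's df, iterrows() builds each row as a separate Series, so writing to row leaves the df untouched. A small sketch with a cut-down, hypothetical version of the question's data (the variable names after_row_write and after_loc_write are only for illustration):

```python
import pandas as pd

# Mixed dtypes: a string column and an integer column
df = pd.DataFrame({'subid': ['A', 'B', 'C'], 'week_diff': [3, 2, 2]})

for index, row in df.iterrows():
    row['week_diff'] = 0  # only changes the temporary row Series

after_row_write = df['week_diff'].tolist()
print(after_row_write)  # the original values are still there

for index in df.index:
    df.loc[index, 'week_diff'] = 0  # writes into the df itself

after_loc_write = df['week_diff'].tolist()
print(after_loc_write)
```

So writing through df.loc is the safer habit even for existing columns.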

But adding a new column using row won't work:

    In [35]:
    df = df[list('abc')]
    for index, row in df.iterrows():
        row['d'] = np.random.randint(0, 10)
    df

    Out[35]:
              a  b         c
    0 -1.525011  8 -1.010391
    1  0.619824  2 -0.692568
    2  1.272323  8  0.192169
    3  0.193523  2  1.067544
    4  0.057110  3  1.706704
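For the end goal in the question (how many alternative products were listed alongside each product in each week), a loop over rows may not be needed at all. The following is one possible sketch, not the only way to do it. It assumes the listed weeks run from Close_date_wk - week_diff up to, but not including, Close_date_wk, as in the question's loop. The data values are hypothetical, chosen to reproduce the A/B/C example from the question, and the names listings, listed and alternatives are mine.

```python
import pandas as pd

# Hypothetical data reproducing the example in the question:
# A listed in weeks 1-3, B in weeks 2-3, C in weeks 3-5
df = pd.DataFrame({
    'subid': ['A', 'B', 'C'],
    'Close_date_wk': [4, 4, 6],
    'week_diff': [3, 2, 3],
})

# Long format: one (product, week) pair per week the product was listed
pairs = [
    (subid, week)
    for subid, close, diff in zip(df['subid'], df['Close_date_wk'], df['week_diff'])
    for week in range(close - diff, close)
]
listings = pd.DataFrame(pairs, columns=['subid', 'week'])

# Indicator matrix: 1 if the product was listed in that week, else 0
listed = pd.crosstab(listings['subid'], listings['week'])

# Products listed per week, minus the product itself;
# NaN marks weeks in which the product was not listed
others_per_week = listed.sum(axis=0) - 1
alternatives = listed.mul(others_per_week, axis='columns').where(listed > 0)

print(alternatives)
```

Row A of alternatives then reads 0, 1, 2 for weeks 1 to 3, which matches the output the question asks for.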


answered Sep 23 '22 09:09

EdChum
