if statements for pandas dataframes in Python

Tags:

python

pandas

I have a dataframe that looks like this:

timestamp                      0            1            2            3                                           
2013-04-17 05:00:00     4.335212  2655.140854  2655.140854  2655.140854   
2013-04-17 05:10:00     2.224966  2655.140854  2655.140854  2655.140854   
2013-04-17 05:20:00     2.409150  2655.140854  2655.140854  2655.140854   
2013-04-17 05:30:00  2655.140854  2655.140854  2655.140854  2655.140854

I need to apply an if-statement condition to every value in the dataframe. I have tried using:

dirt = dirt.astype(float)
for ind, i in enumerate(dirt):
    if i < 0:
        dirt[ind] = i + 360
    if i > 360:
        dirt[ind] = i - 360

However the addition and subtraction are not occurring on any of the values. Any ideas?


asked Oct 20 '15 12:10

holto1

2 Answers

You should use .iterrows() instead of enumerate(df). When you do enumerate(df) you simply get the column names, which will not meet your condition. iterrows() returns the index and the row (as a pandas.Series) on every iteration.
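The difference can be seen in a minimal sketch. The frame below is a made-up sample with hypothetical column names ("c0", "c1"), not the asker's data:

```python
import pandas as pd

# Hypothetical sample frame; names and values are made up for illustration.
df = pd.DataFrame(
    {"c0": [-10.0, 2.2], "c1": [370.0, 100.0]},
    index=pd.to_datetime(["2013-04-17 05:00:00", "2013-04-17 05:10:00"]),
)

# Iterating over a DataFrame (which is what enumerate does) yields column labels.
labels = [label for _, label in enumerate(df)]
print(labels)  # ['c0', 'c1']

# iterrows() yields (index, row) pairs, where each row is a pandas.Series.
row_types = [type(row).__name__ for _, row in df.iterrows()]
print(row_types)  # ['Series', 'Series']
```

So in the original loop, i is a column label rather than a cell value, and the comparisons never touch the data.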

But for your requirement, you can iterate over df.columns and apply the conditions in a vectorized way to each column. For example:

for col in df.columns:
    df.loc[df[col] < 0,col] += 360
    df.loc[df[col] > 360,col] -= 360

I am iterating over columns instead of rows on the assumption that there are far fewer columns than rows, so the Python-level loop runs for far fewer iterations (and the vectorized addition handles more data at once).

Demo:

In [128]: df
Out[128]:
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2655.140854  2655.140854  2655.140854
2013-04-17 05:10:00     2.224966  2655.140854  2655.140854  2655.140854
2013-04-17 05:20:00     2.409150  2655.140854  2655.140854  2655.140854
2013-04-17 05:30:00  2655.140854  2655.140854  2655.140854  2655.140854

In [134]: for col in df.columns:
   .....:     df.loc[df[col] < 0,col] += 360
   .....:     df.loc[df[col] > 360,col] -= 360
   .....:

In [135]: df
Out[135]:
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2295.140854  2295.140854  2295.140854
2013-04-17 05:10:00     2.224966  2295.140854  2295.140854  2295.140854
2013-04-17 05:20:00     2.409150  2295.140854  2295.140854  2295.140854
2013-04-17 05:30:00  2295.140854  2295.140854  2295.140854  2295.140854
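As a variation on the column loop above (not part of the original answer), the same two shifts can probably be applied to the whole frame at once with DataFrame.mask. This is a sketch on made-up data with hypothetical column names:

```python
import pandas as pd

# Hypothetical sample values chosen to hit both conditions.
df = pd.DataFrame({"a": [-10.0, 4.0, 400.0], "b": [2655.140854, 360.0, -0.5]})

# Build both boolean masks from the original values.
below = df < 0
above = df > 360

# mask(cond, other) replaces entries where cond is True with values from other.
wrapped = df.mask(below, df + 360).mask(above, df - 360)
print(wrapped)
```

As in the demo, each value is shifted by 360 at most once, so a value such as 2655.140854 becomes 2295.140854 and is still above 360.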


answered Oct 22 '22 05:10

Anand S Kumar

You can use masking with where, together with update, to modify the existing dataframe values like this:

In [188]: df
Out[188]: 
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2655.140854  2655.140854  2655.140854
2013-04-17 05:10:00     2.224966  2655.140854  2655.140854  2655.140854
2013-04-17 05:20:00     2.409150  2655.140854  2655.140854  2655.140854
2013-04-17 05:30:00  2655.140854  2655.140854  2655.140854  2655.140854

In [189]: df_small = df.where(df < 0).apply(lambda x: x + 360)

In [190]: df_small
Out[190]: 
                      0   1   2   3
timestamp
2013-04-17 05:00:00 NaN NaN NaN NaN
2013-04-17 05:10:00 NaN NaN NaN NaN
2013-04-17 05:20:00 NaN NaN NaN NaN
2013-04-17 05:30:00 NaN NaN NaN NaN

In [191]: df_large = df.where(df > 360).apply(lambda x: x - 360)

In [192]: df_large
Out[192]: 
                               0            1            2            3
timestamp
2013-04-17 05:00:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-17 05:10:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-17 05:20:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-17 05:30:00  2295.140854  2295.140854  2295.140854  2295.140854

In [193]: df.update(df_small)

In [194]: df.update(df_large)

In [195]: df
Out[195]: 
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2295.140854  2295.140854  2295.140854
2013-04-17 05:10:00     2.224966  2295.140854  2295.140854  2295.140854
2013-04-17 05:20:00     2.409150  2295.140854  2295.140854  2295.140854
2013-04-17 05:30:00  2295.140854  2295.140854  2295.140854  2295.140854

Note:

Because both masked frames are computed from the original values before either update is applied, this approach can cater for corner cases where applying the conditions one after another would reapply them. For example, with a rule like "if value < 360 then +360, else -360", sequential updates give 1 + 360 = 361, then 361 > 360, so it becomes 1 again.
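The corner case can be illustrated with a small sketch. The frame, the column name "a", and the rule "below 360 add 360, otherwise subtract 360" are assumptions made up for this example:

```python
import pandas as pd

# Hypothetical single-column frame for illustration.
base = pd.DataFrame({"a": [1.0, 400.0]})

# Sequential in-place updates: the second condition sees already-shifted values.
seq = base.copy()
seq.loc[seq["a"] < 360, "a"] += 360    # 1 -> 361, 400 unchanged
seq.loc[seq["a"] >= 360, "a"] -= 360   # 361 -> 1 again, 400 -> 40

# Masks computed from the original values first, then update():
# each value is shifted exactly once.
upd = base.copy()
small = upd.where(upd < 360) + 360     # 361, NaN
large = upd.where(upd >= 360) - 360    # NaN, 40
upd.update(small)                      # update() skips NaN entries by default
upd.update(large)

print(seq["a"].tolist())  # [1.0, 40.0]
print(upd["a"].tolist())  # [361.0, 40.0]
```

With the thresholds in the question (below 0 and above 360) the two approaches agree, since a negative value plus 360 never exceeds 360.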

But for your use case, I think @AnandSKumar's method is very clean and close to what you're looking for.

answered Oct 22 '22 06:10

Anzel
