
If statements for pandas DataFrames in Python

Tags:

python

pandas

I have a dataframe that looks like this:

timestamp                      0            1            2            3                                           
2013-04-17 05:00:00     4.335212  2655.140854  2655.140854  2655.140854   
2013-04-17 05:10:00     2.224966  2655.140854  2655.140854  2655.140854   
2013-04-17 05:20:00     2.409150  2655.140854  2655.140854  2655.140854   
2013-04-17 05:30:00  2655.140854  2655.140854  2655.140854  2655.140854 

I need to apply an if-statement criterion to every value in the dataframe. I have tried using:

dirt = dirt.astype(float)
for ind, i in enumerate(dirt):
    if i < 0:
        dirt[ind] = i + 360
    if i > 360:
        dirt[ind] = i - 360

However, the addition and subtraction are not being applied to any of the values. Any ideas?

holto1 asked Oct 20 '15 12:10



2 Answers

You should use .iterrows() instead of enumerate(df). Iterating over a DataFrame with enumerate(df) only yields the column names, which would not meet your conditions. iterrows() returns the index and the row (as a pandas.Series) on every iteration.
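
To see why the original loop has no effect, here is a minimal sketch comparing what enumerate(df) and df.iterrows() yield. The two-row frame and its values are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's data.
df = pd.DataFrame(
    {0: [4.3, -10.0], 1: [2655.1, 400.0]},
    index=pd.to_datetime(["2013-04-17 05:00:00", "2013-04-17 05:10:00"]),
)

# Iterating a DataFrame directly yields its column labels, not its values,
# so enumerate() pairs a counter with each column label.
enumerated = list(enumerate(df))

# iterrows() yields (index label, row as a Series) pairs.
rows = [(label, row.tolist()) for label, row in df.iterrows()]

print(enumerated)  # [(0, 0), (1, 1)]
for label, values in rows:
    print(label, values)
```

Since the loop variable only ever holds a column label (0 or 1 here), neither `i < 0` nor `i > 360` is true, and nothing is changed.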

For your requirement, though, you can iterate over df.columns and apply each condition to a whole column at once in a vectorized way. Example:

for col in df.columns:
    df.loc[df[col] < 0,col] += 360
    df.loc[df[col] > 360,col] -= 360

I am looping over columns instead of rows on the assumption that there are far fewer columns than rows, so the Python-level loop runs for far fewer iterations (and each vectorized addition handles more data at once).
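
As a variation on the column loop, the same two conditions can probably be applied to the whole frame without any Python-level loop. The sketch below is not part of the original answer: it assumes DataFrame.mask is acceptable, uses made-up sample values, and returns a new frame instead of modifying df in place:

```python
import pandas as pd

# Hypothetical sample data; only the shape mirrors the question.
df = pd.DataFrame(
    {0: [4.335212, -30.0], 1: [2655.140854, 100.0]},
    index=pd.to_datetime(["2013-04-17 05:00:00", "2013-04-17 05:10:00"]),
)

# Boolean masks computed once from the original values.
low = df < 0
high = df > 360

# mask() replaces the cells where the condition is True with the
# corresponding cells of the second argument and keeps the rest.
wrapped = df.mask(low, df + 360).mask(high, df - 360)

print(wrapped)
```

Here -30.0 becomes 330.0, 2655.140854 becomes roughly 2295.140854, and values already between 0 and 360 are left alone.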

Demo:

In [128]: df
Out[128]:
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2655.140854  2655.140854  2655.140854
2013-04-17 05:10:00     2.224966  2655.140854  2655.140854  2655.140854
2013-04-17 05:20:00     2.409150  2655.140854  2655.140854  2655.140854
2013-04-17 05:30:00  2655.140854  2655.140854  2655.140854  2655.140854

In [134]: for col in df.columns:
   .....:     df.loc[df[col] < 0,col] += 360
   .....:     df.loc[df[col] > 360,col] -= 360
   .....:

In [135]: df
Out[135]:
                               0            1            2            3
timestamp
2013-04-17 05:00:00     4.335212  2295.140854  2295.140854  2295.140854
2013-04-17 05:10:00     2.224966  2295.140854  2295.140854  2295.140854
2013-04-17 05:20:00     2.409150  2295.140854  2295.140854  2295.140854
2013-04-17 05:30:00  2295.140854  2295.140854  2295.140854  2295.140854

Anand S Kumar answered Oct 22 '22 05:10


You can also use masking with where() and update() to modify the existing dataframe values, like this:

In [188]: df
Out[188]: 
                              0            1            2            3
timestamp                                                             
2013-04-1705:00:00     4.335212  2655.140854  2655.140854  2655.140854
2013-04-1705:10:00     2.224966  2655.140854  2655.140854  2655.140854
2013-04-1705:20:00     2.409150  2655.140854  2655.140854  2655.140854
2013-04-1705:30:00  2655.140854  2655.140854  2655.140854  2655.140854

In [189]: df_small = df.where(df < 0).apply(lambda x: x + 360)

In [190]: df_small
Out[190]: 
                     0   1   2   3
timestamp                         
2013-04-1705:00:00 NaN NaN NaN NaN
2013-04-1705:10:00 NaN NaN NaN NaN
2013-04-1705:20:00 NaN NaN NaN NaN
2013-04-1705:30:00 NaN NaN NaN NaN

In [191]: df_large = df.where(df > 360).apply(lambda x: x - 360)

In [192]: df_large
Out[192]: 
                              0            1            2            3
timestamp                                                             
2013-04-1705:00:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-1705:10:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-1705:20:00          NaN  2295.140854  2295.140854  2295.140854
2013-04-1705:30:00  2295.140854  2295.140854  2295.140854  2295.140854

In [193]: df.update(df_small)

In [194]: df.update(df_large)

In [195]: df
Out[195]: 
                              0            1            2            3
timestamp                                                             
2013-04-1705:00:00     4.335212  2295.140854  2295.140854  2295.140854
2013-04-1705:10:00     2.224966  2295.140854  2295.140854  2295.140854
2013-04-1705:20:00     2.409150  2295.140854  2295.140854  2295.140854
2013-04-1705:30:00  2295.140854  2295.140854  2295.140854  2295.140854

Note:

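This approach can also cater for corner cases where the order of the updates would otherwise cause a condition to be reapplied to an already modified value. For example, with conditions like "if value < 360 then +360, else -360", updating in sequence gives 1 + 360 = 361, and then 361 > 360, so it becomes 1 again. Here both masked frames are computed from the original values before either update is applied.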
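
The corner case in the note can be reproduced on a small Series. This is only an illustrative sketch: the values and the names seq, sim, plus, and minus are hypothetical, and "else" is read as "value >= 360":

```python
import pandas as pd

s = pd.Series([1.0, 400.0])

# Sequential in-place updates: the second condition is evaluated
# on values the first update has already changed.
seq = s.copy()
seq.loc[seq < 360] += 360    # 1 -> 361, 400 untouched
seq.loc[seq >= 360] -= 360   # 361 -> 1 again, 400 -> 40

# where() + update(): both adjusted Series are derived from the
# original values, and update() skips the NaN placeholders.
sim = s.copy()
plus = s.where(s < 360) + 360     # 361, NaN
minus = s.where(s >= 360) - 360   # NaN, 40
sim.update(plus)
sim.update(minus)

print(seq.tolist())  # [1.0, 40.0]
print(sim.tolist())  # [361.0, 40.0]
```

The sequential version undoes its own first step for the value 1, while the where/update version applies each rule exactly once.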

But for your use case, I think @AnandSKumar's method is very clean and close to what you're looking for.

Anzel answered Oct 22 '22 06:10