
Pandas changing cell values based on another cell

I am currently formatting data from two different datasets. One reflects an hourly observation count of people in a room; the other is a count of people based on wifi logs generated at 5-minute intervals.

After merging these two dataframes into one, I run into an issue: each on-the-hour row (such as "10:00:00") has the data from the hourly set, but the 5-minute rows (such as "10:47:14") do not include this data.

Here is how the merged dataframe looks:

        room    time                 con   auth  capacity  %     Count  module     size
0       B002    Mon Nov 02 10:32:06  23    23    90        NaN   NaN    NaN        NaN
1       B002    Mon Nov 02 10:37:10  25    25    90        NaN   NaN    NaN        NaN
12527   B002    Mon Nov 02 10:00:00  NaN   NaN   90        50%   45.0   COMP30520  60
12528   B002    Mon Nov 02 11:00:00  NaN   NaN   90        0%    0.0    COMP30520  60

Is there a way for me to go through the dataframe and find all the information regarding the "occupancy", "occupancyCount", "module" and "size" from 11:00:00 and write it to all the cells that are of the same day and where the hour is between 10:00:00 and 10:59:59?

That would allow me to have all the information on each row and then allow me to gather the min(), max() and median() based on 'day' and 'hour'.

To answer the comment asking for the original dataframes, here they are.
First dataframe:

    time                room    module      size
0   Mon Nov 02 09:00:00 B002    COMP30190   29
1   Mon Nov 02 10:00:00 B002    COMP40660   53

Second dataframe:

        room    time                  con   auth  capacity  %     Count
0       B002    Mon Nov 02 20:32:06   0     0     NaN       NaN   NaN
1       B002    Mon Nov 02 20:37:10   0     0     NaN       NaN   NaN
2       B002    Mon Nov 02 20:42:12   0     0     NaN       NaN   NaN
12797   B008    Wed Nov 11 13:00:00   NaN   NaN   40        25    10.0
12798   B008    Wed Nov 11 14:00:00   NaN   NaN   40        50    20.0
12799   B008    Wed Nov 11 15:00:00   NaN   NaN   40        25    10.0

This is how these two dataframes were merged:

DFinal = pd.merge(DF, d3, left_on=["room", "time"], right_on=["room", "time"], how="outer", left_index=False, right_index=False)

Any help with this would be greatly appreciated.

Thanks a lot,

-Romain

RomainD asked Oct 19 '22 05:10


1 Answer

Somewhere to start:

b = df[(df['time'] > X) & (df['time'] < Y)]

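selects all the rows whose time falls strictly between X and Y. X and Y must be comparable to the 'time' column, so if the column holds timestamps they should be timestamps too.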
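For the comparison above to work on the question's data, the 'time' strings would first need to be turned into real timestamps. Here is a minimal sketch of that step. It assumes the year is 2015 (the data does not state a year; Nov 02 falls on a Monday in 2015), and the toy frame and the bounds X and Y are made up for illustration:

```python
import pandas as pd

# Toy frame using time strings in the same format as the question.
df = pd.DataFrame({
    "room": ["B002"] * 4,
    "time": ["Mon Nov 02 10:32:06", "Mon Nov 02 10:37:10",
             "Mon Nov 02 10:00:00", "Mon Nov 02 11:00:00"],
})

# Assumption: the year is 2015. The weekday prefix ("Mon ") is dropped
# and the assumed year is prepended before parsing.
df["time"] = pd.to_datetime("2015 " + df["time"].str[4:],
                            format="%Y %b %d %H:%M:%S")

# Example bounds (hypothetical): the hour starting at 10:00.
X = pd.Timestamp("2015-11-02 10:00:00")
Y = pd.Timestamp("2015-11-02 11:00:00")

# Rows strictly between X and Y, i.e. the 5-minute rows of that hour.
b = df[(df["time"] > X) & (df["time"] < Y)]
```

With strict inequalities the on-the-hour rows themselves are excluded; use `>=` on the lower bound if the 10:00:00 row should be part of the selection.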

And then

df.loc[df['column_name'].isin(b['column_name'])]

This gives you the rows you want (i.e. those between X and Y), and you can assign to them as you see fit. I think you'll want to set the values of the selected rows to those of the on-the-hour row at X?
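Rather than selecting one hour at a time, the same assignment could be done for every hour at once. This is only a sketch under assumptions, not a tested solution for the full dataset: the year is assumed to be 2015, the toy values are illustrative, and the names `hour`, `hourly_cols` and `stats` are made up here. The idea is to floor each timestamp to its hour, copy the first non-null hourly value to every row of the same room and hour, and then aggregate:

```python
import pandas as pd

# Toy frame shaped like the merged data in the question (values are illustrative).
df = pd.DataFrame({
    "room": ["B002", "B002", "B002", "B002"],
    "time": ["Mon Nov 02 10:32:06", "Mon Nov 02 10:37:10",
             "Mon Nov 02 10:00:00", "Mon Nov 02 11:00:00"],
    "con": [23, 25, None, None],
    "Count": [None, None, 45.0, 0.0],
    "module": [None, None, "COMP30520", "COMP30520"],
    "size": [None, None, 60, 60],
})

# Assumption: the year is 2015; the weekday prefix is dropped before parsing.
df["time"] = pd.to_datetime("2015 " + df["time"].str[4:],
                            format="%Y %b %d %H:%M:%S")

# Hypothetical helper column: each timestamp floored to the start of its hour.
df["hour"] = df["time"].dt.floor("h")

# Columns that only exist on the on-the-hour rows.
hourly_cols = ["Count", "module", "size"]

# "first" returns the first non-null value in each (room, hour) group,
# and transform broadcasts it back to every row of that group.
df[hourly_cols] = df.groupby(["room", "hour"])[hourly_cols].transform("first")

# Per-room, per-hour statistics of the wifi connection counts.
stats = df.groupby(["room", "hour"])["con"].agg(["min", "max", "median"])
```

Because `hour` keeps the full date, grouping on it separates days as well as hours. Note that this fills the 10:xx rows from the 10:00:00 row; if the values should come from the following hour instead, the helper column would need to be shifted accordingly.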

Hope that helps.

Note that these snippets are adapted from:
[1] Filter dataframe rows if value in column is in a set list of values
[2] Select rows from a DataFrame based on values in a column in pandas

draco_alpine answered Oct 21 '22 00:10