Merge dataframes by date after reducing timestamp precision

Thank you for looking at this....

I need to reduce the precision of IoT sensor data timestamps and then merge the data.

I have two CSV files with the following data:

CSV-1

datetime,temperature
2017-06-13 22:20:11.309,82.4
2017-06-13 22:19:54.004,82.4
2017-06-13 22:19:36.661,82.4
2017-06-13 22:19:19.359,82.4

CSV-2

datetime,humidity
2017-06-13 22:07:30.723,63.0
2017-06-13 22:07:13.448,63.0
2017-06-13 22:06:56.115,63.0
2017-06-13 22:06:38.806,63.0

Note that the datetime entries are to the millisecond. I am using the following code to reduce precision to seconds.

import pandas as pd

ugt = pd.read_csv('ugt.csv', parse_dates=True, index_col=0)
ugh = pd.read_csv('ugh.csv', parse_dates=True, index_col=0)

ugt.index = ugt.index.map(lambda x: x.replace(microsecond=0))
ugh.index = ugh.index.map(lambda x: x.replace(microsecond=0))
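For comparison, a minimal sketch of the same truncation done with the vectorized `DatetimeIndex.floor('s')`, next to `DatetimeIndex.round('s')` (which the answer below uses). The two differ for readings at 500 ms or more past the second: 22:19:36.661 truncates to 22:19:36 but rounds to 22:19:37. The sample index is built from the CSV-1 timestamps above.

```python
import pandas as pd

# Timestamps taken from CSV-1 in the question
idx = pd.DatetimeIndex([
    "2017-06-13 22:20:11.309",
    "2017-06-13 22:19:54.004",
    "2017-06-13 22:19:36.661",
    "2017-06-13 22:19:19.359",
])

# Per-element truncation, as in the question's lambda
truncated_lambda = idx.map(lambda x: x.replace(microsecond=0))

# Vectorized equivalent: floor each timestamp to the whole second
truncated = idx.floor("s")

# Alternative: round to the nearest second instead of truncating
rounded = idx.round("s")

print(truncated.equals(truncated_lambda))
print(truncated[2], rounded[2])
```

Which one to use depends on whether a reading at .661 should count toward the current or the next second; for aligning sensors either works as long as every sensor is treated the same way.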

That produces the following dataframes:

                     temperature
datetime                        
2017-06-13 22:06:57         82.4 <---
2017-06-13 22:06:37         82.4
2017-06-13 22:06:20         82.4
2017-06-13 22:06:03         82.0 <---

                 humidity
datetime                     
2017-06-13 22:06:57      63.0 <---
2017-06-13 22:06:38      63.0
2017-06-13 22:06:21      63.0
2017-06-13 22:06:03      63.0 <---

Note that some of the timestamps match to the second (see <---) and others do not. This is due to limitations in the various sensors' ability to take readings; there is no consistent sampling frequency.

Then we create a master dataframe populated with one row for every second of the time period during which we collected data from all the sensors:

                     temperature  humidity
2017-04-25 12:00:00            0         0
2017-04-25 12:00:01            0         0
2017-04-25 12:00:02            0         0
2017-04-25 12:00:03            0         0
2017-04-25 12:00:04            0         0
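The question does not show how the master dataframe was built. One minimal way, assuming a one-second grid between two hypothetical bounds (in practice, the earliest and latest sensor timestamps), is `pd.date_range` with `freq='s'` and a constant fill of 0:

```python
import pandas as pd

# Hypothetical bounds; in practice use the min/max of the sensor indexes
start, end = "2017-04-25 12:00:00", "2017-04-25 12:00:04"

# One timestamp per second, both endpoints included
grid = pd.date_range(start=start, end=end, freq="s")

# Float zeros so sensor readings can be stored without a dtype change
master = pd.DataFrame(0.0, index=grid, columns=["temperature", "humidity"])
print(master)
```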

We cannot figure out how to use pandas concat, merge, or append to combine the two CSV files into the master dataframe based on datetime. What we want is the following:

                     temperature  humidity
2017-04-25 12:00:00            0         0
2017-04-25 12:00:01            82.0      0
2017-04-25 12:00:02            0         44.0
2017-04-25 12:00:03            0         0
2017-04-25 12:00:04            82.0      44.0
2017-04-25 12:00:05            0         0
2017-04-25 12:00:06            82.0      0
2017-04-25 12:00:07            0         0
2017-04-25 12:00:08            82.0      44.0

There are additional sensors we'll add in the future (light, CO2), so eventually almost every second will have data in at least one column.

We also want to analyze how frequently the various sensors are able to collect data and how accurate they are, hence the master dataframe.
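For the frequency analysis, one possible starting point (a sketch, not the asker's method) is the gap between consecutive readings of each sensor, computed with `Series.diff()` on the raw timestamps. It uses the millisecond values from CSV-2 above, since rounding to the second would discard the precision this analysis needs.

```python
import pandas as pd

# Millisecond timestamps from CSV-2 in the question (newest first, as in the file)
stamps = pd.to_datetime(pd.Series([
    "2017-06-13 22:07:30.723",
    "2017-06-13 22:07:13.448",
    "2017-06-13 22:06:56.115",
    "2017-06-13 22:06:38.806",
]))

# Sort chronologically, then take the gap between consecutive readings
gaps = stamps.sort_values().diff().dropna()

# Summary statistics of the sampling interval, in seconds
print(gaps.dt.total_seconds().describe())
```

Running the same computation per sensor makes the sensors' sampling rates directly comparable.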

You all rock! Thanks for your help.

asked Feb 16 '26 05:02 by Steven Fowler


1 Answer

temp (temperature) dataframe:

                 datetime  temperature
0  2017-06-13 22:20:11.309         82.4
1  2017-06-13 22:19:54.004         82.4
2  2017-06-13 22:19:36.661         82.4
3  2017-06-13 22:19:19.359         82.4

humid dataframe:

                 datetime  humidity
0  2017-06-13 22:07:30.723      63.0
1  2017-06-13 22:07:13.448      63.0
2  2017-06-13 22:06:56.115      63.0
3  2017-06-13 22:06:38.806      63.0



import pandas as pd

temp.datetime = pd.to_datetime(temp.datetime) #convert to datetime dtype
temp.set_index('datetime', inplace=True) #make it the index
temp.index = temp.index.round('s') #and now round to the second

Now the temp dataframe looks like:

                     temperature
datetime                        
2017-06-13 22:20:11         82.4
2017-06-13 22:19:54         82.4
2017-06-13 22:19:37         82.4
2017-06-13 22:19:19         82.4

Do the same for humid df:

humid.datetime = pd.to_datetime(humid.datetime)
humid.set_index('datetime', inplace=True)
humid.index = humid.index.round('s')

Now humid is:

                     humidity
datetime                     
2017-06-13 22:07:31      63.0
2017-06-13 22:07:13      63.0
2017-06-13 22:06:56      63.0
2017-06-13 22:06:39      63.0
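One caveat not covered here: if a sensor happens to report twice within the same second, rounding produces duplicate index labels. `reindex` then raises a `ValueError`, and a merge on the index repeats that row. A minimal sketch of collapsing duplicates first by averaging them (keeping the first or last reading would work too); the readings below are hypothetical:

```python
import pandas as pd

# Hypothetical readings: the first two round to the same second
humid = pd.DataFrame(
    {"humidity": [63.0, 64.0, 63.5]},
    index=pd.to_datetime([
        "2017-06-13 22:06:38.806",
        "2017-06-13 22:06:39.120",
        "2017-06-13 22:06:56.115",
    ]),
)
humid.index = humid.index.round("s")
print(humid.index.has_duplicates)  # 22:06:39 now appears twice

# Average all readings that landed on the same second
humid = humid.groupby(level=0).mean()
print(humid)
```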

Reindex temp, replace the dates as you like:

temp = temp.reindex(pd.date_range(start='2017-06-13 22:00', end='2017-06-13 22:20', freq='s'))
temp.head()

                     temperature
2017-06-13 22:00:00          NaN
2017-06-13 22:00:01          NaN
2017-06-13 22:00:02          NaN
2017-06-13 22:00:03          NaN
2017-06-13 22:00:04          NaN

And now left join:

out = pd.merge(temp, humid, left_index=True, right_index=True, how='left')

out.head():
                     temperature  humidity
2017-06-13 22:00:00          NaN       NaN
2017-06-13 22:00:01          NaN       NaN
2017-06-13 22:00:02          NaN       NaN
2017-06-13 22:00:03          NaN       NaN
2017-06-13 22:00:04          NaN       NaN
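The question's target table uses 0 for seconds without a reading, whereas this merge leaves NaN. If zeros are really wanted, `fillna(0)` converts them, but keeping NaN is usually safer for the planned analysis: a genuine reading of 0 and "no reading" stay distinguishable, and `notna()` counts actual readings directly. A small sketch on a hypothetical three-second frame:

```python
import numpy as np
import pandas as pd

# Hypothetical merged output: NaN means the sensor did not report that second
idx = pd.date_range("2017-06-13 22:07:30", periods=3, freq="s")
out = pd.DataFrame({"temperature": [np.nan, np.nan, 82.4],
                    "humidity": [np.nan, 63.0, np.nan]}, index=idx)

# Number of seconds with an actual reading, per sensor
print(out.notna().sum())

# Zero-filled copy matching the question's desired layout
filled = out.fillna(0)
print(filled)
```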

Make sure this actually worked:

out.loc['2017-06-13 22:07:31']
                     temperature  humidity
2017-06-13 22:07:31          NaN      63.0

Hooray!
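Since more sensors (light, CO2) are planned, here is a sketch that scales to any number of them, assuming each sensor's frame has already been rounded to the second and has no duplicate timestamps. `pd.concat(..., axis=1)` aligns all frames on their index in one step, and a single `reindex` puts the result on the per-second grid. The helper `to_second_grid` and the sample frames are hypothetical.

```python
import pandas as pd

def to_second_grid(sensors, start, end):
    """Align single-column sensor frames on a one-second grid.

    sensors: dict mapping a name to a DataFrame indexed by rounded,
    duplicate-free timestamps (hypothetical input layout).
    """
    # Outer-aligns every frame on the union of their timestamps
    combined = pd.concat(list(sensors.values()), axis=1)
    grid = pd.date_range(start=start, end=end, freq="s")
    # Seconds with no reading become NaN; readings outside [start, end] are dropped
    return combined.reindex(grid)

temp = pd.DataFrame({"temperature": [82.4]},
                    index=pd.to_datetime(["2017-06-13 22:07:31"]))
humid = pd.DataFrame({"humidity": [63.0, 63.0]},
                     index=pd.to_datetime(["2017-06-13 22:07:13",
                                           "2017-06-13 22:07:31"]))

out = to_second_grid({"temperature": temp, "humidity": humid},
                     "2017-06-13 22:07:10", "2017-06-13 22:07:35")
print(out.loc["2017-06-13 22:07:31"])
```

Adding a sensor is then just another entry in the dict rather than another chained merge.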

answered Feb 17 '26 19:02 by cfort


