Thank you for looking at this....
I need to reduce the precision of IoT sensor data timestamps and then merge the readings.
I have two CSV files with the following data:
CSV-1
datetime,temperature
2017-06-13 22:20:11.309,82.4
2017-06-13 22:19:54.004,82.4
2017-06-13 22:19:36.661,82.4
2017-06-13 22:19:19.359,82.4
CSV-2
datetime,humidity
2017-06-13 22:07:30.723,63.0
2017-06-13 22:07:13.448,63.0
2017-06-13 22:06:56.115,63.0
2017-06-13 22:06:38.806,63.0
Note that the datetime entries are to the millisecond. I am using the following code to reduce precision to seconds.
ugt = pd.read_csv('ugt.csv', parse_dates=True, index_col=0)
ugh = pd.read_csv('ugh.csv', parse_dates=True, index_col=0)
ugt.index = ugt.index.map(lambda x: x.replace(microsecond=0))
ugh.index = ugh.index.map(lambda x: x.replace(microsecond=0))
That produces the following dataframes:
temperature
datetime
2017-06-13 22:06:57 82.4 <---
2017-06-13 22:06:37 82.4
2017-06-13 22:06:20 82.4
2017-06-13 22:06:03 82.0 <---
humidity
datetime
2017-06-13 22:06:57 63.0 <---
2017-06-13 22:06:38 63.0
2017-06-13 22:06:21 63.0
2017-06-13 22:06:03 63.0 <---
Note that some of the timestamps match to the second (see <---) and others do not. This is due to limitations in the sensors' ability to perform readings; the sampling frequency is not consistent.
Then we create a master dataframe that is populated with one row for every second of the period over which we collected data from all the sensors.
temperature humidity
2017-04-25 12:00:00 0 0
2017-04-25 12:00:01 0 0
2017-04-25 12:00:02 0 0
2017-04-25 12:00:03 0 0
2017-04-25 12:00:04 0 0
We cannot figure out how to use pandas concat, merge, or append to combine the two CSV files into the master dataframe based on datetime. What we want is the following:
temperature humidity
2017-04-25 12:00:00 0 0
2017-04-25 12:00:01 82.0 0
2017-04-25 12:00:02 0 44.0
2017-04-25 12:00:03 0 0
2017-04-25 12:00:04 82.0 44.0
2017-04-25 12:00:05 0 0
2017-04-25 12:00:06 82.0 0
2017-04-25 12:00:07 0 0
2017-04-25 12:00:08 82.0 44.0
There are additional sensors we'll add in the future.... light, CO2, so almost every second will eventually have a column with data in it.
We also want to perform some analysis on how frequently the various sensors are able to collect data, and on their accuracy, hence the use of the master dataframe.
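As a rough sketch of the frequency analysis mentioned above, the gaps between consecutive readings can be computed from the raw timestamps before any rounding. The variable names `stamps` and `gaps` are illustrative only, and the values are the four temperature rows from CSV-1.

```python
import pandas as pd

# Temperature timestamps copied from CSV-1 above.
stamps = pd.to_datetime([
    "2017-06-13 22:20:11.309",
    "2017-06-13 22:19:54.004",
    "2017-06-13 22:19:36.661",
    "2017-06-13 22:19:19.359",
])

# Sort ascending, then take the difference between consecutive readings.
gaps = pd.Series(stamps).sort_values().diff().dropna()

# Summary statistics of the sampling interval, in seconds.
print(gaps.dt.total_seconds().describe())
```

For this sample the interval comes out a little over 17 seconds between readings; running the same steps per sensor gives a quick way to compare them.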
You all rock! Thanks for your help.
temp (temperature) dataframe:
datetime temperature
0 2017-06-13 22:20:11.309 82.4
1 2017-06-13 22:19:54.004 82.4
2 2017-06-13 22:19:36.661 82.4
3 2017-06-13 22:19:19.359 82.4
humid dataframe:
datetime humidity
0 2017-06-13 22:07:30.723 63.0
1 2017-06-13 22:07:13.448 63.0
2 2017-06-13 22:06:56.115 63.0
3 2017-06-13 22:06:38.806 63.0
temp.datetime = pd.to_datetime(temp.datetime) #convert to datetime dtype
temp.set_index('datetime', inplace=True) #make it the index
temp.index = temp.index.round('S') #and now round to the second
Now the temp dataframe looks like:
temperature
datetime
2017-06-13 22:20:11 82.4
2017-06-13 22:19:54 82.4
2017-06-13 22:19:37 82.4
2017-06-13 22:19:19 82.4
Do the same for humid df:
humid.datetime = pd.to_datetime(humid.datetime)
humid.set_index('datetime', inplace=True)
humid.index = humid.index.round('S')
Now humid is:
humidity
datetime
2017-06-13 22:07:31 63.0
2017-06-13 22:07:13 63.0
2017-06-13 22:06:56 63.0
2017-06-13 22:06:39 63.0
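Note that rounding moves 22:19:36.661 up to 22:19:37, whereas the `replace(microsecond=0)` approach in the question truncates it to 22:19:36. If truncation is what you want, `floor` is the vectorized equivalent. A minimal sketch comparing the two on two of the timestamps from the question (the names `idx`, `rounded` and `floored` are illustrative):

```python
import pandas as pd

# Two of the temperature timestamps from the question.
idx = pd.DatetimeIndex(["2017-06-13 22:19:36.661", "2017-06-13 22:19:19.359"])

rounded = idx.round("s")  # nearest second: .661 goes up to :37
floored = idx.floor("s")  # truncation: .661 stays at :36

print(rounded)
print(floored)
```

Whichever you pick, apply the same one to every sensor so that matching readings land on the same second.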
Reindex temp, replace the dates as you like:
temp = temp.reindex(pd.date_range(start='2017-06-13 22:00', end='2017-06-13 22:20', freq='S')) #DatetimeIndex(start=..., end=...) was removed in pandas 1.0
temp.head()
temperature
2017-06-13 22:00:00 NaN
2017-06-13 22:00:01 NaN
2017-06-13 22:00:02 NaN
2017-06-13 22:00:03 NaN
2017-06-13 22:00:04 NaN
And now left join:
out = pd.merge(temp, humid, left_index=True, right_index=True, how='left')
out.head()
temperature humidity
2017-06-13 22:00:00 NaN NaN
2017-06-13 22:00:01 NaN NaN
2017-06-13 22:00:02 NaN NaN
2017-06-13 22:00:03 NaN NaN
2017-06-13 22:00:04 NaN NaN
Make sure this actually worked:
out.loc['2017-06-13 22:07:31']
temperature humidity
2017-06-13 22:07:31 NaN 63.0
Hooray!
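Since the question mentions more sensors coming later, here is one possible generalization, offered as a sketch rather than the only way to do it: join every sensor frame side by side with `pd.concat(axis=1)` and reindex the result once onto the per-second range. The `dedupe` helper and the one-minute range are assumptions for illustration; the readings are a few of the already-truncated rows shown in the question.

```python
import pandas as pd

# Sample readings from the question, already reduced to whole seconds.
temp = pd.DataFrame(
    {"temperature": [82.4, 82.0]},
    index=pd.to_datetime(["2017-06-13 22:06:57", "2017-06-13 22:06:03"]),
)
humid = pd.DataFrame(
    {"humidity": [63.0, 63.0, 63.0]},
    index=pd.to_datetime(
        ["2017-06-13 22:06:57", "2017-06-13 22:06:38", "2017-06-13 22:06:03"]
    ),
)

# Hypothetical helper: if two readings fall in the same second, keep the last
# one, because reindexing needs unique index labels.
def dedupe(df):
    return df[~df.index.duplicated(keep="last")]

sensors = [dedupe(temp), dedupe(humid)]  # add light, CO2, ... frames here

# Outer-join all sensors on the timestamp, then expand to one row per second.
master_index = pd.date_range("2017-06-13 22:06:00", "2017-06-13 22:07:00", freq="s")
master = pd.concat(sensors, axis=1).reindex(master_index)

print(master.loc[pd.Timestamp("2017-06-13 22:06:38")])
```

Seconds with no reading stay as NaN. Calling `master.fillna(0)` would give the zeros shown in the question, although keeping NaN makes it easier to tell a missing reading from a genuine zero when counting how often each sensor reports.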