I have to work with time series data imported from some CSVs which may look like this:
import pandas as pd

csv_a = [["Sensor_1", '2019-05-25 10:00', 25, 60],
         ["Sensor_2", '2019-05-25 10:00', 30, 45],
         ["Sensor_1", '2019-05-25 10:05', 26, None],
         ["Sensor_2", '2019-05-25 10:05', 30, 46],
         ["Sensor_1", '2019-05-25 10:10', 27, 63],
         ["Sensor_1", '2019-05-25 10:20', 28, 62]]
df_a = pd.DataFrame(csv_a, columns=["Sensor", "Timestamp", "Temperature", "Humidity"])
df_a["Timestamp"] = pd.to_datetime(df_a["Timestamp"])

csv_b = [["Sensor_1", '2019-05-25 10:05', 1020],
         ["Sensor_2", '2019-05-25 10:05', 956],
         ["Sensor_3", '2019-05-25 10:05', 990],
         ["Sensor_1", '2019-05-25 10:10', 1021],
         ["Sensor_2", '2019-05-25 10:10', 957],
         ["Sensor_3", '2019-05-25 10:10', 992],
         ["Sensor_1", '2019-05-25 10:15', 1019]]
df_b = pd.DataFrame(csv_b, columns=["Sensor", "Timestamp", "Pressure"])
df_b["Timestamp"] = pd.to_datetime(df_b["Timestamp"])
As you can see, we have 3 sensors. Each sensor has its own time series with measurements of temperature, humidity and pressure. However, the data is split across two CSVs and can have many gaps.
The objective is to join all the data into one ordered, regular dataframe like this:
              Timestamp    Sensor  Temperature  Humidity  Pressure
0   2019-05-25 10:00:00  Sensor_1         25.0      60.0       NaN
1   2019-05-25 10:00:00  Sensor_2         30.0      45.0       NaN
2   2019-05-25 10:00:00  Sensor_3          NaN       NaN       NaN
3   2019-05-25 10:05:00  Sensor_1         26.0       NaN    1020.0
4   2019-05-25 10:05:00  Sensor_2         30.0      46.0     956.0
5   2019-05-25 10:05:00  Sensor_3          NaN       NaN     990.0
6   2019-05-25 10:10:00  Sensor_1         27.0      63.0    1021.0
7   2019-05-25 10:10:00  Sensor_2          NaN       NaN     957.0
8   2019-05-25 10:10:00  Sensor_3          NaN       NaN     992.0
9   2019-05-25 10:15:00  Sensor_1          NaN       NaN    1019.0
10  2019-05-25 10:15:00  Sensor_2          NaN       NaN       NaN
11  2019-05-25 10:15:00  Sensor_3          NaN       NaN       NaN
12  2019-05-25 10:20:00  Sensor_1         28.0      62.0       NaN
13  2019-05-25 10:20:00  Sensor_2          NaN       NaN       NaN
14  2019-05-25 10:20:00  Sensor_3          NaN       NaN       NaN
The logic is that, taken together, the data in the CSVs starts at 10:00 and ends at 10:20, and that there are 3 possible variables for 3 different sensors. So I want the first 2 columns (Timestamp and Sensor) to be regular, ordered and without gaps. The remaining columns (Temperature, Humidity and Pressure) should be filled, where possible, with the data from the CSVs.
I have tried to perform this using the pandas merge function in many different ways but I can't obtain the result I want. I hope someone more experienced can help me.
Combine two Series using pandas.merge(). It can be used for all database-style join operations between DataFrame or named Series objects. A Series must have a name to be merged, because that name becomes its column label in the result. For instance, pd.merge(S1, S2, right_index=True, left_index=True).
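As a small illustration of merging two named Series on their index, here is a minimal sketch. The Series `s1` and `s2` and their values are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical Series for illustration; each one needs a name,
# which becomes its column label in the merged result.
s1 = pd.Series([25, 26, 27], index=["10:00", "10:05", "10:10"], name="Temperature")
s2 = pd.Series([1020, 1021], index=["10:05", "10:10"], name="Pressure")

# Join on the index of both Series; the default how="inner"
# keeps only the index labels present in both.
merged = pd.merge(s1, s2, left_index=True, right_index=True)
print(merged)
```

Passing how="outer" instead would keep the labels that appear in only one of the Series and fill the missing side with NaN.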
The pandas merge() function joins two DataFrames; to combine more than two, chain several calls. Both pandas.merge() and DataFrame.merge() are available and behave the same way.
You can create a DataFrame from multiple Series objects by adding each Series as a column. The concat() function merges several Series into one DataFrame: pass it a list of the Series to combine and axis=1 so that they are placed side by side as columns instead of being stacked as rows.
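A minimal sketch of combining Series as columns with concat, assuming two hypothetical Series (`temperature` and `pressure`) whose indexes only partly overlap:

```python
import pandas as pd

# Hypothetical Series for illustration; the names become column labels.
temperature = pd.Series([25, 26, 27], index=["10:00", "10:05", "10:10"], name="Temperature")
pressure = pd.Series([1020, 1021, 1019], index=["10:05", "10:10", "10:15"], name="Pressure")

# axis=1 places the Series side by side; the default join="outer"
# keeps the union of both indexes and fills the gaps with NaN.
combined = pd.concat([temperature, pressure], axis=1)
print(combined)
```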
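First join both DataFrames with concat, after calling DataFrame.set_index on Timestamp and Sensor in each. Rows that share the same timestamp and sensor (including any duplicates) are then aggregated with sum, so the MultiIndex built from the timestamps and sensors is unique.
Then add the missing rows with DataFrame.reindex, using a MultiIndex.from_product built from a date_range between the minimal and maximal timestamps and the unique sensors: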
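# Stack both frames on a shared (Timestamp, Sensor) index
df = (pd.concat([df_a.set_index(['Timestamp', 'Sensor']),
                 df_b.set_index(['Timestamp', 'Sensor'])], sort=True)
        # sum(level=...) was removed in pandas 2.0; group by both index levels instead.
        # min_count=1 keeps NaN where every value in a group is missing.
        .groupby(level=[0, 1])
        .sum(min_count=1))

# Build the full grid: every 5-minute timestamp x every sensor
d = df.index.get_level_values(0)
mux = pd.MultiIndex.from_product([pd.date_range(d.min(), d.max(), freq='5min'),
                                  df.index.get_level_values(1).unique()],
                                 names=df.index.names)

# Reindex onto the grid; combinations without data become NaN rows
df = df.reindex(mux).reset_index()
print(df)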
              Timestamp    Sensor  Humidity  Pressure  Temperature
0   2019-05-25 10:00:00  Sensor_1      60.0       NaN         25.0
1   2019-05-25 10:00:00  Sensor_2      45.0       NaN         30.0
2   2019-05-25 10:00:00  Sensor_3       NaN       NaN          NaN
3   2019-05-25 10:05:00  Sensor_1       NaN    1020.0         26.0
4   2019-05-25 10:05:00  Sensor_2      46.0     956.0         30.0
5   2019-05-25 10:05:00  Sensor_3       NaN     990.0          NaN
6   2019-05-25 10:10:00  Sensor_1      63.0    1021.0         27.0
7   2019-05-25 10:10:00  Sensor_2       NaN     957.0          NaN
8   2019-05-25 10:10:00  Sensor_3       NaN     992.0          NaN
9   2019-05-25 10:15:00  Sensor_1       NaN    1019.0          NaN
10  2019-05-25 10:15:00  Sensor_2       NaN       NaN          NaN
11  2019-05-25 10:15:00  Sensor_3       NaN       NaN          NaN
12  2019-05-25 10:20:00  Sensor_1      62.0       NaN         28.0
13  2019-05-25 10:20:00  Sensor_2       NaN       NaN          NaN
14  2019-05-25 10:20:00  Sensor_3       NaN       NaN          NaN
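Since the question asks about merge, the same result can probably also be reached with an outer merge followed by a reindex. This is only a sketch, not the approach used above: it assumes each (Timestamp, Sensor) pair appears at most once per CSV (otherwise the reindex step fails on a non-unique index), and the names `keys`, `merged`, `grid` and `result` are introduced here for illustration. It also puts the columns in the order requested in the question.

```python
import pandas as pd

# Rebuild the question's data so the sketch is self-contained
df_a = pd.DataFrame(
    [["Sensor_1", "2019-05-25 10:00", 25, 60],
     ["Sensor_2", "2019-05-25 10:00", 30, 45],
     ["Sensor_1", "2019-05-25 10:05", 26, None],
     ["Sensor_2", "2019-05-25 10:05", 30, 46],
     ["Sensor_1", "2019-05-25 10:10", 27, 63],
     ["Sensor_1", "2019-05-25 10:20", 28, 62]],
    columns=["Sensor", "Timestamp", "Temperature", "Humidity"])
df_b = pd.DataFrame(
    [["Sensor_1", "2019-05-25 10:05", 1020],
     ["Sensor_2", "2019-05-25 10:05", 956],
     ["Sensor_3", "2019-05-25 10:05", 990],
     ["Sensor_1", "2019-05-25 10:10", 1021],
     ["Sensor_2", "2019-05-25 10:10", 957],
     ["Sensor_3", "2019-05-25 10:10", 992],
     ["Sensor_1", "2019-05-25 10:15", 1019]],
    columns=["Sensor", "Timestamp", "Pressure"])
for frame in (df_a, df_b):
    frame["Timestamp"] = pd.to_datetime(frame["Timestamp"])

keys = ["Timestamp", "Sensor"]

# Outer join keeps every (Timestamp, Sensor) pair found in either frame
merged = df_a.merge(df_b, on=keys, how="outer")

# Full grid: every 5-minute step between the first and last timestamp,
# crossed with every sensor seen in either frame
grid = pd.MultiIndex.from_product(
    [pd.date_range(merged["Timestamp"].min(), merged["Timestamp"].max(), freq="5min"),
     sorted(merged["Sensor"].unique())],
    names=keys)

# Reindex onto the grid (missing pairs become NaN rows),
# then restore the column order asked for in the question
result = (merged.set_index(keys)
                .reindex(grid)
                .reset_index()
                [keys + ["Temperature", "Humidity", "Pressure"]])
print(result)
```

The concat plus sum approach is more tolerant of repeated (Timestamp, Sensor) pairs because it aggregates them, whereas this merge variant relies on the pairs being unique in each input.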