I have three dataframes: timestamp (with timestamps), dataSun (with timestamps of sunrise and sunset), dataData (with different climate data). Dataframe timestamp
has datatype "int64"
.
timestamp.head()
timestamp
0 1521681600000
1 1521681900000
2 1521682200000
3 1521682500000
4 1521682800000
Dataframe dataSun
has also datatype "int64"
.
dataSun.head()
sunrise sunset
0 1521696105000 1521740761000
1 1521696105000 1521740761000
2 1521696105000 1521740761000
3 1521696105000 1521740761000
4 1521696105000 1521740761000
Dataframe with climate data dataData
has datatype "float64"
.
dataData.head()
temperature pressure humidity
0 2.490000 1018.000000 99.0
1 2.408333 1017.833333 99.0
2 2.326667 1017.666667 99.0
3 2.245000 1017.500000 99.0
4 2.163333 1017.333333 99.0
5 2.081667 1017.166667 99.0
I want to concatenate these three dataframes in one.
dataResult = pd.concat((timestamp, dataSun, dataData), axis = 1)
dataResult.head()
timestamp sunrise sunset temperature pressure
0 1521681600000 1.521696e+12 1.521741e+12 2.490000 1018.000000
1 1521681900000 1.521696e+12 1.521741e+12 2.408333 1017.833333
2 1521682200000 1.521696e+12 1.521741e+12 2.326667 1017.666667
3 1521682500000 1.521696e+12 1.521741e+12 2.245000 1017.500000
4 1521682800000 1.521696e+12 1.521741e+12 2.163333 1017.333333
5 1521683100000 1.521696e+12 1.521741e+12 2.081667 1017.166667
weatherMeasurements.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7188 entries, 0 to 7187
Data columns (total 6 columns):
timestamp 7188 non-null int64
sunrise 7176 non-null float64
sunset 7176 non-null float64
temperature 7176 non-null float64
pressure 7176 non-null float64
humidity 7176 non-null float64
dtypes: float64(5), int64(1)
Why pd.concat
has changes the datatype of the values DataSun
? I have tried different ways to concatenate the dataframes. For example, I concatenated only timestamp
and dataSun
in one dataframe, then I concatenated resulted dataframe with dataData
. But it was the same result.
How can I concatenate three dataframes and secure the datatypes?
Because of this -
timestamp 7188 non-null int64
sunrise 7176 non-null float64
...
timestamp
has 7188 non-null values, while sunrise
and onwards have 7176. It goes without saying that there are 12 values that are not non-null... meaning they're NaNs.
Since NaNs are of dtype=float
, every other value in that column is automatically upcasted to float, and float numbers that big are usually represented in scientific notation.
That's the why, but that doesn't really solve your problem. Your options at this point are
dropna
fillna
(Now you may downcast these rows to int.)
Alternatively, if you perform pd.concat
with join='inner'
, NaNs are not introduced and the dtypes are preserved.
pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')
timestamp sunrise sunset temperature pressure \
0 1521681600000 1521696105000 1521740761000 2.490000 1018.000000
1 1521681900000 1521696105000 1521740761000 2.408333 1017.833333
2 1521682200000 1521696105000 1521740761000 2.326667 1017.666667
3 1521682500000 1521696105000 1521740761000 2.245000 1017.500000
4 1521682800000 1521696105000 1521740761000 2.163333 1017.333333
humidity
0 99.0
1 99.0
2 99.0
3 99.0
4 99.0
With option 3, an inner join is performed on the indexes of each dataframe.
As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaNs discussed in this answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With