
Reading a csv-file with pandas.read_csv and an index creates NaN entries

My .csv file is comma-separated, which is the default separator for read_csv.

This works:

T1 = pd.DataFrame(pd.read_csv(loggerfile, header = 2)) #header contains column "1"

But as soon as I pass an index to the DataFrame constructor in addition to the read_csv result, all my values are suddenly NaN. Why, and how can I solve this?

datetimeIdx = pd.to_datetime( T1["1"] )                #timestamp-column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header = 2), index = datetimeIdx)
user2366975 asked Dec 26 '22 11:12



1 Answer

It's not necessary to wrap read_csv in a DataFrame call, as it already returns a DataFrame.

If you want to change the index, you can use set_index or directly set the index:

T1 = pd.read_csv(loggerfile, header = 2)
T1.index = pd.DatetimeIndex(T1["1"])

If you want to keep the column in the DataFrame as a datetime (rather than as a string):

T1 = pd.read_csv(loggerfile, header = 2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)

But even better, you can do this directly in read_csv (assuming the column "1" is the first column):

pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
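As a minimal sketch of the one-step approach, the snippet below reads a small in-memory file instead of loggerfile. The file contents (two preamble lines, a header row with the columns "1" and "value", and the timestamps) are made up for illustration; only the read_csv arguments come from the line above.

```python
import io

import pandas as pd

# Hypothetical stand-in for loggerfile: two preamble lines, then the
# header row on line index 2 (hence header=2), then the data rows.
sample = io.StringIO(
    "logger export\n"
    "device 42\n"
    "1,value\n"
    "2014-01-01 00:00:00,1.5\n"
    "2014-01-01 00:10:00,2.5\n"
)

# index_col=0 makes the first column ("1") the index;
# parse_dates=True asks read_csv to parse that index as datetimes.
T1 = pd.read_csv(sample, header=2, index_col=0, parse_dates=True)

print(type(T1.index).__name__)
print(T1)
```

With this layout the timestamps end up in the index rather than in a regular column, so no separate conversion step is needed afterwards.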

You get a DataFrame full of NaNs because calling DataFrame() with a DataFrame as input and an index argument reindexes the input to the provided index. Since none of the labels in datetimeIdx are in the original index of T1 (the default 0, 1, 2, ...), every row comes back as NaN.
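The reindexing behaviour can be reproduced on a tiny frame. This is only an illustrative sketch: the column "value" and the sample timestamps are invented, and datetime_idx plays the role of datetimeIdx from the question.

```python
import pandas as pd

# Hypothetical stand-in for the logger data; column "1" holds timestamps.
df = pd.DataFrame({
    "1": ["2014-01-01 00:00:00", "2014-01-01 00:10:00"],
    "value": [1.5, 2.5],
})
datetime_idx = pd.DatetimeIndex(df["1"])

# Passing index= to the constructor aligns the existing rows to the new
# labels. The original index is 0, 1, so no timestamp label matches and
# every cell becomes NaN.
reindexed = pd.DataFrame(df, index=datetime_idx)

# Assigning the index replaces the labels without any alignment,
# so the values are kept.
fixed = df.copy()
fixed.index = datetime_idx

print(reindexed)
print(fixed)
```

The first frame is the reindexed result with no matching labels; the second keeps the data because the labels are swapped in place rather than looked up.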

joris answered Apr 07 '23 11:04
