How to merge two DataFrame columns and apply pandas.to_datetime to it?

Question

I'm learning pandas so I can use it for some data analysis. The data is supplied as a CSV file with several columns, of which I only need four (date, time, o, c). I'd like to create a new DataFrame whose index is a datetime64 value, built by merging the first two columns into one string and applying pd.to_datetime to it.

My loader code works fine:

st = pd.read_csv("C:/Data/stockname.txt", names=["date","time","o","h","l","c","vol"])

The challenge is converting the loaded DataFrame into a new one with the right format. The code below works but is very slow. Moreover, it only creates a column with the new datetime64 values and doesn't make it the index.

My code

st_new = pd.concat([pd.to_datetime(st.date + " " + st.time), (st.o + st.c) / 2, st.vol], 
     axis = 1, ignore_index=True)

What would be a more Pythonic way to merge two columns and apply a function to the result? And how do I make the new column the index of the DataFrame?

Viktor Kerkez · Accepted Answer

You can do everything in the read_csv function:

pd.read_csv('test.csv',
            parse_dates={'timestamp': ['date','time']},
            index_col='timestamp',
            usecols=['date', 'time', 'o', 'c'])

parse_dates tells read_csv to combine the date and time columns into a single timestamp column and parse it as a datetime. (pandas can infer most common date formats on its own.)

index_col sets the timestamp column to be the index.

usecols tells read_csv to load only the listed subset of columns.
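If the file has no header row, as in the question, or if your pandas version warns about combining columns through parse_dates (as far as I know, releases from 2.2 onward deprecate that form), the same result can be reached in two steps after loading. The following is only a sketch under assumptions: the sample rows and the "%Y-%m-%d %H:%M:%S" format are made up for illustration, so adjust them to the real file.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the headerless file from the question.
raw = io.StringIO(
    "2013-08-01,09:30:00,10.0,10.5,9.8,10.2,1000\n"
    "2013-08-01,09:31:00,10.2,10.6,10.1,10.4,1500\n"
)

cols = ["date", "time", "o", "h", "l", "c", "vol"]

# Load only the four needed columns; keep date and time as plain strings.
st = pd.read_csv(
    raw,
    names=cols,
    usecols=["date", "time", "o", "c"],
    dtype={"date": str, "time": str},
)

# Merge the two string columns and parse them in one vectorized call.
# The format string is an assumption about how the file stores dates.
st["timestamp"] = pd.to_datetime(
    st["date"] + " " + st["time"], format="%Y-%m-%d %H:%M:%S"
)

# Drop the source columns and make the parsed timestamps the index.
st = st.drop(columns=["date", "time"]).set_index("timestamp")
```

Passing an explicit format spares pandas from guessing the layout of the strings, which is usually the slow part of a bare pd.to_datetime call, and set_index is what turns the new column into the DataFrame's index.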

Tags:

python

pandas

Alessandro Quattrocchi
