
How to merge two DataFrame columns and apply pandas.to_datetime to it?

Tags: python, pandas

I'm learning pandas to use it for some data analysis. The data is supplied as a CSV file with several columns, of which I only need four (date, time, o, c). I'd like to create a new DataFrame indexed by a datetime64 value, created by merging the first two columns and applying pd.to_datetime to the merged string.

My loader code works fine:

st = pd.read_csv("C:/Data/stockname.txt", names=["date","time","o","h","l","c","vol"])

The challenge is converting the loaded DataFrame into a new one with the right format. The code below works but is very slow. Moreover, it only creates a column in the new datetime64 format and doesn't make it the index.

My code:

st_new = pd.concat([pd.to_datetime(st.date + " " + st.time), (st.o + st.c) / 2, st.vol], 
     axis = 1, ignore_index=True)

What would be a more pythonic way to merge two columns and apply a function to the result? And how do I make the new column the index of the DataFrame?

asked Aug 07 '13 23:08 by Alessandro Quattrocchi


1 Answer

You can do everything in the read_csv call:

pd.read_csv('test.csv',
            parse_dates={'timestamp': ['date','time']},
            index_col='timestamp',
            usecols=['date', 'time', 'o', 'c'])

parse_dates tells read_csv to combine the date and time columns into a single timestamp column and parse it as a datetime. (pandas can infer the format for many common date layouts.)

index_col sets the timestamp column to be the index.

usecols tells read_csv to load only the listed subset of columns.
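
If the DataFrame is already loaded, as in the question, the same result can be built after the fact. This may also be the safer route on newer pandas: as far as I know, combining columns through a parse_dates dict is deprecated from pandas 2.2 onward. Below is a minimal sketch, not the only way to do it. It assumes the file has no header row (as in the question's loader) and that the date and time strings are in a layout pd.to_datetime can infer. The sample rows and the "mid" and "timestamp" column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the question's headerless CSV file.
sample = io.StringIO(
    "2013-08-07,09:30:00,10.0,10.5,9.8,10.2,1000\n"
    "2013-08-07,09:31:00,10.2,10.6,10.1,10.4,1500\n"
)

st = pd.read_csv(
    sample,
    names=["date", "time", "o", "h", "l", "c", "vol"],
    # Read both parts as strings so they can be concatenated safely.
    dtype={"date": str, "time": str},
)

st_new = (
    st.assign(
        # Merge the two string columns and parse the result in one vectorised call.
        timestamp=pd.to_datetime(st["date"] + " " + st["time"]),
        # Midpoint of open and close, as computed in the question.
        mid=(st["o"] + st["c"]) / 2,
    )
    .set_index("timestamp")[["mid", "vol"]]  # parsed column becomes the index
)

print(st_new)
```

If parsing is the slow part, passing an explicit format= string to pd.to_datetime lets pandas skip format inference. The right format string depends on how the dates are written in your file.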

answered Oct 27 '22 00:10 by Viktor Kerkez