I'm learning pandas so I can use it for some data analysis. The data is supplied as a CSV file with several columns, of which I only need four (date, time, o, c). I'd like to create a new DataFrame indexed by a datetime64 value, which is created by merging the first two columns and applying pd.to_datetime to the merged string.
My loader code works fine:
st = pd.read_csv("C:/Data/stockname.txt", names=["date","time","o","h","l","c","vol"])
The challenge is converting the loaded DataFrame into a new one with the right format. The code below works but is very slow. Moreover, it only creates a column holding the new datetime64 values and doesn't make that column the index.
My code:
st_new = pd.concat([pd.to_datetime(st.date + " " + st.time), (st.o + st.c) / 2, st.vol],
                   axis=1, ignore_index=True)
What would be a more pythonic way to merge two columns and apply a function to the result? And how do I make the new column the index of the DataFrame?
You can do everything in the read_csv function:
pd.read_csv('test.csv',
            parse_dates={'timestamp': ['date', 'time']},
            index_col='timestamp',
            usecols=['date', 'time', 'o', 'c'])
parse_dates tells the read_csv function to combine the date and time columns into one timestamp column and parse it as a timestamp. (pandas is smart enough to know how to parse a date in various formats.)
index_col sets the timestamp column to be the index.
usecols tells the read_csv function to select only that subset of the columns.
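If you would rather do the merge after loading, which also covers the second part of the question and avoids relying on the dict form of parse_dates (newer pandas releases deprecate combining columns that way), here is a minimal sketch. The sample rows, the date and time formats, and the names sample and timestamp are assumptions for illustration only; passing an explicit format usually speeds up parsing compared with letting pandas infer it.

```python
import io
import pandas as pd

# Hypothetical sample in the question's headerless layout:
# date,time,o,h,l,c,vol. The date/time formats are assumptions.
sample = io.StringIO(
    "2020-01-02,09:30:00,10.0,10.5,9.8,10.2,1000\n"
    "2020-01-02,09:31:00,10.2,10.6,10.1,10.4,1500\n"
)

# Load only the four needed columns; keep date and time as plain strings.
st = pd.read_csv(
    sample,
    names=["date", "time", "o", "h", "l", "c", "vol"],
    usecols=["date", "time", "o", "c"],
    dtype={"date": str, "time": str},
)

# Merge the two string columns and parse them in one vectorised call.
# Adjust `format` to match the real file.
timestamp = pd.to_datetime(
    st["date"] + " " + st["time"], format="%Y-%m-%d %H:%M:%S"
).rename("timestamp")

# Drop the raw columns and use the parsed values as the index.
st_new = st.drop(columns=["date", "time"]).set_index(timestamp)
```

With the real file, replace sample with the path to the CSV. st_new then holds the o and c columns, indexed by the parsed timestamps.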