Say I create a pandas DataFrame with two columns, b (a DateTime) and c (an integer). Now I want to make a DatetimeIndex from the values in the first column (b):
    import pandas as pd
    import datetime as dt

    a=[1371215423523845, 1371215500149460, 1371215500273673, 1371215500296504,
       1371215515568529, 1371215531603530, 1371215576463339, 1371215579939113,
       1371215731215054, 1371215756231343, 1371215756417484, 1371215756519690,
       1371215756551645, 1371215756578979, 1371215770164647, 1371215820891387,
       1371215821305584, 1371215824925723, 1371215878061146, 1371215878173401,
       1371215890324572, 1371215898024253, 1371215926634930, 1371215933513122,
       1371216018210826, 1371216080844727, 1371216080930036, 1371216098471787,
       1371216111858392, 1371216326271516, 1371216326357836, 1371216445401635,
       1371216445401635, 1371216481057049, 1371216496791894, 1371216514691786,
       1371216540337354, 1371216592180666, 1371216592339578, 1371216605823474,
       1371216610332627, 1371216623042903, 1371216624749566, 1371216630631179,
       1371216654267672, 1371216714011662, 1371216783761738, 1371216783858402,
       1371216783858402, 1371216783899118, 1371216976339169, 1371216976589850,
       1371217028278777, 1371217028560770, 1371217170996479, 1371217176184425,
       1371217176318245, 1371217190349372, 1371217190394753, 1371217272797618,
       1371217340235667, 1371217340358197, 1371217340433146, 1371217340463797,
       1371217340490876, 1371217363797722, 1371217363797722, 1371217363890678,
       1371217363922929, 1371217523548405, 1371217523548405, 1371217551181926,
       1371217551181926, 1371217551262975, 1371217652579855, 1371218091071955,
       1371218295006690, 1371218370005139, 1371218370133637, 1371218370133637,
       1371218370158096, 1371218370262823, 1371218414896836, 1371218415013417,
       1371218415050485, 1371218415050485, 1371218504396524, 1371218504396524,
       1371218504481537, 1371218504517462, 1371218586980079, 1371218719953887,
       1371218720621245, 1371218738776732, 1371218937926310, 1371218954785466,
       1371218985347070, 1371218985421615, 1371219039790991, 1371219171650043]
    b=[dt.datetime.fromtimestamp(t/1000000.) for t in a]
    c = {'b':b, 'c':a[:]}
    df = pd.DataFrame(c)
    df.set_index(pd.DatetimeIndex(df['b']))
    print df
Everything seems to work fine, except that when I print the DataFrame, it says that it has an Int64Index.
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 100 entries, 0 to 99
    Data columns (total 2 columns):
    b    100  non-null values
    c    100  non-null values
    dtypes: datetime64[ns](1), int64(1)
Am I doing something wrong, or do I not understand the concept of indexes properly?
The pandas set_index() method sets the index of a DataFrame from one or more existing columns (given by label or as a list of labels), or from an array such as a Series, Index or NumPy array of matching length. An index can also be specified when the DataFrame is constructed, but when a frame is assembled from several others it is often more convenient to set the index afterwards with this method.
To reset the index, call reset_index() on the DataFrame. It restores the default integer index and, by default, moves the old index back into a column.
By default the new index replaces the existing one; passing append=True adds it to the existing index as an extra level instead. A DataFrame itself is a two-dimensional, labeled, in-memory table, loosely comparable to a spreadsheet.
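As a rough illustration of the three behaviours just described, here is a minimal sketch written for a current pandas and Python 3. The frame and its column names (key, value) are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical toy frame; "key" and "value" are illustrative names only.
df = pd.DataFrame({"key": ["x", "y", "z"], "value": [10, 20, 30]})

# set_index returns a new DataFrame indexed by the "key" column.
by_key = df.set_index("key")

# append=True keeps the existing index and adds "key" as a second level.
multi = df.set_index("key", append=True)

# reset_index moves the index back into an ordinary column.
restored = by_key.reset_index()

print(by_key.index.tolist())
print(multi.index.nlevels)
print(restored.columns.tolist())
```

Note that df itself still has its default integer index after all three calls, since each call returns a new object.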
set_index is not in-place (unless you pass inplace=True), so you need to assign the result back to df. Otherwise everything is correct.
    In [7]: df = df.set_index(pd.DatetimeIndex(df['b']))

    In [8]: df
    Out[8]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 100 entries, 2013-06-14 09:10:23.523845 to 2013-06-14 10:12:51.650043
    Data columns (total 2 columns):
    b    100  non-null values
    c    100  non-null values
    dtypes: datetime64[ns](1), int64(1)
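To make the difference concrete, the following sketch repeats the question's setup with only the first three timestamps and checks the index type before and after assigning the result. It assumes a current pandas and Python 3 (hence print as a function); the variable names still_default and has_datetime_index are introduced here for illustration.

```python
import datetime as dt
import pandas as pd

# First three microsecond timestamps from the question.
a = [1371215423523845, 1371215500149460, 1371215500273673]
b = [dt.datetime.fromtimestamp(t / 1000000.) for t in a]
df = pd.DataFrame({"b": b, "c": a})

# Calling set_index and discarding the return value leaves df unchanged.
df.set_index(pd.DatetimeIndex(df["b"]))
still_default = not isinstance(df.index, pd.DatetimeIndex)

# Assigning the result back gives df a DatetimeIndex.
df = df.set_index(pd.DatetimeIndex(df["b"]))
has_datetime_index = isinstance(df.index, pd.DatetimeIndex)

print(still_default, has_datetime_index)
```

Passing the column label instead, as in df.set_index('b'), should also produce a DatetimeIndex here, with the difference that column b is then moved out of the data columns.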
Also, as an FYI, in the forthcoming 0.12 release (next week) you can pass unit='us' to specify that the values are microseconds since the epoch:
    In [13]: pd.to_datetime(a,unit='us')
    Out[13]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2013-06-14 13:10:23.523845, ..., 2013-06-14 14:12:51.650043]
    Length: 100, Freq: None, Timezone: None
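A small sketch of how this could be used to build the frame directly, skipping the intermediate list of datetime objects. It uses only the first three timestamps from the question and assumes a pandas version that supports the unit argument; the name idx is introduced here for illustration.

```python
import pandas as pd

# First three microsecond timestamps from the question.
a = [1371215423523845, 1371215500149460, 1371215500273673]

# Interpret the integers as microseconds since the Unix epoch (UTC).
idx = pd.to_datetime(a, unit="us")

# Use the resulting DatetimeIndex as the index when building the frame.
df = pd.DataFrame({"c": a}, index=idx)

print(type(df.index).__name__)
print(df.index[0])
```

One thing to keep in mind: the epoch values are interpreted as UTC and the result is timezone-naive, whereas datetime.fromtimestamp converts to the machine's local time. That would explain why the two outputs above show different clock times for the same timestamps.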