
Vectorized pandas pd.Timestamp operation

Tags:

python

pandas

I am trying to convert a column of a pandas DataFrame, stored as integers in yyyymmddHHMM format, into Timestamps. The column is actually the DataFrame's index. Consider the following MWE:

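import pandas as pd

def get_digits(vector, first_digit, last_digit):
    # Keep the decimal digits between positions last_digit and first_digit, counted from the right
    return (vector // 10**last_digit) % 10**(first_digit - last_digit)

data = {'timestamp': [201911200830, 201807131820], 'value': [1, 2]}
df_t = pd.DataFrame(data)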

The operations to obtain the Year and the Month for instance:

df_t.timestamp.values // 10**8
get_digits(df_t.timestamp.values, 8, 6)

Yield array([2019, 2018]) and array([11, 7]).
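As a quick check of the digit arithmetic, here is a minimal sketch that pulls all five fields out of the two sample values. The variable names (stamps, year, month, and so on) are illustrative and not part of the MWE, and the explicit int64 dtype is an assumption made so that the 12-digit integers fit on every platform.

```python
import numpy as np

def get_digits(vector, first_digit, last_digit):
    # Keep the decimal digits between positions last_digit and first_digit,
    # counted from the right-hand side of the number.
    return (vector // 10**last_digit) % 10**(first_digit - last_digit)

# The two sample values from the MWE, in yyyymmddHHMM format.
stamps = np.array([201911200830, 201807131820], dtype=np.int64)

year = stamps // 10**8             # strip the 8 trailing digits (mmddHHMM)
month = get_digits(stamps, 8, 6)   # digits 7-8 from the right
day = get_digits(stamps, 6, 4)     # digits 5-6 from the right
hour = get_digits(stamps, 4, 2)    # digits 3-4 from the right
minute = get_digits(stamps, 2, 0)  # the last two digits

print(year, month, day, hour, minute)
```

All of these operations are elementwise, so the extraction itself is already vectorized; only the final pd.Timestamp call is scalar.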

Strangely, pd.Timestamp does not seem to accept arrays as input; the construction only works for a single value, as below:

pd.Timestamp(df_t.timestamp.values[0] // 10**8, get_digits(df_t.timestamp.values[0],8,6), get_digits(df_t.timestamp.values[0],6,4), get_digits(df_t.timestamp.values[0],4,2), get_digits(df_t.timestamp.values[0],2,0))

This results in Timestamp('2019-11-20 08:30:00'), as I would expect. But if I drop the [0] index, the MWE gives me the following error:

TypeError: Cannot convert input [[2019 2018]] of type <class 'numpy.ndarray'> to Timestamp

Any ideas on how to work around this error?

brodoll asked Apr 25 '26 00:04


1 Answer

Use pd.to_datetime and specify the format of the data; %Y%m%d%H%M corresponds to YYYYMMDDHHMM:

df_t['timestamp'] = pd.to_datetime(df_t['timestamp'], format='%Y%m%d%H%M')
print(df_t)
            timestamp  value
0 2019-11-20 08:30:00      1
1 2018-07-13 18:20:00      2
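
Since the question mentions that the integers actually live in the index, the same call can plausibly be applied to the index directly. The following is a minimal sketch under that assumption: df_idx, raw, parts and assembled are hypothetical names, and the explicit cast to str is a cautious choice so that the format string is matched against text. It also shows a second route that stays close to the digit arithmetic in the question, passing a DataFrame of year/month/day/hour/minute columns to pd.to_datetime.

```python
import pandas as pd

# Hypothetical frame with the yyyymmddHHMM integers stored as the index.
df_idx = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.Index([201911200830, 201807131820], name="timestamp"),
)

# Route 1: cast the index to strings, then parse with an explicit format.
df_idx.index = pd.to_datetime(df_idx.index.astype(str), format="%Y%m%d%H%M")

# Route 2: split the integers arithmetically and let pd.to_datetime
# assemble the datetimes from named component columns.
raw = pd.Series([201911200830, 201807131820])
parts = pd.DataFrame(
    {
        "year": raw // 10**8,
        "month": raw // 10**6 % 100,
        "day": raw // 10**4 % 100,
        "hour": raw // 10**2 % 100,
        "minute": raw % 100,
    }
)
assembled = pd.to_datetime(parts)

print(df_idx)
print(assembled)
```

Both routes operate on the whole column at once, so no Python-level loop over pd.Timestamp is needed.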
jezrael answered Apr 27 '26 13:04


