
Python - NumPy.Where with a Dictionary

I might be doing this wrong, or there might be a much better way than this, as I am still new to Python. Apologies upfront for any obvious mistakes.

I have a Pandas DataFrame with a STR column that holds a date and time. It is STR because the times are "Broadcast" formatted, which means the hours of a day run up to 29, so we will see dates like 01/Jan/2018 29:59:59. Add 1 second to that and it's 02/Jan/2018 06:00:00.

My goal here is to convert this data to a real time, which means any hour between 24 and 29 requires a date shift too. I have already split the STR into 2 new columns, ['Dt'] and ['Ti']. From ['Ti'], I pulled out the hour into a new column, ['Hr'], and made it an INT.

I then applied pd.to_datetime to ['Dt'] and added a rule:

df['Dt'] = np.where(df['Hr'] > 23, df['Dt']+pd.DateOffset(1),df['Dt']+pd.DateOffset(0) )

This works perfectly.

I now need to change the hour to be real time, e.g. 24 = 00, 25 = 01, etc.

I thought the best way was to use a DICT and map it, so I made a DICT:

HourMap = {'24':'00','25':'01','26':'02','27':'03','28':'04','29':'05','30':'06'}  

Then I wrote this:

df['Hr1'] = np.where(df['Hr'] > 23, df.replace({'Hr':HourMap}),df['Hr'])

But I get a ValueError:

ValueError: operands could not be broadcast together with shapes (273,) (273,29) (273,)

I have looked at those rows in the DataFrame and they are just normal INTs. On testing, I can apply maths to them (e.g. df['Test'] = df['Hr'] + 1).

I also converted them to STR and tried the same rule, but got the same error.

Am I just crazy?

Thanks,

Runawaygeek asked Feb 25 '26 06:02


2 Answers

I believe you need to change:

df.replace({'Hr':HourMap})

to map. If some values are not matched, map returns NaN for them, so replace those with the original values using fillna:

df['Hr'].map(HourMap).fillna(df['Hr'])
# Alternative, if performance on a large DataFrame is not important:
# df['Hr'].replace(HourMap)

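This is because df.replace returns the whole DataFrame (all columns) with column Hr replaced. That is the (273,29) shape in the error, which np.where cannot broadcast against the one-dimensional condition.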
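The difference between the two calls can be seen on a small example. This is only a sketch on made-up data: the sample frame, the `Other` column, and the name `hour_map` are hypothetical, and the dictionary uses integer keys on the assumption that `Hr` really is an INT column, as the question describes (string keys would not match integer hours).

```python
import pandas as pd

# Hypothetical sample data; 'Other' stands in for the remaining columns.
df = pd.DataFrame({"Hr": [6, 23, 24, 29], "Other": ["a", "b", "c", "d"]})

# Assumption: integer keys, because 'Hr' is described as an INT column.
hour_map = {24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5, 30: 6}

# df.replace returns a whole DataFrame (rows x all columns), not one column.
replaced = df.replace({"Hr": hour_map})
print(replaced.shape)

# Series.map returns a Series; hours without a key become NaN,
# and fillna puts the original hour back for those rows.
df["Hr1"] = df["Hr"].map(hour_map).fillna(df["Hr"]).astype(int)
print(df["Hr1"].tolist())
```

The `astype(int)` at the end is there because the intermediate NaN values turn the column into floats.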

jezrael answered Feb 26 '26 19:02


You really shouldn't be using a dictionary here, and you don't even need np.where. Use the modulo operator:

In [1]: import numpy as np
In [2]: np.arange(31)%24
Out[2]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23,  0,  1,  2,  3,  4,  5,  6], dtype=int32)

You have numbers that 'wrap around' at 24, which is the textbook use case for modulo. So the full code just becomes:

df['Hr1'] = df['Hr'] % 24
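As a quick sanity check, the modulo result can be compared with the dictionary approach from the question. This is a sketch on hypothetical data (hours 6 through 30), and `hour_map` is the question's dictionary rewritten with integer keys, which is an assumption based on `Hr` being an INT column.

```python
import pandas as pd

# Hypothetical sample of broadcast hours, 6 through 30.
df = pd.DataFrame({"Hr": list(range(6, 31))})

# The question's dictionary, with integer keys (an assumption).
hour_map = {24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5, 30: 6}

via_map = df["Hr"].map(hour_map).fillna(df["Hr"]).astype(int)
via_mod = df["Hr"] % 24

# True if both approaches give the same real-time hour for every row.
print((via_map == via_mod).all())
```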

By the same token, you can shift your dates without np.where by making use of integer division:

df['Dt'] = df['Dt'] + pd.to_timedelta(df['Hr'] // 24, unit='D')
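Putting integer division and modulo together, a broadcast-format string can be turned into a real timestamp along these lines. This is a minimal sketch, not the asker's actual pipeline: the sample strings and all variable names are hypothetical, and the `%d/%b/%Y` format is assumed from the single example in the question.

```python
import pandas as pd

# Hypothetical broadcast-format strings, modelled on the question's example.
raw = pd.Series([
    "01/Jan/2018 06:00:00",
    "01/Jan/2018 23:15:00",
    "01/Jan/2018 29:59:59",
])

# Split into date and time parts, then pull out hour, minute and second as ints.
parts = raw.str.split(" ", expand=True)
date = pd.to_datetime(parts[0], format="%d/%b/%Y")
hms = parts[1].str.split(":", expand=True).astype(int)
hr, minute, sec = hms[0], hms[1], hms[2]

# Integer division gives the day shift (0 or 1); modulo gives the real hour.
real = (
    date
    + pd.to_timedelta(hr // 24, unit="D")
    + pd.to_timedelta(hr % 24, unit="h")
    + pd.to_timedelta(minute, unit="min")
    + pd.to_timedelta(sec, unit="s")
)
print(real.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

Since a timedelta of 29 hours already rolls over into the next day, adding `pd.to_timedelta(hr, unit="h")` directly to the date should give the same result in one step; the split version is shown only to mirror the two operations described above.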
Dan answered Feb 26 '26 18:02


