I retrieve data from a SQLITE Database (and transform it to a pandas dataframe) in the following format:
Driver | Date loading | Date unloading | Loading Adress | Unloading Address
Peter | 02.05.2020 | 03.05.2020 | 12342, Berlin | 14221, Utrecht
Peter | 03.05.2020 | 04.05.2020 | 14221, Utrecht | 13222, Amsterdam
Franz | 03.05.2020 | 03.05.2020 | 11111, Somewher| 11221, Somewhere2
Franz | 03.05.2020 | 05.05.2020 | 11223, Upsalla | 14231, Berlin
The date range can be specified for the query, so that it gives an overview over which driver has which transports to deliver within the specified date range, ordered by date.
The goal of the transformation I want to do is a weekly plan for each driver, with the dates from the range sorted in the available columns. So for the data above, this would look like the following:
Driver | 02.05.2020 | 03.05.2020 | 04.05.2020 | 05.05.2020 |
Peter | Loading: | Unloading: Unloading:
12342, Berlin 14221, Utrecht 13222, Amsterdam
Loading:
14221, Utrecht
Franz | | Loading: | | Unloading:
11111, Somewher 14231, Berlin
Unloading:
11221, Somewhere2
Loading:
11223, Upsalla
Is there any way to achieve the described output with dataframe operations? Within the single data columns I will need to keep the order, which is loading first, unloading second, and then go to the next data row if the date is the same.
I will write my pseudoish solution, where actually it is a solution, it only lacks a single entity, task_id
, I will elaborate this later. I will call your dateframe (1st one in question) df
and I create a transformed version as t_df
. This t_df
will be an unpivoted table where dates and addresses are unified.
I will create a data frame as follows:
Driver | Date | Task | Address
Peter | 02.05.2020 | Loading | 12342, Berlin
Peter | 03.05.2020 | Unloading | 14221, Utrecht
with this unpivot data frame I can now pivot it as you wanted Like a schedule.
m,n = df.shape
t_df = pd.DataFrame(columns=['driver', 'date', 'task', 'address'])
t_df['Driver'] = df['Driver'].tolist() * 2
t_df['Date'] = df['Date loading'].tolist() + df['Date unloading'].tolist()
t_df['Address'] = df['Loading Address'].tolist() + df['Unloading Address'].tolist()
t_df['Task'] = ['Loading'] * m + ['Unloading'] * m
Now, I add the values task + address as a column.
t_df['Compound'] = t_df[['Task', 'Address']].agg(': '.join, axis=1)
concat_array = lambda x: '; '.join(x)
schedule = pd.crosstab(index=t_df['Driver'], columns=t_df['Date'], values=t_df['Compound'],
aggfunc=concat_array)
I will get the following data frame:
Date 02.05.2020 ... 05.05.2020
Driver ...
Franz NaN ... Unloading: 14231, Berlin
Peter Loading: 12342, Berlin ... NaN
Now, as I said at the very beginning of the answer, you need some kind of task identifier to match which task belongs to which assuming there are multiple load & unload operations in the same day. You need to assign some kind of task_id
and then put it in the Compound
column.
Note: I used '; '
for separation of the tasks, you may want to use something else.
You can check complete code file in gist.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With