I'm looking for some advice on methodology/data structure/algorithmic approach for a problem I'm trying to solve.
I am writing a custom spreadsheet application in VBA. The spreadsheet is a labor scheduling & quote generation document. The user enters basic labor scheduling information which is then used to generate multiple different worksheets/documents with the source data presented in a variety of different layouts/formats.
To be honest Excel is the wrong application for this but it's what the users wanted and are comfortable with and I know VBA quite well so it's what I'm stuck with.
The key data the user inputs is in the below format, essentially an entry for each daily work call for each role.
╔═════╦════════╦════════════════╦════════════════╦═══════════════════╗
║ QTY ║ ROLE ║ START ║ END ║ DESCRIPTION ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/15/17 08:00a ║ 6/15/17 04:00p ║ Travel to Prep ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/16/17 08:00a ║ 6/16/17 06:00p ║ Prep ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/17/17 08:00a ║ 6/17/17 07:00p ║ Prep ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/18/17 06:00a ║ 6/18/17 05:00p ║ Travel to Install ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/19/17 08:00a ║ 6/20/17 01:00a ║ Install ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/20/17 10:00a ║ 6/20/17 08:00p ║ Install ║
╠═════╬════════╬════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/21/17 07:00a ║ 6/21/17 04:00p ║ Travel Home ║
╚═════╩════════╩════════════════╩════════════════╩═══════════════════╝
Typically the data is multiple roles across multiple days and will often have multiple instances of a role on some (but not necessarily all) days.
One of the data manipulations the code does is to take this source data and re-formats it into a summary table so that the user can assign names at a later date once people are assigned. It's also the basis for various other individual work-call sheets and calculating number of flights/hotel-nights etc.
╔══════╦═══════════╦═════════╦═════════╦═══════════════════════════════════════╗
║ NAME ║ ROLE ║ START ║ END ║ DESCRIPTION ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Rigger #1 ║ 6/15/17 ║ 6/21/17 ║ Trav | Prep | Trav | Install | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Rigger #2 ║ 6/18/17 ║ 6/21/17 ║ Trav | Install | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Rigger #3 ║ 6/18/17 ║ 6/21/17 ║ Trav | Install | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Sound ║ 6/15/17 ║ 6/22/17 ║ Trav | Install | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Crew #1 ║ 6/17/17 ║ 6/30/17 ║ Trav | Install | Show | Strike | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Crew #2 ║ 6/17/17 ║ 6/22/17 ║ Trav | Install | Trav ║
╠══════╬═══════════╬═════════╬═════════╬═══════════════════════════════════════╣
║ ║ Crew #2 ║ 6/26/17 ║ 6/30/17 ║ Trav | Strike | Trav ║
╚══════╩═══════════╩═════════╩═════════╩═══════════════════════════════════════╝
I convert the source data from n-qty rows into n x rows of qty 1 and append an instance count to the role value if there are multiple instances. This is currently achieved by looping the data array a few times and manipulating the data accordingly - psuedocode below
create 2-dimensional DataArray from source data
Loop DataArray
generate list of unique Roles
Sum all Qty values
Next
Create 2-dimensional OutputArray
size rows to match Sum of all Qty values in DataArray
size cols to match DataArray cols
//determine which UniqueRoles have multiple work instances
For Each UniqueRole in DataArray
Loop DataArray
Count unique Start Dates for UniqueRole
Sum Qty values for UniqueRole
Next
If Sum of UniqueRole Qtys > Count of UniqueRole unique Start Dates Then
Add UniqueRole to MultpleInstanceList
End If
Next UniqueRole
//copy data into new array, expand all n-qty rows into n x rows of 1 qty
Loop DataArray
Do While DataArray CurrentRow Qty Value > 1
Copy DataArray CurrentRow to OutputArray NewRow
Overwrite Qty value in OutputArray = 1
Reduce DataArray CurrentRow Qty value by 1
Loop
Copy DataArray CurrentRow to OutputArray NewRow
Next
//append count to Roles with multiple instances
For Each UniqueRole in MultipleInstanceList
Loop OutputArray
generate list of unique Start Dates for current UniqueRole
Next
For Each StartDate in UniqueStartDates
Loop OutputArray
generate row index list for matching UniqueRole AND StartDate
Next
initialize counter k = 1
For Each Row in RowIndexList
OutputArray(Row) Role value = Role value & " #" & k
k = k + 1
Next Row
Next StartDate
Next UniqueRoleVaue
I then generate the summary table from the expanded array.
This works fine for simple cases however when there are complex configurations it can produce inconsistant results when appending the instance number relative to the description value e.g…
╔═════╦════════╦═════════════════╦════════════════╦═══════════════════╗
║ QTY ║ ROLE ║ START ║ END ║ DESCRIPTION ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/15/17 08:00a ║ 6/15/17 04:00p ║ Travel to Prep ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/16/17 08:00a ║ 6/16/17 06:00p ║ Prep ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/17/17 08:00a ║ 6/17/17 07:00p ║ Prep ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/18/17 06:00a ║ 6/18/18 05:00p ║ Travel to Install ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 3 ║ Rigger ║ 6/19/17 08:00a ║ 6/19/17 06:00p ║ Install ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 1 ║ Rigger ║ 6/20/17 07:00a ║ 6/20/17 04:00p ║ Travel Home ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 2 ║ Rigger ║ 6/20/17 08:00a ║ 6/20/17 06:00p ║ Install ║
╠═════╬════════╬═════════════════╬════════════════╬═══════════════════╣
║ 2 ║ Rigger ║ 6/21/17 07:00a ║ 6/21/17 04:00p ║ Travel Home ║
╚═════╩════════╩═════════════════╩════════════════╩═══════════════════╝
…will output…
╔═══════════╦═════════╦═════════╦════════════════════════════════════════════╗
║ ROLE ║ START ║ END ║ DESCRIPTION ║
╠═══════════╬═════════╬═════════╬════════════════════════════════════════════╣
║ Rigger #1 ║ 6/15/17 ║ 6/21/17 ║ Trav | Prep | Trav | Install | Trav | Trav ║
╠═══════════╬═════════╬═════════╬════════════════════════════════════════════╣
║ Rigger #2 ║ 6/18/17 ║ 6/21/17 ║ Trav | Install | Trav ║
╠═══════════╬═════════╬═════════╬════════════════════════════════════════════╣
║ Rigger #3 ║ 6/18/17 ║ 6/20/17 ║ Trav | Install ║
╚═══════════╩═════════╩═════════╩════════════════════════════════════════════╝
This entire operation occurs multiple times during normal document use and if the data order changes (which is possible) it can also produce inconsistant results between operations.
I'd prefer not to have to sort the source array as part of the operation as it's an expensive process which adds noticeable lag once the table gets into 10s of rows, even when using merge-sort or quick-sort.
I'm trying to keep this process as optimized as possible; a lot of other outputs use this expanded array, some of which effectively provide live feedback therefore the operation runs every time the user inputs data.
The list of possible descriptions that the user can select is pre-determined
╔═══════════════════════╗
║ Travel to Prep ║
╠═══════════════════════╣
║ Travel & Prep ║
╠═══════════════════════╣
║ Prep ║
╠═══════════════════════╣
║ Prep & Travel ║
╠═══════════════════════╣
║ Travel to Install ║
╠═══════════════════════╣
║ Travel & Install ║
╠═══════════════════════╣
║ Install ║
╠═══════════════════════╣
║ Travel to Show ║
╠═══════════════════════╣
║ Rehearsal ║
╠═══════════════════════╣
║ Show ║
╠═══════════════════════╣
║ Show & Dismantle ║
╠═══════════════════════╣
║ Travel to Dismantle ║
╠═══════════════════════╣
║ Dismantle ║
╠═══════════════════════╣
║ Travel Home ║
╠═══════════════════════╣
║ Travel to Site Survey ║
╠═══════════════════════╣
║ Site Survey ║
╠═══════════════════════╣
║ Dark Day ║
╚═══════════════════════╝
I think this is effectively a directed graph, the edges all have a direction (most are one way) and some nodes can self-loop. I have built an adjacency matrix for the above list as this seems to be a logical way to determine if the order of assignment is valid.
Is there is an efficient way to ensure all paths for a specific role are valid traversal paths and how best to re-assign the role instance numbers if one or more paths are not valid?
Or is it possible to use the sub-set of description values at each level of the path traversal during the expansion operation to ensure role instance numbers are assigned correctly to begin with?
Is there a particular area of graph theory that I should be looking at? Are graphs the right approach here? Is there an alternative approach that would work/be more efficient?
Any advice/help would be gratefully received.
Thanks
An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays.
No single data structure can be used as a solution to every problem. To make the best choice possible, we have to consider how the data structure will be used and how often. Try to choose the one that has the lowest costs for the operations done the most often. Compare different approaches!
Focusing on the following quote:
I'd prefer not to have to sort the source array as part of the operation as it's an expensive process which adds noticeable lag once the table gets into 10s of rows, even when using merge-sort or quick-sort.
I notice that you're referring to your source as an array. So are users entering information into a sheet range and from there you use VBA to load the data into an array? If so, I wonder if you are aware of Recordsets in VBA. From my recollection, they're not exactly straightforward to read data into, but once the data is in, then you can run a SQL query on it. This is not immune to performance issues, but I think you'll do better than taking a hit after 10's of records. And you'll probably not have to worry about sorting methodologies. If you're not familiar with SQL, I'd say it's definitely worth the time given the things you're trying to do.
Here's a guy who did it. He had a different reason for doing it, but you would be able to apply it in your case.
By the way, if your users are comfortable with Excel, then with work, they may still be comfortable with MS Access. No installation of new software, you can do SQL without loading data into new objects, and you can work with forms and develop reports more easily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With