Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Separating groups into nearly-equal stacks

I have a list of documents and I want to display them grouped by the first letter of their name on a web-page, over three columns.

In short, something like this:

A | C | E
A | D | F
B | D | F
B | D | F
  | D | 

An important difference from say, the Windows Explorer view style is that I want letters to stay with each other. No breaking mid-group. To accommodate this, I don't care if one column is a couple of entries too tall.

I've started by sorting the array of documents by name and splitting them into a nested array. So I know (or can easily find out):

  • How many unique letters there are
  • How many letters there are in each group
  • How many entries there are in total
  • The mean average of how many values there should be in each column (ideally but not neccessarily)

I don't care what your answers come in. I'm looking for the algorithm rather than the implementation so you can code in anything you like (except perhaps Fortran). an explanation in HTML might be a toughie too.

I invite somebody to go wild on the tags because I couldn't think of anything relevant and no, this isn't homework, so please don't mark it as such.

like image 652
Oli Avatar asked Dec 22 '22 12:12

Oli


2 Answers

Perhaps it helps if you look at the problem like this:

For your example, you have a string like this:

AA BB C DDDD E FFF

The space positions are the places where you could start a new column. Everywhere else you mustn't to keep same letters in the same column. So you actually can mark the space position like this:

AA1BB2C3DDDD4E5FFF

Now you have 5 positions where you can either break the column or not, as it's a binary decision, use a string of 0's and 1's for this and brute force every possible combination:

12345

00000 -> no break at all, column count = 1, max. lines = 13
...
01010 -> your example, column count = 3, max. lines = 5
...
11111 -> breaks everywhere, column count = 6, max. lines = 4

This is a brute force attempt, but you can easily see that the 1's count affects the column count (column count = number of 1's + 1) and you want to minimize max. lines, this should be possible to somehow calculate without having to test each combination.

EDIT2: Didn't recognize you want 3 columns, this makes it easier as you know you will have only 3 1's, but it's still brute force.

EDIT: Another approach I'd favorize:

Write the letter counts like this:

A B C D E F
2 2 1 4 1 3

You can now join letters that are next to each other. Always chose the two letters with the lowest count sum:

2 2 1 4 1 3 - lowest = "2 1"
2  3  4 1 3 - lowest = "1 3"
2  3  4  4  - lowest = "2 3"
  5   4  4  - stop now, as we have 3 columns now

Result: AABBC, DDDD, EFFF

This perhaps won't lead to the optimal solution, but it's a nice and easy way to solve your problem, I think.

like image 70
schnaader Avatar answered Jan 15 '23 02:01

schnaader


Well, you can expect always to have some extra rows in each column. I mean, if you have 2 A's, 2 B's and 33 C's, then the third column would be pretty tall compared to others.

It's not the Knapsack problem because they have to be in order.

What you can do is:

  • Count the number of items.
  • See where the first third would fall.
  • If it's exactly a letter change spot, then you're lucky :)
  • If not, then minimize the distance between the third part split spot and the previous / next letter change spot - i.e. if there's a letter change 2 entries before and 10 entries after, then go for the previous one.
  • Finally, take the rest, divide by two and follow the same logic to split as near as you can from the mean value.
like image 40
Seb Avatar answered Jan 15 '23 01:01

Seb