Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for evenly spacing list items (playlist songs) along several categories (id3 tags)

I am having trouble designing an algorithm to assist in the creation of an mp3 playlist, although an algorithm for the more general case of evenly spacing items in a list could be adapted for my use.

The general case is that I would like to reorder items in a list to maximize their diversity along several axes.

My specific use-case is that I want to dump a bunch of songs into a collection, then run my algorithm over the collection to generate an ordered playlist. I would like the order to follow this set of criteria:

  1. maximize the distance between instances of the same artist
  2. maximize the distance between instances of the same genre
  3. maximize the distance between instances of category X
  4. etc for N categories

Obviously we could not guarantee to optimize ALL categories equally, so the first category would be weighted most important, second weighted less, etc. I definitely want to satisfy the first two criteria, but making the algorithm extensible to satisfy N would be fantastic. Maximizing randomness (shuffle) is not a priority. I just want to diversify the listening experience no matter where I come in on the playlist.

This seems close to the problem described and solved here, but I can't wrap my head around how to apply this when all of the items are in the same list with multiple dimensions, rather than separate lists of differing sizes.

This seems like a problem that would have been solved many times by now but I am not able to find any examples of it.

like image 402
Jay Avatar asked Aug 06 '15 20:08

Jay


2 Answers

This should be much faster than brute-force:

  1. Order all the songs randomly.

  2. Compute the weights for each song slot (i.e. how close is it to the same artist/genre/etc.). It will be a number from 1-N indicating how may songs away it is from a match. Lower is worse.

  3. Take the song with the lowest weight, and swap that song with a random other song.

  4. Re-compute the weights of the swapped songs. If either got worse, reverse the swap and go back to 3.

  5. For debugging, print the "lowest weight" and overall average weight. (debugging)

  6. Go to 2

You won't find the optimal this way, but it should give mediocre results pretty fast, and eventually improve.

Step 2 can be made fast this way: (pseudo code in Ruby)

# Find the closest match to a song in slot_number
def closest_match(slot_number)
  # Note: MAX can be less than N. Maybe nobody cares about songs more than 20 steps away.
  (1..MAX).each |step|
    return step if matches?(slot_number+step, slot_number) or matches?(slot_number-step, slot_number)
  end
  return MAX
end

# Given 2 slots, do the songs there match?
# Handles out-of-bounds
def matches?(x,y)
  return false if y > N or y < 1
  return false if x > N or x < 1
  s1 = song_at(x)
  s2 = song_at(y)
  return true if s1.artist == s2.artist or s1.genre == s2.genere
  return false
end

You also don't have to re-compute the whole array: If you cache the weights, you only need to recompute songs that have weight >=X if they are X steps away from a swapped song. Example:

|  Song1   |  Song2   |   Song3  |  Song4   |  Song5  |
| Weight=3 | Weight=1 | Weight=5 | Weight=3 | Weight=2|

If you are swapping Song 2, you don't have to re-compute song 5: It's 3 steps away from Song 3, but it's weight was 2, so it won't "see" Song 3.

like image 183
BraveNewCurrency Avatar answered Nov 15 '22 17:11

BraveNewCurrency


Your problem is probably NP-hard. To get a sense of it, here's a reduction to CLIQUE (an NP-hard problem). That doesn't prove that your problem is NP-hard, but at least gives an idea that there is a connection between the two problems. (To show definitively that your problem is NP-hard, you need a reduction in the other direction: show that CLIQUE can be reduced to your problem. I feel that it is possible, but getting the details right is fussy.)

Suppose you have n=6 songs, A, B, C, D, E, and F. Lay them out in a chart like this:

1 2 3 4 5 6
A A A A A A
B B B B B B
C C C C C C
D D D D D D
E E E E E E
F F F F F F

Connect each item in column 1 with an edge to every other item in every other column, except for items in the same row. So A in column 1 is connected to B, C, D, E, F in column 2, B, C, D, E, F in column 3, and so on. There are n^2 = 36 nodes in the graph and n*(n-1)^2 + n*(n-1)*(n-2) + n*(n-1)*(n-3) + ... = n*(n-1)*n*(n-1)/2 = O(n^4) edges in the graph.

A playlist is a maximum clique in this graph, in other words a selection which is mutually consistent (no song is played twice). So far, not so hard: it's possible to find many maximum cliques very quickly (just permutations of the songs).

Now we add information about the similarity of the songs as edge weights. Two songs that are similar and close get a low edge weight. Two songs that are similar and far apart get a higher edge weight. Now the problem is to find a maximum clique with maximum total edge weight, in other words the NP-hard problem CLIQUE.

There are some algorithms for attacking CLIQUE, but of course they're exponential in time. The best you're going to be able to do in a reasonable amount of time is either to run one of those algorithms and take the best result it can generate in that time, or to randomly generate permutations for a given amount of time and pick the one with the highest score. You might be able to get better results for natural data using something like simulated annealing to solve the optimization problem, but CLIQUE is "hard to approximate" so I have the feeling you won't get much better results that way than by randomly generating proposals and picking the highest scoring.

like image 1
Edward Doolittle Avatar answered Nov 15 '22 17:11

Edward Doolittle