Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm problem: select two stories per topic so that the same story is never selected for two different topics

In my workplace I have stumbled upon the following problem that I am asked to solve. A solution is preferred although not absolutely required.

There is a database with a set of stories, and each story has a set of topics associated with it. The topics are stored in a separate table of the form (storyid, topicid).

The question is how to select ideally 5 topics (or at least 2, if 5 is impossible) such that each topic has 2 stories (or 1, if 2 is impossible) that are not repeated in any of the other selected topics. The algorithm must also return which exactly are the "proper" stories associated with each topic.

Is this actually an NP-complete problem that has no efficient solution that goes beyond simple enumeration of all possibilities or does it have an efficient solution?

If it does not have an efficient solution, please try to prove it (although not absolutely necessary).

If it does have an efficient solution, please be kind and post it.

like image 300
akanevsky Avatar asked Jun 15 '11 21:06

akanevsky


1 Answers

A more general version of the problem is to select for all topics (or at least as many as possible) two stories so that the same story is never selected for two different topics.

Mark the stories with S1...Sm, and the topics with T1...Tn. Duplicate each topic, that is, introduce the new stories T'1...T'n, where T'i contains Sj if and only if Ti contains it. The problem can now be rephrased this way: select a different story for all topics (or as many as possible). Once you have your topic-story pairs, you can join the duplicated topics again, and each topic will have two stories.

The problem of finding the largest number of pairs possible while never selecting any element twice is called the maximum bipartite matching problem. (You can think of it as selecting the maximum number of non-connected edges from a bipartite graph.) There is a simple algorithm called augmenting path which solves it in O((m+n)e) steps (e being the number of edges) and some more sophisticated ones (such as the Hopcroft–Karp algorithm) which solve it in around O((m+n)^2.5) steps. The augmenting path algorithm consists of searching for "alternating" paths in the graph where the first edge of the path is not selected, the second is, the third isn't and so on, than inverting the selections on the path. This can be probably adapted to your case without actually doing the splitting and joining of topics.

This is a bit of an overkill because it will return two stories for all the topics, not only five; you can probably do a lot better when you only need to find stories for a limited number of topics. (Except for some edge cases, you can just select the five topics which have the largest number of stories, discard the stories not contained by them, and run the algorithm then.) At any rate it shows that the problem is far from being NP-hard.

like image 155
Tgr Avatar answered Oct 12 '22 20:10

Tgr