Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

designing a algorithm for a large data

I read one of this question being asked for a job interview of software engineer.

If there are 1000 websites and 1000 users, write a program and Data-structure such that i can query for the followin at real time: 1. Given any user, I get the list of all sites he/she has visited 2. Given any website, I get the list of all users who have visited it.

I think they wanted sort of a pseudo code or designing algorithm..

Can you guys give any tips for this?

like image 724
rain Avatar asked Jul 04 '12 06:07

rain


People also ask

Which algorithm is best for large datasets?

the Quick sort algorithm generally is the best for large data sets and long keys. This is because in the average case the time complexity is O(n logn).

What is data algorithm?

Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a certain order to get the desired output. Algorithms are generally created independent of underlying languages, i.e. an algorithm can be implemented in more than one programming language.


1 Answers

One thing is certain - in order to be able to answer both queries, you need to store all the pairs which mean that the user has visited the given website. So what I propose is the following:

You have a structure:

struct VisitPair{
  int websiteId;
  int userId;
  VisitPair* nextForUser;
  VisitPair* nextForWebsite;
};

nextForUser will point to the next pair for the given user or NULL if there is no next pair for the given user, similarly nextForWebsite will point to the next pair for the webSite. User and website will look something like:

struct User {
  char* name;
  VisitPair* firstPair;
};

struct Website {
  char* url;
  VisitPair* firstPair;
};

I assume both Website-s and users are stored in arrays, say these arrays are websites and users. Now adding a new visitPair is relatively easy:

void addNewPair(int siteId, int userId) {
  VisitPair* newPair = (VisitPair*)malloc(sizeof(VizitPair));
  newPair->nextForUser = users[userId]->firstPair;
  users[userid]->firstPair = newPair;
  newPair->nextForWesite = websites[siteId]->firstPair;
  websites[siteId]->firstPair = newPair;
}

Printing all users for a website and all the websites for a user is done by simply iterating over a list so you should be able to do that.

In short what I create is a structure that has two lists integrated. I do not think there can be a solution with better complexity as this one has linear complexity with respect to the answer and constant complexity for adding a pair.

Hope this helps.

like image 113
Ivaylo Strandjev Avatar answered Sep 20 '22 16:09

Ivaylo Strandjev