
Which is faster: Appropriate data input or appropriate data structure?

Tags: performance, c

I have a dataset whose columns look like this:

Consumer ID | Product ID | Time Period | Product Score
1           | 1          | 1           | 2
2           | 1          | 2           | 3

and so on.

As part of a program (written in C), I need to process the product scores given by all consumers for each combination of product and time period. For example, with 3 products and 2 time periods, I need to process the scores for each of the following combinations:

Product ID | Time Period 
1          | 1
1          | 2
2          | 1
2          | 2
3          | 1
3          | 2

I will need to run this processing many times (more than 10,000), and the dataset is fairly large (e.g., 48k consumers, 100 products, 24 time periods), so speed is an issue.

I came up with two ways to process the data and am wondering which is faster, or whether it matters much at all (speed matters, but not at the cost of undue maintenance burden or poor readability):

  1. Sort the data on product id and time period and then loop through the data to extract data for all possible combinations.

  2. Store the consumer ids of all consumers who provided product scores for a particular combination of product id and time period and process the data accordingly.
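
To make the two options concrete, here is a minimal sketch of both in C. Everything in it is an assumption for illustration: the `Record` layout, the function names, the callback type, and in particular the idea that product ids and time periods are dense and 1-based (as in the sample rows). Option 1 sorts once with `qsort` and walks contiguous runs; option 2 builds an offset table plus an array of record positions per combination.

```c
#include <stdlib.h>

/* Hypothetical record layout; field names are assumptions. */
typedef struct {
    int consumer_id;
    int product_id;
    int time_period;
    double score;
} Record;

/* Callback invoked once per (product, period) group in option 1. */
typedef void (*GroupFn)(int product_id, int time_period,
                        const Record *group, size_t count, void *ctx);

/* Order by product id, then time period, so each combination is contiguous. */
static int cmp_record(const void *a, const void *b)
{
    const Record *x = a, *y = b;
    if (x->product_id != y->product_id)
        return (x->product_id > y->product_id) - (x->product_id < y->product_id);
    return (x->time_period > y->time_period) - (x->time_period < y->time_period);
}

/* Option 1: sort, then hand each contiguous run to fn. Returns the group count. */
size_t for_each_group_sorted(Record *recs, size_t n, GroupFn fn, void *ctx)
{
    size_t groups = 0, start = 0;
    if (n == 0) return 0;
    qsort(recs, n, sizeof *recs, cmp_record);
    while (start < n) {
        size_t end = start + 1;
        while (end < n &&
               recs[end].product_id == recs[start].product_id &&
               recs[end].time_period == recs[start].time_period)
            end++;
        fn(recs[start].product_id, recs[start].time_period,
           recs + start, end - start, ctx);
        groups++;
        start = end;
    }
    return groups;
}

/* Placeholder processing step: adds each group's size to a size_t counter. */
void count_records(int product_id, int time_period,
                   const Record *group, size_t count, void *ctx)
{
    (void)product_id; (void)time_period; (void)group;
    *(size_t *)ctx += count;
}

/* Option 2: per-combination index. Group g owns idx[start[g] .. start[g+1]). */
typedef struct {
    size_t *start;  /* n_products * n_periods + 1 offsets into idx */
    size_t *idx;    /* record positions, grouped by combination */
} GroupIndex;

/* Assumes ids run 1..n_products and 1..n_periods. Returns 0 on success, -1 on error. */
int build_index(const Record *recs, size_t n, int n_products, int n_periods,
                GroupIndex *out)
{
    size_t n_groups, i, g;
    size_t *fill = NULL;
    out->start = out->idx = NULL;
    if (n_products <= 0 || n_periods <= 0) return -1;
    n_groups = (size_t)n_products * (size_t)n_periods;
    out->start = calloc(n_groups + 1, sizeof *out->start);
    out->idx = malloc((n ? n : 1) * sizeof *out->idx);
    fill = calloc(n_groups, sizeof *fill);
    if (!out->start || !out->idx || !fill) goto fail;
    for (i = 0; i < n; i++) {           /* count records per combination */
        if (recs[i].product_id < 1 || recs[i].product_id > n_products ||
            recs[i].time_period < 1 || recs[i].time_period > n_periods)
            goto fail;
        g = (size_t)(recs[i].product_id - 1) * (size_t)n_periods
          + (size_t)(recs[i].time_period - 1);
        out->start[g + 1]++;
    }
    for (g = 0; g < n_groups; g++)      /* prefix sum turns counts into offsets */
        out->start[g + 1] += out->start[g];
    for (i = 0; i < n; i++) {           /* scatter record positions into buckets */
        g = (size_t)(recs[i].product_id - 1) * (size_t)n_periods
          + (size_t)(recs[i].time_period - 1);
        out->idx[out->start[g] + fill[g]++] = i;
    }
    free(fill);
    return 0;
fail:
    free(out->start); free(out->idx); free(fill);
    out->start = out->idx = NULL;
    return -1;
}
```

In this sketch the sort or the index build happens once, so over the 10k+ passes its cost is amortized; what differs between the two is how each pass reads memory (contiguous records versus positions looked up through `idx`).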

Any thoughts? Is there any other way to speed up the processing? Thanks.

asked Oct 25 '22 14:10 by vad



1 Answer

As with many performance-related questions, the only real, definitive answer is to write a benchmark. Speed will depend on many things, and it doesn't sound like you're talking about a straightforward case of a linear algorithm vs a quadratic algorithm (and even that would have an additional dependency on input size).

So implement both methods, run them on sample data, and time the results. This will be much faster and more conclusive than us trying to work it out in our heads with limited information.
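
A timing harness for this can be very small. The following is only a sketch using the standard `clock()` function; the names `time_per_call`, `BenchFn`, and `example_workload` are made up for illustration, and the workload here is a stub that should be replaced with one full pass of each approach.

```c
#include <time.h>

/* Signature for one unit of work to be timed; ctx carries its data. */
typedef void (*BenchFn)(void *ctx);

/* Runs fn(ctx) `repeats` times and returns the average CPU seconds per call. */
double time_per_call(BenchFn fn, void *ctx, int repeats)
{
    clock_t t0, t1;
    int i;
    if (repeats <= 0) return 0.0;
    t0 = clock();
    for (i = 0; i < repeats; i++)
        fn(ctx);
    t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC / repeats;
}

/* Stub workload: bumps a counter. Replace with one pass of the real processing. */
void example_workload(void *ctx)
{
    (*(long *)ctx)++;
}
```

Calling `time_per_call` once per approach on the same sample data, with the same compiler flags and enough repeats to get well above the resolution of `clock()`, gives two numbers that can be compared directly. Note that `clock()` measures processor time, not wall-clock time, which is usually what you want for a single-threaded comparison like this one.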

answered Nov 15 '22 07:11 by danben
