I'm having some memory problems with an application, but it's a bit difficult to figure out exactly where they come from. I have two sets of data:
Pageviews
Memory use
I'd like to see exactly which pageviews are correlated with high memory usage. My guess is that I'll need a t-test of some kind, but I'm uncertain which kind of t-test to go with. Can someone at least point me in the right direction?
I would suggest constructing a dataset with one row per page and two columns. The first column would be the proportion of that page's appearances among the observations with the highest memory usage (the upper tail of the memory distribution), and the second the proportion of the same page's appearances among the remaining observations.
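A minimal sketch of how such a two-column dataset could be built. The input format (a list of `(page, memory)` observations), the function name `page_proportions`, and the `quantile` cut-off that defines "highest memory usage" are all assumptions made for illustration; the right cut-off depends on your data.

```python
from collections import Counter

def page_proportions(samples, quantile=0.9):
    """samples: list of (page, memory) pairs, one per observation.

    Returns {page: (prop_high, prop_rest)} where prop_high is the page's
    share of the high-memory observations and prop_rest its share of
    all other observations.
    """
    if not samples:
        raise ValueError("no samples given")
    memories = sorted(m for _, m in samples)
    # Cut-off value: the memory reading at the requested quantile position.
    cutoff = memories[min(len(memories) - 1, int(quantile * len(memories)))]
    high = Counter(p for p, m in samples if m >= cutoff)
    rest = Counter(p for p, m in samples if m < cutoff)
    n_high = sum(high.values())
    n_rest = sum(rest.values())
    result = {}
    for page in sorted(set(high) | set(rest)):
        prop_high = high[page] / n_high
        # n_rest is zero only if every observation is at or above the cut-off.
        prop_rest = rest[page] / n_rest if n_rest else 0.0
        result[page] = (prop_high, prop_rest)
    return result
```

Each page then contributes one pair (high, rest), which is the paired structure the test needs.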
Then you would perform a paired test of the null hypothesis that the median of the differences (high - rest) is less than or equal to zero (H0), against the alternative hypothesis that the median of the differences is greater than zero (H1). I would suggest the nonparametric Wilcoxon signed-rank test, which is the counterpart of the Mann-Whitney test for paired samples. It also takes into account the magnitude of the differences in each pair, something that other tests (e.g. the sign test) ignore.
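To make the procedure concrete, here is a rough sketch of the one-sided signed-rank test using only the Python standard library. The function name is hypothetical, and the p-value uses the plain normal approximation (no continuity or tie correction), which is only reasonable for a moderately large number of pairs; in practice a library routine such as `scipy.stats.wilcoxon` would likely be the better choice.

```python
import math

def wilcoxon_signed_rank_greater(high, rest):
    """One-sided test of H1: median(high - rest) > 0.

    Returns (w_plus, p_value); p_value comes from a normal approximation.
    """
    # Zero differences carry no sign information and are dropped here.
    diffs = [h - r for h, r in zip(high, rest) if h != r]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Test statistic: sum of the ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)
    return w_plus, p_value
```

A small p-value suggests that pages tend to appear proportionally more often in the high-memory observations; the individual differences then show which pages drive that.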
Keep in mind that ties (zero or equal differences) cause numerous problems in the derivation of nonparametric methods and are best avoided. A common way to deal with them is to add a slight bit of "noise" to the data. That is, run the test after modifying the tied values by adding a random variable small enough that it does not otherwise affect the ranking of the differences.
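One possible way to add such noise is sketched below. The helper name `jitter_differences` and the choice of noise scale (a quarter of the smallest gap between distinct absolute differences, zero included) are assumptions made for illustration, chosen so that the noise cannot reorder values that were already distinct or flip the sign of a nonzero difference.

```python
import random

def jitter_differences(diffs, rng=None):
    """Break ties by adding small uniform noise to each difference."""
    rng = rng or random.Random()
    # Distinct absolute values, with 0 included so that the noise also
    # stays below the smallest nonzero difference (signs are preserved).
    abs_vals = sorted({0.0} | {abs(d) for d in diffs})
    gaps = [b - a for a, b in zip(abs_vals, abs_vals[1:])]
    smallest_gap = min(gaps) if gaps else 1.0  # arbitrary scale if all zero
    eps = smallest_gap / 4
    return [d + rng.uniform(-eps, eps) for d in diffs]
```

Because the noise is random, the test result for tied pairs can vary slightly between runs; passing a seeded `random.Random` makes the analysis reproducible.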
I hope that the test results, together with a plot of the distribution of the differences, will give you insight into where the problem is.
In R, the Wilcoxon signed-rank test is implemented by the base function wilcox.test with paired = TRUE.