Reading a .csv file and doing simple statistics in ruby

Tags: arrays, ruby, csv

I'm generating some load test results with JMeter, which outputs a nicely formatted CSV file, but now I need to do some number crunching on it with Ruby. Here is an example of how the CSV file begins:

threadName,grpThreads,allThreads,URL,Latency,SampleCount,ErrorCount
Thread Group 1-1,1,1,urlXX,240,1,0
Thread Group 1-1,1,1,urlYY,463,1,0
Thread Group 1-2,1,1,urlXX,200,1,0
Thread Group 1-3,1,1,urlXX,212,1,0
Thread Group 1-2,1,1,urlYY,454,1,0
.
.
.
Thread Group 1-N,1,1,urlXX,210,1,0

Now, for the statistics I need to read the first line of each thread group, add up the Latency fields and then divide by the number of thread groups I have, to get an average latency. Then I need to move on to the second line of every thread group, and so forth.

I was thinking that maybe I would need to write a temporary sorted CSV file for each thread group (the order in which the URLs are hit is always the same within a thread group) and then use those as input: add up the first lines, do the math, add up the second lines, and so on until there are no more lines.

But since the number of thread groups changes, I haven't been able to write Ruby that can flex around that. Any code examples would be really appreciated :)

tlatti asked Feb 01 '26 09:02


1 Answer

[update] How about this? It's probably inefficient, but does it do what you want?

lines = File.readlines("data.csv") # a local variable, so Ruby's own CSV class is not shadowed
lines.shift # drop the header row

# Hash where the key is the group name; the value is a list of hashes with keys {:grp, :lat}
hash = lines.
  map {|l| # Turn every line into a hash of group name and its latency.
    fs = l.split(","); {:grp => fs[0], :lat => fs[4]}
  }.
  group_by{|o| o[:grp]}

# The largest number of lines we have in any group
# (max_by would return a [name, lines] pair, whose size is always 2)
max_lines = hash.values.map{|l| l.size}.max

# AVGS is a list of averages. 
# AVGS[0] is the average lat. for all the first lines,
# AVGS[1] is the average lat. for all second lines, etc.
AVGS =
(0..(max_lines-1)).map{|lno| # line no. within each group
  lats = # latency of line lno in every group (0 if the group has no such line)
    hash.map {|gname, l|
      if l[lno] then l[lno][:lat].to_i
      else 0 end
    }
  lats.reduce{|a,b| a+b} / (hash.size) # integer division, so the average is truncated
}

# So we have 'L' Averages - where L is the maximum number of
# lines in any group. You could do anything with this list
# of numbers... find the average again?
puts AVGS.inspect

Should return something like:

[217, 305]   # averages for the first lines and for the second lines
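
For comparison, here is a minimal sketch of the same idea using Ruby's standard csv library, so that columns are looked up by header name rather than by position. The method name position_averages and the SAMPLE constant are made up for illustration. One assumption differs from the code above: each average is taken only over the groups that actually have a line at that position, and it is returned as a float. Divide by groups.size instead if a missing line should count as 0.

```ruby
require "csv"

# Hypothetical helper (illustration only): returns the average Latency of the
# 1st, 2nd, ... line of every thread group found in the given CSV text.
def position_averages(csv_text)
  # Collect Latency values per threadName, keeping file order within each group.
  groups = Hash.new { |h, k| h[k] = [] }
  CSV.parse(csv_text, headers: true) do |row|
    groups[row["threadName"]] << row["Latency"].to_i
  end

  longest = groups.values.map(&:size).max || 0
  (0...longest).map do |i|
    # Assumption: only groups that have an i-th line contribute to this average.
    lats = groups.values.map { |g| g[i] }.compact
    lats.sum.to_f / lats.size
  end
end

# The first rows of the sample from the question.
SAMPLE = <<~DATA
  threadName,grpThreads,allThreads,URL,Latency,SampleCount,ErrorCount
  Thread Group 1-1,1,1,urlXX,240,1,0
  Thread Group 1-1,1,1,urlYY,463,1,0
  Thread Group 1-2,1,1,urlXX,200,1,0
  Thread Group 1-3,1,1,urlXX,212,1,0
  Thread Group 1-2,1,1,urlYY,454,1,0
DATA

p position_averages(SAMPLE)
```

To run it on a real file, pass File.read("data.csv") instead of SAMPLE. Because the groups are built in memory, no temporary per-group files are needed, and the number of thread groups can vary freely.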
Faiz answered Feb 03 '26 02:02