I have an array of hashes (CSV rows, actually) and I need to find and keep all the rows where the combination of two specific keys (user, section) appears more than once. Here is a sample of the data:
[
{ user: 1, role: "staff", section: 123 },
{ user: 2, role: "staff", section: 456 },
{ user: 3, role: "staff", section: 123 },
{ user: 1, role: "exec", section: 123 },
{ user: 2, role: "exec", section: 456 },
{ user: 3, role: "staff", section: 789 }
]
So what I would need to return is an array that contained only the rows where the same user/section combo appears more than once, like so:
[
{ user: 1, role: "staff", section: 123 },
{ user: 1, role: "exec", section: 123 },
{ user: 2, role: "staff", section: 456 },
{ user: 2, role: "exec", section: 456 }
]
The double loop solution I'm trying looks like this:

duplicates = []
enrollments.each_with_index do |a, ai|
  enrollments.each_with_index do |b, bi|
    next if ai == bi
    duplicates << b if a[:user] == b[:user] && a[:section] == b[:section]
  end
end
but since the CSV is 145K rows, it's taking forever.
How can I more efficiently get the output I need?
In terms of efficiency, you might want to try this:
grouped = csv_arr.group_by { |row| [row[:user], row[:section]] }
filtered = grouped.values.select { |a| a.size > 1 }.flatten
The first statement groups the records by the :user and :section keys. The result is:
{[1, 123]=>[{:user=>1, :role=>"staff", :section=>123}, {:user=>1, :role=>"exec", :section=>123}],
[2, 456]=>[{:user=>2, :role=>"staff", :section=>456}, {:user=>2, :role=>"exec", :section=>456}],
[3, 123]=>[{:user=>3, :role=>"staff", :section=>123}],
[3, 789]=>[{:user=>3, :role=>"staff", :section=>789}]}
The second statement selects only the values of the groups with more than one member, then flattens the result to give you:
[{:user=>1, :role=>"staff", :section=>123},
{:user=>1, :role=>"exec", :section=>123},
{:user=>2, :role=>"staff", :section=>456},
{:user=>2, :role=>"exec", :section=>456}]
This should improve the speed of your operation considerably. Memory-wise, the effect with a large input is harder to predict, since it depends on your machine, its resources, and the size of the file.
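If memory is a concern, a two-pass counting approach is a possible alternative: first tally each [user, section] pair, then select the rows whose pair occurs more than once. This is a sketch (csv_arr inlined here with the sample data from the question); unlike the group_by version, it preserves the original row order in the output.

```ruby
csv_arr = [
  { user: 1, role: "staff", section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 3, role: "staff", section: 123 },
  { user: 1, role: "exec",  section: 123 },
  { user: 2, role: "exec",  section: 456 },
  { user: 3, role: "staff", section: 789 }
]

# First pass: count occurrences of each [user, section] pair.
# Hash.new(0) makes every missing key default to 0.
counts = Hash.new(0)
csv_arr.each { |row| counts[[row[:user], row[:section]]] += 1 }

# Second pass: keep only rows whose pair appeared more than once.
filtered = csv_arr.select { |row| counts[[row[:user], row[:section]]] > 1 }
```

Both passes are O(n), and the counts hash only holds one integer per distinct pair rather than arrays of grouped rows.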