
How to search in a large JSON array and find records by multiple keys

I have a very large dataset that's organized like this:

users = [
    {
        username: "Bill",
        gender: "Male",
        details: {
            city: "NY"
        }
    },
    {
        username: "Mary",
        gender: "Female",
        details: {
            city: "LA"
        }
    }
]

I need a quick way to search for multiple records by multiple values from multiple keys.

I have a dot-separated list of keys:

keys = ["gender", "details.city"]

I need to do something like this (in pseudocode):

my_users = users.any? {|user|
  keys.each do |key|
    user.key == "NY"
  end
}

I know this is not going to work. One reason is that my keys are dot-separated, so I could either split each one into an array of keys, as in ['gender'] and ['details']['city'], or convert the user hash to an object that supports dot access with a method like:

def to_o
  JSON.parse to_json, object_class: OpenStruct
end
Ben asked Jan 19 '26 02:01



2 Answers

I hope this method does what you want. It assumes the hashes use symbol keys, as in your example:

def search(users, keys, value)
  users.select do |user|
    keys.any? do |key|
      user.dig(*key.split('.').map(&:to_sym)) == value
    end
  end
end

search(users, keys, 'NY')
#=> [{ :username => "Bill", :gender => "Male", :details => { :city => "NY" } }]
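
If each key needs its own value, and the data starts out as JSON text, the same dig approach can be adapted. This is only a sketch under assumptions: search_by is a hypothetical helper name, the JSON sample is made up to mirror the question, and it requires every condition to match. JSON.parse returns string keys by default, so symbolize_names: true is used to keep the to_sym lookup working.

```ruby
require "json"

# Made-up JSON sample mirroring the structure in the question.
json = <<~JSON
  [
    {"username": "Bill", "gender": "Male",   "details": {"city": "NY"}},
    {"username": "Mary", "gender": "Female", "details": {"city": "LA"}}
  ]
JSON

# symbolize_names: true yields symbol keys instead of the default string keys.
users = JSON.parse(json, symbolize_names: true)

# Hypothetical helper: keeps the users for which every dotted key
# equals the value paired with it.
def search_by(users, conditions)
  # Split each dotted key once, up front, instead of once per user.
  paths = conditions.map { |key, value| [key.split(".").map(&:to_sym), value] }
  users.select do |user|
    paths.all? { |path, value| user.dig(*path) == value }
  end
end

matches = search_by(users, "gender" => "Male", "details.city" => "NY")
matches.map { |user| user[:username] } #=> ["Bill"]
```

Note that dig raises a TypeError if a path runs through a value that is not a hash or array, so this assumes every record has the same shape.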
demir answered Jan 20 '26 14:01



For linear searching, demir's solution is a good one.

For the "must be quick" angle, you may find that an O(n) scan through your users array is too slow. To alleviate this, you may want to create an index:

require "set"
class Index
  def initialize(dataset)
    @index = make_index(dataset)
  end

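  # Returns the records that match any of the given key/value conditions.
  # fetch with a default avoids an error when a condition has no index entry.
  def find(conditions = {})
    conditions.inject(Set.new) { |found, pair| found | @index.fetch(pair.join("."), []) }.to_a
  end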

  private

  def make_keys(record, prefix = [])
    record.flat_map do |key, val|
      case val
      when Hash
        # Carry the full path down so deeper nesting keeps its prefix.
        make_keys val, prefix + [key]
      else
        (prefix + [key, val]).join(".")
      end
    end
  end

  def make_index(dataset)
    dataset.each_with_object({}) do |record, index|
      make_keys(record).each { |key| (index[key] ||= []) << record }
    end
  end
end

index = Index.new(users)
p index.find("gender" => "Male", "details.city" => "NY")
# => [{:username=>"Bill", :gender=>"Male", :details=>{:city=>"NY"}}]

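This takes O(n) time and extra memory to build the index once. After that, each condition is an average-case O(1) hash lookup, so a search costs time proportional to the number of conditions and matching records rather than to the size of the dataset. Note that find returns the records matching any of the conditions. If you run many searches after setting up the dataset once, something like this might be an option.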
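
If you need the records that match all of the conditions instead of any of them, one option is to intersect the per-condition buckets. The following is a minimal standalone sketch, not a drop-in change to the class above: flat_keys and find_all are hypothetical names, the third user is made up for illustration, and the index stores array positions instead of the records themselves.

```ruby
require "set"

# Sample data in the question's shape; "Tom" is an invented extra record.
users = [
  { username: "Bill", gender: "Male",   details: { city: "NY" } },
  { username: "Mary", gender: "Female", details: { city: "LA" } },
  { username: "Tom",  gender: "Male",   details: { city: "LA" } }
]

# Flattens one record into "dotted.path.value" strings, e.g. "details.city.NY".
def flat_keys(record, prefix = [])
  record.flat_map do |key, val|
    val.is_a?(Hash) ? flat_keys(val, prefix + [key]) : [(prefix + [key, val]).join(".")]
  end
end

# Build the index once: each flattened key maps to the set of positions
# of the records that contain it.
index = Hash.new { |hash, key| hash[key] = Set.new }
users.each_with_index do |user, position|
  flat_keys(user).each { |key| index[key] << position }
end

# Returns the records that satisfy every condition by intersecting buckets.
# fetch does not trigger the default block, so unknown keys yield an empty set.
def find_all(users, index, conditions)
  buckets = conditions.map { |key, value| index.fetch("#{key}.#{value}", Set.new) }
  return [] if buckets.empty?
  buckets.reduce(:&).sort.map { |position| users[position] }
end

matches = find_all(users, index, "gender" => "Male", "details.city" => "LA")
matches.map { |user| user[:username] } #=> ["Tom"]
```

Storing positions keeps the sets small and keeps two records with identical contents distinct. The trade-off is the same as before: the index has to be rebuilt or updated whenever the dataset changes.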

Chris Heald answered Jan 20 '26 15:01



