
How to save neo4j database?

I'm using neo4j for the first time, with the neography gem for Ruby. My data is in CSV files, and I can successfully populate the database from my main file, i.e. create all the nodes. For each CSV file (here, user.csv), I do the following:

def create_person(name, id)
  Neography::Node.create("name" => name, "id" => id)
end

CSV.foreach('user.csv', :headers => true) do |row|
  id = row[0].to_i()
  name = row[1]
  $persons[id] = create_person(name, id)
end

I do the same for the other files. There are two issues. First, small files load fine, but with slightly larger ones (I'm dealing with four 1 MB files) I get:

SocketError: Too many open files (http://localhost:7474)

The second issue is that I don't want to populate the database every time I run this Ruby file. I want to load the data once and then leave the database untouched, only running queries against it afterwards. Can anyone tell me how to populate it and save it, and then how to load it whenever I want to use it? Thank you.

theharshest asked Oct 02 '22 17:10


2 Answers

Create a @neo client:

  @neo = Neography::Rest.new

Create a queue:

  @queue = []

Use the batch API for data loading, so that many operations go out in one request:

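def create_person(name, id)
  # Queue the operation instead of sending one HTTP request per node.
  @queue << [:create_node, {"name" => name, "id" => id}]
  if @queue.size >= 500
    # Send all queued operations in a single batch request.
    batch_results = @neo.batch(*@queue)
    @queue = []
    batch_results.each do |result|
      # The key is the node id taken from the node's URL, not the CSV id.
      node_id = result["body"]["self"].split('/').last
      $persons[node_id] = result
    end
  end
end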

Run through your CSV file:

CSV.foreach('user.csv', :headers => true) do |row|
  create_person(row[1], row[0].to_i)
end

Get the leftovers:

batch_results = @neo.batch(*@queue)
batch_results.each do |result|
  node_id = result["body"]["self"].split('/').last
  $persons[node_id] = result
end
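
The queue-and-flush pattern above can also be wrapped in a small helper so the threshold check and the final flush live in one place. This is only a sketch: BatchQueue is a hypothetical class, not part of neography, and it simply hands each batch to whatever block you give it. With neography, that block would presumably be the same @neo.batch call used above.

```ruby
# Hypothetical helper (not part of neography): buffers operations and
# passes them to a caller-supplied block in fixed-size batches.
class BatchQueue
  def initialize(batch_size = 500, &sender)
    raise ArgumentError, "a block is required" unless sender
    @batch_size = batch_size
    @sender = sender
    @queue = []
  end

  # Add one operation; flush automatically once the batch is full.
  def push(operation)
    @queue << operation
    flush if @queue.size >= @batch_size
  end

  # Send whatever is queued. Safe to call when the queue is empty.
  def flush
    return if @queue.empty?
    @sender.call(@queue)
    @queue = []
  end
end

# Assumed usage with the @neo client from above:
#
#   queue = BatchQueue.new(500) { |ops| @neo.batch(*ops) }
#   CSV.foreach('user.csv', :headers => true) do |row|
#     queue.push([:create_node, {"name" => row[1], "id" => row[0].to_i}])
#   end
#   queue.flush   # get the leftovers
```

Calling flush once after the loop replaces the separate leftovers step, and because it returns early on an empty queue, an extra call is harmless.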

An example of data loading via the REST API can be seen here: https://github.com/maxdemarzi/neo_crunch/blob/master/neo_crunch.rb

An example of using a queue for writes can be seen here: http://maxdemarzi.com/2013/09/05/scaling-writes/

Max De Marzi answered Oct 13 '22 11:10


It sounds as if you are running these requests in parallel or not reusing HTTP connections.

Did you try creating a single client with @neo = Neography::Rest.new and then calling @neo.create_node({...})? I think that one reuses the HTTP connections.

Michael Hunger answered Oct 13 '22 11:10