As part of my Rails application, I've written a little importer that sucks in data from our LDAP system and crams it into a User table. Unfortunately, the LDAP-related code leaks huge amounts of memory while iterating over our 32K users, and I haven't been able to figure out how to fix the issue.
The problem seems to be related to the LDAP library in some way, as when I remove the calls to the LDAP stuff, memory usage stabilizes nicely. Further, the objects that are proliferating are Net::BER::BerIdentifiedString and Net::BER::BerIdentifiedArray, both part of the LDAP library.
When I run the import, memory usage eventually peaks at over 1GB. I need to find some way to correct my code if the problem is there, or to work around the LDAP memory issues if that's where the problem lies. (Or if there's a better LDAP library for large imports for Ruby, I'm open to that as well.)
Here's the pertinent bit of our my code:
require 'net/ldap'
require 'pp'
class User < ActiveRecord::Base
validates_presence_of :name, :login, :email
# This method is resonsible for populating the User table with the
# login, name, and email of anybody who might be using the system.
def self.import_all
# initialization stuff. set bind_dn, bind_pass, ldap_host, base_dn and filter
ldap = Net::LDAP.new
ldap.host = ldap_host
ldap.auth bind_dn, bind_pass
ldap.bind
begin
# Build the list
records = records_updated = new_records = 0
ldap.search(:base => base_dn, :filter => filter ) do |entry|
name = entry.givenName.to_s.strip + " " + entry.sn.to_s.strip
login = entry.name.to_s.strip
email = login + "@txstate.edu"
user = User.find_or_initialize_by_login :name => name, :login => login, :email => email
if user.name != name
user.name = name
user.save
logger.info( "Updated: " + email )
records_updated = records_updated + 1
elsif user.new_record?
user.save
new_records = new_records + 1
else
# update timestamp so that we can delete old records later
user.touch
end
records = records + 1
end
# delete records that haven't been updated for 7 days
records_deleted = User.destroy_all( ["updated_at < ?", Date.today - 7 ] ).size
logger.info( "LDAP Import Complete: " + Time.now.to_s )
logger.info( "Total Records Processed: " + records.to_s )
logger.info( "New Records: " + new_records.to_s )
logger.info( "Updated Records: " + records_updated.to_s )
logger.info( "Deleted Records: " + records_deleted.to_s )
end
end
end
Thanks in advance for any help/pointers!
By the way, I did ask about this in the net/ldap support forum as well, but didn't get any useful pointers there.
One very important thing to note is that you never use the result of the method call. That means that you should pass :return_result => false
to ldap.search
:
ldap.search(:base => base_dn, :filter => filter, :return_result => false ) do |entry|
From the docs: "When :return_result => false, #search will return only a Boolean, to indicate whether the operation succeeded. This can improve performance with very large result sets, because the library can discard each entry from memory after your block processes it."
In other words, if you don't use this flag, all entries will be stored in memory, even if you do not need them outside the block! So, use this option.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With