
What could cause duplicate records to be created by Rails?

We are noticing that a lot of duplicate records are being created in various tables in our database, but we are at a loss as to why. Interestingly, while the records are otherwise identical (down to even the created_at stamps!), on our users table the password salt and hash differ on each record -- which leads me to believe that Rails is somehow running transactions/save operations twice. Obviously, we are not calling save or create multiple times in the application code.

This duplication does not seem to happen with every record saved in the database, and we have not been able to infer a pattern yet. There is also a validates_uniqueness_of validation on the User model (though not a unique key on the table yet; we need to clean up all the duplicates before we can add one) -- so Rails should refuse to save if a record already exists, but if two requests fire simultaneously, that check is subject to a race condition.

We are currently running Rails 3.2.2 behind Passenger 3.0.11/nginx on our app servers (currently 2 of them), and have one central nginx webserver which sends requests upstream to an app server. Could this setup somehow cause processes to be duplicated or something? Would it matter that requests aren't locked to one upstream server (ie. if one user requests a page that includes static content like images, one or both app servers may be used)? (I feel like that's grasping at straws but I want to cover every possibility)

What else could cause this to happen?

Update: As an example, a user was created today which got duplicate records. Both have the created_at stamp of 2012-03-28 16:48:11, and all columns except for hashed_password and salt are identical. From the request log, I can see the following:

App Server 1:

Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:47:19 -0400
[2012-03-28 12:47:19] INFO : Processing by ApplyController#create_user as HTML
[2012-03-28 12:47:20] INFO :   Rendered apply/new_user.html.erb within layouts/template (192.8ms)

Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:48:10 -0400
[2012-03-28 12:48:10] INFO : Processing by ApplyController#create_user as HTML
[2012-03-28 12:48:11] INFO : Redirected to apply/initialize_job_application/3517
[2012-03-28 12:48:11] INFO :  /app/controllers/apply_controller.rb:263:in `block (2 levels) in create_user'

App Server 2:

Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:48:10 -0400
[2012-03-28 12:48:10] INFO : Processing by ApplyController#create_user as HTML

Web Server:

1.2.3.4 - - [28/Mar/2012:12:48:10 -0400] "POST /en/apply/create_user HTTP/1.1" 499 0 "en/apply/create_user" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)" "-"
1.2.3.4 - - [28/Mar/2012:12:48:11 -0400] "POST /en/apply/create_user HTTP/1.1" 302 147 "en/apply/create_user" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)" "-"

So the create action was hit three times (returning to the form the first time, probably due to an error), and at least once on each server. The latter two are both registered by the web server as separate requests, but the first of them gets status code 499 Client Closed Request (an nginx extension, according to Wikipedia), and the second gets a 302 as expected. Could the 499 be causing the problems here?

Daniel Vandersluis asked Mar 28 '12 19:03


1 Answer

Two possibilities come to mind.

The first is an odd (and RFC-violating) behavior of nginx when it is used as a load balancer: it retries any failed request against the next backend. The RFC allows that only for idempotent methods (e.g. GET or HEAD). As a result, if nginx considers a request failed for some reason, it may re-send it to the next server. If both servers then complete their transactions, you end up with a duplicate record. Judging from your web server's log (and the 499 status code, which nginx uses to denote a user clicking abort in their browser), this looks like the most probable cause.
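To illustrate the retry problem, here is a toy model in plain Ruby. It is not nginx and does not use any nginx API: FakeBackend, proxy, RETRYABLE_METHODS and the retry_unsafe flag are hypothetical names invented for this sketch. It assumes a backend that finishes its work but whose reply never reaches the proxy, so the proxy sees a failure.

```ruby
# Hypothetical backend: records every request it has processed.
class FakeBackend
  attr_reader :processed

  def initialize(reply_lost: false)
    @processed = []
    @reply_lost = reply_lost
  end

  # Does the work, then reports :ok, or :failed when the reply is lost
  # on the way back (the work has still been done).
  def handle(http_method, path)
    @processed << [http_method, path]
    @reply_lost ? :failed : :ok
  end
end

# Methods this toy proxy treats as always safe to repeat.
RETRYABLE_METHODS = %w[GET HEAD].freeze

# Tries each backend in turn. With retry_unsafe: false, a failed request
# is only passed on to the next backend if its method is retryable.
def proxy(backends, http_method, path, retry_unsafe:)
  backends.each do |backend|
    result = backend.handle(http_method, path)
    return result if result == :ok
    return result unless retry_unsafe || RETRYABLE_METHODS.include?(http_method)
  end
  :failed
end

backends = [FakeBackend.new(reply_lost: true), FakeBackend.new]
proxy(backends, "POST", "/en/apply/create_user", retry_unsafe: true)
# Both backends have now processed the same POST once.
p backends.map { |b| b.processed.size }
```

With retry_unsafe: false the POST is reported as failed after the first backend and the second one never sees it. In nginx itself, the directive that controls passing a request to the next server is proxy_next_upstream; how to set it depends on your setup.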

The second possibility is that your users double-click on the send button. With the right timing, their browsers could send two complete requests nearly at the same time.

To make sure that your user records are really unique, you should create unique indexes in your database. These are then actually enforced (albeit with a worse error message than the ActiveRecord check). Because of that, you should always define your uniqueness constraint in both the database schema and your models.
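To see why the model-level check alone cannot guarantee uniqueness, here is a minimal sketch in plain Ruby. It is not ActiveRecord: FakeUsersTable, simulate_race and the example email address are hypothetical, made up for the illustration. It replays the interleaving in which two requests both run the uniqueness check before either one has inserted its row.

```ruby
# Hypothetical in-memory stand-in for a users table (not ActiveRecord).
class FakeUsersTable
  class NotUnique < StandardError; end

  attr_reader :rows

  def initialize(unique_index: false)
    @rows = []
    @unique_index = unique_index
  end

  # Stands in for the SELECT that a model-level uniqueness check performs.
  def exists?(email)
    @rows.include?(email)
  end

  # Stands in for the INSERT. With a unique index, the "database" itself
  # refuses a second row with the same value.
  def insert(email)
    raise NotUnique, email if @unique_index && @rows.include?(email)
    @rows << email
  end
end

# Replays two simultaneous requests: both validate first, then both save.
def simulate_race(table, email)
  # Step 1: each request checks for an existing row; neither finds one.
  checks = 2.times.map { table.exists?(email) }
  # Step 2: both passed validation, so both go on to insert.
  checks.map do |already_there|
    next :invalid if already_there
    begin
      table.insert(email)
      :created
    rescue FakeUsersTable::NotUnique
      :rejected_by_index
    end
  end
end

without_index = FakeUsersTable.new
p simulate_race(without_index, "user@example.com") # both requests create a row

with_index = FakeUsersTable.new(unique_index: true)
p simulate_race(with_index, "user@example.com")    # second insert is rejected
```

In a Rails app, the real counterparts would be a unique index added in a migration (add_index with unique: true) and rescuing ActiveRecord::RecordNotUnique around the save. Which column or columns to index depends on your schema.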

Also, you could look into replacing your frontend nginx with a more conformant load balancer. I'd recommend HAProxy for that.

Holger Just answered Oct 25 '22 07:10
