Why is the n+1 selects pattern slow?

I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?

As an example let's take the code from the accepted answer:

SELECT * FROM Cars;

/* for each car */
SELECT * FROM Wheel WHERE CarId = ?

With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:

1 lookup to reach the "Wheel" table (one hashmap get())
1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
k lookups to fetch the rows of the k matching wheels (k pointer dereferences)

Even if we multiply that by a small constant factor to account for overhead from the internal memory structure, it should still be unnoticeably fast. Is the interprocess communication the bottleneck?

Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals. - HN discussion thread.

Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.

asked Oct 07 '14 21:10

Perseids

2 Answers

You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, round-trip latencies above 1 ms are common, i.e. the CPU would spend millions of clock cycles per query waiting for the network.

If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch. That commonly costs more than 0.01 ms (source), i.e. tens of thousands of clock cycles.
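To put those two figures side by side, here is a back-of-the-envelope sketch in Python. The per-query costs are the lower bounds quoted above (1 ms remote, 0.01 ms local); the function name and the values of n are made up for illustration, and parsing and executing each query comes on top of this.

```python
def total_wait_seconds(n_queries, per_query_us):
    """Time spent purely waiting on round trips, in seconds."""
    return n_queries * per_query_us / 1_000_000


REMOTE_US = 1000  # ~1 ms network round trip (lower bound from the text)
LOCAL_US = 10     # ~0.01 ms context switch (lower bound from the text)

for n in (100, 100_000):
    remote = total_wait_seconds(n, REMOTE_US)
    local = total_wait_seconds(n, LOCAL_US)
    print(f"n={n}: remote >= {remote} s, local >= {local} s")
```

With these assumptions, 100,000 queries cost at least 100 s of waiting against a remote database and at least 1 s against a local one, while 100 queries stay around 0.1 s and 0.001 s respectively.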

In addition, both the ORM tool and the database will have some overhead per query.

To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.
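For reference, a minimal sketch of the two access patterns, using Python's built-in sqlite3 module. The schema (a Name column on Cars, a Position column on Wheel), the index name and all function names are assumptions made up for this example; only the Cars/Wheel/CarId names come from the question. Both functions return the same data, but the first issues n+1 statements and the second issues one.

```python
import sqlite3


def build_db(n_cars=100, wheels_per_car=4):
    """Create an in-memory database with a hypothetical Cars/Wheel schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER, Position INTEGER)"
    )
    conn.execute("CREATE INDEX WheelCarId ON Wheel (CarId)")
    conn.executemany(
        "INSERT INTO Cars (Id, Name) VALUES (?, ?)",
        [(i, f"car-{i}") for i in range(n_cars)],
    )
    conn.executemany(
        "INSERT INTO Wheel (CarId, Position) VALUES (?, ?)",
        [(c, p) for c in range(n_cars) for p in range(wheels_per_car)],
    )
    conn.commit()
    return conn


def wheels_n_plus_one(conn):
    """One query for the cars, then one query per car: n+1 statements."""
    queries = 1
    cars = conn.execute("SELECT Id FROM Cars ORDER BY Id").fetchall()
    result = {}
    for (car_id,) in cars:
        rows = conn.execute(
            "SELECT Position FROM Wheel WHERE CarId = ? ORDER BY Position",
            (car_id,),
        ).fetchall()
        queries += 1
        result[car_id] = [position for (position,) in rows]
    return result, queries


def wheels_single_query(conn):
    """One joined query; rows are grouped per car on the client side."""
    rows = conn.execute(
        "SELECT Cars.Id, Wheel.Position FROM Cars "
        "LEFT JOIN Wheel ON Wheel.CarId = Cars.Id "
        "ORDER BY Cars.Id, Wheel.Position"
    )
    result = {}
    for car_id, position in rows:
        wheels = result.setdefault(car_id, [])
        if position is not None:  # a car without wheels yields one NULL row
            wheels.append(position)
    return result, 1
```

Note that SQLite runs inside the calling process, so this sketch only shows the shape of the two patterns; it does not reproduce the interprocess or network round trips discussed above.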

answered Sep 20 '22 04:09

meriton

Assuming the database resides on the same machine as my program

Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow, and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or for security, you might not want your app server on the same box as the DB.

why is the n+1 query pattern slow?

You don't think it's slow because your mental model of performance is probably all wrong.

1) RAM is horribly slow. Your CPU wastes around 200-400 cycles each time it needs to read something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading).

2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster. See this article about how accessing RAM in the right order is 76.6% faster http://lwn.net/Articles/255364/ (Read the whole article if you want to know how horrifyingly complex RAM actually is.)

CPU cache

In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dump the caches anyway).

The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.

RAM sequential access

The "1 query" case will read everything from the DB into one linear buffer and send it to the client, which will read it linearly. No random accesses during data transfer.

The "N+1 query" case will be allocating and de-allocating RAM N times, and (for various reasons) each allocation may not land in the same physical bit of RAM.

Various other reasons

The networking subsystem only needs to read one or two TCP headers, instead of N.

Your DB only needs to parse one query instead of N.

When you throw in multiple users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.

Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.

See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx
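If you would rather measure than reason about it, a rough timing harness along these lines may help. It is a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database, and the schema and all function names are invented for the example. SQLite runs in-process and Python's sqlite3 module caches prepared statements, so this leaves out context switches, TCP headers and repeated parsing; it only isolates the remaining per-statement overhead, which means a client/server database should show a larger gap.

```python
import sqlite3
import time


def make_db(n_cars, wheels_per_car=4):
    """In-memory Wheel table with an index on CarId (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER)")
    conn.execute("CREATE INDEX WheelCarId ON Wheel (CarId)")
    conn.executemany(
        "INSERT INTO Wheel (CarId) VALUES (?)",
        ((c,) for c in range(n_cars) for _ in range(wheels_per_car)),
    )
    conn.commit()
    return conn


def time_per_car_queries(conn, n_cars):
    """The per-car part of the N+1 pattern: one SELECT for every car id."""
    start = time.perf_counter()
    rows = 0
    for car_id in range(n_cars):
        cursor = conn.execute("SELECT Id FROM Wheel WHERE CarId = ?", (car_id,))
        rows += len(cursor.fetchall())
    return rows, time.perf_counter() - start


def time_single_query(conn):
    """The 1-query pattern: fetch every wheel in one statement."""
    start = time.perf_counter()
    rows = len(conn.execute("SELECT Id FROM Wheel ORDER BY CarId").fetchall())
    return rows, time.perf_counter() - start


def compare(n_cars=10_000):
    """Run both patterns on the same data and report rows and seconds."""
    conn = make_db(n_cars)
    rows_per_car, seconds_per_car = time_per_car_queries(conn, n_cars)
    rows_single, seconds_single = time_single_query(conn)
    conn.close()
    return {
        "rows_per_car": rows_per_car,
        "seconds_per_car": seconds_per_car,
        "rows_single": rows_single,
        "seconds_single": seconds_single,
    }
```

Calling compare() and printing the result gives numbers for your own machine. No timings are quoted here because they depend heavily on hardware, database engine and driver.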

answered Sep 22 '22 04:09

BraveNewCurrency
