Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the n+1 selects pattern slow?

I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?

As an example let's take the code from the accepted answer:

SELECT * FROM Cars;

/* for each car */
SELECT * FROM Wheel WHERE CarId = ?

With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:

  • 1 lookup to reach the "Wheel" table (one hashmap get())
  • 1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
  • k lookups to get the wheel rows for each matching wheel (k pointer dereferenciations)

Even if we multiply that by a small constant factor for an additional overhead because of the internal memory structure, it still should be unnoticeably fast. Is the interprocess communication the bottleneck?


Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals. - HN discussion thread.

Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.

like image 744
Perseids Avatar asked Oct 07 '14 21:10

Perseids


People also ask

What is the N 1 problem in hibernate How is it resolved?

Hibernate N+1 issue occurs when you use `FetchType. LAZY` for your entity associations. Hibernate will perform n-additional queries to load lazily fetched objects. To escape this issue use join fetch, batching or sub select.

What is the N 1 problem?

What is the N+1 query problem. The N+1 query problem happens when the data access framework executed N additional SQL statements to fetch the same data that could have been retrieved when executing the primary SQL query. The larger the value of N, the more queries will be executed, the larger the performance impact.

How do you fix N 1 problems?

The solution to fix the N+1 queries is to configure Hibernate to eagerly fetch the data needed in each query. As I explained before, the best practice is to configure every entity's relationship (ManyToOne…) to be lazily fetched by default.

Can you tell something about the N 1 Select problem in hibernate?

The N+1 query problem is said to occur when an ORM, like hibernate, executes 1 query to retrieve the parent entity and N queries to retrieve the child entities. As the number of entities in the database increases, the queries being executed separately can easily affect the performance of the application.


2 Answers

You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, communication latencies of > 1ms are common, i.e. the cpu would spend millions of clock cycles waiting for the network.

If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch, which commonly costs > 0.01 ms (source), which is tens of thousands of clock cycles.

In addition, both the ORM tool and the database will have some overhead per query.

To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.

like image 58
meriton Avatar answered Sep 20 '22 04:09

meriton


Assuming the database resides on the same machine as my program

Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow, and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or for security, you might want not want your app server on the same box as the DB.

why is the n+1 query pattern slow?

You don't think it's slow because your mental model of performance is probably all wrong.

1) RAM is horribly slow. Your CPU is wasting around 200-400 CPU cycles each time it needs to read something something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading)

2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster. See this article about how accessing RAM in the right order is 76.6% faster http://lwn.net/Articles/255364/ (Read the whole article if you want to know how horrifyingly complex RAM actually is.)

CPU cache

In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dumps the caches anyway).

The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.

RAM sequential access

The "1 query" case will read everything from the DB to one linear buffer, send it to the client who will read it linearly. No random accesses during data transfer.

The "N+1 query" case will be allocating and de-allocating RAM N times, which (for various reasons) may not be the same physical bit of RAM.

Various other reasons

The networking subsystem only needs to read one or two TCP headers, instead of N.

Your DB only needs to parse one query instead of N.

When you throw in multi-users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.

Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.

See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx

like image 25
BraveNewCurrency Avatar answered Sep 22 '22 04:09

BraveNewCurrency