
ArangoDB AQL: FILTER NOT IN collection is very slow

Tags: arangodb, aql

I want to find the set of users not having a profile.

ArangoDB 2.4.3

LENGTH(users) -> 130k
LENGTH(profiles) -> 110k

users.userId -> unique hash index
profiles.userId -> unique hash index

This AQL snippet I made is slower than a snail crossing the Grand Canyon in mid-summer.

LET usersWithProfiles = ( /* This part is ok */
  FOR i IN users
    FOR j IN profiles
      FILTER i.userId == j.userId
      RETURN i
)

LET usersWithoutProfiles = ( /* This is not */
  FOR i IN usersWithProfiles
    FILTER i NOT IN users
    RETURN i
)

RETURN LENGTH(usersWithoutProfiles)

I'm pretty sure there is a perfectly sane way of doing it right, but I'm missing it. Any ideas?

Edit 1 (after @dothebart's response):

This is the new query, but it is still very slow:

LET userIds_usersWithProfile = (
  FOR i IN users
    FOR j IN profiles
      FILTER i.userId == j.userId
      RETURN i.userId
)

LET usersWithoutProfiles = (
  FOR i IN users
    FILTER i.userId NOT IN userIds_usersWithProfile
    RETURN i
)

RETURN LENGTH(usersWithoutProfiles)
rollingBalls asked Dec 20 '22 07:12


1 Answer

Note also that this part of the original query was extremely expensive:

LET usersWithoutProfiles = (
  FOR i IN usersWithProfiles
    FILTER i NOT IN users
    RETURN i
)

The reason is the FILTER on users: in that position, users is an expression that builds all documents of the collection as an array, and NOT IN then has to compare each i against that array of roughly 130k documents. Instead, I suggest the following query, which returns the _key attribute of each user that has no associated profile record:

FOR user IN users 
  LET profile = (
    FOR profile IN profiles 
      FILTER profile.userId == user.userId 
      RETURN 1
  ) 
  FILTER LENGTH(profile) == 0 
  RETURN user._key
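
Since the question asks for a count rather than the keys, the same pattern can be wrapped to return a number. This is only a minimal sketch, assuming the collection and attribute names from the question. The variable names p, matches and userWithoutProfile, and the LIMIT 1, are additions for illustration and have not been tested against ArangoDB 2.4.3:

```aql
/* Sketch: count users that have no matching profile document. */
LET userWithoutProfile = (
  FOR user IN users
    /* Look up at most one profile with the same userId;
       this equality lookup is the part that can use the
       hash index on profiles.userId. */
    LET matches = (
      FOR p IN profiles
        FILTER p.userId == user.userId
        LIMIT 1
        RETURN 1
    )
    /* Keep only users for which the lookup found nothing. */
    FILTER LENGTH(matches) == 0
    RETURN 1
)

RETURN LENGTH(userWithoutProfile)
```

Returning 1 instead of the full user document keeps the intermediate array small, since only its length is needed in the end.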
stj answered Jan 29 '23 05:01