Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Knex with PostgreSQL select query extremely performance degradation on multiple parallel requests

In brief

I'm developing a game(of dream) and my backend stack is Node.js and PostgreSQL(9.6) with Knex. I hold all players data here and I need to request it frequently. One of the requests need to make 10 simple selects that would pull the data and this is where the problem starts: these queries are quite fast(~1ms), if server serves only 1 request at the same time. But if server server many requests in parallel(100-400), queries execution time degradates extremely (can be up to several seconds per query)

Details

In order to be more objective, I will describe server's request goal, select queries and the results that I received.

About system

I'm running node code on Digital Ocean 4cpu/8gb droplet and Postgres on the same conf(2 different droplets, same configuration)

About request

It need to do some gameplay actions for which he selects data for 2 players from DB

DDL

Players' data represented by 5 tables:

CREATE TABLE public.player_profile(
    id integer NOT NULL DEFAULT nextval('player_profile_id_seq'::regclass),
    public_data integer NOT NULL,
    private_data integer NOT NULL,
    current_active_deck_num smallint NOT NULL DEFAULT '0'::smallint,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_profile_pkey PRIMARY KEY (id),
    CONSTRAINT player_profile_private_data_foreign FOREIGN KEY (private_data)
        REFERENCES public.profile_private_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT player_profile_public_data_foreign FOREIGN KEY (public_data)
        REFERENCES public.profile_public_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_data(
    id integer NOT NULL DEFAULT nextval('player_character_data_id_seq'::regclass),
    owner_player integer NOT NULL,
    character_id integer NOT NULL,
    experience_counter integer NOT NULL,
    level_counter integer NOT NULL,
    character_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_character_data_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_data_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_cards(
    id integer NOT NULL DEFAULT nextval('player_cards_id_seq'::regclass),
    card_id integer NOT NULL,
    owner_player integer NOT NULL,
    card_level integer NOT NULL,
    first_deck boolean NOT NULL,
    consumables integer NOT NULL,
    second_deck boolean NOT NULL DEFAULT false,
    third_deck boolean NOT NULL DEFAULT false,
    quality character varying(10) COLLATE pg_catalog."default" NOT NULL DEFAULT 'none'::character varying,
    CONSTRAINT player_cards_pkey PRIMARY KEY (id),
    CONSTRAINT player_cards_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_equipment(
    id integer NOT NULL DEFAULT nextval('player_character_equipment_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    item_level integer NOT NULL,
    item_type character varying(20) COLLATE pg_catalog."default" NOT NULL,
    is_equipped boolean NOT NULL,
    slot_num integer,
    CONSTRAINT player_character_equipment_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_equipment_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_runes(
    id integer NOT NULL DEFAULT nextval('player_character_runes_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    slot_num integer,
    decay_start_timestamp bigint,
    CONSTRAINT player_character_runes_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_runes_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

With indexes

knex.raw('create index "player_cards_owner_player_first_deck_index" on "player_cards"("owner_player") WHERE first_deck = TRUE');
knex.raw('create index "player_cards_owner_player_second_deck_index" on "player_cards"("owner_player") WHERE second_deck = TRUE');
knex.raw('create index "player_cards_owner_player_third_deck_index" on "player_cards"("owner_player") WHERE third_deck = TRUE');
knex.raw('create index "player_character_equipment_owner_character_is_equipped_index" on "player_character_equipment" ("owner_character") WHERE is_equipped = TRUE');
knex.raw('create index "player_character_runes_owner_character_slot_num_not_null_index" on "player_character_runes" ("owner_character") WHERE slot_num IS NOT NULL');

The code

First query

async.parallel([
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', playerId)
        .limit(1)
        .asCallback(cb),
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', enemyId)
        .limit(1)
        .asCallback(cb)
], callbackFn);

Second query

async.parallel([
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', playerId)
        .asCallback(cb),
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', enemyId)
        .asCallback(cb)
], callbackFn);

Third q

playerQ = { first_deck: true }
enemyQ = { first_deck: true }
MAX_CARDS_IN_DECK = 5
async.parallel([
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', playerId)
        .andWhere(playerQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb),
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', enemyId)
        .andWhere(enemyQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb)
], callbackFn);

Fourth q

MAX_EQUIPPED_ITEMS = 3
async.parallel([
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', playerCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb),
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', enemyCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb)
], callbackFn);

Fifth one

runeSlotsMax = 3
async.parallel([
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', playerCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb),
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', enemyCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb)
], callbackFn);

EXPLAIN(ANALYZE)

Only Index scans and <1ms for planning and execution times. Can publish if need (did not published to save space)

Time itself

(total is number of requests, min/max/avg/median is for response time)

  • 4 concurrent requests: { "total": 300, "avg": 1.81, "median": 2, "min": 1, "max": 6 }
  • 400 concurrent requests:
    • { "total": 300, "avg": 209.57666666666665, "median": 176, "min": 9, "max": 1683 } - first select
    • { "total": 300, "avg": 2105.9, "median": 2005, "min": 1563, "max": 4074 } - last select

I tried to put slow queries that are performed more than 100ms into logs - nothing. Also tried to increase connection pool size up to num of parallel requests - nothing too.

like image 601
Andrew Medvedev Avatar asked Feb 09 '17 12:02

Andrew Medvedev


1 Answers

A solution was found fast, but forgot to respond here (was busy, sorry).

No magic with slow queries, but only node's event loop nature:

  • All silimar requests was made in parallel;
  • I have a code block with very slow execution time(~150-200ms);
  • If you have ~800 parallel requests 150ms code block transforms into ~10000ms event loop lag;
  • All you'll see is a visibility of slow requests, but it's just a lag of callback function, not database;

Conclusion: use pgBadger to detect slow queries and isBusy module to detect event loop lags

like image 82
Andrew Medvedev Avatar answered Oct 15 '22 18:10

Andrew Medvedev