Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django ORM poor performance compared to raw sql

I'm using Django ORM for data query and I get almost 2 million rows in this table.I tried

app_count = App.objects.count()

and

from django.db import connection
cursor = connection.cursor()
cursor.execute('''SELECT count(*) FROM app''')

The mysql slow_query log gave me

Time: 2017-04-27T09:18:38.809498Z

User@Host: www[www] @ [172.19.0.3] Id: 5

Query_time: 4.107433 Lock_time: 0.004405 Rows_sent: 1 Rows_examined: 0

use app_platform; SET timestamp=1493284718; SELECT count(*) FROM app;

This query took avg more than 4 seconds, but when I used mysql client and mysql shell to do this query

mysql> select count(*) from app;

+----------+
| count(*) |
+----------+
|  1870019 |
+----------+

1 row in set (0.41 sec)

Just take me 0.4 second, 10X difference, Why and how to improve it.

EDIT

here is my model

class AppMain(models.Model):
    """
    """
    store = models.ForeignKey("AppStore", related_name="main_store")
    name = models.CharField(max_length=256)
    version = models.CharField(max_length=256, blank=True)
    developer = models.CharField(db_index=True, max_length=256, blank=True)
    md5 = models.CharField(max_length=256, blank=True)
    type = models.CharField(max_length=256, blank=True)
    size = models.IntegerField(blank=True)
    download = models.CharField(max_length=1024, blank=True)
    download_md5 = models.CharField(max_length=256, blank=True)
    download_times = models.BigIntegerField(blank=True)
    snapshot = models.CharField(max_length=2048, blank=True)
    description = models.CharField(max_length=5000, blank=True)
    app_update_time = models.DateTimeField(blank=True)
    create_time = models.DateTimeField(db_index=True, auto_now_add=True)
    update_time = models.DateTimeField(auto_now=True)

    class Meta:
        unique_together = ("store", "name", "version")

EDIT 2

And I'm using Docker and docker-compose for my project

version: '2'
services:
  mysqldb:
    restart: always
    image: mysql:latest
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: just_for_test
      MYSQL_USER: www
      MYSQL_PASSWORD: www
      MYSQL_DATABASE: app_platform
    volumes:
      - mysqldata:/var/lib/mysql
      - ./config/:/etc/mysql/conf.d
      - ./log/mysql/:/var/log/mysql/
  web:
    restart: always
    build: ./app_platform/app_platform
    env_file: .env
    environment:
      PYTHONPATH: '/usr/src/app/app_platform'
    command: bash -c "gunicorn --chdir /usr/src/app/app_platform app_platform.wsgi:application  -k gevent  -w 6 -b :8000 --timeout 8000 --reload"
    volumes:
      - ./app_platform:/usr/src/app
      - ./sqldata:/usr/src/sqldata
      - /usr/src/app/static
    ports:
      - "8000"
    dns:
        - 114.114.114.114
        - 8.8.8.8
    links:
      - mysqldb
  nginx:
    restart: always
    build: ./nginx/
    ports:
      - "80:80"
    volumes:
      - ./app_platform:/usr/src/app
      - ./nginx/sites-enabled/:/etc/nginx/sites-enabled
    links:
      - web:web
volumes:
    mysqldata:

And my django settings look like this:

import os
from django.utils.translation import ugettext_lazy as _

LANGUAGES = (
    ('en', _('English')),
    ('zh-CN', _('Chinese')),
)


LANGUAGE_CODE = 'zh-CN'

BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

LOCALE_PATHS = (
    os.path.join(BASE_DIR, "locale"),
)

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'just_for_test'

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'app_scrapy',
    'app_user',
    'app_api',
    'app_check',
    'common',
    'debug_toolbar',
]


MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

AUTH_USER_MODEL = 'app_user.MyUser'

AUTHENTICATION_BACKENDS = (
    'app_user.models.CustomAuth', 'django.contrib.auth.backends.ModelBackend')


ROOT_URLCONF = 'app_platform.urls'


TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': ["/usr/src/app/app_platform/templates"],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.template.context_processors.i18n',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'app_platform.wsgi.application'

LOGIN_REDIRECT_URL = '/'
LOGIN_URL = '/login/'
# Database
# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
# Password validation
# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder'
)

# Internationalization
# https://docs.djangoproject.com/en/1.9/topics/i18n/

TIME_ZONE = 'Asia/Shanghai'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.9/howto/static-files/

STATIC_ROOT = "/static/"

STATIC_URL = '/static/'

STATICFILES_DIRS = (
    'public/static/',
)


DEBUG = True

ALLOWED_HOSTS = []

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework.authentication.BasicAuthentication',
        'rest_framework.authentication.SessionAuthentication',
    ),
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.AllowAny',
    ),
    'DEFAULT_PAGINATION_CLASS':
        'rest_framework.pagination.LimitOffsetPagination',
        'PAGE_SIZE': 5,
}

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app_platform',
        'USER': 'www',
        'PASSWORD': 'www',
        'HOST': 'mysqldb',   # Or an IP Address that your DB is hosted on
        'PORT': '3306',
    }
}

DEBUG_TOOLBAR_CONFIG = {
    "SHOW_TOOLBAR_CALLBACK": lambda request: True,
}

My app table info

CREATE TABLE `app` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL,
  `version` varchar(256) NOT NULL,
  `developer` varchar(256) NOT NULL,
  `md5` varchar(256) NOT NULL,
  `type` varchar(256) NOT NULL,
  `size` int(11) NOT NULL,
  `download` varchar(1024) NOT NULL,
  `download_md5` varchar(256) NOT NULL,
  `download_times` bigint(20) NOT NULL,
  `snapshot` varchar(2048) NOT NULL,
  `description` varchar(5000) NOT NULL,
  `app_update_time` datetime(6) NOT NULL,
  `create_time` datetime(6) NOT NULL,
  `update_time` datetime(6) NOT NULL,
  `store_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `app_store_id_6822fab1_uniq` (`store_id`,`name`,`version`),
  KEY `app_7473547c` (`store_id`),
  KEY `app_developer_b74bcd8e_uniq` (`developer`),
  KEY `app_create_time_a071d977_uniq` (`create_time`),
  CONSTRAINT `app_store_id_aef091c6_fk_app_scrapy_appstore_id` FOREIGN KEY (`store_id`) REFERENCES `app_scrapy_appstore` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1870020 DEFAULT CHARSET=utf8;

EDIT 3

Here is EXPLAIN SELECT COUNT(*) FROM app;

mysql> EXPLAIN SELECT COUNT(*) FROM `app`;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

EDIT 4

Here is my mysql.cnf

innodb_read_io_threads=12
innodb_write_io_threads=12
innodb_io_capacity=300
innodb_read_io_threads=12
innodb_write_io_threads=12  #To stress the double write buffer
innodb_buffer_pool_size=3G
innodb_log_file_size = 32M #Small log files, more page flush
innodb_log_buffer_size=8M
innodb_flush_method=O_DIRECT

My docker settings is 2 CPUS and 4GB Memory

EDIT 5

When I ran ORM query inside django shell, just took me 0.5-1 second. So the problem is about docker settings? or maybe gunicorn settings?

like image 957
Windsooon Avatar asked Apr 28 '17 02:04

Windsooon


1 Answers

10X -- I like it. That exactly matches my Rule of Thumb: "If the data is not cached, the query will take 10 times as long as if it is cached." (Rick's RoTs)

But, let's move on to the real question: "4.1s is too slow, what can I do about it."

  • Change your app so you don't need the number of rows. Have you noticed that Search engines no longer say "out of 12345678 hits"?

  • Keep an estimate, rather than recomputing.

  • Let's see EXPLAIN SELECT COUNT(*) FROM app; it might give some more clues. (One place you say app, another you say app_scrapy_appmain; are they the same??)

  • As long as you never DELETE any rows, this would give you the same answer: SELECT MAX(id) FROM app, and run "instantly". (Once a DELETE, ROLLBACK, etc) happens, id(s) will be lost, so the COUNT will be less than MAX.)

More

innodb_buffer_pool_size=3G is probably too much on only 4GB of RAM. If MySQL swaps, performance becomes really bad. Suggest only 2G, unless you can see that it is not swapping.

Note: Scanning 1.8M rows is destined to take at least 0.4s on that hardware, or probably any hardware. It takes time do to the task. Also, doing the 'long' query interferes in two ways with other tasks: It consumes CPU and/or I/O while performing the query, plus it may be bumping other blocks out of cache, causing them to slow down. So, I really think the 'right' thing to do is to take heed of my hints on avoiding COUNT(*). Here's another one:

  • Build and maintain a "Summary Table" of daily subtotals of this (and other) table. Include in it the daily COUNT(*) and whatever else you might like to have. This will even shrink the 0.4s timing by using SUM(subtotal) from this table. More on Summary Tables.
like image 77
Rick James Avatar answered Dec 09 '22 01:12

Rick James