
E_WARNING: Error while sending STMT_PREPARE packet. PID=*

My Laravel 5.7 website has been experiencing a few problems that I think are related to each other (but happen at different times):

  1. PDO::prepare(): MySQL server has gone away
  2. E_WARNING: Error while sending STMT_PREPARE packet. PID=10
  3. PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry (My database often seems to try to write the same record twice in the same second. I've been unable to figure out why or how to reproduce it; it doesn't seem to be related to user behavior.)
  4. Somehow, the first 2 types of errors only ever appear in my Rollbar logs, not in the text logs on the server or in my Slack notifications, where all errors are supposed to appear (and all others do).

For months, I've continued to see scary log messages like these, but I've been completely unable to reproduce, diagnose, or solve the errors.

I haven't yet found any actual symptoms or heard of any complaints from users, but the error messages seem non-trivial, so I really want to understand and fix the root causes.


I've tried changing my MySQL config to use max_allowed_packet=300M (instead of the default of 4M) but still get these exceptions frequently on days when I have more than a couple of visitors to my site.

I've also set the following (previously 5M and 10M, respectively) because of this advice:

innodb_buffer_pool_chunk_size=218M
innodb_buffer_pool_size = 218M

As further background:

  • My site has a queue worker that runs jobs (artisan queue:work --sleep=3 --tries=3 --daemon).
  • There are a bunch of queued jobs that can be scheduled to happen at the same moment based on the signup time of visitors. But the most I see that have happened simultaneously is 20.
  • There are no entries in the MySQL Slow Query Log.
  • I have a few cron jobs, but I doubt they're problematic. One runs every minute but is really simple. Another runs every 5 minutes to send certain scheduled emails if any are pending. And another runs every 30 minutes to run a report.
  • I've run various mysqlslap queries (I'm completely novice though) and haven't found anything slow even when simulating hundreds of concurrent clients.
  • I'm using Laradock (Docker).
  • My server is DigitalOcean 1GB RAM, 1 vCPU, 25GB SSD. I've also tried 2GB RAM with no difference.
  • The results from SHOW VARIABLES; and SHOW GLOBAL STATUS; are here.

My my.cnf is:

[mysql]

[mysqld]
sql-mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
character-set-server=utf8
innodb_buffer_pool_chunk_size=218M
innodb_buffer_pool_size = 218M
max_allowed_packet=300M
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query_log.log
long_query_time = 10
log_queries_not_using_indexes = 0

Any ideas about what I should explore to diagnose and fix these problems? Thanks.


asked Nov 25 '18 17:11 by Ryan


2 Answers

Re Slowlog: Show us your my.cnf. Were the changes in the [mysqld] section? Test it via SELECT SLEEP(12); then look in both the file and the table.

Alternate way to find the query: Since the query is taking several minutes, do SHOW FULL PROCESSLIST; when you think it might be running.

How much RAM do you have? Do not have max_allowed_packet=300M unless you have at least 30GB of RAM. Else you are risking swapping (or even crashing). Keep that setting under 1% of RAM.
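As a rough illustration of that 1% rule of thumb, the arithmetic can be sketched in Python. The function names are made up for this example, and the 1% figure is only the guideline stated above, not a MySQL-enforced limit.

```python
def max_packet_ceiling_mib(ram_mib: float, percent: float = 1.0) -> float:
    """Largest max_allowed_packet (MiB) that stays within `percent` of RAM."""
    return ram_mib * percent / 100


def ram_needed_gib(packet_mib: float, percent: float = 1.0) -> float:
    """RAM (GiB) at which `packet_mib` is exactly `percent` of RAM."""
    return packet_mib * 100 / percent / 1024


# A 1 GB server, as described in the question.
print(f"ceiling on 1 GB of RAM: {max_packet_ceiling_mib(1024):.2f} MiB")

# RAM implied by max_allowed_packet=300M under the same rule.
print(f"RAM needed for 300M: {ram_needed_gib(300):.1f} GiB")
```

On a 1 GB server the guideline works out to roughly 10M, which is in line with the 8M recommended further down.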

For further analysis of tunables, please provide (1) RAM size, (2) SHOW VARIABLES; and (3) SHOW GLOBAL STATUS;.

Re deleted_at: That link you gave starts with "The column deleted_at is not a good index candidate". You misinterpreted it. It is talking about a single-column INDEX(deleted_at). I am suggesting a composite index such as INDEX(contact_id, job_class_name, execute_at, deleted_at).

158 seconds for a simple query on a small table? It could be that there is a lot of other stuff going on. Get the PROCESSLIST.

Re Separate indexes versus composite: Think of two indexes: INDEX(last_name) and INDEX(first_name). You flip through the last_name index to find "James", then what can you do? Flipping through the other index for "Rick" won't help you find me.
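The phone-book analogy can be mimicked with sorted lists in Python. This is only a toy model (the sample names and helper functions are invented here, and it is not how InnoDB stores B-trees), but it shows why one composite ordering jumps straight to the row while two separate orderings leave a set of candidates to filter.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample rows: (last_name, first_name)
people = [
    ("James", "Rick"),
    ("James", "Amy"),
    ("Smith", "Rick"),
    ("Jones", "Rick"),
    ("James", "Zoe"),
]

# Composite "index": one ordering on (last_name, first_name).
composite = sorted(people)


def lookup_composite(last, first):
    """Binary-search directly to the matching (last, first) entries."""
    key = (last, first)
    return composite[bisect_left(composite, key):bisect_right(composite, key)]


# Two single-column "indexes": each maps a value to row positions.
by_last = sorted((last, i) for i, (last, _) in enumerate(people))
by_first = sorted((first, i) for i, (_, first) in enumerate(people))


def rows_for(index, value):
    """Row positions whose indexed column equals `value`."""
    lo = bisect_left(index, (value, -1))
    hi = bisect_left(index, (value, len(people)))
    return [i for _, i in index[lo:hi]]


def lookup_separate(last, first):
    """Find every matching last name, then check each candidate row in turn."""
    candidates = rows_for(by_last, last)
    return [people[i] for i in candidates if people[i][1] == first]


print(lookup_composite("James", "Rick"))
print(lookup_separate("James", "Rick"))
print(len(rows_for(by_last, "James")), "candidate rows had to be checked")
```

Both lookups return the same row, but the second one has to visit every "James" first; the by_first ordering does not help narrow that list.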

Analysis of VARIABLES and GLOBAL STATUS

Observations:

  • Version: 5.7.22-log
  • 1.00 GB of RAM
  • Uptime = 16d 10:30:19
  • Are you sure this was a SHOW GLOBAL STATUS?
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

innodb_buffer_pool_size -- I thought you had it at 218M, not 10M. 10M is much too small. On the other hand, you seem to have less than that much data.

Since the RAM is so small, I recommend dropping tmp_table_size and max_heap_table_size and max_allowed_packet to 8M. And lower table_open_cache, table_definition_cache, and innodb_open_files to 500.

What causes so many simultaneous connections?

Details and other observations:

( innodb_buffer_pool_size / _ram ) = 10M / 1024M = 0.98% -- % of RAM used for InnoDB buffer_pool

( innodb_buffer_pool_size ) = 10M -- InnoDB Data + Index cache

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 375 / 638 = 58.8% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Innodb_buffer_pool_bytes_data / innodb_buffer_pool_size ) = 4M / 10M = 40.0% -- Percent of buffer pool taken up by data -- A small percent may indicate that the buffer_pool is unnecessarily big.

( innodb_log_buffer_size / _ram ) = 16M / 1024M = 1.6% -- Percent of RAM used for buffering InnoDB log writes. -- Too large takes away from other uses for RAM.

( innodb_log_file_size * innodb_log_files_in_group / innodb_buffer_pool_size ) = 48M * 2 / 10M = 960.0% -- Ratio of log size to buffer_pool size. 50% is recommended, but see other calculations for whether it matters. -- The log does not need to be bigger than the buffer pool.

( innodb_flush_method ) = (empty) -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second the disk is capable of. 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_print_all_deadlocks ) = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( min( tmp_table_size, max_heap_table_size ) / _ram ) = min( 16M, 16M ) / 1024M = 1.6% -- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size and max_heap_table_size to, say, 1% of ram.

( net_buffer_length / max_allowed_packet ) = 16,384 / 16M = 0.10%

( local_infile ) = ON -- local_infile = ON is a potential security issue

( Select_scan / Com_select ) = 111,324 / 264,144 = 42.1% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Max_used_connections / max_connections ) = 152 / 151 = 100.7% -- Peak % of connections -- increase max_connections and/or decrease wait_timeout
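For anyone who wants to recheck the percentages in this list, here is a small Python sketch that recomputes a few of them. The helper name and dictionary keys are invented for the example; the input figures are copied from the lines above.

```python
def pct(numerator: float, denominator: float, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage, rounded to `digits` places."""
    return round(100.0 * numerator / denominator, digits)


# Figures copied from the analysis above (sizes in MB, counters as reported).
ratios = {
    "buffer_pool / RAM": pct(10, 1024, 2),
    "free pages / total pages": pct(375, 638),
    "log buffer / RAM": pct(16, 1024),
    "redo log / buffer_pool": pct(48 * 2, 10),
    "Select_scan / Com_select": pct(111_324, 264_144),
    "Max_used_connections / max_connections": pct(152, 151),
}

for name, value in ratios.items():
    print(f"{name}: {value}%")
```

The last ratio can exceed 100% because MySQL permits one connection beyond max_connections, reserved for an account with the SUPER privilege. A peak of 152 against a limit of 151 therefore suggests the limit was actually hit.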

You have the Query Cache half-off. You should set both query_cache_type = OFF and query_cache_size = 0. There is (according to a rumor) a 'bug' in the QC code that leaves some code on unless you turn off both of those settings.

Abnormally small:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 0.186
Created_tmp_files = 0.015 /HR
Handler_write = 0.21 /sec
Innodb_buffer_pool_bytes_data = 3 /sec
Innodb_buffer_pool_pages_data = 256
Innodb_buffer_pool_pages_total = 638
Key_reads+Key_writes + Innodb_pages_read+Innodb_pages_written+Innodb_dblwr_writes+Innodb_buffer_pool_pages_flushed = 0.25 /sec
Table_locks_immediate = 2.8 /HR
Table_open_cache_hits = 0.44 /sec
innodb_buffer_pool_chunk_size = 5MB

Abnormally large:

Com_create_db = 0.41 /HR
Com_drop_db = 0.41 /HR
Connection_errors_peer_address = 2
Performance_schema_file_instances_lost = 9
Ssl_default_timeout = 500

Abnormal strings:

ft_boolean_syntax = + -><()~*:&
have_ssl = YES
have_symlink = DISABLED
innodb_fast_shutdown = 1
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
session_track_system_variables = time_zone, autocommit, character_set_client, character_set_results, character_set_connection
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
answered Oct 08 '22 19:10 by Rick James


I encountered the same situation with a long-running PHP CLI script (it listens on a Redis list; each action is quick, but the script basically runs forever).

I create the PDO object and a prepared statement at the beginning, then reuse them afterwards.

The day after I started the script, I got the exact same errors:

PHP Warning:  Error while sending STMT_EXECUTE packet. PID=9438 in /...redacted.../myscript.php on line 39

SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

In my case, it's a development server, there is no load, and MySQL is on the same box, so it's unlikely to come from external factors. It's most likely because I used the same MySQL connection for too long and it timed out. PDO doesn't check for this or reconnect, so any subsequent query just returns "MySQL server has gone away".

Checking the value of "wait_timeout" in MySQL:

mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.06 sec)

mysql> show local variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

I see 28800 seconds = 8 hours, which is consistent with the timing of my errors.

In my case, restarting the MySQL server, or setting wait_timeout very low, while keeping the same PHP worker running, makes it very easy to reproduce the issue.

Overall:

  • PDO doesn't care if the connection times out, and will not automatically reconnect. If you put a try/catch around your PDO queries, the script will never crash and will keep using the stale PDO instance.
  • The STMT_EXECUTE warning is probably incidental: the script whose connection timed out was using prepared statements, and the first query after the timeout happened to be a prepared statement.

To get back to your case:

  • In theory, Laravel 5 is immune to this issue: https://blog.armen.im/en/laravel-4-and-stmt_prepare-error/. Do you use something other than Illuminate, or even bare PDO directly? Also, I'm not sure what Laravel does when it detects a lost connection (does it reconnect and rebuild prepared statements?); it might be worth digging further.
  • Check your MySQL wait_timeout value, and increase it if it's too low.
  • If it's not happening all the time, see if the errors correlate with server / DB load. High load can make things (especially big SQL queries) several times slower, to the point that some other MySQL timeout like max_execution_time gets reached.
  • Check whether you wrapped PDO queries in a try/catch block that retries the query; it might be preventing the connection error from bubbling up.
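To make the "detect the lost connection, reconnect, retry once" idea concrete, here is a minimal language-neutral sketch in Python. Every class and method name in it is hypothetical; it is not PDO's or Laravel's API, and FakeConnection merely simulates a connection that the server has dropped after wait_timeout.

```python
class ConnectionLost(Exception):
    """Stand-in for a 'MySQL server has gone away' (error 2006) failure."""


class FakeConnection:
    """Hypothetical connection that fails once it is marked as timed out."""

    def __init__(self):
        self.alive = True

    def execute(self, sql):
        if not self.alive:
            raise ConnectionLost("2006 MySQL server has gone away")
        return f"ok: {sql}"


class ReconnectingClient:
    """Wraps a connection factory and retries a query once after a lost connection."""

    def __init__(self, connect):
        self._connect = connect      # factory that opens a fresh connection
        self._conn = connect()
        self.reconnects = 0

    def execute(self, sql):
        try:
            return self._conn.execute(sql)
        except ConnectionLost:
            # Drop the stale handle, open a new connection, retry exactly once.
            # A second failure propagates, so it still reaches the error logs.
            self._conn = self._connect()
            self.reconnects += 1
            return self._conn.execute(sql)


client = ReconnectingClient(FakeConnection)
print(client.execute("SELECT 1"))

client._conn.alive = False           # simulate the server closing the idle connection
print(client.execute("SELECT 2"))
print("reconnects:", client.reconnects)
```

Only the lost-connection error is caught, and only one retry is made, so other failures are not silently swallowed. With real prepared statements, anything prepared on the old connection would also need to be prepared again on the new one, since server-side prepared statements belong to a single session.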
answered Oct 08 '22 19:10 by Mathieu Rey