Audit a MySQL or MariaDB database (English)

2018-08-26

  english    database 

Just as you take care to monitor and keep your applications up to date, it is also important to take care of the engines that serve your data because it is the most critical brick of your application: if your data is corrupted or cannot be updated in the time allowed, your application will be directly impacted.

It is therefore important to audit your database, even if you do not yet encounter any particular problem but in order to warn you to be victim of your success afterwards.

Audit MySQL MariaDB

In this article, I deliberately chose to rely on MySQL / MariaDB databases, having had experience auditing this database engine.

First of all, know your database

The most important thing to manage a database is to know the data and to be aware of the use of each table.

The following query will therefore help you to get a quick overview of a database:


SELECT table_name, engine, row_format, table_rows, ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "size (mb)"
FROM information_schema.tables
WHERE table_schema = "video"
ORDER BY (data_length + index_length) DESC;

This request will return something looking to the following:

table_name engine row_format table_rows size (mb)
content InnoDB Dynamic 304434 210.89
video InnoDB Dynamic 277810 143.67
media InnoDB Dynamic 502889 101.31

The data reported by these fields are as follows:

  • table_name: The table name,
  • engine: The engine used to store data in this table (by default InnoDB since MySQL version 5.5.5),
  • row_format: The formatting type of the table (the default value is defined by the variable innodb_default_row_format), its value can be DYNAMIC, COMPRESSED, REDUNDANT or COMPACT,
  • table_rows: The number of lines in the table,
  • size: The size (in megabytes) of the table,
Formatting files containing your data

As previously mentioned, InnoDB is now the default engine for MySQL and MariaDB.

InnoDB has evolved over time and now supports two file format types: Antelope and Barracuda.

Antelope supports data formats (row_format) REDUNDANT and COMPACT/p> In order, the REDUNDANT format was used by default in MySQL then it was replaced by the COMPACT type which allowed to better manage the representation of null values and the length of columns.


Barracuda is the latest data format version that supports data formatting (row_format) COMPRESSED and DYNAMIC.

Both formats can store very large fields outside the rest of the data and all indexes and large fields are compressed, saving memory and disk space.

The DYNAMIC format is an improved version that also allows you to add less data to the buffer used when executing queries, thus saving you I/O operations and CPU and memory performance.


To summarize, you will prefer today to use the Barracuda engine (used by default), more recent and more optimized. In addition, it allows you to better manage large BLOB or TEXT fields and allows you to manage larger columns (varchar(255) with utf8mb4).

System configuration

The system configuration of your server depends a lot on the use of your database, both in terms of storage and querying.


InnoDB allocates by default 128MB of RAM for its buffer and this value can be modified via the parameter innodb_buffer_pool_size.

It is generally recommended to allocate about 60% of the available physical memory to this parameter.


SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

Variable_name Value
innodb_buffer_pool_size 134217728

This value is therefore the default value (134217728 / 1024 / 1024 = 128).

MySQL configuration

MySQL exposes about 500 configuration variables, we will only skip to the most important and those that can quickly help you.

These are the values that you will have to refine according to the elements that we will analyze in the following article.


  • innodb_flush_log_at_trx_commit: Allows the transaction log to be written to each transaction and thus avoids any loss of data in case of system or machine crash (value 1), however, the value 2 is recommended because it allows to store on the disk once per second and avoids a large influx of disk I/O,
  • innodb_io_capacity: InnoDB defaults to 200 I/O per second for its flush, buffer and data merge operations. In case of high activity, do not hesitate to increase this limit to 400 I/O per second, or more if necessary.
  • innodb_lock_wait_timeout: This is the time InnoDB takes to acquire a lock before releasing it. Set to 50 seconds by default, it can be interesting to lower or increase this limit according to your use in order to avoid finding yourself with locked data while your application tries to modify them again.
  • innodb_log_buffer_size: This value (set to 16MB by default) allows you to define the buffer size used for log data : an extended log buffer will allow you to save I/O disk before the transaction is committed.
  • innodb_log_file_size: This parameter is very important, in case of high load, this parameter allows to transform random disk operations into sequential ones, improving the global use of the database. By default, 48MB are allocated, feel free to increase this value to 64 or 128MB depending on the use of your database.
  • innodb_thread_concurrency: This is the number of threads allocated by InnoDB to serve competing requests. It is recommended to allocate between 2 and 8 times the number of CPUs available on the machine (with a maximum value of 256).
  • key_buffer_size: In addition to the InnoDB buffer, MySQL uses this buffer to store temporary tables that can be used for some of your queries. It is recommended to keep a value of 64 to 128MB.
  • max_connections: This parameter limits the number of connections to your MySQL database, depending on its use, this parameter can be increased.
  • max_heap_table_size: This parameter allows to define the maximum size of implicit temporary tables in memory.
  • open_files_limit: This is the maximum number of files that can be opened by MySQL, it is recommended to increase this limit if your database accepts many connections.
  • query_cache_limit: This parameter allows you to define the maximum size of results returned by your query that will be cached. Reducing this value prevents a large query result from being hidden and invalidates the result of smaller and more frequent queries (requiring more cache).
  • query_cache_size: Limits the memory allocated to the results cache of your queries. A size of less than 512 MB is generally recommended (depending on your available memory).
  • transaction-isolation: READ-COMMITTED is recommended for this parameter, allowing each consistent reading, even within the same transaction, to establish and read its own fresh snapshot (reduces the number of deadlocks).
  • slow_query_log: It is recommended to leave the log of slow requests active, by increasing the value of the long_query_time parameter allowing to define the threshold of a slow request.

Feel free to browse through the different parameters to see what will be the best values for you.

SQL traffic analysis

In order to be able to define these values, it is also necessary to analyze the traffic of your database and thus the requests made on your database.

To do this, you can decide to enable general query log for only a few seconds. That's what we're gonna do here for 60 seconds:


SET GLOBAL general_log = 1; SELECT SLEEP(60); SET GLOBAL general_log = 0;

You can also enable slow query logs by adapting the long_query_time parameter value.


SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.1;
SET GLOBAL log_queries_not_using_indexes = 1;

Once your logs are generated, you can use the tool percona-tookit to perform further analysis and statistics on the different logs:


$ apt-get install percona-toolkit

You can then generate these statistics from the tool pt-query-digest


$ cd /var/log/mysql
$ pt-query-digest --type=genlog query.log > mysql.log.digest
$ pt-query-digest mysql-slow.log > mysql-slow.log.digest

You will thus have in these files a header presenting you statistics on the whole of the recorded requests and thus to see the patterns of requests most carried out or still taking the most time:


# 270ms user time, 10ms system time, 26.00k rss, 14.00k vsz
# Current date: Sun Aug 26 14:17:47 2018
# Hostname: b85e368d128b
# Files: query.log
# Overall: 242 total, 4 unique, 0 QPS, 0x concurrency ___________________
# Time range: 2018-08-26T14:15:03.611597Z to 2018-08-26T14:16:03.612118Z
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time              0       0       0       0       0       0       0
# Query size       137.17k       8   1.89k  580.40   1.04k  472.00  299.03

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x87AF0AF11E8DFF98  0.0000  0.0%    40 0.0000  0.00 SELECT product
#    2 0xB5403927856EA576  0.0000  0.0%    40 0.0000  0.00 SELECT media media_image media_video
#    3 0x615E6B9AC6FB3863  0.0000  0.0%    27 0.0000  0.00 SELECT pattern
#    4 0x24A0450BEC6D831A  0.0000  0.0%    23 0.0000  0.00 SELECT category taxonomy

Then, for each query, you can search the file based on the Query ID to get more details.


# Query 1: 0 QPS, 0x concurrency, ID 0x87AF0AF11E8DFF98 at byte 150147 ___
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2018-08-26T14:15:52.308078Z to 2018-08-26T14:15:52.583549Z
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         16      40
# Exec time      0       0       0       0       0       0       0       0
# Query size     5   7.46k     190     191  190.90  183.58       0  183.58
# String:
# Databases    video
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `video` LIKE 'product'\G
#    SHOW CREATE TABLE `video`.`product`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT t0.pro_id AS pro_id_1, t0.pro_object_type AS pro_object_type_2, t0.pro_object_id AS pro_object_id_3 FROM product t0 WHERE t0.pro_object_type = 'media' AND t0.pro_object_id = 79 LIMIT 1\G

SQL query profiling

To analyze your SQL queries in more detail, you can also enable profiling on your session only and then play your queries:


SET SESSION profiling = 1;
SELECT * FROM mytable WHERE myfield LIKE '%toto';

Then, to get the result of your profiles:


SHOW PROFILES;

Query_ID Duration Query
1 0.00026850 "SELECT * FROM mytable WHERE myfield LIKE '%toto'"

SHOW PROFILE CPU FOR QUERY 1;

Note that the "CPU" is not mandatory, but the information can be interesting to get back. This will give you the following return:

Status Duration CPU_user CPU_system
starting 0.000096 0.000000 0.000000
checking permissions 0.000011 0.000000 0.000000
Opening tables 0.000022 0.000000 0.000000
init 0.000037 0.000000 0.000000
System lock 0.000011 0.000000 0.000000
optimizing 0.000011 0.000000 0.000000
statistics 0.000070 0.000000 0.000000
preparing 0.000011 0.000000 0.000000
executing 0.000003 0.000000 0.000000
Sending data 0.000024 0.000000 0.000000
end 0.000004 0.000000 0.000000
query end 0.000008 0.000000 0.000000
closing tables 0.000009 0.000000 0.000000
freeing items 0.000094 0.000000 0.000000
cleaning up 0.000022 0.000000 0.000000

For more information on the elements returned by the profiler, go to the following URL: https://dev.mysql.com/doc/refman/8.0/en/profiling-table.html

MySQL counter analysis

MySQL also comes with a bunch of counters, displayed via the query:


SHOW GLOBAL STATUS;

Again, many items are returned but a small sample of counters will already help you analyze your database activity.


You can for example retrieve information on connections:

  • Aborted_clients: This is the number of connections that were cancelled because the client failed without closing the connection,
  • Aborted_connects: This is the number of failed connections : a significant number can be indicated in the case where a TCP check would be performed : indeed, a TCP check is not sufficient to determine if a database is functional,

InnoDB counters have also been reset:

  • Innodb_buffer_pool_read_requests : This is the number of readings taken.
  • Innodb_buffer_pool_reads : This is the number of readings taken from disk that cannot be satisfied by the pool buffer.

If the ratio falls below 99.95%, consider increasing the size of the InnoDB pool buffer so as not to degrade the performance of your database.

  • Created_tmp_disk_tables : This is the number of temporary tables created using the disk.
  • Created_tmp_files : This is the number of files created to satisfy temporary tables.
  • Created_tmp_tables : This is the number of temporary tables created since the server was started.

The ratio of temporary tables created on the disk should not exceed 10-20%, ideally, try to reduce the ratio by improving your queries.

The following counters will also give you additional information about using the MySQL query cache:

  • Qcache_free_blocks
  • Qcache_free_memory
  • Qcache_hits
  • Qcache_inserts
  • Qcache_lowmem_prunes
  • Qcache_not_cached
  • Qcache_queries_in_cache
  • Qcache_total_blocks

You can measure the ratio between Inserts and Hits to see how the cache is being used.

You can then hunt down requests that are not hidden.

Very important: remember that queries containing a TEXT or BLOB field or a result greater than the "query_cache_limit" parameter will never be hidden. Prefer a VARCHAR(255) to a TEXT field.


Finally, you will also find information about SELECT queries made on your database:

  • Select_full_join
  • Select_full_range_join
  • Select_range
  • Select_range_check
  • Select_scan

Here, a large number of "Select_scan" means that many "full table scans" are performed on your database, so it is advisable to analyze the queries with the tools previously cited in this article to observe what is wrong.

You will certainly need to review your indexes.

Optimize your SQL requests

Use the keyword EXPLAIN (or EXPLAIN EXTENDED) before your queries that allow you to see the MySQL run plan and make sure the indexes are used correctly.


Also, avoid quotes in your queries when you filter on numeric fields so that MySQL doesn't need to do conversions on columns and therefore browse the entire table.


It is also important to monitor your primary keys and auto_increment values. Here is a query that will allow you to see the usage rate of your primary keys as well as the type of them:


SELECT TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,COLUMN_TYPE, DATA_TYPE, auto_increment, max_value, ROUND(auto_increment / max_value * 100,2) as 'pct_used'
FROM (
    SELECT t.TABLE_SCHEMA,t.TABLE_NAME,c.COLUMN_NAME,c.DATA_TYPE,c.COLUMN_TYPE, t.auto_increment, IF(Locate('unsigned', column_type) = 0,
    CASE data_type
        WHEN 'tinyint' THEN '127'
        WHEN 'smallint' THEN '32767'
        WHEN 'mediumint' THEN '8388607'
        WHEN 'int' THEN '2147483647'
        WHEN 'bigint' THEN '9223372036854775807'
    END ,
    CASE data_type
        WHEN 'tinyint' THEN '255'
        WHEN 'smallint' THEN '65535'
        WHEN 'mediumint' THEN '16777215'
        WHEN 'int' THEN '4294967295'
        WHEN 'bigint' THEN '18446744073709551615'
    END) AS max_value
    FROM INFORMATION_SCHEMA.TABLES t
    JOIN INFORMATION_SCHEMA.COLUMNS c ON (t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME)
    WHERE c.COLUMN_KEY = 'PRI' AND EXTRA='auto_increment'
) bigq
ORDER BY pct_used DESC;

You will get the following return:

TABLE_SCHEMA TABLE_NAME COLUMN_NAME COLUMN_TYPE DATA_TYPE auto_increment max_value pct_used
video content con_id int(11) int 659268 2147483647 0.03
video media med_id int(11) int 1057114 2147483647 0.05
video taxonomy tax_id int(11) int 7028 2147483647 0.00

In case you do not have a primary key, InnoDB will have to create a hidden primary key that will not be optimized at all.

Hunt also for TEXT / BLOB columns in order to benefit from the MySQL query cache, here is a query that will allow you to move up all BLOB or TEXT columns:


SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM information_schema.columns
WHERE (COLUMN_TYPE LIKE '%TEXT%' OR COLUMN_TYPE LIKE '%BLOB%')
AND TABLE_SCHEMA NOT IN ('performance_schema', 'information_schema', 'mysql');

If you have a doubt on a field, you can search the maximum length present in your data for this clone by playing the following query:


SELECT MAX(LENGTH(myfield)) FROM mytable;

I hope that this information will have helped you to optimize the performance of your database, both in terms of query and physical resources because the two are closely related.

Feel free to contact me if I can help you move forward by providing you with more information.

Comments