Optimizing MySQL queries. Combining DDL queries


MySQL is still the world's most popular relational database, but it is also the least optimized: many installations keep the default settings without anyone digging deeper. In this article we'll look at some MySQL optimization tips, combined with some features that have appeared relatively recently.

Configuration optimization

The first thing every MySQL user should do to improve performance is tweak the configuration, yet most people skip this step. In 5.7 (the current version at the time of writing), the defaults are much better than those of its predecessors, but it is still possible, and easy, to improve on them.

We assume you are using Linux or something like a Vagrant box (such as our Homestead Improved), in which case your configuration file will live at /etc/mysql/my.cnf. It is possible that your installation actually loads an additional configuration file into this one, so if my.cnf doesn't contain much, check /etc/mysql/mysql.conf.d/mysqld.cnf.

Manual tuning

The following settings should be tuned first, out of the box. Following these tips, add them to the config file under the [mysqld] section:

innodb_buffer_pool_size = 1G # (adjust here, roughly 50%-70% of total RAM)
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1 # can be changed to 2 or 0
innodb_flush_method = O_DIRECT

  • innodb_buffer_pool_size. The buffer pool is a kind of "warehouse" for caching data and indexes in memory; it keeps frequently accessed data close at hand. When you use a dedicated or virtual server where the database is often the bottleneck, it makes sense to give it the majority of the RAM, hence the 50-70% of total RAM. There is a guide to sizing this pool in the MySQL documentation.
  • innodb_log_file_size. The log file size is well described elsewhere; in a nutshell, it is the amount of data stored in the log before it is cleared. Note that the log in this case is not an error log, but a kind of delta snapshot of changes that have not yet been flushed to disk in the main InnoDB files. MySQL writes them in the background, but this still affects performance at write time. A larger log file means higher performance, because fewer and smaller checkpoints are created, but also a longer recovery time after a crash (more data must be replayed into the database).
  • innodb_flush_log_at_trx_commit controls what happens to the log file on commit. A value of 1 is the safest, because the log is flushed to disk after each transaction. With values 0 and 2, ACID is guaranteed less strictly, but performance is higher. For most applications the difference is not large enough to outweigh the stability benefits of 1.
  • innodb_flush_method. To top off the flushing topic, set this to O_DIRECT to avoid double buffering. I advise always doing this unless your I/O system is very low-performance. On most hosts, such as DigitalOcean, you will have SSD drives anyway, so the I/O system will be quite capable.

There is a tool from Percona that can find the remaining problems automatically. Note that if we ran it without this manual tuning, only 1 of the 4 settings above would be flagged, since the other 3 depend on the user's preferences and the application's environment.

Variable Inspector

Installing variable inspector on Ubuntu:

wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install percona-toolkit

For other systems, follow these instructions.

Then run the toolkit:

pt-variable-advisor h=localhost,u=homestead,p=secret

You will see this result:

# WARN delay_key_write: MyISAM index blocks are never flushed until necessary.
# NOTE max_binlog_size: The max_binlog_size is smaller than the default of 1GB.
# NOTE sort_buffer_size-1: The sort_buffer_size variable should generally be left at its default unless an expert determines it is necessary to change it.
# NOTE innodb_data_file_path: Auto-extending InnoDB files can consume a lot of disk space that is very difficult to reclaim later.
# WARN log_bin: Binary logging is disabled, so point-in-time recovery and replication are not possible.

Translator's note: on my local machine, in addition to the above, I also received the following warning:

# NOTE innodb_flush_method: Most production database servers that use InnoDB should set innodb_flush_method to O_DIRECT to avoid double-buffering, unless the I/O system is very low performance.

Why the innodb_flush_method parameter should be set to O_DIRECT was discussed above, and if you followed the tuning sequence in this article, you will not see this warning.

None of these warnings (translator's note: the ones listed by the author) is critical; they do not have to be fixed. The only thing worth addressing is setting up a binary log for replication and snapshots.

Note: in new versions the default binlog size is 1G and this warning will not occur.

max_binlog_size = 1G
log_bin = /var/log/mysql/mysql-bin.log
server-id = 1
binlog-format = "ROW"

  • max_binlog_size determines how large binary logs can get. The binlogs record your transactions and queries and make checkpoints. If a transaction exceeds the maximum, a log may exceed this size when saved to disk; otherwise MySQL keeps each log within the limit.
  • log_bin enables binary logging altogether. Without it, neither snapshots nor replication are possible. Note that this can have a big impact on disk space. server-id is a required option when binary logging is enabled, so the logs "know" which server they came from (for replication); note that it must be a number. binlog-format is simply the way the logs are written.

As you can see, new MySQL versions ship with defaults that are almost production ready. Of course, each application is different and will need additional tricks and tweaks of its own.

MySQL Tuner

Supporting tools: Percona Toolkit for identifying duplicate indexes

The Percona Toolkit we previously installed also has a tool for detecting duplicate indexes, which can be useful when using third-party CMSs or just for checking whether you accidentally added more indexes than necessary. For example, a default WordPress installation has duplicate indexes in the wp_posts table:

pt-duplicate-key-checker h=localhost,u=homestead,p=secret

# ########################################################################
# homestead.wp_posts
# ########################################################################

# Key type_status_date ends with a prefix of the clustered index
# Key definitions:
#   KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
#   PRIMARY KEY (`ID`),
# Column types:
#   `post_type` varchar(20) collate utf8mb4_unicode_520_ci not null default 'post'
#   `post_status` varchar(20) collate utf8mb4_unicode_520_ci not null default 'publish'
#   `post_date` datetime not null default '0000-00-00 00:00:00'
#   `id` bigint(20) unsigned not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `homestead`.`wp_posts` DROP INDEX `type_status_date`, ADD INDEX `type_status_date` (`post_type`,`post_status`,`post_date`);

As can be seen from the last line, this tool also gives you the exact statement for getting rid of the duplicate index.

Helper tools: Percona Toolkit for unused indexes

Percona Toolkit can also detect unused indexes. If you are logging slow queries (see bottlenecks section below), you can run the utility and it will check if and how those queries use indexes on tables.

pt-index-usage /var/log/mysql/mysql-slow.log

For detailed information on using this utility, see the tool's documentation.

Bottlenecks

This section describes how to detect and monitor database bottlenecks.

First, let's enable logging of slow queries:

slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 1
log-queries-not-using-indexes = 1

The lines above must be added to the MySQL configuration. The database will track queries that took more than 1 second to complete, as well as queries that do not use indexes.

Once there is some data in this log, you can analyze it for index usage using the pt-index-usage utility above, or using pt-query-digest, which will produce results something like this:

pt-query-digest /var/log/mysql/mysql-slow.log

# 360ms user time, 20ms system time, 24.66M rss, 92.02M vsz
# Current date: Thu Feb 13 22:39:29 2014
# Hostname: *
# Files: mysql-slow.log
# Overall: 8 total, 6 unique, 1.14 QPS, 0.00x concurrency ________________
# Time range: 2014-02-13 22:23:52 to 22:23:59
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time            3ms   267us   406us   343us   403us    39us   348us
# Lock time          827us    88us   125us   103us   119us    12us    98us
# Rows sent             36       1      15    4.50   14.52    4.18    3.89
# Rows examine          87       4      30   10.88   28.75    7.37    7.70
# Query size         2.15k     153     296  245.11  284.79   48.90  258.32

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x728E539F7617C14D  0.0011 41.0%     3 0.0004  0.00 SELECT blog_article
#    2 0x1290EEE0B201F3FF  0.0003 12.8%     1 0.0003  0.00 SELECT portfolio_item
#    3 0x31DE4535BDBFA465  0.0003 12.6%     1 0.0003  0.00 SELECT portfolio_item
#    4 0xF14E15D0F47A5742  0.0003 12.1%     1 0.0003  0.00 SELECT portfolio_category
#    5 0x8F848005A09C9588  0.0003 11.8%     1 0.0003  0.00 SELECT blog_category
#    6 0x55F49C753CA2ED64  0.0003  9.7%     1 0.0003  0.00 SELECT blog_article

# Query 1: 0 QPS, 0x concurrency, ID 0x728E539F7617C14D at byte 736 ______
# Scores: V/M = 0.00
# Time range: all events occurred at 2014-02-13 22:23:52
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         37       3
# Exec time     40     1ms   352us   406us   375us   403us    22us   366us
# Lock time     42   351us   103us   125us   117us   119us     9us   119us
# Rows sent     25       9       1       4       3    3.89    1.37    3.89
# Rows examine  24      21       5       8       7    7.70    1.29    7.70
# Query size    47   1.02k     261     262  261.25  258.32       0  258.32
# String:
# Hosts        localhost
# Users        *
# Query_time distribution
#   1us
#  10us
# 100us  ################################################################
#   1ms
#  10ms
# 100ms
#   1s
#  10s+
# Tables
#    SHOW TABLE STATUS LIKE 'blog_article'\G
#    SHOW CREATE TABLE `blog_article`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT b0_.id AS id0, b0_.slug AS slug1, b0_.title AS title2, b0_.excerpt AS excerpt3, b0_.external_link AS external_link4, b0_.description AS description5, b0_.created AS created6, b0_.updated AS updated7 FROM blog_article b0_ ORDER BY b0_.created DESC LIMIT 10

If you prefer to analyze these logs manually, you can do the same, but first you will need to export the log to a more parsable format. This can be done like this:

mysqldumpslow /var/log/mysql/mysql-slow.log

With additional options, you can filter the data to export only what you need. For example, the top 10 queries, sorted by average execution time:

mysqldumpslow -t 10 -s at /var/log/mysql/localhost-slow.log

Conclusion

In this comprehensive MySQL optimization post, we covered various methods and techniques with which we can make our MySQL fly.

We covered configuration optimization, tidied up the indexes, and got rid of some bottlenecks. This was mostly theory, but all of it applies to real-world applications.

How to optimize MySQL queries?


For an ordinary, not particularly busy site, there is little difference whether the MySQL queries are optimized or not. But on production servers under heavy load, the difference between correct and incorrect SQL is huge, and at runtime queries can significantly affect the behavior and reliability of services. In this article, I'll look at how to write fast queries and at the factors that make queries slow.

Why MySQL?

Today there is a lot of talk about Big Data and other new technologies. NoSQL and cloud solutions are great, but a lot of popular software (such as WordPress, phpBB, and Drupal) still runs on MySQL. Migrating to the newest solutions can turn into more than just changing the configuration on the servers. Besides, MySQL's efficiency is still up to par, especially the Percona version.

Don't make the common mistake of throwing more and more hardware at slow queries and high server load; it is better to get to the root of the problem. Adding processor power, faster disks, and more RAM is also a certain kind of optimization, but it is not what we will talk about in this article. Moreover, if you paper over the problem with hardware while the site keeps growing, the load will only keep climbing, so this is a short-term solution at best.

A good understanding of SQL is the most important tool for a web developer; it allows you to effectively optimize and use relational databases. In this article we will focus on MySQL, a popular open-source database often used in conjunction with PHP.

Who is this article for?

For web developers, database architects and developers, and system administrators familiar with MySQL. If you haven't used MySQL before, this article may not be of much use to you, but I will still try to be as informative and helpful as possible, even for newcomers to MySQL.

Backup first

I recommend performing the following steps on the MySQL database you are working with, but be sure to make a backup first. If you don't have a database to work with, I will provide examples for creating your own where appropriate.

Making MySQL backups is simple, with the mysqldump utility:

$ mysqldump myTab > myTab-backup.sql

You can learn more about mysqldump in its documentation.

What makes a query slow?

Here is a general list of factors affecting the speed of requests and server load:

  • table indexes;
  • the WHERE condition (and the use of built-in MySQL functions in it, such as IF or DATE);
  • sort by ORDER BY;
  • frequent repetition of identical requests;
  • type of data storage mechanism (InnoDB, MyISAM, Memory, Blackhole);
  • not using the Percona version;
  • server configurations (my.cnf / my.ini);
  • large data outputs (more than 1000 rows);
  • unstable connection;
  • distributed or cluster configuration;
  • poor table design.
We will address all of these issues next. Also, install Percona if you are not already using it: it is a drop-in replacement for standard MySQL and can give a serious boost to your database.

What are indexes?

Indexes are used in MySQL to search for rows with specified column values, such as with the WHERE clause. Without indexes, MySQL must, starting with the first row, read the entire table looking for relevant values. The larger the table, the higher the costs.

If a table has indexes on the columns that will be used in the query, MySQL will quickly find the locations of the information it needs without scanning the entire table. This is much faster than searching each line sequentially.
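
For instance, here is a minimal sketch with illustrative table and column names, showing the same WHERE lookup before and after adding an index:

-- without an index on `email`, MySQL reads every row of the table
SELECT user_id FROM users WHERE email = 'user@example.com';

-- after adding an index, MySQL jumps straight to the matching rows
ALTER TABLE users ADD INDEX idx_email (email);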

Unstable connection?

When your application connects to the database over a persistent connection, that connection is reused every time, without the overhead of opening a new one on each request. This is the optimal solution for a production environment.

Reducing the frequent repetition of identical requests

The fastest and most efficient way I've found is to create a store of queries and their results using Memcached or Redis. With Memcache you can easily cache your query result, for example like this:

<?php
$memc = new Memcache;
$memc->connect('localhost', 11211);

$cacheResult = $memc->get('key-name');
if ($cacheResult) {
    // no query needed, take the result from the cache
    $result = $cacheResult;
} else {
    // run the query; the "p:" prefix opens a persistent connection
    $mysqli = new mysqli('p:localhost', 'username', 'password', 'table');
    $sql = "SELECT * FROM posts LEFT JOIN userInfo USING (UID)
            WHERE posts.post_type = 'post' || posts.post_type = 'article'
            ORDER BY column LIMIT 50";
    $result = $mysqli->query($sql);
    // cache the result for 86,400 seconds (one day)
    $memc->set('key-name', $result->fetch_array(), MEMCACHE_COMPRESSED, 86400);
}

// pass the result to the template
$template->assign('posts', $cacheResult);
?>

Now the heavy query with a LEFT JOIN will be executed only once every 86,400 seconds (that is, once a day), which significantly reduces the load on the MySQL server, leaving resources for other connections.

Note: Add p: at the beginning of the MySQLi host argument to create a persistent connection.

Distributed or clustered configuration

When data keeps growing and the speed of your service goes downhill, panic can set in. A quick solution may be sharding (distributing the data). However, I don't recommend it unless you are very experienced, since sharding inherently makes data structures more complex.

Weak Table Design

Designing database schemas is not difficult if you follow a few golden rules about constraints and know what will work. For example, storing images in BLOB cells is a dubious idea; storing the file path in a VARCHAR column is a much better solution.

Ensuring a design that fits the intended use is paramount when creating your application. Store distinct data in distinct tables (for example, categories and articles) and make sure that many-to-one and one-to-many relationships can be easily linked by IDs. Using FOREIGN KEY in MySQL is ideal for keeping such cascading data consistent across tables.
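
As a minimal sketch (table and column names are illustrative), a one-to-many relationship with a cascading foreign key might look like this:

CREATE TABLE categories (
    category_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE articles (
    article_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    title       VARCHAR(255) NOT NULL,
    -- deleting a category automatically removes its articles
    FOREIGN KEY (category_id) REFERENCES categories (category_id)
        ON DELETE CASCADE
) ENGINE=InnoDB;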

When creating a table, remember the following:

  • Create effective tables to solve your problems, rather than filling tables with unnecessary data and relationships.
  • Don't expect MySQL to execute your business logic or programming: the data should be prepared by your scripting language before the row is inserted. For example, if you need to shuffle a list into random order, do it in a PHP array rather than with ORDER BY from the MySQL arsenal.
  • Use UNIQUE indexes for unique data sets and use ON DUPLICATE KEY UPDATE to keep a date field current, for example to record when a row was last changed (a sketch follows this list).
  • Use the INT data type to store integers. If you don't specify a data type size, MySQL will do it for you.
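
A hedged sketch of the ON DUPLICATE KEY UPDATE idea mentioned above (the stats table and its columns are made up for illustration):

-- assumes a PRIMARY or UNIQUE key on `id`
INSERT INTO stats (id, hits, changed_at)
VALUES (42, 1, NOW())
ON DUPLICATE KEY UPDATE hits = hits + 1, changed_at = NOW();
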
Optimization Basics

To optimize effectively, we must apply three approaches to your application:

  1. Analysis (logging slow queries, studying the system, query analysis and database design)
  2. Execution requirements (how many users)
  3. Technology limitations (hardware speed, incorrect use of MySQL)
The analysis can be done in several ways. First we'll look at the most obvious way to peek under the hood of MySQL while queries are running. The very first optimization tool in your arsenal is EXPLAIN. If you add this statement before your SELECT query, the result will look something like this:
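
The original illustration is not reproduced here, but for a hypothetical posts table the output might look like this (your columns and values will differ):

mysql> EXPLAIN SELECT * FROM posts WHERE post_type = 'article';
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | posts | ALL  | NULL          | NULL | NULL    | NULL | 2000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+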

The columns, as you can see, hold important information about the query. The ones to pay the most attention to are possible_keys and Extra.

The possible_keys column shows the indexes MySQL could choose from to execute the query. Sometimes you need to add indexes to make a query run faster. The Extra column indicates whether an additional WHERE or ORDER BY was applied. The most important thing to watch for is whether Using filesort appears in the output.

What Using filesort does is stated in the MySQL documentation:

MySQL has to do an extra pass to figure out how to return the rows in sorted form. This sort takes place over all rows according to the join type and stores the sort key and row pointer for all rows that match the WHERE clause. The keys are sorted and the rows are returned in the correct order.
An extra pass will slow down your application and should be avoided at all costs. Another critical Extra value we should avoid is Using temporary. It means MySQL had to create a temporary table to execute the query, which is obviously a poor use of MySQL. In such cases, the query result should be stored in Redis or Memcache rather than re-executed for every user.

To avoid the problem with Using filesort, we must make sure that MySQL uses an INDEX. Several keys may be listed in possible_keys to choose from, but MySQL can select only one index for the final query. Note that indexes can be composed of several columns, and you can also give hints to the MySQL optimizer, pointing it to the indexes you have created.

Index hinting

The MySQL optimizer uses statistics based on table queries to select the best index for executing a query. It operates on fairly simple built-in statistical logic, so with several options available it does not always make the right choice without help. To ensure that the correct (or incorrect) key is used, use the FORCE INDEX, USE INDEX, and IGNORE INDEX keywords in your query. You can read more about index hinting in the MySQL documentation.

To display the table keys, use the SHOW INDEX command. You can specify several hints for use by the optimizer.
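
A minimal sketch, with hypothetical table and index names:

-- list the keys available on the table
SHOW INDEX FROM posts;

-- nudge the optimizer: prefer one index, ignore another
SELECT post_id, title
FROM posts USE INDEX (idx_type) IGNORE INDEX (idx_created)
WHERE post_type = 'article';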

In addition to EXPLAIN, there is the DESCRIBE keyword. With DESCRIBE, you can view table information as follows:
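
The original screenshot is not reproduced here; for an illustrative books table the output might look like this:

mysql> DESCRIBE books;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| bookname | varchar(255)     | NO   |     | NULL    |                |
+----------+------------------+------+-----+---------+----------------+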

Adding an index

To add indexes in MySQL, you must use the CREATE INDEX syntax. There are several types of indexes. FULLTEXT is used for full-text search, and UNIQUE is used to store unique data.

To add an index to your table, use the following syntax:

mysql> CREATE INDEX idx_bookname ON `books` (bookname(10));

This will create an index on the books table that uses the first 10 letters of the bookname column, a varchar that stores book titles. In this case, any search with a WHERE clause on a book title matched up to 10 characters will produce the same results as scanning the entire table from start to finish, only much faster.

Composite indices

Indexes have a big impact on query execution speed. Just assigning a primary unique key is not enough: composite keys are the real use case in MySQL tuning, and they sometimes require A/B checks with EXPLAIN.

For example, if we need to reference two columns in the condition of a WHERE clause, a composite key would be an ideal solution.

mysql> CREATE INDEX idx_composite ON users (username, active);

Here we have created a key on the username column, which stores the user's name, and the active column, an ENUM that determines whether the account is active. Now everything is optimized for a query that uses WHERE to find a valid username with an active account (active = 1).

How fast is your MySQL?

Let's turn on profiling to take a closer look at MySQL queries. This can be done by running the command set profiling=1, after which you need to run show profiles to view the result.

If you are using PDO, run the following code:

$db->query("set profiling=1");
$db->query("select headline, body, tags from posts");
$rs = $db->query("show profiles");
$db->query("set profiling=0"); // disable profiling after query execution

$records = $rs->fetchAll(PDO::FETCH_ASSOC); // get the profiling results
$errmsg = $rs->errorInfo(); // catch errors here

The same can be done using mysqli:

$db = new mysqli($host, $username, $password, $dbname);
$db->query("set profiling=1");
$db->query("select headline, body, tags from posts");
if ($result = $db->query("SHOW profiles", MYSQLI_USE_RESULT)) {
    while ($row = $result->fetch_row()) {
        var_dump($row);
    }
    $result->close();
}
if ($result = $db->query("show profile for query 1", MYSQLI_USE_RESULT)) {
    while ($row = $result->fetch_row()) {
        var_dump($row);
    }
    $result->close();
}
$db->query("set profiling=0");

This will return profiling data containing the query execution time in the second element of the array.

array(3) {
  [0] => string(1) "1"
  [1] => string(10) "0.00024300"
  [2] => string(38) "select headline, body, tags from posts"
}

This query took 0.00024300 seconds to complete. That is fast enough not to worry about. But when the numbers grow, we have to look deeper. Go to your application to practice on a working example: check the DEBUG constant in your database configuration, and then start exploring the system by dumping the profiling output with var_dump or print_r. This way you can move from page to page in your application and get convenient profiling of the whole system.

Full audit of your website database

To make a full audit of your queries, enable logging. Some developers worry that logging significantly slows down execution and further delays queries, but practice shows the difference is negligible.

To enable logging as of MySQL 5.1.6, use the log_slow_queries global variable; you can also set the log file path with the slow_query_log_file variable. This can be done by running the following queries:

SET GLOBAL log_slow_queries = 1;
SET GLOBAL slow_query_log_file = '/dev/slow_query.log';

You can also specify this in the /etc/my.cnf or my.ini configuration file of your server.

After making changes, do not forget to restart the MySQL server with the necessary command, for example service mysql restart if you are using Linux.

In MySQL versions after 5.6.1, the log_slow_queries variable is deprecated and slow_query_log is used instead. For more convenient debugging, you can also enable table output by setting the log_output variable to TABLE; this feature, too, is only available since MySQL 5.6.1.

SET GLOBAL log_output = 'TABLE';
SET GLOBAL log_queries_not_using_indexes = 1;
SET GLOBAL long_query_time = 1;

The long_query_time variable specifies the number of seconds after which a query is considered slow. The default value is 10 and the minimum is 0; you can also specify milliseconds using a fraction. Here I set one second, so from now on every query that takes more than 1 second to execute is logged to the table.

Logging will be done in the mysql.slow_log and mysql.general_log tables of your MySQL database. To turn off logging, change log_output to NONE.

Logging on a production server

On a production server that serves clients, it is better to enable logging only for a short period, and to keep an eye on the load so the logging itself does not add overhead. If your service is overloaded and needs immediate attention, try isolating the problem by running SHOW PROCESSLIST, or by querying the information_schema.PROCESSLIST table with SELECT * FROM information_schema.PROCESSLIST;.

Logging all queries on a production server can give you a lot of information and is a good research tool when reviewing a project, but logs spanning long periods will not tell you much more than logs covering up to 48 hours (try to capture peak loads so you have a better chance of investigating query execution).

Note: If you have a site that experiences waves of traffic and little to no traffic at times, such as a sports site during the off season, then use this information to build and study logging.

Logging multiple requests

It is important to be aware not only of queries that take longer than a second to execute, but also of queries that run hundreds of times. Even if each one is fast, on a busy system together they can consume all resources.

This is why you always need to be on guard after making changes in a live project - this is the most critical time for the operation of any database.

Hot and cold cache

The number of queries and the server load strongly affect execution time. When developing, make it a rule that every query should take no more than a fraction of a millisecond (0.0xx or faster) on an idle server.

Using Memcache has a strong effect on the load on servers and will free up resources that execute requests. Make sure you are using Memcached effectively and have tested your application with a hot cache (loaded data) and a cold cache.

To avoid running a production server with an empty cache, it is a good idea to have a warm-up script that populates the cache before the server goes live, so that a large influx of clients does not hit cold data.

Fixing Slow Queries

Now that logging is set up, you might have found some slow queries on your site. Let's fix them! As an example, I will show several common problems, and you can see the logic for fixing them.

If you haven't found a slow query yet, check your long_query_time setting if you use that logging method. Otherwise, after profiling all your queries (set profiling=1), make a list of the ones that take longer than a fraction of a millisecond (0.000x seconds) and start there.

Common problems

Here are the six most common problems I've found optimizing MySQL queries:

ORDER BY and filesort

Preventing filesort is sometimes not possible due to the ORDER BY clause. For optimization, store the result in Memcache, or perform sorting in your application logic.

Using ORDER BY with WHERE and LEFT JOIN

ORDER BY can make queries very slow. If possible, avoid it; if you do need sorting, sort on indexed columns (see the sketch below).
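
A minimal sketch of sorting on an index, with illustrative names: a composite index that matches both the filter and the sort lets MySQL avoid the filesort.

ALTER TABLE posts ADD INDEX idx_type_created (post_type, created);

-- equality on post_type plus ORDER BY created can be served from the index
SELECT post_id, title
FROM posts
WHERE post_type = 'article'
ORDER BY created DESC
LIMIT 10;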

Using ORDER BY on temporary columns

Just don't do it. If you need to combine results, do it in your application logic; do not filter or sort on a temporary table produced by a MySQL query. This consumes a lot of resources.

Ignoring FULLTEXT index

Using LIKE is the best way to make full text search slow.
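
As a hedged sketch (table and column names are illustrative), compare a leading-wildcard LIKE with a real full-text search:

-- slow: a leading wildcard cannot use a regular index
SELECT post_id FROM posts WHERE post_content LIKE '%tomato%';

-- faster: a FULLTEXT index supports word search (MyISAM, or InnoDB since MySQL 5.6)
ALTER TABLE posts ADD FULLTEXT INDEX ft_content (post_content);
SELECT post_id FROM posts WHERE MATCH(post_content) AGAINST('tomato');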

Unnecessarily selecting a large number of rows

Forgetting about LIMIT in your query can greatly increase the time it takes to fetch from the database, depending on the size of the tables.

Excessive use of JOIN instead of creating composite tables or views

When you use more than three or four LEFT JOINs in a single query, ask yourself: is everything right here? Carry on only if you have a good reason, for example the query is run rarely (say, for output in the admin panel), or the result can be cached. If you do need a query with a large number of table joins, it is better to think about building composite tables from the necessary columns or using views.

Summary

We discussed the basics of optimization and the tools needed to get the job done. We examined the system using profiling and the EXPLAIN statement to see what was going on with the database and how we could improve the design.

We also looked at some examples and classic pitfalls you can fall into when using MySQL. By using index hinting, we can ensure that MySQL will select the necessary indexes, especially when multiple selections are made on the same table. To continue studying the topic, I advise you to look towards the Percona project.

In everyday work, you encounter fairly similar errors when writing queries.

In this article I would like to give examples of how NOT to write queries.

  • Select all fields
    SELECT * FROM table

    When writing queries, do not use a selection of all fields - "*". List only the fields you really need. This will reduce the amount of data fetched and sent. Also, don't forget about covering indexes. Even if you actually need all the fields in the table, it is better to list them. Firstly, it improves the readability of the code. When using an asterisk, it is impossible to know which fields are in the table without looking at it. Secondly, over time, the number of columns in your table may change, and if today there are five INT columns, then in a month TEXT and BLOB fields may be added, which will slow down the selection.

  • Requests in a cycle.
You need to clearly understand that SQL is a set-oriented language. Programmers used to thinking in procedural languages sometimes find it difficult to switch to thinking in sets. You can do it quite simply by adopting one rule: "never execute queries in a loop." Here are examples of how that plays out:

    1. Samples
    $news_ids = get_list("SELECT news_id FROM today_news ");
    while($news_id = get_next($news_ids))
    $news = get_row("SELECT title, body FROM news WHERE news_id = ". $news_id);

    The rule is very simple - the fewer requests, the better (although there are exceptions to this, like any rule). Don't forget about the IN() construct. The above code can be written in one query:
    SELECT title, body FROM today_news INNER JOIN news USING(news_id)

    2. Inserts
    $log = parse_log();
    while($record = next($log))
    query("INSERT INTO logs SET value = " . $record["value"]);

    It is much more efficient to concatenate and execute one query:
    INSERT INTO logs (value) VALUES (...), (...)

    3. Updates
    Sometimes you need to update several rows in one table. If the updated value is the same, then everything is simple:
    UPDATE news SET title = "test" WHERE id IN (1, 2, 3);

    If the value being changed is different for each record, then this can be done with the following query:
    UPDATE news SET
    title = CASE
    WHEN news_id = 1 THEN "aa"
    WHEN news_id = 2 THEN "bb" END
    WHERE news_id IN (1, 2)

    Our tests show that such a request is 2-3 times faster than several separate requests.

  • Performing operations on indexed fields
    SELECT user_id FROM users WHERE blogs_count * 2 = $value

    This query will not use the index, even if the blogs_count column is indexed. For an index to be used, no transformations must be performed on the indexed field in the query. For such requests, move the conversion functions to another part:
    SELECT user_id FROM users WHERE blogs_count = $value / 2;

    Similar example:
    SELECT user_id FROM users WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(registered) <= 10;

    Will not use an index on the registered field, whereas
    SELECT user_id FROM users WHERE registered >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY);
    will.

  • Fetching rows only to count their number
    $result = mysql_query("SELECT * FROM table", $link);
    $num_rows = mysql_num_rows($result);
    If you need to select the number of rows that satisfy a certain condition, use the SELECT COUNT(*) FROM table query rather than selecting all the rows just to count the number of rows.
  • Fetching extra rows
    $result = mysql_query("SELECT * FROM table1", $link);
    while (($row = mysql_fetch_assoc($result)) && $i < 20) {

    }
    If you only need n rows, use LIMIT instead of fetching extra rows and discarding them in application code.
  • Using ORDER BY RAND()
    SELECT * FROM table ORDER BY RAND() LIMIT 1;

    If the table has more than 4-5 thousand rows, then ORDER BY RAND() will work very slowly. It would be much more efficient to run two queries:

    If the table has an auto_increment primary key and no gaps:
    $rnd = rand(1, query("SELECT MAX(id) FROM table"));
    $row = query("SELECT * FROM table WHERE id = ".$rnd);

    Or:
    $cnt = query("SELECT COUNT(*) FROM table");
    $row = query("SELECT * FROM table LIMIT " . rand(0, $cnt - 1) . ", 1");
    which, however, can also be slow if there are a very large number of rows in the table.

  • Using a large number of JOINs
    SELECT
    v.video_id,
    a.name,
    g.genre
    FROM
    videos AS v
    LEFT JOIN
    link_actors_videos AS la ON la.video_id = v.video_id
    LEFT JOIN
    actors AS a ON a.actor_id = la.actor_id
    LEFT JOIN
    link_genre_video AS lg ON lg.video_id = v.video_id
    LEFT JOIN
    genres AS g ON g.genre_id = lg.genre_id

    Remember that with one-to-many joins, the number of rows in the result set grows with each successive JOIN. In such cases it is faster to split the query into several simple ones.

  • Using LIMIT
    SELECT… FROM table LIMIT $start, $per_page

    Many people think that such a query will return $per_page records (usually 10-20) and therefore will work quickly. It will work quickly for the first few pages. But if the number of records is large and you need to execute a SELECT… FROM table LIMIT 1000000, 20 query, then MySQL will first select 1,000,020 records, discard the first million, and return 20. This may not be fast at all. There are no trivial ways to solve the problem; many simply limit the number of available pages to a reasonable number. You can also speed up such queries by using covering indexes or third-party solutions (for example, Sphinx).

  • Not using ON DUPLICATE KEY UPDATE
    $row = query("SELECT * FROM table WHERE id=1");

    if ($row)
        query("UPDATE table SET column = column + 1 WHERE id = 1");
    else
        query("INSERT INTO table SET column = 1, id = 1");

    A similar construction can be replaced with one query, provided that there is a primary or unique key for the id field:
    INSERT INTO table SET column = 1, id=1 ON DUPLICATE KEY UPDATE column = column + 1

1. Use LIMIT 1 when you need a unique row

Sometimes, when forming a query, you already know that you need only one unique row from the table. You might fetch that unique record, or you might simply check whether any records exist that satisfy your condition.

In such cases, using the LIMIT 1 method can significantly improve performance:

// is there any data for people from California in the database?
// what NOT to do:
$r = mysql_query("SELECT * FROM user WHERE state = 'California'");
if (mysql_num_rows($r) > 0) {
    // ... other code
}

// much better:
$r = mysql_query("SELECT 1 FROM user WHERE state = 'California' LIMIT 1");
if (mysql_num_rows($r) > 0) {
    // ... other code
}

2. Optimization of working with the database using query cache processing

Most MySQL servers support query caching. It is one of the most effective ways to improve performance, and the database engine handles it transparently.

When the same query is executed several times, the result is fetched from the cache without reprocessing the tables, which speeds things up significantly.

// the query cache does NOT work here:
$r = mysql_query("SELECT username FROM user WHERE signup_date >= CURDATE()");

// the query cache works!
$today_date = date("Y-m-d");
$r = mysql_query("SELECT username FROM user WHERE signup_date >= '$today_date'");

3. Indexing search fields

Indexes are not only intended to be assigned to primary or unique keys. If there are columns in a table that you are searching for, they should almost certainly be indexed.

As you might guess, this rule also applies to partial string searches, such as last_name LIKE 'a%'. When the search is anchored at the beginning of the string, MySQL can use the index on that column.

You also need to understand which kinds of searches cannot use regular indexes. For example, when searching for a word inside text (e.g. WHERE post_content LIKE '%tomato%'), a regular index will not help. In such cases you are better off using MySQL full-text search or building your own indexing solution.

4. Indexing and using columns of the same type when joining

If your application contains many join queries, you need to ensure that the columns in both tables you are joining are indexed. This affects the optimization of MySQL's internal join operations.

Additionally, the joined columns must be of the same type. For example, if you join a DECIMAL column from one table to an INT column from another, MySQL will be unable to use at least one of the indexes.

Even the character encoding must be the same for string columns that are joined.

// looking for companies located in my state
$r = mysql_query("SELECT company_name FROM users
    LEFT JOIN companies ON (users.state = companies.state)
    WHERE users.id = $user_id");

// both state columns must be indexed,
// both must be the same type and character encoding,
// or MySQL will have to scan the tables in full

5. If possible, avoid using SELECT * queries

The more data a query processes in a table, the slower the query runs. Time is spent on disk operations. Additionally, when the database server is separate from the web server, there are network delays transferring data between the servers.

// an undesirable query:
$r = mysql_query("SELECT * FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";

// it is better to use the following code:
$r = mysql_query("SELECT username FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";

6. Please do not use the ORDER BY RAND() sort method

This is one of those tricks that seems good at first, and many new programmers fall for it. You can’t even imagine what kind of trap you’re setting for yourself as soon as you start using this filter in queries.

If you really do need random rows in your results, there are far better ways to get them. Yes, it takes extra code, but you will avoid a bottleneck that gets exponentially worse as your data grows.

The problem is that MySQL performs the RAND() operation (which consumes server CPU) on every row in the table before sorting, even if only one row is ultimately selected.

// what NOT to do:
$r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1");

// it would be more correct to use the following code:
$r = mysql_query("SELECT count(*) FROM user");
$d = mysql_fetch_row($r);
$rand = mt_rand(0, $d[0] - 1);
$r = mysql_query("SELECT username FROM user LIMIT $rand, 1");

This way you select fewer rows, and you can still apply the LIMIT technique described in point 1.

7. Use ENUM columns instead of VARCHAR

ENUM columns are very compact and therefore fast to process. Internally their contents are stored as TINYINT, yet they can contain and display any of the listed string values. This makes them very convenient for fields with a fixed set of options.

If you have a field that holds a limited set of values of the same kind, use ENUM instead of VARCHAR. For example, a "status" column containing only values such as "active", "inactive", "pending", "expired", and so on.

MySQL can even "prompt" you to restructure a table: when you have a VARCHAR field, it may recommend changing the column format to ENUM. You can get this suggestion by calling PROCEDURE ANALYSE().
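
A hedged sketch with an illustrative subscriptions table:

-- store a fixed set of statuses as ENUM instead of VARCHAR
ALTER TABLE subscriptions
    MODIFY COLUMN status ENUM('active', 'inactive', 'pending', 'expired')
    NOT NULL DEFAULT 'pending';

-- ask MySQL to suggest optimal column types (available up to MySQL 5.7)
SELECT status FROM subscriptions PROCEDURE ANALYSE();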

Use UNSIGNED INT fields to store IP addresses

Many developers create VARCHAR(15) fields for this purpose, even though IP addresses can be stored in the database as decimal numbers. An INT field stores the address in 4 bytes, with a fixed field size.

You must make sure that your columns are in UNSIGNED INT format, since the IP address is specified in 32 bits.

In queries, you can use the INET_ATON() function to convert an IP address to a decimal number, and INET_NTOA() for the reverse. PHP has similar functions, ip2long() and long2ip().
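
A minimal sketch, assuming a sessions table with an `ip INT UNSIGNED` column (names are illustrative):

-- store the dotted-quad address as a number
INSERT INTO sessions (user_id, ip) VALUES (42, INET_ATON('192.168.0.10'));

-- convert back to text when reading
SELECT INET_NTOA(ip) AS ip_text FROM sessions WHERE user_id = 42;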

8. Vertical sectioning (separation)

Vertical partitioning is the process of dividing a table structure vertically to optimize database performance.

Example 1: Let's say you have a users table that contains, among other things, home addresses. This information is used very rarely. You can split the table and store the address data in a separate table.

This way, your main users table will noticeably decrease in size. And as you know, smaller tables are processed faster.
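
A hedged sketch of such a split, with illustrative names and assuming a users table with an id primary key:

-- rarely used address data moves out of the hot users table
CREATE TABLE user_addresses (
    user_id INT UNSIGNED NOT NULL PRIMARY KEY,
    street  VARCHAR(255),
    city    VARCHAR(100),
    FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;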

Example 2: You have a "last_login" field in your users table. It is updated every time a user logs in. But every write to a table flushes the query cache for that table. You can move this field to another table to reduce the number of writes to your main users table.

However, you must be sure that both tables resulting from partitioning will not be used equally often in the future. Otherwise, it will significantly reduce performance.

9. Smaller columns are faster

For database engines, disk space is perhaps the biggest bottleneck. Therefore, storing information more compactly is generally beneficial from a performance point of view. This reduces the number of disk accesses.

The MySQL docs list the storage requirements of the different data types. If a table is not expected to contain very many records, there is no reason to store the primary key in an INT field: MEDIUMINT, SMALLINT, or in some cases even TINYINT may be enough. If you do not need the time component (hours:minutes) in a date, use DATE instead of DATETIME.
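
A minimal sketch (the table is made up for illustration) of picking compact types:

CREATE TABLE event_log (
    id         SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, -- enough for a small table
    event_date DATE NOT NULL -- DATE instead of DATETIME: no time of day needed
) ENGINE=InnoDB;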

However, make sure you leave yourself enough room to grow. Otherwise, at some point you may outgrow the type and face a painful migration.

From the author: one of my friends decided to optimize his car. First he took off one wheel, then he cut off the roof, then the engine... In general, now he walks. These are the consequences of the wrong approach! Therefore, for your DBMS to keep running, MySQL optimization must be done correctly.

When to optimize and why?

It is not worth diving into server settings and changing parameter values on a whim (especially if you don't know how it might end). And if we look at this topic from the vantage point of improving the performance of web resources, it is so extensive that an entire seven-volume scientific publication would have to be devoted to it.

But I clearly don't have that kind of patience as a writer, and neither do you as a reader. We will keep it simpler and dig only slightly into the thicket of optimizing the MySQL server and its components. By setting all the DBMS parameters optimally, you can achieve several goals:

  • Increase the speed of query execution.
  • Improve overall server performance.
  • Reduce the waiting time for resource pages to load.
  • Reduce the consumption of hosting server capacity.
  • Reduce the amount of disk space consumed.

We will try to break down the entire topic of optimization into several points, so that it is more or less clear what makes the “pot boil.”

Why set up a server

In MySQL, performance optimization should start with the server. First of all, you should speed up its operation and reduce the time it takes to process requests. A universal means to achieve all of the above goals is to enable caching. Don't know what that is? I'll explain.

If caching is enabled on your server instance, MySQL automatically "remembers" the query entered by the user. The next time the same query is repeated, its result (for a SELECT) is not recomputed but taken from memory. The server thus "saves" the time needed to produce a response, and the site responds faster, which also improves overall load speed.

In MySQL, query optimization applies to the engines and CMSs that run on top of this DBMS and PHP. Here, the code written in the programming language requests some of the structural parts and content of a dynamic web page (records, archives, and other taxonomies) from the database.

Thanks to enabled caching, queries to the DBMS server execute much faster. This increases the loading speed of the resource as a whole, which has a positive effect both on the user experience and on the site's position in search results.

Enable and configure caching

But let's get back from "boring" theory to practice. We will continue optimizing the MySQL database by checking the caching status on your database server. To do this, we output the values of the relevant system variables with a special query:
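
One standard way to request these values:

SHOW VARIABLES LIKE 'have_query_cache';
SHOW VARIABLES LIKE 'query_cache%';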


Let's briefly review the returned values that will be useful for optimizing MySQL databases:

have_query_cache – indicates whether query caching is supported ("YES") or not.

query_cache_type – displays the active cache type. We want the value "ON": it means caching is enabled for all SELECT queries, except those that use the SQL_NO_CACHE modifier (which forbids caching the result of that query).

We have all the settings set correctly.

We measure the cache for indexes and keys

Now you need to check how much RAM is allocated for indexes and keys. This parameter, important for optimizing the MySQL database, is recommended at 20-30% of the RAM available to the server. For example, if a DBMS instance has 4 GB available, 20-30% works out to roughly 1 GB. But it all depends on the characteristics of the particular database and the structure (types) of its tables.

To set the parameter value, edit the contents of the my.ini configuration file, which in Denwer is located at the following path: F:\Webserver\usr\local\mysql-5.5

Open the file in a text editor, find the key_buffer_size parameter, and set a size appropriate to your system (depending on the available RAM). After that, restart the database server.
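
A sketch of the relevant my.ini line (the 1G value is illustrative; key_buffer_size sizes the MyISAM index cache):

key_buffer_size = 1G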

The DBMS uses several additional low-level subsystems, and their main settings are also specified in this configuration file. So if you need to optimize MySQL InnoDB, welcome back here: we will study that topic in more detail in one of the next materials.

Measuring the level of indexes

Using indexes in tables significantly increases the speed of processing queries and generating the DBMS response. MySQL constantly "measures" how much indexes and keys are used in each database. To get this value, use the query:

SHOW STATUS LIKE 'handler_read%';

In the result, we are interested in the value on the Handler_read_key line. If the number there is small, it indicates that indexes are almost never used in this database. And that is bad (as in our case).






