Innodb

如何使用 ORDER BY 優化涉及 GROUP_CONCAT 函式的 SQL 查詢

  • April 18, 2019

我編寫了以下 SQL 查詢來對結果進行分組,然後按日期對每個組中的結果進行排序:

SELECT 
origin,
provider,
SUBSTRING_INDEX(GROUP_CONCAT(price
           ORDER BY departure_date
           SEPARATOR ','),
       ',',
       10)
FROM
   `flights`
WHERE
   origin = 'LHR' AND destination IN ('FCO', 'CDG', 'AMS')
       AND cabin_type = 'ECONOMY'
GROUP BY origin , provider;

第二個查詢:

SELECT 
destination,
provider,
SUBSTRING_INDEX(GROUP_CONCAT(price
           ORDER BY departure_date
           SEPARATOR ','),
       ',',
       10)
FROM
   `flights`
WHERE
   origin = 'LHR' AND destination IN ('FCO', 'CDG', 'AMS')
       AND cabin_type = 'ECONOMY'
GROUP BY destination, provider;

InnoDB 表flights非常簡單:

+----------------+---------------+------+-----+-----------+----------------+
| Field          | Type          | Null | Key | Default   | Extra          |
+----------------+---------------+------+-----+-----------+----------------+
| id             | int(11)       | NO   | PRI | NULL      | auto_increment |
| origin         | varchar(255)  | NO   | MUL | NULL      |                |
| destination    | varchar(255)  | NO   | MUL | NULL      |                |
| departure_date | date          | NO   | MUL | NULL      |                |
| price          | decimal(14,2) | NO   |     | NULL      |                |
| provider       | varchar(255)  | NO   |     | airnet    |                |
| cabin_type     | varchar(255)  | YES  |     | ECONOMY   |                |
| created_at     | datetime      | YES  |     | NULL      |                |
| updated_at     | datetime      | YES  |     | NULL      |                |
+----------------+---------------+------+-----+-----------+----------------+

我在列上有索引:origin, destination, departure_date. 大約。表中有 1000 萬條記錄。

上述查詢大約需要 200 到 600 毫秒。我嘗試使用分析器,發現大部分時間都花在了Creating sort index(大約 98% 的整個查詢時間)上。

有沒有辦法優化這個查詢。我在 上嘗試了複合索引origin, provider, departure_date,但這似乎沒有幫助。也許應該對配置進行更多調整,如果是這樣,必須考慮哪些配置變數?

機器有 4 GB 的 RAM、2 個 vCPU 並且專用於 DB(10.3.13-Maria DB)。

測試

WITH cte AS ( SELECT origin,
                    provider,
                    price,
                    departure_date,
                    ROW_NUMBER() OVER (PARTITION BY /* origin, */ provider
                                       ORDER BY departure_date) rn 
             FROM flights
             WHERE origin = 'LHR' 
               AND destination IN ('FCO', 'CDG', 'AMS')
               AND cabin_type = 'ECONOMY' )
SELECT origin,
      provider,
      GROUP_CONCAT(price
                   ORDER BY departure_date
                   SEPARATOR ',')
FROM cte
WHERE rn <= 10
GROUP BY origin, 
        provider;

**更新:(**程式碼草案,見評論)

WITH 
cte1 AS ( SELECT origin,
                provider,
                price,
                departure_date,
                ROW_NUMBER() OVER (PARTITION BY provider
                                   ORDER BY departure_date) rn 
         FROM flights
         WHERE origin = 'LHR' 
           AND destination ='FCO'
           AND cabin_type = 'ECONOMY' ),
cte2 AS ( SELECT origin,
                provider,
                price,
                departure_date
         from cte1
         WHERE rn <= 10),
cte3 AS ( SELECT origin,
                provider,
                price,
                departure_date,
                ROW_NUMBER() OVER (PARTITION BY provider
                                   ORDER BY departure_date) rn 
         FROM flights
         WHERE origin = 'LHR' 
           AND destination = 'CDG'
           AND cabin_type = 'ECONOMY' ),
cte4 AS ( SELECT origin,
                provider,
                price,
                departure_date
         from cte3
         WHERE rn <= 10),
cte5 AS ( SELECT origin,
                provider,
                price,
                departure_date,
                ROW_NUMBER() OVER (PARTITION BY provider
                                   ORDER BY departure_date) rn 
         FROM flights
         WHERE origin = 'LHR' 
           AND destination = 'AMS'
           AND cabin_type = 'ECONOMY' ),
cte6 AS ( SELECT origin,
                provider,
                price,
                departure_date
         from cte5
         WHERE rn <= 10),
cte7 AS ( SELECT origin,
                provider,
                price,
                departure_date
         FROM cte2
         UNION ALL
         SELECT origin,
                provider,
                price,
                departure_date
         FROM cte4
         UNION ALL
         SELECT origin,
                provider,
                price,
                departure_date
         FROM cte6 )
cte8 as ( SELECT origin,
                provider,
                price,
                departure_date,
                ROW_NUMBER() OVER (PARTITION BY provider
                                       ORDER BY departure_date) rn 
          FROM cte7)
SELECT origin,
      provider,
      GROUP_CONCAT(price
                   ORDER BY departure_date
                   SEPARATOR ',')
FROM cte8
WHERE rn <= 10
GROUP BY origin, 
        provider;        
WHERE
origin = 'LHR' AND destination IN ('FCO', 'CDG', 'AMS')
   AND cabin_type = 'ECONOMY'

乞求以下任一複合索引:

INDEX(origin, cabin_type, destination)
INDEX(cabin_type, origin, destination)

origin``destination看起來他們可以CHAR(3)代替VARCHAR(255).

cabin_type可能是一個ENUM,或者至少是一個更短的VARCHAR.

引用自:https://dba.stackexchange.com/questions/233062