Mysql

JOIN 子句中 groupwise MAX 的性能問題

  • February 14, 2019

問題

我的應用程序中有一些資產不時以非同步方式更新。

我將在這裡使用的範例是Vehicles. 有兩個表:

  • **Vehicles:**保存有關車輛本身的資訊
  • **VehicleUpdates:**保存有關該車輛發生的所有更新的資訊。

表結構的相關部分是:

CREATE TABLE `Vehicles` (
 `id` varchar(50) NOT NULL,
 `organizationId` varchar(50) NOT NULL,
 `plate` char(7) NOT NULL,
 `vehicleInfo` json DEFAULT NULL,
 `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `updatedAt` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 UNIQUE KEY `unq_Vehicles_orgId_plate_idx` (`organizationId`,`plate`) USING BTREE,
 KEY `Vehicles_createdAt_idx` (`createdAt`),
);

CREATE TABLE `VehicleUpdates` (
 `id` varchar(50) NOT NULL,
 `organizationId` varchar(50) NOT NULL,
 `vehiclePlate` char(7) NOT NULL,
 `status` varchar(15) NOT NULL,
 `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `updatedAt` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 KEY `VehicleUpdates_orgId_vhclPlt_createdAt_idx` (`organizationId`,`vehiclePlate`,`createdAt`) USING BTREE
);

現在我有一個新要求,我必須在車輛資訊本身的同時返回最新的更新資訊

Groupwise MAX 解決方案

經過一番探勘,我找到了這篇部落格文章。然後我嘗試使用建議的“不相關子查詢”方法,因為它被認為是最好的方法:

不相關子查詢

SELECT  vu1.*
   FROM VehicleUpdates AS vu1
   JOIN
     ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
           FROM  VehicleUpdates
           GROUP BY organizationId, vehiclePlate
     ) AS vu2 USING (organizationId, vehiclePlate, createdAt);

275 ms此查詢在我的生產數據庫中的平均執行時間為。

我認為這太慢了,所以我決定試一試“LEFT JOIN”方法:

啞彈:左加入

SELECT  vu1.*
   FROM  VehicleUpdates AS vu1
   LEFT JOIN  VehicleUpdates AS vu2 ON vu1.organizationId = vu2.organizationId and vu1.vehiclePlate = vu2.vehiclePlate
     AND  vu2.createdAt > vu1.createdAt
   WHERE  vu2.id IS NULL;

這個表現更好,平均執行時間為40 ms. 對我來說已經足夠好了。

然後我需要將此查詢作為對Vehicles錶的查詢的一部分執行。

目前結果

以下查詢將滿足我的要求:

SELECT  v.*, vu1.*
FROM  Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1 
   ON v.plate = vu1.vehiclePlate 
       AND v.organizationId = vu1.organizationId
LEFT JOIN  VehicleUpdates AS vu2 
   ON vu1.organizationId = vu2.organizationId 
       AND vu1.vehiclePlate = vu2.vehiclePlate
       AND  vu2.createdAt > vu1.createdAt
WHERE  vu2.id IS NULL;

問題是它需要20 s(!)才能執行。大問題!

但我從不對生產進行全表掃描。查詢總是被限制為單個organizationId並且它是分頁的,所以我每頁最多返回 100 行,所以我執行了以下查詢:

SELECT  v.*, vu1.*
FROM  Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1 
   ON v.plate = vu1.vehiclePlate 
       AND v.organizationId = vu1.organizationId
LEFT JOIN  VehicleUpdates AS vu2 
   ON vu1.organizationId = vu2.organizationId 
       AND vu1.vehiclePlate = vu2.vehiclePlate
       AND  vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL
   and v.organizationId = '<some organization ID>'
LIMIT 100;

現在它需要從750 ms11 s執行,具體取決於與多少車輛相關聯。還不夠好。

執行explain上面的查詢讓我:

"select_type" | "table" | "type" | "possible_keys"                                        | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
SIMPLE        | v       | ref    | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx               | "202"     | const                               | 30     | 100        |
SIMPLE        | vu1     | ALL    |                                                        |                                            |           |                                     | 263171 | 100        | Using where; Using join buffer (Block Nested Loop)
SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx             | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173"     | vu1.organizationId,vu1.vehiclePlate | 10     | 10         | Using where; Not exists; Using index

令我印象深刻的是該vu1表正在執行全表掃描,即使最左邊的表Vehicles正在使用索引列進行過濾,該索引列organizationId也在VehicleUpdates.

所以我決定再試一次“不相關的子查詢”並執行:

SELECT  v.*, vu.*
FROM  Vehicles AS v
LEFT JOIN (
   SELECT  vu1.*
       FROM VehicleUpdates AS vu1
       JOIN
         ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
               FROM  VehicleUpdates
               GROUP BY organizationId, vehiclePlate
         ) AS vu2 USING (organizationId, vehiclePlate, createdAt)
) AS vu 
   ON vu.organizationId = v.organizationId 
       AND vu.vehiclePlate = v.plate
WHERE v.organizationId = '<SOME ORGANIZATION ID>'
LIMIT 100;

這次執行時間從1.4 s到不等,具體取決於給定13 s的表中有多少條目。我的申請不能接受。Vehicles``organizationId

跑步explain讓我:

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                             | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ALL    |                                            |                                            |           |                                                   | 14456  | 100        |
| PRIMARY       | <derived3>     | ALL    |                                            |                                            |           |                                                   | 29289  | 100        | Using where
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327"     | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | range  | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323"     |                                                   | 29289  | 100        | Using index for group-by

目前結果 - 更新

我注意到添加特定organizationId子句可以提高性能。

左連接

跑步:

SELECT  v.*, vu1.*
FROM  Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1 
   ON v.plate = vu1.vehiclePlate 
       AND v.organizationId = vu1.organizationId
       AND vu1.organizationId = '<SOME ORGANIZATION ID>' -- <--------
LEFT JOIN  VehicleUpdates AS vu2 
   ON vu1.organizationId = vu2.organizationId 
       AND vu1.vehiclePlate = vu2.vehiclePlate
       AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL
   and v.organizationId = '<SOME ORGANIZATION ID>' -- <-----------
LIMIT 100;

65 ms我得到的執行時間從(可接受)到(不可接受)不等2.5 s

不相關查詢

organizationId = '<SOME ORGANIZATION ID>'在“主”查詢和連接外部子查詢中放置一個子句:

SELECT  v.*, vu.*
FROM  Vehicles AS v
LEFT JOIN (
   SELECT  vu1.*
       FROM VehicleUpdates AS vu1
       JOIN
         ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
               FROM  VehicleUpdates
               GROUP BY organizationId, vehiclePlate
         ) AS vu2 ON vu1.organizationId = vu2.organizationId
               and vu1.vehiclePlate = vu2.vehiclePlate
               and vu1.createdAt = vu2.createdAt
       WHERE organizationId = '<SOME ORGANIZATION ID>' -- <--------
   ) AS vu 
       ON vu.organizationId = v.organizationId 
           AND vu.vehiclePlate = v.plate
where
   v.organizationId = '<SOME ORGANIZATION ID>' -- <---------
LIMIT 100;

450 ms我得到的執行時間從(不可接受)到(不可接受)不等900 ms

organizationId = '<SOME ORGANIZATION ID>'在“主”查詢和連接內部子查詢中放置一個子句:

SELECT  v.*, vu.*
FROM  Vehicles AS v
LEFT JOIN (
   SELECT  vu1.*
       FROM VehicleUpdates AS vu1
       JOIN
         ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
               FROM  VehicleUpdates
               WHERE organizationId = '<SOME ORGANIZATION ID>' -- <--------
               GROUP BY organizationId, vehiclePlate
         ) AS vu2 ON vu1.organizationId = vu2.organizationId
               and vu1.vehiclePlate = vu2.vehiclePlate
               and vu1.createdAt = vu2.createdAt
   ) AS vu 
       ON vu.organizationId = v.organizationId 
           AND vu.vehiclePlate = v.plate
where
   v.organizationId = '<SOME ORGANIZATION ID>' -- <---------
LIMIT 100;

225 ms我得到的執行時間從(可接受)到(不可接受)不等500 ms


有沒有更好的方法來處理這樣的查詢?

數據庫資訊

  • MySQL
  • 版本:5.7.23-log (Amazon RDS)
  • 引擎:InnoDB

我覺得自己好傻!才發現問題。

出於某種原因,Vehicles並且在生產中VehicleUpdates具有不同的字元集utf8mb4utf8,分別)。

這就是EXPLAIN“不相關子查詢”方法的結果在其步驟之一中進行全表掃描的原因:

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                             | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ALL    |                                            |                                            |           |                                                   | 14456  | 100        |
| PRIMARY       | <derived3>     | ALL    |                                            |                                            |           |                                                   | 29289  | 100        | Using where
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327"     | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | range  | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323"     |                                                   | 29289  | 100        | Using index for group-by

轉換VehicleUpdates為後utf8mb4EXPLAIN結果為:

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                           | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                                           | 188    | 100        |
| PRIMARY       | <derived2>     | ref    | <auto_key1>                                | <auto_key1>                                | "230"     | v.plate,v.organizationId                        | 10     | 100        |
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "234"     | v.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "202"     | const                                           | 24090  | 100        | Using where; Using index

同樣,“LEFT JOIN”方法執行計劃從:

| "select_type" | "table" | "type" | "possible_keys"                                        | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx               | "202"     | const                               | 30     | 100        |
| SIMPLE        | vu1     | ALL    |                                                        |                                            |           |                                     | 263171 | 100        | Using where; Using join buffer (Block Nested Loop)
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx             | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173"     | vu1.organizationId,vu1.vehiclePlate | 10     | 10         | Using where; Not exists; Using index

到:

| "select_type" | "table" | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                               | 188    | 100        |
| SIMPLE        | vu1     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | v.organizationId,v.plate            | 9      | 100        |
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | vu1.organizationId,vu1.vehiclePlate | 9      | 10         | Using where; Not exists; Using index

因此,現在不同查詢的性能是:

左連接

SELECT v.*, vu1.*
FROM  Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1 
   ON v.plate = vu1.vehiclePlate 
       AND v.organizationId = vu1.organizationId
LEFT JOIN  VehicleUpdates AS vu2 
   ON vu1.organizationId = vu2.organizationId 
       AND vu1.vehiclePlate = vu2.vehiclePlate
       AND vu2.createdAt > vu1.createdAt
where v.organizationId = '<SOME ORGANIZATION ID>'
   AND vu2.id IS NULL
LIMIT 100;

總是在下面跑50 ms

內部查詢中沒有 WHERE 的不相關子查詢:

SELECT v.*, vu1.* 
FROM  Vehicles AS v
LEFT JOIN (
       VehicleUpdates AS vu1
       JOIN
         ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
               FROM  VehicleUpdates
               GROUP BY organizationId, vehiclePlate
         ) AS vu2 ON vu1.organizationId = vu2.organizationId
               and vu1.vehiclePlate = vu2.vehiclePlate
               and vu1.createdAt = vu2.createdAt
   )
       ON vu1.organizationId = v.organizationId 
           AND vu1.vehiclePlate = v.plate
where
   v.organizationId = '<SOME ORGANIZATION ID>'
LIMIT 100;

平均在300 ms.

內部查詢中帶​​有 WHERE 的不相關子查詢:

SELECT v.*, vu1.* 
FROM  Vehicles AS v
LEFT JOIN (
       VehicleUpdates AS vu1
       JOIN
         ( SELECT  vehiclePlate, organizationId, MAX(createdAt) AS createdAt
               FROM  VehicleUpdates
               WHERE organizationId = '<SOME ORGANIZATION ID>' -- <-- HERE
               GROUP BY organizationId, vehiclePlate
         ) AS vu2 ON vu1.organizationId = vu2.organizationId
               and vu1.vehiclePlate = vu2.vehiclePlate
               and vu1.createdAt = vu2.createdAt
   )
       ON vu1.organizationId = v.organizationId 
           AND vu1.vehiclePlate = v.plate
where
   v.organizationId = '<SOME ORGANIZATION ID>'
LIMIT 100;

也總是在下面跑50 ms


我決定堅持使用“LEFT JOIN”方法,因為它允許我創建一個視圖來表示內部查詢,這樣我就可以簡化返輸入輛的查詢。

我不能用“不相關子查詢”來做到這一點,因為它需要WHERE organizationId = '<ORGANIZATION ID>'內部查詢中的子句,所以視圖不會那麼有效。

引用自:https://dba.stackexchange.com/questions/229643