MySQL

Nested aggregation and grouping on multiple columns in MySQL

  • August 18, 2015

Consider the following product table (highly slimmed down):

`id` int AUTO_INCREMENT
`category_id` int
`subcategory_id` int
`vendor_id` int
`price` decimal(6,2)
`inserted_at` timestamp

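For a given category ID, I am trying to retrieve a list containing, for each subcategory, the vendor with the lowest latest price. "Latest" means that a vendor may have several prices for a given category ID/subcategory ID combination, so only the most recently inserted price for that category ID/subcategory ID/vendor ID should be used. If two or more vendors tie on price, the lowest id should be used as the tiebreaker.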

For example, with this data:

id | category_id | subcategory_id | vendor_id | price | inserted_at
---------------------------------------------------------------------------
1  | 1           | 2              | 3         | 16.00 | 2015-07-23 04:00:00  
2  | 1           | 1              | 2         |  9.00 | 2015-07-26 08:00:00 
3  | 1           | 2              | 4         | 16.00 | 2015-08-02 10:00:00 
4  | 1           | 1              | 1         |  7.00 | 2015-08-04 11:00:00
5  | 1           | 1              | 1         | 11.00 | 2015-08-09 16:00:00

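So, first find the latest price for each subcategory/vendor combination (the row with price=7.00 is dropped because it is not that vendor's latest price in that subcategory). Then, for subcategory 1, the lowest price is 9 (so vendor_id = 2). For subcategory 2, the lowest price is 16, where two vendors tie (vendor ids 3 and 4), so we pick the one with the lowest vendor_id, which is 3.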

I would expect the following result for category_id = 1:

subcategory_id | vendor_id | price
----------------------------------
1              | 2         |  9.00
2              | 3         | 16.00
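
The two steps described above can be sketched in a few lines of plain Python, outside the database, to pin down the intended semantics. This is only an illustration of the rules as stated in the question; the function name `cheapest_vendors` and the in-memory `ROWS` list are hypothetical, and ties on price are broken by the lowest vendor_id, as in the worked example.

```python
from decimal import Decimal

# Sample rows from the question:
# (id, category_id, subcategory_id, vendor_id, price, inserted_at)
ROWS = [
    (1, 1, 2, 3, Decimal("16.00"), "2015-07-23 04:00:00"),
    (2, 1, 1, 2, Decimal("9.00"), "2015-07-26 08:00:00"),
    (3, 1, 2, 4, Decimal("16.00"), "2015-08-02 10:00:00"),
    (4, 1, 1, 1, Decimal("7.00"), "2015-08-04 11:00:00"),
    (5, 1, 1, 1, Decimal("11.00"), "2015-08-09 16:00:00"),
]


def cheapest_vendors(rows, category_id):
    """Return (subcategory_id, vendor_id, price) per subcategory."""
    # Step 1: keep only the latest price per (subcategory, vendor).
    latest = {}
    for _id, cat, sub, vendor, price, ts in rows:
        if cat != category_id:
            continue
        key = (sub, vendor)
        # Timestamps in this fixed format compare correctly as strings.
        if key not in latest or ts > latest[key][1]:
            latest[key] = (price, ts)

    # Step 2: per subcategory, pick the lowest (price, vendor_id) pair,
    # so that equal prices fall back to the lowest vendor_id.
    best = {}
    for (sub, vendor), (price, _ts) in latest.items():
        candidate = (price, vendor)
        if sub not in best or candidate < best[sub]:
            best[sub] = candidate

    return [(sub, vendor, price) for sub, (price, vendor) in sorted(best.items())]


result = cheapest_vendors(ROWS, 1)
```

Any SQL formulation of the problem should agree with this sketch on the sample data.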

Here is what I have so far. I feel it is already getting out of hand, and it does not even account for ties between the prices of two or more vendors.

SELECT c.subcategory_id, c.vendor_id, c.price
FROM product AS c
JOIN
(
   SELECT MIN(a.price) AS min_price,
          a.subcategory_id
   FROM product AS a
   JOIN
   (
       SELECT  MAX(`inserted_at`) AS latest_price_time,
               vendor_id,
               subcategory_id
       FROM    product
       WHERE   category_id = 1
       GROUP BY vendor_id, subcategory_id
   ) AS b
   ON a.inserted_at = b.latest_price_time AND a.vendor_id = b.vendor_id AND a.subcategory_id = b.subcategory_id
   WHERE a.category_id = 1
   GROUP BY a.subcategory_id
) AS d
ON c.price = d.min_price AND c.subcategory_id = d.subcategory_id
WHERE c.category_id = 1

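Before I go any further, I wanted to see whether there is a simpler way. When it comes to grouping/aggregating over the results of other groupings/aggregations, is there an approach that gives me the best performance (most important) and/or is easier to read (less important)?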

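This is a "greatest-n-per-group" query, which is very complicated to write in MySQL: first because of the lack of window functions, and second because you have two greatest-n-per-group specifications, one for the latest date per vendor and another for the lowest price per subcategory.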
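
For comparison, here is a sketch of what the same two greatest-n-per-group steps look like when window functions are available. It is not part of the original answer. It runs the SQL against an in-memory SQLite database through Python's standard `sqlite3` module, purely so that the example is self-contained; this assumes an SQLite build of 3.25 or later. The CTE names `latest` and `ranked`, the helper `run_demo`, and the extra `id DESC` tiebreak for identical timestamps are all assumptions made for illustration. The SQL text itself uses only `WITH` and `ROW_NUMBER() OVER (...)`, which MySQL supports from version 8.0 onward, so it should carry over with the driver's own placeholder syntax.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, 1, 2, 3, 16.00, "2015-07-23 04:00:00"),
    (2, 1, 1, 2, 9.00, "2015-07-26 08:00:00"),
    (3, 1, 2, 4, 16.00, "2015-08-02 10:00:00"),
    (4, 1, 1, 1, 7.00, "2015-08-04 11:00:00"),
    (5, 1, 1, 1, 11.00, "2015-08-09 16:00:00"),
]

QUERY = """
WITH latest AS (
    -- Step 1: number each vendor's prices per subcategory, newest first.
    SELECT subcategory_id, vendor_id, price,
           ROW_NUMBER() OVER (PARTITION BY subcategory_id, vendor_id
                              ORDER BY inserted_at DESC, id DESC) AS rn
    FROM product
    WHERE category_id = ?
),
ranked AS (
    -- Step 2: among the latest prices, rank vendors per subcategory by
    -- price, breaking ties with the lowest vendor_id.
    SELECT subcategory_id, vendor_id, price,
           ROW_NUMBER() OVER (PARTITION BY subcategory_id
                              ORDER BY price, vendor_id) AS rn
    FROM latest
    WHERE rn = 1
)
SELECT subcategory_id, vendor_id, price
FROM ranked
WHERE rn = 1
ORDER BY subcategory_id
"""


def run_demo(category_id=1):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE product (
            id             INTEGER PRIMARY KEY AUTOINCREMENT,
            category_id    INTEGER,
            subcategory_id INTEGER,
            vendor_id      INTEGER,
            price          DECIMAL(6,2),
            inserted_at    TIMESTAMP
        )
        """
    )
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (category_id,)).fetchall()
    conn.close()
    return rows
```

Calling `run_demo()` on the sample data yields one row per subcategory: vendor 2 for subcategory 1 and vendor 3 for subcategory 2, matching the result the question asks for.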

Here is one rather complicated way to write it:

SELECT 
   ps.subcategory_id,  ps.vendor_id,  ps.price   -- , ps.inserted_at
FROM
   ( SELECT DISTINCT subcategory_id
     FROM product
     WHERE category_id = 1
   ) AS s
 JOIN
   product AS ps
     ON  ps.category_id = 1
     AND ps.subcategory_id = s.subcategory_id
     AND ps.id =
         ( SELECT psv.id
           FROM
               ( SELECT DISTINCT subcategory_id, vendor_id
                 FROM product
                 WHERE category_id = 1
               ) AS sv
             JOIN
               product AS psv
               ON  psv.category_id = 1
               AND psv.subcategory_id = sv.subcategory_id
               AND psv.vendor_id = sv.vendor_id
               AND psv.inserted_at =
                   ( SELECT pi.inserted_at
                     FROM product AS pi
                     WHERE pi.category_id = 1
                       AND pi.subcategory_id = sv.subcategory_id
                       AND pi.vendor_id = sv.vendor_id
                     ORDER BY pi.inserted_at DESC
                     LIMIT 1
                   )
           WHERE sv.subcategory_id = s.subcategory_id
           ORDER BY psv.price,
                    psv.vendor_id
           LIMIT 1
         ) ;

Tested at SQLfiddle-2. The plan looks decent too, given an appropriate index on (category_id, subcategory_id, vendor_id, inserted_at).

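It may not be the most efficient, and I would certainly experiment with indexes (see the Fiddle, where I have one more index; it may not be very useful, but test it on a bigger table).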
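
As a rough illustration of the index suggested above, the sketch below creates it and asks the engine how it would run the innermost "latest price for one vendor in one subcategory" lookup. It uses an in-memory SQLite database via Python's `sqlite3` module only to stay self-contained; SQLite's planner is not MySQL's, so treat the output as a hint and check `EXPLAIN` on the real MySQL table. The index name `idx_product_cat_sub_vendor_ts` and the helper name are hypothetical. The `CREATE INDEX` statement uses the same syntax MySQL accepts.

```python
import sqlite3


def explain_latest_price_lookup():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE product (
            id             INTEGER PRIMARY KEY AUTOINCREMENT,
            category_id    INTEGER,
            subcategory_id INTEGER,
            vendor_id      INTEGER,
            price          DECIMAL(6,2),
            inserted_at    TIMESTAMP
        )
        """
    )
    # Composite index in the column order given in the answer.
    conn.execute(
        "CREATE INDEX idx_product_cat_sub_vendor_ts "
        "ON product (category_id, subcategory_id, vendor_id, inserted_at)"
    )
    # Ask the planner how it would fetch the newest timestamp
    # for one category/subcategory/vendor combination.
    plan = conn.execute(
        """
        EXPLAIN QUERY PLAN
        SELECT inserted_at
        FROM product
        WHERE category_id = 1 AND subcategory_id = 1 AND vendor_id = 1
        ORDER BY inserted_at DESC
        LIMIT 1
        """
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in plan]


plan_details = explain_latest_price_lookup()
```

With the three equality columns leading the index and inserted_at last, the lookup can be answered from the index alone, which is why this column order suits the query.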

(First version of the query at SQLfiddle-1.)

Source: https://dba.stackexchange.com/questions/111334