Substr() 與 Group by 子句

September 12, 2017

我一直在努力解決這裡顯示的問題，但出現了一個問題
這是我創建的架構、我的問題陳述和我的測試數據：
架構：學生
CREATE TABLE Students (group_id text, sql_quotient float);
INSERT INTO Students(group_id, sql_quotient)
VALUES 
   ( 'A', 25 ),
   ( 'B', 30 ),
   ( 'C', 40 ),
   ( 'A', 35 ),
   ( 'B', 20 );
sql_quotient任務：顯示所有組的最大平均值。
group_id保證是 AZ 範圍內的單個字元。
這裡對於 A 組，avg 是 30；對於 B，平均為 25；對於 C，avg 為 40；因此，應該顯示 40。
我嘗試了以下 2 個查詢；兩者都給了我正確的答案。
查詢 1
select max(round(b.avg_quotient,2)) as answer 
        from 
         (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY group_id) as b;
執行時間 = 0.002378 秒
查詢 2
select max(round(b.avg_quotient,2)) as answer 
    from 
     (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY substr(group_id,1,1) )as b;  
執行時間 = 0.000459 秒
區別 - 第一個查詢將數據分組group_id；第二個由`substr（group_id，1,1）。
由於第二個查詢應用了一個附加功能，我預計它需要更長的時間。但是，正如您在上面看到的，第 2 號查詢的執行時間明顯少於第 2 號查詢。1.
我的問題：為什麼查詢 2 的執行時間比查詢 1 低，即使查詢 2 有一個額外的函式 (substr())。
筆記：
架構已定義。我不知道為什麼 Id 的數據類型是文本而不是 char(1)，並且就我的問題而言，這無關緊要。
我正在尋找這兩個查詢之間執行時差異的原因，而不是針對同一問題的另一個查詢。
每次執行都會重新創建學生表，因此這不是第一次執行時必須從磁碟讀取數據，第二次執行時必須在記憶體中的情況。為了證明這一點，我又執行了四次查詢，執行 Query 2，然後是 Query 1，然後是 2，然後是 1。執行時間是：
第二季度，第一：0.000493
第一季度，第一：0.002779
第二季度，第二：0.000499
第一季度，第二季度：0.002787

不需要子字元串。無需計算max()

像這樣的東西，

SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;

+------+------------+
| id   | avg(price) |
+------+------------+
| C    |    40.0000 |
+------+------------+

這可能比必須計算最大值更簡單。

SELECT id, avg(price)
FROM Students
GROUP BY id
HAVING avg(price) = (
   SELECT max(avg)
   FROM (
       SELECT id, avg(price) AS avg
       FROM Students
       GROUP BY id
   ) AS t
);

計時

+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                           |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.00008716 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) |
|        2 | 0.00008053 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1                                                                      |
|        3 | 0.00011303 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) |
|        4 | 0.00006121 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1                                                                      |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+

我認為您不了解寒冷和炎熱的時期。再次執行相同的查詢。

我認為這個問題的整個基礎是錯誤的。它沒有更快。第一個表查找速度較慢，然後被記憶體。

MariaDB [test]&gt; show profiles;
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                 |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.00366737 | select max(round(b.avg_price,2)) as answer 
            from 
             (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b          |
|        2 | 0.00080117 | select max(round(b.avg_price,2)) as answer 
        from 
         (SELECT AVG(price) as avg_price FROM 
   Students GROUP BY substr(id,1,1) )as b |
|        3 | 0.00010088 | select max(round(b.avg_price,2)) as answer 
            from 
             (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b          |
|        4 | 0.00015381 | select max(round(b.avg_price,2)) as answer 
        from 
         (SELECT AVG(price) as avg_price FROM 
   Students GROUP BY substr(id,1,1) )as b |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+

你看不到堆命中和高畫質命中，因為 MySQL 很臭。但是在真實的數據庫中，

explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;;
                                                          QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.091..0.091 rows=1 loops=1)
  Output: id, (avg(price))
  Buffers: shared hit=3 read=1
  -&gt;  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.090..0.090 rows=1 loops=1)
        Output: id, (avg(price))
        Sort Key: (avg(students.price)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=3 read=1
        -&gt;  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.045..0.046 rows=3 loops=1)
              Output: id, avg(price)
              Group Key: students.id
              Buffers: shared read=1
              -&gt;  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.027..0.028 rows=5 loops=1)
                    Output: id, price
                    Buffers: shared read=1
Planning time: 0.541 ms
Execution time: 0.332 ms
(17 rows)

test=# explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;;
                                                          QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.059..0.059 rows=1 loops=1)
  Output: id, (avg(price))
  Buffers: shared hit=1
  -&gt;  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.057..0.057 rows=1 loops=1)
        Output: id, (avg(price))
        Sort Key: (avg(students.price)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=1
        -&gt;  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.037..0.039 rows=3 loops=1)
              Output: id, avg(price)
              Group Key: students.id
              Buffers: shared hit=1
              -&gt;  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1)
                    Output: id, price
                    Buffers: shared hit=1
Planning time: 0.114 ms
Execution time: 0.152 ms
(17 rows)

觀察，使用 PostgreSQL，

  Buffers: shared hit=3 read=1

對比

  Buffers: shared hit=1

第一次執行必須讀取表，然後它有三個共享命中要 ram。第二次，它已經在 ram 中，而且一擊即中。

引用自：https://dba.stackexchange.com/questions/185514

Substr() 與 Group by 子句

相關問答

MySQL優化-年份列分組-使用臨時表、文件排序

數據庫和查詢優化

MySQL 能否在 10GB 網路儲存上使用 raid 與本地 HDD 表現得更好？

mysql 簡單更新 4000 萬和 128GB RAM 花費太多時間

MySQL Connections 減慢了我的網站速度

優化慢查詢