Mysql

Substr() 與 Group by 子句

  • September 12, 2017

我一直在努力解決這裡顯示的問題,但出現了一個問題

這是我創建的架構、我的問題陳述和我的測試數據:

架構:學生

CREATE TABLE Students (group_id text, sql_quotient float);
INSERT INTO Students(group_id, sql_quotient)
VALUES 
   ( 'A', 25 ),
   ( 'B', 30 ),
   ( 'C', 40 ),
   ( 'A', 35 ),
   ( 'B', 20 );

sql_quotient任務:顯示所有組的最大平均值。

  • group_id保證是 AZ 範圍內的單個字元。
  • 這裡對於 A 組,avg 是 30;對於 B,平均為 25;對於 C,avg 為 40;因此,應該顯示 40。

我嘗試了以下 2 個查詢;兩者都給了我正確的答案。

查詢 1

select max(round(b.avg_quotient,2)) as answer 
        from 
         (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY group_id) as b;

執行時間 = 0.002378 秒

查詢 2

select max(round(b.avg_quotient,2)) as answer 
    from 
     (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY substr(group_id,1,1) )as b;  

執行時間 = 0.000459 秒

區別 - 第一個查詢將數據分組group_id;第二個由`substr(group_id,1,1)。

由於第二個查詢應用了一個附加功能,我預計它需要更長的時間。但是,正如您在上面看到的,第 2 號查詢的執行時間明顯少於第 2 號查詢。1.

我的問題:為什麼查詢 2 的執行時間比查詢 1 低,即使查詢 2 有一個額外的函式 (substr())。

筆記:

  • 架構已定義。我不知道為什麼 Id 的數據類型是文本而不是 char(1),並且就我的問題而言,這無關緊要。

  • 我正在尋找這兩個查詢之間執行時差異的原因,而不是針對同一問題的另一個查詢。

  • 每次執行都會重新創建學生表,因此這不是第一次執行時必須從磁碟讀取數據,第二次執行時必須在記憶體中的情況。為了證明這一點,我又執行了四次查詢,執行 Query 2,然後是 Query 1,然後是 2,然後是 1。執行時間是:

    • 第二季度,第一:0.000493
    • 第一季度,第一:0.002779
    • 第二季度,第二:0.000499
    • 第一季度,第二季度:0.002787

不需要子字元串。無需計算max()

像這樣的東西,

SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;

+------+------------+
| id   | avg(price) |
+------+------------+
| C    |    40.0000 |
+------+------------+

這可能比必須計算最大值更簡單。

SELECT id, avg(price)
FROM Students
GROUP BY id
HAVING avg(price) = (
   SELECT max(avg)
   FROM (
       SELECT id, avg(price) AS avg
       FROM Students
       GROUP BY id
   ) AS t
);

計時

+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                           |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.00008716 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) |
|        2 | 0.00008053 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1                                                                      |
|        3 | 0.00011303 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) |
|        4 | 0.00006121 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1                                                                      |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+

我認為您不了解寒冷和炎熱的時期。再次執行相同的查詢。

我認為這個問題的整個基礎是錯誤的。它沒有更快。第一個表查找速度較慢,然後被記憶體。

MariaDB [test]> show profiles;
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                 |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.00366737 | select max(round(b.avg_price,2)) as answer 
            from 
             (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b          |
|        2 | 0.00080117 | select max(round(b.avg_price,2)) as answer 
        from 
         (SELECT AVG(price) as avg_price FROM 
   Students GROUP BY substr(id,1,1) )as b |
|        3 | 0.00010088 | select max(round(b.avg_price,2)) as answer 
            from 
             (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b          |
|        4 | 0.00015381 | select max(round(b.avg_price,2)) as answer 
        from 
         (SELECT AVG(price) as avg_price FROM 
   Students GROUP BY substr(id,1,1) )as b |
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+

你看不到堆命中和高畫質命中,因為 MySQL 很臭。但是在真實的數據庫中,

explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;;
                                                          QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.091..0.091 rows=1 loops=1)
  Output: id, (avg(price))
  Buffers: shared hit=3 read=1
  ->  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.090..0.090 rows=1 loops=1)
        Output: id, (avg(price))
        Sort Key: (avg(students.price)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=3 read=1
        ->  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.045..0.046 rows=3 loops=1)
              Output: id, avg(price)
              Group Key: students.id
              Buffers: shared read=1
              ->  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.027..0.028 rows=5 loops=1)
                    Output: id, price
                    Buffers: shared read=1
Planning time: 0.541 ms
Execution time: 0.332 ms
(17 rows)

test=# explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price)
FROM Students
GROUP BY id
ORDER BY avg(price) DESC
LIMIT 1;;
                                                          QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.059..0.059 rows=1 loops=1)
  Output: id, (avg(price))
  Buffers: shared hit=1
  ->  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.057..0.057 rows=1 loops=1)
        Output: id, (avg(price))
        Sort Key: (avg(students.price)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=1
        ->  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.037..0.039 rows=3 loops=1)
              Output: id, avg(price)
              Group Key: students.id
              Buffers: shared hit=1
              ->  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1)
                    Output: id, price
                    Buffers: shared hit=1
Planning time: 0.114 ms
Execution time: 0.152 ms
(17 rows)

觀察,使用 PostgreSQL,

  Buffers: shared hit=3 read=1

對比

  Buffers: shared hit=1

第一次執行必須讀取表,然後它有三個共享命中要 ram。第二次,它已經在 ram 中,而且一擊即中。

引用自:https://dba.stackexchange.com/questions/185514