Mysql
Substr() 與 Group by 子句
我一直在努力解決這裡顯示的問題,但出現了一個問題
這是我創建的架構、我的問題陳述和我的測試數據:
架構:學生
CREATE TABLE Students (group_id text, sql_quotient float); INSERT INTO Students(group_id, sql_quotient) VALUES ( 'A', 25 ), ( 'B', 30 ), ( 'C', 40 ), ( 'A', 35 ), ( 'B', 20 );
sql_quotient
任務:顯示所有組的最大平均值。
group_id
保證是 AZ 範圍內的單個字元。- 這裡對於 A 組,avg 是 30;對於 B,平均為 25;對於 C,avg 為 40;因此,應該顯示 40。
我嘗試了以下 2 個查詢;兩者都給了我正確的答案。
查詢 1
select max(round(b.avg_quotient,2)) as answer from (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY group_id) as b;
執行時間 = 0.002378 秒
查詢 2
select max(round(b.avg_quotient,2)) as answer from (SELECT AVG(sql_quotient) as avg_quotient FROM Students GROUP BY substr(group_id,1,1) )as b;
執行時間 = 0.000459 秒
區別 - 第一個查詢將數據分組
group_id
;第二個由`substr(group_id,1,1)。由於第二個查詢應用了一個附加功能,我預計它需要更長的時間。但是,正如您在上面看到的,第 2 號查詢的執行時間明顯少於第 2 號查詢。1.
我的問題:為什麼查詢 2 的執行時間比查詢 1 低,即使查詢 2 有一個額外的函式 (substr())。
筆記:
架構已定義。我不知道為什麼 Id 的數據類型是文本而不是 char(1),並且就我的問題而言,這無關緊要。
我正在尋找這兩個查詢之間執行時差異的原因,而不是針對同一問題的另一個查詢。
每次執行都會重新創建學生表,因此這不是第一次執行時必須從磁碟讀取數據,第二次執行時必須在記憶體中的情況。為了證明這一點,我又執行了四次查詢,執行 Query 2,然後是 Query 1,然後是 2,然後是 1。執行時間是:
- 第二季度,第一:0.000493
- 第一季度,第一:0.002779
- 第二季度,第二:0.000499
- 第一季度,第二季度:0.002787
不需要子字元串。無需計算
max()
像這樣的東西,
SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1; +------+------------+ | id | avg(price) | +------+------------+ | C | 40.0000 | +------+------------+
這可能比必須計算最大值更簡單。
SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id ) AS t );
計時
+----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | 0.00008716 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) | | 2 | 0.00008053 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1 | | 3 | 0.00011303 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t ) | | 4 | 0.00006121 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1 | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
我認為您不了解寒冷和炎熱的時期。再次執行相同的查詢。
我認為這個問題的整個基礎是錯誤的。它沒有更快。第一個表查找速度較慢,然後被記憶體。
MariaDB [test]> show profiles; +----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | 0.00366737 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b | | 2 | 0.00080117 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY substr(id,1,1) )as b | | 3 | 0.00010088 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b | | 4 | 0.00015381 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY substr(id,1,1) )as b | +----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
你看不到堆命中和高畫質命中,因為 MySQL 很臭。但是在真實的數據庫中,
explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1;; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Limit (cost=31.50..31.50 rows=1 width=40) (actual time=0.091..0.091 rows=1 loops=1) Output: id, (avg(price)) Buffers: shared hit=3 read=1 -> Sort (cost=31.50..32.00 rows=200 width=40) (actual time=0.090..0.090 rows=1 loops=1) Output: id, (avg(price)) Sort Key: (avg(students.price)) DESC Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=3 read=1 -> HashAggregate (cost=28.00..30.50 rows=200 width=40) (actual time=0.045..0.046 rows=3 loops=1) Output: id, avg(price) Group Key: students.id Buffers: shared read=1 -> Seq Scan on public.students (cost=0.00..22.00 rows=1200 width=40) (actual time=0.027..0.028 rows=5 loops=1) Output: id, price Buffers: shared read=1 Planning time: 0.541 ms Execution time: 0.332 ms (17 rows) test=# explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1;; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Limit (cost=31.50..31.50 rows=1 width=40) (actual time=0.059..0.059 rows=1 loops=1) Output: id, (avg(price)) Buffers: shared hit=1 -> Sort (cost=31.50..32.00 rows=200 width=40) (actual time=0.057..0.057 rows=1 loops=1) Output: id, (avg(price)) Sort Key: (avg(students.price)) DESC Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=1 -> HashAggregate (cost=28.00..30.50 rows=200 width=40) (actual time=0.037..0.039 rows=3 loops=1) Output: id, avg(price) Group Key: students.id Buffers: shared hit=1 -> Seq Scan on public.students (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1) Output: id, price Buffers: shared hit=1 Planning time: 0.114 ms Execution time: 0.152 ms (17 rows)
觀察,使用 PostgreSQL,
Buffers: shared hit=3 read=1
對比
Buffers: shared hit=1
第一次執行必須讀取表,然後它有三個共享命中要 ram。第二次,它已經在 ram 中,而且一擊即中。