Query suddenly stopped working - simple left join but can't see what the problem is

January 29, 2021

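I have a query that is a simple left join between two tables, with IS NULL included in the WHERE clause because I need to show all rows of the left table, even when the right table gives NULL values for them.
This was working: I had it running in my PHP code and my website was displaying what it needed to. I hadn't looked at it for over a week, and when I went back today I found that it has suddenly stopped working, even though I haven't touched it.
I have created a db fiddle here with my exact code and tables - https://dbfiddle.uk/?rdbms=mariadb_10.4&fiddle=2effc82390641ce513806252700fd25c
I want to show all rows from the left table (level_quiz) and all rows from the right table (student_points) where student_no = 40204123 or where the row is NULL.
Can anyone take a look at this and see why it is not showing the extra row from the left table? (The right table would have NULL values.)
It would be much appreciated.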

You have to select the student before joining.

Also use aliases, so that you have less to type.

SELECT 
    *
FROM
    level_quiz
        LEFT JOIN
    (SELECT * FROM student_points WHERE student_no = 40204123) sp ON level_quiz.id = sp.level_id

id | level_title | quiz_desc | overall_quest | student_no | level_id | points | timestamp
-: | :---------- | :-------- | :------------ | ---------: | -------: | -----: | :--------
 1 | Sets | The purpose of this challenge is to help you become fully familiar with how to write particular types of sets and how to perform standard operations on actual sets. | Suppose we have three sets of numbers A, B, C, defined as follows: A = { 1, 10, 7, 3, 5, 2 }; B = { 5, 8, 6, 7, 4 }; C = { 7, 1, 8 } | 40204123 | 1 | 80 | 2021-01-12 15:37:11
 2 | Sequences | sequences desc | overall question 2 | 40204123 | 2 | 75 | 2021-01-12 15:38:06
 3 | Propositional Logic | logic desc | overall quest logic | 40204123 | 3 | 30 | 2021-01-13 22:13:13
 4 | Predicate Logic - Sets | pred desc 1 | predicase quest 1 | *null* | *null* | *null* | *null*

db<>fiddle here
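The effect of filtering before the join can be reproduced with a small sketch. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module rather than MariaDB, a cut-down copy of the two tables (a few columns only), and it assumes a second student, 40213894, who already has a level 4 row. The names where_rows and derived_rows are just for this example. With the filter in the WHERE clause, level 4 is joined to the other student's row, so no NULL-padded row is ever produced, and the WHERE clause then discards that joined row. Filtering student_points first leaves level 4 with nothing to match, so the LEFT JOIN pads it with NULLs.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE level_quiz (id INTEGER NOT NULL, level_title TEXT NOT NULL);
CREATE TABLE student_points (student_no INTEGER NOT NULL,
                             level_id   INTEGER NOT NULL,
                             points     INTEGER);
INSERT INTO level_quiz VALUES
 (1, 'Sets'), (2, 'Seqs'), (3, 'Prop Logic'), (4, 'Pred Logic');
INSERT INTO student_points VALUES
 (40204123, 1, 80), (40204123, 2, 75), (40204123, 3, 30),
 (40213894, 1, 90), (40213894, 2, 95), (40213894, 4, 100);
""")

# Filter applied AFTER the join: level 4 matches student 40213894,
# so it is never NULL-padded, and the WHERE clause removes it.
where_rows = conn.execute("""
SELECT lq.id, sp.student_no, sp.points
FROM level_quiz lq
LEFT JOIN student_points sp ON lq.id = sp.level_id
WHERE sp.student_no = 40204123 OR sp.student_no IS NULL
ORDER BY lq.id
""").fetchall()

# Filter applied BEFORE the join: level 4 has no partner row,
# so the LEFT JOIN keeps it with NULLs on the right-hand side.
derived_rows = conn.execute("""
SELECT lq.id, sp.student_no, sp.points
FROM level_quiz lq
LEFT JOIN (SELECT * FROM student_points WHERE student_no = 40204123) sp
  ON lq.id = sp.level_id
ORDER BY lq.id
""").fetchall()

print(where_rows)    # three rows, levels 1 to 3 only
print(derived_rows)  # four rows, level 4 carries None values
```

This would also explain how such a query can "suddenly" stop working without being changed: it only needs another student to record points for level 4.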

Your code is doing exactly what you asked it to do!

<TL;DR>

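There are three possible solutions to the problem outlined here:

add extra records with NULL entries to "force" the tables to JOIN,
rewrite the SQL (making it less performant - see the performance analysis section at the end),
change the schema (normalise it) - this is optimal IMHO - better performance and a better representation of reality.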

</TL;DR>

I adjusted your schema to be more to my taste (the fiddle for the first part of this analysis is available here):

CREATE TABLE level_quiz 
(
 id            INTEGER       NOT NULL,
 level_title   VARCHAR  (50) NOT NULL,
 quiz_desc     VARCHAR (200) NOT NULL,
 overall_quest VARCHAR (250) NOT NULL
);

CREATE TABLE student_points 
(
 student_no INTEGER   NOT NULL,
 level_id   INTEGER   NOT NULL,
 points     INTEGER   NULL,  -- << have to make NULLable, see below
 ts         TIMESTAMP NULL  -- << renamed timestamp to ts!
);

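Two points to note:

There is no point in declaring fields as INT(x), where x is a number, unless you need ZEROFILL (here, or here and here) - plus LPAD will do the same thing - plus it makes your code non-portable (see below).
You should never use an SQL keyword (TIMESTAMP in this case) as a table or column name - it is bad for debugging, produces confusing error messages and is generally poor practice.

To keep the results simple, I truncated the fields as follows: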

INSERT INTO level_quiz (id, level_title, quiz_desc, overall_quest) 
VALUES
(1, 'Sets',       'The purpose of...', 'Suppose we have... '),  -- << truncated strings
(2, 'Seqs',       'sequences desc...', 'overall question... '),
(3, 'Prop Logic', 'logic desc    ...', 'overall quest... '),
(4, 'Pred Logic', 'pred desc 1   ...', 'predicase quest...');

There are also two extra records, which I will INSERT later in my analysis.

INSERT INTO student_points (student_no, level_id, points, ts) 
VALUES
(12345678, 1, 80, '2021-01-15 16:07:43'),
(12345678, 2, 25, '2021-01-13 17:15:10'),
(12345678, 3, 90, '2021-01-17 22:41:55'),
(12345678, 4, 90, '2021-01-17 22:41:55'),

(40204123, 1, 80, '2021-01-12 15:37:11'),
(40204123, 2, 75, '2021-01-12 15:38:06'),
(40204123, 3, 30, '2021-01-13 22:13:13'),
-- (40204123, 4, NULL, NULL),           -- <<<  -- see below what happens when this 
                                               -- record is inserted
                                               
(40213894, 1, 90, '2021-01-14 21:52:00'),
(40213894, 2, 95, '2021-01-17 22:42:50'),
(40213894, 4, 100, '2021-01-17 22:42:50');
-- (40213894, 4, NULL, NULL),            -- <<< see below also

Now, your code:

SELECT *
FROM level_quiz 
LEFT JOIN student_points 
 ON level_quiz.id = student_points.level_id 
WHERE student_points.student_no = 40204123 
OR student_points.student_no IS NULL  -- <<-- Makes NO difference

Result (see the fiddle for better formatting):

id  level_title quiz_desc   overall_quest   student_no  level_id    points  ts
1  Sets    The purpose of...   Suppose we have...  40204123    1   80  2021-01-12 15:37:11
2  Seqs    sequences desc...   overall question...     40204123    2   75  2021-01-12 15:38:06
3  Prop Logic  logic desc    ...   overall quest...    40204123    3   30  2021-01-13 22:13:13

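However, you only get 3 records - there is no NULL row for quiz level 4 for student_no 40204123!

Now, when I get a strange result, my "go-to" reaction is to check what PostgreSQL does in the same circumstances. I have consistently found PostgreSQL to be superior to MySQL in virtually every respect.

So, rather than rushing off to report a bug to MySQL (good luck with that... - they have so many!), you should try checking on other servers - it is unlikely that a fundamental bug in something as basic as a LEFT JOIN would remain undiscovered for long! The result is here, and it can be seen that PostgreSQL returns the same data for the same query!

What's going on?

Well, let's now look at @nbk's answer.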

--
-- Solution proposed by nbk - NULLs in the result as desired!
--

SELECT 
 lq.id, lq.level_title, lq.quiz_desc, lq.overall_quest,
 sp.student_no, sp.level_id, sp.points, sp.ts
FROM
 level_quiz lq
LEFT JOIN
(
  SELECT * FROM student_points 
  WHERE student_no = 40204123
) sp
ON lq.id = sp.level_id;

Result (better viewed on the fiddle):

id  level_title quiz_desc   overall_quest   student_no  level_id    points  ts
1   Sets    The purpose of...   Suppose we have...  40204123    1   80  2021-01-12 15:37:11
2   Seqs    sequences desc...   overall question...     40204123    2   75  2021-01-12 15:38:06
3   Prop Logic  logic desc    ...   overall quest...    40204123    3   30  2021-01-13 22:13:13
4   Pred Logic  pred desc 1   ...   predicase quest...  NULL    NULL   NULL  NULL

So now we apparently get the correct result - with NULLs for the level 4 quiz! But now let's see what happens when we run the same queries for **2** students!

--
-- bb25's original SQL - with 2 students - but no NULLs in the result!
--

SELECT *
FROM level_quiz 
LEFT JOIN student_points 
 ON level_quiz.id = student_points.level_id 
WHERE student_points.student_no IN (40204123, 40213894) 
OR student_points.student_no IS NULL  -- << Makes NO difference!

Result:

id  level_title quiz_desc   overall_quest   student_no  level_id    points  ts
1   Sets    The purpose of...   Suppose we have...  40204123    1   80  2021-01-12 15:37:11
2   Seqs    sequences desc...   overall question...     40204123    2   75  2021-01-12 15:38:06
3   Prop Logic  logic desc    ...   overall quest...    40204123    3   30  2021-01-13 22:13:13
1   Sets    The purpose of...   Suppose we have...  40213894    1   90  2021-01-14 21:52:00
2   Seqs    sequences desc...   overall question...     40213894    2   95  2021-01-17 22:42:50
4   Pred Logic  pred desc 1   ...   predicase quest...  40213894    4   100 2021-01-17 22:42:50

Unexpectedly, there are no NULLs - we have the results of quizzes 1, 2 and 3 for student 40204123 and of quizzes 1, 2 and 4 for student 40213894.

Next, we revisit nbk's answer.

--
-- SQL proposed by nbk - with 2 students - but again no NULLs in the result!
--
SELECT 
 lq.id, lq.level_title, lq.quiz_desc, lq.overall_quest,
 sp.student_no, sp.level_id, sp.points, sp.ts
FROM
 level_quiz lq
LEFT JOIN
(
  SELECT * FROM student_points 
  WHERE student_no IN (40204123, 40213894)
) sp
ON lq.id = sp.level_id;

Result:

id  level_title quiz_desc   overall_quest   student_no  level_id    points  ts
1   Sets    The purpose of...   Suppose we have...  40204123    1   80  2021-01-12 15:37:11
2   Seqs    sequences desc...   overall question...     40204123    2   75  2021-01-12 15:38:06
3   Prop Logic  logic desc    ...   overall quest...    40204123    3   30  2021-01-13 22:13:13
1   Sets    The purpose of...   Suppose we have...  40213894    1   90  2021-01-14 21:52:00
2   Seqs    sequences desc...   overall question...     40213894    2   95  2021-01-17 22:42:50
4   Pred Logic  pred desc 1   ...   predicase quest...  40213894    4   100 2021-01-17 22:42:50

Again, there isn't a NULL to be seen anywhere! The result of @nbk's answer is the same as that of the OP's SQL.

Solution 1: - add some (2) records!

So, we do this:

INSERT INTO student_points VALUES
(40204123, 4, NULL, NULL),     -- <<<  NOW we INSERT these records! 
(40213894, 3, NULL, NULL);

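Now we have records for all quiz levels for all students - but obviously a student who hasn't completed a level can't have a score for it (points = NULL), nor can there be a timestamp for something that didn't happen (ts = NULL)!

So, basically, bb25's (i.e. the OP's) SQL works for this scenario (as does nbk's), and both pieces of SQL work for two students as well as for one - so adding these records solves the problem!

I show only the OP's original SQL (for 2 students) here - more is shown on the fiddle.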

--
-- bb25 original SQL - 2 students - NULLs NOW in the result
--

SELECT *
FROM level_quiz 
LEFT JOIN student_points 
 ON level_quiz.id = student_points.level_id 
WHERE student_points.student_no IN (40204123, 40213894) 
ORDER BY student_points.student_no, level_quiz.id;

Result (better viewed on the fiddle):

id  level_title quiz_desc   overall_quest   student_no  level_id    points  ts
1   Sets    The purpose of...   Suppose we have...  40204123    1   80  2021-01-12 15:37:11
2   Seqs    sequences desc...   overall question...     40204123    2   75  2021-01-12 15:38:06
3   Prop Logic  logic desc    ...   overall quest...    40204123    3   30  2021-01-13 22:13:13
4   Pred Logic  pred desc 1   ...   predicase quest...  40204123    4       
1   Sets    The purpose of...   Suppose we have...  40213894    1   90  2021-01-14 21:52:00
2   Seqs    sequences desc...   overall question...     40213894    2   95  2021-01-17 22:42:50
3   Prop Logic  logic desc    ...   overall quest...    40213894    3       
4   Pred Logic  pred desc 1   ...   predicase quest...  40213894    4   100 2021-01-17 22:42:50

Now we do have NULLs in the appropriate places.
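As a rough check of Solution 1, the sketch below adds the two placeholder rows and runs the original style of query for both students. It assumes Python's built-in sqlite3 module instead of MySQL or MariaDB and a cut-down copy of the tables with only a few columns. The variable name rows is purely illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/MariaDB (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE level_quiz (id INTEGER NOT NULL, level_title TEXT NOT NULL);
CREATE TABLE student_points (student_no INTEGER NOT NULL,
                             level_id   INTEGER NOT NULL,
                             points     INTEGER);
INSERT INTO level_quiz VALUES
 (1, 'Sets'), (2, 'Seqs'), (3, 'Prop Logic'), (4, 'Pred Logic');
INSERT INTO student_points VALUES
 (40204123, 1, 80), (40204123, 2, 75), (40204123, 3, 30),
 (40213894, 1, 90), (40213894, 2, 95), (40213894, 4, 100);
-- The two placeholder rows from Solution 1: a level exists, but no points yet.
INSERT INTO student_points VALUES
 (40204123, 4, NULL),
 (40213894, 3, NULL);
""")

# Every (student, level) pair now has a physical row, so a plain
# WHERE filter is enough and the missing levels show NULL points.
rows = conn.execute("""
SELECT lq.id, sp.student_no, sp.points
FROM level_quiz lq
LEFT JOIN student_points sp ON lq.id = sp.level_id
WHERE sp.student_no IN (40204123, 40213894)
ORDER BY sp.student_no, lq.id
""").fetchall()

for row in rows:
    print(row)
```

The price of this approach is that a placeholder row has to be maintained for every student and every level, including levels added later.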

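Solution 2 - change the SQL to work with the original dataset:

A better solution might be to actually have the SQL generate the required data **without** having to add supplementary records - esp. records with NULLs - which many consider to be problematic.

So, here I just perform a CROSS JOIN on the student_nos in the student_points table with the ids of the level_quiz table to obtain all possible combinations of students and quizzes...

So, first we DELETE the records that were inserted to make Solution 1 work.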

DELETE FROM student_points WHERE points IS NULL;

Then run this SQL:

SELECT distinct sp1.student_no, t1.id
FROM student_points sp1
CROSS JOIN 
(
 SELECT distinct lq.id 
 FROM level_quiz lq
) AS t1
ORDER BY sp1.student_no, t1.id;

Result:

student_no  id
12345678    1
12345678    2
12345678    3
12345678    4
40204123    1
40204123    2
40204123    3
40204123    4
40213894    1
40213894    2
40213894    3
40213894    4
12 rows

Then, we have to JOIN these records back to their original tables:

SELECT 
 t2.id, 
 SUBSTRING(lq2.level_title, 1, 6) AS "LT:", lq2.quiz_desc, lq2.overall_quest,
 t2.student_no, COALESCE(sp2.points, 0) AS "Points:", sp2.ts
FROM
(
 SELECT distinct sp1.student_no, t1.id
 FROM student_points sp1
 CROSS JOIN 
 (
   SELECT distinct lq1.id 
   FROM level_quiz lq1
 ) AS t1
) AS t2
LEFT JOIN student_points sp2
 ON t2.student_no = sp2.student_no
 AND t2.id = sp2.level_id
JOIN level_quiz lq2
 ON t2.id = lq2.id
WHERE t2.student_no IN (40204123, 40213894)
ORDER BY t2.student_no, t2.id;

Result:

id  LT: quiz_desc   overall_quest   student_no  Points: ts
1   Sets    The purpose of...   Suppose we have...  40204123    80  2021-01-12 15:37:11
2   Seqs    sequences desc...   overall question...     40204123    75  2021-01-12 15:38:06
3   Prop L  logic desc    ...   overall quest...    40204123    30  2021-01-13 22:13:13
4   Pred L  pred desc 1   ...   predicase quest...  40204123    0   
1   Sets    The purpose of...   Suppose we have...  40213894    90  2021-01-14 21:52:00
2   Seqs    sequences desc...   overall question...     40213894    95  2021-01-17 22:42:50
3   Prop L  logic desc    ...   overall quest...    40213894    0   
4   Pred L  pred desc 1   ...   predicase quest...  40213894    100 2021-01-17 22:42:50

We can see that, as a result of the COALESCE function, we have 0 points in place of NULL, but our missing records have now "reappeared".
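The CROSS JOIN approach can be tried end to end with the following sketch. It assumes Python's built-in sqlite3 module rather than MySQL, keeps only a few columns of each table, and simplifies the query slightly by cross joining level_quiz directly instead of going through a DISTINCT subquery on it. The variable name rows is illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE level_quiz (id INTEGER NOT NULL, level_title TEXT NOT NULL);
CREATE TABLE student_points (student_no INTEGER NOT NULL,
                             level_id   INTEGER NOT NULL,
                             points     INTEGER);
INSERT INTO level_quiz VALUES
 (1, 'Sets'), (2, 'Seqs'), (3, 'Prop Logic'), (4, 'Pred Logic');
-- Original data only: no placeholder rows.
INSERT INTO student_points VALUES
 (40204123, 1, 80), (40204123, 2, 75), (40204123, 3, 30),
 (40213894, 1, 90), (40213894, 2, 95), (40213894, 4, 100);
""")

# Step 1 (derived table t2): every student paired with every level.
# Step 2: LEFT JOIN the real points back on; missing pairs become 0.
rows = conn.execute("""
SELECT t2.id, t2.student_no, COALESCE(sp2.points, 0) AS points
FROM (
  SELECT DISTINCT sp1.student_no AS student_no, lq1.id AS id
  FROM student_points sp1
  CROSS JOIN level_quiz lq1
) AS t2
LEFT JOIN student_points sp2
  ON  t2.student_no = sp2.student_no
  AND t2.id = sp2.level_id
WHERE t2.student_no IN (40204123, 40213894)
ORDER BY t2.student_no, t2.id
""").fetchall()

for row in rows:
    print(row)
```

Note that the list of students is derived from student_points itself, so a student with no rows at all in that table would still be missing from the output.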

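Solution 3: redesign the schema:

If, say, we had a student who hadn't taken any quizzes (thinking back to my own student days, this is eminently possible!), how would we handle that situation?

We can improve the schema (fiddle here).

Relations (tables) are entities (things) - my (particularly brilliant) synopsis of relational theory! :-). Now, a quiz is a "thing", which means that it has to correspond to a relation (i.e. a table) in our relational database. A student is also a "thing" - so student needs a table.

The "tricky" bit is that the relationship between the student and the quiz is also a "thing", and should therefore be a table! Entities such as this are called Associative Entities and their corresponding tables are called associative tables - but more commonly joining or linking tables (in fact, there are 17 names for them at the link).

New schema:

So, my own suggestion would be that you do the following: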

CREATE TABLE student 
(
 s_id INTEGER NOT NULL,
 s_name VARCHAR (20) NOT NULL,
 CONSTRAINT student_pk PRIMARY KEY (s_id)
);

CREATE TABLE quiz 
(
 q_id         INTEGER       NOT NULL,
 q_title   VARCHAR  (50) NOT NULL,
 CONSTRAINT ql_pk PRIMARY KEY (q_id)
);

CREATE TABLE student_score
(
 ss_s_id  INTEGER   NOT NULL,
 ss_q_id INTEGER   NOT NULL,
 score   INTEGER   NOT NULL,
 ts    TIMESTAMP NOT NULL,

 CONSTRAINT sp_pk PRIMARY KEY (ss_s_id, ss_q_id),
 CONSTRAINT sp_s_no_fk FOREIGN KEY (ss_s_id)  REFERENCES student (s_id),
 CONSTRAINT sp_ql_id   FOREIGN KEY (ss_q_id) REFERENCES quiz (q_id)
);

This answer is getting rather long, so I'll just give the final SQL here (some of the intermediate steps are shown in the fiddle):

SELECT 
 q.q_id, q.q_title,
 s.s_id, s.s_name, COALESCE(ss.score, 0) AS score
FROM quiz q
CROSS JOIN student s
LEFT JOIN student_score ss
 ON ss.ss_s_id = s.s_id
 AND ss.ss_q_id = q.q_id
ORDER BY s.s_id, q.q_id;

Result (note the 4 0s for Student 4!):

q_id    q_title s_id    s_name  score
1   Quiz 1  12345678    Student1_name   80
2   Quiz 2  12345678    Student1_name   25
3   Quiz 3  12345678    Student1_name   90
4   Quiz 4  12345678    Student1_name   90
1   Quiz 1  40204123    Student2_name   80
2   Quiz 2  40204123    Student2_name   75
3   Quiz 3  40204123    Student2_name   30
4   Quiz 4  40204123    Student2_name   0
1   Quiz 1  40213894    Student3_name   90
2   Quiz 2  40213894    Student3_name   95
3   Quiz 3  40213894    Student3_name   0
4   Quiz 4  40213894    Student3_name   100
1   Quiz 1  98765432    Student4_name   0
2   Quiz 2  98765432    Student4_name   0
3   Quiz 3  98765432    Student4_name   0
4   Quiz 4  98765432    Student4_name   0
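A compact way to try the normalised schema is sketched below. It assumes Python's built-in sqlite3 module instead of MySQL or PostgreSQL, leaves out the ts column and the named constraints for brevity, and loads only two of the students (one with scores, and Student 4 with none). The variable name rows is illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (s_id INTEGER NOT NULL PRIMARY KEY,
                      s_name TEXT NOT NULL);
CREATE TABLE quiz (q_id INTEGER NOT NULL PRIMARY KEY,
                   q_title TEXT NOT NULL);
-- Associative (joining) table: one row per quiz a student has actually taken.
CREATE TABLE student_score (
  ss_s_id INTEGER NOT NULL REFERENCES student (s_id),
  ss_q_id INTEGER NOT NULL REFERENCES quiz (q_id),
  score   INTEGER NOT NULL,
  PRIMARY KEY (ss_s_id, ss_q_id)
);
INSERT INTO student VALUES
 (40204123, 'Student2_name'), (98765432, 'Student4_name');
INSERT INTO quiz VALUES
 (1, 'Quiz 1'), (2, 'Quiz 2'), (3, 'Quiz 3'), (4, 'Quiz 4');
INSERT INTO student_score VALUES
 (40204123, 1, 80), (40204123, 2, 75), (40204123, 3, 30);
""")

# Students and quizzes are now independent tables, so the CROSS JOIN
# covers students who have not taken a single quiz.
rows = conn.execute("""
SELECT q.q_id, s.s_id, COALESCE(ss.score, 0) AS score
FROM quiz q
CROSS JOIN student s
LEFT JOIN student_score ss
  ON  ss.ss_s_id = s.s_id
  AND ss.ss_q_id = q.q_id
ORDER BY s.s_id, q.q_id
""").fetchall()

for row in rows:
    print(row)
```

Because score is NOT NULL here, an untaken quiz is represented by the absence of a row rather than by a NULL placeholder, and the 0 comes only from COALESCE at query time.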

Performance analysis:

Using MySQL 8's EXPLAIN ANALYZE functionality, we see that the working SQL with the old schema produces the following plan (see the fiddle here):

EXPLAIN
-> Sort: t2.student_no, t2.id  (actual time=0.184..0.185 rows=8 loops=1)
   -> Stream results  (cost=32.42 rows=320) (actual time=0.133..0.170 rows=8 loops=1)
       -> Left hash join (sp2.level_id = lq2.id), (sp2.student_no = t2.student_no)  (cost=32.42 rows=320) (actual time=0.125..0.148 rows=8 loops=1)
           -> Nested loop inner join  (cost=5.85 rows=32) (actual time=0.082..0.100 rows=8 loops=1)
               -> Table scan on lq2  (cost=0.65 rows=4) (actual time=0.005..0.015 rows=4 loops=1)
               -> Index lookup on t2 using <auto_key2> (id=lq2.id)  (actual time=0.001..0.002 rows=2 loops=4)
                   -> Materialize  (cost=4.60 rows=8) (actual time=0.020..0.021 rows=2 loops=4)
                       -> Table scan on <temporary>  (actual time=0.000..0.001 rows=8 loops=1)
                           -> Temporary table with deduplication  (cost=4.60 rows=8) (actual time=0.062..0.063 rows=8 loops=1)
                               -> Inner hash join (no condition)  (cost=4.60 rows=8) (actual time=0.046..0.049 rows=24 loops=1)
                                   -> Table scan on t1  (cost=1.48 rows=4) (actual time=0.000..0.001 rows=4 loops=1)
                                       -> Materialize  (cost=0.65 rows=4) (actual time=0.021..0.022 rows=4 loops=1)
                                           -> Table scan on <temporary>  (actual time=0.000..0.001 rows=4 loops=1)
                                               -> Temporary table with deduplication  (cost=0.65 rows=4) (actual time=0.016..0.017 rows=4 loops=1)
                                                   -> Table scan on lq1  (cost=0.65 rows=4) (actual time=0.004..0.008 rows=4 loops=1)
                                   -> Hash
                                       -> Filter: (sp1.student_no in (40204123,40213894))  (cost=1.25 rows=2) (actual time=0.010..0.016 rows=6 loops=1)
                                           -> Table scan on sp1  (cost=1.25 rows=10) (actual time=0.005..0.014 rows=10 loops=1)
           -> Hash
               -> Table scan on sp2  (cost=0.16 rows=10) (actual time=0.015..0.026 rows=10 loops=1)

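The same SQL run with PostgreSQL's EXPLAIN (ANALYZE, BUFFERS, COSTS, TIMING) functionality shows a very complex plan (see the bottom of this fiddle).

The plan for the SQL with the revised schema is as follows (see the bottom of the fiddle):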

EXPLAIN
-> Nested loop left join  (cost=2.05 rows=4) (actual time=0.018..0.031 rows=4 loops=1)
   -> Table scan on q  (cost=0.65 rows=4) (actual time=0.011..0.015 rows=4 loops=1)
   -> Single-row index lookup on ss using PRIMARY (ss_s_id=40204123, ss_q_id=q.q_id)  (cost=0.28 rows=1) (actual time=0.003..0.003 rows=1 loops=4)

And checking PostgreSQL here:

QUERY PLAN
Hash Left Join  (cost=14.64..46.31 rows=540 width=188) (actual time=0.077..0.083 rows=4 loops=1)
 Hash Cond: ((s.s_id = ss.ss_s_id) AND (q.q_id = ss.ss_q_id))
 Buffers: shared hit=5
 ->  Nested Loop  (cost=0.15..28.97 rows=540 width=184) (actual time=0.032..0.035 rows=4 loops=1)
       Buffers: shared hit=3
       ->  Index Scan using student_pk on student s  (cost=0.15..8.17 rows=1 width=62) (actual time=0.020..0.021 rows=1 loops=1)
             Index Cond: (s_id = 40204123)
             Buffers: shared hit=2
       ->  Seq Scan on quiz q  (cost=0.00..15.40 rows=540 width=122) (actual time=0.008..0.009 rows=4 loops=1)
             Buffers: shared hit=1
 ->  Hash  (cost=14.37..14.37 rows=8 width=12) (actual time=0.027..0.027 rows=3 loops=1)
       Buckets: 1024  Batches: 1  Memory Usage: 9kB
       Buffers: shared hit=2
       ->  Bitmap Heap Scan on student_score ss  (cost=4.21..14.37 rows=8 width=12) (actual time=0.017..0.019 rows=3 loops=1)
             Recheck Cond: (ss_s_id = 40204123)
             Heap Blocks: exact=1
             Buffers: shared hit=2
             ->  Bitmap Index Scan on sp_pk  (cost=0.00..4.21 rows=8 width=0) (actual time=0.006..0.007 rows=3 loops=1)
                   Index Cond: (ss_s_id = 40204123)
                   Buffers: shared hit=1
Planning Time: 0.254 ms
Execution Time: 0.188 ms
22 rows

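So, the moral of the story appears to be that a well-normalised schema produces the correct result, or at least makes the correct result easier to obtain and more performant! Good to know!

Quoted from: https://dba.stackexchange.com/questions/283812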
