Mysql
如何在 MySQL/MariaDB 中使用 FIRST_VALUE 視窗函式來獲取每組一行?
我很高興看到視窗函式登陸 MariaDB 10.2。我認為它們會很好地解決第一組問題,但我正在努力了解它們的效率。我有這樣的事情:
CREATE TABLE email (id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, contact_id INT UNSIGNED NOT NULL, /* FK */ email VARCHAR(200), is_primary TINYINT(1) UNSIGNED NOT NULL DEFAULT 0) ; INSERT INTO email (id, contact_id, email, is_primary) VALUES (1, 1, 'oldwilma@example.com', 0), (2, 1, 'currentwilma@example.com', 1), (3, 1, 'otherwilma@example.com', 0), (4, 2, '', 1), (5, 2, 'betty@example.com', 0), ;
我想要每個聯繫人的列表以及最適合他們的電子郵件。“最佳”被定義為:
is_primary
如果存在的話,更喜歡他們的。我要這個:
Contact ID Email ID Email -------------- -------------------- ------------------ 1 2 currentwilma@example.com 2 5 betty@example.com
使用視窗功能,我可以獲得最好的電子郵件,例如
SELECT contact_id, FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email, FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id FROM email WHERE email != '' ; +------------+--------------------------+---------------+ | contact_id | best_email | best_email_id | +------------+--------------------------+---------------+ | 1 | currentwilma@example.com | 2 | | 1 | currentwilma@example.com | 2 | | 1 | currentwilma@example.com | 2 | | 2 | betty@example.com | 5 | +------------+--------------------------+---------------+
但我注意到
- 使用n封有效(或至少非空)電子郵件,我得到n行輸出。
- 我不得不複製邏輯:
partition by contact_id
對於每個SELECT
; 這感覺效率低下:如果我在電子郵件表中有 12 個其他列需要其數據,我會執行 12 次,除非我只是在 ID 欄位上執行此操作,然後在該最佳 ID 上再次進行 INNER JOINED 電子郵件。所以為了得到我需要的東西,我最終是這樣的:
SELECT contact_id, MIN(best_email) best_email, MIN(best_email_id) best_email_id FROM ( SELECT contact_id, FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email, FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id FROM email WHERE email != '' ) q GROUP BY contact_id ;
但這感覺效率很低:
MIN()
需要檢查每一行,即使它們都是一樣的。我可以這樣做:SELECT contact_id, best_email, best_email_id FROM ( SELECT contact_id, row_number() OVER (PARTITION BY contact_id) r, FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email, FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id FROM email WHERE email != '' ) q WHERE q.r=1;
但它仍然感覺次優。
這似乎更有效:
SET @nth=0, @c=null; SELECT id, email FROM ( SELECT @nth := IF(@c = contact_id, @nth + 1, 1) r, id, email, @c:=contact_id dummy FROM email WHERE email != '' ORDER BY contact_id, is_primary DESC ) sq WHERE sq.r = 1;
我錯過了什麼嗎?也許這不是視窗函式的正確位置?
我不確定我是否理解這個問題,但我會試一試。這將滿足您的樣本數據和預期結果:
select contact_id, email_id as email_id, email from ( select contact_id, id as email_id, email , row_number() over (partition by contact_id order by is_primary desc) as rn from email where email <> '' ) as t where rn = 1;
由於您使用 row_number 列舉結果集,我不確定您為什麼需要 first_value。
正如您所指出的,FIRST_VALUE 不會以任何方式過濾結果集,它只是使用第一個值擴展每一行。