Postgresql
如何在子查詢中應用外部限制偏移量和過濾器,以避免對 Postgresql 中子查詢中使用的完整表進行分組
我有類似於以下的舊表:
員工
------------------------------------ | employee_id | name ------------------------------------ | 1 | David | 2 | Mathew ------------------------------------
工資單
------------------------------------- | employee_id | salary ------------------------------------- | 2 | 200000 | 3 | 90000 -------------------------------------
在加入和過濾後,我想獲得以下數據:
----------------------------------------------------------- | address_id | employee_id | address ----------------------------------------------------------- | 1 | 2 | street 1, NY | 2 | 2 | street 2, DC ------------------------------------------------------------
我有以下查詢:
SELECT employee_id, salary, address_arr FROM employee LEFT JOIN payroll on payroll.employee_id = employee.employee_id INNER JOIN ( SELECT employee_id, ARRAY_AGG(address) as address_arr FROM addresses GROUP BY employee_id ) table_address ON table_address.employee_id = employee.employee_id WHERE employee.employee_id < 1000000 LIMIT 100 OFFSET 0
上面的查詢給出了所需的輸出,但高度未優化,因為在
GROUP BY
用於JOIN
操作外部查詢之前,操作發生在完整的地址表上。請回答:
- 我們如何避免
GROUP BY
使用LIMIT OFFSET
外部查詢對完整的地址表進行操作?- 條件
WHERE employee.employee_id < 1000000
是在內部查詢操作之前還是之後應用於子GROUP BY
查詢。如果在 之後應用條件GROUP BY
,我們如何避免這種情況?
JOIN
注意:實際使用的查詢中有多個s和子查詢。
我不確定這是否真的更有效,但您可以嘗試加入應用限制的派生表。
select emp.employee_id, emp.salary, adr.address_arr from ( SELECT employee_id, salary, address_arr FROM employee LEFT JOIN payroll on payroll.employee_id = employee.employee_id WHERE employee.employee_id < 1000000 LIMIT 100 OFFSET 0 ) as emp JOIN ( SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr FROM addresses a GROUP BY employee_id ) as adr ON adr.employee_id = emp.employee_id;
第一個派生表只選擇 100 行,然後應該只對這 100 名員工進行連接/分組。
如果優化器沒有下推,您可以嘗試橫向連接來“強制”下推:
select emp.employee_id, emp.salary, adr.address_arr from ( SELECT employee_id, salary, address_arr FROM employee LEFT JOIN payroll on payroll.employee_id = employee.employee_id WHERE employee.employee_id < 1000000 LIMIT 100 OFFSET 0 ) as emp LATERAL JOIN ( SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr FROM addresses a WHERE a.employee_id = emp.employee_id GROUP BY employee_id ) as adr ON adr.employee_id = emp.employee_id;
加入條件並不是真正需要的,但也沒有什麼壞處
也許
SELECT employee.employee_id, payroll.salary, ARRAY_AGG(addresses.address) FROM employee INNER JOIN addresses ON addresses.employee_id = employee.employee_id LEFT JOIN payroll on payroll.employee_id = employee.employee_id WHERE employee.employee_id < 1000000 GROUP BY employee.employee_id LIMIT 100 OFFSET 0
?
而且 - 你真的需要在表中沒有適當記錄
payroll
導致 NULL 的記錄payroll.salary
嗎?也許,INNER JOIN 就足夠了?