Postgresql

如何在子查詢中應用外部限制偏移量和過濾器,以避免對 Postgresql 中子查詢中使用的完整表進行分組

  • February 23, 2019

我有類似於以下的舊表:

員工

------------------------------------
|  employee_id  |   name    
------------------------------------
|  1            |   David    
|  2            |  Mathew  
------------------------------------

工資單

-------------------------------------
|  employee_id  | salary      
-------------------------------------
|   2           |  200000    
|   3           |  90000      
-------------------------------------

在加入和過濾後,我想獲得以下數據:

-----------------------------------------------------------
|  address_id  | employee_id  |  address        
-----------------------------------------------------------
|   1          |  2           |  street 1, NY  
|   2          |  2           |  street 2, DC  
------------------------------------------------------------

我有以下查詢:

SELECT employee_id, salary, address_arr 
FROM employee 
LEFT JOIN payroll on payroll.employee_id = employee.employee_id
INNER JOIN 
    (
      SELECT employee_id, ARRAY_AGG(address) as address_arr 
      FROM addresses 
      GROUP BY employee_id
    ) table_address ON table_address.employee_id = employee.employee_id
WHERE employee.employee_id < 1000000
LIMIT 100
OFFSET 0

上面的查詢給出了所需的輸出,但高度未優化,因為在GROUP BY用於JOIN操作外部查詢之前,操作發生在完整的地址表上。

請回答:

  1. 我們如何避免GROUP BY使用LIMIT OFFSET外部查詢對完整的地址表進行操作?
  2. 條件WHERE employee.employee_id < 1000000是在內部查詢操作之前還是之後應用於子GROUP BY查詢。如果在 之後應用條件GROUP BY,我們如何避免這種情況?

JOIN注意:實際使用的查詢中有多個s和子查詢。

我不確定這是否真的更有效,但您可以嘗試加入應用限制的派生表。

select emp.employee_id, emp.salary, adr.address_arr 
from (
 SELECT employee_id, salary, address_arr 
 FROM employee 
   LEFT JOIN payroll on payroll.employee_id = employee.employee_id
 WHERE employee.employee_id < 1000000
 LIMIT 100
 OFFSET 0
) as emp
 JOIN (
    SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr 
    FROM addresses a
    GROUP BY employee_id
 ) as adr ON adr.employee_id = emp.employee_id;

第一個派生表只選擇 100 行,然後應該只對這 100 名員工進行連接/分組。

如果優化器沒有下推,您可以嘗試橫向連接來“強制”下推:

select emp.employee_id, emp.salary, adr.address_arr 
from (
 SELECT employee_id, salary, address_arr 
 FROM employee 
   LEFT JOIN payroll on payroll.employee_id = employee.employee_id
 WHERE employee.employee_id < 1000000
 LIMIT 100
 OFFSET 0
) as emp
 LATERAL JOIN (
    SELECT a.employee_id, ARRAY_AGG(a.address) as address_arr 
    FROM addresses a
    WHERE a.employee_id = emp.employee_id
    GROUP BY employee_id
 ) as adr ON adr.employee_id = emp.employee_id;

加入條件並不是真正需要的,但也沒有什麼壞處

也許

SELECT employee.employee_id, payroll.salary, ARRAY_AGG(addresses.address)
FROM employee 
INNER JOIN addresses ON addresses.employee_id = employee.employee_id
LEFT JOIN payroll on payroll.employee_id = employee.employee_id
WHERE employee.employee_id < 1000000
GROUP BY employee.employee_id
LIMIT 100
OFFSET 0

?

而且 - 你真的需要在表中沒有適當記錄payroll導致 NULL 的記錄payroll.salary嗎?也許,INNER JOIN 就足夠了?

引用自:https://dba.stackexchange.com/questions/229214