Postgresql
為什麼 ORDER BY NULLS LAST 會影響主鍵上的查詢計劃?
使用 PostgreSQL 11,我有下表,其中包含大約 4.5 億行:
postgres=> \d+ sales Table "public.sales" Column | Type | Modifiers | Storage | Stats target | Description -----------------------------------+-----------------------------+------------------------------------+----------+--------------+------------- created_terminal_id | integer | not null | plain | | company_id | integer | not null | plain | | customer_id | integer | | plain | | sale_no | character varying(20) | not null | extended | | sale_type | smallint | not null | plain | | source_type | smallint | not null | plain | | sale_date | timestamp without time zone | not null | plain | | paid_amount | numeric(18,4) | not null default 0.0000 | main | | change_amount | numeric(18,4) | not null default 0.0000 | main | | cashup_id | integer | | plain | | staff_id | integer | | plain | | payment_terminal_id | integer | not null | plain | | site_id | integer | not null | plain | | sale_id | integer | not null | plain | | deleted | smallint | default 0 | plain | | is_tax_on | smallint | not null default 1 | plain | | props | text | not null | extended | | modified_time | timestamp without time zone | not null default CURRENT_TIMESTAMP | plain | | sum_line_variation_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_variation_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_quantified_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_quantified_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_subtotal_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_subtotal_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_total_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_total_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_cost_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_line_cost_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | sum_payment_tip_price | numeric(18,4) | not null default 0.0000 | main | | order_variation_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | order_variation_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | order_total_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | | order_total_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | | order_tip_price | numeric(18,4) | not null default 0.0000 | main | | order_variation_is_percent | smallint | not null default 0 | plain | | order_variation_percent | numeric(18,4) | not null default 1.0000 | main | | order_type | smallint | | plain | | order_tip_is_percent | smallint | not null default 0 | plain | | order_tip_percent | numeric(18,4) | not null default 1.0000 | main | | sale_date_id | integer | not null default 0 | plain | | voided_sale_id | integer | | plain | | voided_sale_date | timestamp without time zone | | plain | | sale_date_utc | timestamp without time zone | | plain | | foo | numeric(18,4) | | main | | bar | numeric(18,4) | not null default 0 | main | | Indexes: "sales_pkey" PRIMARY KEY, btree (sale_id) "idx_unique_sale" UNIQUE CONSTRAINT, btree (created_terminal_id, sale_date, sale_no) "idx_sale_cashup_id" btree (cashup_id) "idx_sale_customer_id" btree (customer_id) "idx_sale_modified_time" btree (modified_time) "idx_sale_payment_terminal_id" btree (payment_terminal_id) "idx_sale_site_date" btree (sale_date) "idx_sale_site_id" btree (site_id) "idx_sale_staff_id" btree (staff_id) "sales_company_id" btree (company_id) "sales_sale_date_id" btree (sale_date_id) Has OIDs: no
執行以下查詢大約需要 35 分鐘:
postgres=> EXPLAIN postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted postgres-> FROM sales postgres-> WHERE sales.deleted = 0 postgres-> AND sales.site_id = 72620 postgres-> AND order_total_inc_tax_price < 40 postgres-> AND sale_date > '2019-03-08' postgres-> ORDER BY sale_id DESC postgres-> LIMIT 50; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------- Limit (cost=1000.60..76779.68 rows=50 width=189) -> Gather Merge (cost=1000.60..29806430.01 rows=19666 width=189) Workers Planned: 2 -> Parallel Index Scan Backward using sales_pkey on sales (cost=0.57..29803160.04 rows=8194 width=189) Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (del eted = 0) AND (site_id = 72620)) (5 rows)
但是,如果我改變
ORDER BY sale_id DESC NULLS LAST
我會得到一個完全不同的計劃和巨大的速度提升(查詢只需要幾秒鐘):postgres=> EXPLAIN postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted postgres-> FROM sales postgres-> WHERE sales.deleted = 0 postgres-> AND sales.site_id = 72620 postgres-> AND order_total_inc_tax_price < 40 postgres-> AND sale_date > '2019-03-08' postgres-> ORDER BY sale_id DESC NULLS LAST postgres-> LIMIT 50; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- ----------- Limit (cost=216622.14..216622.26 rows=50 width=189) -> Sort (cost=216622.14..216671.30 rows=19666 width=189) Sort Key: sale_id DESC NULLS LAST -> Index Scan using idx_sale_site_id on sales (cost=0.57..215968.85 rows=19666 width=189) Index Cond: (site_id = 72620) Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (del eted = 0)) (6 rows)
我正在按不能包含 NULL 的 PRIMARY KEY 排序,那麼為什麼查詢計劃程序會做出這個選擇?
值得注意的是,此伺服器僅用於基準測試,數據庫沒有其他負載。
ANALYZE
在載入數據後也執行。
ORDER BY
如果索引與 指定的順序相同,則索引只能用於處理子句而不進行排序ORDER BY
。現在您的索引(預設情況下)是 sorted
ASC NULLS LAST
,並且由於可以雙向掃描索引,因此它可以同時支持ORDER BY sale_id ASC NULLS LAST
和ORDER BY sale_id DESC NULLS FIRST
。但是由於順序不同,所以不能支持ORDER BY sale_id DESC NULLS LAST
.規劃器不考慮
NOT NULL
列定義的 。這是在 中確定build_index_pathkeys
的src/backend/optimizer/path/pathkeys.c
:if (ScanDirectionIsBackward(scandir)) { reverse_sort = !index->reverse_sort[i]; nulls_first = !index->nulls_first[i]; } else { reverse_sort = index->reverse_sort[i]; nulls_first = index->nulls_first[i]; } /* * OK, try to make a canonical pathkey for this sort key. Note we're * underneath any outer joins, so nullable_relids should be NULL. */ cpathkey = make_pathkey_from_sortinfo(root, indexkey, NULL, index->sortopfamily[i], index->opcintype[i], index->indexcollations[i], reverse_sort, nulls_first, 0, index->rel->relids, false);
我不知道在這裡考慮可空性是多麼容易,但目前還沒有完成。
也許你可以向 pgsql-hackers 郵件列表建議。