Postgresql

為什麼 ORDER BY NULLS LAST 會影響主鍵上的查詢計劃?

  • December 4, 2019

使用 PostgreSQL 11,我有下表,其中包含大約 4.5 億行:

postgres=> \d+ sales
                                                            Table "public.sales"
             Column               |            Type             |             Modifiers              | Storage  | Stats target | Description
-----------------------------------+-----------------------------+------------------------------------+----------+--------------+-------------
created_terminal_id               | integer                     | not null                           | plain    |              |
company_id                        | integer                     | not null                           | plain    |              |
customer_id                       | integer                     |                                    | plain    |              |
sale_no                           | character varying(20)       | not null                           | extended |              |
sale_type                         | smallint                    | not null                           | plain    |              |
source_type                       | smallint                    | not null                           | plain    |              |
sale_date                         | timestamp without time zone | not null                           | plain    |              |
paid_amount                       | numeric(18,4)               | not null default 0.0000            | main     |              |
change_amount                     | numeric(18,4)               | not null default 0.0000            | main     |              |
cashup_id                         | integer                     |                                    | plain    |              |
staff_id                          | integer                     |                                    | plain    |              |
payment_terminal_id               | integer                     | not null                           | plain    |              |
site_id                           | integer                     | not null                           | plain    |              |
sale_id                           | integer                     | not null                           | plain    |              |
deleted                           | smallint                    | default 0                          | plain    |              |
is_tax_on                         | smallint                    | not null default 1                 | plain    |              |
props                             | text                        | not null                           | extended |              |
modified_time                     | timestamp without time zone | not null default CURRENT_TIMESTAMP | plain    |              |
sum_line_variation_ex_tax_price   | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_variation_inc_tax_price  | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_quantified_ex_tax_price  | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_quantified_inc_tax_price | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_subtotal_ex_tax_price    | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_subtotal_inc_tax_price   | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_total_ex_tax_price       | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_total_inc_tax_price      | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_cost_inc_tax_price       | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_line_cost_ex_tax_price        | numeric(18,4)               | not null default 0.0000            | main     |              |
sum_payment_tip_price             | numeric(18,4)               | not null default 0.0000            | main     |              |
order_variation_ex_tax_price      | numeric(18,4)               | not null default 0.0000            | main     |              |
order_variation_inc_tax_price     | numeric(18,4)               | not null default 0.0000            | main     |              |
order_total_ex_tax_price          | numeric(18,4)               | not null default 0.0000            | main     |              |
order_total_inc_tax_price         | numeric(18,4)               | not null default 0.0000            | main     |              |
order_tip_price                   | numeric(18,4)               | not null default 0.0000            | main     |              |
order_variation_is_percent        | smallint                    | not null default 0                 | plain    |              |
order_variation_percent           | numeric(18,4)               | not null default 1.0000            | main     |              |
order_type                        | smallint                    |                                    | plain    |              |
order_tip_is_percent              | smallint                    | not null default 0                 | plain    |              |
order_tip_percent                 | numeric(18,4)               | not null default 1.0000            | main     |              |
sale_date_id                      | integer                     | not null default 0                 | plain    |              |
voided_sale_id                    | integer                     |                                    | plain    |              |
voided_sale_date                  | timestamp without time zone |                                    | plain    |              |
sale_date_utc                     | timestamp without time zone |                                    | plain    |              |
foo                               | numeric(18,4)               |                                    | main     |              |
bar                               | numeric(18,4)               | not null default 0                 | main     |              |
Indexes:
   "sales_pkey" PRIMARY KEY, btree (sale_id)
   "idx_unique_sale" UNIQUE CONSTRAINT, btree (created_terminal_id, sale_date, sale_no)
   "idx_sale_cashup_id" btree (cashup_id)
   "idx_sale_customer_id" btree (customer_id)
   "idx_sale_modified_time" btree (modified_time)
   "idx_sale_payment_terminal_id" btree (payment_terminal_id)
   "idx_sale_site_date" btree (sale_date)
   "idx_sale_site_id" btree (site_id)
   "idx_sale_staff_id" btree (staff_id)
   "sales_company_id" btree (company_id)
   "sales_sale_date_id" btree (sale_date_id)
Has OIDs: no

執行以下查詢大約需要 35 分鐘:

postgres=> EXPLAIN
postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id  as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted
postgres-> FROM sales
postgres-> WHERE sales.deleted = 0
postgres->   AND sales.site_id = 72620
postgres->   AND order_total_inc_tax_price < 40
postgres->   AND sale_date > '2019-03-08'
postgres-> ORDER BY sale_id DESC
postgres-> LIMIT 50;
                                                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------
Limit  (cost=1000.60..76779.68 rows=50 width=189)
  ->  Gather Merge  (cost=1000.60..29806430.01 rows=19666 width=189)
        Workers Planned: 2
        ->  Parallel Index Scan Backward using sales_pkey on sales  (cost=0.57..29803160.04 rows=8194 width=189)
              Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (del
eted = 0) AND (site_id = 72620))
(5 rows)

但是,如果我改變ORDER BY sale_id DESC NULLS LAST我會得到一個完全不同的計劃和巨大的速度提升(查詢只需要幾秒鐘):

postgres=> EXPLAIN
postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id  as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted
postgres-> FROM sales
postgres-> WHERE sales.deleted = 0
postgres->   AND sales.site_id = 72620
postgres->   AND order_total_inc_tax_price < 40
postgres->   AND sale_date > '2019-03-08'
postgres-> ORDER BY sale_id DESC NULLS LAST
postgres-> LIMIT 50;
                                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
-----------
Limit  (cost=216622.14..216622.26 rows=50 width=189)
  ->  Sort  (cost=216622.14..216671.30 rows=19666 width=189)
        Sort Key: sale_id DESC NULLS LAST
        ->  Index Scan using idx_sale_site_id on sales  (cost=0.57..215968.85 rows=19666 width=189)
              Index Cond: (site_id = 72620)
              Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (del
eted = 0))
(6 rows)

我正在按不能包含 NULL 的 PRIMARY KEY 排序,那麼為什麼查詢計劃程序會做出這個選擇?

值得注意的是,此伺服器僅用於基準測試,數據庫沒有其他負載。ANALYZE在載入數據後也執行。

ORDER BY如果索引與 指定的順序相同,則索引只能用於處理子句而不進行排序ORDER BY

現在您的索引(預設情況下)是 sorted ASC NULLS LAST,並且由於可以雙向掃描索引,因此它可以同時支持ORDER BY sale_id ASC NULLS LASTORDER BY sale_id DESC NULLS FIRST。但是由於順序不同,所以不能支持ORDER BY sale_id DESC NULLS LAST.

規劃器不考慮NOT NULL列定義的 。這是在 中確定build_index_pathkeyssrc/backend/optimizer/path/pathkeys.c

   if (ScanDirectionIsBackward(scandir))
   {
       reverse_sort = !index->reverse_sort[i];
       nulls_first = !index->nulls_first[i];
   }
   else
   {
       reverse_sort = index->reverse_sort[i];
       nulls_first = index->nulls_first[i];
   }

   /*
    * OK, try to make a canonical pathkey for this sort key.  Note we're
    * underneath any outer joins, so nullable_relids should be NULL.
    */
   cpathkey = make_pathkey_from_sortinfo(root,
                                         indexkey,
                                         NULL,
                                         index->sortopfamily[i],
                                         index->opcintype[i],
                                         index->indexcollations[i],
                                         reverse_sort,
                                         nulls_first,
                                         0,
                                         index->rel->relids,
                                         false);

我不知道在這裡考慮可空性是多麼容易,但目前還沒有完成。

也許你可以向 pgsql-hackers 郵件列表建議。

引用自:https://dba.stackexchange.com/questions/254731