Postgresql

PostgreSQL index caching

  • August 25, 2021

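I'm having difficulty finding "lay" explanations of how indexes are cached in PostgreSQL, so I'd like a reality check on any or all of these assumptions:

  1. PostgreSQL indexes, like rows, live on disk but may be cached.
  2. An index is either entirely in the cache or not in it at all.
  3. Whether it is cached depends on how often it is used (as decided by the query planner).
  4. For this reason, most "sensible" indexes will be in the cache all the time.
  5. Indexes live in the same cache as rows (the buffer cache?), so cache space used by an index is not available to rows.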

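My motivation for understanding this comes from another question I asked, where it was suggested that *partial indexes* could be used on tables where the majority of the data will never be accessed.

Before undertaking this, I'd like to be clear that using a partial index will yield two advantages:

  1. We reduce the size of the index in the cache, freeing up more cache space for the rows themselves.
  2. We reduce the size of the B-Tree, resulting in faster query response.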
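For readers unfamiliar with the term, a partial index is one created with a WHERE clause, so only the matching rows get index entries. The following is a minimal sketch only: the table `orders`, its columns, and the index name are hypothetical and do not come from the question.

```sql
-- Hypothetical table: most rows are closed orders that are rarely queried.
CREATE TABLE orders (
     id         bigserial PRIMARY KEY
   , status     text        NOT NULL
   , created_at timestamptz NOT NULL DEFAULT now()
);

-- Only rows with status = 'open' get an index entry, so the index
-- has fewer pages than a full index on created_at would have.
CREATE INDEX orders_open_created_idx
   ON orders (created_at)
   WHERE status = 'open';

-- On-disk size of the partial index, for comparison with a full index.
SELECT pg_size_pretty(pg_relation_size('orders_open_created_idx'));
```

The planner can only consider such an index for queries whose WHERE clause implies the index predicate (here, `status = 'open'`).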

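Playing a bit with pg_buffercache, I could get answers to some of your questions.

  1. This is quite obvious, but the results for **(5)** also show that the answer is YES.
  2. I have yet to set up a good example for this; for now it is more yes than no :) (See the edit below: the answer is NO.)
  3. Since the planner decides whether or not to use an index, we can say YES, it decides what gets cached (but it is more complicated than that).
  4. The exact details of caching could be derived from the source code. I couldn't find much on this topic, except this one (see also the author's answer). However, I'm pretty sure that this again is far more complicated than a simple YES or NO. (Again, the edit below gives some idea: since the cache size is limited, those "sensible" indexes compete for the available space. If there are too many of them, they will evict each other from the cache, so the answer is rather NO.)
  5. As a simple query against pg_buffercache shows, the answer is a definite YES. It is worth noting that temporary table data is not cached here.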
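One way to see point 5 directly is to group the shared buffers by relation kind. This is a sketch under assumptions: the pg_buffercache extension must be installed, the role must be allowed to read it, and the column aliases are my own.

```sql
-- Count shared buffers per relation kind for the current database.
-- relkind: 'r' = ordinary table, 'i' = index, 't' = TOAST table.
SELECT c.relkind
     , count(*) AS buffers
FROM
   pg_buffercache b
   INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relkind
ORDER BY buffers DESC;
```

Tables and indexes show up side by side in the same view, which is consistent with them sharing one pool of buffers.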

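EDIT

I've found Jeremiah Peschka's excellent article about table and index storage. With the information from there, I could answer **(2)** as well. I set up a small test, so you can check these yourself.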

-- we will need two extensions
CREATE EXTENSION pg_buffercache;
CREATE EXTENSION pageinspect;


-- a very simple test table
CREATE TABLE index_cache_test (
     id serial
   , blah text
);


-- I am a bit megalomaniac here, but I will use this for other purposes as well
INSERT INTO index_cache_test
SELECT i, i::text || 'a'
FROM generate_series(1, 1000000) a(i);


-- let's create the index to be cached
CREATE INDEX idx_cache_test ON index_cache_test (id);


-- now we can have a look at what is cached
SELECT c.relname,count(*) AS buffers
FROM 
   pg_class c 
   INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode 
   INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;

            relname              | buffers
----------------------------------+---------
index_cache_test                 |    2747
pg_statistic_relid_att_inh_index |       4
pg_operator_oprname_l_r_n_index  |       4
... (others are all pg_something, which are not interesting now)

-- this shows that the whole table is cached and our index is not in use yet

-- now we can check which row is where in our index
-- in the ctid column, the first number shows the page, so 
-- all rows starting with the same number are stored in the same page
SELECT * FROM bt_page_items('idx_cache_test', 1);

itemoffset |  ctid   | itemlen | nulls | vars |          data
------------+---------+---------+-------+------+-------------------------
         1 | (1,164) |      16 | f     | f    | 6f 01 00 00 00 00 00 00
         2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00
         3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00
         4 | (0,3)   |      16 | f     | f    | 03 00 00 00 00 00 00 00
         5 | (0,4)   |      16 | f     | f    | 04 00 00 00 00 00 00 00
         6 | (0,5)   |      16 | f     | f    | 05 00 00 00 00 00 00 00
...
        64 | (0,63)  |      16 | f     | f    | 3f 00 00 00 00 00 00 00
        65 | (0,64)  |      16 | f     | f    | 40 00 00 00 00 00 00 00

-- with the information obtained, we can write a query which is supposed to
-- touch only a single page of the index
EXPLAIN (ANALYZE, BUFFERS) 
   SELECT id 
   FROM index_cache_test 
   WHERE id BETWEEN 10 AND 20 ORDER BY id
;

Index Scan using idx_cache_test on index_cache_test  (cost=0.00..8.54 rows=9 width=4) (actual time=0.031..0.042 rows=11 loops=1)
  Index Cond: ((id >= 10) AND (id <= 20))
  Buffers: shared hit=4
Total runtime: 0.094 ms
(4 rows)

-- let's have a look at the cache again (the query remains the same as above)
            relname              | buffers
----------------------------------+---------
index_cache_test                 |    2747
idx_cache_test                   |       4
...

-- and compare it to a bigger index scan:
EXPLAIN (ANALYZE, BUFFERS) 
SELECT id 
   FROM index_cache_test 
   WHERE id <= 20000 ORDER BY id
;


Index Scan using idx_cache_test on index_cache_test  (cost=0.00..666.43 rows=19490 width=4) (actual time=0.072..19.921 rows=20000 loops=1)
  Index Cond: (id <= 20000)
  Buffers: shared hit=4 read=162
Total runtime: 24.967 ms
(4 rows)

-- this already shows that something was in the cache and further pages were read from disk
-- but to be sure, a final glance at cache contents:

            relname              | buffers
----------------------------------+---------
index_cache_test                 |    2691
idx_cache_test                   |      58

-- note that some of the table pages have disappeared from the cache
-- but, more importantly, a bigger part of our index is now cached

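Overall, this shows that indexes and tables are cached page by page, so the answer to **(2)** is NO.

And a final one to illustrate that temporary tables are not cached here: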
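To put a number on "page by page", the cached page count can be compared with the total number of pages in each relation. This is a sketch, assuming the test objects above exist under the names used in the CREATE statements; the column aliases are my own.

```sql
-- Cached pages vs. total pages (main fork only) for the test table and index.
SELECT c.relname
     , count(b.bufferid) AS cached_pages
     , pg_relation_size(c.oid) / current_setting('block_size')::int AS total_pages
     , round(100.0 * count(b.bufferid) * current_setting('block_size')::int
             / nullif(pg_relation_size(c.oid), 0), 1) AS pct_cached
FROM
   pg_class c
   LEFT JOIN pg_buffercache b
      ON  b.relfilenode = pg_relation_filenode(c.oid)
      AND b.relforknumber = 0   -- main fork, to match pg_relation_size
      AND b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
WHERE c.relname IN ('index_cache_test', 'idx_cache_test')
GROUP BY c.relname, c.oid;
```

A value of pct_cached strictly between 0 and 100 for the index would confirm that only part of it is in the buffer cache.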

CREATE TEMPORARY TABLE tmp_cache_test AS 
SELECT * FROM index_cache_test ORDER BY id FETCH FIRST 20000 ROWS ONLY;

EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM tmp_cache_test ORDER BY id;

-- checking the buffer cache now shows no sign of the temp table
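The check mentioned in the comment above can be written out explicitly. This is a sketch, assuming it runs in the same session that created tmp_cache_test. Temporary tables are buffered in session-local memory (bounded by the temp_buffers setting) rather than in shared buffers, which is why pg_buffercache does not list them, and why EXPLAIN (ANALYZE, BUFFERS) reports their pages as "local" instead of "shared".

```sql
-- Number of shared-buffer pages that belong to the temp table.
SELECT count(*) AS shared_pages
FROM pg_buffercache b
WHERE b.relfilenode = pg_relation_filenode('tmp_cache_test'::regclass)
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());

-- Upper limit on the session-local buffers used for temporary tables.
SHOW temp_buffers;
```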

Source: https://dba.stackexchange.com/questions/25513