Sqlite

隨機值重複,無法選擇N個隨機配對,子查詢不刷新 - sqlite

  • November 15, 2021

問題

使用 SQLite v3.35.4 和 v3.36.0 我有一個first_name表和一個surname包含常用名稱列表的表。我想在一個新表中生成 N 個配對。

我寫了這個遞歸查詢:

WITH RECURSIVE
cte(first_name, surname) AS (
   SELECT first_name, surname from ( -- always returns the same value
       select first_name, surname from (select first_name from first_name order by random() limit 1)
       join (select surname from surname order by random() limit 1)
   )
   UNION ALL
   SELECT first_name, surname
   FROM cte
   LIMIT 2000
)
SELECT first_name, surname FROM cte;

不幸的是,輸出看起來像這樣:

+------------+---------+
| first_name | surname |
+------------+---------+
| james      | smith   |
| james      | smith   |
| james      | smith   |
| ---------- | ------- |
| ...        | ...     |
+------------+---------+

我試過的

在查看 SQLite 文件後,我嘗試了遞歸 CTE 和子查詢展平部分NOT MATERIALIZED概述的幾個條件。我將隨機名稱選擇放在一個視圖中。然而,這些都沒有對結果產生積極影響。

有沒有辦法執行我正在嘗試做的事情?

*編輯

我嘗試了一個視窗函式並從 where 子句中隨機選擇名稱,但沒有成功:(其中 1998 是表的大小)

with recursive
r_first_name as (
   select first_name, ROW_NUMBER() over(order by random()) as rn from first_name
),
r_surname as (
   select surname, ROW_NUMBER() over(order by random()) as rn from surname
),
rcte(first_name, surname) as (
   select first_name, surname from r_first_name rf
   join r_surname rs on rs.rn = (select abs(random() % 1998))
   where rf.rn = (select abs(random() % 1998))
   union all
   select first_name, surname from rcte
   limit 3000
)
select * from rcte

!!!解決方案 !!!

在查看了類似問題的答案後。

我發現在 CTE 的遞歸方面,arandom()將成功更新。雖然不幸的是,它在嵌套在子查詢中時不會更新,但如果它位於 CTE 遞歸的“根”,我可以利用它來獲取隨機數。

以下是我開發的解決方案。它符合我的特定案例,並且與交叉連接相比性能相對較高:

WITH RECURSIVE
cte AS (
      select abs(random()) % (select count(*) from first_name) as first_name_num, abs(random()) % (select count(*) from surname) as surname_num
      union all
      select abs(random()) % (select count(*) from first_name) as first_name_num, abs(random()) % (select count(*) from surname) as surname_num from cte
       LIMIT 6000  
),
result as (
   select * from cte
   join (select first_name, ROW_NUMBER() over (order by random()) as rn from first_name) fn -- this is always the same result
   on cte.first_name_num = fn.rn
   join (select surname, ROW_NUMBER() over (order by random()) as rn from surname) sn -- this updates every loop around except subqueries are compiled/cached or something so they are unusable here if you want updated values
   on cte.surname_num = sn.rn
)
select first_name, surname from result

我想在一個新表中生成 N 個配對。 ”為什麼不簡單地做一個這樣的CROSS JOINwithLIMIT子句呢?

SELECT fn.first_name, sn.surname
FROM first_name fn
CROSS JOIN surname sn
LIMIT 2000;

這應該會產生一個不同的 2,000 行列表first_namesurname組合(假設您在每個表中都沒有重複)。這只是偽隨機的,因為結果是不確定的,因為沒有ORDER BY子句。但它比查詢要簡單得多,尤其是當您只需要載入一次表時。

您還可以添加一個帶有函式的ORDER BY子句來提高上述查詢的隨機性,如下所示:RANDOM()

SELECT fn.first_name, sn.surname
FROM first_name fn
CROSS JOIN surname sn
ORDER BY RANDOM()
LIMIT 2000;

我的直覺告訴我,這也不是完全隨機的,但應該比我的第一個查詢好得多,如果你確實需要一些隨機性的話。

注意我還建議不要將列命名為與它們所屬的表完全相同的名稱。這可能會在語法上造成混淆,並使可讀性更加困難。就我個人而言,我有一個names帶有欄位的表來表示它的名稱類型(名字與姓氏等)。但是,如果您想要兩個顯式表,那麼我只需將列命名為通用名稱,name因為當您選擇first_name.name.

引用自:https://dba.stackexchange.com/questions/302614