PostgreSQL

levenshtein function returns different results depending on the LIMIT value

  • October 6, 2020

The levenshtein function is not working the way I expected. Is there something I am not understanding?

Here is the query:

SELECT c0.id, c0.engine_type, c0.mpg, c0.kwh, c0.price, c0.make, c0.model, c0.vin, c0.inserted_at, c0.updated_at 
FROM cars AS c0 ORDER BY LEAST(levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')) 
LIMIT 5

Running this query returns the following data:

 id | engine_type | mpg | kwh | price | make    | model   | vin               | inserted_at         | updated_at
----+-------------+-----+-----+-------+---------+---------+-------------------+---------------------+---------------------
  5 | electric    |     | 257 | 32288 | Toyota  | Camry   | SW081452D50423138 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 20 | gasoline    |  83 |     | 68851 | Toyota  | Camry   | 643VN327D4ZH04928 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  4 | gasoline    |  74 |     | 74482 | Toyota  | Corolla | 1K48R780410S27945 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 10 | gasoline    |  73 |     | 87040 | Dodge   | Ram     | J22782VG240639409 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  3 | electric    |     | 116 | 66560 | Audi    | A5      | 94V5772ZB4BJ23179 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27

As you can see above, the first two matches are Toyota Camrys, which is what I expect from the query. However, when I change the LIMIT to 10, for example,

SELECT c0.id, c0.engine_type, c0.mpg, c0.kwh, c0.price, c0.make, c0.model, c0.vin, c0.inserted_at, c0.updated_at 
FROM cars AS c0 ORDER BY LEAST(levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')) 
LIMIT 10

I get different results:

 id | engine_type | mpg | kwh | price | make    | model   | vin               | inserted_at         | updated_at
----+-------------+-----+-----+-------+---------+---------+-------------------+---------------------+---------------------
  4 | gasoline    |  74 |     | 74482 | Toyota  | Corolla | 1K48R780410S27945 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  5 | electric    |     | 257 | 32288 | Toyota  | Camry   | SW081452D50423138 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 20 | gasoline    |  83 |     | 68851 | Toyota  | Camry   | 643VN327D4ZH04928 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 10 | gasoline    |  73 |     | 87040 | Dodge   | Ram     | J22782VG240639409 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  7 | electric    |     | 274 | 41661 | Dodge   | Charger | FDND794KFW0179068 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  8 | gasoline    |  57 |     | 42369 | BMW     | M3      | NS7V3N1VW5J508253 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
  9 | electric    |     | 214 | 15710 | BMW     | X5      | 3VUFCG07ATW125829 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 11 | electric    |     | 417 | 63167 | Nissan  | Juke    | 6800ULHC7H0857158 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 12 | gasoline    |  78 |     | 21059 | Lincoln | MKX     | AFUCF3SUG6W287040 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27
 13 | electric    |     | 348 | 93954 | Lincoln | MKS     | 2L64A6Z18XR348145 | 2020-10-06 14:48:27 | 2020-10-06 14:48:27

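Here, for reasons I cannot understand, the Toyota Corolla is ranked ahead of the two Camrys. When I am explicitly searching for "Toyota Camry" and limiting the number of rows returned to 10, why does "Toyota Corolla" have a smaller Levenshtein distance than "Toyota Camry"?

Any idea why?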

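The least() function returns the smaller of its two arguments. All three Toyota rows match 'Toyota' exactly, which is a distance of 0, so the final evaluated ORDER BY value for each of those 3 rows is 0. Rows that tie on the sort key have no guaranteed relative order, and changing the LIMIT can change how the sort is carried out, so the tied rows may come back in a different order.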
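To see the tie directly, the distances can be selected alongside the rows. This is only a diagnostic sketch against the `cars` table from the question; the column aliases `model_dist`, `make_dist`, and `sort_key` are names made up for illustration, and the trailing `c0.id` is added here only so the output is repeatable.

```sql
-- levenshtein() comes from the fuzzystrmatch extension.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT c0.id, c0.make, c0.model,
       levenshtein(c0.model, 'Camry') AS model_dist,  -- hypothetical alias
       levenshtein(c0.make, 'Toyota') AS make_dist,   -- hypothetical alias
       LEAST(levenshtein(c0.model, 'Camry'),
             levenshtein(c0.make, 'Toyota')) AS sort_key  -- the value the original ORDER BY sorts on
FROM cars AS c0
ORDER BY sort_key, c0.id
LIMIT 10;
```

Every row whose make is 'Toyota' should show `sort_key` = 0 regardless of its model, which is why the original ORDER BY cannot tell a Corolla from a Camry.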

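Adding each of the arguments to least() as an additional ORDER BY expression should give you what you are after: ORDER BY LEAST(levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')), levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota') returns all Camrys first, followed by any non-Camry Toyotas, then the rows whose least() expression evaluates to higher than 0.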
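Written out as a full statement, the suggestion might look like the sketch below. Note that the third sort key here compares `c0.make` against 'Toyota'. The final `c0.id` key is not part of the answer; it is an assumption added so that rows tying on all three distances still come back in a stable order.

```sql
SELECT c0.id, c0.engine_type, c0.mpg, c0.kwh, c0.price,
       c0.make, c0.model, c0.vin, c0.inserted_at, c0.updated_at
FROM cars AS c0
ORDER BY LEAST(levenshtein(c0.model, 'Camry'),
               levenshtein(c0.make, 'Toyota')),  -- best match on either column
         levenshtein(c0.model, 'Camry'),          -- then exact model matches first
         levenshtein(c0.make, 'Toyota'),          -- then exact make matches
         c0.id                                    -- assumption: tie-breaker for a repeatable order
LIMIT 10;
```

With these keys the leading rows should be the same whether the LIMIT is 5 or 10, since no two rows share the full sort key.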

Source: https://dba.stackexchange.com/questions/276646