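PostgreSQL
Levenshtein function returns different results depending on the LIMIT value
The Levenshtein function isn't working the way I expected. Is there something I'm not understanding?
Here is the query: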
SELECT c0.id, c0.engine_type, c0.mpg, c0.kwh, c0.price, c0.make, c0.model, c0.vin, c0.inserted_at, c0.updated_at FROM cars AS c0 ORDER BY LEAST(levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')) LIMIT 5
Running this query returns the following data:
5 "electric" 257 32288 "Toyota" "Camry" "SW081452D50423138" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
20 "gasoline" 83 68851 "Toyota" "Camry" "643VN327D4ZH04928" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
4 "gasoline" 74 74482 "Toyota" "Corolla" "1K48R780410S27945" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
10 "gasoline" 73 87040 "Dodge" "Ram" "J22782VG240639409" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
3 "electric" 116 66560 "Audi" "A5" "94V5772ZB4BJ23179" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
As you can see above, the first two matches are Toyota Camrys, which is what I expected from the query. However, when I change the LIMIT to 10, for example:
SELECT c0.id, c0.engine_type, c0.mpg, c0.kwh, c0.price, c0.make, c0.model, c0.vin, c0.inserted_at, c0.updated_at FROM cars AS c0 ORDER BY LEAST(levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')) LIMIT 10
I get different results:
4 "gasoline" 74 74482 "Toyota" "Corolla" "1K48R780410S27945" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
5 "electric" 257 32288 "Toyota" "Camry" "SW081452D50423138" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
20 "gasoline" 83 68851 "Toyota" "Camry" "643VN327D4ZH04928" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
10 "gasoline" 73 87040 "Dodge" "Ram" "J22782VG240639409" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
7 "electric" 274 41661 "Dodge" "Charger" "FDND794KFW0179068" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
8 "gasoline" 57 42369 "BMW" "M3" "NS7V3N1VW5J508253" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
9 "electric" 214 15710 "BMW" "X5" "3VUFCG07ATW125829" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
11 "electric" 417 63167 "Nissan" "Juke" "6800ULHC7H0857158" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
12 "gasoline" 78 21059 "Lincoln" "MKX" "AFUCF3SUG6W287040" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
13 "electric" 348 93954 "Lincoln" "MKS" "2L64A6Z18XR348145" "2020-10-06 14:48:27" "2020-10-06 14:48:27"
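Here, for reasons I can't understand, the Toyota Corolla is ranked ahead of the two Camrys. When I am explicitly searching for "Toyota Camry" and limit the number of rows returned to 10, why does "Toyota Corolla" have a smaller Levenshtein distance than "Toyota Camry"?
Any idea why?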
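The least() function returns the smaller of its two arguments. All three Toyota rows match 'Toyota' exactly, with a distance of 0, so the final ORDER BY value for each of those 3 rows is 0. PostgreSQL is free to return rows that tie on the ORDER BY value in any order, and changing the LIMIT can change the order among them. Adding each of least()'s arguments as an additional ORDER BY expression should give you what you want: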
ORDER BY LEAST( levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')), levenshtein(c0.model, 'Camry'), levenshtein(c0.make, 'Toyota')
This returns all Camrys first, then any non-Camry Toyotas, then the rows whose least() expression evaluates to more than 0.
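To see the tie for yourself, it can help to select the individual distances next to the combined sort key. The query below is a minimal sketch, assuming the same cars table and that levenshtein() is available through the fuzzystrmatch extension. The aliases model_dist, make_dist and sort_key are illustrative names, and the trailing c0.id tie-breaker is my own addition so that rows which still tie (such as the two Camrys) come back in a stable order.

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; has been run,
-- since levenshtein() is provided by that extension.
SELECT c0.id,
       c0.make,
       c0.model,
       levenshtein(c0.model, 'Camry') AS model_dist,  -- distance of model to 'Camry'
       levenshtein(c0.make, 'Toyota') AS make_dist,   -- distance of make to 'Toyota'
       LEAST(levenshtein(c0.model, 'Camry'),
             levenshtein(c0.make, 'Toyota')) AS sort_key  -- what the original ORDER BY sorts on
FROM cars AS c0
-- Sort by the combined key first, then by each distance, then by id
-- (assumed unique) so that no two rows compare as equal.
ORDER BY sort_key, model_dist, make_dist, c0.id
LIMIT 10;
```

Every Toyota row should show a sort_key of 0 regardless of its model, which is why the original query could not tell a Corolla from a Camry.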