Oracle
Selecting distinct rows in Oracle using utl_match.jaro_winkler_similarity
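I have a table of log messages in which the same information is repeated many times with slightly different values. For example:

    Unable to make thread 19043 a realtime process
    Unable to make thread 20763 a realtime process
    Unable to make thread 22179 a realtime process
    FEED_XYZ Secondary Instrument Lines not configured
    FEED_ZZZ Secondary Instrument Lines not configured
    (...)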
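I want to take only one row from each of the groups above:

    Unable to make thread 19043 a realtime process
    FEED_XYZ Secondary Instrument Lines not configured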
Is there a way to retrieve the distinct rows with a select query that uses utl_match.jaro_winkler_similarity? Something like:
select log_message from logs_table where "utl_match.jaro_winkler_similarity between rows < 80"
I know I could write a pipelined function or some other PL/SQL routine, but I wanted to double-check whether there is a simpler way to do this.
I'm not very familiar with the UTL_MATCH.JARO_WINKLER_SIMILARITY function, but using it in a UNION query, combined with the ROW_NUMBER analytic function, gives you the result you want:

    WITH temp AS (
      SELECT log_message,
             ROW_NUMBER() OVER (ORDER BY log_message) rn
      FROM logs_table
      WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
      UNION
      SELECT log_message,
             ROW_NUMBER() OVER (ORDER BY log_message) rn
      FROM logs_table
      WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
    )
    SELECT log_message
    FROM temp
    WHERE rn = 1;
Result (tested on a database at apex.oracle.com; the screenshot is not reproduced here).
If you only need any value from each group (rather than the "first" value of each sorted group), you can use a ROWNUM condition instead of the ROW_NUMBER function, which returns a similar result:

    WITH temp AS (
      SELECT log_message
      FROM logs_table
      WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
        AND ROWNUM = 1
      UNION
      SELECT log_message
      FROM logs_table
      WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
        AND ROWNUM = 1
    )
    SELECT log_message
    FROM temp;
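Both queries depend on the threshold of 80, so it can help to look at the scores directly before settling on a value. The following is a minimal sketch, not part of the original answer: the inline `samples` set and the `score` alias are made up for illustration, and the messages are the ones from the question. UTL_MATCH.JARO_WINKLER_SIMILARITY returns an integer between 0 (no match) and 100 (identical).

```sql
-- Illustrative only: "samples" stands in for logs_table.
WITH samples AS (
  SELECT 'Unable to make thread 20763 a realtime process' AS log_message FROM dual
  UNION ALL
  SELECT 'Unable to make thread 22179 a realtime process' FROM dual
  UNION ALL
  SELECT 'FEED_ZZZ Secondary Instrument Lines not configured' FROM dual
)
SELECT log_message,
       -- Score each sample against one reference message (0..100).
       UTL_MATCH.JARO_WINKLER_SIMILARITY(
         log_message,
         'Unable to make thread 19043 a realtime process') AS score
FROM samples
ORDER BY score DESC;
```

If messages from different groups score close to the threshold, the cut-off probably needs adjusting for your data.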
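However, unless there is a specific reason to use UTL_MATCH.JARO_WINKLER_SIMILARITY, you may be a victim of the XY problem. You can get the same result by replacing the two separate WHERE clause lines (in either of the queries above) with the following:

    WHERE log_message LIKE 'Unable to make thread %'

    WHERE log_message LIKE 'FEED\_%' ESCAPE '\'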
Or with these (for example, if you need finer-grained filtering):

    WHERE REGEXP_LIKE(log_message, 'Unable\sto\smake\sthread\s\d+\sa\srealtime\sprocess')

    WHERE REGEXP_LIKE(log_message, 'FEED_[A-Z]{3}\sSecondary\sInstrument\sLines\snot\sconfigured')
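For reference, here is roughly what the ROWNUM variant looks like once the similarity predicates are swapped for the LIKE patterns. This is a sketch assembled from the fragments in this answer and has not been run against the poster's data:

```sql
-- One arbitrary row per pattern; UNION merges the two single-row results.
SELECT log_message
FROM logs_table
WHERE log_message LIKE 'Unable to make thread %'
  AND ROWNUM = 1
UNION
SELECT log_message
FROM logs_table
-- The underscore is escaped so it matches a literal "_" rather than any character.
WHERE log_message LIKE 'FEED\_%' ESCAPE '\'
  AND ROWNUM = 1;
```

As with the similarity version, each message group still needs its own branch, so this only works when the groups are known in advance.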
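Thanks for your answer, but the point here is that I want a generic SQL statement that returns only the unique "log_message" values from "logs_table". The example was just for illustration. I ended up creating a pipelined function (sorry about the extra tables, but this is the final solution for my environment):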
    function get_alert_messages return rl_alert_info_table pipelined as
      /*
        This function returns the distinct error messages using the algorithm
        Jaro Winkler to get rid of slightly different strings.
        More info: http://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm
        Example of use: select * from table(rl.get_alert_messages())
      */
      cursor logs_cur is
        select distinct r.customer_id, r.report_id, r.server_id,
               r.name as report_name, li.message
          from reports r, logs l, logs_info li
         where r.dateandtime > to_timestamp(to_char(current_timestamp,'DD-MON-YYYY')||' 01:00:00') and -- beginning of the day
               r.report_id = l.report_id and
               li.log_id = l.log_id and
               li.type_id = 4 and
               li.timedetail > to_timestamp(to_char(current_timestamp,'DD-MON-YYYY')||' 01:00:00')
         order by r.customer_id, r.report_id, li.message; -- the order by is important here.
      l_message  logs_cur%rowtype;
      l_last_msg logs_info.message%type;
    begin
      begin
        open logs_cur;
        l_last_msg := ''; -- this variable stores the message from the previous row.
        loop
          fetch logs_cur into l_message;
          exit when logs_cur%notfound;
          if utl_match.jaro_winkler_similarity(l_last_msg,l_message.message) < 85 then -- if the level of similarity is less than 85%
            pipe row(rl_alert_info(l_message.customer_id, l_message.server_id, l_message.report_name, l_message.message));
          end if;
          l_last_msg := l_message.message;
        end loop;
        close logs_cur;
      exception
        when others then
          close logs_cur;
          raise;
      end;
      return;
    end get_alert_messages;
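The function keeps a row only when it is sufficiently different from the row before it in sorted order. The same idea can be sketched in plain SQL with the LAG analytic function. This is an alternative technique, not what the poster used; it is written against the simplified `logs_table` / `log_message` names from the question, and the `prev_message` alias is made up for illustration. The threshold of 85 is taken from the function above.

```sql
SELECT log_message
FROM (
  SELECT log_message,
         -- Message of the preceding row in sorted order (NULL for the first row).
         LAG(log_message) OVER (ORDER BY log_message) AS prev_message
  FROM logs_table
)
-- Keep the first row, plus every row that differs enough from its predecessor.
WHERE prev_message IS NULL
   OR UTL_MATCH.JARO_WINKLER_SIMILARITY(prev_message, log_message) < 85;
```

Like the function, this compares each message only with its immediate predecessor, so it relies on the sort order placing similar messages next to each other. Whether that holds depends on the data.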