Oracle

Select distinct rows in Oracle using utl_match.jaro_winkler_similarity

  • December 16, 2016

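I have a table of log messages in which the same information is repeated many times with slightly different values. For example:

Unable to make thread 19043 a realtime process
Unable to make thread 20763 a realtime process
Unable to make thread 22179 a realtime process
FEED_XYZ Secondary Instrument Lines not configured
FEED_ZZZ Secondary Instrument Lines not configured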
(...)

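I want to take only one row from each of the groups above:

Unable to make thread 19043 a realtime process
FEED_XYZ Secondary Instrument Lines not configured

Is there a way to retrieve the distinct rows with a SELECT query that uses utl_match.jaro_winkler_similarity?

Something like: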

select log_message
from logs_table
where "utl_match.jaro_winkler_similarity between rows < 80"

I know I could write a pipelined function or some other PL/SQL routine, but I want to double-check whether there is a simpler way to do this.

I'm not very familiar with the UTL_MATCH.JARO_WINKLER_SIMILARITY function, but using it in a UNION query, combined with the ROW_NUMBER analytic function, gives the result you want:

WITH
   temp AS
   (
   SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
   FROM logs_table
   WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
   UNION
   SELECT log_message, ROW_NUMBER() OVER (ORDER BY log_message) rn
   FROM logs_table
   WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
   )
SELECT log_message
FROM temp
WHERE rn = 1
;

[Screenshot: query result from a test database on apex.oracle.com]
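The threshold of 80 in the first query is a judgment call. JARO_WINKLER_SIMILARITY returns an integer from 0 (no similarity) to 100 (perfect match), so one way to choose a cutoff is to look at the scores directly. The following is only a sketch against the same logs_table; the column aliases score_thread and score_feed are made up for illustration.

```sql
-- Sketch: show each message's score (0 to 100) against the two reference strings,
-- to help decide where a threshold such as 80 should sit.
SELECT log_message,
       UTL_MATCH.JARO_WINKLER_SIMILARITY(
          log_message, 'Unable to make thread 19043 a realtime process') AS score_thread,
       UTL_MATCH.JARO_WINKLER_SIMILARITY(
          log_message, 'FEED_XYZ Secondary Instrument Lines not configured') AS score_feed
FROM logs_table
ORDER BY log_message;
```

Rows that belong to a group should score high in exactly one of the two columns; if a row scores high in both or in neither, the reference strings or the threshold probably need adjusting.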

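If you only need any value from each group (rather than the "first" value of each sorted group), you can use a ROWNUM condition instead of the ROW_NUMBER function, which returns a similar result: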

WITH
   temp AS
   (
   SELECT log_message
   FROM logs_table
   WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'Unable to make thread 19043 a realtime process') > 80
     AND ROWNUM = 1
   UNION
   SELECT log_message
   FROM logs_table
   WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(log_message, 'FEED_XYZ Secondary Instrument Lines not configured') > 80
     AND ROWNUM = 1
   )
SELECT log_message
FROM temp
;

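However, unless there is a specific reason to use UTL_MATCH.JARO_WINKLER_SIMILARITY, you may be a victim of the XY problem. You can get the same results by replacing the two WHERE clause lines (in either of the queries above) with the following: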

WHERE log_message LIKE 'Unable to make thread %'
WHERE log_message LIKE 'FEED\_%' ESCAPE '\'

Or these (for example, if finer-grained filtering is needed):

WHERE REGEXP_LIKE(log_message, 'Unable\sto\smake\sthread\s\d+\sa\srealtime\sprocess')
WHERE REGEXP_LIKE(log_message, 'FEED_[A-Z]{3}\sSecondary\sInstrument\sLines\snot\sconfigured')
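If the variable parts of the messages follow known patterns, the same idea can be pushed further so that no separate branch is needed per message: normalize the variable parts and group on the result. This is only a sketch, assuming the variable parts are runs of digits and the three-letter code after FEED_; the '#' and 'FEED_*' placeholders are arbitrary choices for illustration.

```sql
-- Sketch: collapse the variable parts of each message, then keep one
-- representative (the alphabetically smallest) per normalized pattern.
SELECT MIN(log_message) AS log_message
FROM logs_table
GROUP BY REGEXP_REPLACE(
            REGEXP_REPLACE(log_message, '[0-9]+', '#'),   -- thread numbers etc.
            '^FEED_[A-Z]{3}', 'FEED_*')                   -- feed code
ORDER BY 1;
```

For the five sample rows in the question this should return the 19043 thread message and the FEED_XYZ message. Messages whose variable parts do not fit these two patterns would each remain their own group.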

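Thanks for your answer, but the point here is that I want a generic SQL statement that returns only the unique "log_message" values from "logs_table". The example was only for illustration. I ended up creating a pipelined function (sorry for the extra tables, but this is the final solution for my environment):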

function get_alert_messages return rl_alert_info_table pipelined as
 /*
   This function returns the distinct error messages using the algorithm Jaro Winkler to get rid
   of slightly different strings.
   More info: http://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm
   Example of use:
   select * from table(rl.get_alert_messages())
 */
 cursor logs_cur is select distinct r.customer_id, r.report_id, r.server_id, r.name as report_name, li.message
                      from reports r, logs l, logs_info li
                     where r.dateandtime > to_timestamp(to_char(current_timestamp,'DD-MON-YYYY')||' 01:00:00') and -- beginning of the day
                             r.report_id = l.report_id and 
                             li.log_id = l.log_id and
                             li.type_id = 4 and
                             li.timedetail > to_timestamp(to_char(current_timestamp,'DD-MON-YYYY')||' 01:00:00') 
                     order by r.customer_id, r.report_id, li.message; -- the order by is important here.
 l_message logs_cur%rowtype;
 l_last_msg logs_info.message%type;
begin
 begin
 open logs_cur;
 l_last_msg := ''; -- this variable stores the message from the previous row.
 loop 
   fetch logs_cur into l_message;
   exit when logs_cur%notfound;
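     -- always emit the first row (l_last_msg is still NULL then, because '' is NULL in Oracle);
     -- otherwise emit only if the similarity to the previous message is below 85
     if l_last_msg is null or utl_match.jaro_winkler_similarity(l_last_msg,l_message.message) < 85 then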
       pipe row(rl_alert_info(l_message.customer_id,
                             l_message.server_id,
                             l_message.report_name,
                             l_message.message));
    end if;
     l_last_msg := l_message.message;
 end loop;
 close logs_cur;
 exception
   when others then 
     close logs_cur;
     raise;
 end;
 return;
end get_alert_messages;
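For comparison, the consecutive-row check that this function performs can probably also be expressed in plain SQL with the LAG analytic function. The following is only a sketch against the simplified logs_table(log_message) from the question, not the multi-table schema above; the threshold of 85 is taken from the function, and the alias prev_message is introduced here for illustration.

```sql
-- Sketch: keep a row only when it differs enough from the previous row
-- in sorted order. Assumes the simplified logs_table(log_message).
SELECT log_message
FROM (
   SELECT log_message,
          LAG(log_message) OVER (ORDER BY log_message) AS prev_message
   FROM logs_table
)
WHERE prev_message IS NULL   -- the first row has no predecessor
   OR UTL_MATCH.JARO_WINKLER_SIMILARITY(prev_message, log_message) < 85
ORDER BY log_message;
```

Like the function, this compares each row only with its immediate neighbor, so it relies on similar messages sorting next to each other; two similar messages separated by an unrelated one would both be returned.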

Source: https://dba.stackexchange.com/questions/158190