Oracle

刪除重複項時出現故障 oracle11g regexp_replace

  • June 4, 2015

我有一個逗號分隔的值列表(作為一個字元串),其中包含重複項,我想刪除它們。我嘗試過,REGEXP_REPLACE但對於某些值,它給了我一個錯誤的結果:

/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
      regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001',
                 '([^,]+)(,*\1)+',
                 '\1') as str_uq
from dual
union all 
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
      regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701',
                 '([^,]+)(,*\1)+',
                 '\1') as str_uq
from dual

執行上面的範例,將為第一個查詢提供預期的結果,但對於第二個查詢,它只是連接所有值而不刪除重複項。我猜這些值開始的事實1與它有關。

你能幫我解決正則表達式模式嗎?我找不到錯誤…

謝謝!

在您的第二個範例中,值是唯一的,但是您正在搜尋任何重複的匹配項並且有多個匹配項:

‘106RONBADB0W256270 1,1 06RONCRT 00 W256270 1,1 06RPB013035 0001,1 06RONBADB0W2562701’

它沒有連接,它正在刪除一些字元:-)

通常添加單詞邊界\b應該可以,但在 Oracle 中不行。

我發現這是可行'(.+?,)\1+'的,但是您需要在字元串中添加最後一個逗號(小提琴

/* CORRECT */
select '308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' as str,
      regexp_replace('308RONBADB046782001,308RONBADB046782001,308RONCRT0046782001,308RONCRT0046782001,308RONCRT0046782001,308RONMFRT046782001,308RONMFRT046782001' || ',',
                 --'(\b[^,]+)(,*\1)+', --should work, doesn't in Oracle
                 '(.+?,)\1+',
                 '\1') as str_uq
from dual
union all 
/* FAULTY */
select '106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' as str,
      regexp_replace('106RONBADB0W2562701,106RONCRT00W2562701,106RPB0130350001,106RONBADB0W2562701' || ',',
                 --'(\b[^,]+)(,*\1)+', --should work, doesn't in Oracle
                 '(.+?,)\1+',
                 '\1') as str_uq
from dual

事實上,正如 dnoeth 所說,我似乎無法僅使用 REGEXP 來實現預期的行為,所以我建構了一個可以做到這一點的函式。我會把它貼在這裡,也許它對其他人也有用。

create or replace function str_remove_duplicates (str in varchar2, separator in varchar2) 
   return varchar2
is
   regex_pattern varchar2(5);
   result varchar2(255);
begin

   regex_pattern := '[^' || nvl(separator, ',') || ']+'; -- '[^,]+'

   select listagg(acc, separator) within group (order by acc) into result
   from (
           select distinct acc from (
                                       select regexp_substr(str, regex_pattern, 1, level) as acc from dual
                                       connect by regexp_substr(str, regex_pattern, 1, level) is not null
                                    )
   );

   return result;
end str_remove_duplicates;

一切順利!

丹尼爾

引用自:https://dba.stackexchange.com/questions/103124