Sql-Server-2012

Help with a REPLACE statement that truncates the last character of a text string

  • December 12, 2021

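I need help fixing a replace statement that truncates the last character of a string. The original string (Question_Text) contains HTML characters and other errant characters that need to be cleaned up. I have included several Replace statements in a function. This is the function I need help with.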

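As you can see in the sample data, Question_Text2 has been cleaned, except that its last character is cut off. It also removed some errant characters in the middle of the questions (see ID = 12165). What exactly am I doing wrong?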

Along with the function that illustrates the problem, I have included the code used to create the sample data and results.

/****** Object:  Table [dbo].[tblQuestionsSample]    Script Date: 12/10/2021 5:12:16 PM ******/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tblQuestionsSample]') AND type in (N'U'))
DROP TABLE [dbo].[tblQuestionsSample]
GO

/****** Object:  Table [dbo].[tblQuestionsSample]    Script Date: 12/10/2021 5:12:16 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[tblQuestionsSample](
   [QuestionsRecID] [int] IDENTITY(1,1) NOT NULL,
   [Question_ID] [int] NULL,
   [Question_Text] [varchar](1000) NULL,
   [Question_Text2] [varchar](1000) NULL,
CONSTRAINT [PK_dbo.QuestionsRecID] PRIMARY KEY CLUSTERED
(
   [QuestionsRecID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[tblQuestionsSample]
          ([Question_ID]
          ,[Question_Text]
          ,[Question_Text2])
    VALUES
       (11603,'<p>Date/Time</p>','Date/Tim')
       ,(11632,'<p>Attachments</p>','Attachment')
       ,(12166,'<p>Employee ID</p>','Employee I')
       ,(12166,'<p>Work Related?</p>','Work Related')
       ,(12165,'<p>Date & Time of injury/onset illness?</p>','Date & Time of injury/onset illness')
       ,(12165,'<p>Full Injury/Illness Description</p>','Full Injury/Illness Descriptio')
       ,(12165,'<p>Job Title</p>','Job Titl')
go

Query results:

| id | question_text | Question_Text2 |
| --- | --- | --- |
| 11603 | <p>Date/Time</p> | Date/Tim |
| 11632 | <p>Attachments</p> | Attachment |
| 12166 | <p>Employee ID</p> | Employee I |
| 12166 | <p>Work Related?</p> | Work Related |
| 12165 | <p>Date & Time of injury/onset illness?</p> | Date & Time of injury/onset illnes |
| 12165 | <p>Full Injury/Illness Description</p> | Full Injury/Illness Descriptio |
| 12165 | <p>Job Title</p> | Job Titl |

As you can see, I am using a function, [dbo].[udf_StripHTML], to clean the data:

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT

-- Locate the first tag: from the first '<' to the next '>'
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1

-- Remove tags one at a time until none remain
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
--RETURN replace(REPLACE((@HTMLText),'&nbsp;',' '), '&#39;', '')
--RETURN replace(replace(LTRIM((@HTMLText)),'&nbsp;',' '), '&#39;','')
RETURN replace(substring(REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39',' '), '&',''), charindex('.)',REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39;', ''), '&','')), len(REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39;', ''), '&',''))

END


Select [Question_ID]
,[Question_Text]
,[dbo].[udf_StripHTML]([question_text2])
From [dbo].[tblQuestionsSample]
WHERE  (question_text NOT LIKE '%Comments%') AND (question_text NOT LIKE '%Archived%')

Thanks for any help you can provide,

Karen
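
One likely cause of the missing last character, assuming most question texts do not contain the `.)` marker that the `RETURN` expression searches for: `CHARINDEX` returns 0 when the search string is absent, and `SUBSTRING` with a start position of 0 returns one character fewer than the requested length. The sketch below reproduces this with a hypothetical variable `@Cleaned` standing in for the tag-stripped text, and shows one possible guard that leaves the string unchanged when the marker is not found.

```sql
-- @Cleaned is a hypothetical stand-in for the text after the tags are stripped.
DECLARE @Cleaned varchar(1000) = 'Date/Time';
-- CHARINDEX returns 0 here because '.)' does not occur in the string.
DECLARE @Pos int = CHARINDEX('.)', @Cleaned);

SELECT
    MarkerPosition = @Pos,
    -- Start position 0 makes SUBSTRING return LEN - 1 characters: 'Date/Tim'
    Truncated = SUBSTRING(@Cleaned, @Pos, LEN(@Cleaned)),
    -- Guard (an assumption about the intent): only cut when the marker exists
    Guarded = CASE WHEN @Pos > 0
                   THEN SUBSTRING(@Cleaned, @Pos, LEN(@Cleaned))
                   ELSE @Cleaned
              END;
```

Whether the kept text should begin at the marker or after it depends on what the `.)` check was meant to strip, which the post does not say.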

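If your HTML is valid XHTML (which it appears to be from the small sample provided), you can cast it to `xml` and then use `.value` with XQuery to parse it. This handles the string manipulation and the HTML escaping far better: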

SELECT
 Question_Text_Parsed = TRY_CAST(Question_Text AS xml).value('(/p/text())[1]','nvarchar(1000)')
FROM tblQuestionsSample
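
To try the query above without creating the table, the following sketch runs the same expression over a few rows copied from the question's sample data. The derived table alias `s` is an assumption for illustration. Note that the row containing a bare `&` is not well-formed XML, so `TRY_CAST` should return NULL for it rather than raising an error; such rows would need the ampersand escaped as `&amp;` before this approach can parse them.

```sql
-- Inline sample rows (taken from the question's INSERT) so no table is required.
SELECT
    s.Question_Text,
    Question_Text_Parsed =
        TRY_CAST(s.Question_Text AS xml).value('(/p/text())[1]', 'nvarchar(1000)')
FROM (VALUES
        ('<p>Date/Time</p>'),
        ('<p>Work Related?</p>'),
        -- Bare '&' is not valid XML, so TRY_CAST yields NULL for this row
        ('<p>Date & Time of injury/onset illness?</p>')
     ) AS s (Question_Text);
```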

db<>fiddle

Source: https://dba.stackexchange.com/questions/303812