Sql-Server-2012
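Help with a Replace statement that cuts off the last character of a text string
I need help fixing a Replace statement that truncates the last character of a string. The original string (Question_Text) contains HTML tags and other stray characters that need to be cleaned up. I have put several Replace statements into a function, and that function is what I need help with.
As you can see in the sample data, Question_Text2 has been cleaned up, except that its last character is cut off. Some stray characters in the middle of a question were also removed (see ID = 12165). What am I doing wrong?
I have included the code used to create the sample data and results, along with the function that illustrates the problem.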
    /****** Object: Table [dbo].[tblQuestionsSample] Script Date: 12/10/2021 5:12:16 PM ******/
    IF EXISTS (SELECT * FROM sys.objects
               WHERE object_id = OBJECT_ID(N'[dbo].[tblQuestionsSample]') AND type in (N'U'))
        DROP TABLE [dbo].[tblQuestionsSample]
    GO

    /****** Object: Table [dbo].[tblQuestionsSample] Script Date: 12/10/2021 5:12:16 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[tblQuestionsSample](
        [QuestionsRecID] [int] IDENTITY(1,1) NOT NULL,
        [Question_ID] [int] NULL,
        [Question_Text] [varchar](1000) NULL,
        [Question_Text2] [varchar](1000) NULL,
        CONSTRAINT [PK_dbo.QuestionsRecID] PRIMARY KEY CLUSTERED
        (
            [QuestionsRecID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    INSERT INTO [dbo].[tblQuestionsSample]
        ([Question_ID], [Question_Text], [Question_Text2])
    VALUES
        (11603, '<p>Date/Time</p>', 'Date/Tim'),
        (11632, '<p>Attachments</p>', 'Attachment'),
        (12166, '<p>Employee ID</p>', 'Employee I'),
        (12166, '<p>Work Related?</p>', 'Work Related'),
        (12165, '<p>Date & Time of injury/onset illness?</p>', 'Date & Time of injury/onset illness'),
        (12165, '<p>Full Injury/Illness Description</p>', 'Full Injury/Illness Descriptio'),
        (12165, '<p>Job Title</p>', 'Job Titl')
    GO
Query results:

| id | question_text | Question_Text2 |
| ----- | ----- | ----- |
| 11603 | <p>Date/Time</p> | Date/Tim |
| 11632 | <p>Attachments</p> | Attachment |
| 12166 | <p>Employee ID</p> | Employee I |
| 12166 | <p>Work Related?</p> | Work Related |
| 12165 | <p>Date & Time of injury/onset illness?</p> | Date & Time of injury/onset illnes |
| 12165 | <p>Full Injury/Illness Description</p> | Full Injury/Illness Descriptio |
| 12165 | <p>Job Title</p> | Job Titl |
As you can see, I am using a function, [dbo].[udf_StripHTML], to clean the data:
    CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
    RETURNS VARCHAR(MAX)
    AS
    BEGIN
        DECLARE @Start INT
        DECLARE @End INT
        DECLARE @Length INT

        -- Locate the first tag: from '<' to the next '>'
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1

        -- Remove tags one at a time until none are left
        WHILE @Start > 0 AND @End > 0 AND @Length > 0
        BEGIN
            SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
            SET @Start = CHARINDEX('<',@HTMLText)
            SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
            SET @Length = (@End - @Start) + 1
        END

        --RETURN replace(REPLACE((@HTMLText),'&nbsp;',' '), '&#39;', '')
        --RETURN replace(replace(LTRIM((@HTMLText)),'&nbsp;',' '), '&#39;','')
        RETURN replace(substring(
            REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39;',' '), '&',''),
            charindex('.)',REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39;', ''), '&','')),
            len(REPLACE(REPLACE(replace(ltrim(rtrim(@htmltext)),'&nbsp;',' '), '&#39;', ''), '&',''))
    END

    Select [Question_ID]
        ,[Question_Text]
        ,[dbo].[udf_StripHTML]([question_text2])
    From [dbo].[tblQuestionsSample]
    WHERE (question_text NOT LIKE '%Comments%')
        AND (question_text NOT LIKE '%Archived%')
Thanks for any help you can provide,
Karen
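On the truncation itself: one likely cause, assuming the RETURN expression really does pass the result of CHARINDEX('.)', ...) to SUBSTRING as its start position, is that CHARINDEX returns 0 when '.)' is not found. SUBSTRING with a start of 0 and a length of LEN(...) then returns one character fewer than the full string. The sketch below uses a hypothetical variable @s in place of the cleaned question text. The CASE guard is only one possible fix, not necessarily what the function was meant to do.

```sql
-- Hypothetical standalone demo; @s stands in for the cleaned question text.
DECLARE @s varchar(100) = 'Date/Time';

SELECT
    -- '.)' does not occur in @s, so CHARINDEX returns 0
    CHARINDEX('.)', @s) AS StartPos,

    -- start = 0, length = 9: SQL Server returns start + length - 1 = 8 characters
    SUBSTRING(@s, CHARINDEX('.)', @s), LEN(@s)) AS Truncated,

    -- Guarded version: only take a substring when the marker is actually present
    CASE
        WHEN CHARINDEX('.)', @s) > 0
            THEN SUBSTRING(@s, CHARINDEX('.)', @s), LEN(@s))
        ELSE @s
    END AS Guarded;
```

With this input, Truncated comes back as 'Date/Tim', matching the sample results, while Guarded keeps 'Date/Time'.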
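If your HTML is valid XHTML (which it appears to be, from the small sample provided), you can cast it to xml and then use .value with XQuery to parse it. This will do a much better job than juggling string manipulation and HTML escaping:

    SELECT Question_Text_Parsed =
        TRY_CAST(Question_Text AS xml).value('(/p/text())[1]', 'nvarchar(1000)')
    FROM tblQuestionsSample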
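To try the xml approach without touching the real table, here is a small self-contained sketch. The inline VALUES list and the alias v are made up for illustration. The second row assumes the ampersand is stored escaped as &amp;amp;, which may or may not match the real data.

```sql
-- Illustrative rows only; v(Question_Text) is a hypothetical inline table.
SELECT
    v.Question_Text,
    Question_Text_Parsed =
        TRY_CAST(v.Question_Text AS xml).value('(/p/text())[1]', 'nvarchar(1000)')
FROM (VALUES
    ('<p>Date/Time</p>'),
    ('<p>Date &amp; Time of injury/onset illness?</p>'),  -- escaped ampersand (assumed)
    ('<p>Date & Time of injury/onset illness?</p>')       -- bare ampersand, as in the sample data
) AS v(Question_Text);
```

The first two rows should parse to the plain text, with &amp;amp; decoded back to &. A bare & is not well-formed XML, so TRY_CAST should return NULL for the third row. Rows like that would need the ampersand escaped first, or a fallback to string handling.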