Press "Enter" to skip to content

Month: April 2021

Fixing Unicode Conversion Issues in XML documents, TRY_CONVERT returns question mark instead of NULL

An interesting question asked by @danthesqlman in #sqlhelp (sqlcommunity.slack.com)

Having issues with Unicode in my XML, tried using a try_convert(varchar,fieldname) but not returning NULL.
Set it to have a test on my box, and weird results.
declare @n nvarchar(10) = N’ניקודות‎’
select try_convert(varchar(10),@n)
This doesn’t return NULL, but ?????????
I’m curious what would I be doing wrong, or how can i locate unicode within XML easily

And then when people suggested individual character shredding –

XML documents in a table over 200k rows, 2mb xml each, could take hours to parse 1 character at a time

There were a few suggestions, (my initial crap one was just dumping it to C#), but after a few jokes back and forth about how SQL Server was just returning normal question marks for TRY_CONVERT and how silly that was the idea came up… why not just:

  1. Replace all question marks with something unique? (I suggested a GUID)
  2. Run the conversion and then do a reverse replace, updating the data in place.
  3. Profit!

For a simple code example…

DECLARE @magic_value UNIQUEIDENTIFIER = NEWID()
SELECT
TRY_CONVERT
(
VARCHAR(100),
REPLACE(tar.name,'?',@magic_value)
)
FROM target_table AS tar

Any new question marks that exist in the output would be characters that failed the conversion process.

The test ran in ten minutes instead of a few hours… great!

Also a fun followup on weird SQL Server homoglyph conversion issues in general – https://bertwagner.com/posts/how-unicode-homoglyphs-can-thwart-your-database-security/