I receive text from a REST API that is styled like this, for example:

𝓗𝓸𝔀 𝓽𝓸 𝓻𝓮𝓶𝓸𝓿𝓮 𝓽𝓱𝓲𝓼 𝓯𝓸𝓷𝓽 𝓯𝓻𝓸𝓶 𝓪 𝓼𝓽𝓻𝓲𝓷𝓰?
𝐻𝑜𝓌 𝓉𝑜 𝓇𝑒𝓂𝑜𝓋𝑒 𝓉𝒽𝒾𝓈 𝒻𝑜𝓃𝓉 𝒻𝓇𝑜𝓂 𝒶 𝓈𝓉𝓇𝒾𝓃𝑔?
нσω тσ яємσνє тнιѕ ƒσηт ƒяσм α ѕтяιηg?

This is not italic, bold, or underlined formatting, since the value is a plain string. Text like this fails my regex ^[a-zA-Z0-9._]*$. I would like to normalize the received string into a standard one so that it passes my regex.

How to normalize fancy-looking unicode string in C#?

1 Answer

You can use Unicode Compatibility normalization forms, which use Unicode's own (lossy) character mappings to transform letter-like characters (among other things) to their simplified equivalents.

In Python, for instance:

>>> from unicodedata import normalize
>>> normalize('NFKD','𝓗𝓸𝔀 𝓽𝓸 𝓻𝓮𝓶𝓸𝓿𝓮 𝓽𝓱𝓲𝓼 𝓯𝓸𝓷𝓽 𝓯𝓻𝓸𝓶 𝓪 𝓼𝓽𝓻𝓲𝓷𝓰')
'How to remove this font from a string'

# EDIT: This one comes back unchanged: these are distinct letters (mostly Cyrillic and Greek), not styled variants
>>> normalize('NFKD','нσω тσ яємσνє тнιѕ ƒσηт ƒяσм α ѕтяιηg?')
'нσω тσ яємσνє тнιѕ ƒσηт ƒяσм α ѕтяιηg?'
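To tie this back to the regex in the question, here is a minimal sketch that normalizes first and validates second. The helper name `normalize_and_check` is made up for illustration; only `unicodedata.normalize` and the `re` module come from the standard library.

```python
import re
from unicodedata import normalize

# The pattern from the question: ASCII letters, digits, dot and underscore only.
ALLOWED = re.compile(r'^[a-zA-Z0-9._]*$')

def normalize_and_check(text):
    """Return (normalized_text, passes_pattern) for the given input.

    Hypothetical helper, not part of any library.
    """
    # NFKD replaces compatibility characters (styled letters, full-width
    # forms, ligatures, ...) with their plain equivalents wherever Unicode
    # defines such a mapping.
    cleaned = normalize('NFKD', text)
    return cleaned, ALLOWED.fullmatch(cleaned) is not None

print(normalize_and_check('𝓗𝓸𝔀'))   # ('How', True)
print(normalize_and_check('нσω'))   # ('нσω', False)
```

Note that the pattern does not allow spaces or question marks, so the full sentences from the question would still be rejected after normalization; that is a property of the regex, not of the normalization step.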

EDIT: Note that this only applies to stylistic forms (superscripts, blackletter, full-width, etc.), so your third example, which uses look-alike letters from other scripts (mostly Cyrillic and Greek), can't be decomposed to ASCII.
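One way to see why a given string survives normalization is to list the characters that are still outside ASCII afterwards, together with their Unicode names. This is only a diagnostic sketch; `non_ascii_leftovers` is a hypothetical helper name.

```python
from unicodedata import name, normalize

def non_ascii_leftovers(text):
    """List (character, Unicode name) pairs still non-ASCII after NFKD.

    Hypothetical helper for diagnostics, not part of any library.
    """
    cleaned = normalize('NFKD', text)
    # name() takes a default that is returned for characters without a name.
    return [(ch, name(ch, 'UNKNOWN')) for ch in cleaned if ord(ch) > 127]

# Prints one line per leftover character with its official Unicode name,
# which shows the script each look-alike letter really belongs to.
for ch, ch_name in non_ascii_leftovers('нσω тσ'):
    print(ch, ch_name)
```

An empty result means the string was reduced to plain ASCII; anything listed would need a separate, explicit mapping if it has to be converted at all.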

EDIT2: I didn't realize your question was specific to C#. The equivalent there is String.Normalize, which does just that:

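using System.Text; // NormalizationForm is defined in System.Text
string s1 = "𝓗𝓸𝔀 𝓽𝓸 𝓻𝓮𝓶𝓸𝓿𝓮 𝓽𝓱𝓲𝓼 𝓯𝓸𝓷𝓽 𝓯𝓻𝓸𝓶 𝓪 𝓼𝓽𝓻𝓲𝓷𝓰";
string s2 = s1.Normalize(NormalizationForm.FormKD); // compatibility decomposition (NFKD)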

151

answered Sep 22 '22 18:09

VLRoyrenn


Luigi Saggese
