
BigQuery UDF to remove accents/diacritics in a string

In plain JavaScript, the following code removes accents/diacritics from a string:

var originalText = "éàçèñ"
var result = originalText.normalize('NFD').replace(/[\u0300-\u036f]/g, "")
console.log(result) // eacen

If we put the same code in a BigQuery JavaScript UDF, it does not work (even with the backslashes doubled).

CREATE OR REPLACE FUNCTION project.remove_accent(x STRING)
RETURNS STRING
LANGUAGE js AS """
  return x.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
""";

SELECT project.remove_accent("éàçèñ") -- returns "éàçèñ" (unchanged)

Any thoughts on that?

asked Nov 16 '25 at 00:11 by Jason Tragakis


1 Answer

Consider the approach below, which uses BigQuery's built-in SQL functions instead of a JavaScript UDF:

select originalText, 
  regexp_replace(normalize(originalText, NFD), r"\pM", '') output

Applied to the sample data in your question ("éàçèñ"), the output is "eacen".

You can easily wrap it in a SQL UDF if you wish.
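
A minimal sketch of such a wrapper, assuming a temporary function is enough for your use case. The function name remove_accent and the parameter x are borrowed from the question; a persistent function would need a dataset-qualified name of your own choosing.

```sql
-- Sketch only: a SQL UDF wrapping the NORMALIZE + REGEXP_REPLACE expression above.
-- remove_accent and x are illustrative names taken from the question.
CREATE TEMP FUNCTION remove_accent(x STRING)
RETURNS STRING
AS (
  -- NFD splits each accented letter into a base letter plus combining marks;
  -- \pM matches those marks (Unicode category M), which are then removed.
  REGEXP_REPLACE(NORMALIZE(x, NFD), r"\pM", "")
);

SELECT
  originalText,
  remove_accent(originalText) AS output
FROM UNNEST(["éàçèñ"]) AS originalText;
```

Because this stays in SQL, it avoids the string-escaping question that the JavaScript UDF raises.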

answered Nov 18 '25 at 19:11 by Mikhail Berlyant


