I'm writing an import utility that uses phone numbers as a unique key within the import.
I need to check that the phone number does not already exist in my DB. The problem is that phone numbers in the DB could contain dashes, parentheses, and possibly other characters. I wrote a function to remove these characters, but it is slow: with thousands of records in my DB and thousands of records to import at once, the process can be unacceptably slow. I've already put an index on the phone number column.
I tried using the script from this post:
T-SQL trim &nbsp (and other non-alphanumeric characters)
But that didn't speed it up any.
Is there a faster way to remove non-numeric characters? Something that can perform well when 10,000 to 100,000 records have to be compared.
Whatever is done needs to perform fast.
Update
Given what people responded with, I think I'm going to have to clean the fields before I run the import utility.
To answer the question of what I'm writing the import utility in, it is a C# app. I'm comparing BIGINT to BIGINT now, with no need to alter DB data and I'm still taking a performance hit with a very small set of data (about 2000 records).
Could comparing BIGINT to BIGINT be slowing things down?
I've optimized the code side of my app as much as I can (removed regexes, removed unnecessary DB calls). Although I can't isolate SQL as the source of the problem anymore, I still feel like it is.
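The idea from the update (clean the numbers once, then compare) can be sketched roughly as follows. This is only an illustration, not the asker's code: the app in question is C#, JavaScript is used here for brevity, and the names `normalizePhone`, `findNewRecords`, and the `phone` property are hypothetical. It assumes the existing numbers have already been loaded from the DB into an array.

```javascript
// Hypothetical helper, not from the thread: strip everything except digits.
const normalizePhone = (raw) => String(raw).replace(/\D/g, "");

// Split import rows into new ones and duplicates, normalizing each number once.
function findNewRecords(existingPhones, importRows) {
  // Normalize the existing keys a single time and keep them in a Set
  // so each lookup is a hash check instead of a scan.
  const seen = new Set(existingPhones.map(normalizePhone));
  const fresh = [];
  const duplicates = [];
  for (const row of importRows) {
    const key = normalizePhone(row.phone);
    if (seen.has(key)) {
      duplicates.push(row);
    } else {
      seen.add(key); // also catches repeats inside the import batch itself
      fresh.push(row);
    }
  }
  return { fresh, duplicates };
}
```

With this shape, the cost of cleaning is paid once per number rather than once per comparison, which is the main point of cleaning the fields before the import runs.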
select to_number(regexp_replace('Ph: +91 984-809-8540', '\D', '')) OUT_PUT from dual;
In this statement (Oracle syntax), '\D' matches every non-digit character, and each match is replaced with an empty string, which Oracle treats as null.
To remove all non-numeric characters from a string, the replace() function can be used. replace() searches a string for a specific value or a RegExp and returns a new string with the matches replaced.
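A minimal sketch of that replace() approach in JavaScript. The function name `digitsOnly` is made up for illustration, and the sample input is borrowed from the Oracle example in this thread.

```javascript
// Hypothetical helper: keep only the digits 0-9 from a raw phone string.
function digitsOnly(raw) {
  // \D matches any non-digit character; the g flag replaces every match,
  // not just the first one.
  return String(raw).replace(/\D/g, "");
}

console.log(digitsOnly("Ph: +91 984-809-8540")); // "919848098540"
```

Without the g flag, only the first non-digit character would be removed, so the flag matters here.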
The TRIM() function removes the space character OR other specified characters from the start or end of a string. By default, the TRIM() function removes leading and trailing spaces from a string.
If you want to leave the numbers (remove non-alphanumeric characters), then... replace ^a-z with ^a-z^0-9. That search string appears in the code in two different places, so be sure to replace both of them.
I saw this solution with T-SQL code and PATINDEX. I like it :-)
CREATE Function [fnRemoveNonNumericCharacters](@strText VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
    -- Remove one non-digit character per iteration until none remain.
    WHILE PATINDEX('%[^0-9]%', @strText) > 0
    BEGIN
        SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
    END
    RETURN @strText
END