
How to extract decimal number from string in C#

string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");

For this code I get these values in the digits array

10,20,40,1

string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");

For this code I get these values in the digits array

10,4,20,5,40,1

But I would like to get

10.4,20.5,40,1 as decimal numbers. How can I achieve this?

ratty asked Aug 26 '10 13:08



2 Answers

The regex for extracting decimal/float numbers depends on several choices: whether thousand separators are used and which ones, which symbol is the decimal separator, whether to match an exponent, whether to match a leading positive or negative sign, whether to match numbers with the leading 0 omitted (like .5), and whether to extract a number that ends with a decimal separator (like 5.).

A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:

[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?

I only changed the capturing group to a non-capturing one (added ?: after the opening parenthesis).
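As a quick check of what the generic pattern picks up, here is a minimal sketch written as a C# 9+ top-level program. The sample string and the variable names are made up for illustration and do not come from the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Generic pattern from above: optional sign, optional integer part,
// optional dot, required digits, optional exponent.
string genericPattern = @"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?";

// Illustrative input (an assumption, not taken from the answer).
string genericSample = "-3.5 .5 1e10 +7 34";

string[] genericMatches = Regex.Matches(genericSample, genericPattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", genericMatches));
// prints: -3.5 | .5 | 1e10 | +7 | 34
```

Signed values, values without a leading zero, exponents, and plain integers are all returned as strings; converting them to numbers is a separate step.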

If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:

[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
           ^^^^
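If both separators can occur, the matches usually need normalizing before they can be parsed. The following is a rough sketch under that assumption (C# 9+ top-level program, made-up sample text and names): it replaces a decimal comma with a dot and parses with the invariant culture.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from above that accepts either a dot or a comma as decimal separator.
string eitherSeparator = @"[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?";
string mixedSample = "3,14 and 2.5"; // illustrative input (assumption)

decimal[] mixedValues = Regex.Matches(mixedSample, eitherSeparator)
    .Cast<Match>()
    // Normalize a decimal comma to a dot so the invariant culture can parse it.
    .Select(m => decimal.Parse(m.Value.Replace(',', '.'),
                               NumberStyles.Float,
                               CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(string.Join(" | ",
    mixedValues.Select(v => v.ToString(CultureInfo.InvariantCulture))));
// prints: 3.14 | 2.5
```

One caveat with this variant: in a comma-separated list such as 1,2 the pattern cannot tell a list separator from a decimal comma, so it would read the pair as a single number.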

Note that the expressions above match both integers and floats. To match only float/decimal numbers, make the fractional part obligatory by removing the ? that follows \. (the second ? in the pattern):

[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
            ^

Now, an integer such as 34 is no longer matched.

If you do not want to match float numbers without leading zeros (like .5), make the first digit-matching pattern obligatory by replacing its * quantifier with + (to match 1 or more digits):

[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
          ^

Now it matches far fewer samples.
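To see the difference between the two stricter variants side by side, here is a small sketch (C# 9+ top-level program; the sample string and variable names are illustrative assumptions) that runs both patterns over the same input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Fractional part required, integer part optional.
string floatsOnly = @"[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?";
// Fractional part and integer part both required.
string leadingDigit = @"[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?";

string strictSample = "34 3.14 .5 -2.0e3"; // illustrative input (assumption)

string[] withFloatsOnly = Regex.Matches(strictSample, floatsOnly)
    .Cast<Match>().Select(m => m.Value).ToArray();
string[] withLeadingDigit = Regex.Matches(strictSample, leadingDigit)
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", withFloatsOnly));
// prints: 3.14 | .5 | -2.0e3
Console.WriteLine(string.Join(" | ", withLeadingDigit));
// prints: 3.14 | -2.0e3
```

The integer 34 is skipped by both patterns, and .5 is skipped only once the leading digits are required.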

Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:

[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
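The effect of the lookarounds is easiest to see by comparing the pattern with and without them. A minimal sketch, assuming a made-up input that contains a dotted version string (C# 9+ top-level program, illustrative names):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same float pattern, first without and then with the lookarounds.
string plainFloat = @"[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?";
string wholeFloat = @"[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)";

string dottedSample = "version 1.2.3.4 costs 10.5 and 2.75"; // illustrative input

string[] plainMatches = Regex.Matches(dottedSample, plainFloat)
    .Cast<Match>().Select(m => m.Value).ToArray();
string[] wholeMatches = Regex.Matches(dottedSample, wholeFloat)
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", plainMatches));
// prints: 1.2 | 3.4 | 10.5 | 2.75
Console.WriteLine(string.Join(" | ", wholeMatches));
// prints: 10.5 | 2.75
```

The lookbehind rejects a start that directly follows "digit, dot", and the lookahead rejects an end that is directly followed by "dot, digit", so no piece of 1.2.3.4 is returned.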


Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:

[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])


Swap the comma and \. in the pattern if you need a comma as the decimal separator and a period as the thousand separator.
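Matches that contain thousand separators cannot be parsed as they are, so one possible follow-up is to strip the separators first. This is only a sketch under that assumption (C# 9+ top-level program, made-up sample text and names), using the dot-as-decimal-separator pattern:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

string groupedPattern =
    @"[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])";
string groupedSample = "Totals: 34,345,767.678 and 12 123 456.23"; // illustrative input

decimal[] groupedValues = Regex.Matches(groupedSample, groupedPattern)
    .Cast<Match>()
    // Strip the comma/whitespace group separators before parsing.
    .Select(m => Regex.Replace(m.Value, @"[,\s]", ""))
    .Select(s => decimal.Parse(s, NumberStyles.Float, CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(string.Join(" | ",
    groupedValues.Select(v => v.ToString(CultureInfo.InvariantCulture))));
// prints: 34345767.678 | 12123456.23
```

Because whitespace counts as a separator here, two unrelated numbers written next to each other (for example 1 2.5) would be glued into one match, so this variant suits input where that cannot happen.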

Now, how to use these patterns in C#?

// Requires: using System.Linq; using System.Text.RegularExpressions;
var results = Regex.Matches(input, @"<PATTERN_HERE>")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
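Since the question asks for decimal numbers rather than strings, the matches still have to be converted. Here is a minimal sketch (C# 9+ top-level program) that applies the generic pattern to the sentence from the question; the use of decimal.Parse with the invariant culture and the variable names are my assumptions, not part of the snippet above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

string questionSentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string numberPattern = @"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?";

decimal[] parsedFromQuestion = Regex.Matches(questionSentence, numberPattern)
    .Cast<Match>()
    // The invariant culture always treats '.' as the decimal separator,
    // regardless of the machine's regional settings.
    .Select(m => decimal.Parse(m.Value, NumberStyles.Float, CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(string.Join(",",
    parsedFromQuestion.Select(v => v.ToString(CultureInfo.InvariantCulture))));
// prints: 10.4,20.5,40,1
```

NumberStyles.Float is used so that a match containing an exponent would also parse; passing an explicit culture avoids surprises on systems where the default decimal separator is a comma.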
Wiktor Stribiżew answered Sep 21 '22 13:09


Small improvement to @Michael's solution:

// Notes on the LINQ:
// .Where() filters the IEnumerable (which the array is);
//     c => ... is the lambda applied to each element of the array,
//     where c is an array element.
// .Trim() removes whitespace at the start and end of the string.
// Note: despite its name, doubleArray holds strings; parse the items if numbers are needed.
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
    .Where(c => c != "." && c.Trim() != "");

Returns:

10.4
20.5
40
1

The original solution was returning

[empty line here]
10.4
20.5
40
1
.
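The Split approach keeps any run of digits and dots, so tokens such as 1.2.3 or ... also survive the filter. One way to guard against that, sketched here as an assumption of mine and not as part of either answer, is to keep only the tokens that double.TryParse accepts. ExtractDoubles is a hypothetical helper name, and the code is a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split on everything that is not a digit or a dot,
// then keep only the tokens that actually parse as numbers.
static List<double> ExtractDoubles(string text)
{
    var parsed = new List<double>();
    foreach (string token in Regex.Split(text, @"[^0-9\.]+"))
    {
        // TryParse rejects "", ".", "..." and "1.2.3", so no extra filter is needed.
        if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
            parsed.Add(value);
    }
    return parsed;
}

List<double> fromQuestion = ExtractDoubles("X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.");
List<double> fromTricky = ExtractDoubles("See v1.2.3 ... and 7.5");

Console.WriteLine(string.Join(",", fromQuestion.Select(d => d.ToString(CultureInfo.InvariantCulture))));
// prints: 10.4,20.5,40,1
Console.WriteLine(string.Join(",", fromTricky.Select(d => d.ToString(CultureInfo.InvariantCulture))));
// prints: 7.5
```

Unlike the lookaround patterns in the other answer, this drops a token like 1.2.3 entirely; whether that is the desired behaviour depends on the input.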
code4life answered Sep 21 '22 13:09