Fastest way to check if a string can be parsed

I am parsing CSV files to lists of objects with strongly-typed properties. This involves parsing each string value from the file to an IConvertible type (int, decimal, double, DateTime, etc) using TypeDescriptor.

I am using a try/catch to handle situations where parsing fails. The exact details of where and why the exception occurred are then logged for further investigation. Below is the actual parsing code:

try
{
    parsedValue = TypeDescriptor.GetConverter(type).ConvertFromString(dataValue);
}
catch (Exception ex)
{
    // Log failure
}

Problem:

When values are successfully parsed, the process is quick. When the input contains many invalid values, the process can be thousands of times slower (due to catching the exceptions).

I've been testing this with parsing to DateTime. These are the performance figures:

  • Successful parsing: average of 32 ticks per parse
  • Failed parsing: average of 146296 ticks per parse

That's more than 4500 times slower.

Question:

Is it possible for me to check to see if a string value can be successfully parsed without having to use my expensive try catch method? Or perhaps there is another way I should be doing this?

EDIT: I need to use TypeDescriptor (and not DateTime.TryParse) because the type is determined at runtime.

Dave New asked May 30 '13 12:05



2 Answers

If you have a known set of types to convert, you can do a series of if/elseif/elseif/else (or switch/case on the type name) to essentially distribute it to specialized parsing methods. This should be pretty fast. This is as described in @Fabio's answer.
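
As a rough illustration of the if/else dispatch described above, here is a minimal sketch. The class name TypeDispatchParser and the particular set of supported types are assumptions made for the example, not part of the original answer:

```csharp
using System;

// Hypothetical helper: picks a TryParse method based on the runtime type,
// so invalid input returns false instead of throwing.
public static class TypeDispatchParser
{
    public static bool TryParse(Type type, string input, out object value)
    {
        value = null;

        if (type == typeof(int))
        {
            int parsed;
            if (!int.TryParse(input, out parsed)) return false;
            value = parsed;
            return true;
        }
        if (type == typeof(double))
        {
            double parsed;
            if (!double.TryParse(input, out parsed)) return false;
            value = parsed;
            return true;
        }
        if (type == typeof(decimal))
        {
            decimal parsed;
            if (!decimal.TryParse(input, out parsed)) return false;
            value = parsed;
            return true;
        }
        if (type == typeof(DateTime))
        {
            DateTime parsed;
            if (!DateTime.TryParse(input, out parsed)) return false;
            value = parsed;
            return true;
        }
        if (type == typeof(string))
        {
            value = input;
            return true;
        }

        // Unknown types are reported rather than silently ignored.
        throw new NotSupportedException("No parser for type " + type.FullName);
    }
}
```

Each branch only pays for a type comparison and a TryParse call, so a failed parse costs about the same as a successful one.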

If you still have performance issues, you can also create a lookup table which will let you add new parsing methods as you need to support them:

Given some basic parsing wrappers:

public delegate bool TryParseMethod<T>(string input, out T value);

public interface ITryParser
{
    bool TryParse(string input, out object value);
}

public class TryParser<T> : ITryParser
{
    private TryParseMethod<T> ParsingMethod;

    public TryParser(TryParseMethod<T> parsingMethod)
    {
        this.ParsingMethod = parsingMethod;
    }

    public bool TryParse(string input, out object value)
    {
        T parsedOutput;
        bool success = ParsingMethod(input, out parsedOutput);
        value = parsedOutput;
        return success;
    }
}

You can then set up a conversion helper which does the lookup and calls the appropriate parser:

using System;
using System.Collections.Generic;

public static class DataConversion
{
    private static Dictionary<Type, ITryParser> Parsers;

    static DataConversion()
    {
        Parsers = new Dictionary<Type, ITryParser>();
        AddParser<DateTime>(DateTime.TryParse);
        AddParser<int>(Int32.TryParse);
        AddParser<double>(Double.TryParse);
        AddParser<decimal>(Decimal.TryParse);
        AddParser<string>((string input, out string value) => {value = input; return true;});
    }

    public static void AddParser<T>(TryParseMethod<T> parseMethod)
    {
        Parsers.Add(typeof(T), new TryParser<T>(parseMethod));
    }

    public static bool Convert<T>(string input, out T value)
    {
        object parseResult;
        bool success = Convert(typeof(T), input, out parseResult);
        if (success)
            value = (T)parseResult;
        else
            value = default(T);
        return success;
    }

    public static bool Convert(Type type, string input, out object value)
    {
        ITryParser parser;
        if (Parsers.TryGetValue(type, out parser))
            return parser.TryParse(input, out value);
        else
            throw new NotSupportedException(String.Format("The specified type \"{0}\" is not supported.", type.FullName));
    }
}

Then usage might be like:

//for a known type at compile time
int value;
if (!DataConversion.Convert<int>("3", out value))
{
    //log failure
}

//or for a type unknown at compile time
//(named differently so it does not clash with the int variable above):
object parsedValue;
if (!DataConversion.Convert(myType, dataValue, out parsedValue))
{
    //log failure
}

The generics could probably be expanded to avoid object boxing and type casting, but as it stands this works fine; optimize that aspect only if you can measure a performance problem from it.

EDIT: You can update the DataConversion.Convert method so that, if no parser is registered for the requested type, it either falls back to your TypeConverter approach or throws an appropriate exception. It's up to you whether you want a catch-all or just a predefined set of supported types, which avoids bringing your try/catch back. As it stands, the code throws a NotSupportedException with a message naming the unsupported type. Feel free to tweak it as makes sense. Performance-wise, the catch-all may be worthwhile, since those cases should be few and far between once you have specialized parsers for the most commonly used types.
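
The fallback variant mentioned in the edit could look roughly like the sketch below. It is written as a standalone class so it compiles on its own; the name FallbackConversion, the ObjectTryParse delegate, and the two registered types are assumptions for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical variant: fast TryParse lookup first, TypeDescriptor as a catch-all.
public static class FallbackConversion
{
    private delegate bool ObjectTryParse(string input, out object value);

    // Exception-free parsers for the most common types (assumed set).
    private static readonly Dictionary<Type, ObjectTryParse> FastParsers =
        new Dictionary<Type, ObjectTryParse>
        {
            { typeof(int), (string s, out object v) =>
                { int r; bool ok = int.TryParse(s, out r); v = r; return ok; } },
            { typeof(DateTime), (string s, out object v) =>
                { DateTime r; bool ok = DateTime.TryParse(s, out r); v = r; return ok; } },
        };

    public static bool Convert(Type type, string input, out object value)
    {
        ObjectTryParse parser;
        if (FastParsers.TryGetValue(type, out parser))
            return parser(input, out value);

        // Slow path: unregistered types still pay the exception cost on bad input.
        try
        {
            value = TypeDescriptor.GetConverter(type).ConvertFromString(input);
            return true;
        }
        catch (Exception)
        {
            value = null;
            return false;
        }
    }
}
```

With this shape, only types missing from the lookup table go through the try/catch, so the expensive path should be hit rarely if the table covers the types that dominate the CSV data.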

Chris Sinclair answered Sep 28 '22 05:09


If you know the type you are trying to parse to, use its TryParse method:

string value = "123"; // example input
int parsedValue;
if (Int32.TryParse(value, out parsedValue))
{
    // actions if parsed ok
}
else
{
    // actions if not parsed
}

The same works for other types, each with an out variable of the matching type:

Decimal.TryParse(value, out parsedDecimal);
Double.TryParse(value, out parsedDouble);
DateTime.TryParse(value, out parsedDateTime);

Or you can use the following workaround:

Create a parse method for every type, all with the same name but different signatures (each wraps the corresponding TryParse):

private bool TryParsing(string value, out Int32 parsedValue)
{
    return Int32.TryParse(value, out parsedValue);
}

private bool TryParsing(string value, out Double parsedValue)
{
    return Double.TryParse(value, out parsedValue);
}

private bool TryParsing(string value, out Decimal parsedValue)
{
    return Decimal.TryParse(value, out parsedValue);
}

private bool TryParsing(string value, out DateTime parsedValue)
{
    return DateTime.TryParse(value, out parsedValue);
}

Then you can call TryParsing with a variable of the target type; the compiler selects the overload from the type of the out argument.
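
A short sketch of how such overloads might be called. The class OverloadedParsing and the Describe method are made up for this example, and only two of the overloads are repeated to keep it short:

```csharp
using System;

// Hypothetical demo class: overloads share one name and differ in the out type.
public static class OverloadedParsing
{
    public static bool TryParsing(string value, out int parsedValue)
    {
        return int.TryParse(value, out parsedValue);
    }

    public static bool TryParsing(string value, out DateTime parsedValue)
    {
        return DateTime.TryParse(value, out parsedValue);
    }

    // Tries int first, then DateTime; the overload is chosen by the out variable's type.
    public static string Describe(string raw)
    {
        int number;
        if (TryParsing(raw, out number))
            return "int: " + number;

        DateTime date;
        if (TryParsing(raw, out date))
            return "date";

        return "unparsed";
    }
}
```

Overload resolution happens at compile time, so this approach fits cases where the target type is known in the calling code; for a type held in a Type variable at runtime, a lookup keyed on the type is still needed.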

Fabio answered Sep 28 '22 05:09