
Strongly Typed String

The Setting

I have a prototype class TypedString<T> that attempts to "strongly type" (dubious meaning) strings of a certain category. It uses the C#-analogue of the curiously recurring template pattern (CRTP).

class TypedString<T>

    using System;

    public abstract class TypedString<T>
        : IComparable<T>
        , IEquatable<T>
        where T : TypedString<T>
    {
        public string Value { get; private set; }

        // Subclasses may override this to change how values are compared.
        protected virtual StringComparison ComparisonType
        {
            get { return StringComparison.Ordinal; }
        }

        protected TypedString(string value)
        {
            if (value == null)
                throw new ArgumentNullException("value");
            this.Value = Parse(value);
        }

        //May throw FormatException
        protected virtual string Parse(string value)
        {
            return value;
        }

        public int CompareTo(T other)
        {
            return string.Compare(this.Value, other.Value, ComparisonType);
        }

        public bool Equals(T other)
        {
            return string.Equals(this.Value, other.Value, ComparisonType);
        }

        public override bool Equals(object obj)
        {
            return obj is T && Equals(obj as T);
        }

        public override int GetHashCode()
        {
            return Value.GetHashCode();
        }

        public override string ToString()
        {
            return Value;
        }
    }

The TypedString<T> class can now be used to eliminate code duplication when defining a bunch of different "string categories" throughout my project. An example simple usage of this class is in defining a Username class:

class Username (example)

    using System;
    using System.Linq; // for Any() and All()

    public class Username : TypedString<Username>
    {
        public Username(string value)
            : base(value)
        {
        }

        protected override string Parse(string value)
        {
            if (!value.Any())
                throw new FormatException("Username must contain at least one character.");
            if (!value.All(char.IsLetterOrDigit))
                throw new FormatException("Username may only contain letters and digits.");
            return value;
        }
    }

This now lets me use the Username class throughout my whole project, never having to check if a username is correctly formatted - if I have an expression or variable of type Username, it's guaranteed to be correct (or null).

Scenario 1

    string GetUserRootDirectory(Username user)
    {
        if (user == null)
            throw new ArgumentNullException("user");
        return Path.Combine(UsersDirectory, user.ToString());
    }

I don't have to worry about formatting of the user string here - I already know it's correct by nature of the type.

Scenario 2

    IEnumerable<Username> GetFriends(Username user)
    {
        //...
    }

Here the caller knows what it's getting as the return just based on the type. An IEnumerable<string> would require reading into the details of the method or documentation. Even worse, if someone were to change the implementation of GetFriends such that it introduces a bug and produces invalid username strings, that error could silently propagate to callers of the method and wreak all kinds of havoc. This nicely typed version prevents that.

Scenario 3

System.Uri is an example of a class in .NET that does little more than wrap a string that has a huge number of formatting constraints and helper properties/methods for accessing useful parts of it. So that's one piece of evidence that this approach isn't totally crazy.

The Question

I imagine this kind of thing has been done before. I already see the benefits of this approach and don't need to convince myself any more.

Is there a downside I may be missing?
Is there a way this could come back to bite me later?

asked Jun 03 '13 23:06 by Timothy Shields


1 Answer

General Thoughts

I'm not fundamentally against the approach (and kudos for knowing/using the CRTP, which can be quite useful). The approach allows metadata to be wrapped around a single value, which can be a very good thing. It's extensible too; you can add additional data to the type without breaking interfaces.

I don't like the fact that your current implementation seems to depend heavily on exception-based flow. This may be perfectly appropriate for some things or in truly exceptional cases. However, if a user were trying to pick a valid username, the application could throw dozens of exceptions before they found one that passed validation.

Of course, you could add exception-free validation to the interface. You must also ask yourself where you want the validation rules to live (which is always a challenge, especially in distributed applications).
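One way the exception-free validation mentioned above could look is sketched below. This is only an illustration, not the asker's code: Validate, TryCreate and Create are hypothetical names, and the class is written standalone (without the TypedString<T> base) so that it compiles on its own. The rules are the ones from the question's Username.Parse.

```csharp
using System;
using System.Linq;

// Hypothetical standalone variant of the question's Username class.
// The rules live in one static method, so callers can pick a throwing
// or a non-throwing way of creating an instance.
public sealed class Username
{
    public string Value { get; private set; }

    private Username(string value)
    {
        Value = value;
    }

    // Returns null when the value is valid, otherwise a description of the problem.
    public static string Validate(string value)
    {
        if (value == null)
            return "Username must not be null.";
        if (value.Length == 0)
            return "Username must contain at least one character.";
        if (!value.All(char.IsLetterOrDigit))
            return "Username may only contain letters and digits.";
        return null;
    }

    // Non-throwing path, e.g. for checking input while a user is typing.
    public static bool TryCreate(string value, out Username username)
    {
        username = Validate(value) == null ? new Username(value) : null;
        return username != null;
    }

    // Throwing path for callers that treat invalid input as exceptional.
    public static Username Create(string value)
    {
        string error = Validate(value);
        if (error != null)
            throw new FormatException(error);
        return new Username(value);
    }

    public override string ToString()
    {
        return Value;
    }
}
```

A sign-up form could call Username.TryCreate on each attempt and show the message from Validate, while code that loads usernames from a trusted store could keep using the throwing Create.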

WCF

Speaking of "distribution": consider the implications of implementing such types as part of a WCF data contract. Ignoring the fact that data contracts should usually expose simple DTOs, you also have the problem of generated client proxy classes, which reproduce your type's properties but not its implementation. The validation logic does not travel with the data.

Of course, you can mitigate this by placing the parent assembly on both client and server. In some cases, this is perfectly appropriate. In other cases, less so. Let's say that the validation of one of your strings required a call to a database. This would most likely not be appropriate to have in both the client/server locations.
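A rough sketch of the DTO alternative follows, assuming the service exposes a plain string on the wire and the server re-validates it on arrival. UserDto and UserMapper are hypothetical names, and Username here is a cut-down stand-in for the question's class. DataContract and DataMember are the standard attributes from System.Runtime.Serialization (on .NET Framework this needs a reference to the System.Runtime.Serialization assembly).

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical wire contract: only plain data crosses the service boundary,
// so a generated proxy class loses nothing.
[DataContract]
public class UserDto
{
    [DataMember]
    public string Name { get; set; }
}

// Cut-down stand-in for the question's Username type (same rules).
public sealed class Username
{
    public string Value { get; private set; }

    public Username(string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new FormatException("Username must contain at least one character.");
        foreach (char c in value)
        {
            if (!char.IsLetterOrDigit(c))
                throw new FormatException("Username may only contain letters and digits.");
        }
        Value = value;
    }
}

public static class UserMapper
{
    // Server side: never trust the wire; rebuild the typed value, which re-validates it.
    public static Username ToDomain(UserDto dto)
    {
        if (dto == null)
            throw new ArgumentNullException("dto");
        return new Username(dto.Name);
    }

    // Outbound: flatten the typed value back into plain data.
    public static UserDto ToDto(Username username)
    {
        if (username == null)
            throw new ArgumentNullException("username");
        return new UserDto { Name = username.Value };
    }
}
```

With this split, the validation rules stay in one assembly on the server, and clients are free to duplicate them or not.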

"Scenario 1"

It sounds like you are seeking consistent formatting. This is a worthy goal and works great for things like URIs and perhaps usernames. For more complex strings, this can be a challenge. I've worked on products where even "simple" strings can be formatted in many different ways depending on context. In such cases, dedicated (and perhaps reusable) formatters may be more appropriate.

Again, very situation-specific.

"Scenario 2"

"Even worse, if someone were to change the implementation of GetFriends such that it introduces a bug and produces invalid username strings, that error could silently propagate to callers of the method and wreak all kinds of havoc."

IEnumerable<Username> GetFriends(Username user) { } 

I can see this argument. A few things come to mind:

  • A better method name: GetUserNamesOfFriends()
  • Unit/integration testing
  • Presumably these usernames are validated when they are created/modified. If this is your own API, why wouldn't you trust what it gives you?

Side note: when dealing with people/users, an immutable ID is probably more useful (people like changing usernames).
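To illustrate the side note, here is a minimal sketch of an immutable identifier used alongside a changeable username. UserId and User are hypothetical types, and using a Guid as the underlying key is an assumption; an integer key would work the same way.

```csharp
using System;

// Hypothetical identifier type: wraps a Guid that never changes after creation.
public struct UserId : IEquatable<UserId>
{
    private readonly Guid id;

    public UserId(Guid id)
    {
        this.id = id;
    }

    public static UserId New()
    {
        return new UserId(Guid.NewGuid());
    }

    public Guid Value
    {
        get { return id; }
    }

    public bool Equals(UserId other)
    {
        return id.Equals(other.id);
    }

    public override bool Equals(object obj)
    {
        return obj is UserId && Equals((UserId)obj);
    }

    public override int GetHashCode()
    {
        return id.GetHashCode();
    }

    public override string ToString()
    {
        return id.ToString();
    }
}

public class User
{
    public User(UserId id, string username)
    {
        Id = id;
        Username = username;
    }

    // Stable key: safe to store in friend lists, foreign keys, and caches.
    public UserId Id { get; private set; }

    // Display name: may change without breaking references to the user.
    public string Username { get; set; }
}
```

Under this design a method like GetFriends would return identifiers rather than usernames, so a rename does not invalidate anything the caller is holding.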

"Scenario 3"

"System.Uri is an example of a class in .NET that does little more than wrap a string that has a huge number of formatting constraints and helper properties/methods for accessing useful parts of it. So that's one piece of evidence that this approach isn't totally crazy."

No argument there; there are many such examples in the BCL.

Final Thoughts

  • There's nothing wrong with wrapping a value into a more complex type so that it may be described/manipulated with richer metadata.
  • Centralizing validation in a single place is a good thing, but make sure you pick the right place.
  • Crossing serialization boundaries can present challenges when logic resides within the type being passed.
  • If you are mainly focused on trusting the input, you could use a simple wrapper class that lets the callee know that it is receiving data that has been validated. It doesn't matter where/how this validation has occurred.

ASP.NET MVC uses a similar paradigm for strings. If a value is an MvcHtmlString (which implements IHtmlString), it is treated as trusted and not encoded again. If not, it is encoded.
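The "simple wrapper" idea from the last bullet could be sketched roughly as follows. Validated<T> and UserDirectories are hypothetical names, not part of any framework; the only point is that the private constructor forces every instance through a validation check, whatever that check happens to be.

```csharp
using System;
using System.IO;

// Hypothetical marker wrapper: holding a Validated<T> means some validation
// ran when it was created. The wrapper does not care which rules were applied.
public sealed class Validated<T>
{
    public T Value { get; private set; }

    private Validated(T value)
    {
        Value = value;
    }

    public static Validated<T> Create(T value, Func<T, bool> isValid)
    {
        if (isValid == null)
            throw new ArgumentNullException("isValid");
        if (!isValid(value))
            throw new ArgumentException("Value failed validation.", "value");
        return new Validated<T>(value);
    }
}

public static class UserDirectories
{
    // The callee trusts the wrapper instead of re-checking the string itself.
    public static string GetUserRootDirectory(string usersDirectory, Validated<string> user)
    {
        if (user == null)
            throw new ArgumentNullException("user");
        return Path.Combine(usersDirectory, user.Value);
    }
}
```

Compared with TypedString<T>, this trades category-specific types (Username, and so on) for a single generic one, so it conveys "validated" but not "validated as what".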

answered Sep 22 '22 05:09 by Tim M.