Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you deal with strings that have structure?

Suppose I have an object representing a person, with getter and setter methods for the person's email address. The setter method definition might look something like this:

setEmailAddress(String emailAddress)
    {
    this.emailAddress = emailAddress;
    }

Calling person.setEmailAddress(0), then, would generate a type error, but calling person.setEmailAddress("asdf") would not - even though "asdf" is in no way a valid email address.

In my experience, so-called strings are almost never arbitrary sequences of characters, with no restriction on length or format. URIs come to mind - as do street addresses, as do phone numbers, as do first names ... you get the idea. Yet these data types are most often stored as "just strings".

Returning to my person object, suppose I modify setEmailAddress() like so

setEmailAddress(EmailAddress emailAddress)
    // ...

where EmailAddress is a class ... whose constructor takes a string representation of an email address. Have I gained anything?

OK, so an email address is kind of a bad example. What about a URI class that takes a string representation of a URI as a constructor parameter, and provides methods for managing that URI - setting the path, fetching a query parameter, etc. The validity of the source string becomes important.

So I ask all of you, how do you deal with strings that have structure? And how do you make your structural expectations clear in your interfaces?

Thank you.

like image 885
Adam Siler Avatar asked Feb 09 '09 23:02

Adam Siler


2 Answers

"Strings with structure" are a symptom of the common code smell "Primitive Obsession".

The remedy is to watch closely for duplication in code that validates or manipulates parts of these structures. At the first hint of duplication - but not before - extract a class that encapsulates the structure and locate validations and queries there.

like image 65
Morendil Avatar answered Sep 27 '22 23:09

Morendil


Welcome to the world of programming!

I don't think your question is a symptom of an error on your part. Rather it is a basic problem which appears in many guises throughout the programming world. Strings that have some structure and meaning are passed around between different subsystems of an application and each subsystem can only do much parsing and validation.

The problem of verifying an email address, for example, is quite tricky. The regular expressions various people offer accepting an email address, for example, are generally either "too tight" (don't accept everything) or "too loose" (accept illegal things). The first google hit for 'regex "email address"', for example says:

The regular expression I receive the most feedback, not to mention "bug" reports on, is the one you'll find right on this site's home page: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b Analyze this regular expression with RegexBuddy. This regular expression, I claim, matches any email address. Most of the feedback I get refutes that claim by showing one email address that this regex doesn't match.

The fact is the what is or isn't a valid email address is a complex problem, one that a given program might or might not want to solve. The problem of URLs is even worse, especially given the possibility of malicious URLS.

Ideally, you can have a library or system-call which solves problems of this sort instead of doing anything yourself (Microsoft windows calls a custom dialogue box to allow the user to select or create a file, since validating file names is another tricky problem). But you can't always count on having an appropriate system call for a given "meaningful string" either.

I would say that there no a generic solution to the problem of strings-with-structure. Rather, it is a basic problem that appears right when you design your application. In the process of gathering requirements for your application, you should determine what data the application will take in and how meaningful that data will be to the application. And this is where things get tricky, since you may notice the possibility that the app may grow in ways that your boss or customer might not have thought of - or the app may in fact grow in ways that none of you thought of. Thus the application needs to be a little more flexible than what seems like the minimum BUT only a little. It should also not be so flexible you get bogged down.

Now, if you decide that you need to validate/interpret etc a given string, putting that string into an object or a hash can be a good approach - this is one way I know to make sure your interface is clear. But the tricky thing is deciding just how much validation or interpretation you need.

Making these decisions is thus an art - there are no dogmatic answers that work here.

like image 31
Joe Soul-bringer Avatar answered Sep 27 '22 22:09

Joe Soul-bringer