A means of specifying pattern strings that drive parsing and formatting for arbitrary objects?

I'm building a general purpose data translation tool for internal enterprise use, using Java 5. The various departments use differing formats for coordinate information (latitudes/longitudes), and they want to see the data in their own format. For example, the coordinates of the White House in DMS format are

38° 53' 55.133" N, 77° 02' 15.691" W

But they can also be expressed as:

385355.133 / -0770215.691
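To make the relationship between the two notations concrete, here is a minimal sketch that re-expresses the packed form in DMS notation by slicing the digits. The class and method names are hypothetical, and the field widths (two degree digits for latitude, three for longitude, two minute digits) and the sign convention (negative means W or S) are assumptions inferred from the single example above, not a documented format.

```java
class PackedCoordinate {
    /**
     * Re-expresses a packed value such as "-0770215.691" in DMS notation.
     * Assumed layout: optional sign, degreeDigits degree digits,
     * two minute digits, then seconds with an optional fraction.
     */
    static String toDms(String packed, int degreeDigits, char positive, char negative) {
        boolean isNegative = packed.startsWith("-");
        String digits = (isNegative || packed.startsWith("+")) ? packed.substring(1) : packed;
        String degrees = digits.substring(0, degreeDigits);
        String minutes = digits.substring(degreeDigits, degreeDigits + 2);
        String seconds = digits.substring(degreeDigits + 2);
        // parseInt drops the leading zero of the degrees ("077" becomes 77).
        return Integer.parseInt(degrees) + "\u00B0 " + minutes + "' " + seconds + "\" "
                + (isNegative ? negative : positive);
    }

    public static void main(String[] args) {
        System.out.println(toDms("385355.133", 2, 'N', 'S'));   // 38° 53' 55.133" N
        System.out.println(toDms("-0770215.691", 3, 'E', 'W')); // 77° 02' 15.691" W
    }
}
```

Both notations carry the same four fields (hemisphere, degrees, minutes, seconds); only the decoration and field widths differ.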

I want to represent the pattern required by each system as a string, and then use those patterns to parse instance data from the input system, and also use that pattern when formatting a string for consumption by the output system.

So it is not unlike a date/time formatting problem, for which the JDK provides java.text.SimpleDateFormat. It lets you convert among various date/time patterns, which are defined by strings such as "yyyy-MM-dd" or "MM/dd/yy" (the pattern letters are case-sensitive).
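For reference, this is roughly what the date/time analogy looks like in code: one pattern string drives parsing, another drives formatting. The class and method names are made up for illustration. Note that SimpleDateFormat pattern letters are case-sensitive, so the year, month, and day fields are written `yyyy`, `MM`, and `dd`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class DatePatternDemo {
    /** Parses text with one pattern and re-formats it with another. */
    static String convert(String text, String fromPattern, String toPattern) {
        SimpleDateFormat in = new SimpleDateFormat(fromPattern, Locale.US);
        in.setLenient(false); // reject impossible dates such as month 13
        try {
            Date parsed = in.parse(text);
            return new SimpleDateFormat(toPattern, Locale.US).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("2009-06-25", "yyyy-MM-dd", "MM/dd/yy")); // 06/25/09
    }
}
```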

My question is, do I have to build this CoordinateFormat thing totally from scratch, or is there a good general tool or well-defined approach I can use to guide me in this endeavor?

Kevin Pauli asked Jun 25 '09 20:06


1 Answer

If I read it right, you're talking about the problem addressed by the Interpreter pattern, but sort of going in both directions.

There are some easy ways to get nice generic interfaces, so you can get the rest of the thing running. My recommendation on that is something like:

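    public interface Interpreter<OutputType> {
        // Sets the pattern string that drives both parsing and formatting.
        void setCode(String coding);
        // Parses formatted text into the raw data type.
        OutputType decode(String formattedData);
        // Formats raw data back into text.
        String encode(OutputType rawData);
    }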
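As a rough sketch of how the rest of the application could be coded against this interface, the translation step reduces to "decode with the source pattern, encode with the target pattern". The `DecimalDegreesInterpreter` and `Translator` classes below are hypothetical stand-ins: the implementation simply delegates to java.text.DecimalFormat so that the example runs, and it is not meant as the real coordinate format.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

interface Interpreter<OutputType> {
    void setCode(String coding);
    OutputType decode(String formattedData);
    String encode(OutputType rawData);
}

/** Stand-in implementation: the code is a java.text.DecimalFormat pattern. */
class DecimalDegreesInterpreter implements Interpreter<Double> {
    private DecimalFormat format;

    public void setCode(String coding) {
        format = new DecimalFormat(coding, new DecimalFormatSymbols(Locale.US));
    }

    public Double decode(String formattedData) {
        try {
            return format.parse(formattedData).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + formattedData, e);
        }
    }

    public String encode(Double rawData) {
        return format.format(rawData);
    }
}

class Translator {
    /** Works for any data type: the caller never sees the pattern details. */
    static <T> String translate(String input, Interpreter<T> source, Interpreter<T> target) {
        return target.encode(source.decode(input));
    }

    public static void main(String[] args) {
        Interpreter<Double> in = new DecimalDegreesInterpreter();
        in.setCode("0.000000");
        Interpreter<Double> out = new DecimalDegreesInterpreter();
        out.setCode("0.00");
        System.out.println(translate("-77.037692", in, out)); // -77.04
    }
}
```

The calling code only depends on `Interpreter<T>`, so concrete implementations can be swapped in later without touching it.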

However, there are a couple of hurdles with concrete implementations. For your date example, you might need to deal with "9/9/09", "9 SEP 09", and "September 9th, 2009". The first kind of date is straightforward (numbers and fixed divider symbols), but either of the other two is pretty nasty. Honestly, a totally generic solution (the kind that might already exist off the shelf) probably isn't realistic, so I recommend the following.

I'd attack it on two levels. The first is pretty straightforward with regex and a format string: chomping up the data string into the pieces that are going to become raw data. You'd supply something like "D*/M*/YY" (or "M*/D*") for the first format, "D* MMM YY" for the second, and "Mm+ D*e*, YYYY" for the last. Here you've defined some reserved symbols for your data type (D, M, Y, with the obvious interpretations) and some for all data types (* for multiple characters possible, + for "full" output, e for defined extraneous characters); these symbols are obviously specific to your application. Then your regex stuff would chomp the string up, feeding everything associated with each reserved character to the individual data fields, and saving the decoration (commas, etc.) in some formatting string.
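A minimal sketch of that first level, under some simplifying assumptions: the hypothetical `FieldSplitter` class only understands numeric fields, treats a run of one reserved letter as a fixed-width field, and treats a trailing `*` as "one or more digits". Letter-based codes such as "MMM" or "Mm+" would need extra rules, and the decoration is matched but not stored here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSplitter {
    private static final String RESERVED = "DMY"; // assumed reserved symbols for dates

    private final Pattern regex;
    private final List<Character> fields = new ArrayList<Character>();

    /** Translates a code such as "D*/M*/YY" into a regex with one group per field. */
    FieldSplitter(String code) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < code.length()) {
            char c = code.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                int start = i;
                while (i < code.length() && code.charAt(i) == c) i++;
                int width = i - start;
                boolean variable = i < code.length() && code.charAt(i) == '*';
                if (variable) i++;
                fields.add(c);
                sb.append(variable ? "(\\d+)" : "(\\d{" + width + "})");
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // decoration, matched literally
                i++;
            }
        }
        regex = Pattern.compile(sb.toString());
    }

    /** Chomps the data string into raw field text, keyed by reserved symbol. */
    Map<Character, String> split(String data) {
        Matcher m = regex.matcher(data);
        if (!m.matches()) {
            throw new IllegalArgumentException("Data does not match code: " + data);
        }
        Map<Character, String> out = new LinkedHashMap<Character, String>();
        for (int g = 0; g < fields.size(); g++) {
            out.put(fields.get(g), m.group(g + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new FieldSplitter("D*/M*/YY").split("9/9/09")); // {D=9, M=9, Y=09}
    }
}
```

Two adjacent variable-width fields with no decoration between them would be ambiguous under this scheme, which is one reason to keep the reserved symbols and the decoration characters distinct.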

This first level can all be fairly generic - each data type (e.g., date, coordinate, address) has reserved symbols (which don't overlap with any formatting characters), and all data types have some shared symbols. Perhaps the Interpreter interface would also have public List<Character> reservedSymbols() and public void splitCode(List<String> splitcodes) methods, or perhaps guaranteed fields, so that you can make the divider an external class and pass in the results.
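The external divider mentioned above might look something like the following sketch. The `CodeDivider` name is hypothetical, and the shared modifier set (`*` and `+`) is an assumption taken from the examples in this answer. It splits a code string into field tokens and decoration runs, which could then be handed to an interpreter's `splitCode` method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodeDivider {
    private static final String SHARED_MODIFIERS = "*+"; // assumed to apply to all data types

    /** Splits a code into field tokens (reserved symbol plus modifiers) and decoration runs. */
    static List<String> divide(String code, List<Character> reserved) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < code.length()) {
            int start = i;
            char c = code.charAt(i);
            if (reserved.contains(c)) {
                // A field token: a run of the same reserved symbol, then any modifiers.
                while (i < code.length() && code.charAt(i) == c) i++;
                while (i < code.length() && SHARED_MODIFIERS.indexOf(code.charAt(i)) >= 0) i++;
            } else {
                // A decoration run: everything up to the next reserved symbol.
                while (i < code.length() && !reserved.contains(code.charAt(i))) i++;
            }
            tokens.add(code.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Character> dateSymbols = Arrays.asList('D', 'M', 'Y');
        System.out.println(divide("D* MMM YY", dateSymbols)); // [D*,  , MMM,  , YY]
    }
}
```

Because the divider only needs the list of reserved symbols, the same class can serve dates, coordinates, and addresses, as long as no reserved symbol doubles as a decoration character.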

The second level is less easy, because it gets at the part that can't be generic. Based on the format of the reserved symbols, the individual fields need to know how to present themselves. In the date example, MM would tell the month to print as (01, 02, ... 12), M* as (1, 2, ... 12), MMM as (JAN, FEB, ... DEC), Mmm as (Jan, Feb, ... Dec), etc. If your company has been somewhat consistent, or doesn't venture too far from standard representations, then hand-coding each of these shouldn't be too bad (and there are probably smart ways within each data type to reduce duplicated code). But I don't think it's practical to generify all this stuff. You would have to represent, in a general way, things that can be presented as either numbers or characters (like months), whole data that can be inferred from partial data (e.g., century from year), and how to get truncated representations (e.g., a year truncates to its last two digits, whereas most numbers truncate to their two leading digits). That is probably going to take as long as handwriting those cases, though I can imagine applications where the trade-off might be worth it. Date is a really tricky example, but I can certainly see equally tricky things coming up for other sorts of data.
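To illustrate the hand-coded second level, here is a small sketch for the month and year fields. The `DateFields` class and its method names are hypothetical, and only the codes discussed above are handled.

```java
class DateFields {
    private static final String[] ABBREVIATIONS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    /** Renders a month (1 to 12) according to its field code. */
    static String renderMonth(int month, String code) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("Month out of range: " + month);
        }
        if (code.equals("MM")) return (month < 10 ? "0" : "") + month; // 01 .. 12
        if (code.equals("M*")) return String.valueOf(month);           // 1 .. 12
        if (code.equals("MMM")) return ABBREVIATIONS[month - 1].toUpperCase(); // JAN .. DEC
        if (code.equals("Mmm")) return ABBREVIATIONS[month - 1];       // Jan .. Dec
        throw new IllegalArgumentException("Unknown month code: " + code);
    }

    /** Renders a year; the two-digit form keeps the LAST two digits. */
    static String renderYear(int year, String code) {
        if (code.equals("YYYY")) return String.valueOf(year);
        if (code.equals("YY")) return String.format("%02d", year % 100);
        throw new IllegalArgumentException("Unknown year code: " + code);
    }

    public static void main(String[] args) {
        System.out.println(renderMonth(9, "MM"));   // 09
        System.out.println(renderMonth(9, "MMM"));  // SEP
        System.out.println(renderYear(2009, "YY")); // 09
    }
}
```

Each rule here is specific to one field, which is the point: the month knows it has a textual form, and the year knows its truncation keeps the trailing digits. Decoding (for example, inferring the century from "09") would need its own hand-written rules.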

Summary:

- There's an easy generic face you can put on your problem, so the rest of your app can be coded around it.

- There's a fairly easy and generic first-pass parsing, by having universal reserved symbols and then reserved symbols for each data type; make sure these don't collide with symbols that will appear in formatting.

- There's a somewhat tedious final coding stage for the individual data bits.

Carl answered Nov 13 '22 20:11