Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correct approach when SRP violation seems to pass complexity down the chain

Currently in the middle of a substantive bit of refactoring work after realising my classes were all over the place. I'm trying to split things up a bit, to follow the SRP better, but I have always found it hard to evaluate the maxim of whether a class has "one reason to change". I'm hoping this practical example might help me understand.

The code in question is designed to clean data. Currently there are two separate processes here - we clean address data by using an external application which is called via the code. We clean other data fields using internal algorithms in C#.

This refactor started when I was told that we might want to change both these processes in the future - for example to use database stored procedures to do both these jobs rather than C# code and the external application. So my first instinct was to take these two functions and hide them behind interfaces (FileRow and FileContents are just DTOs):

public interface IAddressCleaner
{
    string CleanAddress(StringBuilder inputAddress);
    void CleanFile(FileContents fc);
}

public interface IFieldCleaner
{
    string CleanPhoneNumber(string phoneToClean);
    void CleanAllPhoneFields(FileRow row, FileContents fc);
    void MatchObscentities(FileRow row, FileContents fc);
    void CleanEmailFields(FileRow row, FileContents fc);
}

Which is fine. Realistically, however, I can't imagine a class would ever use one of these without the other. So it would seem to make sense to amalgamate them (and their implementations) into a single class. That also makes sense given we might replace both functions with a single solution such as a database.

On the flip side, it would seem that the IFieldCleaner already violates the SRP, because it's doing three things: cleaning phone numbers, emails and looking for rude words, all of which are logically distinct processes. So there would seem to be reason to split it into an IPhoneCleaner, IObscenityMatcher and IEmailCleaner.

What particularly bothers me about the latter approach is that these classes are used in a service, which already has a silly number of interface dependencies:

public class ReadFileService : IExecutableObject
{
    private ILogger _log;
    private IRepository _rep;
    private IFileHelper _fileHelp;
    private IFieldCleaner _fieldCleaner;
    private IFileParser _fileParser;
    private IFileWriter _fileWriter;
    private IEmailService _mailService;
    private IAddressCleaner _addressCleaner;

    public ReadFileService(ILogger log, IRepository rep, IFileHelper fileHelp, IFileParser fileParse, IFileWriter fileWrite, IEmailService email, IAddressCleaner addressCleaner)
    {
        // assign to privates
    }

  // functions
}

And that, in turn, also looks like it's violating the SRP to a ludicrous degree, without adding an additional two interfaces to it.

What's the right approach here? Should I have a single ICleaner interface, or split it into five?

like image 592
Bob Tway Avatar asked Jun 08 '16 09:06

Bob Tway


1 Answers

Disclaimer: I am no expert and folks might disagree with some of my thoughts here. It's tough to provide a straight answer because it depends heavily on what's behind the curtain here. There are probably also many "right" answers but it all depends on so much info we are missing here. Still, no one has answered yet and I thought there are some things I could point out that might guide you in the right direction.

Best of luck!


Do you have access to Pluralsight? Buying a quick month is totally worth it just to go through Encapsulation and SOLID. One of the "ah-hah" moments I had when going through it was about taking a look at your method signatures to help identify interfaces you could extract to help simplify code. Ignore the names, just look at the parameters.

I'll try to go through the exercise with the code you provided but I will need to make assumptions along the way that may be incorrect.

On IFieldCleaner you've got 3 methods with the same signature:

void CleanAllPhoneFields(FileRow row, FileContents fc);
void MatchObscentities(FileRow row, FileContents fc);
void CleanEmailFields(FileRow row, FileContents fc);

Notice how those methods all are the exact same. This suggests you could extract a single interface with 3 implementations:

interface IFieldCleaner {
  void Clean(FileRow row, FileContents fc);
}
class PhoneFieldCleaner : IFieldCleaner { }
class ObscentitiesFieldCleaner : IFieldCleaner { }
class EmailFieldCleaner : IFieldCleaner { }

Now this has nicely split up the responsibilities of cleaning those fields into bite-sized classes.

Now you have a couple other cleaning methods:

string CleanPhoneNumber(string phoneNumber);
string CleanAddress(StringBuilder inputAddress);

These are pretty similar, except one takes a StringBuilder presumably because the implementation cares about the individual lines? Let's just switch it to a string and assume the implementation will take care of line splitting/parsing, then we get the same result as before--two methods with the same signature:

string CleanPhoneNumber(string phoneNumber);
string CleanAddress(string inputAddress);

So, following our previous logic, let's also create an interface related to sanitizing strings:

interface IStringCleaner {
  string Clean(string s);
}

class PhoneNumberStringCleaner : IStringCleaner { }
class AddressStringCleaner : IStringCleaner { }

Now we've separated those responsibilities into their own implementations.

At this point, we only have one method left to tackle:

void CleanFile(FileContents fc);

I am not sure what this method does. Why is it part of IAddressCleaner? So, for now I will leave it out of the discussion--perhaps it is a way to read the file, find the address, then clean it, in which case you can do by calling our new AddressStringCleaner.

So let's see where we're at so far.

interface IFieldCleaner {
  void Clean(FileRow row, FileContents fc);
}

class PhoneFieldCleaner : IFieldCleaner { }
class ObscentitiesFieldCleaner : IFieldCleaner { }
class EmailFieldCleaner : IFieldCleaner { }

interface IStringCleaner {
  string Clean(string s);
}

class PhoneNumberStringCleaner : IStringCleaner { }
class AddressStringCleaner : IStringCleaner { }

These all seem similar to me and something smells. Based on your original method names like CleanAllFields, it appears you may be using a loop to just clean certain columns from a FileRow? But why also depend on FileContents? Again, I can't see your implementation so I am not quite sure. Perhaps you intend to pass a raw file or database input?

I also cannot see where you store the cleaned results--most of your previous methods returned void which means that calling the method has some side-effect (i.e. it's a Command) whereas a couple methods only returned a clean string (a Query).

So I will assume that the overall intent is to clean strings no matter where they are sourced from and also store them back somewhere. If that's the case, we can simplify our model even further:

interface IStringCleaner {
  string Clean(string s);
}

class PhoneNumberStringCleaner : IStringCleaner { }
class AddressStringCleaner : IStringCleaner { }
class ObscenitiesStringCleaner : IStringCleaner { }
class EmailStringCleaner : IStringCleaner { }

Notice we've removed the need for a IFieldCleaner, because these string cleaners only deal with an input string to clean.

Now back to your original context--it seems you can source data from a file and that these files may have rows? These rows contain columns whose values we need to clean. We also need to persist the cleaned changes we made.

So based on the service you provided, I see a couple things that might help us:

IRepository
IFileHelper
IFileWriter
IFileParser

My assumptions are that we intend to persist the cleaned fields back--to where I'm not sure given I see a "Repository" and then also a "FileWriter."

No matter, we know we need to eventually get the strings out of fields, maybe IFileParser will help with that?

interface IFileParser {
  FileContents ReadContents(File file);
  FileRow[] ReadRows(FileContents fc);
  FileField ReadField(FileRow row, string column);
}

This might be more complex than it needs to be--FileField could take care of storing the field value so presumably you could compose all these back together to form a FileContents to persist back to disk.

So, now we have separated our end goals (clean stuff) from where the input comes from (files, database, etc.) and how we persist it (back to files, database, etc.).

You can now use your service to compose this flow as you see fit. For example, you said currently you call an external program to clean addresses? No problem:

class ExternalAddressStringCleaner : IStringCleaner {
  // depend on whatever you need here

  public string Clean(string s) {
    // call external program
    return cleanString;
  }
}

Now you switch to a stored procedure? Okay, no problem either:

class DatabaseAddressStringCleaner : IStringCleaner {

  // depend on database
  DatabaseAddressStringCleaner(IRepository repository) {
  }

  string Clean(string s) {
    // call your database sproc
    return cleanString;
  }
}

It's hard to recommend ideas for your service--but it's possible you can split it up into separate smaller services (FileReaderService, FileCleaningService, and FileStoreService) or simplify the dependencies you take.

Now that you just have a single interface IStringCleaner, you can just declare the cleaners you require and swap them out/change them whenever.

public FileCleanerService {
  private IStringCleaner _addressCleaner;
  private IStringCleaner _phoneCleaner;
  private IStringCleaner _obscenityCleaner;
  private IStringCleaner _emailCleaner;

  ctor(IFileParser parser, /* deps */) {
    _parser = parser;

    _addressCleaner = new ExternalAddressStringCleaner(/* deps */);
    _phoneCleaner = new PhoneStringCleaner();
    _obscenityCleaner = new ObscenityStringCleaner();
    _emailCleaner = new EmailStringCleaner();
  }

  public void Clean(FileContents fc) {

    foreach(var row in _parser.ReadRows(fc)) {
      var address = _parser.ReadField(row, "Address");
      var phone   = _parser.ReadField(row, "Phone");
      var post    = _parser.ReadField(row, "PostContent");
      var email   = _parser.ReadField(row, "Email");

      // assumes you want to write back to the field?
      // handle this however you want
      address.Value = _addressCleaner.Clean(address.Value);
      phone.Value = _phoneCleaner.Clean(phone.Value);
      post.Value = _obscenityCleaner.Clean(post.Value);
      email.Value = _emailCleaner.Clean(email.Value);
    }

}

I've made a lot of assumptions about your process and code, so this is probably much more complex than I assumed. Without having all the information, it's tough to provide guidance--but there are still basic things you can reason about just by looking at interfaces and names and I hope I've demonstrated that. Sometimes you just have to look past the surface to see the 1s and 0s behind the Matrix and then it all makes sense ;)

Apologies for the long post but I totally understand where you're coming from. Figuring out how to refactor things is daunting, confusing, and nobody seems to be able to help. Hopefully this gives you somewhere to start when refactoring. It's a daunting task, but just stick to some simple guidelines and patterns and it'll probably end up being much easier to maintain anyway based on the effort you put in. And again, I absolutely recommend that PluralSight course.

like image 123
kamranicus Avatar answered Oct 20 '22 00:10

kamranicus