Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good design pattern(s) for extensible program [closed]

I have a question about how to make a good design for my program. My program is quite simple but I want to have good architecture and make my program easily extensible in the future.

My program need to fetch data from external data sources (XML), extract information from these data and at the end it need to prepare SQL statements to import information to database. So for all the external data sources that there are now and there will be in the future there is simple 'flow' of my application: fetch, extract and load.

I was thinking about creating generic classes called: DataFetcher, DataExtractor and DataLoader and then write specific ones that will inherit from them. I suppose I will need some factory design pattern, but which? FactoryMethod or Abstract Factory?

I want also to not use code like this:

if data_source == 'X':
     fetcher = XDataFetcher()
elif data_source == 'Y':
     fetcher = YDataFetcher()
....

Ideally (I am not sure if this is easily possible), I would like to write new 'data source processor', add one or two lines in existing code and my program would load data from new data source.

How can I get use of design patterns to accomplish my goals? If you could provide some examples in python it would be great.

like image 850
sebast26 Avatar asked Feb 08 '13 15:02

sebast26


2 Answers

If the fetchers all have the same interface, you can use a dictionary:

fetcher_dict = {'X':XDataFetcher,'Y':YDataFetcher}
data_source = ...
fetcher = fetcher_dict[data_source]()

As far as keeping things flexible -- Just write clean idiomatic code. I tend to like the "You ain't gonna need it" (YAGNI) philosophy. If you spend too much time trying to look into the future to figure out what you're going to need, your code will end up too bloated and complex to make the simple adjustments when you find out what you actually need. If the code is clean up front, it should be easy enough to refactor later to suit your needs.

like image 50
mgilson Avatar answered Sep 20 '22 22:09

mgilson


You have neglected to talk about the most important part i.e. the shape of your data. That's really the most important thing here. "Design Patterns" are a distraction--many of these patterns exist because of language limitations that Python doesn't have and introduce unnecessary rigidity.

  1. Look first at the shape of your data. E.g.:
    1. First you have XML
    2. Then you have some collection of data extracted from the XML (a simple dict? A nested dict? What data do you need? Is it homogenous or heterogenous? This is the most important thing but you don't talk about it!)
    3. Then you serialize/persist this data in an SQL backend.
  2. Then design "interfaces" (verbal descriptions) of methods, properties, or even just items in a dict or tuple to facilitate operations on that data. If you keep it simple and stick to native Python types, you may not even need classes, just functions and dicts/tuples.
  3. Re-iterate until you have the level of abstraction you need for your application.

For example, the interface for an "extractor" might be "an iterable that yields xml strings". Note this could be either a generator or a class with an __iter__ and next() method! No need to define an abstract class and subclass it!

What kind of configurable polymorphism you add to your data depends on the exact shape of your data. For example you could use convention:

# persisters.py

def persist_foo(data):
    pass

# main.py
import persisters

data = {'type':'foo', 'values':{'field1':'a','field2':[1,2]}}
try:
   foo_persister = getitem(persisters, 'persist_'+data['type'])
except AttributeError:
   # no 'foo' persister is available!

Or if you need further abstraction (maybe you need to add new modules you can't control), you could use a registry (which is just a dict) and a module convention:

# registry.py
def register(registry, method, type_):
    """Returns a decorator that registers a callable in a registry for the method and type"""
    def register_decorator(callable_):
        registry.setdefault(method, {})[type_] = callable_
        return callable_
    return register_decorator

def merge_registries(r1, r2):
    for method, type_ in r2.iteritems():
        r1.setdefault(method, {}).update(r2[method])

def get_callable(registry, method, type_):
    try:
        callable_ = registry[method][type]
    except KeyError, e:
        e.message = 'No {} method for type {} in registry'.format(method, type)
        raise e
    return callable_

def retrieve_registry(module):
    try:
        return module.get_registry()
    except AttributeError:
        return {}

def add_module_registry(yourregistry, *modules)
    for module in modules:
        merge_registries(yourregistry, module)

# extractors.py
from registry import register

_REGISTRY = {}

def get_registry():
    return _REGISTRY


@register(_REGISTRY, 'extract', 'foo')
def foo_extractor(abc):
    print 'extracting_foo'

# main.py

import extractors, registry

my_registry = {}
registry.add_module_registry(my_registry, extractors)

foo_extracter = registry.get_callable(my_registry, 'extract', 'foo')

You can easily build a global registry on top of this structure if you want (although you should avoid global state even if it's a little less convenient.)

If you are building public framework and you need a maximum of extensibility and formalism and are willing to pay in complexity, you can look at something like zope.interface. (Which is used by Pyramid.)

Rather than roll your own extract-transform-load app, have you considered scrapy? Using scrapy you would write a "Spider" which is given a string and returns sequences of Items (your data) or Requests (requests for more strings, e.g. URLs to fetch). The Items are sent down a configurable item pipeline which does whatever it wants with the items it receives (e.g. persist in a DB) before passing them along.

Even if you don't use Scrapy, you should adopt a data-centric pipeline-like design and prefer thinking in terms of abstract "callable" and "iterable" interfaces instead of concrete "classes" and "patterns".

like image 29
Francis Avila Avatar answered Sep 19 '22 22:09

Francis Avila