PHP OOP design - limiting parameters to specific child classes while implementing generic interfaces

I often do PHP projects designed to scrape hierarchical data from web pages and save it to the DB (essentially, structuring the data: think of scraping government websites that do have the data, but do not provide it in a structured way). Each time, I try to come up with an OOP design that would allow me to achieve the following:

  • Easily replace current HTML parsing scripts with new ones, in case the original web page changes
  • Allow easy extension of the data scraped and saved, as these projects are also meant for others to take and build on. My aim is to collect the "base" data, while others might decide to include something extra, change the way it is saved, and so on.

I have yet to find the solution, but the closest I have come is something like this:

I define an abstract class for data containers that would implement common tree-traversing functions:

abstract class DataContainer {

  protected $parent = NULL;
  protected $children = NULL;   

  public function getParent() {
    return $this->parent;
  }

  public function getChildren() {
    return $this->children;
  }             
}

And then I have the actual data containers. Imagine, I am scraping data on participation in parliamentary sessions down to a "specific question in a sitting" level. I would have SessionContainer, SittingContainer, QuestionContainer that would all extend the DataContainer.

The session, sitting and question data are each scraped from a different URL. Leaving the mechanism of getting the URL content aside, let's just say I need scraper classes, which would take the containers and a DOMDocument for the actual parsing. So I would define a generic interface like this:

interface Scraper {
  public function scrapeData(DOMDocument $Dom, DataContainer $DataContainer);   
}

Then the session, sitting and question would each have their own scraper, which implements the interface. But I'd also like to ensure that each scraper can only accept the container it is meant for. So it would look like:

class SessionScraper implements Scraper {
  public function scrapeData(DOMDocument $DOM, SessionContainer $DataContainer) {
  }
}

Finally, I would have a generic Factory class that also implements Scraper interface and just distributes the scraping to relevant scrapers. Like this:

public function scrapeData(DOMDocument $DOM, DataContainer $DataContainer) {
  // Look up the scraper class name for this container type in the configuration array
  $class = $this->config[get_class($DataContainer)];
  // Instantiate the scraper and delegate the actual scraping to it
  $scraper = new $class();
  $scraper->scrapeData($DOM, $DataContainer);
}

This is the class that would be actually called in the code. Very similarly, I could deal with saving to DB - each data container could have its DBSaver class, which would implement DBSaver interface. Again, all the calls could be done via the Factory class, which would also implement the DBSaver interface.

Everything would be perfect, but the problem is that classes that implement the interface must implement the exact signature of the interface. E.g. the method SessionScraper::scrapeData cannot accept only SessionContainer objects; it must accept all DataContainer objects. But it is not meant to!

Finally, the question:

  • Is my design wrong and I should be structuring everything in a completely different way? (how?), or:
  • My design is OK, it's just that I need to enforce types within methods with instanceof and similar checks instead of enforcing it via typehinting?
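
To make the second option concrete, here is a minimal sketch of what I mean by enforcing the type inside the method. It assumes PHP 7.4 or later, and the class bodies are stripped-down stand-ins for the ones above. The scraper keeps the interface's signature and narrows the type at runtime with instanceof:

```php
<?php
// Stripped-down stand-ins for the classes described in the question.
abstract class DataContainer {}
class SessionContainer extends DataContainer {}
class SittingContainer extends DataContainer {}

interface Scraper {
    public function scrapeData(DOMDocument $dom, DataContainer $container): void;
}

class SessionScraper implements Scraper {
    // Same parameter types as the interface, so PHP accepts the declaration.
    public function scrapeData(DOMDocument $dom, DataContainer $container): void {
        // Narrow the type at runtime instead of in the type hint.
        if (!$container instanceof SessionContainer) {
            throw new InvalidArgumentException(
                'SessionScraper expects a SessionContainer, got ' . get_class($container)
            );
        }
        // ... parse $dom and fill $container here ...
    }
}

$scraper = new SessionScraper();
$scraper->scrapeData(new DOMDocument(), new SessionContainer()); // accepted

try {
    $scraper->scrapeData(new DOMDocument(), new SittingContainer()); // rejected
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

The downside is that a wrong container is only caught when the method is called, not when the class is declared.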

Thanks in advance for all the suggestions / criticisms. I am completely happy with somebody turning this code on its head, if necessary!

Aurimas asked Oct 06 '11 21:10


1 Answer

The name Container jumps out at me. It is very generic, and you might need something more dynamic. I think you have Data and you classify it, so it has a type.

So instead of hardcoding the exact class into the type hint, you should resolve this dynamically.

If each Container had a type, the Scraper could signal whether or not it is applicable to that type of Container.

The concrete form of scraping is actually the strategy you use to parse specific data. Your container encapsulates this strategy, providing an interface to the normalized data.

You only need to add some logic/contract between Container and Scraper so that they can talk to each other. You can put this contract inside the interfaces of both.

This would also allow you to have a Scraper that can deal with multiple types if you want to stretch it.
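
A rough sketch of such a contract, assuming PHP 7.4 or later. The method names getType() and supports() and the type strings are made up for illustration; they are not from the question:

```php
<?php
// Hypothetical contract: every container reports its type as a string.
abstract class DataContainer {
    abstract public function getType(): string;
}
class SessionContainer extends DataContainer {
    public function getType(): string { return 'session'; }
}
class SittingContainer extends DataContainer {
    public function getType(): string { return 'sitting'; }
}
class QuestionContainer extends DataContainer {
    public function getType(): string { return 'question'; }
}

// Hypothetical contract: every scraper can say whether it handles a container.
interface Scraper {
    public function supports(DataContainer $container): bool;
    public function scrapeData(DOMDocument $dom, DataContainer $container): void;
}

// A single scraper may declare that it handles several types.
class SessionAndSittingScraper implements Scraper {
    private const TYPES = ['session', 'sitting'];

    public function supports(DataContainer $container): bool {
        return in_array($container->getType(), self::TYPES, true);
    }

    public function scrapeData(DOMDocument $dom, DataContainer $container): void {
        if (!$this->supports($container)) {
            throw new InvalidArgumentException(
                'Unsupported container type: ' . $container->getType()
            );
        }
        // ... type-specific parsing of $dom would go here ...
    }
}

$scraper = new SessionAndSittingScraper();
var_dump($scraper->supports(new SittingContainer()));
var_dump($scraper->supports(new QuestionContainer()));
```

The type hint stays generic (DataContainer), and the decision about which containers a scraper accepts moves into supports().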

For your Container, take a look at the SPL as well and implement some of its interfaces, so that you have iterators (and recursive iterators) available. This might be the generic structure you're referring to, and the SPL could boost the usability of your Container classes.
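
As an illustration of the SPL idea, here is a small sketch assuming PHP 7.4 or later. DataContainer is simplified to a concrete class with a name, and ContainerIterator is a hypothetical helper; only IteratorAggregate, ArrayIterator, RecursiveIterator and RecursiveIteratorIterator come from PHP itself:

```php
<?php
// Simplified, concrete stand-in for the question's DataContainer.
class DataContainer implements IteratorAggregate {
    private string $name;
    private array $children = [];

    public function __construct(string $name) {
        $this->name = $name;
    }
    public function addChild(DataContainer $child): void {
        $this->children[] = $child;
    }
    public function getName(): string {
        return $this->name;
    }
    public function getChildren(): array {
        return $this->children;
    }
    // Lets foreach walk the direct children of a container.
    public function getIterator(): Traversable {
        return new ContainerIterator($this->children);
    }
}

// Hypothetical helper that tells the SPL how to descend into a container.
class ContainerIterator extends ArrayIterator implements RecursiveIterator {
    public function hasChildren(): bool {
        return count($this->current()->getChildren()) > 0;
    }
    public function getChildren(): ContainerIterator {
        return new ContainerIterator($this->current()->getChildren());
    }
}

$session = new DataContainer('session');
$sitting = new DataContainer('sitting 1');
$sitting->addChild(new DataContainer('question 1'));
$sitting->addChild(new DataContainer('question 2'));
$session->addChild($sitting);

// SELF_FIRST visits each parent before its children.
$walker = new RecursiveIteratorIterator(
    new ContainerIterator([$session]),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($walker as $node) {
    echo str_repeat('  ', $walker->getDepth()), $node->getName(), PHP_EOL;
}
```

With this in place, the tree-traversal code lives in the SPL iterators instead of being rewritten for every container class.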

You do not need to hardcode everything in OOP; you can keep things dynamic, and in PHP especially you normally resolve things at runtime.

This will also make it easier to replace Scrapers with a new version. As Scrapers would now have a type by definition (as suggested above), you can resolve at runtime which concrete class should do the scraping, e.g. by dynamically loading it from a .php file in a nice file-system structure.
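
One way the runtime resolution could look, as a sketch only, assuming PHP 7.4 or later. ScraperDispatcher and its configuration map are hypothetical names, the map is keyed by container class as in the question's factory, and class loading from files is left to whatever autoloader you use:

```php
<?php
abstract class DataContainer {}
class SessionContainer extends DataContainer {}
class QuestionContainer extends DataContainer {}

interface Scraper {
    public function scrapeData(DOMDocument $dom, DataContainer $container): void;
}

class SessionScraper implements Scraper {
    public function scrapeData(DOMDocument $dom, DataContainer $container): void {
        // ... session-specific parsing would go here ...
    }
}

// Hypothetical dispatcher: picks the concrete scraper at runtime.
class ScraperDispatcher implements Scraper {
    /** @var array<string, string> container class name => scraper class name */
    private array $map;

    public function __construct(array $map) {
        $this->map = $map;
    }

    public function resolve(DataContainer $container): Scraper {
        $class = $this->map[get_class($container)] ?? null;
        // Reject missing entries and classes that do not implement Scraper.
        if ($class === null || !is_subclass_of($class, Scraper::class)) {
            throw new RuntimeException(
                'No scraper configured for ' . get_class($container)
            );
        }
        return new $class();
    }

    public function scrapeData(DOMDocument $dom, DataContainer $container): void {
        $this->resolve($container)->scrapeData($dom, $container);
    }
}

// Swapping in a new scraper version means changing only this map.
$dispatcher = new ScraperDispatcher([
    SessionContainer::class => SessionScraper::class,
]);
$dispatcher->scrapeData(new DOMDocument(), new SessionContainer());
```

Replacing a scraper after the page layout changes then touches one configuration entry, and the calling code keeps talking to the dispatcher.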

Just my 2 cents.

hakre answered Sep 21 '22 20:09