I've almost finished my Data Mapper, but now I've reached the point where I have to deal with relationships.
I will try to illustrate my ideas here. I wasn't able to find good articles or information on this topic, so maybe I'm re-inventing the wheel (for sure I am, I could just use a big framework, but I want to learn by doing it).
1:1 Relationships
First, let's look at 1:1 relationships. In general, when we've got a domain class called "Company" and one called "Address", our Company class will have something like address_id. Let's say in most cases we just display a list of Companies, and the address is only needed when someone looks at the details. In that case, my Data Mapper (CompanyDataMapper) simply loads lazily, meaning it will just fetch that address_id from the database, but will not do a join to get the address data as well.
In general, I have a getter method for every relationship. So in this case, there's a getAddress(Company companyObject) method. It takes a company object, looks at its address property and, if it's NULL, fetches the corresponding Address object from the database, using the Mapper class for that Address object (AddressDataMapper), and assigns that address object to the address property of the specified company object.
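The lazy getter described above could look roughly like the following. This is only a sketch in Python, chosen for brevity; the class names follow the question, while find_by_id, query_count and the dict standing in for the address table are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    id: int
    city: str


@dataclass
class Company:
    id: int
    name: str
    address_id: int
    address: Optional[Address] = None  # stays None until first requested


class AddressDataMapper:
    """Hypothetical mapper; a dict stands in for the address table."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0  # counts simulated database round trips

    def find_by_id(self, address_id):
        self.query_count += 1
        row = self._rows[address_id]
        return Address(id=address_id, city=row["city"])


class CompanyDataMapper:
    def __init__(self, address_mapper):
        self._address_mapper = address_mapper

    def get_address(self, company):
        # Load on first access only, then reuse the object already attached.
        if company.address is None:
            company.address = self._address_mapper.find_by_id(company.address_id)
        return company.address


address_mapper = AddressDataMapper({7: {"city": "Berlin"}})
company_mapper = CompanyDataMapper(address_mapper)
company = Company(id=1, name="Acme", address_id=7)

first = company_mapper.get_address(company)
second = company_mapper.get_address(company)  # no second lookup
```

Here the second call returns the Address that was attached on the first call, so the simulated table is only hit once.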
Important: Is a Data Mapper allowed to use another Data Mapper?
Let's say in most cases you need both the company object AND the address object, because you always display them together in a list. In this case, the CompanyDataMapper not only fetches company objects, but runs an SQL query with a JOIN to also get all the fields of the address object. Finally, it iterates over the record set and populates new objects with their corresponding values, assigning the address object to the company object.
Sounds simple, so far.
1:n Relationships
How about these? The only difference from 1:1 is that a Company may have multiple Address objects. Let's have a look: when most of the time we're only interested in the Company, the Data Mapper would just set the addresses property of the company object to NULL. The addresses property is an array which may reference none, one or multiple addresses. But we don't know yet, since we load lazily, so it's just NULL. But what if we needed all the addresses in most cases as well? What if we displayed a big list with all companies together with all their addresses? In this case, things start to get really ugly. First, we can't join the address table fifty times for every address object (I strongly believe that's impossible, and if it is possible, performance would be below zero). So, when we think this through, it seems impossible NOT to load lazily in this case.
Important: Is this true? Must I send out 100 queries to get 100 address objects, if I have 10 companies with each 10 addresses?
m:n Relationships
Let's say an address object only contains the country, state, city, road and house number. But one house could be a big business tower with lots of companies in it. Like one of those modern office buildings where anyone can rent a small room to show off that tower on their website. So: many companies can share the same address.
I have no plans yet to deal with that kind of problem.
Important: Probably it's not a bigger problem than the 1:n Relationships?
If anyone knows a good resource that goes into detail about solving / implementing this, I would be happy about a link!
Before I even start, I'll assume you've read Fowler's PoEAA book from beginning to end. =) Also, I'll assume that you've already thought about the first issues you face when dealing with ORMs. I can highlight an easy one: calling a DataMapper multiple times with the same identifier must always return the same object (read about IdentityMap).
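To make the IdentityMap point concrete, here is a minimal sketch in Python. The IdentityMap and CompanyDataMapper classes, the find method and the dict used as a fake table are all hypothetical names for illustration, not taken from any particular ORM.

```python
class IdentityMap:
    """Remembers every loaded object, keyed by (class, identifier)."""

    def __init__(self):
        self._objects = {}

    def get(self, cls, identifier):
        return self._objects.get((cls, identifier))

    def add(self, cls, identifier, obj):
        self._objects[(cls, identifier)] = obj


class Company:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class CompanyDataMapper:
    def __init__(self, rows, identity_map):
        self._rows = rows  # a dict standing in for the company table
        self._identity_map = identity_map
        self.query_count = 0  # counts simulated database round trips

    def find(self, company_id):
        # Return the already-loaded instance if there is one.
        cached = self._identity_map.get(Company, company_id)
        if cached is not None:
            return cached
        self.query_count += 1
        row = self._rows[company_id]
        company = Company(company_id, row["name"])
        self._identity_map.add(Company, company_id, company)
        return company


mapper = CompanyDataMapper({1: {"name": "Acme"}}, IdentityMap())
a = mapper.find(1)
b = mapper.find(1)  # served from the identity map
```

Because both calls hand back the very same instance, a change made through one reference is visible through the other, and the table is read only once.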
Important: Is a Data Mapper allowed to use another Data Mapper?
One DataMapper may access another only if it holds the second mapper as a weak reference.
Let's say in most cases you need both the company object AND the address object, because you always display them together in a list. In this case, the CompanyDataMapper not only fetches company objects, but runs an SQL query with a JOIN to also get all the fields of the address object. Finally, it iterates over the record set and populates new objects with their corresponding values, assigning the address object to the company object.
The problem you're trying to discuss here sounds simple in practice, but it is a bit complex behind the scenes.
First of all, you shouldn't have a getAddress(Company), but rather benefit from having Proxy objects. A proxy is a non-initialized representation of a given instance. In this case, a Proxy contains a reference to the entry you're looking for. It must extend your original class and needs to provide an initialization method, together with a related DataMapper to load it.
The second part, about JOINing and loading multiple objects at once, is the job of a Hydrator. A Hydrator receives a flat structure of rows and columns and converts it into an object graph. But this really opens up a separate issue: if you're purely dealing with objects, why are you fetching tables? Taking an object-fetching approach would lead you to implement a sort of OQL (Object Query Language).
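A proxy along these lines might be sketched as follows, in Python for brevity. AddressProxy, _initialize, find_by_id and query_count are hypothetical names, and relying on __getattr__ is just one way to trigger initialization; a real ORM would typically generate such classes.

```python
class Address:
    def __init__(self, id, city):
        self.id = id
        self.city = city


class AddressDataMapper:
    """Hypothetical mapper; a dict stands in for the address table."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0  # counts simulated database round trips

    def find_by_id(self, address_id):
        self.query_count += 1
        return Address(address_id, self._rows[address_id]["city"])


class AddressProxy(Address):
    """Uninitialized stand-in that knows only the identifier."""

    def __init__(self, id, mapper):
        # Deliberately skip Address.__init__: only the id is known so far.
        self.id = id
        self._mapper = mapper
        self._initialized = False

    def _initialize(self):
        if not self._initialized:
            real = self._mapper.find_by_id(self.id)
            self.__dict__.update(vars(real))  # copy the loaded fields over
            self._initialized = True

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails,
        # which here means a field that has not been loaded yet.
        if name.startswith("_") or self._initialized:
            raise AttributeError(name)
        self._initialize()
        return getattr(self, name)


mapper = AddressDataMapper({7: {"city": "Berlin"}})
proxy = AddressProxy(7, mapper)

known_id = proxy.id                  # available without touching the table
queries_before = mapper.query_count
city = proxy.city                    # first real access triggers the load
city_again = proxy.city              # already initialized, no further load
queries_after = mapper.query_count
```

Since the proxy extends Address, calling code can treat it as an ordinary Address and never needs to know whether the data has been loaded yet.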
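As an illustration of what a Hydrator does, the sketch below turns the flat rows of a single LEFT JOIN into Company objects with nested addresses. It is written in Python for brevity; the column aliases (company_id, address_city and so on) and the hydrate_companies function are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    id: int
    city: str


@dataclass
class Company:
    id: int
    name: str
    addresses: list = field(default_factory=list)


def hydrate_companies(rows):
    """Turn flat JOIN rows into Company objects with nested Address objects."""
    companies = {}  # company id -> Company, keeps first-seen order
    for row in rows:
        company = companies.get(row["company_id"])
        if company is None:
            company = Company(row["company_id"], row["company_name"])
            companies[company.id] = company
        # A LEFT JOIN yields None in the address columns
        # for a company that has no address at all.
        if row["address_id"] is not None:
            company.addresses.append(
                Address(row["address_id"], row["address_city"])
            )
    return list(companies.values())


# Shape of the result set a query like
#   SELECT ... FROM company LEFT JOIN address ON ...
# might return: one row per (company, address) pair.
rows = [
    {"company_id": 1, "company_name": "Acme", "address_id": 10, "address_city": "Berlin"},
    {"company_id": 1, "company_name": "Acme", "address_id": 11, "address_city": "Hamburg"},
    {"company_id": 2, "company_name": "Globex", "address_id": None, "address_city": None},
]
companies = hydrate_companies(rows)
```

The company columns are repeated on every row of the result set, and the hydrator collapses them back into one object per company, so a 1:n relationship can be loaded with one query under these assumptions.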
Important: Is this true? Must I send out 100 queries to get 100 address objects, if I have 10 companies with each 10 addresses?
Dealing with a collection of objects is a nightmare in PHP. Yes, the language sucks a lot for the lack of a powerful collection implementation. Basically, you are required to deal with different situations here:
- new instance and all elements in its list of elements are new
- new instance and all elements in its list of elements are pre-existing
- new instance and the elements in its list are a mix of new and pre-existing
- pre-existing instance and not touching anything on the list of elements
- pre-existing instance and manipulating items on the list
I'm being very simplistic here, but the main point I want to highlight is the need for a Collection object. There are two of them: one that deals with new lists and one that deals with existing lists. The one that deals with existing lists needs to be able to load the collection once you try to access anything inside of it. That's the only way to avoid n + 1 issues.
This also highlights the next big problem you'd have to deal with. Associations can be uni-directional or bi-directional. Company knowing about Address while Address has no idea about Company is a uni-directional association, while a User being part of many Groups and a Group containing many Users is a bi-directional association. Things easily become a nightmare here, and that's why you need Mapping patterns to properly understand what's going on.
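The collection that wraps an existing list could be sketched like this, in Python for brevity. LazyCollection, the loader callable and find_addresses_for are hypothetical names; the sketch covers only the load-on-first-access behaviour, not change tracking for new or removed elements.

```python
class LazyCollection:
    """Wraps an existing association and loads it on first access."""

    def __init__(self, loader):
        self._loader = loader  # callable that returns the elements
        self._items = None     # None means "not loaded yet"

    def _load(self):
        if self._items is None:
            self._items = list(self._loader())
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())

    def __getitem__(self, index):
        return self._load()[index]


# A list of dicts stands in for the address table.
ADDRESS_ROWS = [
    {"id": 10, "company_id": 1, "city": "Berlin"},
    {"id": 11, "company_id": 1, "city": "Hamburg"},
    {"id": 12, "company_id": 2, "city": "Munich"},
]
query_log = []  # records every simulated query


def find_addresses_for(company_id):
    query_log.append(company_id)
    return [row["city"] for row in ADDRESS_ROWS if row["company_id"] == company_id]


addresses = LazyCollection(lambda: find_addresses_for(1))
queries_before_access = len(query_log)  # nothing has been loaded yet
cities = list(addresses)                # first access triggers the load
count = len(addresses)                  # reuses the loaded elements
```

The mapper can hand out such a collection for every company without running a query; the whole list of addresses for a company is fetched in one go only when something actually reads it.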
Dealing with many-to-many is just the same as dealing with collections in general.
There is an important part you haven't considered yet. If I build my entire object graph (Company and Address) and decide to persist it... does it persist both, or do I have to manually say what I want to persist? Both ways have different sets of problems. Let's assume you want the first approach. You've just entered what I consider one of the most complex design patterns to implement: UnitOfWork. Then you'd have to sort the order in which entities are written so you don't run into constraint problems (read up on Topological Sorting for how to solve this). If you take the second approach, you may easily end up in a situation where it feels like your tool is broken, mainly because it's very easy to leave your object graph in an inconsistent state.
Finally... are you planning ANY support for inheritance? If so, your entire planning just moved to a whole new level. =( Trying to explain it would take me a book. But I can point to some design patterns you can look at: Concrete Table Inheritance (1 class, 1 table), Single Table Inheritance (N classes, 1 table) and Class Table Inheritance (N classes, M tables).
I could go into depth on many different points here, but ORMs normally make heads explode. I'll stop for now.
PS: I'm one of the core developers of Doctrine ORM. Unless you're doing this for study purposes, don't bother trying to create another one. It's extremely complex and time-consuming, and it demands lots and lots of planning on how things should work before you even code the first line. As a matter of fact, we planned Doctrine ORM for 2 years and took 1 year to reliably implement the core functionality. I'm not discouraging you, but as Fowler said in his ORM hate article, it's a complex solution for an even more complex problem.
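The ordering step can be sketched as follows, in Python for brevity, using graphlib.TopologicalSorter from the standard library (Python 3.9+). The UnitOfWork class, its register_new and commit_order methods, and the table dependencies are all hypothetical; a real UnitOfWork also has to track changed and removed entities.

```python
from graphlib import TopologicalSorter


class UnitOfWork:
    """Collects new entities and works out a safe insert order."""

    def __init__(self, dependencies):
        # table -> set of tables it references through foreign keys
        self._dependencies = dependencies
        self._new = []

    def register_new(self, table, data):
        self._new.append((table, data))

    def commit_order(self):
        # static_order() yields every table after the tables it depends on.
        order = list(TopologicalSorter(self._dependencies).static_order())
        rank = {table: position for position, table in enumerate(order)}
        # sorted() is stable, so entities of one table keep registration order.
        return sorted(self._new, key=lambda item: rank[item[0]])


# Assumed schema: company has a foreign key to address.
unit_of_work = UnitOfWork({"company": {"address"}, "address": set()})
unit_of_work.register_new("company", {"name": "Acme"})
unit_of_work.register_new("address", {"city": "Berlin"})
unit_of_work.register_new("company", {"name": "Globex"})

ordered = unit_of_work.commit_order()
tables_in_order = [table for table, _ in ordered]
```

Even though a company was registered first, the address row comes out ahead of both companies, so the foreign key can be satisfied at insert time. TopologicalSorter raises CycleError when the dependencies form a cycle, which is one of the cases a real implementation has to handle separately.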