Software-design & architecture: How to sync data from a directory-tree with a database

I've been racking my brain over this for a while and haven't arrived at a final solution, so I hope to find some discussion or help here on how to solve this at an architectural level.

I'm currently facing the following scenario: I want to write a web application (in Java, though that is not really relevant to a solution, as this is currently a higher-level question) with this kind of relation:

Event --1:n--> Team --1:n--> Participant

Meaning: I have an event containing a number of teams, each of which has a number of participants. So far so good; this would be an easy relation to model in a SQL database.

But there is also a directory tree representing the same relation as a file structure:

+--event1
|  +--team1
|  |  +--participant1
|  |  +--participant2
|  |  +--participant3
|  +--team2
|  |  +--participant4
|  +--team3
+--event2
|  +--team4
...

(I think you get the idea.) Each participant's directory contains numerous files, which are copied into it via the file system. Whenever a directory exists on the file system, it should be connected to a corresponding entry in the database, which holds additional data that should be displayed together with the files in the web GUI. It is not defined which will exist first (database entry or directory), as the two are managed by different users.

Now there are a couple of things to keep in mind, which make sense to me:

When a directory name changes (whether event, team or participant), it should still relate to the same entry in the database (because other entities might still refer to, for example, a participant).
The directory of any event/team/participant might be deleted; the data in the database should then remain. However, if a new directory with the same name is created later and the event is 'closed', this directory should point to a new database entry (e.g. a new event). If the event is still active, creating a directory with the same name should map to the previously assigned entry in the database.
Ideally, creating a directory already leads to the creation of a corresponding database entry.
It should also be possible to create an event/team/participant in the web GUI, which then automatically creates a corresponding directory on the file system.
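
To make the second rule above concrete, here is a minimal sketch of how a directory name could be resolved to a database entry. It is only an illustration under assumptions: `EventResolver`, `EventEntry` and the in-memory list are hypothetical stand-ins for whatever persistence layer is actually used.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the rule: reuse an active entry with the same name, otherwise create a new one. */
final class EventResolver {
    /** Hypothetical stand-in for a database row. */
    static final class EventEntry {
        final long id;
        final String name;
        boolean closed;

        EventEntry(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // In-memory list instead of a real table (assumption for the sketch).
    private final List<EventEntry> entries = new ArrayList<>();
    private long nextId = 1;

    /** Returns the entry that a newly seen directory with this name should map to. */
    EventEntry resolve(String directoryName) {
        for (EventEntry entry : entries) {
            // An active (not closed) entry with the same name is reused.
            if (entry.name.equals(directoryName) && !entry.closed) {
                return entry;
            }
        }
        // No active match: the directory gets a fresh entry, even if a closed one shares the name.
        EventEntry created = new EventEntry(nextId++, directoryName);
        entries.add(created);
        return created;
    }
}
```

Calling `resolve("event1")` twice returns the same entry while the event is active; after `closed` is set to `true`, the next call with the same name yields a new entry with a new id.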

I hope my description is good enough to understand the scenario. I already have some things in mind, but none of them really convinces me as a robust solution. So hopefully one of you already has some ideas on this. I'm pretty open to any technology or framework that might help to solve this problem.

I'm looking forward to your ideas and a good discussion!

Thanks for your help!

asked Sep 04 '17 19:09

digital-h

3 Answers

First of all, you need to design how the directories are uniquely identified. Have you considered placing a hidden file containing a unique key inside each watched directory? If this is not a high-load system, the creation time might be used instead.

With the unique key stored in the file system, it is not so hard to mirror the existing unique keys in the database and organize synchronization between the two storages.
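
A minimal sketch of the hidden-file idea might look like the following. The marker file name `.dir-id` and the use of a random UUID as the key are assumptions made for illustration, not something prescribed above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Sketch: give each watched directory a stable identity via a hidden marker file. */
final class DirectoryIdentity {
    // Hypothetical marker file name.
    static final String MARKER = ".dir-id";

    /** Returns the key stored in the directory's marker file, creating the file if it is absent. */
    static UUID readOrCreate(Path directory) throws IOException {
        Path marker = directory.resolve(MARKER);
        if (Files.exists(marker)) {
            // The key travels with the directory, so a rename does not change it.
            return UUID.fromString(Files.readString(marker, StandardCharsets.UTF_8).trim());
        }
        UUID id = UUID.randomUUID();
        Files.writeString(marker, id.toString(), StandardCharsets.UTF_8);
        return id;
    }
}
```

Because the key lives inside the directory, renaming the directory keeps the key, and the database row can be looked up by that key rather than by path.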

answered Nov 03 '22 22:11

edwgiz

The first principle I would look at is to have a "single source of truth". Where does the (human-readable) name of the events/teams/participants live: in the database or in the filesystem?

The second principle: you wrote about "database entries" and "files", but these are just representations of the information in your domain. Design the data model first, and then your data sources can be organized to reflect that model.

Summing up, you can assign unique immutable ids to the entities in the domain model. Make names plain attributes of your entities and then implement your business rules as listed. You will implement your model as a DS and as a file structure, and you will access both through repositories that apply the same mutations to the data, keeping the minimal shared knowledge, like the ids, in sync.

But I still have the doubt that you are using too many sources. Are you sure you wouldn't be fine using just a DB or just a filesystem?
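
As a rough sketch of that idea, the snippet below keys everything by an immutable id and treats the name as a plain attribute, with one service applying each mutation to every backend. All names here (`NameStore`, `InMemoryStore`, `TeamService`) are hypothetical, and the in-memory maps merely stand in for a real database and a real directory tree.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class SyncedRepositories {
    /** One storage backend (database or directory tree), keyed by the shared immutable id. */
    interface NameStore {
        void put(UUID id, String name);
        String nameOf(UUID id);
    }

    /** In-memory stand-in; a real implementation would wrap JDBC or java.nio.file. */
    static final class InMemoryStore implements NameStore {
        private final Map<UUID, String> names = new HashMap<>();
        public void put(UUID id, String name) { names.put(id, name); }
        public String nameOf(UUID id) { return names.get(id); }
    }

    /** Applies every mutation to all backends so that they share the same ids and names. */
    static final class TeamService {
        private final List<? extends NameStore> stores;

        TeamService(List<? extends NameStore> stores) {
            this.stores = stores;
        }

        /** Creates a team under a fresh id in every backend and returns that id. */
        UUID create(String name) {
            UUID id = UUID.randomUUID();
            stores.forEach(store -> store.put(id, name));
            return id;
        }

        /** Renaming only changes the name attribute; the id stays the same everywhere. */
        void rename(UUID id, String newName) {
            stores.forEach(store -> store.put(id, newName));
        }
    }
}
```

Other entities would then reference a team by its id only, so a rename never breaks a relation.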

answered Nov 03 '22 23:11

Carmine Ingaldi

Use a hidden file with a name like .meta to contain some database information, at minimum the ID of the folder, and have a background process (daemon) that will scan the directory hierarchy every X seconds, compare what's there with what's in the database, and make the necessary adjustments. Stuff that gets deleted on the filesystem gets a "deleted" flag in the DB, stuff that's renamed has its name in the database changed, anything that needs to be added gets inserted, and additionally if a once-deleted folder is re-created, remove the "deleted" flag and re-create the subsidiary files in the directory.
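
One pass of such a daemon could be sketched roughly as follows. This is an assumption-laden illustration: the scan result is modeled as a map from the id found in each `.meta` file to the current folder name, `DbRow` is a hypothetical stand-in for a database row, and the returned strings are just a log of what a real implementation would execute as SQL. Matching a re-created folder that has no `.meta` file yet (for example by name) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class Reconciler {
    /** Hypothetical stand-in for a database row. */
    static final class DbRow {
        String name;
        boolean deleted;

        DbRow(String name, boolean deleted) {
            this.name = name;
            this.deleted = deleted;
        }
    }

    /** Brings the database view (a mutable map) in line with one filesystem scan. */
    static List<String> reconcile(Map<UUID, String> onDisk, Map<UUID, DbRow> inDb) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<UUID, String> folder : onDisk.entrySet()) {
            DbRow row = inDb.get(folder.getKey());
            if (row == null) {
                // Folder unknown to the database: insert it.
                inDb.put(folder.getKey(), new DbRow(folder.getValue(), false));
                actions.add("insert " + folder.getValue());
                continue;
            }
            if (row.deleted) {
                // A known id reappeared on disk: clear the flag.
                row.deleted = false;
                actions.add("undelete " + folder.getValue());
            }
            if (!row.name.equals(folder.getValue())) {
                // Same id, different folder name: treat as a rename.
                actions.add("rename " + row.name + " -> " + folder.getValue());
                row.name = folder.getValue();
            }
        }
        for (Map.Entry<UUID, DbRow> entry : inDb.entrySet()) {
            // Rows whose folder vanished are flagged, never removed.
            if (!onDisk.containsKey(entry.getKey()) && !entry.getValue().deleted) {
                entry.getValue().deleted = true;
                actions.add("flag deleted " + entry.getValue().name);
            }
        }
        return actions;
    }
}
```

A scheduler (for example a `ScheduledExecutorService`) could run the scan and this reconciliation every X seconds.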

Alternatively, if this is going to be an NFS drive or something like that, consider simulating the filesystem with a lightweight backend that translates delete, rename, and file creation operations into database commands instead. Then you only have one set of data you need to worry about the integrity of, and the web app and the file layout stay automatically in sync (no need for a daemon).

answered Nov 03 '22 22:11

Evan


Tags:

architecture

software-design

modeling

data-synchronization