We want to create a relatively simple document storage system, but there are some requirements. My idea was that a file is scanned and handled by a separate tool/daemon immediately when it arrives in storage.
The (pseudo) DMS should provide access via NFS and Samba. As far as I've seen, pipes would be fine for passing the incoming file to some hooks, but I wondered if there's a way to create a directory as a pipe. I've only seen named pipes so far.
The process that should take any incoming file in this directory is a PHP script that does things like MIME type guessing and a CRC32 check (against the value in the DB) - a rough sketch is below. Does anyone have a hint how to do this?
EDIT: I hope the following explanation makes it a bit clearer - I'm looking for a way to provide an "endpoint" via Samba and NFS where files can be placed and are immediately handled by virus scanning and a metadata process (and finally stored).
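For illustration, the per-file processing I have in mind looks roughly like this; the handleFile() name, the $pdo connection and the documents table are only placeholders, not existing code:

    <?php
    // Rough sketch of the per-file handler; $pdo and the "documents"
    // table/columns are placeholders.
    function handleFile(string $path, PDO $pdo): void
    {
        // MIME type guessing from the file contents
        $finfo = finfo_open(FILEINFO_MIME_TYPE);
        $mime  = finfo_file($finfo, $path);
        finfo_close($finfo);

        // CRC32 of the file, to compare against the value stored in the DB
        $crc = hash_file('crc32b', $path);

        $stmt = $pdo->prepare('SELECT crc32 FROM documents WHERE filename = ?');
        $stmt->execute([basename($path)]);
        $expected = $stmt->fetchColumn();

        if ($expected !== false && strcasecmp($expected, $crc) !== 0) {
            // Checksum mismatch -> reject or quarantine the file
            throw new RuntimeException("CRC mismatch for $path");
        }

        // ... virus scan, metadata extraction, move to final storage ...
    }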
Do you actually need it to be an SMB or NFS share? I think you are probably better off writing/using some custom server code. SMB and NFS were designed to transfer ordinary files, which is quite distant from your use case.
You can use FUSE to implement an intermediate Linux file system that sits on top of your real file system (the backend file system) and that performs any validation you need on the data before finally writing it to the backend. Then, you serve that file system via NFS/Samba.
Another possibility is to use the inotify API to be notified of changes in a file system tree and perform the required operations. The problem with this approach is that the processing is asynchronous, so malware files will be visible for a short time until they are scanned and deleted.
update: ClamFS does exactly that!
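For the inotify route, a minimal sketch assuming the PECL inotify extension is installed; the watch directory and the handleFile() routine (your own scanning/hashing code) are placeholders:

    <?php
    // Watch an incoming directory and process files once they are fully
    // written or moved in. Requires the PECL "inotify" extension.
    $watchDir = '/srv/dms/incoming';

    $fd    = inotify_init();
    $watch = inotify_add_watch($fd, $watchDir, IN_CLOSE_WRITE | IN_MOVED_TO);

    while (true) {
        // Blocks until at least one event is available.
        $events = inotify_read($fd);
        foreach ($events as $event) {
            $path = $watchDir . '/' . $event['name'];
            if (is_file($path)) {
                handleFile($path); // virus scan, MIME sniffing, CRC check, ...
            }
        }
    }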
If I understood you correctly, what you want to do is provide end users with a very easy interface that is integrated into their file system. They will see an ordinary folder in their operating system; they will copy and move files, rename them, etc., just as they do with any other folder on their computers.
However, this folder will not be a real folder in the background. You want complete control over the operations on this folder. When they copy a file into it, you want a PHP script to handle that. When they create a new folder under this special one, another PHP script will take care of it.
As salva suggested, creating a file system interface on your own is really a good solution, but it is neither a quick nor an easy one. Since you mentioned PHP as your backend, I think you want a higher-level approach.
Your problem has two sides to take care of: client and server. On the client side, you need a file system that is easy to mount as a folder on a POSIX system or as a drive on Windows. There are many alternatives for this; Samba and NFS are two of them, as you mentioned.
You will be doing the server side of it in PHP, as I understand it. Considering this, I would suggest using WebDAV instead of Samba or NFS. It is much easier to implement on the server side, and it is available on almost every modern operating system. There are even browser plugins for WebDAV access, so you can provide multiple interfaces for your clients very easily.
On the server side, if you use PHP, there is an open-source PHP library called SabreDAV. With just a quick Google search, I even found a tutorial for it.
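A minimal SabreDAV entry point looks roughly like this (the vendor/autoload.php, public/ and /webdav/ paths are just examples):

    <?php
    // Minimal WebDAV server with SabreDAV, adapted from its quick-start docs.
    require 'vendor/autoload.php';

    use Sabre\DAV;

    // Expose a directory on disk as the WebDAV tree.
    $rootDirectory = new DAV\FS\Directory('public');

    $server = new DAV\Server($rootDirectory);
    $server->setBaseUri('/webdav/');

    // Optional HTML directory listing, handy for debugging in a browser.
    $server->addPlugin(new DAV\Browser\Plugin());

    $server->exec();

To hook your own processing in, one common approach is to subclass Sabre\DAV\FS\Directory and override createFile(), which is called when a client uploads a file, and do the virus scanning / CRC work there.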
In this setup, you can handle files the way you want. This can be a single-machine system, with a web server like nginx or Apache embedded for the PHP side, or it can be provided as a service running on your servers.
I hope I understood your question correctly and this is the solution you were looking for.
UPDATE: If you do not have the option of using a different solution like the one I suggested and it really has to be Samba and NFS, both protocols are really too much to implement in PHP. It would be a big burden and a long-term headache.
However, you can use normal NFS/Samba servers and monitor file updates in the background with your application. This means the special features you want to provide, like file tagging or virus scanning, will be available with a probably acceptable latency. To implement this, you can check all the files and folders on the system and work on the modified/new ones (a rough polling sketch is below). An easier approach would be to use servers that log every action and to follow their logs; this can even be better than interfacing with client machines directly. If you architect your system correctly, the latency between file modifications and processing them will be quite reasonable even on a very large system. For this, nas4free can be a really good solution, since it provides many interfaces from a single system and, according to its features page, it has syslog capability.
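A naive sketch of the "check everything and work on the modified/new ones" variant, run periodically (e.g. from cron); the paths and the process() callback are placeholders:

    <?php
    // Walk the share, remember each file's mtime, and only process files
    // that are new or have changed since the previous run.
    $shareRoot = '/srv/share';
    $stateFile = '/var/lib/dms/scan-state.json';

    $seen = is_file($stateFile)
        ? json_decode(file_get_contents($stateFile), true)
        : [];

    $iter = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($shareRoot, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iter as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $path  = $file->getPathname();
        $mtime = $file->getMTime();

        if (!isset($seen[$path]) || $seen[$path] < $mtime) {
            process($path); // virus scan, tagging, CRC check, ...
            $seen[$path] = $mtime;
        }
    }

    file_put_contents($stateFile, json_encode($seen));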