Is there an "embedded DBMS" to support multiple writer applications (processes) on the same db files?

Tags:

I need to know if there is any embedded DBMS (preferably in Java and not necessarily relational) which supports multiple writer applications (processes) on the same set of db files. BerkeleyDB supports multiple readers but just one writer. I need multiple writers and multiple readers.

UPDATE:

It is not a multiple connection issue. I mean I do not need multiple connections to a running DBMS application (process) to write data. I need multiple DBMS applications (processes) to commit on the same storage files.

HSQLDB, H2, JavaDB (Derby) and MongoDB do not support this feature.

I think that there may be some File System limitations that prohibit this. If so, is there a File System that allows multiple writers on a single file?

Use Case: The use case is a high-throughput clustered system that intends to store its high-volume business log entries into a SAN storage. Storing business logs in separate files for each server does not fit because query and indexing capabilities are needed on the whole biz logs.

Because "a SAN typically is its own network of storage devices that are generally not accessible through the regular network by regular devices", I want to use SAN network bandwidth for logging while cluster LAN bandwidth is being used for other server to server and client to server communications.

653

asked Jan 17 '10 21:01

Amir Moghimi

2 Answers

You're basically out of luck, unless you somehow change your requirements.

First, specifically on Unix systems, there's nothing to stop multiple processes from writing to the same files. On a SINGLE SYSTEM, this won't be a problem whatsoever, you'll just have a typical race condition should two or more writes conflict over the same space in the file as to which will actually get written. Since it's on a single system, this had perfect resolution, at the byte level.

So, the game in terms of having multiple processes writing to the same file is how do those processes coordinate? How to they ensure that they don't walk on each other. In, again, Unix, there is an OS based locking mechanism that can be used to prevent that, but typically most systems implement a central server to and coordinate all of their write through that system, and it then writes to the disk while mitigating and handling any conflicts.

Your problem is two fold.

One, you're suggesting that the independent log processes will not cooperate, that they will not share information and coordinate their writes to the volume. That throws a wrench (a big wrench) in to the works right there.

Second, you propose having not only multiple processes write to the same volume, but that the volume they are writing to is shared over a SAN. That's another wrench.

Unlike NFS, SANs don't support "file systems". Rather they support "storage". Basically block level devices. SANs, once you get passed a bunch of volume management shenanigans, are actually pretty "stupid" from the OSs point of view.

I'm pretty sure you can actually have a volume mounted on multiple machines, but I'm not sure more than one can actually WRITE to the device. There are good reasons for this.

Simply, SANs are block level storage. A block being, say, 4K bytes. That's the "atomic" unit of work for the SAN. Want to change a single byte of data? Read a 4K block from the SAN, change your byte, and write the 4k block back.

If you have several machines thinking that they have "universal" access to the SAN storage, and are treating it as a file system, you have a corrupted, ruined file system. It's that simple. The machines will write what they think the blocks should look like while the other machines and smashing it with their local version. Disaster. Ruin. Not happy.

Even getting one machine to write to a SAN while another reads from it is tricky. It's also slow, as the reader can make few assumptions about the contents of the disk, so it needs to read, and re-read blocks (it can't cache anything, like file system TOCs, etc, as, well, they're changing behind it's back due to the activity of the writer -- so, read it again...and again...).

Things like NFS "solve" this problem because you no longer work with raw storage. Rather you work with an actual filesystem.

Finally, there's nothing wrong with having independent log files being streamed out from your servers. They can still be queried. You simply have to repeat the queries and consolidate the results.

If you have 5 machines streaming, and you want "all activity between 12:00pm and 12:05pm", then make 5 queries, one to each log store, and consolidate the results. As for how to efficiently query your log data, that's an indexing problem, and not insurmountable depending on how you query. If you query by time, then create files by time (every minute, every hour, whatever), and scan them. If your system is "read rarely", this isn't a big deal. If you need more sophisticated indexing, then you'll need to come up with something else.

You could use a database to write the files, and indexes, but I doubt you'll find many that enjoy reading from files that they don't control, or that change underneath them.

CouchDB might work, or something similar, because of its specific crash resistant, always consistent database format. It's datafile is always readable by a database instance. That could be an option for you.

But I would still do multiple queries and merge them.

116

answered Sep 20 '22 08:09

Will Hartung

Have individual processes stream it's logs to separate files/DB's to avoid corruption. Then you can have a daemon process which asynchronously reads from all this files/DB's and writes to a single consolidated file/DB. When you have to index or query logs, do it on consolidated file/DB. The only issue here is that the you won't have real time access to your logs. There will be some lag until logs are consolidated.

Updated details below:

Process-1 to Process-N writes in log-1 to log-N
Asyncronously, Process-X reads from log-1 to log-N and consolidates it to a single log-X (whole biz log)
Optionally Process-X can delete log-1 to log-N while it populates log-X to free up the space
Log Reader uses log-X (whole biz log) with query and indexing capabilities



    -------------     -------------           -------------  
    |           |     |           |           |           |  
    | Process-1 |     | Process-2 |    ...    | Process-N |  
    |           |     |           |           |           |  
    -------------     -------------           -------------  
          |                 |                       |        
          |                 |                       |        
          V                 V                       V        
      ( log-1 )         ( log-2 )      ...      ( log-N )    
            \                \                    /          
             \                \                  /           
              - - - -  \      |         / - - - -            
                        \     |        /                     
                          \   |     -                        
                            \ |  /                           
                             |||                             
                             VVV                             
                       -------------                         
                       |           |                         
                       | Process-X |                         
                       |           |                         
                       -------------                         
                             |                               
                             V                               
                             V               --------------  
                             V               |            |  
                         ( log-X )  ------>>>| Log Reader |  
                                             |            |  
                                             --------------

answered Sep 19 '22 08:09

Gladwin Burboz

Related questions
                            
                                ClassNotFoundException FreeMarkerConfigurationFactory
                            
                                BigInteger: count the number of decimal digits in a scalable method
                            
                                Add ButtonGroup to JPanel
                            
                                How does subtracting the character '0' from a char change it into an int?
                            
                                How do I override default Answers on a Mockito mock?
                            
                                Can I convert an artifactId to a classname prefix in my maven archetype?
                            
                                How To Delete Keywords Separately by Capital Letters (Ctrl+Backspace) in IntelliJ IDEA
                            
                                Element is present but `Set.contains(element)` returns false
                            
                                java.util.Date is not supported
                            
                                How to collect DoubleStream to List
                            
                                How to put log file in user home directory in portable way in logback?
                            
                                Sub-Resources in spring REST
                            
                                Error:null value in entry: streamOutputFolder=null [closed]
                            
                                ConcurrentHashMap and Fibonacci Numbers - Inconsistent result
                            
                                Reduce operation on custom object in java
                            
                                What the actual meaning `Caused by: java.lang.RuntimeException: view must have a tag`?
                            
                                Does closing the BufferedReader/PrintWriter close the socket connection?
                            
                                How to create Java webapp installer (.exe) that includes Tomcat and MySQL?"
                            
                                Call trace in java
                            
                                Java: Alternative to iterator.hasNext() if using for-each to loop over a collection

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there an "embedded DBMS" to support multiple writer applications (processes) on the same db files?

Tags:

java

file

database

Amir Moghimi

People also ask

2 Answers

Will Hartung

Gladwin Burboz

Recent Activity

Donate For Us