 

How to make use of the filesystem cache in Java or Python?

A recent blog post on the Elasticsearch website talks about the features of their new 1.4 beta release.

I am very curious about how they make use of the filesystem cache:

Recent releases have added support for doc values. Essentially, doc values provide the same function as in-memory fielddata, but they are written to disk at index time. The benefit that they provide is that they consume very little heap space. Doc values are read from disk, instead of from memory. While disk access is slow, doc values benefit from the kernel’s filesystem cache. The filesystem cache, unlike the JVM heap, is not constrained by the 32GB limit. By shifting fielddata from the heap to the filesystem cache, you can use smaller heaps which means faster garbage collections and thus more stable nodes.

Before this release, doc values were significantly slower than in-memory fielddata. The changes in this release have improved the performance significantly, making them almost as fast as in-memory fielddata.

Does this mean that we can manipulate the behavior of the filesystem cache instead of passively waiting for the OS to do it? If that is the case, how can we make use of the filesystem cache in normal application development? Say, if I'm writing a Python or Java program, how can I do this?

asked Oct 29 '14 by shihpeng




1 Answer

The file-system cache is an implementation detail of the OS's inner workings that is transparent to the end user; it isn't something that needs adjustment or changes. Lucene already makes use of the file-system cache when it manages index segments. Every time something is indexed into Lucene (via Elasticsearch), those documents are written to segments. The segments are first written to the file-system cache, and only after some time (for example, when the translog - a way of keeping track of documents being indexed - is full) is the content of the cache written to an actual file on disk. But, while the documents to be indexed are still only in the file-system cache, they can already be accessed.
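As a rough illustration of that last point (this is not Lucene code, and the file name is made up), data written to a file can usually be read back immediately, even before it has been flushed to disk, because the kernel serves the read from its page cache:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheDemo {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("segment-demo.bin"); // hypothetical file name

        // Write some data. No fsync is issued here, so the bytes typically
        // sit in the kernel's page cache before they reach the disk.
        Files.write(file, "indexed document".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

        // Reading it back right away still works: the kernel serves the
        // read from the page cache, whether or not the data has been flushed.
        String readBack = Files.readString(file, StandardCharsets.UTF_8);
        System.out.println(readBack);
    }
}
```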

The improvement in the doc values implementation means that doc values can now take advantage of the file-system cache: they are read from disk, placed in the cache, and accessed from there, instead of taking up heap space.

How this file-system cache is accessed is described in this excellent blog post:

In our previous approaches, we were relying on using a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does!

Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache.

As for the actual means of using mmap in a Java program, I think this is the class and method to do so.
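For illustration, here is a minimal sketch using java.nio's FileChannel.map(), which is the standard JDK way to memory-map a file into a MappedByteBuffer (the file name is hypothetical):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        Path indexFile = Path.of("segment-demo.bin"); // hypothetical file name

        try (FileChannel channel = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            // Map the whole file into the process's virtual address space.
            // Page faults pull the data into the kernel's page cache;
            // nothing is copied onto the JVM heap.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            // The mapped region is accessed like a large byte[] through the
            // ByteBuffer API, just as the quoted blog post describes.
            byte first = buffer.get(0);
            System.out.println("first byte: " + first);
        }
    }
}
```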

answered Oct 29 '22 by Andrei Stefan