Memory barriers guarantee that the data cache will be consistent. However, do they guarantee that the TLB will be consistent?
I am seeing a problem where the JVM (Java 7 update 1) sometimes crashes with memory errors (SIGBUS, SIGSEGV) when passing a MappedByteBuffer between threads.
e.g.
final AtomicReference<MappedByteBuffer> mbbQueue = new AtomicReference<>();

// in a background thread.
MappedByteBuffer map = raf.map(MapMode.READ_WRITE, offset, allocationSize);
Thread.yield();
while (!mbbQueue.compareAndSet(null, map));

// the main thread. (more than 10x faster than using map() in the same thread)
MappedByteBuffer mbb = mbbQueue.getAndSet(null);
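A self-contained sketch of roughly what this looks like end to end; the file name, mapping size, and loop structure here are simplified placeholders, not the real code:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;
import java.util.concurrent.atomic.AtomicReference;

public class MapperHandoff {
    static final long MAP_SIZE = 256 << 20; // placeholder: 256 MB per mapping
    static final AtomicReference<MappedByteBuffer> mbbQueue = new AtomicReference<>();

    public static void main(String[] args) throws Exception {
        final RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");

        // Background thread: maps the next region ahead of time and publishes it.
        Thread mapper = new Thread(new Runnable() {
            public void run() {
                try {
                    long offset = 0;
                    while (!Thread.currentThread().isInterrupted()) {
                        MappedByteBuffer map = raf.getChannel()
                                .map(MapMode.READ_WRITE, offset, MAP_SIZE);
                        offset += MAP_SIZE;
                        // Spin until the main thread has taken the previous buffer.
                        while (!mbbQueue.compareAndSet(null, map));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        mapper.setDaemon(true);
        mapper.start();

        // Main thread: swaps in the pre-mapped buffer and writes to it.
        MappedByteBuffer mbb;
        while ((mbb = mbbQueue.getAndSet(null)) == null); // wait for the first mapping
        mbb.putLong(System.nanoTime());
    }
}
```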
Without the Thread.yield() I occasionally get crashes in force(), put(), and C's memcpy() all indicating I am trying to access memory illegally. With the Thread.yield() I haven't had a problem, but that doesn't sound like a reliable solution.
Has anyone come across this problem? Are there any guarantees about TLB and memory barriers?
EDIT: The OS is CentOS 5.7; I have seen the behaviour on i7 and dual Xeon machines.
Why do I do this? Because the average time to write a message is 35-100 ns depending on length, and using a plain write() isn't as fast. If I memory map and clean up in the current thread, this takes 50-130 microseconds; using a background thread to do it takes about 3-5 microseconds for the main thread to swap buffers. Why do I need to be swapping buffers at all? Because I am writing many GB of data and a ByteBuffer cannot be 2+ GB in size.
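To make the size constraint concrete, here is a sketch of covering a large file with several sub-2 GB mappings (the file name and sizes are just an example):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;

// Sketch: a ByteBuffer is indexed by int, so each mapping must stay under
// Integer.MAX_VALUE bytes; a multi-GB file therefore needs several mappings.
public class ChunkedMapping {
    public static void main(String[] args) throws Exception {
        long fileSize  = 8L << 30;   // assumed: 8 GB of data in total
        long chunkSize = 1L << 30;   // assumed: 1 GB per mapping, well under 2 GB
        int  chunks    = (int) ((fileSize + chunkSize - 1) / chunkSize);

        RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");
        MappedByteBuffer[] regions = new MappedByteBuffer[chunks];
        for (int i = 0; i < chunks; i++) {
            long offset = i * chunkSize;
            long size   = Math.min(chunkSize, fileSize - offset);
            regions[i]  = raf.getChannel().map(MapMode.READ_WRITE, offset, size);
        }
        raf.close();
    }
}
```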
The Data Memory Barrier (DMB) prevents reordering of data access instructions across the DMB instruction.
The memory barrier instructions halt execution of the application code until a memory write of an instruction has finished executing. They are used to ensure that a critical section of code has been completed before continuing execution of the application code.
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
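In Java, that ordering is what the AtomicReference in the question already provides: the volatile write acts as a release barrier and the volatile read as an acquire barrier, so anything written before the publish is visible to the thread that takes the buffer. A minimal illustration (names are made up):

```java
// A volatile write/read pair is the JMM-level equivalent of a release/acquire
// barrier: a reader that sees ready == true is guaranteed to see data == 42.
public class BarrierDemo {
    static int data;                  // plain field
    static volatile boolean ready;    // volatile flag providing the barrier

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() {
                data = 42;            // ordinary write
                ready = true;         // volatile write publishes it
            }
        }).start();

        while (!ready);               // volatile read
        System.out.println(data);     // always prints 42
    }
}
```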
The mapping is done via mmap64 (FileChannel.map). When the address is first accessed there will be a page fault and the kernel will bring in the backing page for you. The TLB doesn't need to be updated during mmap.
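If the initial page faults are part of the cost, one option (just a sketch, reusing the names from the question) is to pre-touch the region on the background thread with MappedByteBuffer.load() before publishing it:

```java
// Background thread (sketch): fault the pages in before handing the buffer over,
// so the main thread mostly touches already-resident pages.
MappedByteBuffer map = raf.map(MapMode.READ_WRITE, offset, allocationSize);
map.load();                                   // touches every page now
while (!mbbQueue.compareAndSet(null, map));
```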
The TLB (of all CPUs) is invalidated during munmap, which is handled by the finalization of the MappedByteBuffer; hence munmap is costly.
Mapping involves a lot of synchronization, so the address value should not be corrupted.
Any chance you are trying fancy stuff via Unsafe?
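If waiting for finalization to run munmap is a problem, the usual Java 7/8 workaround is to invoke the buffer's cleaner directly. This relies on internal, unsupported JDK classes, so treat it as a sketch:

```java
import java.nio.MappedByteBuffer;

// Sketch only: sun.misc.Cleaner and sun.nio.ch.DirectBuffer are internal JDK
// classes (present in Java 7/8). This triggers munmap immediately instead of
// waiting for GC/finalization of the MappedByteBuffer.
public class Unmapper {
    static void unmap(MappedByteBuffer buffer) {
        sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
        if (cleaner != null) {
            cleaner.clean();
        }
    }
}
```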