I am experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A memory block is allocated using unsafe. The memory is sufficient to hold 65536 byte values.
I AM TRYING THIS:
char aChar = some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){
// do something
}
INSTEAD OF:
char aChar = some character
if ((byte) 0 == ( lookup[aChar] & mask )){
// do something
}
I thought Unsafe could access the memory faster than using a regular array access with the index check it does for each index...
It was only wishful thinking that the jvm would have a special op (unsafe) that would somehow make regular array access and iteration faster. The jvm, it seems to me, works fine with normal byte[] iterations and does them, fast as can be, using normal, unadulterated, vanilla java code.
@millimoose hits the proverbial 'nail on the head'
"Unsafe might be useful for a lot of things, but this level of microoptimisation isn't one of them. – millimoose"
Using Unsafe is faster in a very strict limited set of circumstances:
Using Unsafe (in my tests) is slower:
slower even if server
jvm option is invoked
Unsafe is slower from 9% or more ( 1_GB array and UnsafeLookup_8B(fastest one) in code below on 32 bit jvm (64bit was even slower??))
Is there some reason for this?**
When I run the code yellowB posted below (checks a 1GB byte[]), the normal is also still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)
CPU is: Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB (3.25GB usable) OS: Windows 7 (32)
Running the test on an 4 Core AMD64 with Windows 7_64, 32 bit java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
Running the test on an 4 Core AMD64 with Windows 7_64, 64 bit java:
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
I think the two functions you posted are basically the same because they read only 1 byte and then convert it to int and do the futher comparing.
Reading 4-Byte int or 8-Byte long every time is much more effective.I wrote two function to do the same thing:compare the content of two byte[] to see if they are the same:
function 1:
public static boolean hadoopEquals(byte[] b1, byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
// Bring WritableComparator code local
for(int i = 0;i < b1.length; ++i)
{
int a = (b1[i] & 0xff);
int b = (b2[i] & 0xff);
if (a != b)
{
return false;
}
}
return true;
}
function 2:
public static boolean goodEquals(byte[] b1,byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
int numLongs = (int)Math.ceil(b1.length / 8.0);
for(int i = 0;i < numLongs; ++i)
{
long currentOffset = baseOffset + (i * 8);
long l1 = UnSafe.getLong(b1, currentOffset);
long l2 = UnSafe.getLong(b2, currentOffset);
if(0L != (l1 ^ l2))
{
return false;
}
}
return true;
}
I ran these two functions on my laptop(corei7 2630QM , 8GB DDR3 , 64bit win 7 , 64bit Hotspot JVM),and compare two 400MB byte[],result is below:
function 1: ~670ms
function 2: ~80ms
2 is much more faster.
So my suggestion is to read 8-byte every time and use the XOR operator(^):
long l1 = UnSafe.getLong(byteArray, offset); //8 byte
if(0L == l1 ^ 0xFF) //if the lowest byte == 0?
/* do something */
if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0?
/* do something */
/* go on... */
============================================================================
Hi Wilf, I use your code to make a test class as below,this class compare the speed among 3 functions in looking up the 1st 0 in a byte array:
package test;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
/**
* Test the speed in looking up the 1st 0 in a byte array
* Set -Xms the same as -Xms to avoid Heap reallocation
*
* @author yellowb
*
*/
public class StackOverflow
{
public static Unsafe UnSafe;
public static Unsafe getUnsafe() throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
/**
* use 'byte[index]' form to read 1 byte every time
* @param buf
*/
public static void normalLookup(byte[] buf)
{
for (int i = 0; i < buf.length; ++i)
{
if ((byte) 0 == buf[i])
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getByte to read 1 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_1B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
for (int i = 0; i < buf.length; ++i)
{
byte b = UnSafe.getByte(buf, (long) (baseOffset + i));
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getLong to read 8 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_8B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
//The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
int numLongs = buf.length / 8;
long currentOffset = 0L;
for (int i = 0; i < numLongs; ++i)
{
currentOffset = baseOffset + (i * 8); //the step is 8 bytes
long l = UnSafe.getLong(buf, currentOffset);
//Compare each byte(in the 8-Byte long) to 0
//PS:x86 cpu is little-endian mode
if (0L == (l & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (i * 8));
return;
}
if (0L == (l & 0xFF00L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
return;
}
if (0L == (l & 0xFF0000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
return;
}
if (0L == (l & 0xFF000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
return;
}
if (0L == (l & 0xFF00000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
return;
}
if (0L == (l & 0xFF0000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
return;
}
if (0L == (l & 0xFF000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
return;
}
if (0L == (l & 0xFF00000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
return;
}
}
//If some rest bytes exists
int rest = buf.length % 8;
if(0 != rest)
{
currentOffset = currentOffset + 8;
//Because the length of rest bytes < 8,we have to read them one by one
for(; currentOffset < (baseOffset + buf.length); ++currentOffset)
{
byte b = UnSafe.getByte(buf, (long)currentOffset);
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset));
return;
}
}
}
System.out.println("Not found '0'");
}
public static void main(String[] args) throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
UnSafe = getUnsafe();
int len = 1024 * 1024 * 1024; //1G
long startTime = 0L;
long endTime = 0L;
System.out.println("initialize data...");
byte[] byteArray1 = new byte[len];
for (int i = 0; i < len; ++i)
{
byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0
}
//If you want to set one byte to 0,uncomment the below statement
// byteArray1[2500] = (byte)0;
System.out.println("initialize data done!");
System.out.println("use normalLookup()...");
startTime = System.nanoTime();
normalLookup(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_1B()...");
startTime = System.nanoTime();
unsafeLookup_1B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_8B()...");
startTime = System.nanoTime();
unsafeLookup_8B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
}
}
And the output is:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1271781 us.
use unsafeLookup_1B()...
Not found '0'
time : 716898 us.
use unsafeLookup_8B()...
Not found '0'
time : 591689 us.
the result shows that even reading 1 byte every time by Unsafe.getByte() is much faster than iterating the byte[] regularly.And reading 8-byte long is the fastest.
I thought Unsafe could access the memory faster than using a regular array access with the index check it does for each index...
One possible reason why the range checking might not be a factor is the JIT compiler's optimizer. Since the array's size never changes, it may be possible for the optimizer to "hoist" all of the range checking and perform it once at the start of the loop.
By contrast, the JIT compiler might be unable to optimize (e.g. inline) the Unsafe.getByte() call. Or maybe the getByte
method has a read barrier ...)
However this is speculation. The way to be sure is to get the JVM to dump out the JIT-compiled native code for the two cases, and compare them instruction by instruction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With