(Spoiler) I debugged, disassembled, and came to the conclusion that the problem lies in the SSE instructions.

Hi, Habr!
It all started when I was writing a Java load test for an internal component of the system I am currently working on. The test created several threads, and each of them performed the same operation many times. During execution, java.lang.ArrayIndexOutOfBoundsException: 0 errors occasionally appeared on a line very similar to this one:
"test".getBytes(StandardCharsets.UTF_8)
The actual line was different, of course, but after some investigation I narrowed the problem down to this call and wrote a JMH benchmark:
@Benchmark
public byte[] originalTest() {
    return "test".getBytes(StandardCharsets.UTF_8);
}
It crashed after a few seconds of running with the following exception:
java.lang.ArrayIndexOutOfBoundsException: 0
	at sun.nio.cs.UTF_8$Encoder.encode(UTF_8.java:716)
	at java.lang.StringCoding.encode(StringCoding.java:364)
	at java.lang.String.getBytes(String.java:941)
	at org.sample.MyBenchmark.originalTest(MyBenchmark.java:41)
	at org.sample.generated.MyBenchmark_originalTest.originalTest_thrpt_jmhLoop(MyBenchmark_originalTest.java:103)
	at org.sample.generated.MyBenchmark_originalTest.originalTest_Throughput(MyBenchmark_originalTest.java:72)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:210)
	at org.openjdk.jmh.runner.LoopBenchmarkHandler$BenchmarkTask.call(LoopBenchmarkHandler.java:192)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
I had never run into anything like this before, so I tried the obvious fixes, such as updating the JVM and rebooting the computer, but of course they did not help. The problem occurred on my MacBook Pro (13-inch, 2017, 3.5 GHz Intel Core i7) and could not be reproduced on my colleagues' machines. Having found no other contributing factors, I decided to dig further into the code.
The failure occurred inside the JDK class StringCoding, in its encode() method, which sizes the output buffer with a helper called scale():
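private static int scale(int len, float expansionFactor) {
    // We need to perform double, not float, arithmetic; otherwise
    // we lose low order bits when len is larger than 2**24.
    return (int)(len * (double)expansionFactor);
}

static byte[] encode(Charset cs, char[] ca, int off, int len) {
    CharsetEncoder ce = cs.newEncoder();
    int en = scale(len, ce.maxBytesPerChar());
    byte[] ba = new byte[en];
    // ...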
In rare cases the array ba was allocated with a length of 0, so the encoder's very first write into it failed with ArrayIndexOutOfBoundsException: 0.
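To make this failure mode concrete, here is a small C++ analogue of the sizing step. For the 4-character string "test", scale() should yield 4 × 3.0 = 12 bytes. If the widened factor comes out as 0.0, as apparently happened here, the buffer is empty and the very first write fails at index 0, which matches the ": 0" in the exception. This is a sketch, not the JDK code: scale mirrors the JDK method of the same name, encodeAscii is a hypothetical ASCII-only stand-in for UTF_8$Encoder.encode(), and std::out_of_range plays the role of ArrayIndexOutOfBoundsException.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// C++ port of the sizing step in StringCoding.scale():
// widen the float factor to double, multiply, truncate toward zero.
int scale(int len, float expansionFactor) {
    return static_cast<int>(len * static_cast<double>(expansionFactor));
}

// Hypothetical, ASCII-only stand-in for the encoder's write loop:
// one byte per character into a buffer sized by scale().
// std::vector::at() throws std::out_of_range on a bad index,
// playing the role of Java's ArrayIndexOutOfBoundsException.
std::vector<unsigned char> encodeAscii(const std::string& s, float maxBytesPerChar) {
    std::vector<unsigned char> ba(
        static_cast<std::size_t>(scale(static_cast<int>(s.size()), maxBytesPerChar)));
    std::size_t dp = 0;
    for (char c : s) {
        ba.at(dp++) = static_cast<unsigned char>(c);  // the first write targets index 0
    }
    ba.resize(dp);  // trim to the bytes actually written
    return ba;
}
```

With the normal factor 3.0f, encodeAscii("test", 3.0f) allocates 12 bytes and returns 4; with a factor of 0.0f it throws on the first character.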
I tried to remove the dependency on UTF_8 entirely, but without it the problem no longer reproduced, so it had to stay. Still, I managed to strip away a lot of unrelated code and arrive at this minimal method:
private static int encode() {
    return (int) ((double) StandardCharsets.UTF_8.newEncoder().maxBytesPerChar());
}
maxBytesPerChar() simply returns a final field, which for the UTF-8 encoder holds 3.0, yet in rare cases (about 1 in 1,000,000,000 calls) the method returned 0. Doubly strange, once the cast to double was removed, the method behaved correctly in every case.
Adding the JIT options -XX:-TieredCompilation and -client had no effect. So I built hsdis-amd64.dylib for macOS, added the options -XX:PrintAssemblyOptions=intel, -XX:CompileCommand=print,*MyBenchmark.encode and -XX:CompileCommand=dontinline,*MyBenchmark.encode, and started comparing the assembly the JIT generated for the method with and without the cast to double:
With the cast to double:

0x000000010a44e3ca: mov rbp,rax               ;*synchronization entry
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@-1 (line 558)
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@2 (line 554)
                                              ; - sun.nio.cs.UTF_8::newEncoder@6 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
0x000000010a44e3cd: movabs rdx,0x76ab16350    ;   {oop(a 'sun/nio/cs/UTF_8')}
0x000000010a44e3d7: vmovss xmm0,DWORD PTR [rip+0xffffffffffffff61]  # 0x000000010a44e340
                                              ;   {section_word}
0x000000010a44e3df: vmovss xmm1,DWORD PTR [rip+0xffffffffffffff5d]  # 0x000000010a44e344
                                              ;   {section_word}
0x000000010a44e3e7: mov rsi,rbp
0x000000010a44e3ea: nop
0x000000010a44e3eb: call 0x000000010a3f40a0   ; OopMap{rbp=Oop off=144}
                                              ;*invokespecial <init>
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@6 (line 558)
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@2 (line 554)
                                              ; - sun.nio.cs.UTF_8::newEncoder@6 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
                                              ;   {optimized virtual_call}
0x000000010a44e3f0: mov BYTE PTR [rbp+0x2c],0x3f  ;*new
                                              ; - sun.nio.cs.UTF_8::newEncoder@0 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
0x000000010a44e3f4: vcvtss2sd xmm0,xmm0,DWORD PTR [rbp+0x10]
0x000000010a44e3f9: vcvttsd2si eax,xmm0
0x000000010a44e3fd: cmp eax,0x80000000
0x000000010a44e403: jne 0x000000010a44e414
0x000000010a44e405: sub rsp,0x8
0x000000010a44e409: vmovsd QWORD PTR [rsp],xmm0
0x000000010a44e40e: call Stub::d2i_fixup      ;   {runtime_call}
0x000000010a44e413: pop rax                   ;*d2i
                                              ; - org.sample.MyBenchmark::encode@10 (line 50)
0x000000010a44e414: add rsp,0x20
0x000000010a44e418: pop rbp

Without the cast:

0x000000010ef7e04a: mov rbp,rax               ;*synchronization entry
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@-1 (line 558)
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@2 (line 554)
                                              ; - sun.nio.cs.UTF_8::newEncoder@6 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
0x000000010ef7e04d: movabs rdx,0x76ab16350    ;   {oop(a 'sun/nio/cs/UTF_8')}
0x000000010ef7e057: vmovss xmm0,DWORD PTR [rip+0xffffffffffffff61]  # 0x000000010ef7dfc0
                                              ;   {section_word}
0x000000010ef7e05f: vmovss xmm1,DWORD PTR [rip+0xffffffffffffff5d]  # 0x000000010ef7dfc4
                                              ;   {section_word}
0x000000010ef7e067: mov rsi,rbp
0x000000010ef7e06a: nop
0x000000010ef7e06b: call 0x000000010ef270a0   ; OopMap{rbp=Oop off=144}
                                              ;*invokespecial <init>
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@6 (line 558)
                                              ; - sun.nio.cs.UTF_8$Encoder::<init>@2 (line 554)
                                              ; - sun.nio.cs.UTF_8::newEncoder@6 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
                                              ;   {optimized virtual_call}
0x000000010ef7e070: mov BYTE PTR [rbp+0x2c],0x3f  ;*new
                                              ; - sun.nio.cs.UTF_8::newEncoder@0 (line 72)
                                              ; - org.sample.MyBenchmark::encode@3 (line 50)
0x000000010ef7e074: vmovss xmm1,DWORD PTR [rbp+0x10]
0x000000010ef7e079: vcvttss2si eax,xmm1
0x000000010ef7e07d: cmp eax,0x80000000
0x000000010ef7e083: jne 0x000000010ef7e094
0x000000010ef7e085: sub rsp,0x8
0x000000010ef7e089: vmovss DWORD PTR [rsp],xmm1
0x000000010ef7e08e: call Stub::f2i_fixup      ;   {runtime_call}
0x000000010ef7e093: pop rax                   ;*f2i
                                              ; - org.sample.MyBenchmark::encode@9 (line 50)
0x000000010ef7e094: add rsp,0x20
0x000000010ef7e098: pop rbp
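In both listings the float field at [rbp+0x10] is converted to int: the version with the cast widens it with vcvtss2sd and truncates with vcvttsd2si, while the other truncates directly with vcvttss2si. Both then compare the result with 0x80000000, the "integer indefinite" value these instructions return for NaN or out-of-range input, and only in that case call a fixup stub to apply Java's conversion rules. The C++ sketch below spells out the two paths with SSE2 intrinsics. The function names are hypothetical, javaFixup is my approximation of what the fixup must return according to the Java conversion rules (NaN becomes 0, out-of-range values saturate), and on targets without SSE2 the code falls back to a portable emulation.

```cpp
#include <climits>
#include <cmath>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

#ifndef SKETCH_HAVE_SSE2
// Portable stand-in for cvttsd2si/cvttss2si: truncate toward zero, or
// return the "integer indefinite" value 0x80000000 (INT_MIN) on NaN/overflow.
static int truncOrIndefinite(double v) {
    if (std::isnan(v) || v >= 2147483648.0 || v <= -2147483649.0) return INT_MIN;
    return static_cast<int>(v);
}
#endif

// Approximation of what the d2i/f2i fixup has to produce under Java rules:
// NaN -> 0, too large -> INT_MAX, too small (or exactly INT_MIN) -> INT_MIN.
static int javaFixup(double v) {
    if (std::isnan(v)) return 0;
    return v > 0 ? INT_MAX : INT_MIN;
}

// Path with the cast to double: vcvtss2sd, then vcvttsd2si.
int floatToIntViaDouble(float f) {
#ifdef SKETCH_HAVE_SSE2
    __m128d widened = _mm_cvtss_sd(_mm_setzero_pd(), _mm_set_ss(f));  // cvtss2sd
    int r = _mm_cvttsd_si32(widened);                                  // cvttsd2si
#else
    int r = truncOrIndefinite(static_cast<double>(f));
#endif
    // cmp eax,0x80000000 / jne: only the sentinel value goes to the fixup
    return r == INT_MIN ? javaFixup(static_cast<double>(f)) : r;
}

// Path without the cast: a single vcvttss2si.
int floatToIntDirect(float f) {
#ifdef SKETCH_HAVE_SSE2
    int r = _mm_cvttss_si32(_mm_set_ss(f));  // cvttss2si
#else
    int r = truncOrIndefinite(static_cast<double>(f));
#endif
    return r == INT_MIN ? javaFixup(static_cast<double>(f)) : r;
}
```

On working hardware both functions agree for every input, for example both return 3 for 3.0f; the only difference is the extra widening step in the first one.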
One of the differences was the presence of the vcvtss2sd and vcvttsd2si instructions. I switched to C++ and set out to reproduce this instruction sequence with inline asm, but while debugging I found that clang at -O0 already emits cvtss2sd when a float is compared with 1.0 (the literal 1.0 is a double, so the float is first widened to double). In the end, everything came down to this compare function:
bool compare() {
    float val = 1.0;
    return val != 1.0;
}
And in rare cases this function returned true, that is, it claimed that 1.0 is not equal to 1.0. I wrote a small wrapper to count how often the comparison fails:
#include <climits>
#include <iostream>

int main() {
    int error = 0;
    int secondCompareError = 0;
    for (int i = 0; i < INT_MAX; i++) {
        float result = 1.0;
        if (result != 1.0) {          // the float is widened to double for this comparison
            error++;
            if (result != 1.0) {      // immediately repeat the same comparison
                secondCompareError++;
            }
        }
    }
    std::cout << "Iterations: " << INT_MAX << ", errors: " << error
              << ", second compare errors: " << secondCompareError << std::endl;
    return 0;
}
The result was: Iterations: 2147483647, errors: 111, second compare errors: 0. Interestingly, repeating the comparison right after a failure never produced an error.
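A quick back-of-the-envelope check on these numbers, assuming (and this is only an assumption, not something the experiment establishes) that every comparison fails independently with the same probability:

```latex
\hat{p} = \frac{111}{2\,147\,483\,647} \approx 5.17 \times 10^{-8} \approx \frac{1}{1.93 \times 10^{7}},
\qquad
\mathbb{E}[\text{second-compare errors}] \approx 111 \cdot \hat{p} \approx 5.7 \times 10^{-6}.
```

Under that assumption a zero in the second counter is exactly what one would expect, so the clean repeat check on its own says little about whether the failures cluster in time.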
I then disabled SSE support in clang and rebuilt the same compare function:
bool compare() {
    float val = 1.0;
    return val != 1.0;
}
With that change the problem no longer reproduced. From this I concluded that the SSE instructions do not work reliably on my system.
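For anyone who wants to check their own machine, here is a minimal C++ sketch that pins the suspicious sequence down with SSE2 intrinsics instead of relying on clang's -O0 code generation: cvtss2sd to widen the float, then ucomisd to compare it with 1.0. The post does not show which comparison instruction clang emitted, so ucomisd is my choice; the function name is hypothetical, and the volatile qualifier is my addition so that the comparison survives optimized builds.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Counts how often a float holding 1.0f, widened to double, compares
// unequal to 1.0. On correctly working hardware the result is always 0.
std::int64_t countCompareErrors(std::int64_t iterations) {
    // volatile forces a fresh load each iteration, so the comparison
    // cannot be constant-folded away even at -O2.
    volatile float val = 1.0f;
    std::int64_t errors = 0;
    for (std::int64_t i = 0; i < iterations; ++i) {
#ifdef SKETCH_HAVE_SSE2
        __m128d widened = _mm_cvtss_sd(_mm_setzero_pd(), _mm_set_ss(val));  // cvtss2sd
        if (_mm_ucomineq_sd(widened, _mm_set_sd(1.0))) {                    // ucomisd
            ++errors;
        }
#else
        if (static_cast<double>(val) != 1.0) {
            ++errors;
        }
#endif
    }
    return errors;
}
```

At the rate observed above (111 failures in about 2.1 billion iterations, roughly 1 in 19 million), a run needs an iteration count on the order of INT_MAX to have a realistic chance of catching a failure; a few million iterations will most likely report 0 even on an affected machine.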
I have been working as a programmer for more than 7 years and programming for more than 16, and in all that time I had learned to trust primitive operations: they always work, and they always give the same result. Realizing that a simple float comparison can occasionally break that assumption was, of course, a shock. And apart from replacing the Mac, it is not clear what can be done about it.