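Question
Does synchronized (heavyweight locking) defeat all compiler optimizations?
Background
Under the current Java Memory Model, a lock that nobody observes is not guaranteed to have any memory effects. Among other things, this means that synchronizing on a non-shared object is futile, so the runtime does not have to do anything there. That opens up an opportunity for the compiler to optimize.
Therefore, if escape analysis finds that the object does not escape, the compiler is free to eliminate the synchronization.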
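As an illustration of what "non-escaping" means here, the following is a minimal sketch and not part of the original experiment. The class name NonEscapingLock and the method join are hypothetical. StringBuffer's methods are synchronized, but the buffer is created, used, and dropped inside one method, so no other thread can ever observe its monitor. Whether the JIT actually removes the locking depends on inlining and on the JVM in use.

```java
public class NonEscapingLock {

    // sb is never stored in a field, passed to another thread, or returned,
    // so its monitor can never be contended. This makes it a candidate for
    // lock elision once the method is compiled and append() is inlined.
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a); // synchronized method on a thread-local object
        sb.append(b); // same monitor, still invisible to other threads
        return sb.toString(); // only the resulting String leaves the method
    }

    public static void main(String[] args) {
        System.out.println(join("lock", "elision"));
    }
}
```

The result is the same with or without elision; only the cost of producing it changes.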
Experiment
Test case
Increment a field with and without synchronization on a newly created object.
Source code
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class LockElision {

    int x;

    // Plain increment, no synchronization.
    @Benchmark
    public void baseline() {
        x++;
    }

    // Same increment, guarded by a lock on a fresh object that never escapes.
    @Benchmark
    public void locked() {
        synchronized (new Object()) {
            x++;
        }
    }
}
Running the test case above with -prof perfnorm gives the following results:
Benchmark                               Mode  Cnt   Score    Error  Units
LockElision.baseline                    avgt   15   0.268 ±  0.001  ns/op
LockElision.baseline:CPI                avgt    3   0.200 ±  0.009   #/op
LockElision.baseline:L1-dcache-loads    avgt    3   2.035 ±  0.101   #/op
LockElision.baseline:L1-dcache-stores   avgt    3  ≈ 10⁻³            #/op
LockElision.baseline:branches           avgt    3   1.016 ±  0.046   #/op
LockElision.baseline:cycles             avgt    3   1.017 ±  0.024   #/op
LockElision.baseline:instructions       avgt    3   5.076 ±  0.346   #/op
LockElision.locked                      avgt   15   0.268 ±  0.001  ns/op
LockElision.locked:CPI                  avgt    3   0.200 ±  0.005   #/op
LockElision.locked:L1-dcache-loads      avgt    3   2.024 ±  0.237   #/op
LockElision.locked:L1-dcache-stores     avgt    3  ≈ 10⁻³            #/op
LockElision.locked:branches             avgt    3   1.014 ±  0.047   #/op
LockElision.locked:cycles               avgt    3   1.015 ±  0.012   #/op
LockElision.locked:instructions         avgt    3   5.062 ±  0.154   #/op
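The two results are identical: the same time, and the same number of loads, stores, cycles, and instructions. In all likelihood this means the generated code is the same. The assembly for the hot loop looks like this: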
14.50% 16.97% ↗ incl 0xc(%r8) ; increment field
76.82% 76.05% │ movzbl 0x94(%r9),%r10d ; JMH infra: do another @Benchmark
0.83% 0.10% │ add $0x1,%rbp
0.47% 0.78% │ test %eax,0x15ec6bba(%rip)
0.47% 0.36% │ test %r10d,%r10d
╰ je BACK
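The lock is elided completely: both the allocation and the synchronization are gone. If we run with the JVM flag -XX:-EliminateLocks, or disable EA with -XX:-DoEscapeAnalysis (which breaks every optimization that depends on EA, lock elision included), the counters for locked balloon and show the cost of the allocation and of a simple synchronization: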
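When comparing runs like these, it can help to confirm which flag values a given JVM is actually using. The sketch below is an addition, not part of the original experiment. It reads the two flags through HotSpotDiagnosticMXBean, which is a HotSpot-specific API. The class name ElisionFlags and the helper flagValue are hypothetical, and the flags may not exist on JVMs built without the C2 compiler, in which case the helper reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ElisionFlags {

    // Returns the current value of a HotSpot flag as a string, or
    // "unavailable" if this JVM does not expose the flag or the bean.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or no HotSpot diagnostic bean on this JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"DoEscapeAnalysis", "EliminateLocks"}) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Launching it with, for example, java -XX:-EliminateLocks ElisionFlags should show the changed value on JVMs that support the flag.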
Benchmark                               Mode  Cnt   Score    Error  Units
LockElision.baseline                    avgt   15   0.268 ±  0.001  ns/op
LockElision.baseline:CPI                avgt    3   0.200 ±  0.001   #/op
LockElision.baseline:L1-dcache-loads    avgt    3   2.029 ±  0.082   #/op
LockElision.baseline:L1-dcache-stores   avgt    3   0.001 ±  0.001   #/op
LockElision.baseline:branches           avgt    3   1.016 ±  0.028   #/op
LockElision.baseline:cycles             avgt    3   1.015 ±  0.014   #/op
LockElision.baseline:instructions       avgt    3   5.078 ±  0.097   #/op
LockElision.locked                      avgt   15  11.590 ±  0.009  ns/op
LockElision.locked:CPI                  avgt    3   0.998 ±  0.208   #/op
LockElision.locked:L1-dcache-loads      avgt    3  11.872 ±  0.686   #/op
LockElision.locked:L1-dcache-stores     avgt    3   5.024 ±  1.019   #/op
LockElision.locked:branches             avgt    3   9.027 ±  1.840   #/op
LockElision.locked:cycles               avgt    3  44.236 ±  3.364   #/op
LockElision.locked:instructions         avgt    3  44.307 ±  9.954   #/op
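Summary
Lock elision is another optimization enabled by escape analysis. It removes some superfluous synchronization. This is especially beneficial when an internally synchronized implementation does not escape into the wild: in that case we can dispense with the synchronization completely!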