There are a number of settings that affect the way the JVM allocates memory and the behavior of garbage collection in J2SE 1.4.1. I have attempted to ascertain the impact of various settings on NetBeans.
What was tested
A script copied a given set of settings to NetBeans' ide.cfg file
and then ran NetBeans with the argument -Dnetbeans.close=true,
which shuts the IDE down after the main window appears. Each set
of settings was run 8 times and the variation determined (the
bracket at the top of each graph bar indicates the standard deviation).
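The per-setting summary described above can be sketched as follows. This is a minimal illustration, not the script that was actually used: the function names and the sample durations are hypothetical, the `-J` prefix for passing JVM flags through ide.cfg is an assumption about the launcher, and the report does not say whether a sample or population standard deviation was charted (the sample form is used here).

```python
import statistics


def ide_cfg_line(jvm_flags):
    """Join JVM flags into one option line, prefixing each with -J.

    Assumption: the NetBeans launcher forwards -J-prefixed options to the JVM.
    """
    return " ".join("-J" + flag for flag in jvm_flags)


def summarize(samples):
    """Return (mean, sample standard deviation) for one set of settings."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical session durations, in seconds, for 8 runs of one settings set.
runs = [13.2, 13.9, 13.5, 13.8, 13.4, 13.7, 13.6, 13.7]
mean, stdev = summarize(runs)

print(ide_cfg_line(["-Xms96m", "-Xmx96m", "-XX:PermSize=20M"]))
print(f"mean={mean:.2f}s stdev={stdev:.2f}s")
```

The standard deviation returned here corresponds to the bracket drawn at the top of each graph bar.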
These tests only assess the impact on NetBeans' startup performance. Attempts were made to run NetBeans using the qa-functional tests that drive the NetBeans UI. After several runs, the Ant process (which is lightweight: all processing is done after the tests have completed, on log files generated by NetBeans) failed with OutOfMemoryError on all platforms. Further investigation is needed.
These tests were run on four platform configurations: a Dell machine with 384 MB RAM and an 800 MHz processor (running both Linux and Windows), a Sun Ultra 60 running Solaris (450 MHz, 512 MB RAM), and a Sony VAIO PictureBook running Linux with 128 MB RAM. Note that the results on the low-end laptop are quite different from those elsewhere. Further testing on underpowered machines is indicated.
The IDE version tested was a checkout of the NetBeans trunk from shortly after the 3.4 release.
What was measured and its reliability
A set of 34 different combinations of JVM settings were
developed. A subset of them is presented in the reports, since
those with extremely poor performance results were culled.
Since it was not possible to test runtime performance during
a sustained run of the IDE, all settings which produced a
full garbage collection event during startup were culled
from the results (with the exception of the set "baseline"
which is NetBeans 3.4's out-of-the-box settings).
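The culling rule above can be sketched as a small filter over captured GC logs. This assumes the logs were collected with `-verbose:gc`, whose output marks major collections with a `[Full GC` prefix; the function names, the set names other than "baseline", and the sample log lines are hypothetical.

```python
def has_full_gc(gc_log_text):
    """True if any line of -verbose:gc output reports a full collection."""
    return any(line.lstrip().startswith("[Full GC")
               for line in gc_log_text.splitlines())


def cull(logs_by_setting, always_keep=("baseline",)):
    """Drop every settings set whose startup log contains a full GC,
    except the sets named in always_keep."""
    return {name: log for name, log in logs_by_setting.items()
            if name in always_keep or not has_full_gc(log)}


# Hypothetical startup logs for three sets of settings.
logs = {
    "baseline": "[GC 8128K->1024K(98304K), 0.0130000 secs]\n"
                "[Full GC 2048K->1800K(98304K), 0.3230000 secs]\n",
    "setA": "[GC 8128K->1024K(98304K), 0.0620000 secs]\n",
    "setB": "[Full GC 4096K->3000K(98304K), 0.4000000 secs]\n",
}

print(sorted(cull(logs)))
```

In this sample only "setB" is dropped: "baseline" is retained despite its full collection, matching the exception described above.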
The following metrics were measured (all averaged across 8 runs of each set of settings):
Each of these metrics was graphed and the results charted and detailed here.
What numbers are good numbers?
Session duration is probably the least interesting metric,
chiefly because it has the highest standard deviation, across
the board, of any of the numbers measured. This is in part
because it is logged by the Ant script driving the tests,
which has its own inherent jitter due to garbage collection.
The primary goals of tuning garbage collection settings are to:
Probably the most interesting metric is average gc cycle duration. Keeping that number down means fewer user-perceptible pauses.
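As a rough illustration of how an average GC cycle duration can be derived, the sketch below averages the pause times reported in `-verbose:gc` output, separately for minor and major collections. It assumes the usual `[GC before->after(total), N secs]` line shape; the function name and the sample log are hypothetical and this is not necessarily how the charted numbers were produced.

```python
import re
from collections import defaultdict

# Matches "[GC ...,  0.0120000 secs]" and "[Full GC ..., 0.3000000 secs]".
GC_LINE = re.compile(r"\[(Full GC|GC) .*?, ([0-9.]+) secs\]")


def average_pause_ms(gc_log_text):
    """Return the average pause in milliseconds per collection type."""
    pauses = defaultdict(list)
    for kind, secs in GC_LINE.findall(gc_log_text):
        key = "major" if kind == "Full GC" else "minor"
        pauses[key].append(float(secs) * 1000.0)
    return {key: sum(vals) / len(vals) for key, vals in pauses.items()}


# Hypothetical log excerpt: two minor collections and one major collection.
sample_log = (
    "[GC 8128K->1024K(98304K), 0.0120000 secs]\n"
    "[GC 9152K->2048K(98304K), 0.0140000 secs]\n"
    "[Full GC 2048K->1800K(98304K), 0.3000000 secs]\n"
)

print(average_pause_ms(sample_log))
```

Reporting minor and major pauses separately mirrors the way the baseline figures are quoted in this report.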
What information do we still need to make informed recommendations?
Note that absent from these tests is any information on
old-generation collection. Without more memory-intensive tests
of NetBeans, it is impossible to collect such information. Once
a tool is available to drive the NetBeans UI in memory-intensive
activities such as code completion and popup javadoc (without causing
the Ant process to run out of memory), the
recommendations can be refined to cover which settings produce the
best results for minimizing and distributing old-generation collections.
Summary of Results
All tests were done using the same build of the NetBeans IDE, from the trunk shortly after NetBeans 3.4 release.
Naturally, there is no magic bullet across diverse systems. Even Linux and Windows on the same machine give slightly different results in terms of best performance: test 17 below repeatedly came in first by session time on Linux and second on Windows, but produced a dramatically lower average garbage collection duration.
Across the board, one useful thing to include is -XX:PermSize=20M.
This sets the size of the permanent area (where classes are stored)
at NetBeans' startup and prevents that area from being grown during
startup. Simulating the promoteall modifier helps further: using
-XX:MaxTenuringThreshold=0 reduces garbage collection cycles by
promoting objects directly from the new area to the old area
(without using the two survivor areas), thus eliminating two
memory copies. Whether this has a deleterious effect on
old-generation GCs remains to be seen.
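To make the memory areas mentioned above concrete, the following sketch works out how a fixed heap is divided under the generational sizing rule, in which the new area consists of one eden space plus two survivor spaces and SurvivorRatio is the ratio of eden to one survivor space. The function name is hypothetical, and the example values are taken from flags used in this report (-Xmx96m, -XX:NewSize=32M, -XX:SurvivorRatio=2). As noted under the anomalies, the survivor settings may not be honored by every collector, so treat this as nominal arithmetic only.

```python
def heap_layout_mb(heap_mb, new_mb, survivor_ratio):
    """Split a fixed heap into eden, two survivor spaces, and the old area.

    new area = eden + 2 * survivor, with eden = survivor_ratio * survivor.
    The permanent area is sized separately (-XX:PermSize) and is not
    included in these figures.
    """
    survivor = new_mb / (survivor_ratio + 2)
    eden = new_mb - 2 * survivor
    old = heap_mb - new_mb
    return {"eden": eden, "survivor_each": survivor, "old": old}


layout = heap_layout_mb(heap_mb=96, new_mb=32, survivor_ratio=2)
print(layout)
```

With these inputs the nominal split is 16 MB of eden, two 8 MB survivor spaces, and a 64 MB old area. With -XX:MaxTenuringThreshold=0, surviving objects skip the survivor spaces and go straight to the old area.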
The settings that have so far proved most effective are listed below. Follow the links to read the generated report and resulting charts.
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:SurvivorRatio=2 -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M -XX:NewSize=32M -Xmx96m -Xms96m
- total time startup to shutdown 13.61 seconds (baseline with NetBeans
3.4 settings 16 seconds), average garbage
collection duration 62 milliseconds (baseline 13ms minor, 323ms major).
-XX:TargetSurvivorRatio=1 -Xverify:none
-XX:SurvivorRatio=2 -XX:+UseParallelGC -XX:PermSize=20M
-XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M -XX:NewSize=32M -Xmx96m
-Xms96m
- total time startup to shutdown 18.5 seconds (baseline with
NetBeans 3.4 settings is 20.4 seconds), average garbage collection duration
68.03 milliseconds (baseline 15ms minor, 415ms major)
test8: -XX:TargetSurvivorRatio=1 -Xverify:none -XX:PermSize=20M
-XX:NewSize=32M -Xmx96m -Xms96m
- 22950 milliseconds. This test also performed best on Windows for
session length, but produced almost double the average garbage
collection time. Nonetheless, it may represent a better option for
a single set of cross-platform settings.
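The gains over the baseline quoted in the list above can be put in relative terms with a one-line calculation. This is only a worked example using the session times stated in this report; the function name is hypothetical.

```python
def improvement_pct(baseline, tuned):
    """Percentage reduction of the tuned figure relative to the baseline."""
    return (baseline - tuned) / baseline * 100.0


# Session times quoted above: (baseline, tuned), in the same unit per pair.
first_entry = improvement_pct(16.0, 13.61)
second_entry = improvement_pct(20.4, 18.5)

print(f"first entry:  {first_entry:.1f}% shorter session")
print(f"second entry: {second_entry:.1f}% shorter session")
```

These work out to reductions of roughly 15% and 9% in session time. Given the standard deviation of session duration discussed earlier, such figures are indicative rather than precise.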
Anomalies in the results
There were a few surprises in the results that are awaiting
explanation. All of the testing machines were single-processor
machines, but in most cases they showed benefits from running
with the -XX:+UseParallelGC setting. This garbage collector is
described as being designed for gigabyte heaps and multiple
processors, which makes this result surprising.
Further, the SurvivorRatio and TargetSurvivorRatio settings should be no-ops when using Parallel GC. Indeed, the resulting numbers are very similar. What is interesting is that when these settings are omitted, the standard deviation for all of the numbers measured increases by a factor of about six, so the range of values widens dramatically. It is unclear why this should be the case.
Also note that the numbers for Windows may not be reliable.
There is a bug that appears only there: when the IDE is shut
down with the -Dnetbeans.close=true flag, all of the
components in the main window dock themselves into new frames a
split second before the process exits. Since there is a real cost
to acquiring these frames from the operating system, and it does
not happen with all settings, this may be affecting the results.
Particularly surprising was that the Parallel Scavenge collector
(-XX:+UseParallelGC) seemed to produce the smoothest
results on Windows and Linux. This collector is described as
optimized for machines with gigabyte-sized heaps, but in these
preliminary tests it produced the best numbers.