Tuesday, September 15, 2009

Configure JVM Memory with -Xms when Using Jersey and JAXB

When using JAXB and Jersey, make sure you configure the -Xms option to tune your JVM’s initial memory allocation. If you forget to do this you’ll likely end up with very poor performance caused by garbage collector thrashing.

Prior to discovering this problem I was a bit naïve about garbage collection (actually I must admit that I still am). I knew that big, high-performance production servers benefited from lots of memory configuration, but I thought that smaller test and development environments would survive just fine mostly on JVM defaults. For these smaller environments I would just set -Xmx. My naïve assumption was that memory would start small and then grow as needed to the maximum configured.

What I failed to realize is that there are many different partitions of the memory space and that some of these areas are allocated at JVM startup and never grow. In other words, their size is only affected by the initial memory allocation and not the maximum memory allocation. It turns out that certain aspects of Jersey and JAXB can be very sensitive to these initial allocations. This means that if you use Jersey and JAXB you should configure a larger initial memory size even in simple test and development environments and certainly in larger production environments.
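One way to look at these partitions is to ask the running JVM for its memory pools and compare the initial size of each pool with its maximum. The following is a minimal sketch using the standard java.lang.management API; the class name MemoryPoolReport is made up for illustration, and the pool names you see depend on the JVM and garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists each JVM memory pool with its initial, committed and maximum size.
final class MemoryPoolReport {

    /** Returns one formatted line per memory pool reported by the JVM. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-30s init=%s committed=%s max=%s",
                    pool.getName(), kb(usage.getInit()), kb(usage.getCommitted()), kb(usage.getMax())));
        }
        return lines;
    }

    /** Formats a byte count in kilobytes; the API reports -1 when a value is undefined. */
    static String kb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / 1024) + "K";
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it once with only -Xmx set and once with both -Xms and -Xmx set should make the difference in the initial pool sizes visible.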

The problem is related to short lived temporary objects. It turns out that it doesn’t take much to encounter the problem. It doesn’t take a massive multithreaded program with lots of allocated objects. I created a single threaded one page program that exhibits the problem. If you run this program under a 32-bit JVM then you will experience the symptom first hand. Try running the program with different initial memory allocations and see what you get.
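For readers who want something to experiment with, here is a minimal sketch of a single-threaded program that churns through short-lived buffers. It is not the original test program: the class name, buffer size and iteration count are arbitrary choices, and it leaves out the JAXB context creation discussed further down so that it compiles against a bare JDK.

```java
// Hypothetical stand-in for a test program that creates many short-lived objects.
final class ShortLivedAllocationDemo {

    /** Allocates count temporary buffers of size bytes each and returns the total bytes allocated. */
    static long churn(int count, int size) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            byte[] buffer = new byte[size]; // becomes garbage as soon as the iteration ends
            if (size > 0) {
                buffer[0] = (byte) i;       // touch the buffer so the allocation is less likely to be optimized away
            }
            total += buffer.length;
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = churn(200_000, 64 * 1024);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("allocated " + bytes + " bytes in " + millis + " ms");
    }
}
```

Try it with something like "java -Xms16m -Xmx512m ShortLivedAllocationDemo" and then with "java -Xms256m -Xmx512m ShortLivedAllocationDemo" and compare the elapsed times. The numbers will vary with the JVM version and collector.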

I first noticed the problem when working with Jersey to do high-performance multipart content transfer. There are a couple different modes of HTTP content transfer used by Jersey. If the length of the content is known up front then Jersey uses a fixed streaming mode of content transfer. If, however, the length is not known ahead of time then Jersey defaults to a non-streaming mode. This non-streaming mode copies data temporarily into a ByteArrayInputStream and for larger content this process creates a lot of temporary short-lived objects. If the initial memory allocation is too low then the result is a significant stress on the garbage collector. In my case of MIME multipart transfer the length is not known up front and therefore content transfer occurs in non-streaming mode and causes the problem to exhibit itself.

Just performing non-streaming HTTP transfer isn’t enough to make the problem obvious though. If the initial memory allocation is too small then performance is mediocre but not horrible. It isn’t until I bring JAXB into the picture that things get really bad. What I discovered is that the simple initialization of a JAXB context consumes some area of Java memory such that programs with lots of short-lived objects start performing really badly. You don’t even need to use JAXB ever again. Just simply create a JAXB context and then never use it and you will see things slow down. I am not exactly sure why this happens. Since the JAXB context is a long-lived object it is not clear to me how it so adversely affects the behavior of short-lived objects. The effect is dramatic though as you can see from the test program.

The moral is: if you use JAXB you had better provide the JVM with a larger initial memory allocation (-Xms), or else your program will slow down dramatically if it creates lots of short-lived objects.

Configuring dfc.properties for an application archive

When you obtain a new application server archive, such as a war file or an ear file, and that archive contains a copy of DFC, you are often faced with the task of configuring the dfc.properties file for the archive before it can be used. One common technique for doing this is to insert a dfc.properties file of your choice into the archive. For an ear file you can place your dfc.properties into the /APP-INF/classes directory of the archive. For a war file you can place your dfc.properties into the /WEB-INF/classes directory of the archive. This task can be accomplished using the jar utility from the JDK or using a zip utility such as WinZip.
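If you would rather script the insertion than use the jar utility or a zip tool, the JDK's zip file system can do the same job. This is only a sketch of that alternative: the class name ArchiveConfigInserter and its parameters are invented for illustration, and it assumes the archive already exists and is a valid zip.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: copies a properties file into an existing war or ear archive.
final class ArchiveConfigInserter {

    /**
     * Copies propertiesFile into archive at entryPath,
     * for example "WEB-INF/classes/dfc.properties" for a war file.
     */
    static void insert(Path archive, Path propertiesFile, String entryPath) throws IOException {
        URI uri = URI.create("jar:" + archive.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("create", "false"); // the archive must already exist
        try (FileSystem zip = FileSystems.newFileSystem(uri, env)) {
            Path target = zip.getPath("/", entryPath);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            // Replace any dfc.properties that shipped inside the archive.
            Files.copy(propertiesFile, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the updated archive back to disk
    }
}
```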

For production environments that is often a good way to go. For development environments, however, constantly updating the archive each time you get a new version can be tedious. Another approach is to use a system property. If you set the “dfc.properties.file” system property then DFC will use that value to locate dfc.properties. For example:
java -Ddfc.properties.file=C:/Documentum/config/dfc.properties …

You can set this system property in a number of ways. If you are using an IDE to launch the application server you can generally set the system property in the “run” configuration screen of your IDE.

If you are starting your application server from a script you can modify the startup script to include the system property definition. In the case of Tomcat, you can also set the system property in an environment variable. For example:
set JAVA_OPTS=-Ddfc.properties.file=C:/Documentum/config/dfc.properties

Using the system property to select your dfc.properties externalizes the configuration from the application archive and allows you to easily switch between different versions of the archive without the need to configure each one.
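The general pattern of "use the file named by a system property if it is set, otherwise fall back to the copy on the classpath" can be sketched in a few lines. To be clear, this is not DFC's actual code; the class and method names are made up and it only illustrates the idea of externalized configuration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical illustration of system-property-driven configuration lookup (not DFC internals).
final class ExternalPropertiesLoader {

    /**
     * Loads properties from the file named by the given system property if it is set,
     * otherwise from the named classpath resource. Returns empty properties if neither exists.
     */
    static Properties load(String systemPropertyName, String classpathResource) {
        Properties props = new Properties();
        String external = System.getProperty(systemPropertyName);
        try (InputStream in = external != null
                ? Files.newInputStream(Paths.get(external))
                : ExternalPropertiesLoader.class.getClassLoader().getResourceAsStream(classpathResource)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

A call such as ExternalPropertiesLoader.load("dfc.properties.file", "dfc.properties") would then pick up whatever file the -D option points at without touching the archive.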

Avoid Jar Indexing

If your classpath includes jars that are indexed then performance of service factory lookup can be severely impacted. You should avoid using indexed jars. During a quick scan of built-in JRE jars I saw no sign of jar indexing so it appears not particularly important to use. If the JRE doesn’t find it useful then perhaps we shouldn’t either.

The problem relates to caching of information in the classloader infrastructure. The service factory lookup algorithm performs a global search through the classpath for a particular resource (javax.xml.parsers.SAXParserFactory for example). Observation of system behavior and a brief examination of the JRE code suggest that this lookup is cached for jars. This means that after the first lookup, subsequent lookups perform no further file system activity. This caching helps reduce the cost of subsequent service factory lookups. In the case of indexed jars, however, I noticed that this caching breaks down. Information from dependent jars that are referenced through a jar index doesn’t seem to get cached correctly. This means that for every single service factory lookup the jars referenced through the index are physically scanned again for the resource. This can cause a lot of additional file system activity.
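To get a feel for what such a lookup involves, the sketch below performs the same kind of classpath-wide resource search by hand and then times a batch of factory creations. The class name ServiceResourceScan is invented, and the resource path follows the standard META-INF/services convention. On newer JDKs the built-in parser may not be registered through such a file, so the list of providers can legitimately be empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: shows the classpath search behind a service factory lookup.
final class ServiceResourceScan {

    /** Searches the whole classpath for provider registrations of the given service. */
    static List<URL> findProviders(String serviceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (URL url : findProviders("javax.xml.parsers.SAXParserFactory")) {
            System.out.println("provider registration: " + url);
        }
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            SAXParserFactory.newInstance(); // each call repeats the factory lookup
        }
        System.out.println("1000 factory creations took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Comparing the timing with and without an indexed jar on the classpath is one way to check whether the effect described here shows up in your environment.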

I noticed the problem when working with Jersey and JAXB. It turns out that certain usage patterns of Jersey and JAXB can result in a very high number of service factory lookups (for SAXParserFactory) and those lookups are extra slow if you happen to have an indexed jar in your classpath. In this case there were two issues compounding each other to become an obvious performance problem when each on its own might go unnoticed. Without the excessive factory lookups, or without the indexed jars, the problem is not nearly so noticeable. With both, however, you end up with lots of extra file system activity that definitely impacts performance.

It is interesting to note how I ended up in this situation. Before this I wasn’t even aware of jar indexing. I was using Maven to build my project and decided that I wanted to add some manifest information to my jar. I searched the internet and found an example of creating a jar manifest with Maven. I proceeded to cut and paste the sample into my own project. Unfortunately it just so happened that the sample also requested jar indexing. Oops, now my performance was worse. I fear anyone else that finds this same sample could encounter the same problem and not even know.
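If you suspect that an indexed jar has crept into your build the same way, you can check for the index file directly. An indexed jar carries a META-INF/INDEX.LIST entry, so a small scan of the classpath is enough. The sketch below is one way to do that; the class name IndexedJarFinder is made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: reports which jars on the classpath contain a jar index.
final class IndexedJarFinder {

    /** Returns true if the jar contains the META-INF/INDEX.LIST entry written by jar indexing. */
    static boolean isIndexed(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry("META-INF/INDEX.LIST") != null;
        }
    }

    /** Scans the java.class.path entries and returns the paths of the indexed jars. */
    static List<String> indexedJarsOnClasspath() {
        List<String> indexed = new ArrayList<>();
        String classpath = System.getProperty("java.class.path", "");
        for (String entry : classpath.split(File.pathSeparator)) {
            File file = new File(entry);
            if (!entry.endsWith(".jar") || !file.isFile()) {
                continue; // directories and missing entries cannot carry an index
            }
            try {
                if (isIndexed(file)) {
                    indexed.add(entry);
                }
            } catch (IOException e) {
                // Unreadable jar: skip it rather than fail the whole scan.
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        indexedJarsOnClasspath().forEach(jar -> System.out.println("indexed jar: " + jar));
    }
}
```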

Cache Jersey JAXB Marshaller and Unmarshaller Instances

If you create new JAXB Marshaller and Unmarshaller instances often then performance suffers. To avoid unnecessary creations you should cache and reuse your Marshaller and Unmarshaller instances.

The reason performance is affected is that creation of each new Marshaller or Unmarshaller instance results (indirectly) in the creation of a new SAXParserFactory deep in the bowels of JAXB. This creation of a new SAXParserFactory for each Marshaller or Unmarshaller can be very expensive. SAXParserFactory creation involves a service lookup that scans the classpath for “javax.xml.parsers.SAXParserFactory”. If your classpath includes directories or includes jars that are indexed then a lot of file system activity can occur and affect your performance.

Ideally it would be nice if JAXB did not create so many SAXParserFactory instances. Factory creation can be expensive and it is not a good strategy to perform factory lookups so often. Why does JAXB create so many SAXParserFactory instances? Is it a bug in JAXB? Typically when such code is found in applications it is considered bad coding practice.

After thinking about it some I came to the conclusion that it probably isn’t a bug in JAXB. These are smart guys and I couldn’t imagine them making this kind of choice by accident. Though I don’t know the exact reasoning I assume it relates to the fact that JAXB is a low-level library that can be used from many different contexts and it is difficult for JAXB to make a choice that meets all use cases.

Unfortunately this behavior is deep in the bowels of JAXB in protected or private methods and does not appear easy to change externally. In the case of Unmarshaller, for example, the factory creation is found in the protected method AbstractUnmarshallerImpl.getXMLReader(). Your best bet to avoid the expense is to simply not create Marshaller or Unmarshaller instances more often than necessary.

To fix this problem in my Jersey-based REST service I created my own custom Jersey ContextResolver for Marshaller and Unmarshaller. The custom resolver maintains a thread-based cache of Marshaller and Unmarshaller instances and thereby avoids unnecessary creation of fresh instances and the corresponding expense of the excessive SAXParserFactory creations.
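The core of a thread-based cache like this is small. The sketch below shows only the caching part, using a ThreadLocal; it is not the actual resolver. The class name PerThreadCache is invented, the Jersey ContextResolver wiring is left out, and the demo uses SAXParserFactory as a stand-in for the expensive object so that it compiles against a bare JDK.

```java
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: creates one instance per thread on first use and reuses it afterwards.
final class PerThreadCache<T> {

    private final ThreadLocal<T> instances;

    /** The factory is invoked at most once per thread, the first time that thread calls get(). */
    PerThreadCache(Supplier<? extends T> factory) {
        this.instances = ThreadLocal.withInitial(factory);
    }

    /** Returns the calling thread's cached instance. */
    T get() {
        return instances.get();
    }

    public static void main(String[] args) {
        // Stand-in for an expensive object; a JAXB Unmarshaller would be cached the same way.
        PerThreadCache<SAXParserFactory> cache = new PerThreadCache<>(SAXParserFactory::newInstance);
        System.out.println("same instance reused: " + (cache.get() == cache.get()));
    }
}
```

For JAXB, the supplier would wrap a call such as jaxbContext.createUnmarshaller(), converting the checked JAXBException into an unchecked one. A custom ContextResolver could then hand out cache.get() instead of creating a new instance on every request. One instance per thread is a conservative choice because Marshaller and Unmarshaller instances are not safe to share between threads.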

Notes on Configuring the IntelliJ Maven Integration

I recently started working with Maven for many of my personal projects. I love the ability to declare my dependencies on various open source projects and to have the dependencies automatically resolved and downloaded. I no longer have to search the web for the software, download the distribution, unzip it somewhere on my disk, and then configure my paths. It is all done automatically.

I use IntelliJ as my Java IDE. IntelliJ has a pretty nice Maven integration. You can import a Maven project and all the dependencies and paths are automatically configured for you. I found, however, that by default there are a couple pieces missing to get the really slick development environment I desire.

Maven has the ability to download sources and javadoc for the open source libraries you declare. This is wonderful when you are developing and debugging code. Unfortunately the IntelliJ integration doesn’t give you the access you want to those pieces by default. With some research I found that IntelliJ is able to do all that I want but takes a little extra setup. What follows is a discussion of what I did to complete the picture.

1. Configure IntelliJ to download sources and Javadoc

You must configure IntelliJ to download the sources and javadoc. By default they are not downloaded. You can do this by visiting the Maven/Downloading section of the project settings.

2. Install Firefox

The javadoc downloaded by Maven is in jar format. IntelliJ automatically configures itself to almost use the jar-based javadoc but not quite. To view the configured javadoc IntelliJ fires up a browser window and sends the URL to the browser for display. The problem is that Windows Internet Explorer cannot display jar-based javadoc. I researched some and found that the Firefox browser can display jar-based javadoc. You need to install Firefox on your system. In my case I chose to install it as an alternate browser (leaving Internet Explorer as my primary browser).

3. Configure IntelliJ to use Firefox by default

You need to configure IntelliJ to use Firefox as its default web browser so that requests to view javadoc will be sent there. You can do this by visiting the Web Browsers section of the IDE settings.

4. Install the JarDoc plugin

One problem with the jar-based javadoc integration in IntelliJ is that it generates a jar URL that cannot be consumed by any browser that I am aware of. Even though Firefox can display jar-based javadoc it cannot do so with the URL that IntelliJ generates. Someone has written an IntelliJ plugin to work around this problem. You need to install the JarDoc plugin. You can do this by visiting the Plugins section of the IDE settings. You will find the JarDoc plugin in the “Available” tab of the plugin configuration.

5. Update your keymap to access JarDoc using the standard javadoc key sequence

Even though JarDoc works around the IntelliJ jar URL problem, it does so using a special key sequence. The normal key sequence for accessing javadoc in IntelliJ is “Shift-F1”. By default the JarDoc functionality is activated only by the special sequence “Shift-Alt-F1”. I found this rather annoying. I decided to reconfigure my keymap so that JarDoc functionality was invoked using the standard “Shift-F1”. You can do this by visiting the Keymaps section of the IDE settings. You need to create a custom keymap if you haven’t already done so because you can’t modify the built-in keymap. Remove the old mapping for “Shift-F1” then map the JarDoc plugin to “Shift-F1”.

Monday, September 14, 2009

My First Experience with JMeter

While in the process of collecting performance statistics with my own test programs I was introduced to JMeter by Michael Ottati. It’s a nice framework for declaratively composing sequences of tests and collecting statistics from runs of those tests. It has a graphical interface that allows you to compose test sequences, run the tests, and view the results in either tabular or graphical format.

As I began to learn more about JMeter by reading the overview documentation my main question became: “how do I integrate my custom tests with JMeter?” In Michael’s case he seemed to have a somewhat heavyweight integration in which his custom project was intimately intertwined with the JMeter distribution. I set out to see if a little lightweight integration was possible.

Here is a brief summary of what I ended up doing:
  1. Installed a vanilla JMeter distribution on my system.
  2. In my own project (in a separate directory tree) I created some custom “Samplers” which extended the AbstractJavaSamplerClient class of JMeter.
  3. I created a user.properties file in the current working directory from which I would launch JMeter. In the user.properties file I added “search_paths” and “user.classpath” entries which referenced the classes and supporting jars of my custom project.
  4. I ran JMeter and created a Test Plan. In the Test Plan I added a “Java Request” sampler. In the configuration screen for “Java Request” sampler there is a pull-down list of available samplers. My new samplers showed up in the list (by virtue of the “search_paths” entry in user.properties).
That’s all it took. The connection between my custom project and the JMeter distribution was simply the user.properties file in my own directory tree. I didn’t have to modify the JMeter distribution at all.


Free File Comparison Software

Recently the question came up at work of where to get a free file comparison utility. Several recommendations were made and I thought it would be a good idea to capture the information here for others who may be pondering the same question.

I too have wondered where to get a good free file comparison utility. Unfortunately I didn't find this information soon enough and ended up purchasing a commercial product called Araxis Merge. I say unfortunately only because it cost me some money. In all other respects it is a wonderful tool and I am very happy with it. I find it better than all the free choices mentioned below. The only free choice that comes close is the DiffMerge product from SourceGear.

Here's the short list of choices along with a brief comment. I haven't used any of these products extensively but used them just enough to get a quick impression:
  1. CSDiff - I found this product rather primitive. I did not like the single window difference format. I find the side-by-side dual window format much easier to use.
  2. DiffMerge - I found this tool pretty nice. This is the one I'd pick.
  3. WinMerge - This product landed somewhere in the middle. It seems to have most of the features and is graphical in nature but it didn't feel as nice as DiffMerge. My biggest complaint was the folder view. I seemed to have to manually dive down into each subfolder in order to see nested differences and this was very inconvenient.

Saturday, September 12, 2009

Configure Jersey Chunked Encoding when Using Multipart

Make sure to configure the Jersey chunked encoding size when using MIME multipart content.

There are a couple different modes of HTTP content transfer used by Jersey. If the length of the content is known up front then Jersey uses a fixed streaming mode for content transfer. An example of when the length is known is when the content comes from a file. If, however, the length is not known ahead of time then Jersey has two options. It can use a non-streaming mode or a chunked encoding mode. MIME multipart is a case when the length is not known up front and therefore transfer occurs in one of these two latter modes.

The problem is that the default chosen by Jersey when MIME multipart is involved is to use non-streaming mode. The non-streaming mode copies data temporarily into a ByteArrayInputStream and for larger content this process creates a lot of temporary short-lived objects which places stress on the garbage collector. It also consumes lots of memory while the entire message is being buffered.
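The three modes map closely onto options of the JDK's HttpURLConnection, which I am assuming is what sits underneath the Jersey client in a default setup. The sketch below shows the JDK-level switches only, not Jersey's own configuration; the class and method names are made up, and nothing is sent because the connections are never opened.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical illustration of the three transfer modes at the HttpURLConnection level.
final class TransferModes {

    /** Length known up front: the body is streamed with a Content-Length header. */
    static HttpURLConnection fixedLength(URL url, long contentLength) throws IOException {
        HttpURLConnection connection = open(url);
        connection.setFixedLengthStreamingMode(contentLength);
        return connection;
    }

    /** Length unknown: the body is streamed in chunks of roughly chunkSize bytes. */
    static HttpURLConnection chunked(URL url, int chunkSize) throws IOException {
        HttpURLConnection connection = open(url);
        connection.setChunkedStreamingMode(chunkSize);
        return connection;
    }

    /** Neither mode set: the JDK buffers the whole request body in memory before sending it. */
    static HttpURLConnection buffered(URL url) throws IOException {
        return open(url);
    }

    private static HttpURLConnection open(URL url) throws IOException {
        // openConnection() only creates the object; no network traffic happens until connect().
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setDoOutput(true);
        connection.setRequestMethod("POST");
        return connection;
    }
}
```

The buffered variant corresponds to the non-streaming mode described above, which is why large multipart bodies put so much pressure on memory.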

Luckily this behavior is configurable under Jersey. The solution is to configure the chunked encoding size so that Jersey will use chunked encoding instead of non-streaming mode whenever MIME multipart content is encountered. Here is a brief example of how this is done on the client:

ClientConfig config = new DefaultClientConfig();