Friday, August 19, 2005

Building libgcj.dll for MinGW with GCC 4.1

Here are some notes from my experience building libgcj.dll for MinGW using GCC 4.1 (the May 15, 2005 snapshot). Figuring this out took a good deal of effort, since quite a few things have diverged since the last reported sighting of a working libgcj.dll that I could find.

Note that this entire process was done using a cross compiler hosted on Linux.

First, note that I upgraded libtool before building the DLL. When I get some time, I'll rebuild it with pristine sources and make sure it works without this step. The basic process for upgrading libtool is:
  • Copy libtool.m4 from a newer libtool distribution to the root GCC source directory.
  • Run autoconf (version 2.59) followed by automake (version 1.9.3) in the subdirectories that will be built. Do not regenerate the configure script in the root directory unless you make special arrangements to regenerate it with autoconf version 2.13.
  • In the libjava target, you will have to manually patch the generated libtool script after it is created: change the "compiler_c_o" property in the GCJ section to "yes" (this is line 7278 of the version of libtool I used). Not doing this causes problems when the Makefile tries to compile resources. There has got to be a better way to make this work, but I am not terribly familiar with libtool and didn't look into it.
My primary motivation for upgrading libtool was to see if it would just automagically generate the DLL for me. Sadly, this didn't work, though I think it tried. That being the case, it is probably OK to skip the libtool upgrade altogether.

I also ran into a problem with the build order. During the first pass, gnu-java-beans.lo is generated before most of its class files have been compiled. As a result it contains only a handful of classes, which causes a lot of undefined references when building the DLL later. I think this is a general problem that just doesn't manifest very often, because the java.beans.* API is rarely used in practice. The workaround is simple: after you build once, delete the generated gnu-java-beans.lo and rerun make. Since all of the class files are now present, the output file is created as expected. You should see that the backing .o file(s) are much larger the second time around.

The next step is to create the actual DLL. Be aware that this is an extremely memory-intensive operation: the memory usage of ld tops out at about 630 MB on my machine. If you don't have at least this much physical memory available to ld, your machine could thrash for hours.

I used the following script to create the dll:


# Extract every object file from the static libgcj archive
mkdir libgcjobjs
cd libgcjobjs
ar x "../$LIBGCJ_A"
cd ..

find libgcjobjs -name '*.o' > libgcjobjs.list

# Link all objects into libgcj.dll plus an import library (libgcj.a).
# Note: the shell does not expand ~ inside -L~/..., hence $HOME below.
i686-pc-mingw32-ld --shared -Bdynamic -e _DllMainCRTStartup@12 \
    -o libgcj.dll --out-implib libgcj.a \
    /.1/home/tlaurenzo/gcc_cross/bin/../lib/gcc/i686-pc-mingw32/4.1.0/../../../../i686-pc-mingw32/lib/dllcrt2.o \
    -L$HOME/gcc_native/i686-pc-mingw32/lib \
    -L/.1/home/tlaurenzo/gcc_cross/bin/../lib/gcc/i686-pc-mingw32/4.1.0 \
    -L/.1/home/tlaurenzo/gcc_cross/bin/../lib/gcc \
    -L/home/tlaurenzo/gcc_cross/lib/gcc/i686-pc-mingw32/4.1.0 \
    -L/.1/home/tlaurenzo/gcc_cross/bin/../lib/gcc/i686-pc-mingw32/4.1.0/../../../../i686-pc-mingw32/lib \
    -L/home/tlaurenzo/gcc_cross/lib/gcc/i686-pc-mingw32/4.1.0/../../../../i686-pc-mingw32/lib \
    --export-all-symbols --enable-runtime-pseudo-reloc --allow-multiple-definition \
    `cat libgcjobjs.list` \
    -lmingw32 -lgcc -lmoldname -lmingwex -lmsvcrt -lm -lgdi32 -lws2_32 \
    -lmingw32 -lgcc -lmoldname -lmingwex -lmsvcrt \
    -luser32 -lkernel32 -ladvapi32 -lshell32 \
    -lmingw32 -lgcc -lmoldname -lmingwex -lmsvcrt

(This script is just for demonstration purposes. You will obviously need to update all of the paths appropriately, and set LIBGCJ_A to point at the static libgcj archive from your build.)

To build programs that link to the dll, you will need to use some extra switches:

i686-pc-mingw32-gcj --main=Test -L. -o test.exe -Wl,--enable-runtime-pseudo-reloc Test.java
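My actual test program isn't shown here, so the following is a hypothetical Test.java that the command above could compile, with Test.java as the input file (`--main=Test` names the class whose `main` becomes the entry point). It is only a sketch, chosen to touch two areas that gave me trouble: `Math.round(float)`, whose call to floor is discussed in the update to the August 15 post, and `java.beans.Introspector`, assuming (as in GNU Classpath) that it relies on helper classes from gnu.java.beans, which is what the incomplete gnu-java-beans.lo broke.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

/** Hypothetical smoke test for a program linked against libgcj.dll. */
public class Test {
    private int port = 8080;

    public int getPort() { return port; }

    public void setPort(int port) { this.port = port; }

    /** Builds a one-line report that can be compared between runs. */
    static String report() throws IntrospectionException {
        StringBuffer sb = new StringBuffer();
        // Math.round(float) was one of the calls whose floor reference went
        // unresolved after the mangling change.
        sb.append("round=").append(Math.round(2.5f));
        // Bean introspection, stopping at Object so "class" is not listed.
        BeanInfo info = Introspector.getBeanInfo(Test.class, Object.class);
        PropertyDescriptor[] props = info.getPropertyDescriptors();
        for (int i = 0; i < props.length; i++) {
            sb.append(' ').append(props[i].getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IntrospectionException {
        System.out.println(report());
    }
}
```

On a working setup this should print `round=3 port`, the same as on a regular JVM.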

If you are trying to duplicate this work, note that I am not working from pristine sources. In particular, the changes described in the mailing list thread entitled "Linker name conflicts due to optimization in gjavah" are critical for this to work.

Monday, August 15, 2005

Adventures with Java 5 and GCJ

A few days ago I set out to do something that had seemingly never been tried, yet the process seemed straightforward enough and I anticipated few problems (famous last words, I know). You see, I am working on a client-server systems management application which requires a small agent to be installed on a number of client machines (Windows, Linux, MacOS, etc.). The server side is written in Java 5. I entertained the thought of writing the agent in C++, but my C++ is a little rusty and I haven't had much experience managing a full-blown cross-platform C/C++ initiative... besides, I like working in Java. So, throwing caution to the wind, I set out to see if a natively compiled version of my Java agent would be possible and acceptable (I don't want to contend with a huge JRE install or the memory footprint associated with it). Just to make life more interesting, my development environment is Windows XP.

So, here is the basic process I intended to follow:
  1. Install MinGW/MSYS (I already use Cygwin heavily, but want to be closer to the Win32 API and avoid a dependency on cygwin1.dll).
  2. Use RetroWeaver to convert my Java 5 class files to Java 1.4-compatible equivalents.
  3. Use the RetroWeaver reference verifier to find references to classes and methods that are new in Java 5.
  4. Create a compatibility layer to call through to new features of Java 5. (From the outset I had been trying to avoid most of the new APIs, so this step wasn't so bad; most of what I found was as simple as uses of Arrays.toString(...) in test classes.)
  5. Use GCJ to compile my classes and dependencies natively.
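Step 4 can be sketched as a small helper class. This is a hypothetical example (the `Compat` name and its methods are mine, not part of RetroWeaver): it calls Java 5's `Arrays.toString(Object[])` through reflection when the running class library has it, and otherwise falls back to an equivalent written only with Java 1.4 APIs (no generics, varargs syntax or `StringBuilder`), so the same bytecode should work against a 1.4-level library such as libgcj's.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/**
 * Hypothetical compatibility helper: uses Java 5's Arrays.toString(Object[])
 * when available, else a Java 1.4-compatible fallback with the same output.
 */
public final class Compat {
    // Resolved once; null when the running class library lacks the method.
    private static final Method ARRAYS_TO_STRING = lookup();

    private Compat() {}

    private static Method lookup() {
        try {
            return Arrays.class.getMethod("toString", new Class[] { Object[].class });
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static String arrayToString(Object[] a) {
        if (ARRAYS_TO_STRING != null) {
            try {
                // Wrapping the array keeps it a single argument to invoke().
                return (String) ARRAYS_TO_STRING.invoke(null, new Object[] { a });
            } catch (Exception e) {
                // Any reflective failure falls through to the portable version.
            }
        }
        return fallbackToString(a);
    }

    /** Same format as Arrays.toString: "[a, b, c]", or "null" for null. */
    static String fallbackToString(Object[] a) {
        if (a == null) {
            return "null";
        }
        StringBuffer sb = new StringBuffer("[");
        for (int i = 0; i < a.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.valueOf(a[i]));
        }
        return sb.append(']').toString();
    }
}
```

Call sites then use `Compat.arrayToString(values)` instead of `Arrays.toString(values)`; since the Java 5 method is only named in a string, the reference verifier should no longer flag it.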
Well, I hit my first snag with RetroWeaver. While I was able to convert the class files, the reference verifier was buggy and difficult to invoke from Ant. Poking around on the SourceForge site, I found a number of patches that fix the problems I was having. Unfortunately, the author hasn't released an updated version of RetroWeaver in over six months, and many of these patches conflict with each other or don't properly note which version of the source code the diff was generated against. Also, for some reason that was probably my own brain damage, I was having a particularly bad time getting patch to actually do its thing. So, if anyone is looking for a version of RetroWeaver with what I consider to be the right set of patches applied, you can get it here. This version seems to have a "mostly working" reference verifier, and the Ant task has been updated to make it easy to invoke, like so. (This assumes that the directory ${common.dir}/retroweaver contains all of the jars from the RetroWeaver distribution, that retroverf.dir contains a copy of all class files, that retrort.file points to a JRE 1.4 rt.jar file, and that std.classpath and depend.classpath are valid path constructs containing all the dependencies your classes need.)

<taskdef name="retroweaver" classname="com.rc.retroweaver.ant.RetroWeaverTask">
    <classpath>
        <fileset dir="${common.dir}/retroweaver">
            <include name="*.jar"/>
        </fileset>
    </classpath>
</taskdef>

<retroweaver srcDir="${retroverf.dir}">
    <classpath>
        <pathelement location="${retrort.file}"/>
        <path refid="std.classpath"/>
        <path refid="depend.classpath"/>
        <pathelement location="${retroverf.dir}"/>
    </classpath>
</retroweaver>

Having spent far too long getting RetroWeaver to do everything I wanted, I naively thought that GCJ would just work and life would be good. I couldn't have been more mistaken! When I tried to compile my main jar file, I got an obscure error from the assembler complaining about duplicate symbols. I was on my way down the rabbit hole now! I recognized the duplicate symbol as the mangled form of a method on one of my classes. I looked it up and saw that the method in question implemented a method of a generic interface. Looking at the bytecode revealed that the class had two methods with the same name and parameter types but different return types. I vaguely recalled some discussion about covariant return types and bridge methods in Java 5, and the problem started to become a little less murky. The mangled name in the assembler's error did not have the return type encoded in it, and that was apparently causing the name collision.
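The covariant-return situation is easy to reproduce. In this hypothetical example (`Source`, `LineSource` and `BridgeDemo` are illustrative names, not my agent's classes), `LineSource` implements `Source<String>`. Because the interface method erases to `Object next()`, javac adds a synthetic bridge method with that signature next to `String next()`, and reflection shows both.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical generic interface, like the one the agent class implemented. */
interface Source<T> {
    T next();
}

/** Implements the interface with a more specific (covariant) return type. */
class LineSource implements Source<String> {
    public String next() {
        return "line";
    }
}

public class BridgeDemo {
    /** Describes every zero-argument next() method declared by LineSource. */
    static List<String> nextMethods() {
        List<String> found = new ArrayList<String>();
        for (Method m : LineSource.class.getDeclaredMethods()) {
            if (m.getName().equals("next") && m.getParameterTypes().length == 0) {
                found.add(m.getReturnType().getName() + (m.isBridge() ? " (bridge)" : ""));
            }
        }
        // getDeclaredMethods() returns methods in no particular order.
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        for (String s : nextMethods()) {
            System.out.println(s);
        }
    }
}
```

Running it prints `java.lang.Object (bridge)` and `java.lang.String`. Both methods are named `next` and take no arguments, so a mangling scheme that encodes only the name and parameter types gives them the same symbol, which is exactly the duplicate the assembler reported.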

Some googling and sifting through the GCC bug database revealed that this is officially listed as bug #9861 and has been open since 2003 (apparently the original bug was noticed while using one of the old prototype generics compilers). There was some comment that a new ABI was expected to fix this problem, but a look at recent GCC snapshots revealed no fix yet. Well, it started to look like my original idea had hit a dead end.

For some reason, though, I didn't stop there. I downloaded the May 15, 2005 GCC 4.1 snapshot and started hacking. Now, the last time I actually tried to build GCC was back in 1998 on a Slackware Linux box. I don't remember much except searing pain and an eventual partial success. To be fair, the GCC build has gotten a lot easier since then... at least if you are using Linux. If you are trying to build GCC on Windows using MinGW, just stop. I eventually settled on the procedure outlined by Ranjit Mathew on his website: first build a cross-compiler on Linux, then use it to build a native compiler for Windows/MinGW. He includes some scripts that work really well for this purpose. Of course, there is a reason why you can't find prebuilt binaries for any GCC >= 4.0 on Windows/MinGW: two compilation problems keep the suite from being built on MinGW. The patches for both problems are included in the main patch below. I'll get them to the GCC team eventually, after making sure that the fixes aren't already in CVS HEAD.

So I set my old Linux box to building the GCC compilers and went camping for the weekend. I picked up the results today and started looking at how to actually fix the original problem. It seemed to me that the goal should be to modify the mangling routines so that they include the mangled return type in the name passed to the assembler. For GCJ, this is actually really easy to comprehend and implement... just a one-line change. The problem, however, is that with this change the C++ and Java ABIs no longer match up. This may not seem like a problem at first, until you consider that all of the native parts of libjava are written in C++ using CNI, which requires the mangled names of methods to match up on the C++ and Java sides of the house. It took me quite a bit longer to paw through the C++ compiler and figure out what to do. It seems that including the return type in the mangled function name is already done for function templates (I think... I didn't trace this down all the way), so the fix involved just adding another condition under which to include the return type: if the function is a method of a class and the class is a Java class (descended from java::lang::Object), I do the same thing as is done for that special template case. There are some macros in G++ that make this a one-line change as well... it's just a much scarier change for the uninitiated, because the G++ compiler is really complicated!
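To make the change concrete, here is a simplified model of the Itanium C++ ABI mangling that G++ and GCJ use for Java methods: `_Z`, then `N`, each qualified-name component as `<length><identifier>`, `E`, and then the parameter type codes. It is a hypothetical sketch, not GCJ's code: it handles only `int`, `float`, `double`, `void` and non-array reference types, and it skips the ABI's substitution compression (`S_`, `S0_`, ...), so longer names will not match real compiler output. The optional return type is placed ahead of the parameters, which is where the ABI puts it for function templates; whether the patched compiler's output matches this byte for byte is an assumption. `com.example.LineSource` is a made-up class whose `String next()` is accompanied by an `Object next()` bridge.

```java
/**
 * Hypothetical sketch of Itanium C++ ABI mangling for a Java method:
 * _Z N <length><name>... E <parameter codes>. No substitution compression.
 */
public final class MangleSketch {
    private MangleSketch() {}

    /** Encodes each dot-separated component as <length><identifier>. */
    static String sourceNames(String dotted) {
        StringBuilder sb = new StringBuilder();
        for (String part : dotted.split("\\.")) {
            sb.append(part.length()).append(part);
        }
        return sb.toString();
    }

    /** ABI code for a parameter or return type. */
    static String typeCode(Class<?> t) {
        if (t == int.class) return "i";
        if (t == float.class) return "f";
        if (t == double.class) return "d";
        if (t == void.class) return "v";
        if (t.isPrimitive() || t.isArray()) {
            throw new IllegalArgumentException("not handled in this sketch: " + t);
        }
        // CNI refers to Java objects through C++ pointers: P plus the class
        // name, wrapped in N...E when the class lives in a package.
        String name = t.getName();
        return name.indexOf('.') < 0 ? "P" + sourceNames(name) : "PN" + sourceNames(name) + "E";
    }

    /**
     * Mangles className.method(params). A non-null returnType is encoded
     * before the parameters, the position the ABI uses for function templates.
     */
    static String mangle(String className, String method, Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("_ZN");
        sb.append(sourceNames(className)).append(method.length()).append(method).append('E');
        if (returnType != null) {
            sb.append(typeCode(returnType));
        }
        if (params.length == 0) {
            sb.append('v'); // an empty parameter list is written as "v"
        }
        for (Class<?> p : params) {
            sb.append(typeCode(p));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Math.floor(double) in the current scheme.
        System.out.println(mangle("java.lang.Math", "floor", null, double.class));
        // Without a return type, String next() and its Object next() bridge
        // both map to this one symbol...
        System.out.println(mangle("com.example.LineSource", "next", null));
        // ...while with the return type encoded they differ.
        System.out.println(mangle("com.example.LineSource", "next", String.class));
        System.out.println(mangle("com.example.LineSource", "next", Object.class));
    }
}
```

The first line printed is `_ZN4java4lang4Math5floorEd`, the same string that shows up in builtins.c in the update below; the last two lines differ only in their `PN4java4lang6StringE` and `PN4java4lang6ObjectE` components.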

In conclusion, I think it is a Good Thing to have a working path from Java 5 source code to native binaries, and I hope that this article can help make that a reality (at least in the interim, since "official" support for Java 5 in GCJ seems to be a way off). I also think it is really important to have an up-to-date GCJ compiler for Windows, and I am going to pursue making actual builds available. For now, though, the patch will have to be enough for anyone interested in duplicating my work. Here it is. I haven't finished the clean build of the compiler and test suite yet, but I have visually inspected and verified the results of this patch and it seems to be OK. YMMV. Since this is a breaking ABI change, I wouldn't anticipate seeing it anytime soon in the official GCJ releases.

UPDATE 8/17/05:
It turns out that one additional thing complicated the process of changing the mangling scheme. It was causing unsatisfied link errors when calling, directly or indirectly, a number of static methods on the Math class. Since this was clearly a problem on the Java side of things, I breathed a sigh of relief (at not having to delve back into the C++ compiler). I ran objdump on Math.o and looked at the disassembly of the round(float) method, because it calls Math.floor(double), which was one of the unresolved symbols. I was surprised to find that even though the Java code for round(float) calls floor, the call was nowhere to be found in the disassembly. At this point I realized that the compiler must have some mechanism for inlining certain operations. So I grepped for "floor" in the gcc/java directory and found the following:
builtins.c: double_ftype_double, "_ZN4java4lang4Math5floorEd");
There was my unresolved symbol, without the return type mangled into the name. There are 13 of these in builtins.c. Updating them to the new mangled form, with the return type encoded, makes everything work correctly.
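Going the other way, a tiny decoder shows what a hard-coded string like the one above means, and that no return type is encoded in it. This is a hypothetical, deliberately minimal sketch (the class and method names are mine) that understands only the `_ZN...E` nested-name form and the primitive codes `d`, `f`, `i` and `v`; for real symbols, the c++filt tool from binutils is the proper choice.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, minimal decoder for symbols of the form
 * _ZN <length><name>... E <parameter codes>, such as the builtins.c strings.
 */
public final class DemangleSketch {
    private DemangleSketch() {}

    static String demangle(String sym) {
        if (!sym.startsWith("_ZN")) {
            throw new IllegalArgumentException("not a nested name: " + sym);
        }
        List<String> names = new ArrayList<String>();
        int i = 3;
        // Read <length><identifier> pairs until the closing 'E'.
        while (sym.charAt(i) != 'E') {
            int start = i;
            while (Character.isDigit(sym.charAt(i))) {
                i++;
            }
            int len = Integer.parseInt(sym.substring(start, i));
            names.add(sym.substring(i, i + len));
            i += len;
        }
        i++; // skip the 'E'
        // Everything after 'E' is treated as the parameter list.
        List<String> params = new ArrayList<String>();
        for (; i < sym.length(); i++) {
            switch (sym.charAt(i)) {
                case 'd': params.add("double"); break;
                case 'f': params.add("float"); break;
                case 'i': params.add("int"); break;
                case 'v': break; // "v" alone means no parameters
                default: throw new IllegalArgumentException("unsupported code in " + sym);
            }
        }
        return String.join("::", names) + "(" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(demangle("_ZN4java4lang4Math5floorEd"));
    }
}
```

This prints `java::lang::Math::floor(double)`: the name and the single `double` parameter, but nothing about what floor returns. Because the sketch treats every code after `E` as a parameter, a symbol with an encoded return type in front of the parameters would show that type as an extra leading parameter.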

File List: