I'm on my way to work, but I'll try to answer your questions.
When I started the groundwork for the compiler more than two years ago, I looked at all the available compilers out there and tried to learn from them. One of the things I quickly learned is that they all operate on a RISC-style instruction set, typically in a three-address format. This format makes optimizations a whole lot easier than any other format. So the primary reason for the IR was to enable optimizations in a target-platform-neutral way. Another reason was to support more than one source language besides CIL. My original goal was to be able to reuse most of the compiler for things like scripting languages, regular expression compilation, etc. Having a common way to do this dramatically reduces the amount of testing needed.
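To make the three-address idea concrete, here is a minimal sketch of turning stack-based code into three-address instructions. It is only an illustration and not our actual code: the `ThreeAddr` type, the `load`/`add`/`mul` opcodes and the `t0`, `t1` temporary names are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeAddr:
    """One illustrative IR instruction: result = op(left, right)."""
    op: str
    result: str
    left: str
    right: str


def stack_to_three_address(stack_code):
    """Translate (opcode, operand) stack instructions into three-address form."""
    stack = []      # names of the values currently on the evaluation stack
    out = []        # resulting three-address instructions
    next_temp = 0   # counter for fresh temporaries (hypothetical naming scheme)
    for opcode, operand in stack_code:
        if opcode == "load":
            stack.append(operand)
        elif opcode in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            result = f"t{next_temp}"
            next_temp += 1
            out.append(ThreeAddr(opcode, result, left, right))
            stack.append(result)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return out


# Stack code for the expression (a + b) * c
program = [("load", "a"), ("load", "b"), ("add", None),
           ("load", "c"), ("mul", None)]
ir = stack_to_three_address(program)
for ins in ir:
    print(f"{ins.result} = {ins.op} {ins.left}, {ins.right}")
```

Once every instruction names its operands and its result explicitly like this, an optimization can inspect or rewrite a single instruction without having to simulate the evaluation stack around it.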
I have specific plans in my head to optimize and reduce memory usage further, but my personal time is limited right now and I have stepped back a bit from this project.
2. Cross assembly inlining
We have a mosacl command-line compiler (I sure hope it's still there), which accepts more than one input assembly and generates one (optionally bootable) output assembly containing the native code for all input assemblies.
That was a plan. Unfortunately, I quickly realised that this is not really possible, as the optimization stages need the instructions in some sort of tree or list model to be able to reorder, replace or modify them. Of course you could achieve the same by storing modification information, but we kept the design simple here on purpose: get things running first, optimize later once we know where the bottlenecks are. There are plans to reduce the number of lists, though.
4. Compacting the way labels are stored
Ben, I'm not sure I understand the question correctly. Could you please elaborate on this? Our labels in the IR are the original addresses in the CIL assembly, so these are integers. Finally, the addresses are attached to basic blocks and not to the instructions themselves, to reduce weight some more. Currently the addresses also hold an offset, but this is almost unused and easy to remove.
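A rough sketch of what "labels on basic blocks" means, in case it helps the discussion. This is not our real data structure: the `BasicBlock` type, the `build_blocks` helper and the sample offsets are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    label: int                                        # CIL offset of the block's first instruction
    instructions: list = field(default_factory=list)  # instructions carry no label of their own


def build_blocks(instructions, leaders):
    """Split (offset, text) pairs into blocks, starting a new block at each leader offset."""
    blocks = []
    for offset, text in instructions:
        if offset in leaders or not blocks:
            blocks.append(BasicBlock(label=offset))
        blocks[-1].instructions.append(text)  # only the text is kept, not the offset
    return blocks


# Illustrative method body: (offset, instruction)
code = [(0, "ldarg.0"), (1, "brfalse.s 5"),
        (3, "ldc.i4.1"), (4, "ret"),
        (5, "ldc.i4.0"), (6, "ret")]
blocks = build_blocks(code, leaders={0, 3, 5})
for block in blocks:
    print(block.label, block.instructions)
```

Six instructions end up sharing three integer labels, which is where the weight saving comes from.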
Yes, MT is possible. Methods can be compiled in parallel. However, I don't think compiling more than 4 methods at the same time is reasonable.
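The general shape I have in mind is a bounded worker pool with one method per task. The sketch below only shows that shape: `compile_method` is a hypothetical stand-in that does no real compilation, and the method names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_METHODS = 4  # upper bound on methods compiled at the same time


def compile_method(name):
    """Hypothetical stand-in for compiling a single method."""
    return f"compiled:{name}"


def compile_all(method_names, workers=MAX_PARALLEL_METHODS):
    """Compile each method as an independent task on a bounded pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input names
        return list(pool.map(compile_method, method_names))


results = compile_all(["Main", "Add", "ToString", "Equals", "GetHashCode"])
print(results)
```

This only works if each method's compilation keeps its state to itself, so that tasks do not have to coordinate with each other.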