Why not use LLVM for IR to code gen.

Topics: General
Jul 28, 2009 at 1:09 AM

Question: why don't you focus on getting all of CIL to IR working (e.g. all instructions) and use LLVM for the actual IR to code generation? The new LLVM Mono AOT/JIT is much faster than their current one (despite some 6 years of work on it).

Sure, it is not "built here" and it's native, but it will get you up and running fast. Also note you could compile LLVM itself to CIL later and then use Reflector to give you some managed optimization code. It will need work, but it seems a much quicker way to get to the major goal of a running OS. Or evolve the current layers.

The other big benefits are

- Immediate support for ARM, x64 and other platforms.

- ASM output for debugging.

- Highly optimized output (i.e. equivalent to g++).

- Parts will be converted to C# as needed.

- Working code for CIL to LLVM exists in the new Mono JIT.

- More time to get all the CIL-to-IR instructions out rather than worrying about optimizations.

- People can work on the OS, on things like dynamic loading and compile-time bounds checking, which are huge issues.

- It follows the goal of getting a framework out ASAP which can be improved.

- Most small devices have all their apps built on a PC; I have yet to encounter an ARM compiler that runs on ARM.

Regards,

Ben

 

Jul 28, 2009 at 6:05 AM

Hi Ben,

that's a good question. I'll answer it from a personal point of view. Doing this is a hobby. I'm doing it to spend some time learning new things and improving my skills and knowledge. This project is my place to do experiments I can't do anywhere else. Surely we would be faster using one of the existing compilers (heck, even the original Mono one could be used), but it just wasn't an option on my table. I personally place a lot of importance on the efficiency and design of things. I've looked at LLVM for research purposes, but I'm not sure about its design itself.

Another thing is that this project came from the background of SharpOS, Ensemble and lastly it was its goal to unify the three OS attempts under a grander architecture. We wanted to provide a place for all three projects to share code without licensing worries.

To your benefit list:

- ARM support: Let's get one platform to a stable state first. Any volunteers can approach this now, but it's not on my priority list. The compiler pipeline is flexible enough for this.

- ASM Output: We have (or, well, used to have; it's broken) the ASMCodeEmitter. One of our goals for 0.2 is to fix it again.

- Highly optimized output: Is nice, but hard to debug when something breaks. I'm certain one could turn it off for a debug build.

[Some points skipped]

- People can work on the OS things like Dynamic loading and compile time bounds checking which are huge issues:

We already have parts of the runtime. The compiler uses our dynamic loading mechanism in our VM to load assemblies for compilation. We even build the runtime internal data structures at this point. There's some cleanup and work to do there, but it ain't much anymore.

- It follows the goal of getting a framework out ASAP which can be improved

I'm not going to rush this project. I value good tests, working code and a great design much higher than anything else. The more it is rushed, the less likely I am to personally commit time to it. It's a simple fact. This ain't my day job.

- Most small devices have all apps built on a PC, I have yet to encounter an ARM compiler that runs on ARM

Sure, but what's your point? ARM isn't everything; there's still PowerPC and hundreds of other CPU architectures.

Jul 28, 2009 at 7:15 AM

Hi Ben:

The MOSA project is about creating an AOT/JIT compiler to support a 100% managed (and secure) operating system based on the .NET Framework. The JIT compiler itself needs to be managed to achieve this goal. We would miss this goal if we used native, non-managed components like LLVM.

If we just wanted to run managed applications and not concern ourselves with the underlying operating system or compiler, we could just use a super small Linux distribution with Mono pre-installed. That's probably the fastest short-cut.  But where's the fun in that?

It's interesting, though, that LLVM can now target MSIL. It still has some limitations. In fact, I've registered to attend the LLVM Developers' Meeting in Cupertino, CA (at Apple) on October 2, 2009. Anyone want to meet up?

- Phil

Jul 28, 2009 at 7:18 AM

Comments inline.

-> The thing is, some people are interested in an OS rather than the compiler. By using something like LLVM, these people can get running while the compiler is completed. LLVM is no solution in the long term, as it can't run in a managed OS (well, it can, but it's unmanaged and would require a C library).

-> On the last point (ARM): I was just countering the argument that it doesn't run under the managed OS, e.g. all the assemblies need to be compiled beforehand.

Regards,

Ben

Jul 28, 2009 at 7:50 AM

Yes, it can target MSIL, but this is still in development, and god knows what it does with all the C library calls. Going from our IR to LLVM IR should be easy.

BTW, shouldn't the ultimate goal be to create a managed OS?

Consider the current plan:

Build MOSA Compiler -> Improve Compiler -> Refine MOSA Compiler -> Begin working on OS -> Refine OS -> Have an OS and Compiler

All the OS development is stalled, which has led to SharpOS etc. taking an extended break.

Compare:

Build MOSA Compiler -> Improve Compiler -> Refine MOSA Compiler

-> Use LLVM -> Begin working on OS -> Refine OS -> Have an OS and Compiler

This schedule allows more parallelism, as a lot of people are not interested in compilers.

I'm not saying we should just replace the compiler, but use LLVM for the IR -> code stages so OS people can get working.

Regards,

Ben

Jul 28, 2009 at 8:17 AM

Sure, you're right, one could parallelize development this way. I believe we can do almost the same thing with our emulated kernel concept: we allocate some memory, create a window in the host OS to serve as a frame buffer, and use that to get some code running. Additionally, release 0.1 should address most concerns regarding OS development: you can write almost anything as long as it doesn't use objects, arrays and structures. You can work around those parts for some time.

Jul 28, 2009 at 11:15 AM

But how do you write C# IPC, MM/GC and a scheduler this way?

BTW, the MM and GC I'm currently writing use objects ;-P You can replace them at will, or restart them if desired; it's the first OS I know of where you can restart the MM. Also, if you want to use capabilities you need objects.

Regards,

Ben

Read the full discussion online.

To add a post to this discussion, reply to this email (mosa@discussions.codeplex.com)

To start a new discussion for this project, email mosa@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on codePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at codeplex.com

Jul 28, 2009 at 11:33 AM
Edited Jul 28, 2009 at 11:35 AM

Hi Ben,

To answer your points:

- MM/GC: You get memory from the host operating system and treat it as if it were the entire physical memory installed. You can do whatever you want in this block of memory, including writing a pure C# memory manager and GC on top of it.

- Scheduler: You have one (or more) host operating system threads, which are your virtual CPUs. Schedule the C# threads on them. Same thing as above.

- IPC: All IPC is based on shared memory. See my point above.

If you know it works in an emulated kernel, you know it'll work on physical hardware. You've got more to do there, sure, but the higher-level algorithms have been verified. This approach also ensures that you're using a proper HAL, where the higher-level algorithms use defined entry points and are architecture independent. Writing against bare metal always carries the risk of taking on too many architecture dependencies, which slows later ports to other platforms. And this gives you another advantage: if you've made it this far, projects like User-mode Linux (if we look across the table) are not necessary. They've been done already.

Mike aka __grover
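
[Editor's sketch] The memory and IPC points above can be illustrated with a few lines of Python rather than C#. This is only a minimal sketch under stated assumptions, not MOSA's EmulatedKernel: the names EmulatedMemory and BumpAllocator are hypothetical, the "host memory" is simply a bytearray, and a bump allocator stands in for a real memory manager or GC.

```python
class EmulatedMemory:
    """A block of host memory treated as the machine's entire physical memory.

    Hypothetical name; the block comes from the host (here: the Python heap).
    """

    def __init__(self, size: int) -> None:
        self.ram = bytearray(size)

    def write(self, addr: int, data: bytes) -> None:
        # Refuse accesses outside the emulated physical address range.
        if addr < 0 or addr + len(data) > len(self.ram):
            raise MemoryError("write outside emulated physical memory")
        self.ram[addr:addr + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        if addr < 0 or addr + length > len(self.ram):
            raise MemoryError("read outside emulated physical memory")
        return bytes(self.ram[addr:addr + length])


class BumpAllocator:
    """Simplest possible memory manager: hand out increasing addresses.

    Stands in for the managed MM/GC discussed in the thread; it never frees.
    """

    def __init__(self, memory: EmulatedMemory, align: int = 8) -> None:
        self.memory = memory
        self.align = align
        self.next_free = 0

    def alloc(self, size: int) -> int:
        # Round the next free address up to the alignment boundary.
        start = (self.next_free + self.align - 1) // self.align * self.align
        if start + size > len(self.memory.ram):
            raise MemoryError("out of emulated physical memory")
        self.next_free = start + size
        return start


# Usage: shared-memory IPC is just two parties agreeing on an address.
memory = EmulatedMemory(64 * 1024)
mm = BumpAllocator(memory)
mailbox = mm.alloc(16)
memory.write(mailbox, b"ping")      # "process A" sends
message = memory.read(mailbox, 4)   # "process B" receives
```

The allocator only ever talks to EmulatedMemory, so the same allocation logic could in principle sit on top of a different backing store; that separation is the point being made about a HAL.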

Jul 28, 2009 at 12:45 PM

The thread idea is interesting, but will it work? You will be building quite a layer: subdividing a thread, building a virtual timer, executing your own "thread" when it has time, etc.

Memory will work for my machines, but not for code that does virtual memory; getting the paging right is a real pain.

Regards,

Ben

Jul 28, 2009 at 2:04 PM

Ben,

I was talking about higher-level algorithms. Anyway, you need an emulation HAL, which is the point of our EmulatedKernel. This HAL can use whatever the host provides, e.g. System.Windows.Forms for the frame buffer and System.Timers for timer interrupts and scheduling. Nothing special to do there. Of course you have to do some magic, but that's really what's happening anyway, no matter whether the scheduler is called by an interrupt or by a timer callback.

There are some things you can't do this way. Virtual memory and paging are among them. Of course, this depends on the definition of the HAL and how far away from the hardware it places you.

Mike
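
[Editor's sketch] The claim that the scheduler does not care whether it is called by an interrupt or by a timer callback can be shown in a small Python sketch. Everything here is an assumption made for illustration: RoundRobinScheduler, EmulatedTimerHal and worker are hypothetical names, tasks are cooperative generators rather than preempted threads, and the "timer" fires in a plain loop so the result is deterministic (a real emulation HAL would hook a host timer instead).

```python
from collections import deque
from typing import Callable, Deque, Iterator, List


class RoundRobinScheduler:
    """Architecture-independent part: it does not know where ticks come from."""

    def __init__(self) -> None:
        self.ready: Deque[Iterator[None]] = deque()

    def spawn(self, task: Iterator[None]) -> None:
        self.ready.append(task)

    def on_tick(self) -> None:
        """Defined entry point, called by whichever HAL is in use."""
        if not self.ready:
            return
        task = self.ready.popleft()
        try:
            next(task)               # run the task for one time slice
            self.ready.append(task)  # still alive: back of the queue
        except StopIteration:
            pass                     # task finished, drop it


class EmulatedTimerHal:
    """Emulation HAL: stands in for the hardware timer interrupt."""

    def __init__(self, handler: Callable[[], None]) -> None:
        self.handler = handler

    def fire(self, ticks: int) -> None:
        # A real emulated kernel would call the handler from a host timer.
        for _ in range(ticks):
            self.handler()


def worker(name: str, steps: int, log: List[str]) -> Iterator[None]:
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # end of this time slice


log: List[str] = []
scheduler = RoundRobinScheduler()
scheduler.spawn(worker("a", 2, log))
scheduler.spawn(worker("b", 3, log))
EmulatedTimerHal(scheduler.on_tick).fire(10)
```

Swapping EmulatedTimerHal for code that calls on_tick from a real interrupt handler would leave RoundRobinScheduler untouched. As noted in the post, paging and virtual memory do not fit this model.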

Aug 16, 2009 at 5:53 PM

Hi,

I'm also developing a similar compiler to generate LLVM IR from C#. It's going to cover only a very small subset of C#, though.

It's not going to use the compiled IL code; rather, it will go through the whole process of scanning, parsing, and AST semantic analysis.

You can find more info at http://projects.prabir.me/compiler

Aug 16, 2009 at 7:26 PM

Hi prabirshrestha,

Why? You can't really add any value over the free Mono and MS compilers (these can't be made faster or better). Why not just compile the CIL to LLVM IR? If you're interested in developing a language, you would probably still be better off doing CIL to IR first and developing it for the CLR, because of the starting base (you can compare against consistent output), the tools available, and having access to the .NET or Mono runtime.

Regards,

Ben

Aug 16, 2009 at 11:04 PM


Prabirshrestha:

I have to agree with Ben on this one. A better focus would be on CIL to LLVM. A C#-to-LLVM compiler bypasses one of the major benefits of CIL: language interoperability. CIL is the glue that supports a choice of multiple source languages (C#, VB, F#, etc.).

To be honest, C# parsing is already done well by the Mono compiler, which is open source.

Phil

Aug 17, 2009 at 2:14 AM

CIL to LLVM is REALLY easy. Once you have it, you can create new languages (and use existing languages with LLVM, e.g. IronPython, C#, F#, etc.). Best of all, because the output is consistent, you can check your language's output against what the MS and Mono compilers generate, and you have a runtime (Mono or .NET); making a lib of P/Invokes is plain ugly. You can even write a few unit tests to compare the CIL output to that produced by csc.exe. (You can't really do this with LLVM IR, mainly because there is no other compiler.)

Note that this is CIL as in Common Intermediate Language, not CIL as in C Intermediate Language (a SourceForge project that does a lot with LLVM).

Regards ,

Ben

Aug 17, 2009 at 9:00 AM

Hi Ben and Phil,

At first I too had thought of using Mono to parse the compiled IL code, but for a university project that isn't a good idea. So I ended up starting from scratch, creating the scanner and parser with the generator Coco/R (I also created a VS plugin: http://cocor.codeplex.com).

The other alternative was to generate C/C++ code instead of LLVM IR, but this was already done by http://crossnet.codeplex.com

Rather than just creating the C# compiler, learning about the scanner and parser also lets me understand more and gives me the power to be creative. I really wanted to make the compiler for this one: http://weblogs.asp.net/bsimser/archive/2007/01/21/domain-driven-design-for-c-3-0.aspx and it's quite easy to achieve now that I know how to use a scanner and parser :)

Anyway, there are advantages and disadvantages.

And if it weren't for my university project, I would most probably have chosen Mono to parse the CIL.