Using a "Big Table" for virtual memory?

Topics: General
Jul 1, 2010 at 2:44 PM
I was wondering how a managed OS could create virtual memory, because there are no processes, just objects. I guess that something like a "big table" DB is used (like Voldemort or Cassandra), so records can be made for objects with an arbitrary number of values. Maybe something else is used entirely? Could anyone enlighten me on how the references to objects that are swapped out are managed? Normally a reference is just a memory address, but once the object is on disk that address is invalid, and when the object is moved into memory again it will probably be at a different memory address. Maybe it uses a system similar to the garbage collector, which tries to optimize the free space by moving objects next to each other. Maybe I'm thinking about this the wrong way?
Jul 2, 2010 at 10:02 AM

Who said there are no processes? You still need "processes", either to refer to a sealed object space or as a scheduling mechanism (e.g. it controls the threads). You don't want all your objects owned by a single GC, since a GC sweep could then stop the machine for many seconds, so each process/domain has its own GC (probably compacting). You can run a single shared GC nursery, but this weakens locality when multiple tasks run at the same time (and hence cache performance) and adds contention, though it may be better since you get a huge nursery: think of creating 100M of strings at stack speed without any memory operation.

I have given some thought to VM as well.

You can use VM with a single address space, which is what MOSA was intending (though SOOOS does not).

I was thinking of some sort of virtual VM, but without the automatic TLB lookups: more in terms of managing the address space and swapping blocks to disk. For code and DLLs this is trivial. The ELF format on Linux has a lot of support for PIC, which you can use with gcc. This generates all code as position independent, so you can load, move and reload the code as you please.

Data is a different matter. In 32-bit you could use segments; alas, this is no longer possible in 64-bit. So the data needs to be patched for address offsets as it's moved, or you use an indirect table for static data (all this is available in gcc). It's not difficult, but it includes a performance penalty for loads (which is dwarfed by a disk access).
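A minimal sketch of the indirect-table idea, applied to the original question about references to swapped-out objects. It assumes a toy model in Python where one dict stands in for RAM and another for disk; `HandleTable` and every method name on it are hypothetical and not taken from MOSA or SOOOS. A reference is a handle (a table index) rather than an address, so the object can come back at a different address without any reference being patched.

```python
class HandleTable:
    """Toy indirection table: references are handles, not addresses."""

    def __init__(self):
        self._memory = {}           # address -> object (stands in for RAM)
        self._disk = {}             # handle -> object (stands in for swap storage)
        self._location = {}         # handle -> address, or None when swapped out
        self._next_handle = 0
        self._next_address = 0x1000

    def _place(self, obj):
        # Put the object at the next free "address" and return that address.
        address = self._next_address
        self._next_address += 0x10
        self._memory[address] = obj
        return address

    def allocate(self, obj):
        handle = self._next_handle
        self._next_handle += 1
        self._location[handle] = self._place(obj)
        return handle

    def address_of(self, handle):
        return self._location[handle]

    def swap_out(self, handle):
        address = self._location[handle]
        if address is None:
            return                  # already on disk
        self._disk[handle] = self._memory.pop(address)
        self._location[handle] = None

    def deref(self, handle):
        # Every load pays for one extra lookup; this is the load penalty.
        if self._location[handle] is None:
            # Swap in: the object lands at a fresh address, the handle is unchanged.
            self._location[handle] = self._place(self._disk.pop(handle))
        return self._memory[self._location[handle]]


table = HandleTable()
h = table.allocate({"name": "example"})
old_address = table.address_of(h)
table.swap_out(h)
value = table.deref(h)              # transparently reloads the object
new_address = table.address_of(h)
```

The handle `h` stays valid across the swap even though `old_address` and `new_address` differ. The cost is the extra table lookup on every dereference, which is the load penalty mentioned above.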

Hence you can manage the VM as you wish. You need some mechanism to know whether code and data are active, since you can't rely on page access. Probably the best way is to base it on CPU usage and the scheduler. You can then unload and load slabs of memory. If you can solve this problem, I feel it will be better than paging, since you can swap large 2M blocks to disk using sequential reads and writes. The fact that you can't run a program if part of its space is swapped out is an issue, but IMHO paging is completely broken by GCs anyway: since they visit every page, you don't know what was actually used. It has reached the extent that Linux and Windows no longer consider paging effective (for this and some other reasons) and just use it as a memory allocation mechanism.

It should be worth a nice paper, whether it works or not.

Rather than a big table, you would probably just maintain an in-memory list and an on-disk list of memory segments.
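As a rough illustration of the two-list idea, here is a sketch in Python. It assumes a hypothetical `SegmentManager` that tracks 2 MB segments per task and evicts whatever belongs to tasks that are not on the scheduler's run list. The names, the dict-based "disk" and the eviction rule are all assumptions made for the example, not a description of any existing kernel.

```python
SEGMENT_SIZE = 2 * 1024 * 1024  # 2 MB blocks, the size suggested above


class SegmentManager:
    def __init__(self):
        self.in_memory = {}     # segment id -> (owner task, bytearray)
        self.on_disk = {}       # segment id -> (owner task, bytes); stands in for disk
        self._next_id = 0

    def allocate(self, owner):
        seg_id = self._next_id
        self._next_id += 1
        self.in_memory[seg_id] = (owner, bytearray(SEGMENT_SIZE))
        return seg_id

    def swap_out_idle(self, run_list):
        """Move every segment whose owner is not on the run list to disk."""
        idle = [s for s, (owner, _) in self.in_memory.items()
                if owner not in run_list]
        for seg_id in idle:
            owner, data = self.in_memory.pop(seg_id)
            self.on_disk[seg_id] = (owner, bytes(data))  # one sequential write
        return idle

    def swap_in(self, owner):
        """Reload all segments of a task before it is scheduled again."""
        mine = [s for s, (o, _) in self.on_disk.items() if o == owner]
        for seg_id in mine:
            o, data = self.on_disk.pop(seg_id)
            self.in_memory[seg_id] = (o, bytearray(data))
        return mine


manager = SegmentManager()
a = manager.allocate("editor")
b = manager.allocate("compiler")
manager.in_memory[a][1][0] = 42                   # write a byte into the editor's segment
swapped = manager.swap_out_idle(run_list={"compiler"})
restored = manager.swap_in("editor")
```

Because the decision is driven by the run list rather than by page access bits, no hardware support is needed, but a task cannot run until all of its segments are back in memory.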

Ben

 

Jul 6, 2010 at 7:15 PM
I'm still puzzling over this one and I don't fully understand your reaction. I think you mean that you let the compiler figure out what to swap and what not, don't you? Knowing what is not active, so it can be swapped out, is key for virtual memory. A number of language items could maintain metadata to determine whether it's probable that they will be used soon and thus need reloading. Things like creation time, access time and a usage count would incur a cost in memory and upkeep, though.

- AppDomain (process): Swapping a whole domain is probably too coarse, and could be impossible if a single AppDomain needs more memory than is physically on the machine.
- Functions: Could be swapped to disk, but code is normally too small to be of any benefit.
- Classes and structs: Maintaining metadata for a class next to its vtable should be manageable, as should maintaining a table where all the objects of a class are tracked so they can be swapped out to disk.
- Individual objects on the heap: I think that maintaining metadata for every object would probably be too heavy. It could be done if the programmer opts in certain objects, but relying on the programmer for the stability of the system would be wrong.
- Individual objects on the stack: Stack objects will probably be used sooner rather than later; why would they be on the stack otherwise?

I'm thinking along the lines of upgrading the GC so that it can also move less often used objects to disk.
Coordinator
Jul 7, 2010 at 8:03 AM
Edited Jul 7, 2010 at 8:04 AM
dancingdoorman wrote:
I was wondering how a managed OS could create virtual memory, because there are no processes, just objects. I guess that something like a "big table" DB is used (like Voldemort or Cassandra), so records can be made for objects with an arbitrary number of values. Maybe something else is used entirely? Could anyone enlighten me on how the references to objects that are swapped out are managed? Normally a reference is just a memory address, but once the object is on disk that address is invalid, and when the object is moved into memory again it will probably be at a different memory address. Maybe it uses a system similar to the garbage collector, which tries to optimize the free space by moving objects next to each other. Maybe I'm thinking about this the wrong way?

A managed OS can organize threads (execution contexts) any way it chooses. Being "managed" doesn't exclude it from having processes at all. In fact, what's really interesting about a fully managed system is that you don't necessarily need separate address spaces for processes (or sets of AppDomains), since you can verify that the safe, managed code doesn't modify other applications or their data structures.

As far as paging/swapping goes, a managed OS can do the same thing a traditional OS does: swap pages out to disk and swap them back in when needed. This is done at the page level and not per object instance.

Also, most discussions of managed OSes assume that there are multiple heaps (at least one per AppDomain or process), each managed by the garbage collection system independently of all the rest. In other words, there is no global garbage collection, as that would stop the system completely during a GC. A managed OS could also use a different type of GC for each heap, depending on the characteristics of the application.

I hope this clears up any confusion; but if not, we discuss managed OSes all the time via IRC in #mosa on freenode.net. Feel free to visit.

 

Jul 7, 2010 at 11:52 AM
Edited Jul 7, 2010 at 4:46 PM

>>I'm still puzzling with this one and I don't fully understand your reaction. I think that you mean that you let the compiler figure out what to swap and what not, is it not?

The compiler has no idea. It depends on the scheduler granting CPU: if you have no CPU, you can be swapped out.

>>Knowing what is not active so it can be swapped out is key for virtual memory. A number of language items could maintain meta data to determine if it's probable that they will be used soon and thus reloaded. Things like creation and access time and a usage number would incur a cost for memory and maintaining though.

Yes, this would be too expensive. You could get the compiler to add a write/read barrier to base the information on, but basically if you are on the run list you will need some memory. Note it is very similar to hardware paged memory except for the granularity.

> - AppDomain (process): Swapping a whole domain is probably too coarse and could be impossible if more memory is needed for a single AppDomain than is physically on the machine.

You can swap out all the memory used by the domain or, better yet, swap it in large blocks (e.g. 2 MB) as desired. The more contention, the more big blocks you move out; obviously, tasks with some blocks already out should be favoured. Also note that a GC grabs large blocks from the system too.

> - Functions: Could be swapped to disk, but code is normally too small to be of any benefit.
> - Classes and structs: Maintaining metadata for a class next to a vtable should be manageable, and maintaining a table where all the objects of a class are tracked so they can be swapped out to disk.
> - Individual objects on the heap: I think that maintaining metadata for every object would probably be too heavy. It could be done if the programmer opts in certain objects, but relying on the programmer for the stability of the system would be wrong.
> - Individual objects on the stack: Stack objects will probably be used sooner rather than later; why would they be on the stack otherwise?
> I'm thinking along the lines of upgrading the GC to also let it move the objects to disk that are less often used.

It is almost impossible to do this.

To get an idea of how often objects and memory are used, you need a read and write barrier (compiler inserted), which is prohibitively expensive.

What I'm looking at in SOOOS is breaking applications down into smaller reusable services. In the past we have used libraries, but libraries have these issues:

  • They are difficult to reuse. A service with an interface can be used on any machine, local or remote, and avoids the lib versioning hell you get on Linux.
  • Code that fails in a lib forces the caller to die, since its state is unknown.

 

By breaking apps into smaller services we get:

  • No perf penalty on managed systems, due to the low task switch costs.
  • Increased reliability, since a service can fail and not affect its caller, e.g. a browser add-on can't stop the browser (Flash, anyone?). Services can be restarted; the result is that a failure will cause part of an app not to work, and when restarted it starts working again.
  • Services can start multiple instances to process requests and be load balanced across cores or even machines, e.g. if a part of an app is hot (say encryption), the system can fire up more services, which get processed on more cores without any multi-threaded code.
  • Services with no CPU which are blocking can be swapped out of memory. IMHO this may be superior to the current paging, which is broken.
  • Increased code reuse and more standardization.
  • Improved security, as "apps" become really small and have few rights.
  • Easy to isolate problems to a service, since services can all log.
  • Services can be local or in a cloud, e.g. a campus could have a huge compiler bank for all services, which programs (makefiles/build scripts) can use. Devs can run services locally on the machine or in the cloud and can move them trivially. All this can be done now; it will just become trivial to code.
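The "fire up more instances of a hot service" point can be sketched with a shared request queue. This is a Python toy using `asyncio`, where `service_instance` and `run_service` are hypothetical names and string reversal stands in for encryption. Note that `asyncio` runs on a single thread, so this only shows the queue-based dispatch shape (callers post messages and never block on a particular instance); it does not show real multi-core scaling.

```python
import asyncio


async def service_instance(name, requests, results):
    """One instance of a hypothetical 'encrypt' service draining a shared queue."""
    while True:
        message = await requests.get()
        if message is None:                     # shutdown signal for this instance
            return
        await asyncio.sleep(0)                  # yield, standing in for real work
        results.append((name, message[::-1]))   # toy "encryption": reverse the text


async def run_service(messages, instances):
    requests = asyncio.Queue()
    results = []
    # Starting more instances is the only change needed to add capacity.
    workers = [
        asyncio.create_task(service_instance(f"encrypt-{i}", requests, results))
        for i in range(instances)
    ]
    for message in messages:
        requests.put_nowait(message)            # async send: caller does not wait
    for _ in workers:
        requests.put_nowait(None)               # one shutdown signal per instance
    await asyncio.gather(*workers)
    return results


results = asyncio.run(
    run_service(["alpha", "beta", "gamma", "delta"], instances=2)
)
```

The caller code contains no threads or locks; which instance handles which message is decided by the queue, which is the property that would let a system add or move instances without touching the application.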

Note there are still DLLs, just a lot fewer, and anyway it's miles off and I'm still working on IPC. Obviously it uses async IPC.
