How does compiler lay out code in memory
How does compiler lay out code in memory,memory,assembly,compiler-construction,operating-system,virtual-memory,Memory,Assembly,Compiler Construction,Operating System,Virtual Memory,Ok I have a bit of a noob student question. So I'm familiar with the fact that stacks contain subroutine calls, and heaps contain variable length data structures, and global static variables are assigned to permanant memory locations. But how does it all work on a less theoretical level? Does the compiler just assume it's got an entire memory region to itself from address 0 to address infinity? And then just start assigning stuff? And where does it layout the instructions, stack, and heap?
Ok I have a bit of a noob student question.
So I'm familiar with the fact that stacks contain subroutine calls, and heaps contain variable length data structures, and global static variables are assigned to permanant memory locations.
But how does it all work on a less theoretical level?
Does the compiler just assume it's got an entire memory region to itself from address 0 to address infinity? And then just start assigning stuff?
And where does it layout the instructions, stack, and heap? At the top of the memory region, end of memory region?
And how does this then work with virtual memory? The virtual memory is transparent to the program?
Sorry for a bajilion questions but I'm taking programming language structures and it keeps referring to these regions and I want to understand them on a more practical level.
THANKS much in advance!
A comprehensive explanation is probably beyond the scope of this forum. Entire texts are devoted to the subject. However, at a simplistic level you can look at it this way.
The compiler does not lay out the code in memory. It does assume it has the entire memory region to itself. The compiler generates object files where the symbols in the object files typically begin at offset 0.
The linker is responsible for pulling the object files together, linking symbols to their new offset location within the linked object and generating the executable file format.
The linker doesn't lay out code in memory either. It packages code and data into sections typically labeled
.text for the executable code instructions and
.data for things like global variables and string constants. (and there are other sections as well for different purposes) The linker may provide a hint to the operating system loader where to relocate symbols but the loader doesn't have to oblige.
It is the operating system loader that parses the executable file and decides where code and data are layed out in memory. The location of which depends entirely on the operating system. Typically the stack is located in a higher memory region than the program instructions and data and grows downward.
Each program is compiled/linked with the assumption it has the entire address space to itself. This is where virtual memory comes in. It is completely transparent to the program and managed entirely by the operating system.
Virtual memory typically ranges from address 0 and up to the max address supported by the platform (not infinity). This virtual address space is partitioned off by the operating system into kernel addressable space and user addressable space. Say on a hypothetical 32-bit OS, the addresses above
0x80000000 are reserved for the operating system and the addresses below are for use by the program. If the program tries to access memory above this partition it will be aborted.
The operating system may decide the stack starts at the highest addressable user memory and grows down with the program code located at a much lower address.
The location of the heap is typically managed by the run-time library against which you've built your program. It could live beginning with the next available address after your program code and data.
This is a wide open question with lots of topics.
Assuming the typical compiler -> assembler -> linker toolchain. The compiler doesnt know a whole lot, it simply encodes stack relative stuff, doesnt care how much or where the stack is, that is the purpose/beauty of a stack, dont care. The compiler generates assembler the assembler is assembled into an object, then the linker takes info linker script of some flavor or command line arguments that tell it the details of the memory space, when you
gcc hello.c -o hello
your installation of binutils has a default linker script which is tailored to your target (windows, mac, linux, whatever you are running on). And that script contains the info about where the program space starts, and then from there it knows where to start the heap (after the text, data and bss). The stack pointer is likely set either by that linker script and/or the os manages it some other way. And that defines your stack.
For an operating system with an mmu, which is what your windows and linux and mac and bsd laptop or desktop computers have, then yes each program is compiled assuming it has its own address space starting at 0x0000 that doesnt mean that the program is linked to start running at 0x0000, it depends on the operating system as to what that operating systems rules are, some start at 0x8000 for example.
For a desktop like application where it is somewhat a single linear address space from your programs perspective you will likely have .text first then either .data or .bss and then after all of that the heap will be aligned at some point after that. The stack however it is set is typically up high and works down but that can be processor and operating system specific. that stack is typically within the programs view of the world the top of its memory.
virtual memory is invisible to all of this the application normally doesnt know or care about virtual memory. if and when the application fetches an instruction or does a data tranfer it goes through hardware which is configured by the operating system and that converts between virtual and physical. If the mmu indicates a fault, meaning that space has not been mapped to a physical address, that can sometimes be intentional and then another use of the term "Virtual memory" applies. This second definition the operating system can then for example take some other chunk of memory, yours or someone elses, move that to hard disk for example, mark that other chunk as not being there, and then mark your chunk as having some ram then let you execute not knowing you were interrupted with some ram that you didnt know you had to take from someone else. Your application by design doesnt want to know any of this, it just wants to run, the operating system takes care of managing physical memory and the mmu that gives you a virtual (zero based) address space...
If you were to do a little bit of bare metal programming, without mmu stuff at first then later with, microcontroller, qemu, raspberry pi, beaglebone, etc you can get your hands dirty both with the compiler, linker script and configuring an mmu. I would use an arm or mips for this not x86, just to make your life easier, the overall big picture all translates directly across targets.
If you're compiling a bootloader, which has to start from scratch, you can assume you've got the entire memory for yourself.
On the other hand, if you're compiling an application, you can assume you've got the entire memory for yourself.
The minor difference is that in the first case, you have all physical memory for yourself. As a bootloader, there's nothing else in RAM yet. In the second case, there's an OS in memory, but it will (normally) set up virtual memory for you so that it appears you have the entire address space for yourself. Usuaully you still have to ask the OS for actual memory, though.
The latter does mean that the OS imposes some rules. E.g. the OS very much would like to know where the first instruction of your program is. A simple rule might be that your program always starts at address 0, so the C compiler could put
int main() there. The OS typically would like to know where the stack is, but this is already a more flexible rule. As far as "the heap" is concerned, the OS really couldn't care.
#4Can I ask a quick follow up? Is the underlying machinery for calling functions, creating their local variables and adding them to the stack included in the program when it's compiled? Or is the machinery a part of the operating system?
#5This is great. Exactly was looking for. Any recommendations on good sites/books on this stuff? I don't even know what to google. I don't mean parse trees but more on how all this stuff practically works at the low level when converting the parse tree to code.
#6A good place to start would be a google search for "windows memory layout". This link comes up on the first page and looks to be pretty informative: Anatomy of a Program in Memory. From here you could probably find some other concepts to google.
#7Thanks! One thing that was throwing me off was that you have to declare an address to store a variable in assembly so I was thinking the compiler/assembly generator just assumed that it had it's own full address space.
#8Thanks! So it's safe to assume you have the entire memory for yourself I see.