Linux Virtual Memory - Memory Types

2019-11-16 00:48

Paul Aiton

Each entry in a process's virtual memory map can have several different properties that change how both humans and the kernel think about their page allocation and management. Understanding the different properties, and the contexts where they are defined and used, is necessary to understand when, why, and to/from where the kernel makes decisions about pages.

OBJECTIVE

Describe the contexts for defining terms.

Define the following terms:

Static or Dynamic

Anonymous or File-Backed

Variable or Constant

Shared or Private

Initialized or Uninitialized

Summarize the concepts.

Prerequisites: Page, Page Frame, Page Boundary.

CONTEXT

To understand the properties of virtual memory sections, we have to define three contexts: compile-time, load-time, and run-time. (They can be broken down further, and will be in future lessons.)

Compile-time encompasses everything involved in turning program code into an executable file. If you're installing a package from your distribution's package manager, all of compile-time has already happened. If you are building a package from source, compile-time is everything that happens between downloading the source and copying the built files to their final destination on the filesystem (usually with make install).

Load-time is everything that happens between one process asking the kernel to execute a different program and the moment the kernel first hands control of the CPU to the new process so that its first machine instruction can run.

Run-time is everything after that; the time when the process is running.

STATIC OR DYNAMIC

Static memory is any memory whose size and mapping are determined before execution. Dynamic memory is allocated in sizes that are not predetermined, at virtual addresses that can vary.

If memory is mapped at compile-time, it's static. For example, a program might have several character strings used for error handling, such as "Hey, that file doesn't exist", or a floating-point constant such as PI=3.14159. Those pieces of data are compiled into the program's binary executable file, and every reference to a specific piece of data exists as a raw pointer in the program code.

If memory is mapped at run-time, it's dynamic. For example, a process might request memory from the kernel to hold user input, or to hold the HTML you're viewing. The call to the kernel returns a pointer to the virtual memory address where the new page or pages are mapped. Because the program does not know at compile-time how big the block of memory will be, or where in the virtual address space it will be mapped, all references to it are stored in pointer variables.

Load-time is a little more complicated. In general, all of the mappings created here are for static memory of predetermined sizes, but they are composed dynamically into the virtual memory map. The dynamic linker (linking will get its own dedicated lesson in the future) looks at the executable file's metadata, resolves the dependencies listed there, and maps their code and data into the process's virtual address space. These dependencies are called shared objects: library files that contain their own code and data. If you come from the Windows world, DLLs (dynamic-link libraries) are practically the same thing. The gray area is that the virtual memory locations are dynamic, while the layout and size are static within the library itself and stay the same in the process's map. Keep these in mind; we'll come back to them in just a minute.

ANONYMOUS OR FILE-BACKED

File-backed memory is probably the obvious one, since it is primarily what we have been talking about so far: any virtual memory mapping that corresponds to data in a file on the filesystem. When a program is loaded, the pages mapped from the executable are file-backed. When the dynamic linker maps shared objects into the virtual address space, those pages are file-backed. If a process maps a file at run-time (by calling the kernel), those pages are file-backed too.

Anonymous memory is made up of pages that are not backed by a file (they have no file name). When your web browser allocated memory to hold the HTML you're viewing, it requested memory from the kernel, got a pointer back, and now uses that pointer to access the memory. If the program loses track of that pointer, it loses the ability to refer to that memory, which is where memory leaks come from.

VARIABLE OR CONSTANT

Variable data is anything that can change during run-time. Constant data cannot. "Read-write" and "read-only" are practically synonymous with them, but "constant vs. variable" is usually used in programming languages (like C), while "read-only vs. read-write" is more common for OS permissions.

At this point I need to explicitly mention that an executable file or shared library contains multiple discrete sections. I've alluded to this before as "program code" and "data", but every section has its own type and associated behavior. For instance, as a security measure, executable program code loaded from a file is mapped read-only in Linux by default; new values cannot be written to it. Self-modifying code is possible on some architectures, but on Linux and most other modern operating systems a process cannot do it unless it explicitly changes the permissions on those pages. Data, which is not executable, can be either variable or constant, which we'll look at in a couple of lessons.

SHARED OR PRIVATE

Private memory exists in only one process, and shared memory exists (or CAN exist) in multiple processes. By itself this distinction is not very meaningful; it has to be analyzed together with the other properties of a mapping.

INITIALIZED OR UNINITIALIZED

Initialized data comes pre-populated with values; uninitialized data is set to all zeroes. Generally, if a program ever reads uninitialized data, it's a bug. Initialized memory is always static, and dynamic memory is always uninitialized, but there is one case where static memory is uninitialized: bss. Bss stands for "Block Started by Symbol". Yeah, it's a pretty bizarre name, kept for historical reasons; it comes from a 1950s assembler made by an aircraft manufacturing corporation. Instead of storing many thousands of zero bytes in an executable file, the file just has a section that says "Allocate this many zero bytes." It's considered static memory because the size and mapping are determined at compile-time.

PUTTING IT ALL TOGETHER

The five attributes combine such that if you know the profile, you know what type of memory you're dealing with, and vice versa. From the above it may look like there are \(2^5\), or 32, different combinations, but many of those combinations wouldn't make any sense, such as constant uninitialized data. If you can't change it and it's set to zero, there's no point in taking up memory.

We are almost done with the abstract theory part of the tutorial, and can soon start using examples from the command line to show what I'm talking about. In the next post I'll briefly go over the different types of dynamic memory, and then we can inspect actual executable files to see the virtual memory mapping in practice.

If you haven't wrapped your head around this yet, don't panic. This was the most complicated piece so far, and this subject is REALLY complicated. Theory alone cannot bring about comprehension, and most professional administrators and developers don't get into this low level of detail. Keep going, work on the upcoming examples, and come back to these earlier lessons when you've had a chance to absorb everything.

Next in series     : Dynamic Memory
Previous in series : Paging pt.2
Series Index       : Linux Virtual Memory Management