Lecture #9 | NSWI200


Paging

This is the first of two self-study modules on virtual memory and paging. The goal of this module is to introduce the hardware details of how paging works on current processors. It may look like there is quite a lot of reading for this module. However, much of what is presented was already outlined to some degree in the Principles of Computers and Computer Systems lectures, so feel free to skip the content you are already familiar with.

At the end of this module, you should be able to:

describe how paging provides each process with its own virtual address space,
describe how having its own virtual address space provides memory protection to processes,
recognize whether a virtual or a physical address is used at a particular software or hardware location,
identify individual components of a virtual and a physical address given a specific address,
compute a translation of a virtual address to a physical address given page table content,
distinguish the roles of software and hardware in individual address translation steps,
describe the performance benefits of address translation caching,
describe the step-by-step handling of an address translation fault,
all of that for common hierarchical page tables such as those of the Intel processors.

Address Spaces

Until now, we have been content with the assertion that each executing process has its own (virtual) address space, that is, each process is free to use whatever addresses it (or the operating system) sees fit for its code and data. Each pointer that a process uses contains an address that is local to the particular address space. This gives processes not only freedom but also security: since any address a process uses is always interpreted as local to its own address space, there is simply no way a process can attempt to access other address spaces (not without asking the operating system, or using side channels that are out of scope for now, anyway).

The question is, how can we provide a distinct address space to each process when our computer really only has one memory with one set of addresses? The answer is by using address translation hardware.

Address Translation

In a processor without address translation, whenever a program uses a memory address (for example to read the next instruction to execute, or to read the data for such instruction), that same address appears on the address bus of the processor and is seen by the memory chips that provide the data.

In a processor with address translation, the addresses that programs (typically including the operating system) use are called virtual addresses, and when used, they are first translated by the address translation hardware into physical addresses. Only the translated addresses are sent out on the address bus and seen by the memory chips.

To provide a distinct address space for each process, the operating system configures the address translation hardware so that the virtual addresses of each address space get translated to different physical addresses. In other words, each memory block in the virtual address space is actually some block in the physical memory that the given virtual address range translates to, and since the translations are (mostly) distinct, then so are the address spaces.

Two details that we set aside for now:

What if we have more memory blocks in the virtual address spaces than we have physical memory? In that case, the address translation hardware can make some virtual address ranges point “nowhere” and let the operating system decide what to do when a process accesses those ranges. The operating system can then, for example, juggle fewer physical memory blocks among a larger number of virtual memory blocks to provide the illusion of having more memory, in a way similar to how fewer processor cores are juggled between more threads to provide the illusion of concurrent execution.
What if we actually want to share some memory between address spaces? That is also possible, and in fact often done, for example with the kernel memory. Rather than having its own address space, the operating system kernel typically occupies a certain range in the address spaces of all processes, which makes passing data between processes and the kernel easier. When needed, selected virtual addresses from different address spaces can point to the same physical addresses, effectively leading to sharing.
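
The two details above can be pictured with a toy model. The following is only a sketch under simplifying assumptions: each address space is a Python dictionary from virtual block numbers to physical block numbers, and all block numbers, as well as the names `translate_block` and `TranslationFault`, are made up for illustration.

```python
# Toy model: each address space maps virtual block numbers to physical
# block numbers. All numbers here are made up for illustration.
KERNEL_BLOCK = 7  # hypothetical physical block holding kernel memory

address_space_a = {0: 3, 1: 5, 15: KERNEL_BLOCK}
address_space_b = {0: 4, 1: None, 15: KERNEL_BLOCK}  # block 1 points "nowhere"


class TranslationFault(Exception):
    """Raised when a virtual block has no physical block behind it."""


def translate_block(address_space, virtual_block):
    """Return the physical block behind a virtual block, or raise a fault."""
    physical_block = address_space.get(virtual_block)
    if physical_block is None:
        # Real hardware would hand control to the operating system here.
        raise TranslationFault(virtual_block)
    return physical_block
```

Virtual block 0 leads to different physical blocks in the two address spaces, virtual block 15 leads to the same shared physical block in both, and accessing virtual block 1 in the second address space raises the fault that the operating system would get to handle.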

Memory Pages

What should we imagine the address translation to be like? It must be a simple enough operation so that the hardware can perform it efficiently (remember, caches aside, the processor accesses memory pretty much all the time). It must also be flexible enough to permit remapping of reasonably small blocks of memory.

This is where the pages come in. When a virtual address is to be translated to a physical address, it is split into two parts (just imagine splitting a number by drawing a line between certain digits, only done in binary). The upper part of the address is said to refer to a particular page, and is translated. The lower part of the address is said to be an offset within a page, and is copied as is. The translation itself is a simple array lookup: the page number is used as an index into a page table that contains the value to replace the page number with.

By choosing where to make the split between the page number and the page offset in the virtual address, we determine the size of the pages and the size of the page table needed to translate all pages in an address space. A very common split is at 12 bits, leading to a page size of 2^12 = 4096 bytes.
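
The split and the lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular processor implements it: the names `split_address` and `translate` and the table content are made up, and a dictionary stands in for the array so that the example does not need to list every page.

```python
PAGE_OFFSET_BITS = 12                  # split position, gives 4096-byte pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS      # 4096
OFFSET_MASK = PAGE_SIZE - 1            # 0xFFF


def split_address(virtual_address):
    """Return (page number, page offset) for a virtual address."""
    return virtual_address >> PAGE_OFFSET_BITS, virtual_address & OFFSET_MASK


def translate(page_table, virtual_address):
    """Replace the page number using the table, copy the offset as is."""
    page_number, offset = split_address(virtual_address)
    frame_number = page_table[page_number]
    return (frame_number << PAGE_OFFSET_BITS) | offset


# Hypothetical table content: virtual page 0xCAFEB maps to physical page 0x00042.
page_table = {0xCAFEB: 0x00042}

print(hex(translate(page_table, 0xCAFEBABE)))   # prints 0x42abe
```

With 32-bit virtual addresses, the 12-bit split leaves 20 bits for the page number, so a table covering the whole address space would need 2^20 entries.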

This particular address translation mechanism is still too simple to be used in practice, but is good enough to get the basic idea. Please see Arpaci-Dusseau Section 18 Paging Introduction for a more detailed description, or skip to the next content if you are already familiar with this.

[Q] If we have 4 KiB pages, what is the page number and the offset for virtual address 0x12345678?

Hint ...

As explained above, we just need to draw the line between the page number and the offset parts of the virtual address. The address is already expressed in hexadecimal notation, each digit taking 4 bits. Count from the rightmost (least significant) digit.

Hexadecimal notation is what makes the task easy for humans; with addresses expressed in decimal notation, the conversion would be rather tedious.

On that note, observe that the hardware has a similarly easy task. Imagine each bit using one wire. Splitting the address is then equivalent to simply splitting the wires into two groups, one transporting the page number and another transporting the offset.

Translation Caching

Page tables are stored in memory alongside other operating system and process data. Since an address translation is required for each memory access, we would need to access the page tables for each memory access too, effectively multiplying the number of memory accesses performed. Since memory is already quite often the performance bottleneck, multiplying accesses is not really an option. This is why address translations are cached.
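
To get a feeling for the savings, here is a minimal sketch that counts how many times the page table has to be read with and without a translation cache. It rests on several simplifying assumptions: the class `CountingTranslator` and the mappings are made up, the page table is flat, and the cache is unbounded, whereas a real translation cache only has a limited number of entries.

```python
PAGE_OFFSET_BITS = 12


class CountingTranslator:
    """Counts page table reads with and without a translation cache."""

    def __init__(self, page_table, use_cache):
        self.page_table = page_table
        self.use_cache = use_cache
        self.cache = {}               # page number -> frame number
        self.page_table_reads = 0

    def translate(self, virtual_address):
        page_number = virtual_address >> PAGE_OFFSET_BITS
        offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
        if self.use_cache and page_number in self.cache:
            frame_number = self.cache[page_number]
        else:
            self.page_table_reads += 1    # one extra memory access
            frame_number = self.page_table[page_number]
            self.cache[page_number] = frame_number
        return (frame_number << PAGE_OFFSET_BITS) | offset


page_table = {0x10: 0x80, 0x11: 0x81}     # made-up mappings
# 2000 sequential 4-byte accesses, touching virtual pages 0x10 and 0x11.
addresses = [0x10000 + 4 * i for i in range(2000)]

uncached = CountingTranslator(page_table, use_cache=False)
cached = CountingTranslator(page_table, use_cache=True)
for address in addresses:
    uncached.translate(address)
    cached.translate(address)

print(uncached.page_table_reads, cached.page_table_reads)   # prints 2000 2
```

Because programs tend to access the same few pages repeatedly, nearly all translations are served from the cache in this example, and the page table is read only once per page.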

Please see Arpaci-Dusseau Section 19 Paging Faster Translation for details. Again, if you are already familiar with this content, skip it.

For a practical example, see the MIPS Processor Manual Section 4.1 Translation Lookaside Buffer and Section 4.2 Address Spaces. To check if you understand the content, just see if you could explain what is on Figure 4-2.

[Q] What happens to the address translation cache content when a context switch to a different process takes place?

Hint ...

Address translation is obviously specific to a particular process address space …

[Q] If MIPS were to translate virtual address 0x12345678 to physical address 0xABCD5678, what would be the values of the fields VPN, PFN and Offset from Figure 4-2 mentioned above?

Hint ...

This is really just filling numbers into boxes, just make sure to count bits correctly. Remember, in hexadecimal notation, each digit takes 4 bits.

Hierarchical Tables

Flat page tables are not practical because they need to be present in memory in their entirety. We therefore introduce hierarchical tables, where the virtual address is split not into a page number and a page offset, but into multiple levels of paging entry numbers and an offset. Each paging level is then resolved in a manner similar to the flat page tables.
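
A two-level lookup can be sketched as follows. This is a simplified illustration only: it assumes a 32-bit address split into 10 + 10 + 12 bits, nested dictionaries stand in for the tables in memory, and the names `split_address`, `translate` and `PageFault` as well as the table content are made up.

```python
DIRECTORY_BITS, TABLE_BITS, OFFSET_BITS = 10, 10, 12


def split_address(virtual_address):
    """Return (directory index, table index, offset) for a 32-bit address."""
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    table_index = (virtual_address >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    directory_index = virtual_address >> (OFFSET_BITS + TABLE_BITS)
    return directory_index, table_index, offset


class PageFault(Exception):
    """Raised when some level of the hierarchy has no entry for the address."""


def translate(page_directory, virtual_address):
    directory_index, table_index, offset = split_address(virtual_address)
    page_table = page_directory.get(directory_index)
    if page_table is None:
        # The whole second-level table is absent and takes no memory.
        raise PageFault(hex(virtual_address))
    frame_number = page_table.get(table_index)
    if frame_number is None:
        raise PageFault(hex(virtual_address))
    return (frame_number << OFFSET_BITS) | offset


# Hypothetical content: a single second-level table with a single entry.
page_directory = {0x32B: {0x3EB: 0x00042}}

print(hex(translate(page_directory, 0xCAFEBABE)))   # prints 0x42abe
```

The point of the hierarchy shows in the first `PageFault` branch: a directory entry that is not present stands for 1024 pages at once, and no second-level table needs to exist for them.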

Please see Arpaci-Dusseau Section 20 Paging Smaller Tables for details. Ignore 20.2 (Paging plus Segmentation) and 20.4 (Inverted Page Tables). Again, if you are already familiar with this content, skip it.

For a practical example, see the Intel Processor Manual Volume 3A Section 4.2 Hierarchical Paging Structures and Section 4.3 32 Bit Paging. To check if you understand the content, just see if you could explain what is on Figure 4-2.

[Q] The intermediate page table entries contain addresses of the next page table levels. Are these virtual addresses or physical addresses?

Hint ...

To use a virtual address, the hardware needs to traverse the entire page table hierarchy. We do not really want a chicken-and-egg problem here, do we?

[Q] If Intel were to translate virtual (linear) address 0x12345678 to physical address 0xABCD5678, what would be the values of the fields Directory, Table and Offset from Figure 4-2 mentioned above?

Hint ...

The same exercise as above, except now we are cutting the address in three pieces rather than two. And, unfortunately, this time one split does not take place at the boundary between two hexadecimal digits, so a bit of binary arithmetic is needed to obtain a hexadecimal result.

Eager for More?

More self-test questions? We have a few of these too!

[Q] If we observe virtual address 0x12345678 translated to physical address 0x9ABCD678, what is the upper bound on the page size involved?

Hint ...

The virtual address and the physical address must have the same offset, both in size and in value. Just be careful to consider bits, not bytes or nibbles.

[Q] Imagine you were inspecting the page tables of several processes and noticed that some of the entries are exactly the same across the processes (the same numerical values in all fields). What does that tell you about the address spaces?

Hint ...

The most important part of each entry is the physical address (of the next table level or of the page) …