COMP2004 Programming Practice
2002 Summer School

Assignment 3 - Paging Simulation


Summary

DUE DATE: 5:00pm Monday 11 February 2002
WEIGHT: 20%
SUBMISSION NAME: comp2004s.a3
MARKING TYPE: automated testing and hand marking
TASK: Write a paging simulation


Important Information

NOTE THAT THE ASSIGNMENT 3 SPEC HAS BEEN REVISED (ON 31/01/2002) SINCE ITS ORIGINAL POSTING.

Sample input and output files are now available. These are not comprehensive tests, instead they are mainly provided to illustrate the input and output formats used by the main program, and to show some simple cases of how each algorithm works. As always, you are expected to thoroughly test your own code by creating your own test cases.


Background

As a program executes, instructions and data must be fetched from memory. However, not all of the required instructions and data has to be stored in memory all of the time. Instead, most of it can be stored on disk, and then small sections can be loaded into memory as they are needed. This allows the computer system to have a relatively small amount of memory compared to the amount of disk storage, which is useful as memory is much more expensive than disk storage.

The most basic element which can be requested from disk is a single byte. Each byte on disk is uniquely identified by an address. The first byte on disk has address 0, the second byte has address 1, the third has address 2, and so on.

However, accessing information stored on disk is much slower than accessing information stored in memory. Thus reading a single byte from disk is extremely wasteful, and in practice it is never done. Instead, any time a byte is required from disk, a block of bytes called a page is read from disk. This page contains the byte required as well as many others. All pages have the same size, and the number of bytes contained in each page is called the page size. Originally pages were 512 bytes, but these days they can be up to 4kb or more.

Pages are assigned page numbers starting from 0 in the same way as bytes are assigned addresses starting from 0. You can find out the page number for a given address by dividing the address by the page size. That is,

page number = address / page size
For example, if the page size is 512, then the address 1400 is contained in page 2, the address 20 is in page 0, and the address 10240 is in page 20.

When a page is read from disk into memory, it is stored into a frame. A frames is simply a place in memory which can store the contents of a page from disk. A collection of frames is called a cache. The number of frames in a cache is far less than would be required to store all of the information required for the program being run - recall from above that not all of this information will always be available, but rather it will be loaded into frames as it is needed.

If a byte at a particular address is requested and the page containing that byte is already stored in a frame of the cache, then no page needs to be loaded from disk. This situation is called a cache hit.

On the other hand, if the page for a particular address is not stored in a frame, then the page containing that address must be loaded into a frame before the address can be used. This situation is called a cache miss.

Initially, all of the frames are unused. If there are unused frames, then a page can simply be loaded into one of these unused frames. However, if all of the frames are being used, then a page currently occupying one of the frames must be removed to make space for the new page. Once a frame has been removed, the new page can be loaded into that frame.

An algorithm for deciding which frame should be removed to make space is called a paging algorithm.

A byte at a particular address may either be read (ie. its value is found) or written (ie. its value is changed). In both cases, the page containing the byte must be in a frame before it can be read or written.

However, if the value of a byte in a frame is changed (ie. written), then the information contained in the frame in memory is different from the information contained in the page on disk (since the value of a byte has been changed). A frame is said to be dirty if it has been written to at least once, otherwise it is said to be clean.

When a dirty frame is chosen to be removed to make space for a new page, the contents of that frame must be written back to the page on disk so that the changes made to the frame aren't lost. This operation is called write-back. Frames which are clean contain data which is identical to the corresponding page on disk, and so clean frames do not need to use write-back.

When a cache miss occurs, the frame chosen to be removed may be either clean or dirty. If it is clean, then the cache miss is said to be a cache miss without write-back, but if it is dirty, then the cache miss is said to be a cache miss with write-back.

What makes this system work is a property called locality of reference which is exhibited by most software. This is the high probability that the next byte needed will be close to the previous one. This means that once a page has been loaded into a frame there is a high likelihood that the next byte will be from the same page. Of course, eventually execution will pass from a particular page and a new page will have to be loaded, e.g. due to another byte being required.


Your task


How your code will be used


Paging algorithms