COMP2004 Programming Practice
2002 Summer School
Assignment 3 - Paging Simulation
Summary
DUE DATE:         5:00pm Monday 11 February 2002
WEIGHT:           20%
SUBMISSION NAME:  comp2004s.a3
MARKING TYPE:     automated testing and hand marking
TASK:             Write a paging simulation
Important Information
NOTE THAT THE ASSIGNMENT 3 SPEC HAS BEEN REVISED (ON 31/01/2002) SINCE
ITS ORIGINAL POSTING.
Sample input and output files are now available.
These are not comprehensive tests; instead, they are mainly provided to
illustrate the input and output formats used by the main program, and to
show some simple cases of how each algorithm works. As always, you are
expected to thoroughly test your own code by creating your own test cases.
Background
As a program executes, instructions and data must be fetched from memory.
However, not all of the required instructions and data have to be stored in
memory all of the time. Instead, most of it can be stored on disk, and
then small sections can be loaded into memory as they are needed. This
allows the computer system to have a relatively small amount of memory
compared to the amount of disk storage, which is useful as memory is much
more expensive than disk storage.
The most basic element which can be requested from disk is a single byte. Each byte
on disk is uniquely identified by an address. The first byte on
disk has address 0, the second byte has address 1, the third has address 2,
and so on.
However, accessing information stored on disk is much slower than accessing
information stored in memory. Thus reading a single byte from disk is
extremely wasteful, and in practice it is never done. Instead, any time a
byte is required from disk, a block of bytes called a page is
read from disk. This page contains the byte required as well as many
others. All pages have the same size, and the number of bytes contained in
each page is called the page size. Originally pages were 512
bytes, but these days they can be 4 KB or more.
Pages are assigned page numbers starting from 0 in the same way
as bytes are assigned addresses starting from 0. You can find out the page
number for a given address by dividing the address by the page size. That
is,
page number = address / page size
For example, if the page size is 512, then the address 1400 is contained in
page 2, the address 20 is in page 0, and the address 10240 is in page 20.
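This relationship can be checked with a few lines of C++ (pageNumber is an illustrative helper, not part of the required interface):

```cpp
#include <cassert>

typedef unsigned long addr_t;   // assumed to match the spirit of Cache::addr_t

// Integer division discards the remainder, so this maps an address
// to the number of the page that contains it.
addr_t pageNumber(addr_t address, addr_t pageSize) {
    return address / pageSize;
}
```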
When a page is read from disk into memory, it is stored into a
frame. A frame is simply a place in memory which can store the
contents of a page from disk. A collection of frames is called a
cache. The number of frames in a cache is far less than would
be required to store all of the information required for the program being
run - recall from above that not all of this information will always be
available, but rather it will be loaded into frames as it is needed.
If a byte at a particular address is requested and the page containing that
byte is already stored in a frame of the cache, then no page needs to be loaded from
disk. This situation is called a cache hit.
On the other hand, if the page for a particular address is not stored in a
frame, then the page containing that address must be loaded into a frame
before the address can be used. This situation is called a cache
miss.
Initially, all of the frames are unused. If there are unused frames, then
a page can simply be loaded into one of these unused frames. However, if
all of the frames are being used, then a page currently occupying one of
the frames must be removed to make space for the new page. Once that page
has been removed, the new page can be loaded into the freed frame.
An algorithm for deciding which frame should be removed to make space is
called a paging algorithm.
A byte at a particular address may either be read (ie. its value
is found) or written (ie. its value is changed). In both cases,
the page containing the byte must be in a frame before it can be read or
written.
However, if the value of a byte in a frame is changed (ie. written), then
the information contained in the frame in memory is different from the
information contained in the page on disk (since the value of a byte has
been changed). A frame is said to be dirty if it has been
written to at least once, otherwise it is said to be clean.
When a dirty frame is chosen to be removed to make space for a new page,
the contents of that frame must be written back to the page on disk so that
the changes made to the frame aren't lost. This operation is called
write-back. Frames which are clean contain data which is
identical to the corresponding page on disk, and so clean frames do not
need to use write-back.
When a cache miss occurs, the frame chosen to be removed may be either
clean or dirty. If it is clean, then the cache miss is said to be a
cache miss without write-back, but if it is dirty, then the
cache miss is said to be a cache miss with write-back.
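The three possible outcomes of an access can be summarised as a small decision rule. This is only a sketch with illustrative names (Outcome, Frame, classify); the real classification happens inside your Cache class:

```cpp
#include <cassert>

// The three outcomes the simulation must count.
enum Outcome { HIT, MISS, MISS_WB };

// Per-frame state the simulation needs: which page is held, and whether
// the frame has been written to since it was loaded.
struct Frame {
    long page;   // page number currently held, or -1 if the frame is unused
    bool dirty;  // true once the frame has been written to
};

// Decision rule for a single access: a hit needs no disk traffic; a miss
// needs a write-back only when the chosen victim frame is dirty.
Outcome classify(bool pageInCache, bool victimDirty) {
    if (pageInCache) return HIT;
    return victimDirty ? MISS_WB : MISS;
}
```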
What makes this system work is a property called
locality of reference which is exhibited by most software.
This is the high probability that the next byte needed
will be close to the previous one. This means that once a page has
been loaded into a frame there is a high likelihood that the
next byte will be from the same page.
Of course, eventually execution will pass from a particular
page and a new page will have to be loaded, e.g. due to another byte
being required.
Your task
-
Your task is to write a Cache class to support a trace driven
simulation of the above described paging situation. That is, you are
not to write a program, but rather you are to write a set of .h and .cc
files containing the definitions and code for a set of classes.
-
Your code must work with this main.cc file which will
be used during testing. This main.cc file will take care of input and
output, and will call various methods in your Cache class.
-
You are required to write two classes which will be used by the main
program. The first is called Cache
and must be defined in the Cache.h file. The second is called
CacheFactory and must be defined in the CacheFactory.h
file.
-
In order for these classes to work with the main.cc file, the classes must
support a particular interface, that is, certain methods must be available
and must have particular parameters and return types. These interfaces are
shown in these sample Cache.h and CacheFactory.h files. You should use these files
as a basis for your Cache and CacheFactory classes. That is, you are free
to add additional methods, variables, etc to these classes, but you should
not alter the methods shown in these interfaces. If you do, then your
Cache and/or CacheFactory classes may not work with the main.cc file.
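As a rough illustration only, a base class along the following lines would fit the method names used in this spec. The supplied sample Cache.h is authoritative; the typedef widths, the virtual methods, and the protected counter members here are all assumptions:

```cpp
#ifndef CACHE_H
#define CACHE_H

#include <cassert>

// Sketch of what Cache.h might contain, inferred from the methods named in
// this spec. Use the supplied sample file, not this guess.
class Cache {
public:
    typedef unsigned long addr_t;   // assumed width
    typedef unsigned long page_t;   // assumed width

    Cache() : hits(0), misses(0), missesWB(0) {}
    virtual ~Cache() {}

    // Overridden by algorithm-specific subclasses (e.g. FIFO, LRU).
    virtual void read(addr_t address) {}
    virtual void write(addr_t address) {}
    virtual void flush() {}

    long getHits() const { return hits; }
    long getMisses() const { return misses; }
    long getMissesWB() const { return missesWB; }

protected:
    long hits, misses, missesWB;
};

#endif
```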
-
You cannot use a different main.cc file or main() function in your
assignment - if you supply one, it will not be used. Your code must work
with the supplied main.cc file.
-
Your code will be given a sequence of disk accesses (ie. an address and an
indication of whether the access is a read or a write operation). Your
code must simulate the paging operations described above, and keep track of
the number of cache hits, cache misses without write-back and cache misses
with write-back which occur as a result.
-
Your code does not actually have to read/write pages to/from disk, instead,
your code should store which page number is stored in each frame, and
whether or not that frame is dirty. This is all the information which you
need to be able to find the number of cache hits, cache misses without
write-back and cache misses with write-back as the simulation progresses.
-
Your program should perform efficiently with respect to small page sizes
and many frames. You should choose your data structures carefully to
ensure that they support efficient lookups (while not being inefficient for
additions/deletions). For example, there are several areas in this
assignment spec where things are described as being lists. You don't have
to store these things using a list data structure; instead, you should
store them in whatever way is most appropriate and/or efficient.
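For example, membership testing ("is this page cached?") can be made fast by keeping an index from page number to the frame's position in the recency list. This is a sketch with illustrative names; std::map is used for portability, though a hash table would be faster still:

```cpp
#include <cassert>
#include <list>
#include <map>

// The recency list holds page numbers; the index maps a page number to
// its position in that list, so lookup avoids a linear scan of the frames.
typedef std::list<long>::iterator FrameIt;

bool inCache(const std::map<long, FrameIt>& index, long page) {
    return index.find(page) != index.end();
}
```

The index must be kept in step with the list: entries are added when a page is loaded and erased when its frame is evicted.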
-
Your program should have a good class structure so that it can deal cleanly
with the different paging algorithms required.
How your code will be used
-
The main program reads in a trace file from standard input. It reads
various information which it then passes to various methods in your
classes. It then outputs various information from your classes to standard
output. Each of these will now be described.
Input - first line
-
The first line read indicates the type of paging algorithm which is to be
used, as well as some parameters of the simulation. It is passed
as a string to the CacheFactory::createCache() static function, which
returns a pointer to a newly created Cache object which implements the
specified type of simulation with the specified parameters.
-
The first line will have one of the following three formats:
- F <page size> <num frames>
- L <page size> <num frames>
- P <page size> <num frames> <lookahead> <check frames>
-
The first character indicates the type of paging algorithm: F for FIFO, L for
LRU, and P for Pipelined LRU. These algorithms are described in more
detail below in the "Paging algorithms" section.
-
Following that is an integer of type Cache::addr_t which indicates the
size of each page in bytes (greater than or equal to 1).
-
Following that is an integer of type Cache::page_t which indicates the
number of frames to be used by the cache (greater than or equal to 1).
-
When the first letter is P, there are two additional integers of type
Cache::page_t. These indicate the number of accesses to look ahead,
and the number of frames to check, respectively (for more detail, refer
to the "Pipelined LRU" paging algorithm section below). Both will be
greater than or equal to 1.
-
You should write code in the CacheFactory::createCache()
method which
parses this string in order to determine exactly which type of
simulation is to be performed, and the parameters of this simulation.
The method should then create an appropriate Cache object which
implements this type of simulation, and return a pointer to this Cache
object.
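One possible way to parse the first line inside CacheFactory::createCache() is with a string stream. This is a sketch only; ParsedSpec and parseSpec are hypothetical helpers, not part of the required interface:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the parameters read from the first line.
struct ParsedSpec {
    char algorithm;              // 'F', 'L' or 'P'
    unsigned long pageSize;
    unsigned long numFrames;
    unsigned long lookahead;     // only meaningful when algorithm == 'P'
    unsigned long checkFrames;   // only meaningful when algorithm == 'P'
};

ParsedSpec parseSpec(const std::string& line) {
    ParsedSpec s = {0, 0, 0, 0, 0};
    std::istringstream in(line);
    in >> s.algorithm >> s.pageSize >> s.numFrames;
    if (s.algorithm == 'P')
        in >> s.lookahead >> s.checkFrames;   // extra Pipelined LRU parameters
    return s;
}
```

createCache() would then use the parsed values to construct and return the appropriate Cache subclass.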
-
The reason for using the CacheFactory class is that how your Cache
objects are constructed is entirely up to you.
-
You may, for example,
have other classes which inherit from the Cache class, and your
CacheFactory::createCache() method may choose to create one of
these classes instead of an object of the Cache class.
-
Or you may want to pass particular parameters to the constructor of
your Cache class, or call methods on the Cache object after it has
been constructed.
-
All of these can be implemented in the
CacheFactory::createCache() method.
-
You don't need to be able to create or use objects of the CacheFactory
class. In this assignment, the only reason for using the CacheFactory
class is that it is a useful place to put the createCache() method. (In
a real project, though, you would probably want to be able to have
actual "factory objects" which are capable of constructing other objects.)
Input - following lines
-
All following lines contain
an address to be accessed, and an indication of whether this
access is a read or a write operation.
-
Each of these following lines has the same format:
-
The first character indicates whether the access is a read or
write operation with 'r' or 'w'.
-
This is followed by an integer of type Cache::addr_t which indicates
the address being accessed (greater than or equal to 0).
-
When a read access is encountered, the read() method of
your Cache class is called and it is passed the address which is
being read.
-
When a write access is encountered, the write() method of
your Cache class is called and it is passed the address which is
being written.
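Decoding one trace line is straightforward; the supplied main.cc already does this for you, so the sketch below (with the hypothetical helper parseAccess) is only to clarify the format:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Decode one trace line such as "r 2156" or "w 1492".
// Returns true for a read, false for a write; 'address' receives the address.
bool parseAccess(const std::string& line, unsigned long& address) {
    std::istringstream in(line);
    char op;
    in >> op >> address;
    return op == 'r';
}
```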
Output
-
After each input line, and again at the end of the input, the main
program prints out three values which it obtains from your Cache
object.
-
These values are the number of cache hits, the number of cache
misses without write-back, and the number of cache misses with
write-back. They are obtained by calling the getHits(),
getMisses() and getMissesWB() methods of the Cache
class (respectively).
-
To summarise: your Cache class will have its read() and write()
methods called once in turn for each disk access in the simulation. When
all of the accesses have been performed, the flush() method
will be called. The getHits(), getMisses() and
getMissesWB() methods return the number of cache hits, cache
misses without write-back and cache misses with write-back (respectively).
Paging algorithms
-
You are required to implement three paging algorithms: First In First Out
(FIFO), Least Recently Used (LRU), and Pipelined LRU. Each will now be
described in detail.
First In First Out (FIFO)
-
In this rather simple minded strategy pages are brought in as required,
and if there is no free frame, the page chosen to be removed is the one
that has been in a frame for the longest.
-
Consider the example shown below. Suppose that at some point the cache
contains the pages 5, 1, 42, 30, 4 and 3, where page 5 was read in most
recently, and page 3 has been present in the cache the longest. The
frame containing page 30 is the only dirty frame. The page size
is 512 bytes and there are 6 frames in the cache, so this cache would
have initially been specified by the string "F 512 6".
-
Suppose now that the line w 2000 is read in by the main
program. This address is located in page 3. We search the cache and
discover that page 3 is already present in a frame of the cache. Thus,
this operation is a cache hit. However, since we are performing a
write operation, the frame containing page 3 is now dirty, and so we
mark it as being dirty.
-
Suppose now that the line w 1492 is read in by the main
program. This address is located in page 2. We search the cache and
discover that page 2 is not present in any of the frames of the cache.
Thus, this operation is a cache miss. The FIFO algorithm must now
decide which frame is to be removed to make way for the new page (page
2). The FIFO algorithm chooses the page which has been in a frame for
the longest, which is page 3. So page 3 is removed from the cache.
But page 3 was marked as being dirty, so that means that a write-back
is required. Thus, the operation w 1492 is not only a cache
miss, but it is in fact a cache miss with write-back. After the frame
containing page 3 has been removed, page 2 is read into a frame and
placed at the front of the list (since it is now the most recently read
page). Since we are performing a write operation, the frame containing
page 2 is immediately marked as being dirty.
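The FIFO behaviour in this example can be sketched as follows. FifoCache is an illustrative stand-alone class, not the required Cache interface; note that a hit does not reorder the list:

```cpp
#include <cassert>
#include <list>
#include <utility>

// Each frame is a (page, dirty) pair. Front of the list = loaded most
// recently; back = resident longest (the FIFO eviction candidate).
struct FifoCache {
    std::list<std::pair<long, bool> > frames;
    unsigned long capacity;
    long hits, misses, missesWB;

    FifoCache(unsigned long cap)
        : capacity(cap), hits(0), misses(0), missesWB(0) {}

    void access(long page, bool isWrite) {
        for (std::list<std::pair<long, bool> >::iterator it = frames.begin();
             it != frames.end(); ++it) {
            if (it->first == page) {           // hit: FIFO does NOT reorder
                if (isWrite) it->second = true;
                ++hits;
                return;
            }
        }
        if (frames.size() == capacity) {       // miss: evict the oldest frame
            if (frames.back().second) ++missesWB;  // dirty victim: write-back
            else ++misses;
            frames.pop_back();
        } else {
            ++misses;                          // free frame available
        }
        frames.push_front(std::make_pair(page, isWrite));
    }
};
```

Replaying the example above (pages 3, 4, 30, 42, 1, 5 loaded in that order with page 30 written, then w 2000 and w 1492) gives one hit and one miss with write-back, as described.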
Least Recently Used (LRU)
-
The LRU paging algorithm is similar to the FIFO paging algorithm,
except that when a cache hit is encountered, the corresponding frame is
moved to the front of the list. In this way, the front of the list
always contains the frame which was most recently used, and the back of
the list contains the frame which was least recently used. Now when a
page must be removed, the frame at the back of the list is removed -
which is the frame which was least recently used (hence the name of the
algorithm).
-
Consider the example shown below. It is the same as the example shown
above, in that it has the same initial state, and the operations
performed are the same. Again, the page size
is 512 bytes and there are 6 frames in the cache, so this cache would
have initially been specified by the string "L 512 6".
-
Consider the first operation of writing to address 2000, located in
page 3. We again search the cache and find page 3 present in a frame.
Thus, we have a cache hit. However, this time the frame containing
page 3 is moved to the front of the list. It is also marked as being
dirty, the same as before.
-
Now consider the second operation of writing to address 1492, located
in page 2. We again search the cache and find page 2 not present.
Thus, we have a cache miss. Again, the LRU paging algorithm chooses to
remove the frame from the back of the list. In this case, the frame
contains page 4 and is clean. Thus when it is removed, no write-back
is needed, and this is a cache miss without write-back. As before,
page 2 is loaded into a frame at the front of the list and marked as
dirty.
-
Notice that in this case, we obtained a cache miss without write-back,
whereas the same situation under the FIFO paging algorithm
resulted in a cache miss with write-back.
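The move-to-front step that distinguishes LRU from FIFO can be done in constant time if the frames are kept in a std::list, using splice (a sketch; 'touch' is an illustrative name):

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Move the frame that just scored a hit to the front of the recency list.
// splice() relinks the node in O(1) without copying or invalidating iterators.
void touch(std::list<long>& frames, std::list<long>::iterator hit) {
    frames.splice(frames.begin(), frames, hit);
}
```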
Pipelined LRU
-
The Pipelined LRU paging algorithm is similar to the LRU paging algorithm,
except that it deliberately and explicitly attempts to avoid removing a
frame which contains a page that will be required again in the near
future.
-
For example, consider the above example shown for the LRU paging
algorithm. Now suppose that after doing the w 1492 operation,
the very next operation was r 2156. The address 2156 is in
page 4, but page 4 was just removed by the LRU paging algorithm! If we
had some way of knowing in advance that page 4 was going to be used again soon,
then we could have chosen to remove a different frame, such as the one
containing page 30.
-
In fact, it is somewhat possible to know the upcoming operations in
advance. Modern CPUs have a feature known as pipelining.
In loose terms, this is where more than one instruction is in the
process of being executed at any given time. This means that we can
search the list of upcoming instructions to see what addresses will be
used in the near future, and we can then attempt to avoid removing the
pages containing those addresses (if possible). This is what the
Pipelined LRU paging algorithm does - it "looks ahead" in the sequence
of read and write operations to see which pages are used in the near
future, and then attempts to avoid removing those pages. (A note to
any hardware purists - yes, I know that it's generally not possible to
"search" the pipelined instructions as I have described above. But
this is an assignment on programming, not hardware, and so this problem
is simply ignored.)
-
Earlier in the specification it was noted that two additional integers
were specified for the Pipelined LRU paging algorithm, being the
lookahead value and the number of frames to check. The
lookahead value (referred to as lookahead) is the number of read and write operations which should
be checked in advance. The number of frames to check (referred to as
check_frames) is the number of
frames from the back of the list which should be checked for possible
removal candidates.
-
When a frame must be found for removal, the last
check_frames frames in the cache should be examined, and the
first one found which is not required by the next lookahead
operations is the frame which should be removed.
-
If all of the frames
examined are required by the next lookahead operations, then
the frame to be removed is the one examined which is required latest by
the lookahead list.
-
At this point, you might like to have a look at the Pipelined LRU
examples below.
-
If the value of check_frames is larger than the number of
frames, then all of the frames should be searched.
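Putting these rules together, the victim selection might be sketched as below. chooseVictim is a hypothetical helper; it assumes the frames are stored most-recently-used first (so candidates are taken from the back) and that at least one frame exists:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 'frames' holds page numbers, most- to least-recently used.
// 'upcoming' holds the page of each lookahead operation, soonest first.
// Returns the index in 'frames' of the frame to evict.
std::size_t chooseVictim(const std::vector<long>& frames,
                         const std::vector<long>& upcoming,
                         std::size_t checkFrames) {
    if (checkFrames > frames.size()) checkFrames = frames.size();
    std::size_t best = frames.size() - 1;   // fall back to plain LRU choice
    std::size_t bestFirstUse = 0;
    for (std::size_t k = 0; k < checkFrames; ++k) {
        std::size_t idx = frames.size() - 1 - k;   // walk from the back
        std::size_t firstUse = upcoming.size();    // sentinel: "not needed"
        for (std::size_t j = 0; j < upcoming.size(); ++j) {
            if (upcoming[j] == frames[idx]) { firstUse = j; break; }
        }
        if (firstUse == upcoming.size()) return idx;  // not needed: evict it
        if (firstUse > bestFirstUse) {                // needed latest so far
            bestFirstUse = firstUse;
            best = idx;
        }
    }
    return best;   // every candidate is needed; evict the one needed latest
}
```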
-
You will need to keep a list of "pipelined" read/write operations,
which is usually called the lookahead list. Initially this
list is empty, but when a read/write operation is requested, it should
push that operation onto the front of the list. When there are
lookahead operations in the list, pushing an operation
should also cause an operation to be popped from the back of the list and
processed. This means that the list will only ever have at most
lookahead operations on it. (Note that you don't have to
store the lookahead list using a list data structure, but it's usually
easiest to think of it as a list.)
-
During the initial stage when there are fewer than lookahead
operations on the list, then no operations should be processed. This
means that there should be no cache hits, misses without writeback or
misses with writeback recorded, and during this stage the output of the
main.cc main program should be "0 0 0".
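The push/pop behaviour of the lookahead list can be sketched with a std::deque (pushOp is an illustrative helper; it reports whether an operation should be processed on this step):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Push a new operation onto the front of the lookahead buffer. Once the
// buffer already holds 'lookahead' operations, each push also releases the
// oldest operation from the back; 'out' receives it and true is returned.
bool pushOp(std::deque<long>& buf, std::size_t lookahead, long op, long& out) {
    buf.push_front(op);
    if (buf.size() > lookahead) {
        out = buf.back();
        buf.pop_back();
        return true;    // process 'out' now
    }
    return false;       // still filling: nothing processed yet
}
```

With lookahead = 3, the first three pushes return false (output stays "0 0 0"), and the fourth push releases the first operation for processing.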
-
When there is no more input, the main.cc main program will call the
flush() method on your Cache object. This indicates to the
Cache object that there are no more read/write operations. At this
point, the Cache object should process each of the operations in the
lookahead list in turn as per normal, until all the operations have
been processed and the list is empty. Even though the lookahead list
now contains fewer than lookahead operations, it should still
be checked in the normal way, as described above.
-
You might think about storing all of the read/write operations in a
large list, and then processing all of them when the flush()
method is called. You shouldn't do this for two reasons:
- The list of read/write operations may be very large, and
- You have to have the correct values of the number of cache
hits, misses without write-back and misses with write-back
throughout the course of the simulation, not just at the end of the
simulation.
Pipelined LRU - Example 1
Pipelined LRU - Example 2
-
This time we will consider an example similar to the previous one.
The only difference is that the value of check_frames is 2,
not 4.
-
The initial state of the cache is again the same as in Example 1.
-
The current and next read/write operations are again:
- w 1492 (page 2)
- w 50 (page 0)
- w 15400 (page 30)
- r 2200 (page 4)
-
So the current operation being performed is a write to address 1492,
which is page 2. Page 2 is not in the cache (ie. a cache miss), so we
need to find a frame to remove.
-
We start with the last frame in the
cache, which contains page 4. However, page 4 appears in the lookahead
list of operations (the r 2200 operation), so we don't remove
this frame.
-
We now look at the next frame,
which contains page 30. Again, page 30 appears in the lookahead list
of operations (the w 15400 operation), so we don't remove this
frame.
-
However, now we have run out of frames to check. This is because the
value of check_frames is 2, and we have already checked 2
frames. So all of the frames which we have checked are required in the
near future by the lookahead list.
-
So now we must choose one of the examined frames to remove, so we will
either remove the frame containing page 4 or the frame containing page
30. According to the spec above, we choose the one which is needed
latest by the lookahead list. Working from the start of the
lookahead list: first page 0 is needed, then page 30, then page 4.
Since page 4 is first needed after page 30, we remove the frame
containing page 4.
-
After
doing this, a frame containing page 2 is inserted at the front of the
list and is marked dirty, giving the final state of the cache.
Pipelined LRU - Example 3
-
Consider the following lookahead list of next read/write operations:
- w 1492 (page 2)
- w 50 (page 0)
- w 15400 (page 30)
- w 15500 (page 30)
- w 180 (page 0)
- r 2200 (page 4)
- w 1292 (page 2)
-
Now suppose that the frames were searched and all of the frames were
needed in the near future. In this case, the page which should be
removed is page 4.
-
This is because when you search the lookahead list, you search from the
front of the list to find the first time each page is used. If
you were then to store pages in a list in order of when they were first
found, the list would be: 2, 0, 30, 4. The page to remove is the one
at the end of this list.
-
If you instead search from the back of the lookahead list, you will end
up choosing to remove the wrong frame.
Submission Information
-
Your submission must consist of one or more .cc files, one or more .h files, and
a Makefile for creating the program.
-
The program must be called pagesim
-
The program will be compiled with the command
make
-
Your submission must be placed in a tar file. In order to do this,
refer to the submission instructions
page.
-
Once you have created the tar file you need to submit it, which
is also described on the submission
instructions page.