Container classes are the solution to a
specific kind of code reuse problem. They are building blocks used to create
object-oriented programs – they make the internals of a program much
easier to construct.
A container class describes an object that holds other
objects. Container classes are so important that they were considered
fundamental to early object-oriented languages. In Smalltalk, for example,
programmers think of the language as the program translator together with the
class library, and a critical part of that library is the container classes. So
it became natural that C++ compiler vendors also include a container class
library. You’ll note that the vector was so useful that it was
introduced in its simplest form very early in this book.
Like many other early C++ libraries, early container class
libraries followed Smalltalk’s object-based hierarchy, which worked
well for Smalltalk, but turned out to be awkward and difficult to use in C++.
Another approach was required.
This chapter attempts to slowly work you into the concepts of
the C++ Standard Template Library (STL), which is a powerful library of
containers (as well as algorithms, but these are covered in the following
chapter). In the past, I have taught that there is a relatively small subset of
elements and ideas that you need to understand in order to get much of the
usefulness from the STL. Although this can be true it turns out that
understanding the STL more deeply is important to gain the full power of the
library. This chapter and the next probe into the STL containers and
algorithms.
If you don’t know how many objects you’re going to
need to solve a particular problem, or how long they will last, you also
don’t know how to store those objects. How can you know how much space to
create? You can’t, since that information isn’t known until run
time.
The solution to most problems in object-oriented design seems
flippant: you create another type of object. For the storage problem, the new
type of object holds other objects, or pointers to objects. Of course, you can
do the same thing with an array, but there’s more. This new type of
object, which is typically referred to in C++ as a container (also called
a collection in some languages), will expand itself whenever necessary to
accommodate everything you place inside it. So you don’t need to know how
many objects you’re going to hold in a collection. You just create a
collection object and let it take care of the details.
Fortunately, a good OOP language comes with a set of
containers as part of the package. In C++, it’s the Standard Template
Library (STL). In some libraries, a generic container is considered good enough
for all needs, and in others (C++ in particular) the library has different types
of containers for different needs: a vector for consistent access to all
elements, and a linked list for consistent insertion at all elements, for
example, so you can choose the particular type that fits your needs. These may
include sets, queues, hash tables, trees, stacks, etc.
All containers have some way to put things in and get things
out. The way that you place something into a container is fairly obvious.
There’s a function called “push” or “add” or a
similar name. Fetching things out of a container is not always as apparent; if
it’s an array-like entity such as a vector, you might be able to use an
indexing operator or function. But in many situations this doesn’t make
sense. Also, a single-selection function is restrictive. What if you want to
manipulate or compare a group of elements in the container?
The solution is an iterator, which is an object whose
job is to select the elements within a container and present them to the user of
the iterator. As a class, it also provides a level of abstraction. This
abstraction can be used to separate the details of the container from the code
that’s accessing that container. The container, via the iterator, is
abstracted to be simply a sequence. The iterator allows you to traverse that
sequence without worrying about the underlying structure – that is,
whether it’s a vector, a linked list, a stack or something else. This
gives you the flexibility to easily change the underlying data structure without
disturbing the code in your program.
From the design standpoint, all you really want is a sequence
that can be manipulated to solve your problem. If a single type of sequence
satisfied all of your needs, there’d be no reason to have different kinds.
There are two reasons that you need a choice of containers. First, containers
provide different types of interfaces and external behavior. A stack has a
different interface and behavior than that of a queue, which is different than
that of a set or a list. One of these might provide a more flexible solution to
your problem than the other. Second, different containers have different
efficiencies for certain operations. The best example is a vector and a list.
Both are simple sequences that can have identical interfaces and external
behaviors. But certain operations can have radically different costs. Randomly
accessing elements in a vector is a constant-time operation; it takes the same
amount of time regardless of the element you select. However, in a linked list
it is expensive to move through the list to randomly select an element, and it
takes longer to find an element if it is further down the list. On the other
hand, if you want to insert an element in the middle of a sequence, it’s
much cheaper in a list than in a vector. These and other operations have
different efficiencies depending upon the underlying structure of the sequence.
In the design phase, you might start with a list and, when tuning for
performance, change to a vector. Because of the abstraction via iterators, you
can change from one to the other with minimal impact on your code.
In the end, remember that a container is only a storage
cabinet to put objects in. If that cabinet solves all of your needs, it
doesn’t really matter how it is implemented (a basic concept with
most types of objects). If you’re working in a programming environment
that has built-in overhead due to other factors, then the cost difference
between a vector and a linked list might not matter. You might need only one
type of sequence. You can even imagine the “perfect” container
abstraction, which can automatically change its underlying implementation
according to the way it is used.
You will notice that this chapter does not contain exhaustive
documentation describing each of the member functions in each STL container.
Although I describe the member functions that I use, I’ve left the full
descriptions to others: there are at least two very good on-line sources of STL
documentation in HTML format that you can keep resident on your computer and
view with a Web browser whenever you need to look something up. The first is the
Dinkumware library (which covers the entire Standard C and C++ library)
mentioned at the beginning of this book section (page XXX). The second is the
freely-downloadable SGI STL and documentation, freely downloadable at
http://www.sgi.com/Technology/STL/. These should provide complete references
when you’re writing code. In addition, the STL books listed in Appendix XX
will provide you with other resources.
The C++
STL[16]
is a powerful library intended to satisfy the vast bulk of your needs for
containers and algorithms, but in a completely portable fashion. This means that
not only are your programs easier to port to other platforms, but that your
knowledge itself does not depend on the libraries provided by a particular
compiler vendor (and the STL is likely to be more tested and scrutinized than a
particular vendor’s library). Thus, it will benefit you greatly to look
first to the STL for containers and algorithms, before looking at
vendor-specific solutions.
A fundamental principle of software design is that all
problems can be simplified by introducing an extra level of indirection.
This simplicity is achieved in the STL using iterators to perform
operations on a data structure while knowing as little as possible about that
structure, thus producing data structure independence. With the STL, this means
that any operation that can be performed on an array of objects can also be
performed on an STL container of objects and vice versa. The STL containers work
just as easily with built-in types as they do with user-defined types. If you
learn the library, it will work on everything.
The drawback to this independence is that you’ll have to
take a little time at first getting used to the way things are done in the STL.
However, the STL uses a consistent pattern, so once you fit your mind around it,
it doesn’t change from one STL tool to another.
Consider an example using the STL set
class. A set will allow only one
of each object value to be inserted into itself. Here is a simple set
created to work with ints by providing int as the template
argument to set:
//: C04:Intset.cpp // Simple use of STL set #include <set> #include <iostream> using namespace std; int main() { set<int> intset; for(int i = 0; i < 25; i++) for(int j = 0; j < 10; j++) // Try to insert multiple copies: intset.insert(j); // Print to output: copy(intset.begin(), intset.end(), ostream_iterator<int>(cout, "\n")); } ///:~
The insert( ) member does all the work: it tries
putting the new element in and rejects it if it’s already there. Very
often the activities involved in using a set are simply insertion and a test to
see whether it contains the element. You can also form a union, intersection, or
difference of sets, and test to see if one set is a subset of another.
In this example, the values 0 - 9 are inserted into the set 25
times, and the results are printed out to show that only one of each of the
values is actually retained in the set.
The copy( ) function is actually the instantiation
of an STL template function, of which there are many. These template functions
are generally referred to as “the STL Algorithms” and will be the
subject of the following chapter. However, several of the algorithms are so
useful that they will be introduced in this chapter. Here, copy( )
shows the use of iterators. The set member functions
begin( ) and end( ) produce iterators as their return
values. These are used by copy( ) as beginning and ending points for
its operation, which is simply to move between the boundaries established by the
iterators and copy the elements to the third argument, which is also an
iterator, but in this case, a special type created for iostreams. This places
int objects on cout and separates them with a newline.
Because of its genericity, copy( ) is certainly
not restricted to printing on a stream. It can be used in virtually any
situation: it needs only three iterators to talk to. All of the algorithms
follow the form of copy( ) and simply manipulate iterators (the use
of iterators is the “extra level of indirection”).
Now consider taking the form of Intset.cpp and
reshaping it to display a list of the words used in a document. The solution
becomes remarkably simple.
//: C04:WordSet.cpp #include "../require.h" #include <string> #include <fstream> #include <iostream> #include <set> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream source(argv[1]); assure(source, argv[1]); string word; set<string> words; while(source >> word) words.insert(word); copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n")); cout << "Number of unique words:" << words.size() << endl; } ///:~
The only substantive difference here is that string is
used instead of int. The words are pulled from a file, but everything
else is the same as in Intset.cpp. The operator>> returns a
whitespace-separated group of characters each time it is called, until
there’s no more input from the file. So it approximately breaks an input
stream up into words. Each string is placed in the set using
insert( ), and the copy( ) function is used to display
the results. Because of the way set is implemented (as a tree), the words
are automatically sorted.
Consider how much effort it would be to accomplish the same
task in C, or even in C++ without the STL.
The primary idea in the STL is the container (also
known as a collection), which is just what it sounds like: a place to
hold things. You need containers because objects are constantly marching in and
out of your program and there must be someplace to put them while they’re
around. You can’t make named local objects because in a typical program
you don’t know how many, or what type, or the lifetime of the objects
you’re working with. So you need a container that will expand whenever
necessary to fill your needs.
All the containers in the STL hold objects and expand
themselves. In addition, they hold your objects in a particular way. The
difference between one container and another is the way the objects are held and
how the sequence is created. Let’s start by looking at the simplest
containers.
A vector is a linear sequence that allows rapid random
access to its elements. However, it’s expensive to insert an element in
the middle of the sequence, and is also expensive when it allocates additional
storage. A deque is also a linear sequence, and it allows random access
that’s nearly as fast as vector, but it’s significantly
faster when it needs to allocate new storage, and you can easily add new
elements at either end (vector only allows the addition of elements at
its tail). A list the third type of basic linear sequence, but it’s
expensive to move around randomly and cheap to insert an element in the middle.
Thus list, deque and vector are very similar in their basic
functionality (they all hold linear sequences), but different in the cost of
their activities. So for your first shot at a program, you could choose any one,
and only experiment with the others if you’re tuning for
efficiency.
Many of the problems you set out to solve will only require a
simple linear sequence like a vector, deque or list. All
three have a member function push_back( ) which you use to insert a
new element at the back of the sequence (deque and list also have
push_front( )).
But now how do you retrieve those elements? With a
vector or deque, it is possible to use the indexing operator[
], but that doesn’t work with list. Since it would be nicest to
learn a single interface, we’ll often use the one defined for all STL
containers: the iterator.
An iterator is a class that abstracts the process of moving
through a sequence. It allows you to select each element of a sequence
without knowing the underlying structure of that sequence. This is a
powerful feature, partly because it allows us to learn a single interface that
works with all containers, and partly because it allows containers to be used
interchangeably.
One more observation and you’re ready for another
example. Even though the STL containers hold objects by value (that is, they
hold the whole object inside themselves) that’s probably not the way
you’ll generally use them if you’re doing object-oriented
programming. That’s because in OOP, most of the time you’ll create
objects on the heap with new and then upcast the address to the
base-class type, later manipulating it as a pointer to the base class. The
beauty of this is that you don’t worry about the specific type of object
you’re dealing with, which greatly reduces the complexity of your code and
increases the maintainability of your program. This process of upcasting is what
you try to do in OOP with polymorphism, so you’ll usually be using
containers of pointers.
Consider the classic “shape” example where shapes
have a set of common operations, and you have different types of shapes.
Here’s what it looks like using the STL vector to hold pointers to
various types of Shape created on the heap:
//: C04:Stlshape.cpp // Simple shapes w/ STL #include <vector> #include <iostream> using namespace std; class Shape { public: virtual void draw() = 0; virtual ~Shape() {}; }; class Circle : public Shape { public: void draw() { cout << "Circle::draw\n"; } ~Circle() { cout << "~Circle\n"; } }; class Triangle : public Shape { public: void draw() { cout << "Triangle::draw\n"; } ~Triangle() { cout << "~Triangle\n"; } }; class Square : public Shape { public: void draw() { cout << "Square::draw\n"; } ~Square() { cout << "~Square\n"; } }; typedef std::vector<Shape*> Container; typedef Container::iterator Iter; int main() { Container shapes; shapes.push_back(new Circle); shapes.push_back(new Square); shapes.push_back(new Triangle); for(Iter i = shapes.begin(); i != shapes.end(); i++) (*i)->draw(); // ... Sometime later: for(Iter j = shapes.begin(); j != shapes.end(); j++) delete *j; } ///:~
The creation of Shape, Circle, Square and
Triangle should be fairly familiar. Shape is a pure abstract base
class (because of the pure specifier =0) that defines the
interface for all types of shapes. The derived classes redefine the
virtual function draw( ) to perform the appropriate
operation. Now we’d like to create a bunch of different types of
Shape object, but where to put them? In an STL container, of course. For
convenience, this typedef:
typedef std::vector<Shape*> Container;
creates
an alias for a vector of Shape*, and this
typedef:
typedef Container::iterator Iter;
uses that alias to
create another one, for vector<Shape*>::iterator. Notice that the
container type name must be used to produce the appropriate iterator,
which is defined as a nested class. Although there are different types of
iterators (forward, bidirectional, reverse, etc., which will be explained later)
they all have the same basic interface: you can increment them with ++,
you can dereference them to produce the object they’re currently
selecting, and you can test them to see if they’re at the end of the
sequence. That’s what you’ll want to do 90% of the time. And
that’s what is done in the above example: after creating a container,
it’s filled with different types of Shape*. Notice that the upcast
happens as the Circle, Square or Rectangle pointer is added
to the shapes container, which doesn’t know about those specific
types but instead holds only Shape*. So as soon as the pointer is added
to the container it loses its specific identity and becomes an anonymous
Shape*. This is exactly what we want: toss them all in and let
polymorphism sort it out.
The first for loop creates an iterator and sets it to
the beginning of the sequence by calling the begin( ) member
function for the container. All containers have begin( ) and
end( ) member functions that produce an iterator selecting,
respectively, the beginning of the sequence and one past the end of the
sequence. To test to see if you’re done, you make sure you’re
!= to the iterator produced by end( ). Not < or
<=. The only test that works is !=. So it’s very common
to write a loop like:
for(Iter i = shapes.begin(); i != shapes.end(); i++)
This says: “take me through every element in the
sequence.”
What do you do with the iterator to produce the element
it’s selecting? You dereference it using (what else) the
‘*’ (which is actually an overloaded operator). What you get
back is whatever the container is holding. This container holds Shape*,
so that’s what *i produces. If you want to send a message to the
Shape, you must select that message with ->, so you write the
line:
(*i)->draw();
This calls the draw( ) function for the
Shape* the iterator is currently selecting. The parentheses are ugly but
necessary to produce the proper order of evaluation. As an alternative,
operator-> is defined so that you can say:
i->draw();
As they are destroyed or in other cases where the pointers are
removed, the STL containers do not call delete for the pointers
they contain. If you create an object on the heap with new and place its
pointer in a container, the container can’t tell if that pointer is also
placed inside another container. So the STL just doesn’t do anything about
it, and puts the responsibility squarely in your lap. The last lines in the
program move through and delete every object in the container so proper cleanup
occurs.
It’s very interesting to note that you can change the
type of container that this program uses with two lines. Instead of including
<vector>, you include <list>, and in the first
typedef you say:
typedef std::list<Shape*> Container;
instead of using a vector. Everything else goes
untouched. This is possible not because of an interface enforced by inheritance
(there isn’t any inheritance in the STL, which comes as a surprise when
you first see it), but because the interface is enforced by a convention adopted
by the designers of the STL, precisely so you could perform this kind of
interchange. Now you can easily switch between vector and list and
see which one works fastest for your
needs.
In the prior example, at the end of main( ), it
was necessary to move through the whole list and delete all the
Shape pointers.
for(Iter j = shapes.begin(); j != shapes.end(); j++) delete *j;
This highlights what could be seen as a flaw in the STL:
there’s no facility in any of the STL containers to automatically
delete the pointers they contain, so you must do it by hand. It’s
as if the assumption of the STL designers was that containers of pointers
weren’t an interesting problem, although I assert that it is one of the
more common things you’ll want to do.
Automatically deleting a pointer turns out to be a rather
aggressive thing to do because of the multiple membership problem. If a
container holds a pointer to an object, it’s not unlikely that pointer
could also be in another container. A pointer to an Aluminum object in a
list of Trash pointers could also reside in a list of Aluminum
pointers. If that happens, which list is responsible for cleaning up that object
– that is, which list “owns” the object?
This question is virtually eliminated if the object rather
than a pointer resides in the list. Then it seems clear that when the list is
destroyed, the objects it contains must also be destroyed. Here, the STL shines,
as you can see when creating a container of string objects. The following
example stores each incoming line as a string in a
vector<string>:
//: C04:StringVector.cpp // A vector of strings #include "../require.h" #include <string> #include <vector> #include <fstream> #include <iostream> #include <iterator> #include <sstream> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); vector<string> strings; string line; while(getline(in, line)) strings.push_back(line); // Do something to the strings... int i = 1; vector<string>::iterator w; for(w = strings.begin(); w != strings.end(); w++) { ostringstream ss; ss << i++; *w = ss.str() + ": " + *w; } // Now send them out: copy(strings.begin(), strings.end(), ostream_iterator<string>(cout, "\n")); // Since they aren't pointers, string // objects clean themselves up! } ///:~
Once the vector<string> called strings is
created, each line in the file is read into a string and put in the
vector:
while(getline(in, line)) strings.push_back(line);
The operation that’s being performed on this file is to
add line numbers. A stringstream provides easy conversion from an
int to a string of characters representing that
int.
Assembling string objects is quite easy, since
operator+ is overloaded. Sensibly enough, the iterator w can be
dereferenced to produce a string that can be used as both an rvalue and
an lvalue:
*w = ss.str() + ": " + *w;
The fact that you can assign back into the container via the
iterator may seem a bit surprising at first, but it’s a tribute to the
careful design of the STL.
Because the vector<string> contains the objects
themselves, a number of interesting things take place. First, no cleanup is
necessary. Even if you were to put addresses of the string objects as
pointers into other containers, it’s clear that strings is
the “master list” and maintains ownership of the objects.
Second, you are effectively using dynamic object creation, and
yet you never use new or delete! That’s because, somehow,
it’s all taken care of for you by the vector (this is non-trivial.
You can try to figure it out by looking at the header files for the STL –
all the code is there – but it’s quite an exercise). Thus your
coding is significantly cleaned up.
The limitation of holding objects instead of pointers inside
containers is quite severe: you can’t upcast from derived types, thus you
can’t use polymorphism. The problem with upcasting objects by value is
that they get sliced and converted until their type is completely changed into
the base type, and there’s no remnant of the derived type left. It’s
pretty safe to say that you never want to do
this.
The power of instantly creating a sequence of elements is
amazing, and it makes you realize how much time you’ve spent (or rather,
wasted) in the past solving this particular problem. For example, many utility
programs involve reading a file into memory, modifying the file and writing it
back out to disk. One might as well take the functionality in
StringVector.cpp and package it into a class for later reuse.
Now the question is: do you create a member object of type
vector, or do you inherit? A general guideline is to always prefer
composition (member objects) over inheritance, but with the STL this is often
not true, because there are so many existing algorithms that work with the STL
types that you may want your new type to be an STL type. So the list of
strings should also be a vector, thus inheritance is
desired.
//: C04:FileEditor.h // File editor tool #ifndef FILEEDITOR_H #define FILEEDITOR_H #include <string> #include <vector> #include <iostream> class FileEditor : public std::vector<std::string> { public: FileEditor(char* filename); void write(std::ostream& out = std::cout); }; #endif // FILEEDITOR_H ///:~
Note the careful avoidance of a global using namespace
std statement here, to prevent the opening of the std namespace to
every file that includes this header.
The constructor opens the file and reads it into the
FileEditor, and write( ) puts the vector of
string onto any ostream. Notice in write( ) that you
can have a default argument for a reference.
The implementation is quite simple:
//: C04:FileEditor.cpp {O} #include "FileEditor.h" #include "../require.h" #include <fstream> using namespace std; FileEditor::FileEditor(char* filename) { ifstream in(filename); assure(in, filename); string line; while(getline(in, line)) push_back(line); } // Could also use copy() here: void FileEditor::write(ostream& out) { for(iterator w = begin(); w != end(); w++) out << *w << endl; } ///:~
The functions from StringVector.cpp are simply
repackaged. Often this is the way classes evolve – you start by creating a
program to solve a particular application, then discover some commonly-used
functionality within the program that can be turned into a class.
The line numbering program can now be rewritten using
FileEditor:
//: C04:FEditTest.cpp //{L} FileEditor // Test the FileEditor tool #include "FileEditor.h" #include "../require.h" #include <sstream> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); FileEditor file(argv[1]); // Do something to the lines... int i = 1; FileEditor::iterator w = file.begin(); while(w != file.end()) { ostringstream ss; ss << i++; *w = ss.str() + ": " + *w; w++; } // Now send them to cout: file.write(); } ///:~
Now the operation of reading the file is in the
constructor:
FileEditor file(argv[1]);
and writing happens in the single line (which defaults to
sending the output to cout):
file.write();
As mentioned earlier, the iterator is the abstraction that
allows a piece of code to be generic, and to work with different types of
containers without knowing the underlying structure of those containers.
Every container produces iterators. You must always be able to
say:
ContainerType::iterator ContainerType::const_iterator
to produce the types of the iterators produced by that
container. Every container has a begin( ) method that produces an
iterator indicating the beginning of the elements in the container, and an
end( ) method that produces an iterator which is the as the
past-the-end value of the container. If the container is
const¸ begin( ) and end( ) produce
const iterators.
Every iterator can be moved forward to the next element using
the operator++ (an iterator may be able to do more than this, as you
shall see, but it must at least support forward movement with
operator++).
The basic iterator is only guaranteed to be able to perform
== and != comparisons. Thus, to move an iterator it forward
without running it off the end you say something like:
while(it != pastEnd) { // Do something it++; }
Where pastEnd is the past-the-end value produced by the
container’s end( ) member function.
An iterator can be used to produce the element that it is
currently selecting within a container by dereferencing the iterator. This can
take two forms. If it is an iterator and f( ) is a member
function of the objects held in the container that the iterator is pointing
within, then you can say either:
(*it).f();
or
it->f();
Knowing this, you can create a template that works with any
container. Here, the apply( ) function template calls a member
function for every object in the container, using a pointer to member that is
passed as an argument:
//: C04:Apply.cpp // Using basic iterators #include <iostream> #include <vector> #include <iterator> using namespace std; template<class Cont, class PtrMemFun> void apply(Cont& c, PtrMemFun f) { typename Cont::iterator it = c.begin(); while(it != c.end()) { (it->*f)(); // Compact form ((*it).*f)(); // Alternate form it++; } } class Z { int i; public: Z(int ii) : i(ii) {} void g() { i++; } friend ostream& operator<<(ostream& os, const Z& z) { return os << z.i; } }; int main() { ostream_iterator<Z> out(cout, " "); vector<Z> vz; for(int i = 0; i < 10; i++) vz.push_back(Z(i)); copy(vz.begin(), vz.end(), out); cout << endl; apply(vz, &Z::g); copy(vz.begin(), vz.end(), out); } ///:~
Because operator-> is defined for STL iterators, it
can be used for pointer-to-member dereferencing (in the following chapter
you’ll learn a more elegant way to handle the problem of applying a member
function or ordinary function to every object in a container).
Much of the time, this is all you need to know about iterators
– that they are produced by begin( ) and end( ),
and that you can use them to move through a container and select elements. Many
of the problems that you solve, and the STL algorithms (covered in the next
chapter) will allow you to just flail away with the basics of iterators.
However, things can at times become more subtle, and in those cases you need to
know more about iterators. The rest of this section gives you the
details.
All containers must produce the basic iterator. A
container may also be reversible, which means that it can produce
iterators that move backwards from the end, as well as the iterators that move
forward from the beginning.
A reversible container has the methods rbegin( )
(to produce a reverse_iterator selecting the end) and rend( )
(to produce a reverse_iterator indicating “one past the
beginning”). If the container is const then rbegin( )
and rend( ) will produce const_reverse_iterators.
All the basic sequence containers vector, deque
and list are reversible containers. The following example uses
vector, but will work with deque and list as
well:
//: C04:Reversible.cpp // Using reversible containers #include "../require.h" #include <vector> #include <iostream> #include <fstream> #include <string> using namespace std; int main() { ifstream in("Reversible.cpp"); assure(in, "Reversible.cpp"); string line; vector<string> lines; while(getline(in, line)) lines.push_back(line); vector<string>::reverse_iterator r; for(r = lines.rbegin(); r != lines.rend(); r++) cout << *r << endl; } ///:~
You move backward through the container using the same syntax
as moving forward through a container with an ordinary iterator.
The associative containers set, multiset,
map and multimap are also reversible. Using iterators with
associative containers is a bit different, however, and will be delayed until
those containers are more fully
introduced.
The iterators are classified into different
“categories” which describe what they are capable of doing. The
order in which they are generally described moves from the categories with the
most restricted behavior to those with the most powerful behavior.
The only predefined implementations of input iterators are
istream_iterator and istreambuf_iterator, to read from an
istream. As you can imagine, an input iterator can only be dereferenced
once for each element that’s selected, just as you can only read a
particular portion of an input stream once. They can only move forward. There is
a special constructor to define the past-the-end value. In summary, you can
dereference it for reading (once only for each value), and move it
forward.
This is the complement of an input iterator, but for writing
rather than reading. The only predefined implementations of output iterators are
ostream_iterator and ostreambuf_iterator, to write to an
ostream, and the less-commonly-used raw_storage_iterator. Again,
these can only be dereferenced once for each written value, and they can only
move forward. There is no concept of a terminal past-the-end value for an output
iterator. Summarizing, you can dereference it for writing (once only for each
value) and move it forward.
The forward iterator contains all the functionality of both
the input iterator and the output iterator, plus you can dereference an iterator
location multiple times, so you can read and write to a value multiple times. As
the name implies, you can only move forward. There are no predefined iterators
that are only forward iterators.
The bidirectional iterator has all the functionality of the
forward iterator, and in addition it can be moved backwards one location at a
time using operator--.
Finally, the random-access iterator has all the functionality
of the bidirectional iterator plus all the functionality of a pointer (a pointer
is a random-access iterator). Basically, anything you can do with a
pointer you can do with a random-access iterator, including indexing with
operator[ ], adding integral values to a pointer to move it forward or
backward by a number of locations, and comparing one iterator to another with
<, >=, etc.
Why do you care about this categorization? When you’re
just using containers in a straightforward way (for example, just hand-coding
all the operations you want to perform on the objects in the container) it
usually doesn’t impact you too much. Things either work or they
don’t. The iterator categories become important when:
The STL has a predefined set of iterator classes that can be
quite handy. For example, you’ve already seen reverse_iterator
(produced by calling rbegin( ) and rend( ) for all the
basic containers).
The insertion iterators are necessary because some of
the STL algorithms – copy( ) for example – use the
assignment operator= in order to place objects in the destination
container. This is a problem when you’re using the algorithm to
fill the container rather than to overwrite items that are already in the
destination container. That is, when the space isn’t already there. What
the insert iterators do is change the implementation of the operator= so
that instead of doing an assignment, it calls a “push” or
“insert” function for that container, thus causing it to allocate
new space. The constructors for both back_insert_iterator and
front_insert_iterator take a basic sequence container object
(vector, deque or list) as their argument and produce an
iterator that calls push_back( ) or push_front( ),
respectively, to perform assignment. The shorthand functions
back_inserter( ) and front_inserter( ) produce the same
objects with a little less typing. Since all the basic sequence containers
support push_back( ), you will probably find yourself using
back_inserter( ) with some regularity.
The insert_iterator allows you to insert elements in
the middle of the sequence, again replacing the meaning of operator=, but
this time with insert( ) instead of one of the “push”
functions. The insert( ) member function requires an iterator
indicating the place to insert before, so the insert_iterator requires
this iterator in addition to the container object. The shorthand function
inserter( ) produces the same object.
The following example shows the use of the different types of
inserters:
//: C04:Inserters.cpp // Different types of iterator inserters #include <iostream> #include <vector> #include <deque> #include <list> #include <iterator> using namespace std; int a[] = { 1, 3, 5, 7, 11, 13, 17, 19, 23 }; template<class Cont> void frontInsertion(Cont& ci) { copy(a, a + sizeof(a)/sizeof(int), front_inserter(ci)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } template<class Cont> void backInsertion(Cont& ci) { copy(a, a + sizeof(a)/sizeof(int), back_inserter(ci)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } template<class Cont> void midInsertion(Cont& ci) { typename Cont::iterator it = ci.begin(); it++; it++; it++; copy(a, a + sizeof(a)/(sizeof(int) * 2), inserter(ci, it)); copy(ci.begin(), ci.end(), ostream_iterator<int>(cout, " ")); cout << endl; } int main() { deque<int> di; list<int> li; vector<int> vi; // Can't use a front_inserter() with vector frontInsertion(di); frontInsertion(li); di.clear(); li.clear(); backInsertion(vi); backInsertion(di); backInsertion(li); midInsertion(vi); midInsertion(di); midInsertion(li); } ///:~
Since vector does not support
push_front( ), it cannot produce a front_insertion_iterator.
However, you can see that vector does support the other two types of
insertion (even though, as you shall see later, insert( ) is not a
very efficient operation for vector).
You’ve already seen some use of the
ostream_iterator (an output iterator) in conjunction with
copy( ) to place the contents of a container on an output stream.
There is a corresponding istream_iterator (an input iterator) which
allows you to “iterate” a set of objects of a specified type from an
input stream. An important difference between ostream_iterator and
istream_iterator comes from the fact that an output stream doesn’t
have any concept of an “end,” since you can always just keep writing
more elements. However, an input stream eventually terminates (for example, when
you reach the end of a file) so there needs to be a way to represent that. An
istream_iterator has two constructors, one that takes an istream
and produces the iterator you actually read from, and the other which is the
default constructor and produces an object which is the past-the-end sentinel.
In the following program this object is named end:
//: C04:StreamIt.cpp // Iterators for istreams and ostreams #include "../require.h" #include <iostream> #include <fstream> #include <vector> #include <string> using namespace std; int main() { ifstream in("StreamIt.cpp"); assure(in, "StreamIt.cpp"); istream_iterator<string> init(in), end; ostream_iterator<string> out(cout, "\n"); vector<string> vs; copy(init, end, back_inserter(vs)); copy(vs.begin(), vs.end(), out); *out++ = vs[0]; *out++ = "That's all, folks!"; } ///:~
When in runs out of input (in this case when the end of
the file is reached) then init becomes equivalent to end and the
copy( ) terminates.
Because out is an
ostream_iterator<string>, you can simply assign any string
object to the dereferenced iterator using operator= and that
string will be placed on the output stream, as seen in the two
assignments to out. Because out is defined with a newline as its
second argument, these assignments also cause a newline to be inserted along
with each assignment.
While it is possible to create an
istream_iterator<char> and ostream_iterator<char>,
these actually parse the input and thus will for example automatically
eat whitespace (spaces, tabs and newlines), which is not desirable if you want
to manipulate an exact representation of an istream. Instead, you can use
the special iterators istreambuf_iterator and ostreambuf_iterator,
which are designed strictly to move
characters[17]. Although these are templates,
the only template arguments they will accept are either char or
wchar_t (for wide characters). The following example allows you to
compare the behavior of the stream iterators vs. the streambuf
iterators:
//: C04:StreambufIterator.cpp // istreambuf_iterator & ostreambuf_iterator #include "../require.h" #include <iostream> #include <fstream> #include <iterator> #include <algorithm> using namespace std; int main() { ifstream in("StreambufIterator.cpp"); assure(in, "StreambufIterator.cpp"); // Exact representation of stream: istreambuf_iterator<char> isb(in), end; ostreambuf_iterator<char> osb(cout); while(isb != end) *osb++ = *isb++; // Copy 'in' to cout cout << endl; ifstream in2("StreambufIterator.cpp"); // Strips white space: istream_iterator<char> is(in2), end2; ostream_iterator<char> os(cout); while(is != end2) *os++ = *is++; cout << endl; } ///:~
The stream iterators use the parsing defined by
istream::operator>>, which is probably not
what you want if you
are parsing characters directly – it’s fairly rare that you would
want all the whitespace stripped out of your character stream. You’ll
virtually always want to use a streambuf iterator when using characters and
streams, rather than a stream iterator. In addition,
istream::operator>> adds significant overhead for each operation,
so it is only appropriate for higher-level operations such as parsing
floating-point numbers.[18]
This is a little more esoteric and is generally used in the
implementation of other Standard Library functions, but it is nonetheless
interesting. The raw_storage_iterator is defined in
<algorithm> and is an output iterator. It is provided to enable
algorithms to store their results into uninitialized memory. The interface is
quite simple: the constructor takes an output iterator that is pointing to the
raw memory (thus it is typically a pointer) and the operator= assigns an
object into that raw memory. The template parameters are the type of the output
iterator pointing to the raw storage, and the type of object that will be
stored. Here’s an example which creates Noisy objects (you’ll
be introduced to the Noisy class shortly; it’s not necessary to
know its details for this example):
//: C04:RawStorageIterator.cpp // Demonstrate the raw_storage_iterator #include "Noisy.h" #include <iostream> #include <iterator> #include <algorithm> using namespace std; int main() { const int quantity = 10; // Create raw storage and cast to desired type: Noisy* np = (Noisy*)new char[quantity * sizeof(Noisy)]; raw_storage_iterator<Noisy*, Noisy> rsi(np); for(int i = 0; i < quantity; i++) *rsi++ = Noisy(); // Place objects in storage cout << endl; copy(np, np + quantity, ostream_iterator<Noisy>(cout, " ")); cout << endl; // Explicit destructor call for cleanup: for(int j = 0; j < quantity; j++) (&np[j])->~Noisy(); // Release raw storage: delete (char*)np; } ///:~
To make the raw_storage_iterator template happy, the
raw storage must be of the same type as the objects you’re creating.
That’s why the pointer from the new array of char is cast to a
Noisy*. The assignment operator forces the objects into the raw storage
using the copy-constructor. Note that the explicit destructor call must be made
for proper cleanup, and this also allows the objects to be deleted one at a time
during container manipulation.
If you take a step back from the STL containers you’ll
see that there are really only two types of container: sequences
(including vector, list, deque, stack, queue,
and priority_queue) and associations (including set,
multiset, map and multimap). The sequences keep the objects
in whatever sequence that you establish (either by pushing the objects on the
end or inserting them in the middle).
Since all the sequence containers have the same basic goal (to
maintain your order) they seem relatively interchangeable. However, they differ
in the efficiency of their operations, so if you are going to manipulate a
sequence in a particular fashion you can choose the appropriate container for
those types of manipulations. The “basic” sequence containers are
vector, list and deque – these actually have
fleshed-out implementations, while stack, queue and
priority_queue are built on top of the basic sequences, and represent
more specialized uses rather than differences in underlying structure
(stack, for example, can be implemented using a deque,
vector or list).
So far in this book I have been using vector as a
catch-all container. This was acceptable because I’ve only used the
simplest and safest operations, primarily push_back( ) and
operator[ ]. However, when you start making more sophisticated uses of
containers it becomes important to know more about their underlying
implementations and behavior, so you can make the right choices (and, as
you’ll see, stay out of trouble).
Using a template, the following example shows the operations
that all the basic sequences (vector, deque or list)
support. As you shall learn in the sections on the specific sequence containers,
not all of these operations make sense for each basic sequence, but they are
supported.
//: C04:BasicSequenceOperations.cpp // The operations available for all the // basic sequence Containers. #include <iostream> #include <vector> #include <deque> #include <list> using namespace std; template<typename Container> void print(Container& c, char* s = "") { cout << s << ":" << endl; if(c.empty()) { cout << "(empty)" << endl; return; } typename Container::iterator it; for(it = c.begin(); it != c.end(); it++) cout << *it << " "; cout << endl; cout << "size() " << c.size() << " max_size() "<< c.max_size() << " front() " << c.front() << " back() " << c.back() << endl; } template<typename ContainerOfInt> void basicOps(char* s) { cout << "------- " << s << " -------" << endl; typedef ContainerOfInt Ci; Ci c; print(c, "c after default constructor"); Ci c2(10, 1); // 10 elements, values all 1 print(c2, "c2 after constructor(10,1)"); int ia[] = { 1, 3, 5, 7, 9 }; const int iasz = sizeof(ia)/sizeof(*ia); // Initialize with begin & end iterators: Ci c3(ia, ia + iasz); print(c3, "c3 after constructor(iter,iter)"); Ci c4(c2); // Copy-constructor print(c4, "c4 after copy-constructor(c2)"); c = c2; // Assignment operator print(c, "c after operator=c2"); c.assign(10, 2); // 10 elements, values all 2 print(c, "c after assign(10, 2)"); // Assign with begin & end iterators: c.assign(ia, ia + iasz); print(c, "c after assign(iter, iter)"); cout << "c using reverse iterators:" << endl; typename Ci::reverse_iterator rit = c.rbegin(); while(rit != c.rend()) cout << *rit++ << " "; cout << endl; c.resize(4); print(c, "c after resize(4)"); c.push_back(47); print(c, "c after push_back(47)"); c.pop_back(); print(c, "c after pop_back()"); typename Ci::iterator it = c.begin(); it++; it++; c.insert(it, 74); print(c, "c after insert(it, 74)"); it = c.begin(); it++; c.insert(it, 3, 96); print(c, "c after insert(it, 3, 96)"); it = c.begin(); it++; c.insert(it, c3.begin(), c3.end()); print(c, "c after insert(" "it, c3.begin(), c3.end())"); it = c.begin(); it++; c.erase(it); print(c, "c after erase(it)"); typename Ci::iterator it2 = it = c.begin(); it++; it2++; it2++; it2++; it2++; it2++; c.erase(it, it2); print(c, "c after erase(it, it2)"); c.swap(c2); print(c, "c after swap(c2)"); c.clear(); print(c, "c after clear()"); } int main() { basicOps<vector<int> >("vector"); basicOps<deque<int> >("deque"); basicOps<list<int> >("list"); } ///:~
The first function template, print( ),
demonstrates the basic information you can get from any sequence container:
whether it’s empty, its current size, the size of the largest possible
container, the element at the beginning and the element at the end. You can also
see that every container has begin( ) and end( ) methods
that return iterators.
The basicOps( ) function tests everything else
(and in turn calls print( )), including a variety of constructors:
default, copy-constructor, quantity and initial value, and beginning and ending
iterators. There’s an assignment operator= and two kinds of
assign( ) member functions, one which takes a quantity and initial
value and the other which take a beginning and ending iterator.
All the basic sequence containers are reversible containers,
as shown by the use of the rbegin( ) and rend( ) member
functions. A sequence container can be resized, and the entire contents of the
container can be removed with clear( ).
Using an iterator to indicate where you want to start
inserting into any sequence container, you can insert( ) a single
element, a number of elements that all have the same value, and a group of
elements from another container using the beginning and ending iterators of that
group.
To erase( ) a single element from the middle, use
an iterator; to erase( ) a range of elements, use a pair of
iterators. Notice that since a list only supports bidirectional
iterators, all the iterator motion must be performed with increments and
decrements (if the containers were limited to vector and deque,
which produce random-access iterators, then operator+ and
operator- could have been used to move the iterators in big
jumps).
Although both list and deque support
push_front( ) and pop_front( ), vector does not,
so the only member functions that work with all three are
push_back( ) and pop_back( ).
The naming of the member function swap( ) is a
little confusing, since there’s also a non-member swap( )
algorithm that switches two elements of a container. The member
swap( ), however, swaps everything in one container for
another (if the containers hold the same type), effectively swapping the
containers themselves. There’s also a non-member version of this
function.
The following sections on the sequence containers discuss the
particulars of each type of container.
The vector is intentionally made to look like a
souped-up array, since it has array-style indexing but also can expand
dynamically. vector is so fundamentally useful that it was introduced in
a very primitive way early in this book, and used quite regularly in previous
examples. This section will give a more in-depth look at
vector.
To achieve maximally-fast indexing and iteration, the
vector maintains its storage as a single contiguous array of objects.
This is a critical point to observe in understanding the behavior of
vector. It means that indexing and iteration are lighting-fast, being
basically the same as indexing and iterating over an array of objects. But it
also means that inserting an object anywhere but at the end (that is, appending)
is not really an acceptable operation for a vector. It also means that
when a vector runs out of pre-allocated storage, in order to maintain its
contiguous array it must allocate a whole new (larger) chunk of storage
elsewhere and copy the objects to the new storage. This has a number of
unpleasant side effects.
A vector starts by grabbing a block of storage, as if
it’s taking a guess at how many objects you plan to put in it. As long as
you don’t try to put in more objects than can be held in the initial block
of storage, everything is very rapid and efficient (note that if you do
know how many objects to expect, you can pre-allocate storage using
reserve( )). But eventually you will put in one too many objects
and, unbeknownst to you, the vector responds by:
For complex objects, this copy-construction and
destruction can end up being very expensive if you overfill your vector a lot.
To see what happens when you’re filling a vector, here is a class
that prints out information about its creations, destructions, assignments and
copy-constructions:
//: C04:Noisy.h // A class to track various object activities #ifndef NOISY_H #define NOISY_H #include <iostream> class Noisy { static long create, assign, copycons, destroy; long id; public: Noisy() : id(create++) { std::cout << "d[" << id << "]"; } Noisy(const Noisy& rv) : id(rv.id) { std::cout << "c[" << id << "]"; copycons++; } Noisy& operator=(const Noisy& rv) { std::cout << "(" << id << ")=[" << rv.id << "]"; id = rv.id; assign++; return *this; } friend bool operator<(const Noisy& lv, const Noisy& rv) { return lv.id < rv.id; } friend bool operator==(const Noisy& lv, const Noisy& rv) { return lv.id == rv.id; } ~Noisy() { std::cout << "~[" << id << "]"; destroy++; } friend std::ostream& operator<<(std::ostream& os, const Noisy& n) { return os << n.id; } friend class NoisyReport; }; struct NoisyGen { Noisy operator()() { return Noisy(); } }; // A singleton. Will automatically report the // statistics as the program terminates: class NoisyReport { static NoisyReport nr; NoisyReport() {} // Private constructor public: ~NoisyReport() { std::cout << "\n-------------------\n" << "Noisy creations: " << Noisy::create << "\nCopy-Constructions: " << Noisy::copycons << "\nAssignments: " << Noisy::assign << "\nDestructions: " << Noisy::destroy << std::endl; } }; // Because of these this file can only be used // in simple test situations. Move them to a // .cpp file for more complex programs: long Noisy::create = 0, Noisy::assign = 0, Noisy::copycons = 0, Noisy::destroy = 0; NoisyReport NoisyReport::nr; #endif // NOISY_H ///:~
Each Noisy object has its own identifier, and there are
static variables to keep track of all the creations, assignments (using
operator=), copy-constructions and destructions. The id is
initialized using the create counter inside the default constructor; the
copy-constructor and assignment operator take their id values from the
rvalue. Of course, with operator= the lvalue is already an initialized
object so the old value of id is printed before it is overwritten with
the id from the rvalue.
In order to support certain operations like sorting and
searching (which are used implicitly by some of the containers), Noisy
must have an operator< and operator==. These simply compare the
id values. The operator<< for ostream follows the
standard form and simply prints the id.
NoisyGen produces a function object (since it has an
operator( )) that is used to automatically generate Noisy
objects during testing.
NoisyReport is a type of class called a
singleton, which is a “design pattern” (these are covered
more fully in Chapter XX). Here, the goal is to make sure there is one and only
one NoisyReport object, because it is responsible for printing out the
results at program termination. It has a private constructor so no one
else can make a NoisyReport object, and a single static instance of
NoisyReport called nr. The only executable statements are in the
destructor, which is called as the program exits and the static destructors are
called; this destructor prints out the statistics captured by the static
variables in Noisy.
The one snag to this header file is the inclusion of the
definitions for the statics at the end. If you include this header in
more than one place in your project, you’ll get multiple-definition errors
at link time. Of course, you can put the static definitions in a
separate cpp file and link it in, but that is less convenient, and since
Noisy is just intended for quick-and-dirty experiments the header file
should be reasonable for most situations.
Using Noisy.h, the following program will show the
behaviors that occur when a vector overflows its currently allocated
storage:
//: C04:VectorOverflow.cpp // Shows the copy-construction and destruction // That occurs when a vector must reallocate // (It maintains a linear array of elements) #include "Noisy.h" #include "../require.h" #include <vector> #include <iostream> #include <string> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); int size = 1000; if(argc >= 2) size = atoi(argv[1]); vector<Noisy> vn; Noisy n; for(int i = 0; i < size; i++) vn.push_back(n); cout << "\n cleaning up \n"; } ///:~
You can either use the default value of 1000, or use your own
value by putting it on the command-line.
When you run this program, you’ll see a single default
constructor call (for n), then a lot of copy-constructor calls, then some
destructor calls, then some more copy-constructor calls, and so on. When the
vector runs out of space in the linear array of bytes it has allocated, it must
(to maintain all the objects in a linear array, which is an essential part of
its job) get a bigger piece of storage and move everything over, copying first
and then destroying the old objects. You can imagine that if you store a lot of
large and complex objects, this process could rapidly become
prohibitive.
There are two solutions to this problem. The nicest one
requires that you know beforehand how many objects you’re going to make.
In that case you can use reserve( ) to tell the vector how much
storage to pre-allocate, thus eliminating all the copies and destructions and
making everything very fast (especially random access to the objects with
operator[ ]). Note that the use of reserve( ) is different
from using the vector constructor with an integral first argument; the
latter initializes each element using the default copy-constructor.
However, in the more general case you won’t know how
many objects you’ll need. If vector reallocations are slowing
things down, you can change sequence containers. You could use a list,
but as you’ll see, the deque allows speedy insertions at
either end of the sequence, and never needs to copy or destroy objects as it
expands its storage. The deque also allows random access with
operator[ ], but it’s not quite as fast as vector’s
operator[ ]. So in the case where you’re creating all your
objects in one part of the program and randomly accessing them in another, you
may find yourself filling a deque, then creating a vector from the
deque and using the vector for rapid indexing. Of course, you
don’t want to program this way habitually, just be aware of these issues
(avoid premature optimization).
There is a darker side to vector’s reallocation
of memory, however. Because vector keeps its objects in a nice, neat
array (allowing, for one thing, maximally-fast random access), the iterators
used by vector are generally just pointers. This is a good thing –
of all the sequence containers, these pointers allow the fastest selection and
manipulation. However, consider what happens when you’re holding onto an
iterator (i.e. a pointer) and then you add the one additional object that causes
the vector to reallocate storage and move it elsewhere. Your pointer is
now pointing off into nowhere:
//: C04:VectorCoreDump.cpp // How to break a program using a vector #include <vector> #include <iostream> using namespace std; int main() { vector<int> vi(10, 0); ostream_iterator<int> out(cout, " "); copy(vi.begin(), vi.end(), out); vector<int>::iterator i = vi.begin(); cout << "\n i: " << long(i) << endl; *i = 47; copy(vi.begin(), vi.end(), out); // Force it to move memory (could also just add // enough objects): vi.resize(vi.capacity() + 1); // Now i points to wrong memory: cout << "\n i: " << long(i) << endl; cout << "vi.begin(): " << long(vi.begin()); *i = 48; // Access violation } ///:~
If your program is breaking mysteriously, look for places
where you hold onto an iterator while adding more objects to a vector.
You’ll need to get a new iterator after adding elements, or use
operator[ ] instead for element selections. If you combine the above
observation with the awareness of the potential expense of adding new objects to
a vector, you may conclude that the safest way to use one is to fill it
up all at once (ideally, knowing first how many objects you’ll need) and
then just use it (without adding more objects) elsewhere in the program. This is
the way vector has been used in the book up to this point.
You may observe that using vector as the
“basic” container in the earlier chapters of this book may not be
the best choice in all cases. This is a fundamental issue in containers, and in
data structures in general: the “best” choice varies according to
the way the container is used. The reason vector has been the
“best” choice up until now is that it looks a lot like an array, and
was thus familiar and easy for you to adopt. But from now on it’s also
worth thinking about other issues when choosing
containers.
The vector is most efficient if:
It is possible to insert and
erase elements from the middle of a vector using an iterator, but the
following program demonstrates what a bad idea it is:
//: C04:VectorInsertAndErase.cpp // Erasing an element from a vector #include "Noisy.h" #include <iostream> #include <vector> #include <algorithm> using namespace std; int main() { vector<Noisy> v; v.reserve(11); cout << "11 spaces have been reserved" << endl; generate_n(back_inserter(v), 10, NoisyGen()); ostream_iterator<Noisy> out(cout, " "); cout << endl; copy(v.begin(), v.end(), out); cout << "Inserting an element:" << endl; vector<Noisy>::iterator it = v.begin() + v.size() / 2; // Middle v.insert(it, Noisy()); cout << endl; copy(v.begin(), v.end(), out); cout << "\nErasing an element:" << endl; // Cannot use the previous value of it: it = v.begin() + v.size() / 2; v.erase(it); cout << endl; copy(v.begin(), v.end(), out); cout << endl; } ///:~
When you run the program you’ll see that the call to
reserve( ) really does only allocate storage – no constructors
are called. The generate_n( ) call is pretty busy: each call to
NoisyGen::operator( ) results in a construction, a copy-construction
(into the vector) and a destruction of the temporary. But when an object
is inserted into the vector in the middle, it must shove everything down
to maintain the linear array and – since there is enough space – it
does this with the assignment operator (if the argument of
reserve( ) is 10 instead of eleven then it would have to reallocate
storage). When an object is erased from the vector, the assignment
operator is once again used to move everything up to cover the place that is
being erased (notice that this requires that the assignment operator properly
cleans up the lvalue). Lastly, the object on the end of the array is
deleted.
You can imagine how enormous the overhead can become if
objects are inserted and removed from the middle of a vector if the
number of elements is large and the objects are complicated. It’s
obviously a practice to avoid.
The deque (double-ended-queue, pronounced
“deck”) is the basic sequence container optimized for adding and
removing elements from either end. It also allows for reasonably fast random
access – it has an operator[ ] like vector. However, it does
not have vector’s constraint of keeping everything in a single
sequential block of memory. Instead, deque uses multiple blocks of
sequential storage (keeping track of all the blocks and their order in a mapping
structure). For this reason the overhead for a deque to add or remove
elements at either end is very low. In addition, it never needs to copy and
destroy contained objects during a new storage allocation (like vector
does) so it is far more efficient than vector if you are adding an
unknown quantity of objects. This means that vector is the best choice
only if you have a pretty good idea of how many objects you need. In addition,
many of the programs shown earlier in this book that use vector and
push_back( ) might be more efficient with a deque. The
interface to deque is only slightly different from a vector (deque
has a push_front( ) and pop_front( ) while vector
does not, for example) so converting code from using vector to using
deque is almost trivial. Consider StringVector.cpp, which can be
changed to use deque by replacing the word “vector” with
“deque” everywhere. The following program adds parallel deque
operations to the vector operations in StringVector.cpp, and
performs timing comparisons:
//: C04:StringDeque.cpp // Converted from StringVector.cpp #include "../require.h" #include <string> #include <deque> #include <vector> #include <fstream> #include <iostream> #include <iterator> #include <sstream> #include <ctime> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); vector<string> vstrings; deque<string> dstrings; string line; // Time reading into vector: clock_t ticks = clock(); while(getline(in, line)) vstrings.push_back(line); ticks = clock() - ticks; cout << "Read into vector: " << ticks << endl; // Repeat for deque: ifstream in2(argv[1]); assure(in2, argv[1]); ticks = clock(); while(getline(in2, line)) dstrings.push_back(line); ticks = clock() - ticks; cout << "Read into deque: " << ticks << endl; // Now compare indexing: ticks = clock(); for(int i = 0; i < vstrings.size(); i++) { ostringstream ss; ss << i; vstrings[i] = ss.str() + ": " + vstrings[i]; } ticks = clock() - ticks; cout << "Indexing vector: " << ticks << endl; ticks = clock(); for(int j = 0; j < dstrings.size(); j++) { ostringstream ss; ss << j; dstrings[j] = ss.str() + ": " + dstrings[j]; } ticks = clock() - ticks; cout << "Indexing deqeue: " << ticks << endl; // Compare iteration ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp"); ticks = clock(); copy(vstrings.begin(), vstrings.end(), ostream_iterator<string>(tmp1, "\n")); ticks = clock() - ticks; cout << "Iterating vector: " << ticks << endl; ticks = clock(); copy(dstrings.begin(), dstrings.end(), ostream_iterator<string>(tmp2, "\n")); ticks = clock() - ticks; cout << "Iterating deqeue: " << ticks << endl; } ///:~
Knowing now what you do about the inefficiency of adding
things to vector because of storage reallocation, you may expect dramatic
differences between the two. However, on a 1.7 Megabyte text file one
compiler’s program produced the following (measured in platform/compiler
specific clock ticks, not seconds):
Read into vector: 8350 Read into deque: 7690 Indexing vector: 2360 Indexing deqeue: 2480 Iterating vector: 2470 Iterating deqeue: 2410
A different compiler and platform roughly agreed with this.
It’s not so dramatic, is it? This points out some important
issues:
Of course, this doesn’t mean you
shouldn’t use a deque rather than a vector when you know
that an uncertain number of objects will be pushed onto the end of the
container. On the contrary, you should – when you’re tuning for
performance. But you should also be aware that performance issues are usually
not where you think they are, and the only way to know for sure where your
bottlenecks are is by testing. Later in this chapter there will be a more
“pure” comparison of performance between vector, deque
and list.
Sometimes you need the behavior or efficiency of one kind of
container for one part of your program, and a different container’s
behavior or efficiency in another part of the program. For example, you may need
the efficiency of a deque when adding objects to the container but the
efficiency of a vector when indexing them. Each of the basic sequence
containers (vector, deque and list) has a two-iterator
constructor (indicating the beginning and ending of the sequence to read from
when creating a new object) and an assign( ) member function to read
into an existing container, so you can easily move objects from one sequence
container to another.
The following example reads objects into a deque and
then converts to a vector:
//: C04:DequeConversion.cpp // Reading into a Deque, converting to a vector #include "Noisy.h" #include <deque> #include <vector> #include <iostream> #include <algorithm> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { int size = 25; if(argc >= 2) size = atoi(argv[1]); deque<Noisy> d; generate_n(back_inserter(d), size, NoisyGen()); cout << "\n Converting to a vector(1)" << endl; vector<Noisy> v1(d.begin(), d.end()); cout << "\n Converting to a vector(2)" << endl; vector<Noisy> v2; v2.reserve(d.size()); v2.assign(d.begin(), d.end()); cout << "\n Cleanup" << endl; } ///:~
You can try various sizes, but you should see that it makes no
difference – the objects are simply copy-constructed into the new
vectors. What’s interesting is that v1 does not cause
multiple allocations while building the vector, no matter how many
elements you use. You might initially think that you must follow the process
used for v2 and preallocate the storage to prevent messy reallocations,
but the constructor used for v1 determines the memory need ahead of time
so this is unnecessary.
It’s illuminating to see what happens with a
deque when it overflows a block of storage, in contrast with
VectorOverflow.cpp:
//: C04:DequeOverflow.cpp // A deque is much more efficient than a vector // when pushing back a lot of elements, since it // doesn't require copying and destroying. #include "Noisy.h" #include <deque> #include <cstdlib> using namespace std; int main(int argc, char* argv[]) { int size = 1000; if(argc >= 2) size = atoi(argv[1]); deque<Noisy> dn; Noisy n; for(int i = 0; i < size; i++) dn.push_back(n); cout << "\n cleaning up \n"; } ///:~
Here you will never see any destructors before the words
“cleaning up” appear. Since the deque allocates all its
storage in blocks instead of a contiguous array like vector, it never
needs to move existing storage (thus no additional copy-constructions and
destructions occur). It simply allocates a new block. For the same reason, the
deque can just as efficiently add elements to the beginning of the
sequence, since if it runs out of storage it (again) just allocates a new block
for the beginning. Insertions in the middle of a deque, however, could be
even messier than for vector (but not as costly).
Because a deque never moves its storage, a held
iterator never becomes invalid when you add new things to either end of a deque,
as it was demonstrated to do with vector (in VectorCoreDump.cpp).
However, it’s still possible (albeit harder) to do bad things:
//: C04:DequeCoreDump.cpp // How to break a program using a deque #include <queue> #include <iostream> using namespace std; int main() { deque<int> di(100, 0); // No problem iterating from beginning to end, // even though it spans multiple blocks: copy(di.begin(), di.end(), ostream_iterator<int>(cout, " ")); deque<int>::iterator i = // In the middle: di.begin() + di.size() / 2;; // Walk the iterator forward as you perform // a lot of insertions in the middle: for(int j = 0; j < 1000; j++) { cout << j << endl; di.insert(i++, 1); // Eventually breaks } } ///:~
Of course, there are two things here that you wouldn’t
normally do with a deque: first, elements are inserted in the middle,
which deque allows but isn’t designed for. Second, calling
insert( ) repeatedly with the same iterator would not ordinarily
cause an access violation, but the iterator is walked forward after each
insertion. I’m guessing it eventually walks off the end of a block, but
I’m not sure what actually causes the problem.
If you stick to what deque is best at –
insertions and removals from either end, reasonably rapid traversals and fairly
fast random-access using operator[ ] – you’ll be in good
shape.
Both vector and deque provide two ways to
perform random access of their elements: the operator[ ], which
you’ve seen already, and at( ), which checks the boundaries of
the container that’s being indexed and throws an exception if you go out
of bounds. It does cost more to use at( ):
//: C04:IndexingVsAt.cpp // Comparing "at()" to operator[] #include "../require.h" #include <vector> #include <deque> #include <iostream> #include <ctime> using namespace std; int main(int argc, char* argv[]) { requireMinArgs(argc, 1); long count = 1000; int sz = 1000; if(argc >= 2) count = atoi(argv[1]); if(argc >= 3) sz = atoi(argv[2]); vector<int> vi(sz); clock_t ticks = clock(); for(int i1 = 0; i1 < count; i1++) for(int j = 0; j < sz; j++) vi[j]; cout << "vector[]" << clock() - ticks << endl; ticks = clock(); for(int i2 = 0; i2 < count; i2++) for(int j = 0; j < sz; j++) vi.at(j); cout << "vector::at()" << clock()-ticks <<endl; deque<int> di(sz); ticks = clock(); for(int i3 = 0; i3 < count; i3++) for(int j = 0; j < sz; j++) di[j]; cout << "deque[]" << clock() - ticks << endl; ticks = clock(); for(int i4 = 0; i4 < count; i4++) for(int j = 0; j < sz; j++) di.at(j); cout << "deque::at()" << clock()-ticks <<endl; // Demonstrate at() when you go out of bounds: di.at(vi.size() + 1); } ///:~
As you’ll learn in the exception-handling chapter,
different systems may handle the uncaught exception in different ways, but
you’ll know one way or another that something went wrong with the program
when using at( ), whereas it’s possible to go blundering ahead
using operator[ ].
A list is implemented as a doubly-linked list and is
thus designed for rapid insertion and removal of elements in the middle of the
sequence (whereas for vector and deque this is a much more costly
operation). A list is so slow when randomly accessing elements that it does not
have an operator[ ]. It’s best used when you’re traversing a
sequence, in order, from beginning to end (or end to beginning) rather than
choosing elements randomly from the middle. Even then the traversal is
significantly slower than either a vector or a deque, but if you
aren’t doing a lot of traversals that won’t be your
bottleneck.
Another thing to be aware of with a list is the memory
overhead of each link, which requires a forward and backward pointer on top of
the storage for the actual object. Thus a list is a better choice when
you have larger objects that you’ll be inserting and removing from the
middle of the list. It’s better not to use a list if you
think you might be traversing it a lot, looking for objects, since the amount of
time it takes to get from the beginning of the list – which is the
only place you can start unless you’ve already got an iterator to
somewhere you know is closer to your destination – to the object of
interest is proportional to the number of objects between the beginning and that
object.
The objects in a list never move after they are
created; “moving” a list element means changing the links, but never
copying or assigning the actual objects. This means that a held iterator never
moves when you add new things to a list as it was demonstrated to do in
vector. Here’s an example using the Noisy class:
//: C04:ListStability.cpp // Things don't move around in lists #include "Noisy.h" #include <list> #include <iostream> #include <algorithm> using namespace std; int main() { list<Noisy> l; ostream_iterator<Noisy> out(cout, " "); generate_n(back_inserter(l), 25, NoisyGen()); cout << "\n Printing the list:" << endl; copy(l.begin(), l.end(), out); cout << "\n Reversing the list:" << endl; l.reverse(); copy(l.begin(), l.end(), out); cout << "\n Sorting the list:" << endl; l.sort(); copy(l.begin(), l.end(), out); cout << "\n Swapping two elements:" << endl; list<Noisy>::iterator it1, it2; it1 = it2 = l.begin(); it2++; swap(*it1, *it2); cout << endl; copy(l.begin(), l.end(), out); cout << "\n Using generic reverse(): " << endl; reverse(l.begin(), l.end()); cout << endl; copy(l.begin(), l.end(), out); cout << "\n Cleanup" << endl; } ///:~
Operations as seemingly radical as reversing and sorting the
list require no copying of objects, because instead of moving the objects, the
links are simply changed. However, notice that sort( ) and
reverse( ) are member functions of list, so they have special
knowledge of the internals of list and can perform the pointer movement
instead of copying. On the other hand, the swap( ) function is a
generic algorithm, and doesn’t know about list in particular and so
it uses the copying approach for swapping two elements. There are also generic
algorithms for sort( ) and reverse( ), but if you try to
use these you’ll discover that the generic reverse( ) performs
lots of copying and destruction (so you should never use it with a list)
and the generic sort( ) simply doesn’t work because it
requires random-access iterators that list doesn’t provide (a
definite benefit, since this would certainly be an expensive way to sort
compared to list’s own sort( )). The generic
sort( ) and reverse( ) should only be used with arrays,
vectors and deques.
If you have large and complex objects you may want to choose a
list first, especially if construction, destruction, copy-construction
and assignment are expensive and if you are doing things like sorting the
objects or otherwise reordering them a
lot.
The list has some special operations that are built-in
to make the best use of the structure of the list. You’ve already
seen reverse( ) and sort( ), and here are some of the
others in use:
//: C04:ListSpecialFunctions.cpp #include "Noisy.h" #include <list> #include <iostream> #include <algorithm> using namespace std; ostream_iterator<Noisy> out(cout, " "); void print(list<Noisy>& ln, char* comment = "") { cout << "\n" << comment << ":\n"; copy(ln.begin(), ln.end(), out); cout << endl; } int main() { typedef list<Noisy> LN; LN l1, l2, l3, l4; generate_n(back_inserter(l1), 6, NoisyGen()); generate_n(back_inserter(l2), 6, NoisyGen()); generate_n(back_inserter(l3), 6, NoisyGen()); generate_n(back_inserter(l4), 6, NoisyGen()); print(l1, "l1"); print(l2, "l2"); print(l3, "l3"); print(l4, "l4"); LN::iterator it1 = l1.begin(); it1++; it1++; it1++; l1.splice(it1, l2); print(l1, "l1 after splice(it1, l2)"); print(l2, "l2 after splice(it1, l2)"); LN::iterator it2 = l3.begin(); it2++; it2++; it2++; l1.splice(it1, l3, it2); print(l1, "l1 after splice(it1, l3, it2)"); LN::iterator it3 = l4.begin(), it4 = l4.end(); it3++; it4--; l1.splice(it1, l4, it3, it4); print(l1, "l1 after splice(it1,l4,it3,it4)"); Noisy n; LN l5(3, n); generate_n(back_inserter(l5), 4, NoisyGen()); l5.push_back(n); print(l5, "l5 before remove()"); l5.remove(l5.front()); print(l5, "l5 after remove()"); l1.sort(); l5.sort(); l5.merge(l1); print(l5, "l5 after l5.merge(l1)"); cout << "\n Cleanup" << endl; } ///:~
The print( ) function is used to display results.
After filling four lists with Noisy objects, one list is spliced
into another in three different ways. In the first, the entire list l2 is
spliced into l1 at the iterator it1. Notice that after the splice,
l2 is empty – splicing means removing the elements from the source
list. The second splice inserts elements from l3 starting at it2
into l1 starting at it1. The third splice starts at it1 and
uses elements from l4 starting at it3 and ending at it4
(the seemingly-redundant mention of the source list is because the elements must
be erased from the source list as part of the transfer to the destination
list).
The output from the code that demonstrates
remove( ) shows that the list does not have to be sorted in order
for all the elements of a particular value to be removed.
Finally, if you merge( ) one list with another,
the merge only works sensibly if the lists have been sorted. What you end up
with in that case is a sorted list containing all the elements from both lists
(the source list is erased – that is, the elements are moved to the
destination list).
There’s also a unique( ) member function
that removes all duplicates, but only if the list has been sorted
first:
//: C04:UniqueList.cpp // Testing list's unique() function #include <list> #include <iostream> using namespace std; int a[] = { 1, 3, 1, 4, 1, 5, 1, 6, 1 }; const int asz = sizeof a / sizeof *a; int main() { // For output: ostream_iterator<int> out(cout, " "); list<int> li(a, a + asz); li.unique(); // Oops! No duplicates removed: copy(li.begin(), li.end(), out); cout << endl; // Must sort it first: li.sort(); copy(li.begin(), li.end(), out); cout << endl; // Now unique() will have an effect: li.unique(); copy(li.begin(), li.end(), out); cout << endl; } ///:~
The list constructor used here takes the starting and
past-the-end iterator from another container, and it copies all the elements
from that container into itself (a similar constructor is available for all the
containers). Here, the “container” is just an array, and the
“iterators” are pointers into that array, but because of the design
of the STL it works with arrays just as easily as any other container.
If you run this program, you’ll see that
unique( ) will only remove adjacent duplicate elements, and
thus sorting is necessary before calling unique( ).
There are four additional list member functions that
are not demonstrated here: a remove_if( ) that takes a predicate
which is used to decide whether an object should be removed, a
unique( ) that takes a binary predicate to perform uniqueness
comparisons, a merge( ) that takes an additional argument which
performs comparisons, and a sort( ) that takes a comparator (to
provide a comparison or override the existing one).
Looking at the previous example you may note that if you want
a sorted list with no duplicates, a set can give you that, right?
It’s interesting to compare the performance of the two
containers:
//: C04:ListVsSet.cpp // Comparing list and set performance #include <iostream> #include <list> #include <set> #include <algorithm> #include <ctime> #include <cstdlib> using namespace std; class Obj { int a[20]; // To take up extra space int val; public: Obj() : val(rand() % 500) {} friend bool operator<(const Obj& a, const Obj& b) { return a.val < b.val; } friend bool operator==(const Obj& a, const Obj& b) { return a.val == b.val; } friend ostream& operator<<(ostream& os, const Obj& a) { return os << a.val; } }; template<class Container> void print(Container& c) { typename Container::iterator it; for(it = c.begin(); it != c.end(); it++) cout << *it << " "; cout << endl; } struct ObjGen { Obj operator()() { return Obj(); } }; int main() { const int sz = 5000; srand(time(0)); list<Obj> lo; clock_t ticks = clock(); generate_n(back_inserter(lo), sz, ObjGen()); lo.sort(); lo.unique(); cout << "list:" << clock() - ticks << endl; set<Obj> so; ticks = clock(); generate_n(inserter(so, so.begin()), sz, ObjGen()); cout << "set:" << clock() - ticks << endl; print(lo); print(so); } ///:~
When you run the program, you should discover that set
is much faster than list. This is reassuring – after all, it
is set’s primary job
description!
It turns out that all basic sequences have a member function
swap( ) that’s designed to switch one sequence with another
(however, this swap( ) is only defined for sequences of the same
type). The member swap( ) makes use of its knowledge of the internal
structure of the particular container in order to be efficient:
//: C04:Swapping.cpp // All basic sequence containers can be swapped #include "Noisy.h" #include <list> #include <vector> #include <deque> #include <iostream> #include <algorithm> using namespace std; ostream_iterator<Noisy> out(cout, " "); template<class Cont> void print(Cont& c, char* comment = "") { cout << "\n" << comment << ": "; copy(c.begin(), c.end(), out); cout << endl; } template<class Cont> void testSwap(char* cname) { Cont c1, c2; generate_n(back_inserter(c1), 10, NoisyGen()); generate_n(back_inserter(c2), 5, NoisyGen()); cout << "\n" << cname << ":" << endl; print(c1, "c1"); print(c2, "c2"); cout << "\n Swapping the " << cname << ":" << endl; c1.swap(c2); print(c1, "c1"); print(c2, "c2"); } int main() { testSwap<vector<Noisy> >("vector"); testSwap<deque<Noisy> >("deque"); testSwap<list<Noisy> >("list"); } ///:~
When you run this, you’ll discover that each type of
sequence container is able to swap one sequence for another without any copying
or assignments, even if the sequences are of different sizes. In effect,
you’re completely swapping the memory of one object for another.
The STL algorithms also contain a swap( ), and
when this function is applied to two containers of the same type, it will use
the member swap( ) to achieve fast performance. Consequently, if you
apply the sort( ) algorithm to a container of containers, you will
find that the performance is very fast – it turns out that fast sorting of
a container of containers was a design goal of the
STL.
To break a list, you have to work pretty
hard:
//: C04:ListRobustness.cpp // lists are harder to break #include <list> #include <iostream> using namespace std; int main() { list<int> li(100, 0); list<int>::iterator i = li.begin(); for(int j = 0; j < li.size() / 2; j++) i++; // Walk the iterator forward as you perform // a lot of insertions in the middle: for(int k = 0; k < 1000; k++) li.insert(i++, 1); // No problem li.erase(i); i++; *i = 2; // Oops! It's invalid } ///:~
When the link that the iterator i was pointing to was
erased, it was unlinked from the list and thus became invalid. Trying to move
forward to the “next link” from an invalid link is poorly-formed
code. Notice that the operation that broke deque in
DequeCoreDump.cpp is perfectly fine with a
list.
To get a better feel for the differences between the sequence
containers, it’s illuminating to race them against each other while
performing various operations.
//: C04:SequencePerformance.cpp // Comparing the performance of the basic // sequence containers for various operations #include <vector> #include <queue> #include <list> #include <iostream> #include <string> #include <typeinfo> #include <ctime> #include <cstdlib> using namespace std; class FixedSize { int x[20]; // Automatic generation of default constructor, // copy-constructor and operator= } fs; template<class Cont> struct InsertBack { void operator()(Cont& c, long count) { for(long i = 0; i < count; i++) c.push_back(fs); } char* testName() { return "InsertBack"; } }; template<class Cont> struct InsertFront { void operator()(Cont& c, long count) { long cnt = count * 10; for(long i = 0; i < cnt; i++) c.push_front(fs); } char* testName() { return "InsertFront"; } }; template<class Cont> struct InsertMiddle { void operator()(Cont& c, long count) { typename Cont::iterator it; long cnt = count / 10; for(long i = 0; i < cnt; i++) { // Must get the iterator every time to keep // from causing an access violation with // vector. Increment it to put it in the // middle of the container: it = c.begin(); it++; c.insert(it, fs); } } char* testName() { return "InsertMiddle"; } }; template<class Cont> struct RandomAccess { // Not for list void operator()(Cont& c, long count) { int sz = c.size(); long cnt = count * 100; for(long i = 0; i < cnt; i++) c[rand() % sz]; } char* testName() { return "RandomAccess"; } }; template<class Cont> struct Traversal { void operator()(Cont& c, long count) { long cnt = count / 100; for(long i = 0; i < cnt; i++) { typename Cont::iterator it = c.begin(), end = c.end(); while(it != end) it++; } } char* testName() { return "Traversal"; } }; template<class Cont> struct Swap { void operator()(Cont& c, long count) { int middle = c.size() / 2; typename Cont::iterator it = c.begin(), mid = c.begin(); it++; // Put it in the middle for(int x = 0; x < middle + 1; x++) mid++; long cnt = count * 10; for(long i = 0; i < cnt; i++) swap(*it, *mid); } char* testName() { return "Swap"; } }; template<class Cont> struct RemoveMiddle { void operator()(Cont& c, long count) { long cnt = count / 10; if(cnt > c.size()) { cout << "RemoveMiddle: not enough elements" << endl; return; } for(long i = 0; i < cnt; i++) { typename Cont::iterator it = c.begin(); it++; c.erase(it); } } char* testName() { return "RemoveMiddle"; } }; template<class Cont> struct RemoveBack { void operator()(Cont& c, long count) { long cnt = count * 10; if(cnt > c.size()) { cout << "RemoveBack: not enough elements" << endl; return; } for(long i = 0; i < cnt; i++) c.pop_back(); } char* testName() { return "RemoveBack"; } }; template<class Op, class Container> void measureTime(Op f, Container& c, long count){ string id(typeid(f).name()); bool Deque = id.find("deque") != string::npos; bool List = id.find("list") != string::npos; bool Vector = id.find("vector") !=string::npos; string cont = Deque ? "deque" : List ? "list" : Vector? "vector" : "unknown"; cout << f.testName() << " for " << cont << ": "; // Standard C library CPU ticks: clock_t ticks = clock(); f(c, count); // Run the test ticks = clock() - ticks; cout << ticks << endl; } typedef deque<FixedSize> DF; typedef list<FixedSize> LF; typedef vector<FixedSize> VF; int main(int argc, char* argv[]) { srand(time(0)); long count = 1000; if(argc >= 2) count = atoi(argv[1]); DF deq; LF lst; VF vec, vecres; vecres.reserve(count); // Preallocate storage measureTime(InsertBack<VF>(), vec, count); measureTime(InsertBack<VF>(), vecres, count); measureTime(InsertBack<DF>(), deq, count); measureTime(InsertBack<LF>(), lst, count); // Can't push_front() with a vector: //! measureTime(InsertFront<VF>(), vec, count); measureTime(InsertFront<DF>(), deq, count); measureTime(InsertFront<LF>(), lst, count); measureTime(InsertMiddle<VF>(), vec, count); measureTime(InsertMiddle<DF>(), deq, count); measureTime(InsertMiddle<LF>(), lst, count); measureTime(RandomAccess<VF>(), vec, count); measureTime(RandomAccess<DF>(), deq, count); // Can't operator[] with a list: //! measureTime(RandomAccess<LF>(), lst, count); measureTime(Traversal<VF>(), vec, count); measureTime(Traversal<DF>(), deq, count); measureTime(Traversal<LF>(), lst, count); measureTime(Swap<VF>(), vec, count); measureTime(Swap<DF>(), deq, count); measureTime(Swap<LF>(), lst, count); measureTime(RemoveMiddle<VF>(), vec, count); measureTime(RemoveMiddle<DF>(), deq, count); measureTime(RemoveMiddle<LF>(), lst, count); vec.resize(vec.size() * 10); // Make it bigger measureTime(RemoveBack<VF>(), vec, count); measureTime(RemoveBack<DF>(), deq, count); measureTime(RemoveBack<LF>(), lst, count); } ///:~
This example makes heavy use of templates to eliminate
redundancy, save space, guarantee identical code and improve clarity. Each test
is represented by a class that is templatized on the container it will operate
on. The test itself is inside the operator( ) which, in each case,
takes a reference to the container and a repeat count – this count is not
always used exactly as it is, but sometimes increased or decreased to prevent
the test from being too short or too long. The repeat count is just a factor,
and all tests are compared using the same value.
Each test class also has a member function that returns its
name, so that it can easily be printed. You might think that this should be
accomplished using run-time type identification, but since the actual name of
the class involves a template expansion, this turns out to be the more direct
approach.
The measureTime( ) function template takes as its
first template argument the operation that it’s going to test –
which is itself a class template selected from the group defined previously in
the listing. The template argument Op will not only contain the name of
the class, but also (decorated into it) the type of the container it’s
working with. The RTTI typeid( ) operation allows the name of the
class to be extracted as a char*, which can then be used to create a
string called id. This string can be searched using
string::find( ) to look for deque, list or
vector. The bool variable that corresponds to the matching
string becomes true, and this is used to properly initialize the
string cont so the container name can be accurately printed, along
with the test name.
Once the type of test and the container being tested has been
printed out, the actual test is quite simple. The Standard C library function
clock( ) is used to capture the starting and ending CPU ticks (this
is typically more fine-grained than trying to measure seconds). Since f
is an object of type Op, which is a class that has an
operator( ), the line:
f(c, count);
is actually calling the operator( ) for the object
f.
In main( ), you can see that each different type
of test is run on each type of container, except for the containers that
don’t support the particular operation being tested (these are commented
out).
When you run the program, you’ll get comparative
performance numbers for your particular compiler and your particular operating
system and platform. Although this is only intended to give you a feel for the
various performance features relative to the other sequences, it is not a bad
way to get a quick-and-dirty idea of the behavior of your library, and also to
compare one library with another.
The set produces a container that will accept only one
of each thing you place in it; it also sorts the elements (sorting isn’t
intrinsic to the conceptual definition of a set, but the STL set stores
its elements in a balanced binary tree to provide rapid lookups, thus producing
sorted results when you traverse it). The first two examples in this chapter
used sets.
Consider the problem of creating an index for a book. You
might like to start with all the words in the book, but you only want one
instance of each word and you want them sorted. Of course, a set is
perfect for this, and solves the problem effortlessly. However, there’s
also the problem of punctuation and any other non-alpha characters, which must
be stripped off to generate proper words. One solution to this problem is to use
the Standard C library function strtok( ), which produces tokens (in
our case, words) given a set of delimiters to strip out:
//: C04:WordList.cpp // Display a list of words used in a document #include "../require.h" #include <string> #include <cstring> #include <set> #include <iostream> #include <fstream> using namespace std; const char* delimiters = " \t;()\"<>:{}[]+-=&*#.,/\\~"; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); set<string> wordlist; string line; while(getline(in, line)) { // Capture individual words: char* s = // Cast probably won’t crash: strtok((char*)line.c_str(), delimiters); while(s) { // Automatic type conversion: wordlist.insert(s); s = strtok(0, delimiters); } } // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
strtok( )
takes the starting address of
a character buffer (the first argument) and looks for delimiters (the second
argument). It replaces the delimiter with a zero, and returns the address of the
beginning of the token. If you call it subsequent times with a first argument of
zero it will continue extracting tokens from the rest of the string until it
finds the end. In this case, the delimiters are those that delimit the keywords
and identifiers of C++, so it extracts these keywords and identifiers. Each word
is turned into a string and placed into the wordlist vector, which
eventually contains the whole file, broken up into words.
You don’t have to use a set just to get a sorted
sequence. You can use the sort( ) function (along with a multitude
of other functions in the STL) on different STL containers. However, it’s
likely that set will be faster.
Some programmers consider strtok( ) to be the
poorest design in the Standard C library because it uses a static buffer
to hold its data between function calls. This means:
For
all these reasons it seems like a good idea to find an alternative for
strtok( ). The following example will use an
istreambuf_iterator (introduced earlier) to move the characters from one
place (which happens to be an istream) to another (which happens to be a
string), depending on whether the Standard C library function
isalpha( ) is true:
//: C04:WordList2.cpp // Eliminating strtok() from Wordlist.cpp #include "../require.h" #include <string> #include <cstring> #include <set> #include <iostream> #include <fstream> #include <iterator> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); istreambuf_iterator<char> p(in), end; set<string> wordlist; while (p != end) { string word; insert_iterator<string> ii(word, word.begin()); // Find the first alpha character: while(!isalpha(*p) && p != end) p++; // Copy until the first non-alpha character: while (isalpha(*p) && p != end) *ii++ = *p++; if (word.size() != 0) wordlist.insert(word); } // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
This example was suggested by Nathan Myers, who invented the
istreambuf_iterator and its relatives. This iterator extracts information
character-by-character from a stream. Although the istreambuf_iterator
template argument might suggest to you that you could extract, for example,
ints instead of char, that’s not the case. The argument must
be of some character type – a regular char or a wide
character.
After the file is open, an istreambuf_iterator called
p is attached to the istream so characters can be extracted from
it. The set<string> called wordlist will be used to hold the
resulting words.
The while loop reads words until the end of the input
stream is found. This is detected using the default constructor for
istreambuf_iterator which produces the past-the-end iterator object
end. Thus, if you want to test to make sure you’re not at the end
of the stream, you simply say p != end.
The second type of iterator that’s used here is the
insert_iterator, which creates an iterator that knows how to insert
objects into a container. Here, the “container” is the string
called word which, for the purposes of insert_iterator, behaves
like a container. The constructor for insert_iterator requires the
container and an iterator indicating where it should start inserting the
characters. You could also use a back_insert_iterator, which requires
that the container have a push_back( ) (string
does).
After the while loop sets everything up, it begins by
looking for the first alpha character, incrementing start until that
character is found. Then it copies characters from one iterator to the other,
stopping when a non-alpha character is found. Each word, assuming it is
non-empty, is added to wordlist.
The above program parses its input into strings of words
containing only alpha characters, but that’s still a special case compared
to the generality of strtok( ). What we’d like now is an
actual replacement for strtok( ) so we’re never tempted to use
it. WordList2.cpp can be modified to create a class called
StreamTokenizer that delivers a new token as a string whenever you
call next( ), according to the delimiters you give it upon
construction (very similar to strtok( )):
//: C04:StreamTokenizer.h // C++ Replacement for Standard C strtok() #ifndef STREAMTOKENIZER_H #define STREAMTOKENIZER_H #include <string> #include <iostream> #include <iterator> class StreamTokenizer { typedef std::istreambuf_iterator<char> It; It p, end; std::string delimiters; bool isDelimiter(char c) { return delimiters.find(c) != std::string::npos; } public: StreamTokenizer(std::istream& is, std::string delim = " \t\n;()\"<>:{}[]+-=&*#" ".,/\\~!0123456789") : p(is), end(It()), delimiters(delim) {} std::string next(); // Get next token }; #endif STREAMTOKENIZER_H ///:~
The default delimiters for the StreamTokenizer
constructor extract words with only alpha characters, as before, but now you can
choose different delimiters to parse different tokens. The implementation of
next( ) looks similar to Wordlist2.cpp:
//: C04:StreamTokenizer.cpp {O} #include "StreamTokenizer.h" using namespace std; string StreamTokenizer::next() { string result; if(p != end) { insert_iterator<string> ii(result, result.begin()); while(isDelimiter(*p) && p != end) p++; while (!isDelimiter(*p) && p != end) *ii++ = *p++; } return result; } ///:~
The first non-delimiter is found, then characters are copied
until a delimiter is found, and the resulting string is returned.
Here’s a test:
//: C04:TokenizeTest.cpp //{L} StreamTokenizer // Test StreamTokenizer #include "StreamTokenizer.h" #include "../require.h" #include <iostream> #include <fstream> #include <set> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); StreamTokenizer words(in); set<string> wordlist; string word; while((word = words.next()).size() != 0) wordlist.insert(word); // Output results: copy(wordlist.begin(), wordlist.end(), ostream_iterator<string>(cout, "\n")); } ///:~
Now the tool is more reusable than before, but it’s
still inflexible, because it can only work with an istream. This
isn’t as bad as it first seems, since a string can be turned into
an istream via an istringstream. But in the next section
we’ll come up with the most general, reusable tokenizing tool, and this
should give you a feeling of what “reusable” really means, and the
effort necessary to create truly reusable
code.
Since the STL containers and algorithms all revolve around
iterators, the most flexible solution will itself be an iterator. You could
think of the TokenIterator as an iterator that wraps itself around any
other iterator that can produce characters. Because it is designed as an input
iterator (the most primitive type of iterator) it can be used with any STL
algorithm. Not only is it a useful tool in itself, the TokenIterator is
also a good example of how you can design your own
iterators.[19]
The TokenIterator is doubly flexible: first, you can
choose the type of iterator that will produce the char input. Second,
instead of just saying what characters represent the delimiters,
TokenIterator will use a predicate which is a function object whose
operator( ) takes a char and decides if it should be in the
token or not. Although the two examples given here have a static concept of what
characters belong in a token, you could easily design your own function object
to change its state as the characters are read, producing a more sophisticated
parser.
The following header file contains the two basic predicates
Isalpha and Delimiters, along with the template for
TokenIterator:
//: C04:TokenIterator.h #ifndef TOKENITERATOR_H #define TOKENITERATOR_H #include <string> #include <iterator> #include <algorithm> #include <cctype> struct Isalpha { bool operator()(char c) { using namespace std; //[[For a compiler bug]] return isalpha(c); } }; class Delimiters { std::string exclude; public: Delimiters() {} Delimiters(const std::string& excl) : exclude(excl) {} bool operator()(char c) { return exclude.find(c) == std::string::npos; } }; template <class InputIter, class Pred = Isalpha> class TokenIterator: public std::iterator< std::input_iterator_tag,std::string,ptrdiff_t>{ InputIter first; InputIter last; std::string word; Pred predicate; public: TokenIterator(InputIter begin, InputIter end, Pred pred = Pred()) : first(begin), last(end), predicate(pred) { ++*this; } TokenIterator() {} // End sentinel // Prefix increment: TokenIterator& operator++() { word.resize(0); first = std::find_if(first, last, predicate); while (first != last && predicate(*first)) word += *first++; return *this; } // Postfix increment class Proxy { std::string word; public: Proxy(const std::string& w) : word(w) {} std::string operator*() { return word; } }; Proxy operator++(int) { Proxy d(word); ++*this; return d; } // Produce the actual value: std::string operator*() const { return word; } std::string* operator->() const { return &(operator*()); } // Compare iterators: bool operator==(const TokenIterator&) { return word.size() == 0 && first == last; } bool operator!=(const TokenIterator& rv) { return !(*this == rv); } }; #endif // TOKENITERATOR_H ///:~
TokenIterator is inherited from the
std::iterator template. It might appear that there’s some kind of
functionality that comes with std::iterator, but it is purely a way of
tagging an iterator so that a container that uses it knows what it’s
capable of. Here, you can see input_iterator_tag as a template argument
– this tells anyone who asks that a TokenIterator only has the
capabilities of an input iterator, and cannot be used with algorithms requiring
more sophisticated iterators. Apart from the tagging, std::iterator
doesn’t do anything else, which means you must design all the other
functionality in yourself.
TokenIterator may look a little strange at first,
because the first constructor requires both a “begin” and
“end” iterator as arguments, along with the predicate. Remember that
this is a “wrapper” iterator that has no idea of how to tell whether
it’s at the end of its input source, so the ending iterator is necessary
in the first constructor. The reason for the second (default) constructor is
that the STL algorithms (and any algorithms you write) need a TokenIterator
sentinel to be the past-the-end value. Since all the information necessary
to see if the TokenIterator has reached the end of its input is collected
in the first constructor, this second constructor creates a TokenIterator
that is merely used as a placeholder in algorithms.
The core of the behavior happens in operator++. This
erases the current value of word using string::resize( ),
then finds the first character that satisfies the predicate (thus discovering
the beginning of the new token) using find_if( ) (from the STL
algorithms, discussed in the following chapter). The resulting iterator is
assigned to first, thus moving first forward to the beginning of
the token. Then, as long as the end of the input is not reached and the
predicate is satisfied, characters are copied into the word from the input.
Finally, the TokenIterator object is returned, and must be dereferenced to
access the new token.
The postfix increment requires a proxy object to hold the
value before the increment, so it can be returned (see the operator overloading
chapter for more details of this). Producing the actual value is a
straightforward operator*. The only other functions that must be defined
for an output iterator are the operator== and operator!= to
indicate whether the TokenIterator has reached the end of its input. You
can see that the argument for operator== is ignored – it only cares
about whether it has reached its internal last iterator. Notice that
operator!= is defined in terms of operator==.
A good test of TokenIterator includes a number of
different sources of input characters including a streambuf_iterator, a
char*, and a deque<char>::iterator. Finally, the original
Wordlist.cpp problem is solved:
//: C04:TokenIteratorTest.cpp #include "TokenIterator.h" #include "../require.h" #include <fstream> #include <iostream> #include <vector> #include <deque> #include <set> using namespace std; int main() { ifstream in("TokenIteratorTest.cpp"); assure(in, "TokenIteratorTest.cpp"); ostream_iterator<string> out(cout, "\n"); typedef istreambuf_iterator<char> IsbIt; IsbIt begin(in), isbEnd; Delimiters delimiters(" \t\n~;()\"<>:{}[]+-=&*#.,/\\"); TokenIterator<IsbIt, Delimiters> wordIter(begin, isbEnd, delimiters), end; vector<string> wordlist; copy(wordIter, end, back_inserter(wordlist)); // Output results: copy(wordlist.begin(), wordlist.end(), out); *out++ = "-----------------------------------"; // Use a char array as the source: char* cp = "typedef std::istreambuf_iterator<char> It"; TokenIterator<char*, Delimiters> charIter(cp, cp + strlen(cp), delimiters), end2; vector<string> wordlist2; copy(charIter, end2, back_inserter(wordlist2)); copy(wordlist2.begin(), wordlist2.end(), out); *out++ = "-----------------------------------"; // Use a deque<char> as the source: ifstream in2("TokenIteratorTest.cpp"); deque<char> dc; copy(IsbIt(in2), IsbIt(), back_inserter(dc)); TokenIterator<deque<char>::iterator,Delimiters> dcIter(dc.begin(), dc.end(), delimiters), end3; vector<string> wordlist3; copy(dcIter, end3, back_inserter(wordlist3)); copy(wordlist3.begin(), wordlist3.end(), out); *out++ = "-----------------------------------"; // Reproduce the Wordlist.cpp example: ifstream in3("TokenIteratorTest.cpp"); TokenIterator<IsbIt, Delimiters> wordIter2(IsbIt(in3), isbEnd, delimiters); set<string> wordlist4; while(wordIter2 != end) wordlist4.insert(*wordIter2++); copy(wordlist4.begin(), wordlist4.end(), out); } ///:~
When using an istreambuf_iterator, you create one to
attach to the istream object, and one with the default constructor as the
past-the-end marker. Both of these are used to create the TokenIterator
that will actually produce the tokens; the default constructor produces the faux
TokenIterator past-the-end sentinel (this is just a placeholder, and as
mentioned previously is actually ignored). The TokenIterator produces
strings that are inserted into a container which must, naturally, be a
container of string – here a vector<string> is used in
all cases except the last (you could also concatenate the results onto a
string). Other than that, a TokenIterator works like any other
input iterator.
The stack, along with the queue and
priority_queue, are classified as adapters, which means they are
implemented using one of the basic sequence containers: vector,
list or deque. This, in my opinion, is an unfortunate case of
confusing what something does with the details of its underlying implementation
– the fact that these are called “adapters” is of primary
value only to the creator of the library. When you use them, you generally
don’t care that they’re adapters, but instead that they solve your
problem. Admittedly there are times when it’s useful to know that you can
choose an alternate implementation or build an adapter from an existing
container object, but that’s generally one level removed from the
adapter’s behavior. So, while you may see it emphasized elsewhere that a
particular container is an adapter, I shall only point out that fact when
it’s useful. Note that each type of adapter has a default container that
it’s built upon, and this default is the most sensible implementation, so
in most cases you won’t need to concern yourself with the underlying
implementation.
The following example shows stack<string>
implemented in the three possible ways: the default (which uses deque),
with a vector and with a list:
//: C04:Stack1.cpp // Demonstrates the STL stack #include "../require.h" #include <iostream> #include <fstream> #include <stack> #include <list> #include <vector> #include <string> using namespace std; // Default: deque<string>: typedef stack<string> Stack1; // Use a vector<string>: typedef stack<string, vector<string> > Stack2; // Use a list<string>: typedef stack<string, list<string> > Stack3; int main(int argc, char* argv[]) { requireArgs(argc, 1); // File name is argument ifstream in(argv[1]); assure(in, argv[1]); Stack1 textlines; // Try the different versions // Read file and store lines in the stack: string line; while(getline(in, line)) textlines.push(line + "\n"); // Print lines from the stack and pop them: while(!textlines.empty()) { cout << textlines.top(); textlines.pop(); } } ///:~
The top( ) and pop( ) operations will
probably seem non-intuitive if you’ve used other stack classes.
When you call pop( ) it returns void rather than the top element
that you might have expected. If you want the top element, you get a reference
to it with top( ). It turns out this is more efficient, since a
traditional pop( ) would have to return a value rather than a
reference, and thus invoke the copy-constructor. When you’re using a
stack (or a priority_queue, described later) you can efficiently
refer to top( ) as many times as you want, then discard the top
element explicitly using pop( ) (perhaps if some other term than the
familiar “pop” had been used, this would have been a bit
clearer).
The stack template has a very simple interface,
essentially the member functions you see above. It doesn’t have
sophisticated forms of initialization or access, but if you need that you can
use the underlying container that the stack is implemented upon. For
example, suppose you have a function that expects a stack interface but
in the rest of your program you need the objects stored in a list. The
following program stores each line of a file along with the leading number of
spaces in that line (you might imagine it as a starting point for performing
some kinds of source-code reformatting):
//: C04:Stack2.cpp // Converting a list to a stack #include "../require.h" #include <iostream> #include <fstream> #include <stack> #include <list> #include <string> using namespace std; // Expects a stack: template<class Stk> void stackOut(Stk& s, ostream& os = cout) { while(!s.empty()) { os << s.top() << "\n"; s.pop(); } } class Line { string line; // Without leading spaces int lspaces; // Number of leading spaces public: Line(string s) : line(s) { lspaces = line.find_first_not_of(' '); if(lspaces == string::npos) lspaces = 0; line = line.substr(lspaces); } friend ostream& operator<<(ostream& os, const Line& l) { for(int i = 0; i < l.lspaces; i++) os << ' '; return os << l.line; } // Other functions here... }; int main(int argc, char* argv[]) { requireArgs(argc, 1); // File name is argument ifstream in(argv[1]); assure(in, argv[1]); list<Line> lines; // Read file and store lines in the list: string s; while(getline(in, s)) lines.push_front(s); // Turn the list into a stack for printing: stack<Line, list<Line> > stk(lines); stackOut(stk); } ///:~
The function that requires the stack interface just
sends each top( ) object to an ostream and then removes it by
calling pop( ). The Line class determines the number of
leading spaces, then stores the contents of the line without the leading
spaces. The ostream operator<< re-inserts the leading spaces
so the line prints properly, but you can easily change the number of spaces by
changing the value of lspaces (the member functions to do this are not
shown here).
In main( ), the input file is read into a
list<Line>, then a stack is wrapped around this list so it
can be sent to stackOut( ).
You cannot iterate through a stack; this emphasizes
that you only want to perform stack operations when you create a
stack. You can get equivalent “stack” functionality using a
vector and its back( ), push_back( ) and
pop_back( ) methods, and then you have all the additional
functionality of the vector. Stack1.cpp can be rewritten to show
this:
//: C04:Stack3.cpp // Using a vector as a stack; modified Stack1.cpp #include "../require.h" #include <iostream> #include <fstream> #include <vector> #include <string> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); vector<string> textlines; string line; while(getline(in, line)) textlines.push_back(line + "\n"); while(!textlines.empty()) { cout << textlines.back(); textlines.pop_back(); } } ///:~
You’ll see this produces the same output as
Stack1.cpp, but you can now perform vector operations as well. Of
course, list has the additional ability to push things at the front, but
it’s generally less efficient than using push_back( ) with
vector. (In addition, deque is usually more efficient than
list for pushing things at the
front).
The queue is a restricted form of a deque
– you can only enter elements at one end, and pull them off the other end.
Functionally, you could use a deque anywhere you need a queue, and
you would then also have the additional functionality of the deque. The
only reason you need to use a queue rather than a deque, then, is
if you want to emphasize that you will only be performing queue-like
behavior.
The queue is an adapter class like stack, in
that it is built on top of another sequence container. As you might guess, the
ideal implementation for a queue is a deque, and that is the
default template argument for the queue; you’ll rarely need a
different implementation.
Queues are often used when modeling systems where some
elements of the system are waiting to be served by other elements in the system.
A classic example of this is the “bank-teller problem,” where you
have customers arriving at random intervals, getting into a line, and then being
served by a set of tellers. Since the customers arrive randomly and each take a
random amount of time to be served, there’s no way to deterministically
know how long the line will be at any time. However, it’s possible to
simulate the situation and see what happens.
A problem in performing this simulation is the fact that, in
effect, each customer and teller should be run by a separate process. What
we’d like is a multithreaded environment, then each customer or teller
would have their own thread. However, Standard C++ has no model for
multithreading so there is no standard solution to this problem. On the other
hand, with a little adjustment to the code it’s possible to simulate
enough multithreading to provide a satisfactory solution to our
problem.
Multithreading means you have multiple threads of control
running at once, in the same address space (this differs from
multitasking, where you have different processes each running in their
own address space). The trick is that you have fewer CPUs than you do threads
(and very often only one CPU) so to give the illusion that each thread has its
own CPU there is a time-slicing mechanism that says “OK, current
thread – you’ve had enough time. I’m going to stop you and go
give time to some other thread.” This automatic stopping and starting of
threads is called pre-emptive and it means you don’t need to manage
the threading process at all.
An alternative approach is for each thread to voluntarily
yield the CPU to the scheduler, which then goes and finds another thread that
needs running. This is easier to synthesize, but it still requires a method of
“swapping” out one thread and swapping in another (this usually
involves saving the stack frame and using the standard C library functions
setjmp( ) and longjmp( ); see my article in the (XX)
issue of Computer Language magazine for an example). So instead, we’ll
build the time-slicing into the classes in the system. In this case, it will be
the tellers that represent the “threads,” (the customers will be
passive) so each teller will have an infinite-looping run( ) method
that will execute for a certain number of “time units,” and then
simply return. By using the ordinary return mechanism, we eliminate the need for
any swapping. The resulting program, although small, provides a remarkably
reasonable simulation:
//: C04:BankTeller.cpp // Using a queue and simulated multithreading // To model a bank teller system #include <iostream> #include <queue> #include <list> #include <cstdlib> #include <ctime> using namespace std; class Customer { int serviceTime; public: Customer() : serviceTime(0) {} Customer(int tm) : serviceTime(tm) {} int getTime() { return serviceTime; } void setTime(int newtime) { serviceTime = newtime; } friend ostream& operator<<(ostream& os, const Customer& c) { return os << '[' << c.serviceTime << ']'; } }; class Teller { queue<Customer>& customers; Customer current; static const int slice = 5; int ttime; // Time left in slice bool busy; // Is teller serving a customer? public: Teller(queue<Customer>& cq) : customers(cq), ttime(0), busy(false) {} Teller& operator=(const Teller& rv) { customers = rv.customers; current = rv.current; ttime = rv.ttime; busy = rv.busy; return *this; } bool isBusy() { return busy; } void run(bool recursion = false) { if(!recursion) ttime = slice; int servtime = current.getTime(); if(servtime > ttime) { servtime -= ttime; current.setTime(servtime); busy = true; // Still working on current return; } if(servtime < ttime) { ttime -= servtime; if(!customers.empty()) { current = customers.front(); customers.pop(); // Remove it busy = true; run(true); // Recurse } return; } if(servtime == ttime) { // Done with current, set to empty: current = Customer(0); busy = false; return; // No more time in this slice } } }; // Inherit to access protected implementation: class CustomerQ : public queue<Customer> { public: friend ostream& operator<<(ostream& os, const CustomerQ& cd) { copy(cd.c.begin(), cd.c.end(), ostream_iterator<Customer>(os, "")); return os; } }; int main() { CustomerQ customers; list<Teller> tellers; typedef list<Teller>::iterator TellIt; tellers.push_back(Teller(customers)); srand(time(0)); // Seed random number generator while(true) { // Add a random number of customers to the // queue, with random service times: for(int i = 0; i < rand() % 5; i++) customers.push(Customer(rand() % 15 + 1)); cout << '{' << tellers.size() << '}' << customers << endl; // Have the tellers service the queue: for(TellIt i = tellers.begin(); i != tellers.end(); i++) (*i).run(); cout << '{' << tellers.size() << '}' << customers << endl; // If line is too long, add another teller: if(customers.size() / tellers.size() > 2) tellers.push_back(Teller(customers)); // If line is short enough, remove a teller: if(tellers.size() > 1 && customers.size() / tellers.size() < 2) for(TellIt i = tellers.begin(); i != tellers.end(); i++) if(!(*i).isBusy()) { tellers.erase(i); break; // Out of for loop } } } ///:~
Each customer requires a certain amount of service time, which
is the number of time units that a teller must spend on the customer in order to
serve that customer’s needs. Of course, the amount of service time will be
different for each customer, and will be determined randomly. In addition, you
won’t know how many customers will be arriving in each interval, so this
will also be determined randomly.
The Customer objects are kept in a
queue<Customer>, and each Teller object keeps a reference to
that queue. When a Teller object is finished with its current
Customer object, that Teller will get another Customer from
the queue and begin working on the new Customer, reducing the
Customer’s service time during each time slice that the
Teller is allotted. All this logic is in the run( ) member
function, which is basically a three-way if statement based on whether
the amount of time necessary to serve the customer is less than, greater than or
equal to the amount of time left in the teller’s current time slice.
Notice that if the Teller has more time after finishing with a
Customer, it gets a new customer and recurses into itself.
Just as with a stack, when you use a queue,
it’s only a queue and doesn’t have any of the other
functionality of the basic sequence containers. This includes the ability to get
an iterator in order to step through the stack. However, the underlying
sequence container (that the queue is built upon) is held as a
protected member inside the queue, and the identifier for this
member is specified in the C++ Standard as ‘c’, which means
that you can inherit from queue in order to access the underlying
implementation. The CustomerQ class does exactly that, for the sole
purpose of defining an ostream operator<< that can iterate
through the queue and print out its members.
The driver for the simulation is the infinite while
loop in main( ). At the beginning of each pass through the loop, a
random number of customers are added, with random service times. Both the number
of tellers and the queue contents are displayed so you can see the state of the
system. After running each teller, the display is repeated. At this point, the
system adapts by comparing the number of customers and the number of tellers; if
the line is too long another teller is added and if it is short enough a teller
can be removed. It is in this adaptation section of the program that you can
experiment with policies regarding the optimal addition and removal of tellers.
If this is the only section that you’re modifying, you may want to
encapsulate policies inside of different
objects.
When you push( ) an object onto a
priority_queue, that object is sorted into the queue according to a
function or function object (you can allow the default less template to
supply this, or provide one of your own). The priority_queue ensures that
when you look at the top( ) element it will be the one with the
highest priority. When you’re done with it, you call pop( ) to
remove it and bring the next one into place. Thus, the priority_queue has
nearly the same interface as a stack, but it behaves
differently.
Like stack and queue, priority_queue is
an adapter which is built on top of one of the basic sequences – the
default is vector.
It’s trivial to make a priority_queue that works
with ints:
//: C04:PriorityQueue1.cpp #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; int main() { priority_queue<int> pqi; srand(time(0)); // Seed random number generator for(int i = 0; i < 100; i++) pqi.push(rand() % 25); while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
This pushes into the priority_queue 100 random values
from 0 to 24. When you run this program you’ll see that duplicates are
allowed, and the highest values appear first. To show how you can change the
ordering by providing your own function or function object, the following
program gives lower-valued numbers the highest priority:
//: C04:PriorityQueue2.cpp // Changing the priority #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; struct Reverse { bool operator()(int x, int y) { return y < x; } }; int main() { priority_queue<int, vector<int>, Reverse> pqi; // Could also say: // priority_queue<int, vector<int>, // greater<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
Although you can easily use the Standard Library
greater template to produce the predicate, I went to the trouble of
creating Reverse so you could see how to do it in case you have a more
complex scheme for ordering your objects.
If you look at the description for priority_queue, you
see that the constructor can be handed a “Compare” object, as shown
above. If you don’t use your own “Compare” object, the default
template behavior is the less template function. You might think (as I
did) that it would make sense to leave the template instantiation as
priority_queue<int>, thus using the default template arguments of
vector<int> and less<int>. Then you could inherit a
new class from less<int>, redefine operator( ) and hand
an object of that type to the priority_queue constructor. I tried this,
and got it to compile, but the resulting program produced the same old
less<int> behavior. The answer lies in the less< >
template:
template <class T> struct less : binary_function<T, T, bool> { // Other stuff... bool operator()(const T& x, const T& y) const { return x < y; } };
The operator( ) is not virtual, so even
though the constructor takes your subclass of less<int> by
reference (thus it doesn’t slice it down to a plain
less<int>), when operator( ) is called, it is the
base-class version that is used. While it is generally reasonable to expect
ordinary classes to behave polymorphically, you cannot make this assumption when
using the STL.
Of course, a priority_queue of int is trivial. A
more interesting problem is a to-do list, where each object contains a
string and a primary and secondary priority value:
//: C04:PriorityQueue3.cpp // A more complex use of priority_queue #include <iostream> #include <queue> #include <string> using namespace std; class ToDoItem { char primary; int secondary; string item; public: ToDoItem(string td, char pri ='A', int sec =1) : item(td), primary(pri), secondary(sec) {} friend bool operator<( const ToDoItem& x, const ToDoItem& y) { if(x.primary > y.primary) return true; if(x.primary == y.primary) if(x.secondary > y.secondary) return true; return false; } friend ostream& operator<<(ostream& os, const ToDoItem& td) { return os << td.primary << td.secondary << ": " << td.item; } }; int main() { priority_queue<ToDoItem> toDoList; toDoList.push(ToDoItem("Empty trash", 'C', 4)); toDoList.push(ToDoItem("Feed dog", 'A', 2)); toDoList.push(ToDoItem("Feed bird", 'B', 7)); toDoList.push(ToDoItem("Mow lawn", 'C', 3)); toDoList.push(ToDoItem("Water lawn", 'A', 1)); toDoList.push(ToDoItem("Feed cat", 'B', 1)); while(!toDoList.empty()) { cout << toDoList.top() << endl; toDoList.pop(); } } ///:~
ToDoItem’s operator< must be a
non-member function for it to work with less< >. Other than that,
everything happens automatically. The output is:
A1: Water lawn A2: Feed dog B1: Feed cat B7: Feed bird C3: Mow lawn C4: Empty trash
Note that you cannot iterate through a priority_queue.
However, it is possible to emulate the behavior of a priority_queue using
a vector, thus allowing you access to that vector. You can do this
by looking at the implementation of priority_queue, which uses
make_heap( ), push_heap( ) and pop_heap( )
(they are the soul of the priority_queue; in fact you could say that the
heap is the priority queue and priority_queue is just a wrapper
around it). This turns out to be reasonably straightforward, but you might think
that a shortcut is possible. Since the container used by priority_queue
is protected (and has the identifier, according to the Standard C++
specification, named c) you can inherit a new class which provides access
to the underlying implementation:
//: C04:PriorityQueue4.cpp // Manipulating the underlying implementation #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; class PQI : public priority_queue<int> { public: vector<int>& impl() { return c; } }; int main() { PQI pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); copy(pqi.impl().begin(), pqi.impl().end(), ostream_iterator<int>(cout, " ")); cout << endl; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
However, if you run this program you’ll discover that
the vector doesn’t contain the items in the descending order that
you get when you call pop( ), the order that you want from the
priority queue. It would seem that if you want to create a vector that is
a priority queue, you have to do it by hand, like this:
//: C04:PriorityQueue5.cpp // Building your own priority queue #include <iostream> #include <queue> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV : public vector<T> { Compare comp; public: PQV(Compare cmp = Compare()) : comp(cmp) { make_heap(begin(), end(), comp); } const T& top() { return front(); } void push(const T& x) { push_back(x); push_heap(begin(), end(), comp); } void pop() { pop_heap(begin(), end(), comp); pop_back(); } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << endl; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
But this program behaves in the same way as the previous one!
What you are seeing in the underlying vector is called a heap.
This heap represents the tree of the priority queue (stored in the linear
structure of the vector), but when you iterate through it you do not get
a linear priority-queue order. You might think that you can simply call
sort_heap( ), but that only works once, and then you don’t
have a heap anymore, but instead a sorted list. This means that to go back to
using it as a heap the user must remember to call make_heap( )
first. This can be encapsulated into your custom priority queue:
//: C04:PriorityQueue6.cpp #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV : public vector<T> { Compare comp; bool sorted; void assureHeap() { if(sorted) { // Turn it back into a heap: make_heap(begin(), end(), comp); sorted = false; } } public: PQV(Compare cmp = Compare()) : comp(cmp) { make_heap(begin(), end(), comp); sorted = false; } const T& top() { assureHeap(); return front(); } void push(const T& x) { assureHeap(); // Put it at the end: push_back(x); // Re-adjust the heap: push_heap(begin(), end(), comp); } void pop() { assureHeap(); // Move the top element to the last position: pop_heap(begin(), end(), comp); // Remove that element: pop_back(); } void sort() { if(!sorted) { sort_heap(begin(), end(), comp); reverse(begin(), end()); sorted = true; } } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) { pqi.push(rand() % 25); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----\n"; } pqi.sort(); copy(pqi.begin(), pqi.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
If sorted is true, then the vector is not
organized as a heap, but instead as a sorted sequence. assureHeap( )
guarantees that it’s put back into heap form before performing any heap
operations on it.
The first for loop in main( ) now has the
additional quality that it displays the heap as it’s being
built.
The only drawback to this solution is that the user must
remember to call sort( ) before viewing it as a sorted sequence
(although one could conceivably override all the methods that produce iterators
so that they guarantee sorting). Another solution is to build a priority queue
that is not a vector, but will build you a vector whenever you
want one:
//: C04:PriorityQueue7.cpp // A priority queue that will hand you a vector #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T, class Compare> class PQV { vector<T> v; Compare comp; public: // Don't need to call make_heap(); it's empty: PQV(Compare cmp = Compare()) : comp(cmp) {} void push(const T& x) { // Put it at the end: v.push_back(x); // Re-adjust the heap: push_heap(v.begin(), v.end(), comp); } void pop() { // Move the top element to the last position: pop_heap(v.begin(), v.end(), comp); // Remove that element: v.pop_back(); } const T& top() { return v.front(); } bool empty() const { return v.empty(); } int size() const { return v.size(); } typedef vector<T> TVec; TVec vector() { TVec r(v.begin(), v.end()); // It’s already a heap sort_heap(r.begin(), r.end(), comp); // Put it into priority-queue order: reverse(r.begin(), r.end()); return r; } }; int main() { PQV<int, less<int> > pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); const vector<int>& v = pqi.vector(); copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----------\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
PQV follows the same form as the STL’s
priority_queue, but has the additional member vector( ),
which creates a new vector that’s a copy of the one in PQV
(which means that it’s already a heap), then sorts it (thus it
leave’s PQV’s vector untouched), then reverses the
order so that traversing the new vector produces the same effect as
popping the elements from the priority queue.
You may observe that the approach of inheriting from
priority_queue used in PriorityQueue4.cpp could be used with the
above technique to produce more succinct code:
//: C04:PriorityQueue8.cpp // A more compact version of PriorityQueue7.cpp #include <iostream> #include <queue> #include <algorithm> #include <cstdlib> #include <ctime> using namespace std; template<class T> class PQV : public priority_queue<T> { public: typedef vector<T> TVec; TVec vector() { TVec r(c.begin(), c.end()); // c is already a heap sort_heap(r.begin(), r.end(), comp); // Put it into priority-queue order: reverse(r.begin(), r.end()); return r; } }; int main() { PQV<int> pqi; srand(time(0)); for(int i = 0; i < 100; i++) pqi.push(rand() % 25); const vector<int>& v = pqi.vector(); copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n-----------\n"; while(!pqi.empty()) { cout << pqi.top() << ' '; pqi.pop(); } } ///:~
The brevity of this solution makes it the simplest and most
desirable, plus it’s guaranteed that the user will not have a
vector in the unsorted state. The only potential problem is that the
vector( ) member function returns the vector<T> by
value, which might cause some overhead issues with complex values of the
parameter type T.
Most of my computer education was in hardware-level design and
programming, and I spent my first few years doing embedded systems development.
Because C was a language that purported to be “close to the
hardware,” I have always found it dismaying that there was no native
binary representation for numbers. Decimal, of course, and hexadecimal
(tolerable only because it’s easier to group the bits in your mind), but
octal? Ugh. Whenever you read specs for chips you’re trying to program,
they don’t describe the chip registers in octal, or even hexadecimal
– they use binary. And yet C won’t let you say 0b0101101,
which is the obvious solution for a language close to the hardware.
Although there’s still no native binary representation
in C++, things have improved with the addition of two classes: bitset and
vector<bool>, both of which are designed to manipulate a group of
on-off values. The primary differences between these types are:
There is no trivial
conversion between a bitset and a vector<bool>, which
implies that the two are for very different
purposes.
The template for bitset accepts an integral template
argument which is the number of bits to represent. Thus, bitset<10>
is a different type than bitset<20>, and you cannot perform
comparisons, assignments, etc. between the two.
A bitset provides virtually any bit operation that you
could ask for, in a very efficient form. However, each bitset is made up
of an integral number of longs (typically 32 bits), so even though it
uses no more space than it needs, it always uses at least the size of a
long. This means you’ll use space most efficiently if you increase
the size of your bitsets in chunks of the number of bits in a
long. In addition, the only conversion from a bitset to a
numerical value is to an unsigned long, which means that 32 bits (if your
long is the typical size) is the most flexible form of a
bitset.
The following example tests almost all the functionality of
the bitset (the missing operations are redundant or trivial).
You’ll see the description of each of the bitset outputs to the right
of the output so that the bits all line up and you can compare them to the
source values. If you still don’t understand bitwise operations, running
this program should help.
//: C04:BitSet.cpp // Exercising the bitset class #include <iostream> #include <bitset> #include <cstdlib> #include <ctime> #include <climits> #include <string> using namespace std; const int sz = 32; typedef bitset<sz> BS; template<int bits> bitset<bits> randBitset() { bitset<bits> r(rand()); for(int i = 0; i < bits/16 - 1; i++) { r <<= 16; // "OR" together with a new lower 16 bits: r |= bitset<bits>(rand()); } return r; } int main() { srand(time(0)); cout << "sizeof(bitset<16>) = " << sizeof(bitset<16>) << endl; cout << "sizeof(bitset<32>) = " << sizeof(bitset<32>) << endl; cout << "sizeof(bitset<48>) = " << sizeof(bitset<48>) << endl; cout << "sizeof(bitset<64>) = " << sizeof(bitset<64>) << endl; cout << "sizeof(bitset<65>) = " << sizeof(bitset<65>) << endl; BS a(randBitset<sz>()), b(randBitset<sz>()); // Converting from a bitset: unsigned long ul = a.to_ulong(); string s = b.to_string(); // Converting a string to a bitset: char* cbits = "111011010110111"; cout << "char* cbits = " << cbits <<endl; cout << BS(cbits) << " [BS(cbits)]" << endl; cout << BS(cbits, 2) << " [BS(cbits, 2)]" << endl; cout << BS(cbits, 2, 11) << " [BS(cbits, 2, 11)]" << endl; cout << a << " [a]" << endl; cout << b << " [b]"<< endl; // Bitwise AND: cout << (a & b) << " [a & b]" << endl; cout << (BS(a) &= b) << " [a &= b]" << endl; // Bitwise OR: cout << (a | b) << " [a | b]" << endl; cout << (BS(a) |= b) << " [a |= b]" << endl; // Exclusive OR: cout << (a ^ b) << " [a ^ b]" << endl; cout << (BS(a) ^= b) << " [a ^= b]" << endl; cout << a << " [a]" << endl; // For reference // Logical left shift (fill with zeros): cout << (BS(a) <<= sz/2) << " [a <<= (sz/2)]" << endl; cout << (a << sz/2) << endl; cout << a << " [a]" << endl; // For reference // Logical right shift (fill with zeros): cout << (BS(a) >>= sz/2) << " [a >>= (sz/2)]" << endl; cout << (a >> sz/2) << endl; cout << a << " [a]" << endl; // For reference cout << BS(a).set() << " [a.set()]" << endl; for(int i = 0; i < sz; i++) if(!a.test(i)) { cout << BS(a).set(i) << " [a.set(" << i <<")]" << endl; break; // Just do one example of this } cout << BS(a).reset() << " [a.reset()]"<< endl; for(int j = 0; j < sz; j++) if(a.test(j)) { cout << BS(a).reset(j) << " [a.reset(" << j <<")]" << endl; break; // Just do one example of this } cout << BS(a).flip() << " [a.flip()]" << endl; cout << ~a << " [~a]" << endl; cout << a << " [a]" << endl; // For reference cout << BS(a).flip(1) << " [a.flip(1)]"<< endl; BS c; cout << c << " [c]" << endl; cout << "c.count() = " << c.count() << endl; cout << "c.any() = " << (c.any() ? "true" : "false") << endl; cout << "c.none() = " << (c.none() ? "true" : "false") << endl; c[1].flip(); c[2].flip(); cout << c << " [c]" << endl; cout << "c.count() = " << c.count() << endl; cout << "c.any() = " << (c.any() ? "true" : "false") << endl; cout << "c.none() = " << (c.none() ? "true" : "false") << endl; // Array indexing operations: c.reset(); for(int k = 0; k < c.size(); k++) if(k % 2 == 0) c[k].flip(); cout << c << " [c]" << endl; c.reset(); // Assignment to bool: for(int ii = 0; ii < c.size(); ii++) c[ii] = (rand() % 100) < 25; cout << c << " [c]" << endl; // bool test: if(c[1] == true) cout << "c[1] == true"; else cout << "c[1] == false" << endl; } ///:~
To generate interesting random bitsets, the
randBitset( ) function is created. The Standard C
rand( ) function only generates an int, so this function
demonstrates operator<<= by shifting each 16 random bits to the
left until the bitset (which is templatized in this function for size) is
full. The generated number and each new 16 bits is combined using the
operator|=.
The first thing demonstrated in main( ) is the
unit size of a bitset. If it is less than 32 bits, sizeof produces
4 (4 bytes = 32 bits), which is the size of a single long on most
implementations. If it’s between 32 and 64, it requires two longs,
greater than 64 requires 3 longs, etc. Thus you make the best use of
space if you use a bit quantity that fits in an integral number of longs.
However, notice there’s no extra overhead for the object –
it’s as if you were hand-coding to use a long.
Another clue that bitset is optimized for longs
is that there is a to_ulong( ) member function that produces the
value of the bitset as an unsigned long. There are no other numerical
conversions from bitset, but there is a to_string( )
conversion that produces a string containing ones and zeros, and this can
be as long as the actual bitset. However, using bitset<32>
may make your life simpler because of to_ulong( ).
There’s still no primitive format for binary values, but
the next best thing is supported by bitset: a string of ones and
zeros with the least-significant bit (lsb) on the right. The three constructors
demonstrated show taking the entire string (the char array is
automatically converted to a string), the string starting at
character 2, and the string from character 2 through 11. You can write to an
ostream from a bitset using operator<< and it comes
out as ones and zeros. You can also read from an istream using
operator>> (not shown here).
You’ll notice that bitset only has three
non-member operators: and (&), or (|) and
exclusive-or (^). Each of these create a new bitset as
their return value. All of the member operators opt for the more efficient
&=, |=, etc. form where a temporary is not created. However,
these forms actually change their lvalue (which is a in most of the tests
in the above example). To prevent this, I created a temporary to be used as the
lvalue by invoking the copy-constructor on a; this is why you see the
form BS(a). The result of each test is printed out, and occasionally
a is reprinted so you can easily look at it for reference.
The rest of the example should be self-explanatory when you
run it; if not you can find the details in your compiler’s documentation
or the other documentation mentioned earlier in this
chapter.
vector<bool> is a specialization of the
vector template. A normal bool variable requires at least one
byte, but since a bool only has two states the ideal implementation of
vector<bool> is such that each bool value only requires one
bit. This means the iterator must be specially-defined, and cannot be a
bool*.
The bit-manipulation functions for vector<bool>
are much more limited than those of bitset. The only member function that
was added to those already in vector is flip( ), to invert
all the bits; there is no set( ) or reset( ) as in
bitset. When you use operator[ ], you get back an object of type
vector<bool>::reference, which also has a flip( ) to
invert that individual bit.
//: C04:VectorOfBool.cpp // Demonstrate the vector<bool> specialization #include <iostream> #include <sstream> #include <vector> #include <bitset> #include <iterator> using namespace std; int main() { vector<bool> vb(10, true); vector<bool>::iterator it; for(it = vb.begin(); it != vb.end(); it++) cout << *it; cout << endl; vb.push_back(false); ostream_iterator<bool> out(cout, ""); copy(vb.begin(), vb.end(), out); cout << endl; bool ab[] = { true, false, false, true, true, true, true, false, false, true }; // There's a similar constructor: vb.assign(ab, ab + sizeof(ab)/sizeof(bool)); copy(vb.begin(), vb.end(), out); cout << endl; vb.flip(); // Flip all bits copy(vb.begin(), vb.end(), out); cout << endl; for(int i = 0; i < vb.size(); i++) vb[i] = 0; // (Equivalent to "false") vb[4] = true; vb[5] = 1; vb[7].flip(); // Invert one bit copy(vb.begin(), vb.end(), out); cout << endl; // Convert to a bitset: ostringstream os; copy(vb.begin(), vb.end(), ostream_iterator<bool>(os, "")); bitset<10> bs(os.str()); cout << "Bitset:\n" << bs << endl; } ///:~
The last part of this example takes a
vector<bool> and converts it to a bitset by first turning it
into a string of ones and zeros. Of course, you must know the size of the
bitset at compile-time. You can see that this conversion is not the kind
of operation you’ll want to do on a regular
basis.
The set, map, multiset and
multimap are called associative containers because they associate
keys with values. Well, at least maps and multimaps
associate keys to values, but you can look at a set as a map that
has no values, only keys (and they can in fact be implemented this way), and the
same for the relationship between multiset and multimap. So,
because of the structural similarity sets and multisets are lumped
in with associative containers.
The most important basic operations with associative
containers are putting things in, and in the case of a set, seeing if
something is in the set. In the case of a map, you want to first see if a
key is in the map, and if it exists you want the associated value for
that key to be returned. Of course, there are many variations on this theme but
that’s the fundamental concept. The following example shows these
basics:
//: C04:AssociativeBasics.cpp // Basic operations with sets and maps #include "Noisy.h" #include <iostream> #include <set> #include <map> using namespace std; int main() { Noisy na[] = { Noisy(), Noisy(), Noisy(), Noisy(), Noisy(), Noisy(), Noisy() }; // Add elements via constructor: set<Noisy> ns(na, na+ sizeof na/sizeof(Noisy)); // Ordinary insertion: Noisy n; ns.insert(n); cout << endl; // Check for set membership: cout << "ns.count(n)= " << ns.count(n) << endl; if(ns.find(n) != ns.end()) cout << "n(" << n << ") found in ns" << endl; // Print elements: copy(ns.begin(), ns.end(), ostream_iterator<Noisy>(cout, " ")); cout << endl; cout << "\n-----------\n"; map<int, Noisy> nm; for(int i = 0; i < 10; i++) nm[i]; // Automatically makes pairs cout << "\n-----------\n"; for(int j = 0; j < nm.size(); j++) cout << "nm[" << j <<"] = " << nm[j] << endl; cout << "\n-----------\n"; nm[10] = n; cout << "\n-----------\n"; nm.insert(make_pair(47, n)); cout << "\n-----------\n"; cout << "\n nm.count(10)= " << nm.count(10) << endl; cout << "nm.count(11)= " << nm.count(11) << endl; map<int, Noisy>::iterator it = nm.find(6); if(it != nm.end()) cout << "value:" << (*it).second << " found in nm at location 6" << endl; for(it = nm.begin(); it != nm.end(); it++) cout << (*it).first << ":" << (*it).second << ", "; cout << "\n-----------\n"; } ///:~
The set<Noisy> object ns is created using
two iterators into an array of Noisy objects, but there is also a default
constructor and a copy-constructor, and you can pass in an object that provides
an alternate scheme for doing comparisons. Both sets and maps have
an insert( ) member function to put things in, and there are a
couple of different ways to check to see if an object is already in an
associative container: count( ), when given a key, will tell you how
many times that key occurs (this can only be zero or one in a set or
map, but it can be more than one with a multiset or
multimap). The find( ) member function will produce an
iterator indicating the first occurrence (with set and map, the
only occurrence) of the key that you give it, or the past-the-end
iterator if it can’t find the key. The count( ) and
find( ) member functions exist for all the associative containers,
which makes sense. The associative containers also have member functions
lower_bound( ), upper_bound( ) and
equal_range( ), which actually only make sense for multiset
and multimap, as you shall see (but don’t try to figure out how
they would be useful for set and map, since they are designed for
dealing with a range of duplicate keys, which those containers don’t
allow).
Designing an operator[ ] always produces a little bit
of a dilemma because it’s intended to be treated as an array-indexing
operation, so people don’t tend to think about performing a test before
they use it. But what happens if you decide to index out of the bounds of the
array? One option, of course, is to throw an exception, but with a map
“indexing out of the array” could mean that you want an entry there,
and that’s the way the STL map treats it. The first for loop
after the creation of the map<int, Noisy> nm just “looks
up” objects using the operator[ ], but this is actually creating
new Noisy objects! The map creates a new key-value pair (using the
default constructor for the value) if you look up a value with operator[
] and it isn’t there. This means that if you really just want to look
something up and not create a new entry, you must use count( ) (to
see if it’s there) or find( ) (to get an iterator to
it).
The for loop that prints out the values of the
container using operator[ ] has a number of problems. First, it requires
integral keys (which we happen to have in this case). Next and worse, if all the
keys are not sequential, you’ll end up counting from 0 to the size of the
container, and if there are some spots which don’t have key-value pairs
you’ll automatically create them, and miss some of the higher values of
the keys. Finally, if you look at the output from the for loop
you’ll see that things are very busy, and it’s quite puzzling
at first why there are so many constructions and destructions for what appears
to be a simple lookup. The answer only becomes clear when you look at the code
in the map template for operator[ ], which will be something like
this:
mapped_type& operator[] (const key_type& k) { value_type tmp(k,T()); return (*((insert(tmp)).first)).second; }
Following the trail, you’ll find that
map::value_type is:
typedef pair<const Key, T> value_type;
Now you need to know what a pair is, which can be found
in <utility>:
template <class T1, class T2> struct pair { typedef T1 first_type; typedef T2 second_type; T1 first; T2 second; pair(); pair(const T1& x, const T2& y) : first(x), second(y) {} // Templatized copy-constructor: template<class U, class V> pair(const pair<U, V> &p); };
It turns out this is a very important (albeit simple)
struct which is used quite a bit in the STL. All it really does it
package together two objects, but it’s very useful, especially when you
want to return two objects from a function (since a return statement only
takes one object). There’s even a shorthand for creating a pair called
make_pair( ), which is used in
AssociativeBasics.cpp.
So to retrace the steps, map::value_type is a
pair of the key and the value of the map – actually, it’s a
single entry for the map. But notice that pair packages its objects by
value, which means that copy-constructions are necessary to get the objects into
the pair. Thus, the creation of tmp in map::operator[ ]
will involve at least a copy-constructor call and destructor call for each
object in the pair. Here, we’re getting off easy because the key is
an int. But if you want to really see what kind of activity can result
from map::operator[ ], try running this:
//: C04:NoisyMap.cpp // Mapping Noisy to Noisy #include "Noisy.h" #include <map> using namespace std; int main() { map<Noisy, Noisy> mnn; Noisy n1, n2; cout << "\n--------\n"; mnn[n1] = n2; cout << "\n--------\n"; cout << mnn[n1] << endl; cout << "\n--------\n"; } ///:~
You’ll see that both the insertion and lookup generate a
lot of extra objects, and that’s because of the creation of the tmp
object. If you look back up at map::operator[ ] you’ll see that the
second line calls insert( ) passing it tmp – that is,
operator[ ] does an insertion every time. The return value of
insert( ) is a different kind of pair, where first is
an iterator pointing to the key-value pair that was just inserted, and
second is a bool indicating whether the insertion took place. You
can see that operator[ ] grabs first (the iterator), dereferences
it to produce the pair, and then returns the second which is the
value at that location.
So on the upside, map has this fancy “make a new
entry if one isn’t there” behavior, but the downside is that you
always get a lot of extra object creations and destructions when you use
map::operator[ ]. Fortunately, AssociativeBasics.cpp also
demonstrates how to reduce the overhead of insertions and deletions, by not
using operator[ ] if you don’t have to. The insert( )
member function is slightly more efficient than operator[ ]. With a
set you only hold one object, but with a map you hold key-value
pairs, so insert( ) requires a pair as its argument.
Here’s where make_pair( ) comes in handy, as you can
see.
For looking objects up in a map, you can use
count( ) to see whether a key is in the map, or you can use
find( ) to produce an iterator pointing directly at the key-value
pair. Again, since the map contains pairs that’s what the
iterator produces when you dereference it, so you have to select first
and second. When you run AssociativeBasics.cpp you’ll notice
that the iterator approach involves no extra object creations or destructions at
all. It’s not as easy to write or read, though.
If you use a map with large, complex objects and
discover there’s too much overhead when doing lookups and insertions
(don’t assume this from the beginning – take the easy approach first
and use a profiler to discover bottlenecks), then you can use the counted-handle
approach shown in Chapter XX so that you are only passing around small,
lightweight objects.
Of course, you can also iterate through a set or
map and operate on each of its objects. This will be demonstrated in
later examples.
You’ve seen how useful the fill( ),
fill_n( ), generate( ) and generate_n( )
function templates in <algorithm> have been for filling the
sequential containers (vector, list and deque) with data.
However, these are implemented by using operator= to assign values into
the sequential containers, and the way that you add objects to associative
containers is with their respective insert( ) member functions. Thus
the default “assignment” behavior causes a problem when trying to
use the “fill” and “generate” functions with associative
containers.
One solution is to duplicate the “fill” and
“generate” functions, creating new ones that can be used with
associative containers. It turns out that only the fill_n( ) and
generate_n( ) functions can be duplicated (fill( ) and
generate( ) copy in between two iterators, which doesn’t make
sense with associative containers), but the job is fairly easy, since you have
the <algorithm> header file to work from (and since it contains
templates, all the source code is there):
//: C04:assocGen.h // The fill_n() and generate_n() equivalents // for associative containers. #ifndef ASSOCGEN_H #define ASSOCGEN_H template<class Assoc, class Count, class T> void assocFill_n(Assoc& a, Count n, const T& val) { while(n-- > 0) a.insert(val); } template<class Assoc, class Count, class Gen> void assocGen_n(Assoc& a, Count n, Gen g) { while(n-- > 0) a.insert(g()); } #endif // ASSOCGEN_H ///:~
You can see that instead of using iterators, the container
class itself is passed (by reference, of course, since you wouldn’t want
to make a local copy, fill it, and then have it discarded at the end of the
scope).
This code demonstrates two valuable lessons. The first lesson
is that if the algorithms don’t do what you want, copy the nearest thing
and modify it. You have the example at hand in the STL header, so most of the
work has already been done.
The second lesson is more pointed: if you look long enough,
there’s probably a way to do it in the STL without inventing
anything new. The present problem can instead be solved by using an
insert_iterator (produced by a call to inserter( )), which
calls insert( ) to place items in the container instead of
operator=. This is not simply a variation of
front_insert_iterator (produced by a call to
front_inserter( )) or back_insert_iterator (produced by a
call to back_inserter( )), since those iterators use
push_front( ) and push_back( ), respectively. Each of
the insert iterators is different by virtue of the member function it uses for
insertion, and insert( ) is the one we need. Here’s a
demonstration that shows filling and generating both a map and a
set (of course, it can also be used with multimap and
multiset). First, some templatized, simple generators are created (this
may seem like overkill, but you never know when you’ll need them; for that
reason they’re placed in a header file):
//: C04:SimpleGenerators.h // Generic generators, including // one that creates pairs #include <iostream> #include <utility> // A generator that increments its value: template<typename T> class IncrGen { T i; public: IncrGen(T ii) : i (ii) {} T operator()() { return i++; } }; // A generator that produces an STL pair<>: template<typename T1, typename T2> class PairGen { T1 i; T2 j; public: PairGen(T1 ii, T2 jj) : i(ii), j(jj) {} std::pair<T1,T2> operator()() { return std::pair<T1,T2>(i++, j++); } }; // A generic global operator<< // for printing any STL pair<>: template<typename Pair> std::ostream& operator<<(std::ostream& os, const Pair& p) { return os << p.first << "\t" << p.second << std::endl; } ///:~
Both generators expect that T can be incremented, and
they simply use operator++ to generate new values from whatever you used
for initialization. PairGen creates an STL pair object as its
return value, and that’s what can be placed into a map or
multimap using insert( ).
The last function is a generalization of
operator<< for ostreams, so that any pair can be
printed, assuming each element of the pair supports a stream
operator<<. As you can see below, this allows the use of
copy( ) to output the map:
//: C04:AssocInserter.cpp // Using an insert_iterator so fill_n() and // generate_n() can be used with associative // containers #include "SimpleGenerators.h" #include <iterator> #include <iostream> #include <algorithm> #include <set> #include <map> using namespace std; int main() { set<int> s; fill_n(inserter(s, s.begin()), 10, 47); generate_n(inserter(s, s.begin()), 10, IncrGen<int>(12)); copy(s.begin(), s.end(), ostream_iterator<int>(cout, "\n")); map<int, int> m; fill_n(inserter(m, m.begin()), 10, make_pair(90,120)); generate_n(inserter(m, m.begin()), 10, PairGen<int, int>(3, 9)); copy(m.begin(), m.end(), ostream_iterator<pair<int,int> >(cout,"\n")); } ///:~
The second argument to inserter is an iterator, which
actually isn’t used in the case of associative containers since they
maintain their order internally, rather than allowing you to tell them where the
element should be inserted. However, an insert_iterator can be used with
many different types of containers so you must provide the iterator.
Note how the ostream_iterator is created to output a
pair; this wouldn’t have worked if the operator<<
hadn’t been created, and since it’s a template it is automatically
instantiated for pair<int,
int>.
An ordinary array uses an integral value to index into a
sequential set of elements of some type. A map is an associative
array, which means you associate one object with another in an array-like
fashion, but instead of selecting an array element with a number as you do with
an ordinary array, you look it up with an object! The example which follows
counts the words in a text file, so the index is the string object
representing the word, and the value being looked up is the object that keeps
count of the strings.
In a single-item container like a vector or
list, there’s only one thing being held. But in a map,
you’ve got two things: the key (what you look up by, as in
mapname[key]) and the value that results from the lookup with the
key. If you simply want to move through the entire map and list each
key-value pair, you use an iterator, which when dereferenced produces a
pair object containing both the key and the value. You access the members
of a pair by selecting first or second.
This same philosophy of packaging two items together is also
used to insert elements into the map, but the pair is created as part of
the instantiated map and is called value_type, containing the key
and the value. So one option for inserting a new element is to create a
value_type object, loading it with the appropriate objects and then
calling the insert( ) member function for the map. Instead,
the following example makes use of the aforementioned special feature of
map: if you’re trying to find an object by passing in a key to
operator[ ] and that object doesn’t exist, operator[ ] will
automatically insert a new key-value pair for you, using the default constructor
for the value object. With that in mind, consider an implementation of a word
counting program:
//: C04:WordCount.cpp //{L} StreamTokenizer // Count occurrences of words using a map #include "StreamTokenizer.h" #include "../require.h" #include <string> #include <map> #include <iostream> #include <fstream> using namespace std; class Count { int i; public: Count() : i(0) {} void operator++(int) { i++; } // Post-increment int& val() { return i; } }; typedef map<string, Count> WordMap; typedef WordMap::iterator WMIter; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); StreamTokenizer words(in); WordMap wordmap; string word; while((word = words.next()).size() != 0) wordmap[word]++; for(WMIter w = wordmap.begin(); w != wordmap.end(); w++) cout << (*w).first << ": " << (*w).second.val() << endl; } ///:~
The need for the Count class is to contain an
int that’s automatically initialized to zero. This is necessary
because of the crucial line:
wordmap[word]++;
This finds the word that has been produced by
StreamTokenizer and increments the Count object associated with
that word, which is fine as long as there is a key-value pair for that
string. If there isn’t, the map automatically inserts a key
for the word you’re looking up, and a Count object, which is
initialized to zero by the default constructor. Thus, when it’s
incremented the Count becomes 1.
Printing the entire list requires traversing it with an
iterator (there’s no copy( ) shortcut for a map unless
you want to write an operator<< for the pair in the map). As
previously mentioned, dereferencing this iterator produces a pair object,
with the first member the key and the second member the value. In
this case second is a Count object, so its val( )
member must be called to produce the actual word count.
If you want to find the count for a particular word, you can
use the array index operator, like this:
cout << "the: " << wordmap["the"].val() << endl;
You can see that one of the great advantages of the map
is the clarity of the syntax; an associative array makes intuitive sense to the
reader (note, however, that if “the” isn’t already in the
wordmap a new entry will be created!).
A problem that often comes up in programming is the management
of program arguments that you can specify on the command line. Usually
you’d like to have a set of defaults that can be changed via the command
line. The following tool expects the command line arguments to be in the form
flag1=value1 with no spaces around the ‘=‘ (so it will
be treated as a single argument). The ProgVal class simply inherits from
map<string, string>:
//: C04:ProgVals.h // Program values can be changed by command line #ifndef PROGVALS_H #define PROGVALS_H #include <map> #include <iostream> #include <string> class ProgVals : public std::map<std::string, std::string> { public: ProgVals(std::string defaults[][2], int sz); void parse(int argc, char* argv[], std::string usage, int offset = 1); void print(std::ostream& out = std::cout); }; #endif // PROGVALS_H ///:~
The constructor expects an array of string pairs (as
you’ll see, this allows you to initialize it with an array of
char*) and the size of that array. The parse( ) member
function is handed the command-line arguments along with a “usage”
string to print if the command line is given incorrectly, and the
“offset” which tells it which command-line argument to start with
(so you can have non-flag arguments at the beginning of the command line).
Finally, print( ) displays the values. Here is the
implementation:
//: C04:ProgVals.cpp {O} #include "ProgVals.h" using namespace std; ProgVals::ProgVals( std::string defaults[][2], int sz) { for(int i = 0; i < sz; i++) insert(make_pair( defaults[i][0], defaults[i][1])); } void ProgVals::parse(int argc, char* argv[], string usage, int offset) { // Parse and apply additional // command-line arguments: for(int i = offset; i < argc; i++) { string flag(argv[i]); int equal = flag.find('='); if(equal == string::npos) { cerr << "Command line error: " << argv[i] << endl << usage << endl; continue; // Next argument } string name = flag.substr(0, equal); string value = flag.substr(equal + 1); if(find(name) == end()) { cerr << name << endl << usage << endl; continue; // Next argument } operator[](name) = value; } } void ProgVals::print(ostream& out) { out << "Program values:" << endl; for(iterator it = begin(); it != end(); it++) out << (*it).first << " = " << (*it).second << endl; } ///:~
The constructor uses the STL make_pair( ) helper
function to convert each pair of char* into a pair object that can
be inserted into the map. In parse( ), each command-line
argument is checked for the existence of the telltale ‘=‘
sign (reporting an error if it isn’t there), and then is broken into two
strings, the name which appears before the ‘=‘, and
the value which appears after. The operator[ ] is then used
to change the existing value to the new one.
Here’s an example to test the tool:
//: C04:ProgValTest.cpp //{L} ProgVals #include "ProgVals.h" using namespace std; string defaults[][2] = { { "color", "red" }, { "size", "medium" }, { "shape", "rectangular" }, { "action", "hopping"}, }; const char* usage = "usage:\n" "ProgValTest [flag1=val1 flag2=val2 ...]\n" "(Note no space around '=')\n" "Where the flags can be any of: \n" "color, size, shape, action \n"; // So it can be used globally: ProgVals pvals(defaults, sizeof defaults / sizeof *defaults); class Animal { string color, size, shape, action; public: Animal(string col, string sz, string shp, string act) :color(col),size(sz),shape(shp),action(act){} // Default constructor uses program default // values, possibly change on command line: Animal() : color(pvals["color"]), size(pvals["size"]), shape(pvals["shape"]), action(pvals["action"]) {} void print() { cout << "color = " << color << endl << "size = " << size << endl << "shape = " << shape << endl << "action = " << action << endl; } // And of course pvals can be used anywhere // else you'd like. }; int main(int argc, char* argv[]) { // Initialize and parse command line values // before any code that uses pvals is called: pvals.parse(argc, argv, usage); pvals.print(); Animal a; cout << "Animal a values:" << endl; a.print(); } ///:~
This program can create Animal objects with different
characteristics, and those characteristics can be established with the command
line. The default characteristics are given in the two-dimensional array of
char* called defaults and, after the usage string you can
see a global instance of ProgVals called pvals is created; this is
important because it allows the rest of the code in the program to access the
values.
Note that Animal’s default constructor uses the
values in pvals inside its constructor initializer list. When you run the
program you can try creating different animal characteristics.
Many command-line programs also use a style of beginning a
flag with a hyphen, and sometimes they use single-character flags.
A multimap is a map that can contain duplicate
keys. At first this may seem like a strange idea, but it can occur surprisingly
often. A phone book, for example, can have many entries with the same name.
Suppose you are monitoring wildlife, and you want to keep
track of where and when each type of animal is spotted. Thus, you may see many
animals of the same kind, all in different locations and at different times. So
if the type of animal is the key, you’ll need a multimap.
Here’s what it looks like:
//: C04:WildLifeMonitor.cpp #include <vector> #include <map> #include <string> #include <algorithm> #include <iostream> #include <sstream> #include <ctime> using namespace std; class DataPoint { int x, y; // Location coordinates time_t time; // Time of Sighting public: DataPoint() : x(0), y(0), time(0) {} DataPoint(int xx, int yy, time_t tm) : x(xx), y(yy), time(tm) {} // Synthesized operator=, copy-constructor OK int getX() { return x; } int getY() { return y; } time_t* getTime() { return &time; } }; string animal[] = { "chipmunk", "beaver", "marmot", "weasel", "squirrel", "ptarmigan", "bear", "eagle", "hawk", "vole", "deer", "otter", "hummingbird", }; const int asz = sizeof animal/sizeof *animal; vector<string> animals(animal, animal + asz); // All the information is contained in a // "Sighting," which can be sent to an ostream: typedef pair<string, DataPoint> Sighting; ostream& operator<<(ostream& os, const Sighting& s) { return os << s.first << " sighted at x= " << s.second.getX() << ", y= " << s.second.getY() << ", time = " << ctime(s.second.getTime()); } // A generator for Sightings: class SightingGen { vector<string>& animals; static const int d = 100; public: SightingGen(vector<string>& an) : animals(an) { srand(time(0)); } Sighting operator()() { Sighting result; int select = rand() % animals.size(); result.first = animals[select]; result.second = DataPoint( rand() % d, rand() % d, time(0)); return result; } }; typedef multimap<string, DataPoint> DataMap; typedef DataMap::iterator DMIter; int main() { DataMap sightings; generate_n( inserter(sightings, sightings.begin()), 50, SightingGen(animals)); // Print everything: copy(sightings.begin(), sightings.end(), ostream_iterator<Sighting>(cout, "")); // Print sightings for selected animal: while(true) { cout << "select an animal or 'q' to quit: "; for(int i = 0; i < animals.size(); i++) cout <<'['<< i <<']'<< animals[i] << ' '; cout << endl; string reply; cin >> reply; if(reply.at(0) == 'q') return 0; istringstream r(reply); int i; r >> i; // Converts to int i %= animals.size(); // Iterators in "range" denote begin, one // past end of matching range: pair<DMIter, DMIter> range = sightings.equal_range(animals[i]); copy(range.first, range.second, ostream_iterator<Sighting>(cout, "")); } } ///:~
All the data about a sighting is encapsulated into the class
DataPoint, which is simple enough that it can rely on the synthesized
assignment and copy-constructor. It uses the Standard C library time functions
to record the time of the sighting.
In the array of string animal, notice that the
char* constructor is automatically used during initialization, which
makes initializing an array of string quite convenient. Since it’s
easier to use the animal names in a vector, the length of the array is
calculated and a vector<string> is initialized using the
vector(iterator, iterator) constructor.
The key-value pairs that make up a Sighting are the
string which names the type of animal, and the DataPoint that says
where and when it was sighted. The standard pair template combines these
two types and is typedefed to produce the Sighting type. Then an
ostream operator<< is created for Sighting; this will
allow you to iterate through a map or multimap of Sightings and
print it out.
SightingGen generates random sightings at random data
points to use for testing. It has the usual operator( ) necessary
for a function object, but it also has a constructor to capture and store a
reference to a vector<string>, which is where the aforementioned
animal names are stored.
A DataMap is a multimap of
string-DataPoint pairs, which means it stores Sightings. It
is filled with 50 Sightings using generate_n( ), and printed
out (notice that because there is an operator<< that takes a
Sighting, an ostream_iterator can be created). At this point the
user is asked to select the animal that they want to see all the sightings for.
If you press ‘q’ the program will quit, but if you select an
animal number, then the equal_range( ) member function is invoked.
This returns an iterator (DMIter) to the beginning of the set of matching
pairs, and one indicating past-the-end of the set. Since only one object can be
returned from a function, equal_range( ) makes use of pair.
Since the range pair has the beginning and ending iterators of the
matching set, those iterators can be used in copy( ) to print out
all the sightings for a particular type of
animal.
You’ve seen the set, which only allows one object
of each value to be inserted. The multiset is odd by comparison since it
allows more than one object of each value to be inserted. This seems to go
against the whole idea of “setness,” where you can ask “is
‘it’ in this set?” If there can be more than one of
‘it’, then what does that question mean?
With some thought, you can see that it makes no sense to have
more than one object of the same value in a set if those duplicate objects are
exactly the same (with the possible exception of counting occurrences of
objects, but as seen earlier in this chapter that can be handled in an
alternative, more elegant fashion). Thus each duplicate object will have
something that makes it unique from the other duplicates – most likely
different state information that is not used in the calculation of the value
during the comparison. That is, to the comparison operation, the objects look
the same but they actually contain some differing internal state.
Like any STL container that must order its elements, the
multiset template uses the less template by default to determine
element ordering. This uses the contained classes’ operator<,
but you may of course substitute your own comparison function.
Consider a simple class that contains one element that is used
in the comparison, and another that is not:
//: C04:MultiSet1.cpp // Demonstration of multiset behavior #include <iostream> #include <set> #include <algorithm> #include <ctime> using namespace std; class X { char c; // Used in comparison int i; // Not used in comparison // Don't need default constructor and operator= X(); X& operator=(const X&); // Usually need a copy-constructor (but the // synthesized version works here) public: X(char cc, int ii) : c(cc), i(ii) {} // Notice no operator== is required friend bool operator<(const X& x, const X& y) { return x.c < y.c; } friend ostream& operator<<(ostream& os, X x) { return os << x.c << ":" << x.i; } }; class Xgen { static int i; // Number of characters to select from: static const int span = 6; public: Xgen() { srand(time(0)); } X operator()() { char c = 'A' + rand() % span; return X(c, i++); } }; int Xgen::i = 0; typedef multiset<X> Xmset; typedef Xmset::const_iterator Xmit; int main() { Xmset mset; // Fill it with X's: generate_n(inserter(mset, mset.begin()), 25, Xgen()); // Initialize a regular set from mset: set<X> unique(mset.begin(), mset.end()); copy(unique.begin(), unique.end(), ostream_iterator<X>(cout, " ")); cout << "\n----\n"; // Iterate over the unique values: for(set<X>::iterator i = unique.begin(); i != unique.end(); i++) { pair<Xmit, Xmit> p = mset.equal_range(*i); copy(p.first, p.second, ostream_iterator<X>(cout, " ")); cout << endl; } } ///:~
In X, all the comparisons are made with the char
c. The comparison is performed with operator<, which is all that
is necessary for the multiset, since in this example the default
less comparison object is used. The class Xgen is used to randomly
generate X objects, but the comparison value is restricted to the span
from ‘A’ to ‘E’. In main( ), a
multiset<X> is created and filled with 25 X objects using
Xgen, guaranteeing that there will be duplicate keys. So that we know
what the unique values are, a regular set<X> is created from the
multiset (using the iterator, iterator constructor). These values
are displayed, then each one is used to produce the equal_range( )
in the multiset (equal_range( ) has the same meaning here as
it does with multimap: all the elements with matching keys). Each set of
matching keys is then printed.
As a second example, a (possibly) more elegant version of
WordCount.cpp can be created using multiset:
//: C04:MultiSetWordCount.cpp //{L} StreamTokenizer // Count occurrences of words using a multiset #include "StreamTokenizer.h" #include "../require.h" #include <string> #include <set> #include <fstream> #include <iterator> using namespace std; int main(int argc, char* argv[]) { requireArgs(argc, 1); ifstream in(argv[1]); assure(in, argv[1]); StreamTokenizer words(in); multiset<string> wordmset; string word; while((word = words.next()).size() != 0) wordmset.insert(word); typedef multiset<string>::iterator MSit; MSit it = wordmset.begin(); while(it != wordmset.end()) { pair<MSit, MSit> p=wordmset.equal_range(*it); int count = distance(p.first, p.second); cout << *it << ": " << count << endl; it = p.second; // Move to the next word } } ///:~
The setup in main( ) is identical to
WordCount.cpp, but then each word is simply inserted into the
multiset<string>. An iterator is created and initialized to the
beginning of the multiset; dereferencing this iterator produces the
current word. equal_range( ) produces the starting and ending
iterators of the word that’s currently selected, and the STL algorithm
distance( ) (which is in <iterator>) is used to count
the number of elements in that range. Then the iterator it is moved
forward to the end of the range, which puts it at the next word. Although if
you’re unfamiliar with the multiset this code can seem more
complex, the density of it and the lack of need for supporting classes like
Count has a lot of appeal.
In the end, is this really a “set,” or should it
be called something else? An alternative is the generic “bag” that
has been defined in some container libraries, since a bag holds anything at all
without discrimination – including duplicate objects. This is close, but
it doesn’t quite fit since a bag has no specification about how elements
should be ordered, while a multiset (which requires that all duplicate
elements be adjacent to each other) is even more restrictive than the concept of
a set, which could use a hashing function to order its elements, in which case
they would not be in sorted order. Besides, if you wanted to store a bunch of
objects without any special criterions, you’d probably just use a
vector, deque or list.
When using a thesaurus, you have a word and you want to know
all the words that are similar. When you look up a word, then, you want a list
of words as the result. Here, the “multi” containers
(multimap or multiset) are not appropriate. The solution is to
combine containers, which is easily done using the STL. Here, we need a tool
that turns out to be a powerful general concept, which is a map of
vector:
//: C04:Thesaurus.cpp // A map of vectors #include <map> #include <vector> #include <string> #include <iostream> #include <algorithm> #include <ctime> using namespace std; typedef map<string, vector<string> > Thesaurus; typedef pair<string, vector<string> > TEntry; typedef Thesaurus::iterator TIter; ostream& operator<<(ostream& os,const TEntry& t){ os << t.first << ": "; copy(t.second.begin(), t.second.end(), ostream_iterator<string>(os, " ")); return os; } // A generator for thesaurus test entries: class ThesaurusGen { static const string letters; static int count; public: int maxSize() { return letters.size(); } ThesaurusGen() { srand(time(0)); } TEntry operator()() { TEntry result; if(count >= maxSize()) count = 0; result.first = letters[count++]; int entries = (rand() % 5) + 2; for(int i = 0; i < entries; i++) { int choice = rand() % maxSize(); char cbuf[2] = { 0 }; cbuf[0] = letters[choice]; result.second.push_back(cbuf); } return result; } }; int ThesaurusGen::count = 0; const string ThesaurusGen::letters("ABCDEFGHIJKL" "MNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); int main() { Thesaurus thesaurus; // Fill with 10 entries: generate_n( inserter(thesaurus, thesaurus.begin()), 10, ThesaurusGen()); // Print everything: copy(thesaurus.begin(), thesaurus.end(), ostream_iterator<TEntry>(cout, "\n")); // Ask for a "word" to look up: while(true) { cout << "Select a \"word\", 0 to quit: "; for(TIter it = thesaurus.begin(); it != thesaurus.end(); it++) cout << (*it).first << ' '; cout << endl; string reply; cin >> reply; if(reply.at(0) == '0') return 0; // Quit if(thesaurus.find(reply) == thesaurus.end()) continue; // Not in list, try again vector<string>& v = thesaurus[reply]; copy(v.begin(), v.end(), ostream_iterator<string>(cout, " ")); cout << endl; } } ///:~
A Thesaurus maps a string (the word) to a
vector<string> (the synonyms). A TEntry is a single entry in
a Thesaurus. By creating an ostream operator<< for a
TEntry, a single entry from the Thesaurus can easily be printed
(and the whole Thesaurus can easily be printed with copy( )).
The ThesaurusGen creates “words” (which are just single
letters) and “synonyms” for those words (which are just other
randomly-chosen single letters) to be used as thesaurus entries. It randomly
chooses the number of synonym entries to make, but there must be at least two.
All the letters are chosen by indexing into a static string that is part
of ThesaurusGen.
In main( ), a Thesaurus is created, filled
with 10 entries and printed using the copy( ) algorithm. Then the
user is requested to choose a “word” to look up by typing the letter
of that word. The find( ) member function is used to find whether
the entry exists in the map (remember, you don’t want to use
operator[ ] or it will automatically make a new entry if it doesn’t
find a match!). If so, operator[ ] is used to fetch out the
vector<string> which is displayed.
Because templates make the expression of powerful concepts
easy, you can take this concept much further, creating a map of
vectors containing maps, etc. For that matter, you can combine any
of the STL containers this way.
In Stlshape.cpp, the pointers did not clean themselves
up automatically. It would be convenient to be able to do this easily, rather
than writing out the code each time. Here is a function template that will clean
up the pointers in any sequence container; note that it is placed in the
book’s root directory for easy access:
//: :purge.h // Delete pointers in an STL sequence container #ifndef PURGE_H #define PURGE_H #include <algorithm> template<class Seq> void purge(Seq& c) { typename Seq::iterator i; for(i = c.begin(); i != c.end(); i++) { delete *i; *i = 0; } } // Iterator version: template<class InpIt> void purge(InpIt begin, InpIt end) { while(begin != end) { delete *begin; *begin = 0; begin++; } } #endif // PURGE_H ///:~
In the first version of purge( ), note that
typename is absolutely necessary; indeed this is exactly the case that
the keyword was added for: Seq is a template argument, and
iterator is something that is nested within that template. So what does
Seq::iterator refer to? The typename keyword specifies that it
refers to a type, and not something else.
While the container version of purge must work with an
STL-style container, the iterator version of purge( ) will work with
any range, including an array.
Here is Stlshape.cpp, modified to use the
purge( ) function:
//: C04:Stlshape2.cpp // Stlshape.cpp with the purge() function #include "../purge.h" #include <vector> #include <iostream> using namespace std; class Shape { public: virtual void draw() = 0; virtual ~Shape() {}; }; class Circle : public Shape { public: void draw() { cout << "Circle::draw\n"; } ~Circle() { cout << "~Circle\n"; } }; class Triangle : public Shape { public: void draw() { cout << "Triangle::draw\n"; } ~Triangle() { cout << "~Triangle\n"; } }; class Square : public Shape { public: void draw() { cout << "Square::draw\n"; } ~Square() { cout << "~Square\n"; } }; typedef std::vector<Shape*> Container; typedef Container::iterator Iter; int main() { Container shapes; shapes.push_back(new Circle); shapes.push_back(new Square); shapes.push_back(new Triangle); for(Iter i = shapes.begin(); i != shapes.end(); i++) (*i)->draw(); purge(shapes); } ///:~
When using purge( ), you must be careful to
consider ownership issues – if an object pointer is held in more than one
container, then you must be sure not to delete it twice, and you don’t
want to destroy the object in the first container before the second one is
finished with it. Purging the same container twice is not a problem, because
purge( ) sets the pointer to zero once it deletes that pointer, and
calling delete for a zero pointer is a safe
operation.
With the STL as a foundation, it’s possible to create
your own containers. Assuming you follow the same model of providing iterators,
your new container will behave as if it were a built-in STL container.
Consider the “ring” data structure, which is a
circular sequence container. If you reach the end, it just wraps around to the
beginning. This can be implemented on top of a list as follows:
//: C04:Ring.cpp // Making a "ring" data structure from the STL #include <iostream> #include <list> #include <string> using namespace std; template<class T> class Ring { list<T> lst; public: // Declaration necessary so the following // 'friend' statement sees this 'iterator' // instead of std::iterator: class iterator; friend class iterator; class iterator : public std::iterator< std::bidirectional_iterator_tag,T,ptrdiff_t>{ list<T>::iterator it; list<T>* r; public: // "typename" necessary to resolve nesting: iterator(list<T>& lst, const typename list<T>::iterator& i) : r(&lst), it(i) {} bool operator==(const iterator& x) const { return it == x.it; } bool operator!=(const iterator& x) const { return !(*this == x); } list<T>::reference operator*() const { return *it; } iterator& operator++() { ++it; if(it == r->end()) it = r->begin(); return *this; } iterator operator++(int) { iterator tmp = *this; ++*this; return tmp; } iterator& operator--() { if(it == r->begin()) it = r->end(); --it; return *this; } iterator operator--(int) { iterator tmp = *this; --*this; return tmp; } iterator insert(const T& x){ return iterator(*r, r->insert(it, x)); } iterator erase() { return iterator(*r, r->erase(it)); } }; void push_back(const T& x) { lst.push_back(x); } iterator begin() { return iterator(lst, lst.begin()); } int size() { return lst.size(); } }; int main() { Ring<string> rs; rs.push_back("one"); rs.push_back("two"); rs.push_back("three"); rs.push_back("four"); rs.push_back("five"); Ring<string>::iterator it = rs.begin(); it++; it++; it.insert("six"); it = rs.begin(); // Twice around the ring: for(int i = 0; i < rs.size() * 2; i++) cout << *it++ << endl; } ///:~
You can see that the iterator is where most of the coding is
done. The Ring iterator must know how to loop back to the
beginning, so it must keep a reference to the list of its
“parent” Ring object in order to know if it’s at the
end and how to get back to the beginning.
You’ll notice that the interface for Ring is
quite limited; in particular there is no end( ), since a ring just
keeps looping. This means that you won’t be able to use a Ring in
any STL algorithms that require a past-the-end iterator – which is many of
them. (It turns out that adding this feature is a non-trivial exercise).
Although this can seem limiting, consider stack, queue and
priority_queue, which don’t produce any iterators at
all!
Although the STL containers may provide all the functionality
you’ll ever need, they are not complete. For example, the standard
implementations of set and map use trees, and although these are
reasonably fast they may not be fast enough for your needs. In the C++ Standards
Committee it was generally agreed that hashed implementations of set and
map should have been included in Standard C++, however there was not
considered to be enough time to add these components, and thus they were left
out.
Fortunately, there are freely-available alternatives. One of
the nice things about the STL is that it establishes a basic model for creating
STL-like classes, so anything built using the same model is easy to understand
if you are already familiar with the STL.
The SGI STL (freely available at
http://www.sgi.com/Technology/STL/) is one of the most robust implementations of
the STL, and can be used to replace your compiler’s STL if that is found
wanting. In addition they’ve added a number of extensions including
hash_set, hash_multiset, hash_map, hash_multimap,
slist (a singly-linked list) and rope (a variant of string
optimized for very large strings and fast concatenation and substring
operations).
Let’s consider a performance comparison between a
tree-based map and the SGI hash_map. To keep things simple, the
mappings will be from int to int:
//: C04:MapVsHashMap.cpp // The hash_map header is not part of the // Standard C++ STL. It is an extension that // is only available as part of the SGI STL: #include <hash_map> #include <iostream> #include <map> #include <ctime> using namespace std; int main(){ hash_map<int, int> hm; map<int, int> m; clock_t ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m.insert(make_pair(j,j)); cout << "map insertions: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm.insert(make_pair(j,j)); cout << "hash_map insertions: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m[j]; cout << "map::operator[] lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm[j]; cout << "hash_map::operator[] lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) m.find(j); cout << "map::find() lookups: " << clock() - ticks << endl; ticks = clock(); for(int i = 0; i < 100; i++) for(int j = 0; j < 1000; j++) hm.find(j); cout << "hash_map::find() lookups: " << clock() - ticks << endl; } ///:~
The performance test I ran showed a speed improvement of
roughly 4:1 for the hash_map over the map in all operations (and
as expected, find( ) is slightly faster than operator[ ] for
lookups for both types of map). If a profiler shows a bottleneck in your
map, you should consider a
hash_map.
The goal of this chapter was not just to introduce the STL
containers in some considerable depth (of course, not every detail could be
covered here, but you should have enough now that you can look up further
information in the other resources). My higher hope is that this chapter has
made you grasp the incredible power available in the STL, and shown you how much
faster and more efficient your programming activities can be by using and
understanding the STL.
The fact that I could not escape from introducing some of the
STL algorithms in this chapter suggests how useful they can be. In the next
chapter you’ll get a much more focused look at the
algorithms.
[16] Contributed
to the C++ Standard by Alexander Stepanov and Meng Lee at
Hewlett-Packard.
[17] These were
actually created to abstract the “locale” facets away from
iostreams, so that locale facets could operate on any sequence of characters,
not only iostreams. Locales allow iostreams to easily handle
culturally-different formatting (such as representation of money), and are
beyond the scope of this book.
[18] I am
indebted to Nathan Myers for explaining this to me.
[19] This is
another example coached by Nathan Myers.