Wednesday, August 03, 2011

g++ unordered_multimap: an exercise

I discovered this randomly several weeks ago while debugging something else at work, and I thought it was worth sharing since the g++ STL is widely used.

Step 1: Save the following file as mapdemo.cpp:

#include <iostream>
#include <unordered_map>

int main() {
    typedef std::unordered_multimap<int, int> int_multimap;
    int_multimap map;
    for (int i = 0; i < 10000; ++i) {
        map.insert(int_multimap::value_type(17, i));
    }
    std::cerr << *static_cast<int*>(0);
    return 0;
}

Step 2: Compile the file:

g++ --std=c++0x -g mapdemo.cpp

Step 3: Load the file in gdb and examine the buckets:

$ gdb -silent ./a.out
Reading symbols from xxx/a.out...done.
(gdb) run
Starting program: xxx/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400b4d in main () at mapdemo.cpp:10
10     std::cerr << *static_cast<int*>(0);
(gdb) p map._M_bucket_count
$1 = 15173
(gdb) p map._M_buckets[0]
$2 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x0
(gdb) p map._M_buckets[15172]
$3 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x0
(gdb) p map._M_buckets[17]
$4 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x605080
(gdb) p *map._M_buckets[17]
$5 = {_M_v = {first = 17, second = 0}, _M_next = 0x670c90}

Yes, g++'s implementation of unordered_multimap (known as hash_multimap in pre-C++0x versions of C++) uses bucket hashing, but the size of the backing array is proportional to the count of elements in the multimap, not the count of distinct keys.

Exercise for the reader: Explain what I just did; explain why the result of step 3 is curious; and then explain why the authors might have chosen to do it this way anyway.

It occurs to me that this would have made a decent interview question if I hadn't written it up here. Oh well, I have others.

No comments:

Post a Comment