I stumbled across this a few weeks ago while debugging something else at work, and it seemed worth sharing since the g++ STL is so widely used.
Step 1: Save the following file as mapdemo.cpp:
#include <iostream>
#include <unordered_map>

int main() {
    typedef std::unordered_multimap<int, int> int_multimap;
    int_multimap map;
    for (int i = 0; i < 10000; ++i) {
        map.insert(int_multimap::value_type(17, i));
    }
    std::cerr << *static_cast<int*>(0);
    return 0;
}
Step 2: Compile the file:
g++ --std=c++0x -g mapdemo.cpp
Step 3: Load the file in gdb and examine the buckets:
$ gdb -silent ./a.out
Reading symbols from xxx/a.out...done.
(gdb) run
Starting program: xxx/a.out

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400b4d in main () at mapdemo.cpp:10
10          std::cerr << *static_cast<int*>(0);
(gdb) p map._M_bucket_count
$1 = 15173
(gdb) p map._M_buckets[0]
$2 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x0
(gdb) p map._M_buckets[15172]
$3 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x0
(gdb) p map._M_buckets[17]
$4 = (std::__detail::_Hash_node<std::pair<int const, int>, false> *) 0x605080
(gdb) p *map._M_buckets[17]
$5 = {_M_v = {first = 17, second = 0}, _M_next = 0x670c90}
Yes, g++'s implementation of unordered_multimap (the standardized descendant of the old pre-C++0x hash_multimap extension) uses bucket hashing, but the size of the backing bucket array is proportional to the number of elements in the multimap, not the number of distinct keys — here, 15173 buckets for 10000 elements that all share a single key.
Exercise for the reader: Explain what I just did; explain why the result of step 3 is curious; and then explain why the authors might have chosen to do it this way anyway.
It occurs to me that this would have made a decent interview question if I hadn't written it up here. Oh well, I have others.