Contents |
This article looks into the internals of the libstdc++ implementation of std::vector. You can find it in your gcc installation, in a directory like this:
/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include/g++-v3/
new_allocator<T>
∆
│
allocator<T>
∆
│
_Vector_base ◊── _Vector_impl
∆
│
vector<T>
Let's look at an example:
vector<int> V(2);
This vector ctor is called (from bits/stl_vector.h):
explicit vector(size_type __n)
: _Base(__n, allocator_type())
{ this->_M_impl._M_finish =
std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type()); }
It calls the _Vector_base ctor - in the example the call would be:
_Vector_base::_Vector_base(2, allocator<int>)
Which is this ctor:
_Vector_base(size_t __n, const allocator_type& __a)
: _M_impl(__a)
{
this->_M_impl._M_start = this->_M_allocate(__n);
this->_M_impl._M_finish = this->_M_impl._M_start;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
_M_impl is a struct _Vector_impl which is defined as
struct _Vector_impl
: public _Alloc {
_Tp* _M_start;
_Tp* _M_finish;
_Tp* _M_end_of_storage;
_Vector_impl (_Alloc const& __a)
: _Alloc(__a), _M_start(0), _M_finish(0), _M_end_of_storage(0)
{ }
};
The _Vector_impl ctor calls the base copy ctor allocator<int>::allocator(allocator<int> const &a) and sets the pointers to 0. So now the vector has an allocator and pointers defining data location, end of data, and end of allocated space.
Returning to
_Vector_base(size_t __n, const allocator_type& __a)
: _M_impl(__a)
{
this->_M_impl._M_start = this->_M_allocate(__n);
this->_M_impl._M_finish = this->_M_impl._M_start;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
The first statement allocates space for __n * sizeof(_Tp). As _M_impl inherits from allocator<int>, _M_allocate may just call allocate() on _M_impl:
_Tp* _M_allocate(size_t __n) { return _M_impl.allocate(__n); }
The standard allocator is defined in bits/allocator.h:
template<typename _Tp> class allocator: public ___glibcxx_base_allocator<_Tp>
Actually it just inherits from new_allocator<int>, which is defined in ext/new_allocator.h:
template<typename _Tp>
class new_allocator
{
public:
...
typedef _Tp* pointer;
...
pointer allocate(size_type __n, const void* = 0)
{ return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp))); }
This strange usage of the new operator is defined in the header file 'new':
void* operator new(std::size_t) throw (std::bad_alloc);
In our example, it just allocates sizeof(int) * 2 bytes, and casts the pointer to a int *.
Returning yet again to
_Vector_base(size_t __n, const allocator_type& __a)
: _M_impl(__a)
{
this->_M_impl._M_start = this->_M_allocate(__n);
this->_M_impl._M_finish = this->_M_impl._M_start;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
We initialize _M_finish = _M_start (signifying that the vector is empty) and indicates that we have allocated sizeof(int) * __n bytes by setting _M_end_of_storage = _M_start + __n (which is one-past-the-end of the allocated memory).
We are done with the _Vector_base ctor, and may return to the ctor of vector itself:
explicit vector(size_type __n)
: _Base(__n, allocator_type())
{ this->_M_impl._M_finish =
std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type()); }
where we fill the newly allocated memory with a default value. Note that value_type() actually is a call to the default ctor for our type, in the example it is a call to int() which yields 0.
uninitialized_fill_n is from bits/stl_uninitialized.h:
template<typename _ForwardIterator, typename _Size, typename _Tp>
inline _ForwardIterator
uninitialized_fill_n(_ForwardIterator __first, _Size __n, const _Tp& __x)
{
typedef typename iterator_traits<_ForwardIterator>::value_type _ValueType;
typedef typename __type_traits<_ValueType>::is_POD_type _Is_POD;
return std::__uninitialized_fill_n_aux(__first, __n, __x, _Is_POD());
}
Now, _Tp is Plain-Old-Data (POD), so this version of __uninitialized_fill_n_aux is called:
template<typename _ForwardIterator, typename _Size, typename _Tp>
inline _ForwardIterator
__uninitialized_fill_n_aux(_ForwardIterator __first, _Size __n,
const _Tp& __x, __true_type)
{ return std::fill_n(__first, __n, __x); }
From bits/stl_algobase.h:
template<typename _OutputIterator, typename _Size, typename _Tp>
_OutputIterator
fill_n(_OutputIterator __first, _Size __n, const _Tp& __value)
{
...
for ( ; __n > 0; --__n, ++__first)
*__first = __value;
return __first;
}
So we basically just set every value to the default value of the datatype, in the example 0. So in conclusion, the ctor:
explicit vector(size_type __n)
: _Base(__n, allocator_type())
{ this->_M_impl._M_finish =
std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type()); }
So, in our example
Vector V(2);
What would vector::iterator and vector::const_iterator be? Well, they are defined as:
template<typename _Tp, typename _Alloc = allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc>
{
...
typedef vector<_Tp, _Alloc> vector_type;
public:
typedef _Tp value_type;
typedef typename _Alloc::pointer pointer;
typedef typename _Alloc::const_pointer const_pointer;
...
typedef __gnu_cxx::__normal_iterator<pointer, vector_type> iterator;
typedef __gnu_cxx::__normal_iterator<const_pointer, vector_type>
const_iterator;
...
allocator defines pointer and const_pointer:
template<typename _Tp>
class allocator: public ___glibcxx_base_allocator<_Tp>
{
...
typedef _Tp* pointer;
typedef const _Tp* const_pointer;
...
So, in our example:
iterator is a __normal_iterator<int *, vector<int, allocator<int>>> , and const_iterator is a __normal_iterator<const int *, vector<int, allocator<int>>>
In bits/stl_iterator.h, we find __normal_iterator:
template<typename _Iterator, typename _Container>
class __normal_iterator
{
protected:
_Iterator _M_current;
public:
...
typedef typename iterator_traits<_Iterator>::value_type value_type;
typedef typename iterator_traits<_Iterator>::difference_type
difference_type;
typedef typename iterator_traits<_Iterator>::reference reference;
typedef typename iterator_traits<_Iterator>::pointer pointer;
__normal_iterator() : _M_current(_Iterator()) { }
...
// Forward iterator requirements
reference operator*() const { return *_M_current; }
pointer operator->() const { return _M_current; }
__normal_iterator& operator++() { ++_M_current; return *this; }
...
_M_current, which is the only variable of the iterator, is of type _Iterator, which in our example was a int*. So a vector<int>::iterator is just a (very) fancy int* underneath.
From bits/stl_vector.h:
/**
* The dtor only erases the elements, and note that if the
* elements themselves are pointers, the pointed-to memory is
* not touched in any way. Managing the pointer is the user's
* responsibilty.
*/
~vector() { std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish); }
bits/stl_construct defines _Destroy:
template<typename _ForwardIterator>
inline void
_Destroy(_ForwardIterator __first, _ForwardIterator __last)
{
typedef typename iterator_traits<_ForwardIterator>::value_type
_Value_type;
typedef typename __type_traits<_Value_type>::has_trivial_destructor
_Has_trivial_destructor;
std::__destroy_aux(__first, __last, _Has_trivial_destructor());
}
bits/stl_iterator_base_types.h defines iterator_traits<_ForwardIterator>::value_type
template<typename _Tp>
struct iterator_traits<_Tp*>
{
typedef random_access_iterator_tag iterator_category;
typedef _Tp value_type;
typedef ptrdiff_t difference_type;
typedef _Tp* pointer;
typedef _Tp& reference;
};
In our case, _ForwardIterator is an int*, and thus
iterator_traits<int*>::value_type is an int
To evaluate, whether _Value_type (int, that is) has a trivial destructor, we look in bits/type_traits.h:
template<>
struct __type_traits<int>
{
typedef __true_type has_trivial_default_constructor;
typedef __true_type has_trivial_copy_constructor;
typedef __true_type has_trivial_assignment_operator;
typedef __true_type has_trivial_destructor;
typedef __true_type is_POD_type;
};
It does.
So, to look at _Destroy from bits/stl_construct.h again:
template<typename _ForwardIterator>
inline void
_Destroy(_ForwardIterator __first, _ForwardIterator __last)
{
typedef typename iterator_traits<_ForwardIterator>::value_type
_Value_type;
typedef typename __type_traits<_Value_type>::has_trivial_destructor
_Has_trivial_destructor;
std::__destroy_aux(__first, __last, _Has_trivial_destructor());
}
Which then in our example calls:
template<typename _ForwardIterator>
inline void
__destroy_aux(_ForwardIterator, _ForwardIterator, __true_type)
{ }
Which does nothing. So this whole endeavour did nothing. Big Whoop. So when does the memory actually get deallocated? Well, our base class destructor also gets called:
~_Vector_base()
{ _M_deallocate(this->_M_impl._M_start,
this->_M_impl._M_end_of_storage - this->_M_impl._M_start); }
Which deallocates (_M_end_of_storage - _M_start) bytes at _M_start using _M_deallocate:
void
_M_deallocate(_Tp* __p, size_t __n)
{ if (__p) _M_impl.deallocate(__p, __n); }
Which calls the allocators deallocate method, as defined in ext/new_allocator.h:
// __p is not permitted to be a null pointer.
void
deallocate(pointer __p, size_type)
{ ::operator delete(__p); }
Now we can look at some simple vector operations:
template<typename _Tp, typename _Alloc = allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc>
{
...
typedef typename _Alloc::reference reference;
typedef __gnu_cxx::__normal_iterator<pointer, vector_type> iterator;
typedef size_t size_type;
...
iterator begin() { return iterator (this->_M_impl._M_start); }
iterator end() { return iterator (this->_M_impl._M_finish); }
bool empty() const { return begin() == end(); }
reference operator[](size_type __n) { return *(begin() + __n); }
size_type size() const { return size_type(end() - begin()); }
...
begin() and end() constructs a __normal_iterator, using this ctor:
explicit __normal_iterator(const _Iterator& __i) : _M_current(__i) { }
where the iterator value _M_current is just set to the _M_start and _M_finish pointers, respectively.
Let's look at a common operation, push_back. If _M_finish != _M_end_of_storage, meaning that there is available space in the vector, the new element is just placed at _M_finish, which is then incremented.
void push_back(const value_type& __x)
{
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
{
std::_Construct(this->_M_impl._M_finish, __x);
++this->_M_impl._M_finish;
}
else
_M_insert_aux(end(), __x);
}
However, if there is no available space, _M_insert_aux is called, which is a little more complex. It is an insert function, which involves copying the range from the insert position to the end one index to the right, and inserting the new value afterwards. It is structured like this:
if space is available at the end of the vector:
insert new value
else
reallocate
Where
insert new value:
construct new element at finish, copy value from finish-1
increment finish
copy range [position;finish-2] one index to the right
set position to new value
As an example, we would like to insert the value D into the vector [A,B,C] at position 1 (where B is):
sta fin eos
┌───┬───┬───┬───┬───┬───┐
│ A │ B │ C │ │ │░░░│
└───┴───┴───┴───┴───┴───┘
∆
│
┌───┐
│ D │
└───┘
construct new element at finish, copy value from finish-1, increment finish:
sta fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ B │ C │ C │ │░░░│ └───┴───┴───┴───┴───┴───┘
copy range [position;finish-2] one index to the right. This operation is obviously linear time in the length of the vector, so insertion is slow for vectors! But push_back never does this.
sta f-2 f-1 fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ B │ B │ C │ │░░░│ └───┴───┴───┴───┴───┴───┘
set position to new value (D):
sta fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ D │ B │ C │ │░░░│ └───┴───┴───┴───┴───┴───┘
Reallocation is structured like this
reallocate:
new size = 2 * old size
new_start = allocate(new size)
copy range [start;position] to new_start
construct new element at new_start + position with new value
increment new_finish
copy range [position;finish] to new_finish
destroy [start;finish]
deallocate [start;end of storage]
start = new_start
finish = new_finish
end_of_storage = new size
Reallocation is also linear time in the number of elements, so remember to initialize your vectors to the right size from the beginning!
It looks like this:
template<typename _Tp, typename _Alloc>
void
vector<_Tp,_Alloc>::
_M_insert_aux(iterator __position, const _Tp& __x)
{
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
{
std::_Construct(this->_M_impl._M_finish, *(this->_M_impl._M_finish - 1));
++this->_M_impl._M_finish;
_Tp __x_copy = __x;
std::copy_backward(__position,
iterator(this->_M_impl._M_finish-2),
iterator(this->_M_impl._M_finish-1));
*__position = __x_copy;
}
else
{
const size_type __old_size = size();
const size_type __len = __old_size != 0 ? 2 * __old_size : 1;
iterator __new_start(this->_M_allocate(__len));
iterator __new_finish(__new_start);
try
{
__new_finish = std::uninitialized_copy(iterator(this->_M_impl._M_start),
__position,
__new_start);
std::_Construct(__new_finish.base(), __x);
++__new_finish;
__new_finish = std::uninitialized_copy(__position,
iterator(this->_M_impl._M_finish),
__new_finish);
}
catch(...)
{
std::_Destroy(__new_start,__new_finish);
_M_deallocate(__new_start.base(),__len);
__throw_exception_again;
}
std::_Destroy(begin(), end());
_M_deallocate(this->_M_impl._M_start,
this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
this->_M_impl._M_start = __new_start.base();
this->_M_impl._M_finish = __new_finish.base();
this->_M_impl._M_end_of_storage = __new_start.base() + __len;
}
}