A segmented_array is a container implemented as both a linked list and a red-black tree of fixed sized segments (vecarray arrays). Being a segmented array (as opposed to a normal flat array) its elements are not stored contiguously in memory but are contiguous within a segment. Compared to a std::vector or std::deque it performs considerably better at inserting/erasing elements at any point in the controlled sequence. Random access performs well being O(lg N) complexity so performs considerably better than std::list. push_back() and pop_back() are O(lg N) complexity. Single element insert() and erase(), push_front() and pop_front() are also O(lg N) complexity (an example which illustrates this). The performance of multiple element insert() and erase() is linear in the number of items inserted/erased.
You specify the segment size via a template parameter (or rely on the default of 64) and varying this can affect performance: increasing segment size improves random access performance whilst decreasing segment size improves element insert performance (see example below for timings for various segment sizes).
Random access iterators are provided so the container is compatible with algorithms such as std::sort (see example below) however the iterators do not meet the requirements of a true random access iterator as they do not provide constant time iteration: they may have to traverse the red-black tree if the start and end positions are in different segments so require O(lg N) time (affects [], +=, +, -= and - iterator operators), increment (++) and decrement (--) are constant time however (modulo the additional constant time needed if the current segment needs changing). The iterators contain an integer position ensuring that std::distance and the iterator comparison operators run in constant time.
Iterators become invalid when they are past the point of element insertion/removal. Element references may become invalid when they are past the point of element insertion/removal but clients should not rely on this (i.e. assume always invalidated).
If a segment becomes empty after an element is erased then that segment is automatically deleted. Depending on the number and type of operations performed on a segmented_array object many of its segments may over time become partially full resulting in wasted memory and slower random access; if this is a problem then the following trick can be used to reclaim that unused memory and "compress" the segmented array:
{
// the segmented_array sa
contains partially full segments
neolib::segmented_array<int> tmp(sa);
tmp.swap(sa);
// sa is now compressed, only its
last segment may be partially full
}
N.B. The swap() method only runs in constant time if the two segmented_array objects are instantiations of the same template instantiation i.e. have the same template parameters (same element type and segment size).
A possible example application for segmented_array might be to use a segmented_array<char, 1024> as the data structure to represent the text in a text editor allowing the insertion/deletion of text anywhere in the text document quickly. A std::list<char> would be inefficient for this due to the overheads of storing two list node pointers per character and allocating one list node per character. A std::vector<char> or std::string would be inefficient for this if the document text was of any length, how noticeable this inefficiency would be would depend on CPU speed of course: probably not noticeable at all on a modern PC unless the text document length is in the admittedly very large region of 10-100MB but perhaps more noticeable on a handheld device with a smaller document size. A quick test revealed that a call to std::vector<char>::insert(begin(), 'A') takes 456ms on a 1.3Ghz Pentium M laptop if the vector contains 40MB of chars; "The Reginald Perrin Omnibus" by David Nobbs must contain about 1.6 million characters so a ≈18ms key press delay would not be noticeable during normal typing. By comparison segmented_array<char>::push_front() takes <1ms if the segmented_array contains 40MB of chars.
You can download the implementation here (v3.1) (Changelog).
Here is an example of usage which outputs some comparative timings (including a comparison against the closest competition, "avl_array"):
struct timer { timer(const char* message) { std::cout << message; QueryPerformanceCounter(&iStartTime); } ~timer() { LARGE_INTEGER endTime; QueryPerformanceCounter(&endTime); LARGE_INTEGER freq; QueryPerformanceFrequency(&freq); std::cout.precision(4); std::cout << std::fixed << (float)(endTime.QuadPart - iStartTime.QuadPart) / (float)freq.QuadPart << " seconds" << std::endl; } LARGE_INTEGER iStartTime; }; template<typename T, typename A> typename std::list<T, A>::iterator index_list(std::list<T, A>& list, typename std::list<T, A>::size_type index) { if (index < list.size() / 2) { typename std::list<T, A>::iterator it = list.begin(); for (typename std::list<T, A>::size_type i = 0; i < index; ++i) ++it; return it; } else { typename std::list<T, A>::iterator it = list.end(); for (typename std::list<T, A>::size_type i = list.size(); i > index; --i) --it; return it; } } int main() { const int iterations = 300000; const int segment_size_1 = 1; const int segment_size_2 = 64; const int segment_size_3 = 256; const int segment_size_4 = 1024; std::vector<int> v; std::deque<int> d; std::list<int> l; mkr::avl_array<int> aa; neolib::segmented_array<int, segment_size_1> sa1; neolib::segmented_array<int, segment_size_2> sa2; neolib::segmented_array<int, segment_size_3> sa3; neolib::segmented_array<int, segment_size_4> sa4; /* WARNING: "rand() % N" is normally bad practice as the low-order bits returned by by rand() cannot in all implementations be relied upon to be "random" enough. You should instead use all the bits returned by rand() and perform a division. I am being lazy by doing it here. It is probably a good idea to dump rand() and use something like a Mersenne Twister instead. */ srand(0); for (int i = 0; i < iterations; ++i) { int value = rand(); v.push_back(value); d.push_back(value); l.push_back(value); aa.push_back(value); sa1.push_back(value); sa2.push_back(value); sa3.push_back(value); sa4.push_back(value); } if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin())) std::cout << "Bug! Ranges not equal." << std::endl; unsigned int defeatOptimizer = 0; { srand(0); timer time("std::vector random erase: "); for (int i = 0; i < iterations - 1; ++i) v.erase(v.begin() + rand() % v.size()); } { srand(0); timer time("std::deque random erase: "); for (int i = 0; i < iterations - 1; ++i) d.erase(d.begin() + rand() % d.size()); } { srand(0); timer time("std::list random erase: "); for (int i = 0; i < iterations - 1; ++i) l.erase(index_list(l, rand() % l.size())); } { srand(0); timer time("mkr::avl_array random erase: "); for (int i = 0; i < iterations - 1; ++i) aa.erase(aa.begin() + rand() % aa.size()); } { srand(0); timer time("neolib::segmented_array (1) random erase: "); for (int i = 0; i < iterations - 1; ++i) sa1.erase(sa1.begin() + rand() % sa1.size()); } { srand(0); timer time("neolib::segmented_array (2) random erase: "); for (int i = 0; i < iterations - 1; ++i) sa2.erase(sa2.begin() + rand() % sa2.size()); } { srand(0); timer time("neolib::segmented_array (3) random erase: "); for (int i = 0; i < iterations - 1; ++i) sa3.erase(sa3.begin() + rand() % sa3.size()); } { srand(0); timer time("neolib::segmented_array (4) random erase: "); for (int i = 0; i < iterations - 1; ++i) sa4.erase(sa4.begin() + rand() % sa4.size()); } defeatOptimizer += (*v.begin() + *d.begin() + *l.begin() + *sa1.begin() + *sa2.begin() + *sa3.begin() + *sa4.begin()); if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin())) std::cout << "Bug! Ranges not equal." << std::endl; { srand(0); timer time("std::vector random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % v.size(); v.insert(v.begin() + position, rand()); } } { srand(0); timer time("std::deque random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % d.size(); d.insert(d.begin() + position, rand()); } } { srand(0); timer time("std::list random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % l.size(); l.insert(index_list(l, position), rand()); } } { srand(0); timer time("mkr::avl_array random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % aa.size(); aa.insert(aa.begin() + position, rand()); } } { srand(0); timer time("neolib::segmented_array (1) random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % sa1.size(); sa1.insert(sa1.begin() + position, rand()); } } { srand(0); timer time("neolib::segmented_array (2) random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % sa2.size(); sa2.insert(sa2.begin() + position, rand()); } } { srand(0); timer time("neolib::segmented_array (3) random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % sa3.size(); sa3.insert(sa3.begin() + position, rand()); } } { srand(0); timer time("neolib::segmented_array (4) random insert: "); for (int i = 0; i < iterations; ++i) { std::size_t position = rand() % sa4.size(); sa4.insert(sa4.begin() + position, rand()); } } if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin())) std::cout << "Bug! Ranges not equal." << std::endl; { srand(0); timer time("std::vector random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += v[rand() % v.size()]; } { srand(0); timer time("std::deque random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += d[rand() % d.size()]; } { srand(0); timer time("std::list random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += *index_list(l, rand() % l.size()); } { srand(0); timer time("mkr::avl_array random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += aa[rand() % aa.size()]; } { srand(0); timer time("neolib::segmented_array (1) random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += sa1[rand() % sa1.size()]; } { srand(0); timer time("neolib::segmented_array (2) random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += sa2[rand() % sa2.size()]; } { srand(0); timer time("neolib::segmented_array (3) random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += sa3[rand() % sa3.size()]; } { srand(0); timer time("neolib::segmented_array (4) random access: "); for (int i = 0; i < iterations; ++i) defeatOptimizer += sa4[rand() % sa4.size()]; } { timer time("std::vector sequential access: "); for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i) defeatOptimizer += *i; } { timer time("std::deque sequential access: "); for (std::deque<int>::iterator i = d.begin(); i != d.end(); ++i) defeatOptimizer += *i; } { timer time("std::list sequential access: "); for (std::list<int>::iterator i = l.begin(); i != l.end(); ++i) defeatOptimizer += *i; } { timer time("mkr::avl_array sequential access: "); for (mkr::avl_array<int>::iterator i = aa.begin(); i != aa.end(); ++i) defeatOptimizer += *i; } { timer time("neolib::segmented_array (1) sequential access: "); for (neolib::segmented_array<int, segment_size_1>::iterator i = sa1.begin(); i != sa1.end(); ++i) defeatOptimizer += *i; } { timer time("neolib::segmented_array (2) sequential access: "); for (neolib::segmented_array<int, segment_size_2>::iterator i = sa2.begin(); i != sa2.end(); ++i) defeatOptimizer += *i; } { timer time("neolib::segmented_array (3) sequential access: "); for (neolib::segmented_array<int, segment_size_3>::iterator i = sa3.begin(); i != sa3.end(); ++i) defeatOptimizer += *i; } { timer time("neolib::segmented_array (4) sequential access: "); for (neolib::segmented_array<int, segment_size_4>::iterator i = sa4.begin(); i != sa4.end(); ++i) defeatOptimizer += *i; } { timer time("std::vector sort: "); std::sort(v.begin(), v.end()); } { timer time("std::deque sort: "); std::sort(d.begin(), d.end()); } { timer time("std::list sort: "); l.sort(); } { timer time("mkr::avl_array sort: "); std::sort(aa.begin(), aa.end()); } { timer time("neolib::segmented_array (1) sort: "); std::sort(sa1.begin(), sa1.end()); } { timer time("neolib::segmented_array (2) sort: "); std::sort(sa2.begin(), sa2.end()); } { timer time("neolib::segmented_array (3) sort: "); std::sort(sa3.begin(), sa3.end()); } { timer time("neolib::segmented_array (4) sort: "); std::sort(sa4.begin(), sa4.end()); } defeatOptimizer += (*v.begin() + *d.begin() + *l.begin() + *sa1.begin() + *sa2.begin() + *sa3.begin() + *sa4.begin()); if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin())) std::cout << "Bug! Ranges not equal." << std::endl; std::cout << std::endl << "-+"[defeatOptimizer % 2] << std::endl; return 0; }
The output of which is (compiled with Microsoft Visual Studio 2013, checked iterators disabled, running on an Intel Core i7-4960X @ 3.6 GHz):
std::vector random erase: 7.4317 seconds std::deque random erase: 13.9245 seconds std::list random erase: 24.8124 seconds mkr::avl_array random erase: 0.1288 seconds neolib::segmented_array (1) random erase: 0.2581 seconds neolib::segmented_array (2) random erase: 0.1250 seconds neolib::segmented_array (3) random erase: 0.0961 seconds neolib::segmented_array (4) random erase: 0.0813 seconds std::vector random insert: 56.7711 seconds std::deque random insert: 19.2342 seconds std::list random insert: 55.3831 seconds mkr::avl_array random insert: 0.1560 seconds neolib::segmented_array (1) random insert: 0.3113 seconds neolib::segmented_array (2) random insert: 0.1489 seconds neolib::segmented_array (3) random insert: 0.1355 seconds neolib::segmented_array (4) random insert: 0.1741 seconds std::vector random access: 0.0042 seconds std::deque random access: 0.0082 seconds std::list random access: 70.4020 seconds mkr::avl_array random access: 0.0859 seconds neolib::segmented_array (1) random access: 0.1074 seconds neolib::segmented_array (2) random access: 0.0389 seconds neolib::segmented_array (3) random access: 0.0292 seconds neolib::segmented_array (4) random access: 0.0458 seconds std::vector sequential access: 0.0020 seconds std::deque sequential access: 0.0045 seconds std::list sequential access: 0.0084 seconds mkr::avl_array sequential access: 0.0195 seconds neolib::segmented_array (1) sequential access: 0.0138 seconds neolib::segmented_array (2) sequential access: 0.0052 seconds neolib::segmented_array (3) sequential access: 0.0022 seconds neolib::segmented_array (4) sequential access: 0.0021 seconds std::vector sort: 0.0206 seconds std::deque sort: 0.0399 seconds std::list sort: 0.0857 seconds mkr::avl_array sort: 0.6119 seconds neolib::segmented_array (1) sort: 0.4985 seconds neolib::segmented_array (2) sort: 0.0577 seconds neolib::segmented_array (3) sort: 0.0476 seconds neolib::segmented_array (4) sort: 0.0489 seconds +