segmented_array

Update: a new container by Aleksandr Kupriianov performs considerably better than this container but I will keep this page up for educational purposes.

A segmented_array is a container implemented as both a linked list and a red-black tree of fixed sized segments (vecarray arrays).  Being a segmented array (as opposed to a normal flat array) its elements are not stored contiguously in memory but are contiguous within a segment.  Compared to a std::vector or std::deque it is performs considerably better at inserting/erasing elements at any point in the controlled sequence.  Random access performs well being O(lg N) complexity so performs considerably better than std::list.  push_back() and pop_back() are O(lg N) complexity.  Single element insert() and erase(), push_front() and pop_front() are also O(lg N) complexity (an example which illustrates this).  The performance of multiple element insert() and erase() is linear in the number of items inserted/erased.

You specify the segment size via a template parameter (or rely on the default of 64) and varying this can affect performance: increasing segment size improves random access performance whilst decreasing segment size improves element insert performance (see example below for timings for various segment sizes).

Random access iterators are provided so the container is compatible with algorithms such as std::sort (see example below) however the iterators do not meet the requirements of a true random access iterator as they do not provide constant time iteration: they may have to traverse the red-black tree if the start and end positions are in different segments so require O(lg N) time (affects [], +=, +, -= and - iterator operators), increment (++) and decrement (--) are constant time however (modulo the additional constant time needed if the current segment needs changing).  The iterators contain an integer position ensuring that std::distance and the iterator comparison operators run in constant time.

Iterators become invalid when they are past the point of element insertion/removal.  Element references may become invalid when they are past the point of element insertion/removal but clients should not rely on this (i.e. assume always invalidated).

If a segment becomes empty after an element is erased then that segment is automatically deleted.  Depending on the number and type of operations performed on a segmented_array object many of its segments may over time become partially full resulting in wasted memory and slower random access; if this is a problem then the following trick can be used to reclaim that unused memory and "compress" the segmented array:

    {
        // the segmented_array sa contains partially full segments
        neolib::segmented_array<int> tmp(sa);
        tmp.swap(sa);
        // sa is now compressed, only its last segment may be partially full
    }

N.B. The swap() method only runs in constant time if the two segmented_array objects are instantiations of the same template instantiation i.e. have the same template parameters (same element type and segment size).

A possible example application for segmented_array might be to use a segmented_array<char, 1024> as the data structure to represent the text in a text editor allowing the insertion/deletion of text anywhere in the text document quickly.  A std::list<char> would be inefficient for this due to the overheads of storing two list node pointers per character and allocating one list node per character.  A std::vector<char> or std::string would be inefficient for this if the document text was of any length, how noticeable this inefficiency would be would depend on CPU speed of course: probably not noticeable at all on a modern PC unless the text document length is in the admittedly very large region of 10-100MB but perhaps more noticeable on a handheld device with a smaller document size.  A quick test revealed that a call to std::vector<char>::insert(begin(), 'A') takes 456ms on a 1.3Ghz Pentium M laptop if the vector contains 40MB of chars; "The Reginald Perrin Omnibus" by David Nobbs must contain about 1.6 million characters so a ≈18ms key press delay would not be noticeable during normal typing.  By comparison segmented_array<char>::push_front() takes <1ms if the segmented_array contains 40MB of chars.

You can download the implementation here (v3.1) (Changelog).

Here is an example of usage which outputs some comparative timings (including a comparison against the closest competition, "avl_array"):

struct timer
{
	timer(const char* message)
	{
		std::cout << message;
		QueryPerformanceCounter(&iStartTime);
	}
	~timer()
	{
		LARGE_INTEGER endTime;
		QueryPerformanceCounter(&endTime);
		LARGE_INTEGER freq;
		QueryPerformanceFrequency(&freq);
		std::cout.precision(4);
		std::cout << std::fixed << (float)(endTime.QuadPart - iStartTime.QuadPart) / (float)freq.QuadPart << " seconds" << std::endl;
	}
	LARGE_INTEGER iStartTime;
};

template<typename T, typename A>
typename std::list<T, A>::iterator index_list(std::list<T, A>& list, typename std::list<T, A>::size_type index)
{
	if (index < list.size() / 2)
	{
		typename std::list<T, A>::iterator it = list.begin();
		for (typename std::list<T, A>::size_type i = 0; i < index; ++i)
			++it;
		return it;
	}
	else
	{
		typename std::list<T, A>::iterator it = list.end();
		for (typename std::list<T, A>::size_type i = list.size(); i > index; --i)
			--it;
		return it;
	}
}

int main()
{
	const int iterations = 300000;
	const int segment_size_1 = 1;
	const int segment_size_2 = 64;
	const int segment_size_3 = 256;
	const int segment_size_4 = 1024;

	std::vector<int> v;
	std::deque<int> d;
	std::list<int> l;
	mkr::avl_array<int> aa;
	neolib::segmented_array<int, segment_size_1> sa1;
	neolib::segmented_array<int, segment_size_2> sa2;
	neolib::segmented_array<int, segment_size_3> sa3;
	neolib::segmented_array<int, segment_size_4> sa4;

	/* WARNING: "rand() % N" is normally bad practice as the low-order bits returned by
	by rand() cannot in all implementations be relied upon to be "random" enough. You
	should instead use all the bits returned by rand() and perform a division. I am
	being lazy by doing it here.
	It is probably a good idea to dump rand() and use something like a Mersenne Twister
	instead. */

	srand(0);
	for (int i = 0; i < iterations; ++i)
	{
		int value = rand();
		v.push_back(value);
		d.push_back(value);
		l.push_back(value);
		aa.push_back(value);
		sa1.push_back(value);
		sa2.push_back(value);
		sa3.push_back(value);
		sa4.push_back(value);
	}

	if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin()))
		std::cout << "Bug! Ranges not equal." << std::endl;

	unsigned int defeatOptimizer = 0;

	{
		srand(0);
		timer time("std::vector random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			v.erase(v.begin() + rand() % v.size());
	}
	{
		srand(0);
		timer time("std::deque random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			d.erase(d.begin() + rand() % d.size());
	}
	{
		srand(0);
		timer time("std::list random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			l.erase(index_list(l, rand() % l.size()));
	}
	{
		srand(0);
		timer time("mkr::avl_array random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			aa.erase(aa.begin() + rand() % aa.size());
	}
	{
		srand(0);
		timer time("neolib::segmented_array (1) random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			sa1.erase(sa1.begin() + rand() % sa1.size());
	}
	{
		srand(0);
		timer time("neolib::segmented_array (2) random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			sa2.erase(sa2.begin() + rand() % sa2.size());
	}
	{
		srand(0);
		timer time("neolib::segmented_array (3) random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			sa3.erase(sa3.begin() + rand() % sa3.size());
	}
	{
		srand(0);
		timer time("neolib::segmented_array (4) random erase: ");
		for (int i = 0; i < iterations - 1; ++i)
			sa4.erase(sa4.begin() + rand() % sa4.size());
	}

	defeatOptimizer += (*v.begin() + *d.begin() + *l.begin() + *sa1.begin() + *sa2.begin() + *sa3.begin() + *sa4.begin());

	if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin()))
		std::cout << "Bug! Ranges not equal." << std::endl;

	{
		srand(0);
		timer time("std::vector random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % v.size();
			v.insert(v.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("std::deque random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % d.size();
			d.insert(d.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("std::list random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % l.size();
			l.insert(index_list(l, position), rand());
		}
	}
	{
		srand(0);
		timer time("mkr::avl_array random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % aa.size();
			aa.insert(aa.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("neolib::segmented_array (1) random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % sa1.size();
			sa1.insert(sa1.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("neolib::segmented_array (2) random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % sa2.size();
			sa2.insert(sa2.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("neolib::segmented_array (3) random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % sa3.size();
			sa3.insert(sa3.begin() + position, rand());
		}
	}
	{
		srand(0);
		timer time("neolib::segmented_array (4) random insert: ");
		for (int i = 0; i < iterations; ++i)
		{
			std::size_t position = rand() % sa4.size();
			sa4.insert(sa4.begin() + position, rand());
		}
	}

	if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin()))
		std::cout << "Bug! Ranges not equal." << std::endl;

	{
		srand(0);
		timer time("std::vector random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += v[rand() % v.size()];
	}
	{
		srand(0);
		timer time("std::deque random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += d[rand() % d.size()];
	}
	{
		srand(0);
		timer time("std::list random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += *index_list(l, rand() % l.size());
	}
	{
		srand(0);
		timer time("mkr::avl_array random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += aa[rand() % aa.size()];
	}
	{
		srand(0);
		timer time("neolib::segmented_array (1) random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += sa1[rand() % sa1.size()];
	}
	{
		srand(0);
		timer time("neolib::segmented_array (2) random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += sa2[rand() % sa2.size()];
	}
	{
		srand(0);
		timer time("neolib::segmented_array (3) random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += sa3[rand() % sa3.size()];
	}
	{
		srand(0);
		timer time("neolib::segmented_array (4) random access: ");
		for (int i = 0; i < iterations; ++i)
			defeatOptimizer += sa4[rand() % sa4.size()];
	}

	{
		timer time("std::vector sequential access: ");
		for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("std::deque sequential access: ");
		for (std::deque<int>::iterator i = d.begin(); i != d.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("std::list sequential access: ");
		for (std::list<int>::iterator i = l.begin(); i != l.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("mkr::avl_array sequential access: ");
		for (mkr::avl_array<int>::iterator i = aa.begin(); i != aa.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("neolib::segmented_array (1) sequential access: ");
		for (neolib::segmented_array<int, segment_size_1>::iterator i = sa1.begin(); i != sa1.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("neolib::segmented_array (2) sequential access: ");
		for (neolib::segmented_array<int, segment_size_2>::iterator i = sa2.begin(); i != sa2.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("neolib::segmented_array (3) sequential access: ");
		for (neolib::segmented_array<int, segment_size_3>::iterator i = sa3.begin(); i != sa3.end(); ++i)
			defeatOptimizer += *i;
	}
	{
		timer time("neolib::segmented_array (4) sequential access: ");
		for (neolib::segmented_array<int, segment_size_4>::iterator i = sa4.begin(); i != sa4.end(); ++i)
			defeatOptimizer += *i;
	}

	{
		timer time("std::vector sort: ");
		std::sort(v.begin(), v.end());
	}
	{
		timer time("std::deque sort: ");
		std::sort(d.begin(), d.end());
	}
	{
		timer time("std::list sort: ");
		l.sort();
	}
	{
		timer time("mkr::avl_array sort: ");
		std::sort(aa.begin(), aa.end());
	}
	{
		timer time("neolib::segmented_array (1) sort: ");
		std::sort(sa1.begin(), sa1.end());
	}
	{
		timer time("neolib::segmented_array (2) sort: ");
		std::sort(sa2.begin(), sa2.end());
	}
	{
		timer time("neolib::segmented_array (3) sort: ");
		std::sort(sa3.begin(), sa3.end());
	}
	{
		timer time("neolib::segmented_array (4) sort: ");
		std::sort(sa4.begin(), sa4.end());
	}

	defeatOptimizer += (*v.begin() + *d.begin() + *l.begin() + *sa1.begin() + *sa2.begin() + *sa3.begin() + *sa4.begin());

	if (!std::equal(sa1.begin(), sa1.end(), v.begin()) || !std::equal(sa2.begin(), sa2.end(), v.begin()) || !std::equal(sa3.begin(), sa3.end(), v.begin()) || !std::equal(sa4.begin(), sa4.end(), v.begin()))
		std::cout << "Bug! Ranges not equal." << std::endl;

	std::cout << std::endl << "-+"[defeatOptimizer % 2] << std::endl;

	return 0;
}

The output of which is (compiled with Microsoft Visual Studio 2013, checked iterators disabled, running on an Intel Core i7-4960X @ 3.6 GHz):

std::vector random erase: 7.4317 seconds
std::deque random erase: 13.9245 seconds
std::list random erase: 24.8124 seconds
mkr::avl_array random erase: 0.1288 seconds
neolib::segmented_array (1) random erase: 0.2581 seconds
neolib::segmented_array (2) random erase: 0.1250 seconds
neolib::segmented_array (3) random erase: 0.0961 seconds
neolib::segmented_array (4) random erase: 0.0813 seconds
std::vector random insert: 56.7711 seconds
std::deque random insert: 19.2342 seconds
std::list random insert: 55.3831 seconds
mkr::avl_array random insert: 0.1560 seconds
neolib::segmented_array (1) random insert: 0.3113 seconds
neolib::segmented_array (2) random insert: 0.1489 seconds
neolib::segmented_array (3) random insert: 0.1355 seconds
neolib::segmented_array (4) random insert: 0.1741 seconds
std::vector random access: 0.0042 seconds
std::deque random access: 0.0082 seconds
std::list random access: 70.4020 seconds
mkr::avl_array random access: 0.0859 seconds
neolib::segmented_array (1) random access: 0.1074 seconds
neolib::segmented_array (2) random access: 0.0389 seconds
neolib::segmented_array (3) random access: 0.0292 seconds
neolib::segmented_array (4) random access: 0.0458 seconds
std::vector sequential access: 0.0020 seconds
std::deque sequential access: 0.0045 seconds
std::list sequential access: 0.0084 seconds
mkr::avl_array sequential access: 0.0195 seconds
neolib::segmented_array (1) sequential access: 0.0138 seconds
neolib::segmented_array (2) sequential access: 0.0052 seconds
neolib::segmented_array (3) sequential access: 0.0022 seconds
neolib::segmented_array (4) sequential access: 0.0021 seconds
std::vector sort: 0.0206 seconds
std::deque sort: 0.0399 seconds
std::list sort: 0.0857 seconds
mkr::avl_array sort: 0.6119 seconds
neolib::segmented_array (1) sort: 0.4985 seconds
neolib::segmented_array (2) sort: 0.0577 seconds
neolib::segmented_array (3) sort: 0.0476 seconds
neolib::segmented_array (4) sort: 0.0489 seconds

+
 

You can e-mail comments to the author.

Back to project page