C++ persistent containers

[Note: pmem::obj::vector<> is no longer experimental. The rest of the information in this blog post is still accurate.]

Introduction

The main idea behind pmem containers is to fully exploit persistent memory potential by designing optimized on-media layouts and algorithms for persistent memory programming. On November, we published a blog post about pmem containers. If you haven’t read it yet, I encourage you to do that now.

We have recently added pmem::obj:vector container to libpmemobj-cpp library. This container is currently placed in experimental namespace and folder - this means that both API and layout may change. It provides API similar to std::vector from C++11 but guarantees full exception safety via commit or rollback semantics and allocates data in persistent memory.

Limitations

pmem::obj::vector allocates data in persistent memory libpmemobj pool. This limits maximum allocation size to value equal to PMEMOBJ_MAX_ALLOC_SIZE macro. Due to this limitation and due to the fact that pmem::obj:vector is dynamic contiguous array, maximum number of elements that can be stored in the pool is equal to PMEMOBJ_MAX_ALLOC_SIZE / sizeof(element_type) and this value can be returned by max_size() API function.

Since stored elements will reside in persistent memory, element’s type should satisfy requirements of:

StandardLayoutType (because objects representation (layout) might differ between compilers/compiler flags/ABI)
TriviallyCopyable (because we are not calling neither constructors nor destructors during snapshotting memory areas).

As a consequence type of stored element:

shouldn’t be polymorphic,
shouldn’t have non-static data members of reference type,
every copy constructor, move constructor, copy assignment operator, move assignment operator should be trivial (i.e. implicitly-defined or defaulted) or deleted,
at least one copy constructor, move constructor, copy assignment operator, or move assignment operator is non-deleted and should have trivial non-deleted destructor.

However, it is important to realize that pointers are trivially copyable types too. Whenever there are pointer inside the data structure that will be snapshotted (memcopyed) you have to make sure that copying them around is proper. The same rule applies for persistent_ptr type, even if it doesn’t satisfy TriviallyCopyable name requirements (because of explicitly-defined constructors).

pmem::obj:vector user and every persistent memory programmer should always check whether persistent_ptr could be copied in that specific case and if that wouldn’t cause errors and (persistent) memory leaks. One should realize that std::is_trivially_copyable is the syntax check only and it doesn’t tests semantics. Technically speaking, using persistent_ptr in this context leads to undefined behavior. There is no golden mean and since C++ standard does not fully support persistent memory programming, we should make sure that copying persistent_ptr is safe to use in our case.

It is very important to mention here that storing volatile memory pointers in persistent memory is almost always a design error (after application crash, pointer to virtual memory is no longer valid). Using persistent_ptr is safe and it provides only way to access specific memory area after application crash.

API extensions

API for pmem::obj:vector and std::vector is the same, except for the following:

pmem::obj:vector defines range() method (detailed description you can find in pmem::obj:array blog post)
pmem::obj:vector does not mark any non-const function as noexcept - elements must be added to a transaction which could throw an exception
pmem::obj:vector overloads constructor, assign method and assign operator to work with std::vector objects
pmem::obj:vector defines non-member compare functions between pmem::obj:vector and std::vector
pmem::obj:vector defines free_data() function that is recommended to being called before pmem::obj:vector destructor (freeing allocated persistent memory in transaction may throw an exception)
pmem::obj:vector defines const_at(), cfront(), cback() and cdata() element access methods. We decided that using at(), front(), back() and data() overloads which return const_reference (or const_pointer) is not enough (overload deduction depends on the const-qualification of the object it is called on and it is burdensome to cast pmem::obj:vector into const pmem::obj:vector), especially in persistent memory programming, where accessing element’s value for read-only purposes might be frequent operation and there is no need for doing it in transaction. Note that this is not possible to overcome this problem for operator[].

Usage

One of our main goals while designing pmem::obj:vector was to create as much similar API to std::vector as possible. The only usage difference in persistent memory version of vector is creation of an object. pmem::obj:vector resides on persistent memory so you need a way to access stored elements even after program crash, which can be done using pool’s root object. The root object is the anchor to which all the memory structures should be attached.

Here is an example how to create pmem::obj:vector:

#include <libpmemobj++/make_persistent.hpp>
#include <libpmemobj++/transaction.hpp>
#include <libpmemobj++/persistent_ptr.hpp>
#include <libpmemobj++/pool.hpp>
#include <libpmemobj++/experimental/vector.hpp>
#include <libpmemobj++/experimental/slice.hpp>

using vector_type = pmem::obj::experimental::vector<int>;

struct root {
  pmem::obj::persistent_ptr<vector_type> vec_p;
};

...

/* creating pmem::obj::vector in transaction */
pmem::obj::transaction::run(pop, [&] {
  root->vec*p = pmem::obj::make_persistent<vector_type>(/* optional constructor arguments */);
});

vector_type &pvector = *(root->vec_p);

As you can see in above code snippet pmem::obj:vector must be created and allocated in persistent memory using inside of transaction (an exception will be thrown otherwise). Vector’s element type constructor may construct an object by internally opening another transaction. In this case inner transaction will be flattened to outer one.

From now on usage of pmem::obj:vector is similar to usage of std::vector:

pvector.reserve(10);
assert(pvector.size() == 0);
assert(pvector.capacity() == 10);

pvector = {0, 1, 2, 3, 4};
assert(pvector.size() == 5);
assert(pvector.capacity() == 10);

pvector.shrink_to_fit();
assert(pvector.size() == 5);
assert(pvector.capacity() == 5);

for (unsigned i = 0; i < pvector.size(); ++i)
assert(pvector.const_at(i) == static_cast<int>(i));

pvector.push_back(5);
assert(pvector.const_at(5) == 5);
assert(pvector.size() == 6);

pvector.emplace(pvector.cbegin(), pvector.back());
assert(pvector.const_at(0) == 5);
for (unsigned i = 1; i < pvector.size(); ++i)
assert(pvector.const_at(i) == static_cast<int>(i - 1));

Note that every single modifier method opens transaction internally and guarantees full exception safety (modifications will be either committed or rolled-back if an exception was thrown, or crash happened). There is no need for using transaction when calling modifier methods whatsoever.

As you can see, we are checking i < pvector.size() on every loop iteration. Since pvector is a reference to dereferenced persistent pointer, this check is fast and can be optimized by compiler. But if you will use root->vec_p->size() from the other hand, you will notice performance overhead. The reason behind that is dereferencing of persistent_ptr in current implementation cannot be optimized and cached by compilers. We are working on workaround for this issue, but it is recommended to avoid unnecessary persistent_ptr dereferencing operations.

Iterating over pmem::obj:vector works just like for an ordinary std::vector: you can use indexing operator, range-based for loops or iterators. pmem::obj:vector can also be processed using std::algorithms:

std::vector<int> stdvector = {5, 4, 3, 2, 1};
pvector = stdvector;

try {
  pmem::obj::transaction::run(pop, [&] {
    for (auto &e : pvector)
      e++;
    /* 6, 5, 4, 3, 2 */

    for (auto it = pvector.begin(); it != pvector.end(); it++)
      *it += 2;
    /* 8, 7, 6, 5, 4 */

    for (unsigned i = 0; i < pvector.size(); i++)
      pvector[i]--;
    /* 7, 6, 5, 4, 3 */

    std::sort(pvector.begin(), pvector.end());
    for (unsigned i = 0; i < sz; ++i)
      assert(pvector.const_at(i) == static_cast<int>(i + 3));

    pmem::obj::transaction::abort(0);
  });
} catch (pmem::manual*tx_abort &) {
  /* expected transaction abort */
} catch (std::exception &e) {
  std::cerr << e.what() << std::endl;
}

assert(pvector == stdvector); /* pvector element's value was rolled back */

try {
  pmem::obj::delete_persistent<vector_type>(&pvector);
} catch (std::exception &e) {
  std::cerr << e.what() << std::endl;
}

If there is an active transaction elements (accessed using any of the presented above methods) are snapshotted. In case of iterators returned by begin() and end() snapshotting happens during iterator dereferencing. Of course, snapshotting is done only for mutable elements. In case of const iterators or const versions of indexing operator, nothing is added to the transaction. That’s why it is extremely important to use const qualified function overloads (cbegin(), cend(), etc.) whenever possible (if an object was snapshotted in current transaction, second snapshot of the same memory address won’t be performed and thus won’t have performance overhead). This will reduce number of snapshots and can significantly reduce the performance impact of transactions.

Note also that pmem::obj:vector does define convenient constructors and compare operators which take std::vector as an argument.

pmem::obj::slice

In cases where loop is known to modify several consecutive elements in the vector, a bulk-snapshot optimization can be performed using a special range() function. The usage of range() and pmem::obj::slice was described in blog post about pmem::obj:array here. It works for pmem::obj:vector in the same way.

Summary

To summarize if you need persistent scratch pad, extension for in-memory database or fast and flexible data storage with attributes of sequence container representing arrays that can change in size, you should use pmem::obj:vector.

libpmemobj-cpp library provides two persistent containers now: pmem::obj:array and pmem::obj:vector. We are currently working on pmem::obj::string implementation, stay tuned!