Concurrency considerations in libpmemobj-cpp
Introduction
Ensuring data consistency on pmem is a challenging task. It gets even more complicated if data is modified concurrently. This blog post describes several challenges related to data visibility, using transactions in multi-threaded environments, and memory leaks.
Lock-free programming on pmem
A fundamental issue (if eADR is not used) is data visibility. When a thread issues a temporal (e.g., MOV) store instruction, the modification might be visible to other threads before it is persistent (data can still be in a CPU cache). Consider the following scenario with one thread writing to a variable using atomic operations and the second one reading from it:
// thread 1
// initial value of pmem->a is 0
atomic_store(&pmem->a, 1); // visible = 1, persistent = ?
pmem_persist(&pmem->a, sizeof(pmem->a)); // visible = 1, persistent = 1
// thread 2
// initial value of pmem->b is 0
if (atomic_load(&pmem->a) == 1) {
pmem->b = 1; // visible = 1, persistent = ?
pmem_persist(&pmem->b, sizeof(pmem->b)); // visible = 1, persistent = 1
}
Let’s analyze the values of pmem->a and pmem->b from the second thread perspective for a x86 architecture (visibility vs persistency guarantees might differ for different architectures). If a crash happened during the application execution, the possible values on restart are:
- (
pmem->a = 0
,pmem->b = 0
) – e.g., when the crash happened before the first thread started - (
pmem->a = 1
,pmem->b = 1
) – e.g., when the crash happened after both threads completed - (
pmem->a = 1
,pmem->b = 0
) – e.g., when the crash happened after the first thread completed but before the second one started
But what happens if there was a crash just after the first thread set the
value, but before it called pmem_persist
? The second thread could have
observed the new value (which is not yet persistent) and set the pmem->b
variable to 1
. On restart, it’s possible that pmem->a
is equal to 0
,
and pmem->b
is equal to 1
. All four combinations of values are possible
on restart. The fact that the second thread might take some actions based
on a non-persistent state is problematic as it causes data inconsistency.
One possible way of preventing this situation is by persisting all values just
after reading them (but before any other action is taken based on the
value). This guarantees that the value was persistent at some point in time.
The second thread from the earlier example could look like presented below:
// thread 2
// initial value of pmem->b is 0
if (atomic_load(&pmem->a) == 1) {
pmem_persist(&pmem->a, sizeof(pmem->a)); // visible = 1, persistent = 1
pmem->b = 1; // visible = 1, persistent = ?
pmem_persist(&pmem->b, sizeof(pmem->b)); // visible = 1, persistent = 1
}
This solution makes (pmem->a = 0
, pmem->b = 1
) case impossible at the cost
of additional persist on each read. To optimize this approach, we can add a
dirty bit inside the value, which indicates whether the value was explicitly
persisted or not. When modifying the value, the dirty flag is set to true. All
readers first check this flag and proceed as follows:
- If a flag is set: a reader uses compare-and-swap to clear the bit and then persists the value.
- If a flag is not set: the value is for sure persistent, and the reader can proceed.
You can see an example of such implementation here.
Shared state consistency
One other problem which arises in concurrent applications is keeping shared state consistent. For example, to guarantee that calculating the number of elements in a data structure has O(1) time complexity, a size variable is usually used. In a case where only a single thread modifies the variable, one can safely use pmemobj transaction to ensure the size consistency like this:
struct list {
pmem::obj::p<size_t> size;
void push_back(Value v) {
pmem::obj::transaction::run(pop, [&]{
... // allocate new node and insert it at the end
size++;
});
}
};
Just as a reminder (more details can be found here)
when a variable of type pmem::obj::p
(or other persistent-aware structure
like persistent_ptr
or self_relative_ptr
) is modified, its old value is saved
in an undo log. In case the enclosing transaction aborts or there is a power
failure, the value is rolled back. If the transaction succeeds, all modified
variables of persistent-aware types are persisted.
In a concurrent case, when multiple threads can insert new elements and update the size, this becomes more problematic. A shared variable cannot be simply modified in a pmemobj (undo log based) transaction without taking a global lock. To see why consider the following example:
struct list {
pmem::obj::p<std::atomic<size_t>> size;
// This function is called concurrently from multiple threads
void push_back(Value v) {
pmem::obj::transaction::run(pop, [&]{
... // allocate new node and insert it at the end
size++;
});
}
};
Let’s assume that there are two threads calling push_back()
concurrently, and
they start incrementing size simultaneously. They will both store the same
initial size
value in theirs undo logs. If the transaction in the first
thread aborts, it will abort the size
value to the initial state. It will
completely ignore the changes made by the second transaction.
One other problem is that readers might suffer from the ‘dirty reads’ phenomenon. If a reader reads size variable while another thread modifies it inside a transaction, it reads uncommitted data. If a transaction aborts or a power failure happens, the size variable can be rolled back, and the reader’s view of the data might be incorrect. In addition to that, the rollback itself is not thread-safe.
One possible solution is to make the global size volatile. The volatile
variable can be modified using atomic instructions (after the transaction
completes) while providing the number of elements in O(1) time. To avoid losing
size on application restart we can introduce a persistent, per-thread diff
variable which will track how many elements were inserted or deleted by a
thread. Since it is per-thread, it can be safely modified in a transaction.
On recovery, all diff
variables can be summed-up to provide the actual size.
Libpmemobj-cpp implements a helper data structure (which as of publish date
of this blog post is not yet released as a public API) called
enumerable_thread_specific. It provides API to request memory which
will be private to a thread and iteration API. Enumerable thread-specific is
used in concurrent_map and concurrent_hash_map to solve the shared state
problem.
Following example shows recover
and reimplemented push_back
method for list
data structure considered above:
struct list {
std::atomic<size_t> global_size;
enumerable_thread_specific<pmem::obj::p<int64_t>> ptls;
void push_back(Value v) {
// `diff` is a reference to thread-local storage on pmem
auto &diff = ptls.local();
pmem::obj::transaction::run(pop, [&]{
... // allocate new node and insert it at the end
diff++;
});
global_size++;
}
void recover() {
int64_t size = 0;
for (auto &diff : ptls)
size += diff;
global_size = size;
}
};
Publishing allocations
In the above example, we ignored allocating and publishing new nodes - let’s
look at this now. The example below shows an attempt at implementing the
push_back
method in a concurrent, lock-free list. For simplicity, this example
will assume that there can be multiple concurrent operations on the list, but
only one of them can be push_back
(it will be a single-writer,
multi-reader list).
using pmem::obj;
using pmem::obj::experimental;
struct list {
struct Node {
Value v;
std::atomic<self_relative_ptr<Node>> next;
};
std::atomic<self_relative_ptr<Node>> head;
void push_back(Value v) {
self_relative_ptr<Node> new_node;
transaction::run(pop, [&]{
new_node = make_persistent<Node>(v);
find_last()->next.store(new_node);
});
}
void iterate(F callback) {
self_relative_ptr<Node> ptr = head.load();
while (ptr != nullptr) {
callback(ptr->v);
ptr = ptr->next.load();
}
}
};
The implementation uses self_relative_ptr as it’s currently the only pmem::obj
pointer type that is working with std::atomic
.
A careful reader will notice that the implementation will not work correctly. Here, we encounter the same problem with dirty reads which was described in section Shared state consistency.
To solve this problem, one can try to move atomic store outside of the transaction:
void push_back(Value v) {
self_relative_ptr<Node> new_node;
pmem::obj::transaction::run(pop, [&]{
new_node = make_persistent<Node>(v);
});
find_last()->next.store(new_node);
pop.persist(&find_last()->next, sizeof(new_node));
}
To make sure there are no visibility problems, iterate method would need to be changed to use persist on read, as described in section Lock-free programming on pmem:
void iterate(F callback) {
self_relative_ptr<Node> ptr = head.load();
pop.persist(&head, sizeof(head));
while (ptr != nullptr) {
callback(ptr->v);
auto next = ptr->next.load();
pop.persist(&ptr->next, sizeof(ptr->next));
ptr = next;
}
}
However, there is still one other issue - a possible memory leak. If a crash
happens during the push_back
method just after the new node is allocated,
but before it is linked to the list, it will not be reachable after a restart.
new_node
pointer is kept on the stack, which means it will not survive a restart.
A solution to this problem is to move new_node
pointer variable from stack to
persistent memory. In this example it can be a separate variable within the
list
structure.
void push_back(Value v) {
pmem::obj::transaction::run(pop, [&]{
// new_node resides on pmem.
this->new_node = make_persistent<Node>(v);
});
find_last()->next.store(this->new_node);
pop.persist(&find_last()->next, sizeof(this->new_node));
}
Because new_node
now resides on persistent memory, even if it is not linked
to the list, it is always reachable on restart. There are no problems with
dirty reads or visibility as new_node
is never read during normal execution
(only on recovery). If push_back
would need to be thread-safe, new_node
variable could be replaced with enumerable_thread_specific.
Summary
In this blog post, I highlighted a few challenges related to concurrent programming on persistent memory.
First, I discussed the fact that data modification might be visible before becoming persistent, which can lead to data inconsistencies in lock-free programming.
Then, I moved on to discuss how to handle shared state on persistent memory. I proposed a solution for keeping the shared state consistent by using persistent, per-thread storage.
Finally, I talked about avoiding memory leaks in pmemobj-based, lock-free applications. I suggested splitting global state modifications into two steps. First, store a pointer to newly allocated memory to a private (possibly per-thread) persistent variable and only then make it visible for other threads.