Overview
Unexceptionally Consistent Set
Imagine In-Memory Templated Containers
Being as Consistent as Databases
The UCSet library provides std::set-like class templates for C++, where every operation is noexcept, and no update can leave the container in a partial state.
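To make the "noexcept and never partial" idea concrete, here is a minimal sketch built on plain std::set. It is not UCSet's implementation: the names status_t and insert_all_or_nothing are hypothetical, and it uses the classic copy-then-swap approach, which trades an extra copy for simplicity.

```cpp
#include <set>
#include <vector>

// Hypothetical status code, used here instead of exceptions.
enum class status_t { ok, failed };

// Hypothetical helper: either every element of `batch` ends up in `target`,
// or `target` is left exactly as it was. Nothing escapes as an exception.
template <typename T>
status_t insert_all_or_nothing(std::set<T>& target, std::vector<T> const& batch) noexcept {
    try {
        std::set<T> updated = target;               // may throw, e.g. std::bad_alloc
        updated.insert(batch.begin(), batch.end()); // may throw midway; only the copy is affected
        target.swap(updated);                       // commit: swapping does not allocate
        return status_t::ok;
    } catch (...) {
        return status_t::failed;                    // `target` was never modified
    }
}
```

The caller checks the returned status instead of catching exceptions. A production container would avoid copying the whole set on every update; this only illustrates the guarantee.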
There are 3 containers to choose from:
`consistent_set <tree/main/include/ucset/consistent_set.hpp>`_: serializable consistency, fully sorted, based on `std::set <https://en.cppreference.com/w/cpp/container/set>`_.
`consistent_avl <tree/main/include/ucset/consistent_avl.hpp>`_: serializable consistency, fully sorted, based on AVL trees.
`versioning_avl <tree/main/include/ucset/versioning_avl.hpp>`_: snapshot isolation via MVCC, fully sorted, based on AVL trees.
All of them:
are noexcept top to bottom!
are templated, to be used with any noexcept-movable and default-constructible types.
can be wrapped into `locked_gt <tree/main/include/ucset/locked.hpp>`_, to make them thread-safe.
can be wrapped into `partitioned_gt <tree/main/include/ucset/partitioned.hpp>`_, to make them concurrent.
If you want your exceptions and classical interfaces back, you can also wrap any container into `crazy_gt <tree/main/include/ucset/crazy.hpp>`_.
Installation
The entire library is header-only and requires C++17. You can copy-paste it, but it is not 2022 anymore. We suggest using CMake:
include(FetchContent)
FetchContent_Declare(
ucset
GIT_REPOSITORY https://github.com/unum-cloud/ucset
GIT_TAG main
CONFIGURE_COMMAND "" # Nothing to configure, it's that simple :)
BUILD_COMMAND "" # No build needed, UCSet is header-only
)
FetchContent_MakeAvailable(ucset)
include_directories(${ucset_SOURCE_DIR}/include) # FetchContent names the variable after "ucset"; headers live under include/ucset/
Why did we create this?
Hate for `std::bad_alloc <https://en.cppreference.com/w/cpp/memory/new/bad_alloc>`_. If you consider “Out of Memory” an exception, you are always underutilizing your system. Far too many times, a program crashed just as I was getting to an exciting place. Especially with:
Neo4J and every other JVM-based project.
Big batch sizes that exceed the GPU's VRAM when doing ML.
At Unum, we live in conditions where machines can easily have 1 TB of RAM per CPU socket, but that is still at least 100x smaller than the datasets we are trying to swallow.
So when we started working on UKV to build high-speed hardware-friendly databases, we needed something better than the Standard Template Library, with features uncommon to other libraries as well:
Accessing the allocator state by reference.
Reserving memory for tree nodes before inserting.
Explicitly traversing trees for random sampling.
Speed!
Now UCSet powers the in-memory backend of UKV.
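Of the features listed above, "reserving memory for tree nodes before inserting" can be approximated even with the standard library, which helps show why it matters. The sketch below is an illustration under assumptions, not UCSet's API: the class staged_insert and its methods are hypothetical names. It allocates all nodes in a scratch std::set first, then relinks them into the target with the C++17 merge member, which does not allocate.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical two-phase insert; UCSet's own reservation interface may look different.
template <typename T>
class staged_insert {
    std::set<T> staged_; // owns the pre-allocated tree nodes

  public:
    // Phase 1: allocate every node up front. This is the only step that can run out of memory.
    bool reserve(std::vector<T> const& values) noexcept {
        try {
            std::set<T> fresh(values.begin(), values.end());
            staged_.swap(fresh);
            return true;
        } catch (...) {
            return false; // nothing was staged, nothing was inserted
        }
    }

    std::size_t reserved() const noexcept { return staged_.size(); }

    // Phase 2: relink the staged nodes into `target`. No allocation happens here.
    // Assumes the comparison of T does not throw.
    void commit(std::set<T>& target) noexcept {
        target.merge(staged_);
        staged_.clear(); // drop nodes whose keys were already present in `target`
    }
};
```

Because the only fallible step happens before the target is touched, a failed reservation leaves the target unchanged, and the commit step has nothing left that can fail on memory.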
Performance Tuning
Concurrent containers in the library are blocking. Their performance greatly depends on the “mutexes” you are using. So we allow different implementations:
STL: `std::shared_mutex <https://en.cppreference.com/w/cpp/thread/shared_mutex>`_,
Intel oneAPI: `tbb::rw_mutex <https://spec.oneapi.io/versions/latest/elements/oneTBB/source/named_requirements/mutexes/rw_mutex.html#readerwritermutex>`_,
Or anything else with the same interfaces.
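A rough sketch of what "the same interfaces" means in practice: a wrapper that takes the mutex type as a template parameter and only relies on lock, unlock, lock_shared, and unlock_shared. The names locked_sketch and no_op_mutex are hypothetical and this is not the code of locked_gt; it only shows the general pattern with standard lock guards.

```cpp
#include <cstddef>
#include <mutex>        // std::unique_lock
#include <set>
#include <shared_mutex> // std::shared_mutex, std::shared_lock
#include <utility>      // std::forward

// Hypothetical wrapper: writers take an exclusive lock, readers take a shared one.
template <typename container_at, typename shared_mutex_at = std::shared_mutex>
class locked_sketch {
    container_at container_;
    mutable shared_mutex_at mutex_;

  public:
    template <typename value_at>
    bool insert(value_at&& value) {
        std::unique_lock<shared_mutex_at> lock(mutex_); // calls lock() / unlock()
        return container_.insert(std::forward<value_at>(value)).second;
    }

    template <typename key_at>
    bool contains(key_at const& key) const {
        std::shared_lock<shared_mutex_at> lock(mutex_); // calls lock_shared() / unlock_shared()
        return container_.find(key) != container_.end();
    }

    std::size_t size() const {
        std::shared_lock<shared_mutex_at> lock(mutex_);
        return container_.size();
    }
};

// A do-nothing mutex exposing the same member functions, e.g. for single-threaded runs.
struct no_op_mutex {
    void lock() noexcept {}
    void unlock() noexcept {}
    void lock_shared() noexcept {}
    void unlock_shared() noexcept {}
};
```

With this shape, switching from locked_sketch<std::set<int>> to locked_sketch<std::set<int>, no_op_mutex> changes the locking cost without touching the calling code, which is the kind of tuning knob this section describes.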