Contributing#
Contribution Guide#
Thank you for coming here! It’s always nice to have third-party contributors 🤗
To keep the quality of the code high, we have a set of guidelines common to all Unum projects.
Before you start#
Before building for the first time, please pull the Git submodules.
That’s how we bring in SimSIMD and other optional dependencies to test all of the available functionality.
git submodule update --init --recursive
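To check whether any submodules are still missing, one option is to inspect the output of `git submodule status`, which prefixes uninitialized entries with `-`. The helper below is only a sketch; `count_uninitialized` is a hypothetical name and not part of the repository.

```shell
# Hypothetical helper: count submodules that `git submodule status`
# reports as uninitialized (their lines start with "-").
count_uninitialized() {
    # Reads the status listing from standard input.
    # `grep -c` exits non-zero when nothing matches, hence `|| true`.
    grep -c '^-' || true
}

# Usage, from the repository root:
#   git submodule status --recursive | count_uninitialized
```

A result of `0` suggests that every submodule has been initialized.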
C++ 11 and C 99#
Our primary C++ implementation uses CMake for builds. If this is your first experience with CMake, use the following commands to get started:
sudo apt-get update && sudo apt-get install cmake build-essential libjemalloc-dev g++-12 gcc-12 # Ubuntu
brew install libomp llvm # MacOS
Using modern syntax, this is how you build and run the test suite:
cmake -D USEARCH_BUILD_TEST_CPP=1 -D CMAKE_BUILD_TYPE=Debug -B build_debug
cmake --build build_debug --config Debug
build_debug/test_cpp
If the build mode is not specified, the default is Release.
cmake -D USEARCH_BUILD_TEST_CPP=1 -B build_release
cmake --build build_release --config Release
build_release/test_cpp
For development purposes, you may want to include debug symbols in the build:
cmake -D USEARCH_BUILD_TEST_CPP=1 -D CMAKE_BUILD_TYPE=RelWithDebInfo -B build_relwithdebinfo
cmake --build build_relwithdebinfo --config RelWithDebInfo
build_relwithdebinfo/test_cpp
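The three variants above differ only in the build type and the directory name, so they can be generated from one parameter. The following is a minimal sketch under that assumption; `usearch_build_commands` is a hypothetical name, and unlike the Release example above it always passes `CMAKE_BUILD_TYPE` explicitly.

```shell
# Hypothetical helper: print the configure, build, and test commands
# for a given CMake build type, following the naming used above.
usearch_build_commands() {
    build_type="${1:-Release}"
    # The build directory is "build_" plus the lowercase build type.
    build_dir="build_$(printf '%s' "$build_type" | tr '[:upper:]' '[:lower:]')"
    echo "cmake -D USEARCH_BUILD_TEST_CPP=1 -D CMAKE_BUILD_TYPE=$build_type -B $build_dir"
    echo "cmake --build $build_dir --config $build_type"
    echo "$build_dir/test_cpp"
}

# Usage: review the commands, then pipe them to a shell to run them.
#   usearch_build_commands RelWithDebInfo
#   usearch_build_commands Debug | sh -e
```

Printing the commands instead of running them directly keeps the helper easy to inspect before anything is built.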
The CMakeLists.txt file has a number of options you can pass:
What to build:
USEARCH_BUILD_TEST_CPP - build the C++ test suite
USEARCH_BUILD_BENCH_CPP - build the C++ benchmark suite
USEARCH_BUILD_LIB_C - build the C library
USEARCH_BUILD_TEST_C - build the C test suite
USEARCH_BUILD_SQLITE - build the SQLite extension (no Windows)
Which dependencies to use:
USEARCH_USE_OPENMP - use OpenMP for parallelism
USEARCH_USE_SIMSIMD - use SimSIMD for vectorization
USEARCH_USE_JEMALLOC - use Jemalloc for memory management
USEARCH_USE_FP16LIB - use software emulation for half-precision floating point
Putting all of this together, compiling all targets on most platforms should work with the following snippet:
cmake -D CMAKE_BUILD_TYPE=Release -D USEARCH_USE_FP16LIB=1 -D USEARCH_USE_OPENMP=1 -D USEARCH_USE_SIMSIMD=1 -D USEARCH_USE_JEMALLOC=1 -D USEARCH_BUILD_TEST_CPP=1 -D USEARCH_BUILD_BENCH_CPP=1 -D USEARCH_BUILD_LIB_C=1 -D USEARCH_BUILD_TEST_C=1 -D USEARCH_BUILD_SQLITE=0 -B build_release
cmake --build build_release --config Release
build_release/test_cpp
build_release/test_c
Similarly, to use the most recent Clang compiler version from Homebrew on macOS:
brew install llvm cmake
cmake \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_C_COMPILER="$(brew --prefix llvm)/bin/clang" \
-D CMAKE_CXX_COMPILER="$(brew --prefix llvm)/bin/clang++" \
-D USEARCH_USE_FP16LIB=1 \
-D USEARCH_USE_OPENMP=1 \
-D USEARCH_USE_SIMSIMD=1 \
-D USEARCH_USE_JEMALLOC=1 \
-D USEARCH_BUILD_TEST_CPP=1 \
-D USEARCH_BUILD_BENCH_CPP=1 \
-D USEARCH_BUILD_LIB_C=1 \
-D USEARCH_BUILD_TEST_C=1 \
-B build_release
cmake --build build_release --config Release
build_release/test_cpp
build_release/test_c
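When several test binaries are built, it can be convenient to run them in sequence and stop at the first failure. The sketch below assumes the binaries sit at the paths used above; `run_test_binaries` is a hypothetical helper name.

```shell
# Hypothetical helper: run each test binary in order and stop at the
# first one that is missing or fails.
run_test_binaries() {
    for binary in "$@"; do
        if [ ! -x "$binary" ]; then
            echo "missing test binary: $binary" >&2
            return 1
        fi
        echo "running $binary"
        "$binary" || return 1
    done
}

# Usage:
#   run_test_binaries build_release/test_cpp build_release/test_c
```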
Linting:
cppcheck --enable=all --force --suppress=cstyleCast --suppress=unusedFunction \
include/usearch/index.hpp \
include/usearch/index_dense.hpp \
include/usearch/index_plugins.hpp
I’d recommend putting the following breakpoints when debugging the code in GDB:
__asan::ReportGenericError - to detect illegal memory accesses.
__ubsan::ScopedReport::~ScopedReport - to catch undefined behavior.
__GI_exit - to stop at exit points - the end of running any executable.
__builtin_unreachable - to catch all the places where the code is expected to be unreachable.
__usearch_raise_runtime_error - for USearch-specific assertions.
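Rather than typing these breakpoints in every session, they can be kept in a GDB command file and loaded with `gdb -x`. The sketch below writes such a file; the file name `usearch.gdb` and the function name `write_gdb_breakpoints` are arbitrary choices for this example.

```shell
# Sketch: write a GDB command file that sets the breakpoints listed above.
write_gdb_breakpoints() {
    out="${1:-usearch.gdb}"
    {
        # Allow breakpoints on symbols from libraries that are not loaded yet.
        echo "set breakpoint pending on"
        for symbol in \
            "__asan::ReportGenericError" \
            "__ubsan::ScopedReport::~ScopedReport" \
            "__GI_exit" \
            "__builtin_unreachable" \
            "__usearch_raise_runtime_error"
        do
            echo "break $symbol"
        done
    } > "$out"
}

# Usage:
#   write_gdb_breakpoints usearch.gdb
#   gdb -x usearch.gdb build_debug/test_cpp
```

Some of these symbols only exist in sanitizer-enabled builds, which is why pending breakpoints are switched on first.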
Cross Compilation#
Unlike GCC, LLVM handles cross compilation very easily.
You just need to pass the right TARGET_ARCH and BUILD_ARCH to CMake.
On Debian and Ubuntu, the available cross-compilation toolchain packages include:
crossbuild-essential-amd64 for 64-bit x86
crossbuild-essential-arm64 for 64-bit Arm
crossbuild-essential-armhf for 32-bit ARM hard-float
crossbuild-essential-armel for 32-bit ARM soft-float (emulates float)
crossbuild-essential-riscv64 for RISC-V
crossbuild-essential-powerpc for PowerPC
crossbuild-essential-s390x for IBM Z
crossbuild-essential-mips for MIPS
crossbuild-essential-ppc64el for PowerPC 64-bit little-endian
Here is an example for cross-compiling for Arm64 on an x86_64 machine:
sudo apt-get update
sudo apt-get install -y clang lld make crossbuild-essential-arm64 crossbuild-essential-armhf
export CC="clang"
export CXX="clang++"
export AR="llvm-ar"
export NM="llvm-nm"
export RANLIB="llvm-ranlib"
export TARGET_ARCH="aarch64-linux-gnu" # Or "x86_64-linux-gnu"
export BUILD_ARCH="arm64" # Or "amd64"
cmake -D CMAKE_BUILD_TYPE=Release \
-D CMAKE_C_COMPILER_TARGET=${TARGET_ARCH} \
-D CMAKE_CXX_COMPILER_TARGET=${TARGET_ARCH} \
-D CMAKE_SYSTEM_NAME=Linux \
-D CMAKE_SYSTEM_PROCESSOR=${BUILD_ARCH} \
-B build_artifacts
cmake --build build_artifacts --config Release
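Since TARGET_ARCH and BUILD_ARCH have to agree, one of them can be derived from the other. The following sketch covers only the two pairs mentioned in the example above; `target_triple_for` is a hypothetical helper name.

```shell
# Hypothetical helper: map a BUILD_ARCH value to the matching TARGET_ARCH
# triple. Only the two pairs from the example above are covered; extend
# the case statement for other architectures as needed.
target_triple_for() {
    case "$1" in
        arm64) echo "aarch64-linux-gnu" ;;
        amd64) echo "x86_64-linux-gnu" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   export BUILD_ARCH="arm64"
#   export TARGET_ARCH="$(target_triple_for "$BUILD_ARCH")"
```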
Python 3#
Python bindings are built using PyBind11 and are available on PyPI.
The compilation settings are controlled by the setup.py and are independent from CMake used for C/C++ builds.
To install USearch locally:
pip install -e .
For testing, USearch uses PyTest, which is pre-configured in pyproject.toml.
The following options are enabled:
The -s option disables output capturing, so logs are printed as the tests run.
The -x option exits after the first failure to simplify debugging.
The -p no:warnings option disables PyTest’s warnings plugin, so warnings are neither captured nor reported in the test summary.
pip install pytest pytest-repeat # for repeated fuzzy tests
pytest # if you trust the default settings
pytest python/scripts/ -s -x -p no:warnings # to overwrite the default settings
Linting:
pip install ruff
ruff --format=github --select=E9,F63,F7,F82 --target-version=py37 python
Before merging, you may want to test your changes against the entire matrix of Python versions USearch supports.
For that you need cibuildwheel, which is tricky to use on MacOS and Windows, as it would target just the local environment.
Still, if you have Docker running on any desktop OS, you can use it to build and test the Python bindings for all Python versions for Linux:
pip install cibuildwheel
cibuildwheel
cibuildwheel --platform linux # works on any OS and builds all Linux backends
cibuildwheel --platform linux --archs x86_64 # 64-bit x86, the most common on desktop and servers
cibuildwheel --platform linux --archs aarch64 # 64-bit Arm for mobile devices, Apple M-series, and AWS Graviton
cibuildwheel --platform macos # works only on MacOS
cibuildwheel --platform windows # works only on Windows
You may need root privileges for multi-architecture builds:
sudo $(which cibuildwheel) --platform linux
On Windows and MacOS, to avoid frequent path resolution issues, you may want to use:
python -m cibuildwheel --platform windows
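Building the whole matrix can take a long time. To iterate on a single interpreter, cibuildwheel reads the `CIBW_BUILD` environment variable, which accepts selectors such as `cp311-*`. The helper below is a small sketch for composing such a selector; `cibw_selector` is a hypothetical name.

```shell
# Hypothetical helper: turn a Python version such as "3.11" into a
# cibuildwheel build selector for CPython, such as "cp311-*".
cibw_selector() {
    echo "cp$(printf '%s' "$1" | tr -d '.')-*"
}

# Usage: build and test only the CPython 3.11 wheels for 64-bit x86 Linux.
#   CIBW_BUILD="$(cibw_selector 3.11)" cibuildwheel --platform linux --archs x86_64
```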
JavaScript#
USearch provides NAPI bindings for NodeJS available on NPM.
The compilation settings are controlled by the binding.gyp and are independent from CMake used for C/C++ builds.
If you don’t have NPM installed, first install the Node Version Manager:
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
nvm install 20
Testing:
npm install -g typescript
npm install
npm run build-js
npm test
To compile for AWS Lambda you’d need to recompile the binding. You can test the setup locally, overriding some of the compilation variables in a Docker image:
FROM public.ecr.aws/lambda/nodejs:18-x86_64
RUN npm init -y
RUN yum install tar git python3 cmake gcc-c++ -y && yum groupinstall "Development Tools" -y
# Assuming AWS Linux 2 uses old compilers:
ENV USEARCH_USE_FP16LIB 1
ENV USEARCH_USE_SIMSIMD 1
ENV SIMSIMD_TARGET_HASWELL 1
ENV SIMSIMD_TARGET_SKYLAKE 0
ENV SIMSIMD_TARGET_ICE 0
ENV SIMSIMD_TARGET_SAPPHIRE 0
ENV SIMSIMD_TARGET_NEON 1
ENV SIMSIMD_TARGET_SVE 0
# For specific PR:
# RUN npm install --build-from-source unum-cloud/usearch#pull/302/head
# For specific version:
# RUN npm install --build-from-source usearch@2.8.8
RUN npm install --build-from-source usearch
To compile to WebAssembly, make sure you have emscripten installed and run the following script:
emcmake cmake -B build -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -s TOTAL_MEMORY=64MB" && emmake make -C build
node build/usearch.test.js
If you don’t yet have emcmake installed:
git clone https://github.com/emscripten-core/emsdk.git && ./emsdk/emsdk install latest && ./emsdk/emsdk activate latest && source ./emsdk/emsdk_env.sh
Rust#
USearch provides Rust bindings available on Crates.io.
The compilation settings are controlled by the build.rs and are independent from CMake used for C/C++ builds.
cargo test -p usearch -- --nocapture --test-threads=1
Publishing the crate is a bit more complicated than usual. If you simply pull the repository with submodules and run the following command, it will list fewer files than expected:
cargo package --list --allow-dirty
The reason is the heuristic that Cargo uses to determine which files to include in the package.
According to the Cargo documentation, regardless of whether exclude or include is specified, any sub-packages are skipped, meaning any subdirectory that contains a Cargo.toml file.
Assuming both SimSIMD and StringZilla contain their own Cargo.toml files, we need to temporarily rename those files so that the directories are packaged.
mv simsimd/Cargo.toml simsimd/Cargo.toml.bak
mv stringzilla/Cargo.toml stringzilla/Cargo.toml.bak
cargo package --list --allow-dirty
cargo publish
# Revert back
mv simsimd/Cargo.toml.bak simsimd/Cargo.toml
mv stringzilla/Cargo.toml.bak stringzilla/Cargo.toml
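If the packaging command fails halfway, the renamed manifests are easy to forget. One way to guard against that is to wrap the rename, the command, and the restore in a single function. This is only a sketch; `with_hidden_manifests` is a hypothetical helper, and it assumes the same `simsimd` and `stringzilla` directories as above.

```shell
# Hypothetical helper: hide the nested manifests, run the given command,
# and restore the manifests even if that command fails.
with_hidden_manifests() {
    status=0
    for dir in simsimd stringzilla; do
        if [ -f "$dir/Cargo.toml" ]; then
            mv "$dir/Cargo.toml" "$dir/Cargo.toml.bak"
        fi
    done
    # Remember the exit code instead of returning early.
    "$@" || status=$?
    for dir in simsimd stringzilla; do
        if [ -f "$dir/Cargo.toml.bak" ]; then
            mv "$dir/Cargo.toml.bak" "$dir/Cargo.toml"
        fi
    done
    return "$status"
}

# Usage, from the repository root:
#   with_hidden_manifests cargo package --list --allow-dirty
#   with_hidden_manifests cargo publish
```

The function returns the exit code of the wrapped command, so it can still be used in scripts that stop on failure.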
Objective-C and Swift#
USearch provides both Objective-C and Swift bindings through the Swift Package Manager.
The compilation settings are controlled by the Package.swift and are independent from CMake used for C/C++ builds.
swift build && swift test -v
Those depend on Apple’s Foundation library and can only run on Apple devices.
Swift formatting is enforced with swift-format, the default utility from Apple.
To install and run it on all the files in the project, use the following command:
brew install swift-format
swift-format . -i -r
The style is controlled by the .swift-format JSON file in the root of the repository.
As there is no standard for Swift formatting, even Apple’s own swift-format tool and Xcode differ in their formatting rules and available settings.
GoLang#
USearch provides GoLang bindings that depend on the C library, which must be installed beforehand. So one should first compile the C library, link it with GoLang, and only then run the tests.
cmake -B build_release -D USEARCH_BUILD_LIB_C=1 -D USEARCH_BUILD_TEST_C=1 -D USEARCH_USE_OPENMP=1 -D USEARCH_USE_SIMSIMD=1
cmake --build build_release --config Release -j
cp build_release/libusearch_c.so golang/ # or .dylib to install the library on MacOS
cp c/usearch.h golang/ # to make the header available to GoLang
cd golang && LD_LIBRARY_PATH=. go test -v ; cd ..
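The copy step differs between Linux and MacOS only in the library extension, which can be picked from the output of `uname -s`. The sketch below covers just that extension choice; `shared_lib_ext` is a hypothetical helper name.

```shell
# Hypothetical helper: choose the shared-library extension from the
# kernel name reported by `uname -s`.
shared_lib_ext() {
    case "$1" in
        Darwin) echo "dylib" ;;
        *) echo "so" ;;
    esac
}

# Usage, after building the C library:
#   cp "build_release/libusearch_c.$(shared_lib_ext "$(uname -s)")" golang/
```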
Java#
USearch provides Java bindings available from the GitHub Maven registry and the Sonatype Maven Central Repository.
The compilation settings are controlled by the build.gradle and are independent from CMake used for C/C++ builds.
To set up the Gradle environment:
sudo apt-get install zip
curl -s "https://get.sdkman.io" | bash
sdk install java
sdk install gradle
Afterwards, in a new terminal:
gradle clean build
gradle test
Alternatively, to run the Index.main:
java -cp "$(pwd)/build/classes/java/main" -Djava.library.path="$(pwd)/build/libs/usearch/shared" java/cloud/unum/usearch/Index.java
Or step by step:
cd java/cloud/unum/usearch
javac -h . Index.java NativeUtils.java
# Ensure JAVA_HOME system environment variable has been set
# e.g. export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# Ubuntu:
g++ -c -fPIC -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux -I../../../../include cloud_unum_usearch_Index.cpp -o cloud_unum_usearch_Index.o
g++ -shared -fPIC -o libusearch.so cloud_unum_usearch_Index.o -lc
# Windows
g++ -c -I%JAVA_HOME%\include -I%JAVA_HOME%\include\win32 cloud_unum_usearch_Index.cpp -I..\..\..\..\include -o cloud_unum_usearch_Index.o
g++ -shared -o USearchJNI.dll cloud_unum_usearch_Index.o -Wl,--add-stdcall-alias
# MacOS
g++ -std=c++11 -c -fPIC \
-I../../../../include \
-I../../../../fp16/include \
-I../../../../simsimd/include \
-I${JAVA_HOME}/include -I${JAVA_HOME}/include/darwin cloud_unum_usearch_Index.cpp -o cloud_unum_usearch_Index.o
g++ -dynamiclib -o libusearch.dylib cloud_unum_usearch_Index.o -lc
# Run linking to that directory
cd ../../../..
cp cloud/unum/usearch/libusearch.* .
java -cp . -Djava.library.path="$(pwd)" cloud.unum.usearch.Index
C#
Set up the .NET environment:
dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
USearch provides CSharp bindings that depend on the C library, which must be installed beforehand. So one should first compile the C library, link it with CSharp, and only then run the tests.
cmake -B build_artifacts -D USEARCH_BUILD_LIB_C=1 -D USEARCH_BUILD_TEST_C=1 -D USEARCH_USE_OPENMP=1 -D USEARCH_USE_SIMSIMD=1
cmake --build build_artifacts --config Release -j
Then, on Windows, copy the library to the CSharp project and run the tests:
mkdir -p ".\csharp\lib\runtimes\win-x64\native"
cp ".\build_artifacts\libusearch_c.dll" ".\csharp\lib\runtimes\win-x64\native"
cd csharp
dotnet test -c Debug --logger "console;verbosity=detailed"
dotnet test -c Release
On Linux, the process is similar:
mkdir -p "csharp/lib/runtimes/linux-x64/native" # for x86
cp "build_artifacts/libusearch_c.so" "csharp/lib/runtimes/linux-x64/native" # for x86
mkdir -p "csharp/lib/runtimes/linux-arm64/native" # for ARM
cp "build_artifacts/libusearch_c.so" "csharp/lib/runtimes/linux-arm64/native" # for ARM
cd csharp
dotnet test -c Debug --logger "console;verbosity=detailed"
dotnet test -c Release
On macOS with Arm-based chips:
mkdir -p "csharp/lib/runtimes/osx-arm64/native"
cp "build_artifacts/libusearch_c.dylib" "csharp/lib/runtimes/osx-arm64/native"
cd csharp
dotnet test -c Debug --logger "console;verbosity=detailed"
dotnet test -c Release
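On Linux and macOS, the runtime directory name used in the steps above can be derived from `uname`. The following is a sketch under that assumption; `dotnet_runtime_id` is a hypothetical helper, and it only covers the three non-Windows directories listed above.

```shell
# Hypothetical helper: derive the runtime directory name used above from
# the outputs of `uname -s` and `uname -m`. Windows is not covered.
dotnet_runtime_id() {
    case "$1/$2" in
        Linux/x86_64) echo "linux-x64" ;;
        Linux/aarch64) echo "linux-arm64" ;;
        Darwin/arm64) echo "osx-arm64" ;;
        *) echo "unsupported platform: $1/$2" >&2; return 1 ;;
    esac
}

# Usage:
#   rid="$(dotnet_runtime_id "$(uname -s)" "$(uname -m)")"
#   mkdir -p "csharp/lib/runtimes/$rid/native"
```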
Wolfram#
brew install --cask wolfram-engine
Docker#
docker build -t unum/usearch . && docker run unum/usearch
For multi-architecture builds and publications:
version=$(cat VERSION)
docker buildx create --use &&
docker login &&
docker buildx build \
--platform "linux/amd64,linux/arm64" \
--build-arg version=$version \
--file Dockerfile \
--tag unum/usearch:$version \
--tag unum/usearch:latest \
--push .
WebAssembly#
export WASI_VERSION=21
export WASI_VERSION_FULL=${WASI_VERSION}.0
wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-${WASI_VERSION}/wasi-sdk-${WASI_VERSION_FULL}-linux.tar.gz
tar xvf wasi-sdk-${WASI_VERSION_FULL}-linux.tar.gz
export WASI_SDK_PATH=$(pwd)/wasi-sdk-${WASI_VERSION_FULL}
After the installation, we can pass WASI SDK to CMake as a new toolchain:
cmake -DCMAKE_TOOLCHAIN_FILE=${WASI_SDK_PATH}/share/cmake/wasi-sdk.cmake .
Working on Sub-Modules#
Extending metrics in SimSIMD:
git push --set-upstream https://github.com/ashvardanian/simsimd.git HEAD:main