descriptionRadix-ish tries
ownerfanf2
last changeWed, 24 May 2017 22:47:15 +0000 (23:47 +0100)
readme

qp tries and crit-bit tries

I have been working on radix trees / patricia tries / crit-bit tries with a larger fan-out per branch to reduce lookup costs without wasting memory.

My best solution so far is the "qp trie", short for quelques-bits popcount patricia trie. (Nothing to do with cutie cupid dolls or Japanese mayonnaise!) A qp trie is like a crit-bit trie (aka patricia trie) except each branch is indexed by a few bits at a time instead of one bit. The array of sub-tries at a branch node is compressed using the popcount trick to omit unused branches.

The original version of qp tries used 4 bits at a time, so it was a quadbit popcount patricia trie; I also have a 5 bit version, a quintuple-bit popcount patricia trie, which generally outperforms the 4 bit version; and a 6 bit version which doesn't quite have a name.

Based on a few benchmarks, qp tries have about 1/3 less memory overhead of crit-bit tries, 1.3 words vs 2 words of overhead per item; the average depth of a qp trie is about half that of a crit-bit trie; and the overall speed of qp tries is about 30% faster than crit-bit tries. The qp trie implementation is about 40% bigger.

usage

Type make test or make bench. (You will need to use GNU make.) If you have a recent Intel CPU you might want to add -mpopcnt to the CFLAGS to get SSE4.2 POPCNT instructions. Other build options:

The makefile builds {test,bench}-{qs,qn} with these options; they are otherwise the same as test-qp and bench-qp.

caveats

The code has only been tested on 64-bit little endian machines. It might work on 32-bit machines (provided the compiler supports 64 bit integers) and probably won't work on a big-endian machine. It should be easy to port by tweaking the struct bit-field layouts.

Key strings can be byte-aligned but values must be word-aligned; you can swap this restriction (e.g. if you want to map from strings to integers) by tweaking the struct layout and adjusting the check in Tset().

Keys are '\0' terminated C strings, which guarantees one key is not a prefix of another, so leaves and branches cannot occur at the same point. It should be possible to support arbitrary binary keys by being more clever about handling string termination.

articles

thanks

Marek Vavrusa (CZ.NIC) and Devon O'Dell (Fastly) enthusiastically put this code to work and provided encouraging feedback.

Vladimír Čunát incorporated qp tries into CZ.NIC Knot DNS, at the suggestion of Jan Včelák.

Simon Tatham proved that parent pointers are not needed for embedded crit-bit tries.

download

You can clone or browse the repository from:

roadmap

notes


Written by Tony Finch dot@dotat.at https://dotat.at/; You may do anything with this. It has no warranty. https://creativecommons.org/publicdomain/zero/1.0/

shortlog
4 hours ago Tony Finchlooks like this debug sanity check really was bullshit master
4 hours ago Tony FinchPropagate index word changes better
4 hours ago Tony FinchMake debug mode run-time configurable
12 hours ago Tony Finchrc should (in theory) be the same as fp. let's test...
12 hours ago Tony Finchrc-debug.c copied and refactored from fp-debug.c
12 hours ago Tony Finchrc: add Tnextl() refactored from fp.c
12 hours ago Tony Finchrc: add Tsetl() refactored from fp.c
13 hours ago Tony FinchDecouple index word access from Trie / twig access.
14 hours ago Tony FinchSearching for an exact prefix match
16 hours ago Tony FinchTweak bitstring title
16 hours ago Tony FinchSome notes on bitstring keys and prefix matching
16 hours ago Tony FinchMy old thoughts on embedding keys have been superseded
16 hours ago Tony FinchAn opportunity for searches to fail earlier in double...
36 hours ago Tony Finchrc: reorder for slightly better exposition
36 hours ago Tony Finchrc: add Tdelkv() refactored from fp.c
36 hours ago Tony Finchrc: add Tgetkv() refactored from fp.c
...
heads
4 hours ago master