Crate uniq_ch

A Rust library for counting distinct elements in a stream, using the ClickHouse uniq data structure.

This uses BJKST, a probabilistic algorithm that relies on adaptive sampling and provides fast, accurate and deterministic results. Two BJKSTs can be merged, making the data structure well suited for map-reduce settings.
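To make the adaptive-sampling idea and the merge property concrete, here is a minimal, standard-library-only sketch. It is an illustration under stated assumptions, not this crate's implementation: `ToySketch`, its `capacity` parameter, and the methods `insert`, `merge` and `estimate` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hash function the crate really uses. The sketch keeps only hashes whose lowest `level` bits are zero, and raises `level` (halving the sampling rate) whenever the sample outgrows its capacity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Toy adaptive-sampling sketch. Hypothetical type for illustration only;
/// it is not how `uniq_ch::Bjkst` is implemented.
pub struct ToySketch {
    /// A hash is kept only if its lowest `level` bits are all zero,
    /// so each hash survives with probability 2^-level.
    level: u32,
    /// Maximum number of hashes held before the sampling rate is halved.
    capacity: usize,
    sample: HashSet<u64>,
}

impl ToySketch {
    pub fn new(capacity: usize) -> Self {
        ToySketch { level: 0, capacity, sample: HashSet::new() }
    }

    /// Bit mask covering the lowest `level` bits.
    /// Assumes `level` stays below 64, which holds for any realistic input.
    fn mask(&self) -> u64 {
        (1u64 << self.level) - 1
    }

    /// Halve the sampling rate until the sample fits in `capacity`.
    fn shrink(&mut self) {
        while self.sample.len() > self.capacity {
            self.level += 1;
            let mask = self.mask();
            self.sample.retain(|h| h & mask == 0);
        }
    }

    /// Hash the item and keep the hash if it passes the current filter.
    pub fn insert<T: Hash>(&mut self, item: &T) {
        // `DefaultHasher::new()` uses fixed keys, so hashes are repeatable.
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let h = hasher.finish();
        if h & self.mask() == 0 {
            self.sample.insert(h);
            self.shrink();
        }
    }

    /// Merge another sketch into this one: adopt the stricter level,
    /// re-filter both samples at that level, then shrink if needed.
    pub fn merge(&mut self, other: &ToySketch) {
        self.level = self.level.max(other.level);
        let mask = self.mask();
        self.sample.retain(|h| h & mask == 0);
        self.sample
            .extend(other.sample.iter().copied().filter(|h| h & mask == 0));
        self.shrink();
    }

    /// Scale the sample size back up by the inverse sampling rate.
    pub fn estimate(&self) -> usize {
        self.sample.len() << self.level
    }
}
```

In this toy version, the state after a merge depends only on the set of hashes seen, not on how they were split between sketches or the order in which they arrived. Two workers can therefore each fill a `ToySketch` and combine them with `merge`, obtaining the same estimate as a single sketch fed the whole stream. While fewer than `capacity` distinct hashes have been seen, `level` stays at zero and the estimate is simply the size of the sample.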

§Examples

use uniq_ch::Bjkst;

let mut bjkst = Bjkst::new();

// Add some elements, with duplicates.
bjkst.extend(0..75_000);
bjkst.extend(25_000..100_000);

// Count the distinct elements.
assert!((99_000..101_000).contains(&bjkst.len()));

Re-exports§

Modules§

Structs§

  • Bjkst: A BJKST data structure to estimate the number of distinct elements in a data stream.