tdigest-ch
A Python library for estimating quantiles in a stream, using the ClickHouse t-digest data structure.
The t-digest data structure is designed around computing accurate quantile estimates from streaming data. Two t-digests can be merged, making the data structure well suited for map-reduce settings.
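The merge property can be sketched as a small map-reduce helper. This is an illustrative sketch, not part of the library: `merge_partitions` and its `make_digest` parameter are hypothetical names, and the helper assumes only what the reference below documents, namely that a digest can be built from an iterable of floats and that two digests can be combined with `|`.

```python
from functools import reduce
from operator import or_


def merge_partitions(partitions, make_digest):
    """Build one digest per partition (map), then merge them with `|` (reduce).

    `make_digest` is a hypothetical parameter: any callable that accepts an
    iterable of floats and returns an object supporting `|`, for example
    `tdigest_ch.TDigest`.
    """
    # Map step: each partition is summarized independently.
    digests = [make_digest(part) for part in partitions]
    # Reduce step: start from an empty digest so an empty partition list works.
    return reduce(or_, digests, make_digest([]))
```

With the library installed, `merge_partitions([[1.0, 2.0, 3.0], [4.0, 5.0]], TDigest)` should return a digest of length 5, matching the `__or__` example in the reference. Because `|` returns a new digest each time, the per-partition digests are left unmodified.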
API reference
- class tdigest_ch.TDigest(elems: Iterable[float] | TDigest | None = None)
T-digest data structure for approximating the quantiles of a distribution.
- Examples:
>>> from tdigest_ch import TDigest
>>> digest = TDigest()
>>> # Add some elements.
>>> digest.add(1.0)
>>> digest.add(2.0)
>>> digest.add(3.0)
>>> # Get the median of the distribution.
>>> digest.quantile(0.5)
2.0
- __ior__(other: TDigest) -> TDigest
Update the t-digest, adding elements from the other.
- Examples:
>>> digest_1 = TDigest([1.0, 2.0, 3.0])
>>> digest_2 = TDigest([4.0, 5.0])
>>> digest_1 |= digest_2
>>> len(digest_1)
5
- __len__() -> int
Return the number of elements in the t-digest.
- Examples:
>>> digest = TDigest([1.0, 2.0, 3.0])
>>> len(digest)
3
>>> digest.add(3.0, count=2)
>>> len(digest)
5
- __or__(other: TDigest) -> TDigest
Return a new t-digest with elements from the t-digest and the other.
- Examples:
>>> digest_1 = TDigest([1.0, 2.0, 3.0])
>>> digest_2 = TDigest([4.0, 5.0])
>>> digest = digest_1 | digest_2
>>> len(digest)
5
>>> digest.quantile(0.5)
3.0
- add(value: float, count: int = 1) -> None
Add a value to the t-digest. The optional count adds the value that many times.
- Examples:
>>> digest = TDigest()
>>> digest.add(1.0)
>>> digest.add(2.0)
>>> len(digest)
2
- clear() -> None
Clear the t-digest, removing all values.
- Examples:
>>> digest = TDigest()
>>> digest.add(1.0)
>>> digest.clear()
>>> len(digest)
0
- quantile(level: float) -> float
Return the estimated quantile of the distribution at the given level (for example, 0.5 for the median).
- Examples:
>>> digest = TDigest([1.0, 2.0, 3.0, 4.0, 5.0])
>>> digest.quantile(0.5)
3.0
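
Several quantile levels are often reported together, for instance the median alongside tail percentiles. The following is a minimal sketch of such a helper; `quantile_summary` is a hypothetical name that is not part of the library, and the check that levels lie in [0, 1] is an assumption of this sketch rather than documented behavior of `quantile`.

```python
def quantile_summary(digest, levels=(0.5, 0.9, 0.99)):
    """Return a {level: estimate} dict for each requested quantile level.

    `digest` is any object with a `quantile(level)` method, for example a
    `tdigest_ch.TDigest` instance.
    """
    for level in levels:
        # Assumption of this sketch: levels are fractions between 0 and 1.
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"quantile level must be in [0, 1], got {level}")
    return {level: digest.quantile(level) for level in levels}
```

For the five-element digest above, `quantile_summary(digest, levels=(0.5,))` should give `{0.5: 3.0}`, in line with the `quantile(0.5)` example.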