datasketches : Approximate analytics sketches and aggregates for PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 4690 | datasketches | datasketches | 1.7.0 | FUNC | Apache-2.0 | C++ |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --s-d-r | No | Yes | No | Yes | Yes | No |
Built against Apache DataSketches C++ core 5.0.0.
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY | 1.7.0 | 18 17 16 15 14 | datasketches | - |
| RPM | PIGSTY | 1.7.0 | 18 17 16 15 14 | datasketches_$v | - |
| DEB | PIGSTY | 1.7.0 | 18 17 16 15 14 | postgresql-$v-datasketches | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el8.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el9.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el9.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el10.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el10.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d12.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d12.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d13.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d13.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u22.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u22.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u24.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u24.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
Source
pig build pkg datasketches; # build rpm/deb
Install
Make sure the PGDG and PIGSTY repos are available:
pig repo add pgsql -u # add both repos and update the cache
Install this extension with pig:
pig install datasketches; # install via package name, for the active PG version
pig install datasketches -v 18; # install for PG 18
pig install datasketches -v 17; # install for PG 17
pig install datasketches -v 16; # install for PG 16
pig install datasketches -v 15; # install for PG 15
pig install datasketches -v 14; # install for PG 14
Create this extension with:
CREATE EXTENSION datasketches;
Usage
Sources: README, Apache DataSketches site
PostgreSQL extension for approximate analytics sketches and aggregates.
CREATE EXTENSION datasketches;
The extension supports CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches.
Sketch Families
- CPC for compact distinct counting.
- HLL for HyperLogLog-style distinct counting.
- Theta for distinct counting with set operations such as union, intersection, and A-not-B.
- Array Of Doubles for tuple sketches with arrays of double values per key.
- KLL for quantiles, ranks, PMF, and CDF estimation.
- Quantiles sketch for long-term support of distribution estimates.
- Frequent strings for tracking the heaviest items by count or weight.
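To make the Theta set operations concrete, here is a minimal sketch that needs no pre-built tables. It assumes the extension is installed and that `theta_sketch_build` (aggregate) and `theta_sketch_a_not_b` exist under those names, following the upstream naming pattern; neither is shown elsewhere on this page, so check the upstream README before relying on them.

```sql
-- Two overlapping ID ranges built with generate_series:
--   a = 1..60000, b = 40001..100000, so 20000 IDs are shared.
WITH s AS (
    SELECT
        (SELECT theta_sketch_build(id) FROM generate_series(1, 60000) AS id)      AS a,
        (SELECT theta_sketch_build(id) FROM generate_series(40001, 100000) AS id) AS b
)
SELECT
    theta_sketch_get_estimate(theta_sketch_union(a, b))        AS in_either,  -- exact size: 100000
    theta_sketch_get_estimate(theta_sketch_intersection(a, b)) AS in_both,    -- exact size: 20000
    theta_sketch_get_estimate(theta_sketch_a_not_b(a, b))      AS only_in_a   -- exact size: 40000
FROM s;
```

The returned values are estimates, so expect them to land near the exact sizes noted in the comments rather than match them.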
Examples
SELECT cpc_sketch_to_string(cpc_sketch_build(1));
SELECT cpc_sketch_distinct(id) FROM random_ints_100m;
SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_union(sketch)) FROM theta_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2)) FROM theta_set_op_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(hll_sketch_build(1), hll_sketch_build(2)));
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5) FROM kll_float_sketch_test;
SELECT frequent_strings_sketch_result_no_false_negatives(frequent_strings_sketch_build(9, value), 1000000) FROM zipf_1p1_8k_100m;
Core Operations
- Build sketches with `*_sketch_build(...)`.
- Merge or aggregate them with `*_sketch_union(...)`, `*_sketch_merge(...)`, and sketch-specific set-operation helpers.
- Read estimates with `*_sketch_get_estimate(...)` and distribution helpers such as `kll_float_sketch_get_quantile(...)`.
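The build, merge, and read steps can be chained as in the following sketch, which uses KLL on synthetic latency samples. The table `latency_by_day` and its columns are hypothetical, and `kll_float_sketch_build` is assumed to be an aggregate over `real` values, in line with the `*_sketch_build(...)` pattern.

```sql
-- Build: one KLL sketch per (synthetic) day from 70000 random latency samples.
CREATE TEMP TABLE latency_by_day AS
SELECT day, kll_float_sketch_build(ms) AS sketch
FROM (
    SELECT (n % 7) AS day, (random() * 1000)::real AS ms
    FROM generate_series(1, 70000) AS n
) AS samples
GROUP BY day;

-- Merge + read: combine the per-day sketches and query the merged distribution.
SELECT
    kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)  AS p50_ms,
    kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.99) AS p99_ms
FROM latency_by_day;
```

Storing the per-day sketches rather than the raw samples means the final query touches seven rows, however many samples were ingested.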
Notes
- The README says the extension targets PostgreSQL 9.6 and higher and depends on Boost 1.75 and DataSketches C++ core 5.0.0 or later.
- The upstream examples emphasize additive analytics in data cubes, not exact replacement for normal aggregates.
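A rough illustration of the additive, data-cube style of use follows: per-day CPC sketches are stored once and unioned on demand, and the result is compared with an exact count. The tables `visits` and `visitors_by_day` are hypothetical and the data is synthetic; only functions that appear in the examples above are used.

```sql
-- Synthetic fact table: 200000 visits spread over 7 days and up to ~50000 users.
CREATE TEMP TABLE visits AS
SELECT (n % 7) AS day, (random() * 50000)::int AS user_id
FROM generate_series(1, 200000) AS n;

-- Cube cell per day: a sketch of that day's distinct users.
CREATE TEMP TABLE visitors_by_day AS
SELECT day, cpc_sketch_build(user_id) AS sketch
FROM visits
GROUP BY day;

-- Roll up across days by unioning sketches; compare with the exact answer.
SELECT
    (SELECT count(DISTINCT user_id) FROM visits)                   AS exact_users,
    (SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch))
       FROM visitors_by_day)                                       AS approx_users;
```

Summing per-day distinct counts would count users who visit on several days more than once, whereas unioning the sketches does not. The estimate should be close to `exact_users` but is not guaranteed to equal it.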