Skip to content
datasketches

datasketches

datasketches : Approximate analytics sketches and aggregates for PostgreSQL

Overview

IDExtensionPackageVersionCategoryLicenseLanguage
4690
datasketches
datasketches
1.7.0
FUNC
Apache-2.0
C++
AttributeHas BinaryHas LibraryNeed LoadHas DDLRelocatableTrusted
--s-d-r
No
Yes
No
Yes
yes
no

Built against Apache DataSketches C++ core 5.0.0.

Packages

TypeRepoVersionPG Major CompatibilityPackage PatternDependencies
EXT
PIGSTY
1.7.0
18
17
16
15
14
datasketches-
RPM
PIGSTY
1.7.0
18
17
16
15
14
datasketches_$v-
DEB
PIGSTY
1.7.0
18
17
16
15
14
postgresql-$v-datasketches-
Linux / PGPG18PG17PG16PG15PG14
el8.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el8.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el9.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el9.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el10.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el10.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d12.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d12.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u26.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u26.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PackageVersionOSORGSIZEFile URL
datasketches_181.7.0el8.x86_64pigsty324.4 KiBdatasketches_18-1.7.0-1PIGSTY.el8.x86_64.rpm
datasketches_181.7.0el8.aarch64pigsty314.1 KiBdatasketches_18-1.7.0-1PIGSTY.el8.aarch64.rpm
datasketches_181.7.0el9.x86_64pigsty309.4 KiBdatasketches_18-1.7.0-1PIGSTY.el9.x86_64.rpm
datasketches_181.7.0el9.aarch64pigsty315.1 KiBdatasketches_18-1.7.0-1PIGSTY.el9.aarch64.rpm
datasketches_181.7.0el10.x86_64pigsty319.1 KiBdatasketches_18-1.7.0-1PIGSTY.el10.x86_64.rpm
datasketches_181.7.0el10.aarch64pigsty319.4 KiBdatasketches_18-1.7.0-1PIGSTY.el10.aarch64.rpm
postgresql-18-datasketches1.7.0d12.x86_64pigsty918.1 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~bookworm_amd64.deb
postgresql-18-datasketches1.7.0d12.aarch64pigsty920.0 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~bookworm_arm64.deb
postgresql-18-datasketches1.7.0d13.x86_64pigsty943.3 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~trixie_amd64.deb
postgresql-18-datasketches1.7.0d13.aarch64pigsty944.0 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~trixie_arm64.deb
postgresql-18-datasketches1.7.0u22.x86_64pigsty1017.0 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~jammy_amd64.deb
postgresql-18-datasketches1.7.0u22.aarch64pigsty1020.8 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~jammy_arm64.deb
postgresql-18-datasketches1.7.0u24.x86_64pigsty977.8 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~noble_amd64.deb
postgresql-18-datasketches1.7.0u24.aarch64pigsty991.3 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~noble_arm64.deb
postgresql-18-datasketches1.7.0u26.x86_64pigsty945.1 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~resolute_amd64.deb
postgresql-18-datasketches1.7.0u26.aarch64pigsty951.7 KiBpostgresql-18-datasketches_1.7.0-1PIGSTY~resolute_arm64.deb

Source

pig build pkg datasketches;		# build rpm/deb

Install

Make sure PGDG and PIGSTY repo available:

pig repo add pgsql -u   # add both repo and update cache

Install this extension with pig:

pig install datasketches;		# install via package name, for the active PG version

pig install datasketches -v 18;   # install for PG 18
pig install datasketches -v 17;   # install for PG 17
pig install datasketches -v 16;   # install for PG 16
pig install datasketches -v 15;   # install for PG 15
pig install datasketches -v 14;   # install for PG 14

Create this extension with:

CREATE EXTENSION datasketches;

Usage

Sources: README, latest release 1.7.0, Apache DataSketches

datasketches adds approximate analytics sketch types and aggregates to PostgreSQL. The upstream README lists CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches; the 1.7.0 release is the latest published GitHub release, while the default branch has already moved on to 1.8.0-SNAPSHOT.

CREATE EXTENSION datasketches;

Core Sketch Families

  • cpc_sketch and hll_sketch for approximate distinct counting.
  • theta_sketch for distinct counting plus set operations such as union, intersection, and A-not-B.
  • aod_sketch for tuple-style metrics keyed by identifiers with arrays of doubles.
  • kll_*_sketch and quantiles_*_sketch for quantiles, ranks, PMF, and CDF.
  • frequent_strings_sketch for heavy-hitter detection.

Common Patterns

Build a sketch from raw values:

SELECT cpc_sketch_build(1);
SELECT kll_float_sketch_build(value) FROM normal;

Use one-shot approximate aggregates:

SELECT cpc_sketch_distinct(id) FROM random_ints_100m;

Merge sketches across groups or cube dimensions:

SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)
FROM kll_float_sketch_test;

Run set operations on Theta sketches:

SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2))
FROM theta_set_op_test;

Find frequent items above a threshold:

SELECT frequent_strings_sketch_result_no_false_negatives(
  frequent_strings_sketch_build(9, value),
  1000000
)
FROM zipf_1p1_8k_100m;

Caveats

  • Upstream documents PostgreSQL 9.6+ plus Boost 1.75.0 and DataSketches C++ core 5.0.0 or later as build dependencies.
  • These are approximate structures meant to be mergeable across dimensions; they are not exact replacements for COUNT(DISTINCT ...) or exact histograms.
Last updated on