pg_tokenizer

pg_tokenizer: Tokenizers for full-text search

Overview

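| ID | Extension | Package | Version | Category | License | Language |
|----|-----------|---------|---------|----------|---------|----------|
| 2160 | pg_tokenizer | pg_tokenizer | 0.1.1 | FTS | Apache-2.0 | Rust |

| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|-----------|------------|-------------|-----------|---------|-------------|---------|
| --sLd-- | No | Yes | Yes | Yes | No | No |

Relationships

Schemas: tokenizer_catalog
See Also: pg_search, pgroonga, pg_bigm, zhparser, pgroonga_database, pg_bestmatch, vchord_bm25, pg_trgm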

PG18 fix by Vonng

Packages

| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|------|------|---------|------------------------|-----------------|--------------|
| EXT | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | pg_tokenizer | - |
| RPM | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | pg_tokenizer_$v | - |
| DEB | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | postgresql-$v-pg-tokenizer | - |

| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|------------|------|------|------|------|------|
| el8.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el8.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el9.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el9.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el10.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el10.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d12.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d12.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d13.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d13.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u22.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u22.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u24.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u24.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |

| Package | Version | OS | ORG | SIZE | File URL |
|---------|---------|----|-----|------|----------|
| pg_tokenizer_18 | 0.1.1 | el8.x86_64 | pigsty | 11.7 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el8.x86_64.rpm |
| pg_tokenizer_18 | 0.1.1 | el8.aarch64 | pigsty | 11.5 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el8.aarch64.rpm |
| pg_tokenizer_18 | 0.1.1 | el9.x86_64 | pigsty | 11.0 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el9.x86_64.rpm |
| pg_tokenizer_18 | 0.1.1 | el9.aarch64 | pigsty | 10.9 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el9.aarch64.rpm |
| pg_tokenizer_18 | 0.1.1 | el10.x86_64 | pigsty | 10.9 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el10.x86_64.rpm |
| pg_tokenizer_18 | 0.1.1 | el10.aarch64 | pigsty | 11.0 MiB | pg_tokenizer_18-0.1.1-1PIGSTY.el10.aarch64.rpm |
| postgresql-18-pg-tokenizer | 0.1.1 | d12.x86_64 | pigsty | 9.9 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~bookworm_amd64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | d12.aarch64 | pigsty | 9.7 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~bookworm_arm64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | d13.x86_64 | pigsty | 9.9 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~trixie_amd64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | d13.aarch64 | pigsty | 9.7 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~trixie_arm64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | u22.x86_64 | pigsty | 10.9 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~jammy_amd64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | u22.aarch64 | pigsty | 10.7 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~jammy_arm64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | u24.x86_64 | pigsty | 10.8 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~noble_amd64.deb |
| postgresql-18-pg-tokenizer | 0.1.1 | u24.aarch64 | pigsty | 10.6 MiB | postgresql-18-pg-tokenizer_0.1.1-1PIGSTY~noble_arm64.deb |

| Package | Version | OS | ORG | SIZE | File URL |
|---------|---------|----|-----|------|----------|
| pg_tokenizer_17 | 0.1.1 | el8.x86_64 | pigsty | 11.7 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el8.x86_64.rpm |
| pg_tokenizer_17 | 0.1.1 | el8.aarch64 | pigsty | 11.5 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el8.aarch64.rpm |
| pg_tokenizer_17 | 0.1.1 | el9.x86_64 | pigsty | 11.0 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el9.x86_64.rpm |
| pg_tokenizer_17 | 0.1.1 | el9.aarch64 | pigsty | 10.9 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el9.aarch64.rpm |
| pg_tokenizer_17 | 0.1.1 | el10.x86_64 | pigsty | 10.9 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el10.x86_64.rpm |
| pg_tokenizer_17 | 0.1.1 | el10.aarch64 | pigsty | 11.0 MiB | pg_tokenizer_17-0.1.1-1PIGSTY.el10.aarch64.rpm |
| postgresql-17-pg-tokenizer | 0.1.1 | d12.x86_64 | pigsty | 9.9 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~bookworm_amd64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | d12.aarch64 | pigsty | 9.7 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~bookworm_arm64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | d13.x86_64 | pigsty | 9.9 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~trixie_amd64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | d13.aarch64 | pigsty | 9.7 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~trixie_arm64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | u22.x86_64 | pigsty | 10.9 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~jammy_amd64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | u22.aarch64 | pigsty | 10.7 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~jammy_arm64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | u24.x86_64 | pigsty | 10.8 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~noble_amd64.deb |
| postgresql-17-pg-tokenizer | 0.1.1 | u24.aarch64 | pigsty | 10.7 MiB | postgresql-17-pg-tokenizer_0.1.1-1PIGSTY~noble_arm64.deb |

| Package | Version | OS | ORG | SIZE | File URL |
|---------|---------|----|-----|------|----------|
| pg_tokenizer_16 | 0.1.1 | el8.x86_64 | pigsty | 11.7 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el8.x86_64.rpm |
| pg_tokenizer_16 | 0.1.1 | el8.aarch64 | pigsty | 11.5 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el8.aarch64.rpm |
| pg_tokenizer_16 | 0.1.1 | el9.x86_64 | pigsty | 11.0 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el9.x86_64.rpm |
| pg_tokenizer_16 | 0.1.1 | el9.aarch64 | pigsty | 10.9 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el9.aarch64.rpm |
| pg_tokenizer_16 | 0.1.1 | el10.x86_64 | pigsty | 10.9 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el10.x86_64.rpm |
| pg_tokenizer_16 | 0.1.1 | el10.aarch64 | pigsty | 11.0 MiB | pg_tokenizer_16-0.1.1-1PIGSTY.el10.aarch64.rpm |
| postgresql-16-pg-tokenizer | 0.1.1 | d12.x86_64 | pigsty | 9.9 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~bookworm_amd64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | d12.aarch64 | pigsty | 9.7 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~bookworm_arm64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | d13.x86_64 | pigsty | 9.9 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~trixie_amd64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | d13.aarch64 | pigsty | 9.7 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~trixie_arm64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | u22.x86_64 | pigsty | 10.9 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~jammy_amd64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | u22.aarch64 | pigsty | 10.7 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~jammy_arm64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | u24.x86_64 | pigsty | 10.8 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~noble_amd64.deb |
| postgresql-16-pg-tokenizer | 0.1.1 | u24.aarch64 | pigsty | 10.7 MiB | postgresql-16-pg-tokenizer_0.1.1-1PIGSTY~noble_arm64.deb |

| Package | Version | OS | ORG | SIZE | File URL |
|---------|---------|----|-----|------|----------|
| pg_tokenizer_15 | 0.1.1 | el8.x86_64 | pigsty | 11.7 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el8.x86_64.rpm |
| pg_tokenizer_15 | 0.1.1 | el8.aarch64 | pigsty | 11.5 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el8.aarch64.rpm |
| pg_tokenizer_15 | 0.1.1 | el9.x86_64 | pigsty | 11.0 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el9.x86_64.rpm |
| pg_tokenizer_15 | 0.1.1 | el9.aarch64 | pigsty | 10.9 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el9.aarch64.rpm |
| pg_tokenizer_15 | 0.1.1 | el10.x86_64 | pigsty | 10.9 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el10.x86_64.rpm |
| pg_tokenizer_15 | 0.1.1 | el10.aarch64 | pigsty | 11.0 MiB | pg_tokenizer_15-0.1.1-1PIGSTY.el10.aarch64.rpm |
| postgresql-15-pg-tokenizer | 0.1.1 | d12.x86_64 | pigsty | 9.9 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~bookworm_amd64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | d12.aarch64 | pigsty | 9.7 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~bookworm_arm64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | d13.x86_64 | pigsty | 9.9 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~trixie_amd64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | d13.aarch64 | pigsty | 9.7 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~trixie_arm64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | u22.x86_64 | pigsty | 10.9 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~jammy_amd64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | u22.aarch64 | pigsty | 10.7 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~jammy_arm64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | u24.x86_64 | pigsty | 10.8 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~noble_amd64.deb |
| postgresql-15-pg-tokenizer | 0.1.1 | u24.aarch64 | pigsty | 10.7 MiB | postgresql-15-pg-tokenizer_0.1.1-1PIGSTY~noble_arm64.deb |

| Package | Version | OS | ORG | SIZE | File URL |
|---------|---------|----|-----|------|----------|
| pg_tokenizer_14 | 0.1.1 | el8.x86_64 | pigsty | 11.7 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el8.x86_64.rpm |
| pg_tokenizer_14 | 0.1.1 | el8.aarch64 | pigsty | 11.5 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el8.aarch64.rpm |
| pg_tokenizer_14 | 0.1.1 | el9.x86_64 | pigsty | 11.0 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el9.x86_64.rpm |
| pg_tokenizer_14 | 0.1.1 | el9.aarch64 | pigsty | 10.9 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el9.aarch64.rpm |
| pg_tokenizer_14 | 0.1.1 | el10.x86_64 | pigsty | 10.9 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el10.x86_64.rpm |
| pg_tokenizer_14 | 0.1.1 | el10.aarch64 | pigsty | 11.0 MiB | pg_tokenizer_14-0.1.1-1PIGSTY.el10.aarch64.rpm |
| postgresql-14-pg-tokenizer | 0.1.1 | d12.x86_64 | pigsty | 9.9 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~bookworm_amd64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | d12.aarch64 | pigsty | 9.7 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~bookworm_arm64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | d13.x86_64 | pigsty | 9.9 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~trixie_amd64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | d13.aarch64 | pigsty | 9.7 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~trixie_arm64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | u22.x86_64 | pigsty | 10.9 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~jammy_amd64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | u22.aarch64 | pigsty | 10.7 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~jammy_arm64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | u24.x86_64 | pigsty | 10.8 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~noble_amd64.deb |
| postgresql-14-pg-tokenizer | 0.1.1 | u24.aarch64 | pigsty | 10.7 MiB | postgresql-14-pg-tokenizer_0.1.1-1PIGSTY~noble_arm64.deb |

Source

pig build pkg pg_tokenizer;		# build rpm/deb

Install

Make sure the PGDG and PIGSTY repos are available:

pig repo add pgsql -u   # add both repo and update cache

Install this extension with pig:

pig install pg_tokenizer;		# install via package name, for the active PG version

pig install pg_tokenizer -v 18;   # install for PG 18
pig install pg_tokenizer -v 17;   # install for PG 17
pig install pg_tokenizer -v 16;   # install for PG 16
pig install pg_tokenizer -v 15;   # install for PG 15
pig install pg_tokenizer -v 14;   # install for PG 14

Add this extension to shared_preload_libraries in postgresql.conf, then restart PostgreSQL:

shared_preload_libraries = 'pg_tokenizer'
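
As an alternative to editing postgresql.conf by hand, the same setting can be applied from a superuser session with ALTER SYSTEM. This is a minimal sketch using standard PostgreSQL commands. It assumes no other libraries are currently preloaded; ALTER SYSTEM replaces the whole value, so any entries already shown by the first statement need to be kept in the new list.

```sql
-- Inspect the current value first; ALTER SYSTEM overwrites it entirely.
SHOW shared_preload_libraries;

-- Assumption: nothing else is preloaded. If the statement above returned
-- other libraries, list them here as well, separated by commas.
ALTER SYSTEM SET shared_preload_libraries = 'pg_tokenizer';

-- shared_preload_libraries is only read at server start, so restart
-- PostgreSQL (a reload is not enough) and then confirm the new value:
SHOW shared_preload_libraries;
```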

Create this extension with:

CREATE EXTENSION pg_tokenizer;
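
To confirm the extension was created, the standard system catalogs can be queried. This sketch relies only on built-in PostgreSQL catalogs and on the tokenizer_catalog schema name listed in the overview; the exact set of functions returned depends on the installed version.

```sql
-- Confirm the extension is installed and see which version was created.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_tokenizer';

-- List the functions that live in the extension's schema.
SELECT p.proname,
       pg_get_function_identity_arguments(p.oid) AS arguments
FROM pg_proc AS p
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'tokenizer_catalog'
ORDER BY p.proname;
```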

Usage

GitHub: tensorchord/pg_tokenizer.rs

pg_tokenizer is a PostgreSQL extension that provides tokenizers for full-text search. It is designed to work with VectorChord-bm25 for native BM25 ranking index support.

Quick Start

CREATE EXTENSION pg_tokenizer;

-- Create a tokenizer using the LLMLingua2 model
SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);

-- Tokenize text
SELECT tokenize('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.', 'tokenizer1');
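
The calls above are not schema-qualified. The overview lists tokenizer_catalog as the extension's schema, so if a call fails with a "function does not exist" error, the schema is probably not on the search path. The sketch below shows two ways around that, assuming the functions live in tokenizer_catalog and that tokenizer1 was created as shown above.

```sql
-- Option 1: put the extension schema on the search path for this session.
SET search_path TO "$user", public, tokenizer_catalog;
SELECT tokenize('PostgreSQL is a powerful database system.', 'tokenizer1');

-- Option 2: leave the search path alone and schema-qualify each call.
SELECT tokenizer_catalog.tokenize('PostgreSQL is a powerful database system.', 'tokenizer1');
```

To make the first option permanent for a database, the same value can be set with ALTER DATABASE ... SET search_path.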

Tokenizer Models

pg_tokenizer supports multiple tokenizer models for different languages and use cases:

| Model | Language | Description |
|-------|----------|-------------|
| llmlingua2 | English | BERT-based tokenizer from LLMLingua2 |
| jieba | Chinese | Jieba Chinese text segmentation |
| lindera/ipadic | Japanese | Lindera tokenizer with IPADIC dictionary |
| Custom models | Any | User-trained models for domain-specific text |

Creating Tokenizers

-- English tokenizer
SELECT create_tokenizer('en_tokenizer', $$
model = "llmlingua2"
$$);

-- Chinese tokenizer
SELECT create_tokenizer('zh_tokenizer', $$
model = "jieba"
$$);

-- Japanese tokenizer
SELECT create_tokenizer('ja_tokenizer', $$
model = "lindera/ipadic"
$$);

Tokenizing Text

-- Tokenize English text
SELECT tokenize('full text search in PostgreSQL', 'en_tokenizer');

-- Tokenize Chinese text
SELECT tokenize('PostgreSQL是一个强大的数据库系统', 'zh_tokenizer');
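
The same function can be applied to a table column instead of a literal. The following is a small sketch; the articles table and its rows are hypothetical and exist only for illustration, and en_tokenizer is the tokenizer created in the previous section.

```sql
-- Hypothetical table used only for this example.
CREATE TABLE articles (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);

INSERT INTO articles (body) VALUES
    ('full text search in PostgreSQL'),
    ('tokenizers split text into tokens');

-- Tokenize every row with the English tokenizer defined earlier.
SELECT id, tokenize(body, 'en_tokenizer') AS tokens
FROM articles
ORDER BY id;
```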

Text Analyzer

pg_tokenizer also provides text analyzer functionality that combines tokenization with additional text processing steps. For detailed text analyzer usage, refer to the Text Analyzer documentation.

Integration with VectorChord-bm25

pg_tokenizer is typically used together with VectorChord-bm25, which provides the BM25 ranking index that consumes the tokenizer output:

CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE;
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;

-- Create a tokenizer
SELECT create_tokenizer('my_tokenizer', $$
model = "llmlingua2"
$$);

-- Tokenize text into bm25vectors for indexing and search
SELECT tokenize('your search query', 'my_tokenizer');
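
An end-to-end flow might look like the sketch below. Everything on the vchord_bm25 side is an assumption recalled from that extension's documentation and should be checked against the installed version: the bm25_catalog schema, the bm25vector type, the bm25 index method with the bm25_ops operator class, the to_bm25query function, and the <&> operator. The documents table and its rows are hypothetical.

```sql
-- Assumed schema names; adjust to the installed extensions.
SET search_path TO "$user", public, tokenizer_catalog, bm25_catalog;

-- Hypothetical table; bm25vector is assumed to be provided by vchord_bm25.
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    passage   text NOT NULL,
    embedding bm25vector
);

INSERT INTO documents (passage) VALUES
    ('PostgreSQL is a powerful, open-source database system.'),
    ('Full-text search ranks documents by relevance.');

-- Store the tokenized form of each passage next to the text.
UPDATE documents
SET embedding = tokenize(passage, 'my_tokenizer')::bm25vector;

-- Assumed index syntax from vchord_bm25.
CREATE INDEX documents_embedding_bm25
    ON documents USING bm25 (embedding bm25_ops);

-- Assumed query helper and operator from vchord_bm25.
SELECT id,
       passage,
       embedding <&> to_bm25query(
           'documents_embedding_bm25',
           tokenize('your search query', 'my_tokenizer')::bm25vector
       ) AS score
FROM documents
ORDER BY score
LIMIT 10;
```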

Documentation

For more details, see the full documentation in the GitHub repository (tensorchord/pg_tokenizer.rs).