vchord_bm25


vchord_bm25: A PostgreSQL extension for the BM25 ranking algorithm

Overview

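| ID | Extension | Package | Version | Category | License | Language |
|------|-------------|-------------|---------|----------|----------|----------|
| 2150 | vchord_bm25 | vchord_bm25 | 0.3.0 | FTS | AGPL-3.0 | Rust |

| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|-----------|------------|-------------|-----------|---------|-------------|---------|
| --sLd-- | No | Yes | Yes | Yes | No | No |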
Relationships

Schemas: bm25_catalog

See Also: vector, vchord, pg_search, pg_bestmatch, vectorscale, zhparser, pg_tokenizer, pgroonga

Packages

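| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|------|--------|---------|------------------------|---------------------------|--------------|
| EXT | PIGSTY | 0.3.0 | 18, 17, 16, 15, 14 | vchord_bm25 | - |
| RPM | PIGSTY | 0.3.0 | 18, 17, 16, 15, 14 | vchord_bm25_$v | - |
| DEB | PIGSTY | 0.3.0 | 18, 17, 16, 15, 14 | postgresql-$v-vchord-bm25 | - |
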
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|--------------|--------------|--------------|--------------|--------------|--------------|
| el8.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| el8.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| el9.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| el9.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| el10.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| el10.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| d12.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| d12.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| d13.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| d13.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| u22.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| u22.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| u24.x86_64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |
| u24.aarch64 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 | PIGSTY 0.3.0 |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| vchord_bm25_18 | 0.3.0 | el8.x86_64 | pigsty | 519.9 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el8.x86_64.rpm |
| vchord_bm25_18 | 0.3.0 | el8.aarch64 | pigsty | 403.2 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el8.aarch64.rpm |
| vchord_bm25_18 | 0.3.0 | el9.x86_64 | pigsty | 536.2 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el9.x86_64.rpm |
| vchord_bm25_18 | 0.3.0 | el9.aarch64 | pigsty | 433.6 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el9.aarch64.rpm |
| vchord_bm25_18 | 0.3.0 | el10.x86_64 | pigsty | 536.7 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el10.x86_64.rpm |
| vchord_bm25_18 | 0.3.0 | el10.aarch64 | pigsty | 433.2 KiB | vchord_bm25_18-0.3.0-1PIGSTY.el10.aarch64.rpm |
| postgresql-18-vchord-bm25 | 0.3.0 | d12.x86_64 | pigsty | 425.2 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~bookworm_amd64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | d12.aarch64 | pigsty | 318.2 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~bookworm_arm64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | d13.x86_64 | pigsty | 425.4 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~trixie_amd64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | d13.aarch64 | pigsty | 318.0 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~trixie_arm64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | u22.x86_64 | pigsty | 478.3 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~jammy_amd64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | u22.aarch64 | pigsty | 376.2 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~jammy_arm64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | u24.x86_64 | pigsty | 474.6 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~noble_amd64.deb |
| postgresql-18-vchord-bm25 | 0.3.0 | u24.aarch64 | pigsty | 371.5 KiB | postgresql-18-vchord-bm25_0.3.0-2PIGSTY~noble_arm64.deb |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| vchord_bm25_17 | 0.3.0 | el8.x86_64 | pigsty | 520.2 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el8.x86_64.rpm |
| vchord_bm25_17 | 0.3.0 | el8.aarch64 | pigsty | 403.2 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el8.aarch64.rpm |
| vchord_bm25_17 | 0.3.0 | el9.x86_64 | pigsty | 536.7 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el9.x86_64.rpm |
| vchord_bm25_17 | 0.3.0 | el9.aarch64 | pigsty | 433.7 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el9.aarch64.rpm |
| vchord_bm25_17 | 0.3.0 | el10.x86_64 | pigsty | 536.8 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el10.x86_64.rpm |
| vchord_bm25_17 | 0.3.0 | el10.aarch64 | pigsty | 433.5 KiB | vchord_bm25_17-0.3.0-1PIGSTY.el10.aarch64.rpm |
| postgresql-17-vchord-bm25 | 0.3.0 | d12.x86_64 | pigsty | 425.1 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~bookworm_amd64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | d12.aarch64 | pigsty | 317.9 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~bookworm_arm64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | d13.x86_64 | pigsty | 424.9 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~trixie_amd64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | d13.aarch64 | pigsty | 317.7 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~trixie_arm64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | u22.x86_64 | pigsty | 478.8 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~jammy_amd64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | u22.aarch64 | pigsty | 376.2 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~jammy_arm64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | u24.x86_64 | pigsty | 474.9 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~noble_amd64.deb |
| postgresql-17-vchord-bm25 | 0.3.0 | u24.aarch64 | pigsty | 371.5 KiB | postgresql-17-vchord-bm25_0.3.0-2PIGSTY~noble_arm64.deb |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| vchord_bm25_16 | 0.3.0 | el8.x86_64 | pigsty | 519.9 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el8.x86_64.rpm |
| vchord_bm25_16 | 0.3.0 | el8.aarch64 | pigsty | 403.3 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el8.aarch64.rpm |
| vchord_bm25_16 | 0.3.0 | el9.x86_64 | pigsty | 536.3 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el9.x86_64.rpm |
| vchord_bm25_16 | 0.3.0 | el9.aarch64 | pigsty | 433.3 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el9.aarch64.rpm |
| vchord_bm25_16 | 0.3.0 | el10.x86_64 | pigsty | 536.4 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el10.x86_64.rpm |
| vchord_bm25_16 | 0.3.0 | el10.aarch64 | pigsty | 433.3 KiB | vchord_bm25_16-0.3.0-1PIGSTY.el10.aarch64.rpm |
| postgresql-16-vchord-bm25 | 0.3.0 | d12.x86_64 | pigsty | 425.0 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~bookworm_amd64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | d12.aarch64 | pigsty | 318.0 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~bookworm_arm64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | d13.x86_64 | pigsty | 425.1 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~trixie_amd64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | d13.aarch64 | pigsty | 317.8 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~trixie_arm64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | u22.x86_64 | pigsty | 478.5 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~jammy_amd64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | u22.aarch64 | pigsty | 376.1 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~jammy_arm64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | u24.x86_64 | pigsty | 474.5 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~noble_amd64.deb |
| postgresql-16-vchord-bm25 | 0.3.0 | u24.aarch64 | pigsty | 371.5 KiB | postgresql-16-vchord-bm25_0.3.0-2PIGSTY~noble_arm64.deb |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| vchord_bm25_15 | 0.3.0 | el8.x86_64 | pigsty | 522.1 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el8.x86_64.rpm |
| vchord_bm25_15 | 0.3.0 | el8.aarch64 | pigsty | 404.9 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el8.aarch64.rpm |
| vchord_bm25_15 | 0.3.0 | el9.x86_64 | pigsty | 538.6 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el9.x86_64.rpm |
| vchord_bm25_15 | 0.3.0 | el9.aarch64 | pigsty | 435.4 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el9.aarch64.rpm |
| vchord_bm25_15 | 0.3.0 | el10.x86_64 | pigsty | 538.2 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el10.x86_64.rpm |
| vchord_bm25_15 | 0.3.0 | el10.aarch64 | pigsty | 434.8 KiB | vchord_bm25_15-0.3.0-1PIGSTY.el10.aarch64.rpm |
| postgresql-15-vchord-bm25 | 0.3.0 | d12.x86_64 | pigsty | 427.0 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~bookworm_amd64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | d12.aarch64 | pigsty | 319.7 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~bookworm_arm64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | d13.x86_64 | pigsty | 426.8 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~trixie_amd64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | d13.aarch64 | pigsty | 319.6 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~trixie_arm64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | u22.x86_64 | pigsty | 480.3 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~jammy_amd64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | u22.aarch64 | pigsty | 378.2 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~jammy_arm64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | u24.x86_64 | pigsty | 476.5 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~noble_amd64.deb |
| postgresql-15-vchord-bm25 | 0.3.0 | u24.aarch64 | pigsty | 373.4 KiB | postgresql-15-vchord-bm25_0.3.0-2PIGSTY~noble_arm64.deb |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| vchord_bm25_14 | 0.3.0 | el8.x86_64 | pigsty | 522.1 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el8.x86_64.rpm |
| vchord_bm25_14 | 0.3.0 | el8.aarch64 | pigsty | 405.2 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el8.aarch64.rpm |
| vchord_bm25_14 | 0.3.0 | el9.x86_64 | pigsty | 538.1 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el9.x86_64.rpm |
| vchord_bm25_14 | 0.3.0 | el9.aarch64 | pigsty | 435.4 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el9.aarch64.rpm |
| vchord_bm25_14 | 0.3.0 | el10.x86_64 | pigsty | 538.2 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el10.x86_64.rpm |
| vchord_bm25_14 | 0.3.0 | el10.aarch64 | pigsty | 435.0 KiB | vchord_bm25_14-0.3.0-1PIGSTY.el10.aarch64.rpm |
| postgresql-14-vchord-bm25 | 0.3.0 | d12.x86_64 | pigsty | 426.8 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~bookworm_amd64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | d12.aarch64 | pigsty | 319.5 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~bookworm_arm64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | d13.x86_64 | pigsty | 426.9 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~trixie_amd64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | d13.aarch64 | pigsty | 319.6 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~trixie_arm64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | u22.x86_64 | pigsty | 480.4 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~jammy_amd64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | u22.aarch64 | pigsty | 378.3 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~jammy_arm64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | u24.x86_64 | pigsty | 476.3 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~noble_amd64.deb |
| postgresql-14-vchord-bm25 | 0.3.0 | u24.aarch64 | pigsty | 373.7 KiB | postgresql-14-vchord-bm25_0.3.0-2PIGSTY~noble_arm64.deb |

Source

pig build pkg vchord_bm25;		# build rpm/deb

Install

Make sure the PGDG and PIGSTY repositories are available:

pig repo add pgsql -u   # add both repo and update cache

Install this extension with pig:

pig install vchord_bm25;		# install via package name, for the active PG version

pig install vchord_bm25 -v 18;   # install for PG 18
pig install vchord_bm25 -v 17;   # install for PG 17
pig install vchord_bm25 -v 16;   # install for PG 16
pig install vchord_bm25 -v 15;   # install for PG 15
pig install vchord_bm25 -v 14;   # install for PG 14

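Add this extension to shared_preload_libraries in postgresql.conf (append it to any libraries already listed; changing this setting requires a server restart):

shared_preload_libraries = 'vchord_bm25'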

Create this extension with:

CREATE EXTENSION vchord_bm25;

Usage

GitHub: tensorchord/VectorChord-bm25

VectorChord-BM25 is a PostgreSQL extension for the BM25 ranking algorithm, implemented with the Block-WeakAnd algorithm. It is designed to work together with pg_tokenizer, which handles customized text tokenization.
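For reference, the sketch below computes the textbook Okapi BM25 score of one document. It assumes the conventional defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF that never goes negative; the exact parameters and IDF variant used inside vchord_bm25 are not stated on this page, so read it as an illustration of the formula, not of the extension's internals. The function names are illustrative.

```python
import math


def bm25_idf(n_docs: int, doc_freq: int) -> float:
    # Lucene-style IDF (an assumption here): ln(1 + (N - df + 0.5) / (df + 0.5)), always > 0
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freqs, n_docs,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document; k1 and b are assumed textbook defaults."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # query terms absent from the document contribute nothing
        idf = bm25_idf(n_docs, doc_freqs[term])
        # Length normalization: longer-than-average documents are penalized
        norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        # Term-frequency saturation: the fraction approaches k1 + 1 as tf grows
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score
```

Note that this returns the usual positive score, where higher means more relevant; VectorChord-BM25 reports scores with the opposite sign.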

Architecture

A VectorChord-BM25 setup comprises three main components:

  1. Tokenizer (provided by pg_tokenizer): converts text into a bm25vector, a sparse vector of vocabulary IDs and their term frequencies
  2. bm25vector: a custom data type for storing tokenized text
  3. bm25vector indexes: accelerate search and ranking operations
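A bm25vector can be pictured as a sparse map from vocabulary ID to term frequency. The sketch below is a hypothetical stand-in for the tokenizer step: it splits on whitespace and assigns IDs the first time a word is seen, whereas a pg_tokenizer model such as llmlingua2 uses a fixed, pre-trained subword vocabulary. The function name to_sparse_vector is illustrative only.

```python
from collections import Counter


def to_sparse_vector(text: str, vocab: dict[str, int]) -> dict[int, int]:
    """Hypothetical tokenizer: map text to {vocabulary_id: term_frequency}."""
    ids = []
    for token in text.lower().split():  # toy whitespace pre-tokenizer
        if token not in vocab:
            vocab[token] = len(vocab)  # assign the next free vocabulary ID
        ids.append(vocab[token])
    # Sparse form: only IDs that occur are stored, sorted by ID
    return dict(sorted(Counter(ids).items()))
```

For example, "the quick fox saw the dog" becomes {0: 2, 1: 1, 2: 1, 3: 1, 4: 1}: the word "the" maps to ID 0 and appears twice.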

Quick Start

-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE;
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;

-- Create a tokenizer (e.g., LLMLingua2 for English)
SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);

-- Create a table with text content
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  passage TEXT,
  embedding bm25vector
);

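-- Insert a few sample passages
INSERT INTO documents (passage) VALUES
  ('PostgreSQL is a powerful, open-source object-relational database system.'),
  ('BM25 is a ranking function used by search engines to estimate relevance.'),
  ('Full-text search finds documents that contain the query terms.');

-- Tokenize text passages into bm25vectors
UPDATE documents SET embedding = tokenize(passage, 'tokenizer1');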

-- Create a BM25 index
CREATE INDEX documents_embedding_bm25 ON documents USING bm25 (embedding bm25_ops);

-- Query with BM25 ranking
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('search query', 'tokenizer1')) AS score
FROM documents
ORDER BY score
LIMIT 10;

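Note: BM25 scores in VectorChord-BM25 are negative, and more negative scores indicate greater relevance. That is why the queries use ORDER BY score in the default ascending order: the most relevant documents come first.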
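The sign convention can be shown in a few lines. Negating a conventional, positive BM25 score turns "higher is better" into "lower is better", so an ascending sort with a limit returns the same top matches as a descending sort of the positive scores. This is a conceptual sketch with a hypothetical helper and made-up scores; it does not call the extension.

```python
def rank_ascending(scores: dict[str, float], limit: int) -> list[str]:
    """Hypothetical helper emulating ORDER BY score LIMIT n over negated BM25 scores."""
    negated = {doc: -s for doc, s in scores.items()}  # more relevant -> more negative
    return sorted(negated, key=negated.get)[:limit]  # ascending, like ORDER BY score
```

With scores {"a": 1.0, "b": 3.5, "c": 2.0}, rank_ascending(scores, 2) returns ["b", "c"], the two highest positive scores.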

The <&> Operator

The <&> operator computes the BM25 relevance score between a stored bm25vector and a query. Build the query with to_bm25query(), which takes the index name and the tokenized query:

-- Basic search query
-- to_bm25query(index_name, tokenized_query)
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('database system', 'tokenizer1')) AS score
FROM documents
ORDER BY score
LIMIT 10;
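The LIMIT in queries like this one is what makes block-max style pruning pay off. The sketch below is a simplified block-max top-k search, not the extension's Block-WeakAnd implementation. Each query term has a posting map from document ID to its partial score. Document IDs are grouped into fixed-size blocks, and a block is skipped when the sum of its per-term maxima is below the current k-th best score. For brevity the block maxima are computed at query time here; an index would precompute them. The function name and block size are assumptions.

```python
import heapq
from collections import defaultdict


def block_max_top_k(postings: dict[str, dict[int, float]], k: int, block_size: int = 4):
    """Simplified block-max top-k: returns (top (score, doc) pairs, skipped block count)."""
    # Group each term's postings by block of document IDs: block -> term -> {doc: score}
    grouped = defaultdict(lambda: defaultdict(dict))
    for term, plist in postings.items():
        for doc, s in plist.items():
            grouped[doc // block_size][term][doc] = s

    heap: list[tuple[float, int]] = []  # min-heap holding the current top-k
    skipped = 0
    for blk in sorted(grouped):
        terms = grouped[blk]
        # Upper bound: assume every term hits its block maximum on the same document
        upper = sum(max(docs.values()) for docs in terms.values())
        if len(heap) == k and upper < heap[0][0]:
            skipped += 1  # no document in this block can enter the top-k
            continue
        # Fully score the documents in this block
        totals: dict[int, float] = defaultdict(float)
        for docs in terms.values():
            for doc, s in docs.items():
                totals[doc] += s
        for doc, s in totals.items():
            if len(heap) < k:
                heapq.heappush(heap, (s, doc))
            elif (s, doc) > heap[0]:
                heapq.heapreplace(heap, (s, doc))
    return sorted(heap, reverse=True), skipped
```

The strict < comparison keeps the result identical to exhaustive scoring even when scores tie. A full Block-Max WAND also selects pivot documents using per-term global upper bounds, which this sketch omits.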

Language Support

VectorChord-BM25 supports multiple languages through different tokenizer configurations:

| Language | Approach | Model / Pre-tokenizer |
|----------|----------|-----------------------|
| English | Pre-trained model | model = "llmlingua2" or model = "bert_base_uncased" |
| Chinese | Custom model with Jieba pre-tokenizer | [pre_tokenizer.jieba] |
| Japanese | Custom model with Lindera pre-tokenizer | Lindera with IPADIC dictionary |
| Custom | User-trained models via text analyzers | create_custom_model_tokenizer_and_trigger() |

Chinese Text Search Example

Chinese text requires a custom model with a Jieba pre-tokenizer (not a pre-trained model):

-- Create a text analyzer with Jieba pre-tokenizer
SELECT create_text_analyzer('zh_text_analyzer', $$
[pre_tokenizer.jieba]
$$);

-- Train a custom model tokenizer on documents.passage and create a trigger that fills documents.embedding
SELECT create_custom_model_tokenizer_and_trigger(
    tokenizer_name => 'zh_tokenizer',
    model_name => 'zh_model',
    text_analyzer_name => 'zh_text_analyzer',
    table_name => 'documents',
    source_column => 'passage',
    target_column => 'embedding'
);
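Jieba's job is to split unspaced Chinese text into words before the custom model assigns vocabulary IDs. To show what a pre-tokenizer produces, the sketch below substitutes a much simpler dictionary-based technique, greedy forward maximum matching. Jieba itself combines a prefix dictionary with an HMM for unknown words. The dictionary and the function name are hypothetical.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy longest-match segmentation (a stand-in for Jieba, not Jieba's algorithm)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With the hypothetical dictionary {"数据库", "系统", "向量"}, the text "向量数据库系统" segments into ["向量", "数据库", "系统"]. The custom model would then map each of these words to a vocabulary ID.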

Custom Tokenizer Models

For domain-specific terminology, you can create text analyzers with stopwords, stemming, and other filters, then train custom models on your corpus using create_custom_model_tokenizer_and_trigger().
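A text analyzer is essentially a pre-tokenizer followed by a chain of filters, all applied before the model maps tokens to IDs. The sketch below mirrors that shape in plain Python. It assumes a regex word splitter, a tiny hypothetical stopword list, and a toy suffix-stripping stemmer that is not Porter or Snowball. In pg_tokenizer, the equivalent steps are configured through the analyzer definition passed to create_text_analyzer() instead.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}  # hypothetical minimal list


def toy_stem(word: str) -> str:
    """Toy stemmer: strip one common English suffix, keeping a stem of 3+ characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Pre-tokenize and lowercase, drop stopwords, then stem each remaining token."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOPWORDS]
```

For example, analyze("Indexing the Documents of a database") yields ["index", "document", "database"], so "indexing" and "indexed" would share one vocabulary entry in a model trained on analyzer output.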

Comparison with Alternatives

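| Feature | VectorChord-BM25 | PostgreSQL tsvector + ts_rank |
|---------|------------------|-------------------------------|
| Ranking algorithm | BM25 | Frequency of matching lexemes (no corpus-wide IDF) |
| Custom tokenizers | Yes (via pg_tokenizer) | Limited to built-in text search configurations |
| Index type | Dedicated BM25 index | GIN or GiST index |
| Availability | Extension | Built-in |
| Language support | Extensible via models | Via text search configurations |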