pg_bestmatch
pg_bestmatch
Generate BM25 sparse vector inside PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 2140 | pg_bestmatch | pg_bestmatch | 0.0.2 | FTS | Apache-2.0 | Rust |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
--sLd-- | No | Yes | Yes | Yes | no | no |
| Relationships | |
|---|---|
| See Also | vector pg_search vchord_bm25 vchord vectorscale zhparser pg_tokenizer vectorize |
pgrx=0.16.1, manual updated pgrx by Vonng
Packages
| Type | Repo | Version | PG Major Availability | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EL | PIGSTY | 0.0.2 | 18 17 16 15 14 13 | pg_bestmatch_$v | - |
| Debian | PIGSTY | 0.0.2 | 18 17 16 15 14 13 | postgresql-$v-pg-bestmatch | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 | PG13 |
|---|---|---|---|---|---|---|
el8.x86_64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
el8.aarch64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
el9.x86_64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
el9.aarch64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
el10.x86_64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
el10.aarch64 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 | PIGSTY 0.0.2 |
d12.x86_64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
d12.aarch64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
d13.x86_64 | MISS | MISS | MISS | MISS | MISS | MISS |
d13.aarch64 | MISS | MISS | MISS | MISS | MISS | MISS |
u22.x86_64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
u22.aarch64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
u24.x86_64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
u24.aarch64 | MISS | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_18 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_18 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_18 | 0.0.2 | el9.x86_64 | pigsty | 7.0 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_18 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_18 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_18 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_18-0.0.2-1PIGSTY.el10.aarch64.rpm |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_17 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_17 | 0.0.1 | el8.x86_64 | pigsty | 6.7 MiB | pg_bestmatch_17-0.0.1-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_17 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_17 | 0.0.1 | el8.aarch64 | pigsty | 6.5 MiB | pg_bestmatch_17-0.0.1-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_17 | 0.0.2 | el9.x86_64 | pigsty | 7.0 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_17 | 0.0.1 | el9.x86_64 | pigsty | 6.4 MiB | pg_bestmatch_17-0.0.1-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_17 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_17 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_17 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_17-0.0.2-1PIGSTY.el10.aarch64.rpm |
postgresql-17-pg-bestmatch | 0.0.1 | d12.x86_64 | pigsty | 5.4 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~bookworm_amd64.deb |
postgresql-17-pg-bestmatch | 0.0.1 | d12.aarch64 | pigsty | 5.2 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~bookworm_arm64.deb |
postgresql-17-pg-bestmatch | 0.0.1 | u22.x86_64 | pigsty | 6.1 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~jammy_amd64.deb |
postgresql-17-pg-bestmatch | 0.0.1 | u22.aarch64 | pigsty | 6.0 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~jammy_arm64.deb |
postgresql-17-pg-bestmatch | 0.0.1 | u24.x86_64 | pigsty | 6.1 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~noble_amd64.deb |
postgresql-17-pg-bestmatch | 0.0.1 | u24.aarch64 | pigsty | 6.0 MiB | postgresql-17-pg-bestmatch_0.0.1-1PIGSTY~noble_arm64.deb |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_16 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_16 | 0.0.1 | el8.x86_64 | pigsty | 6.7 MiB | pg_bestmatch_16-0.0.1-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_16 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_16 | 0.0.1 | el8.aarch64 | pigsty | 6.5 MiB | pg_bestmatch_16-0.0.1-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_16 | 0.0.2 | el9.x86_64 | pigsty | 7.0 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_16 | 0.0.1 | el9.x86_64 | pigsty | 6.4 MiB | pg_bestmatch_16-0.0.1-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_16 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_16 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_16 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_16-0.0.2-1PIGSTY.el10.aarch64.rpm |
postgresql-16-pg-bestmatch | 0.0.1 | d12.x86_64 | pigsty | 5.4 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~bookworm_amd64.deb |
postgresql-16-pg-bestmatch | 0.0.1 | d12.aarch64 | pigsty | 5.2 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~bookworm_arm64.deb |
postgresql-16-pg-bestmatch | 0.0.1 | u22.x86_64 | pigsty | 6.1 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~jammy_amd64.deb |
postgresql-16-pg-bestmatch | 0.0.1 | u22.aarch64 | pigsty | 6.0 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~jammy_arm64.deb |
postgresql-16-pg-bestmatch | 0.0.1 | u24.x86_64 | pigsty | 6.1 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~noble_amd64.deb |
postgresql-16-pg-bestmatch | 0.0.1 | u24.aarch64 | pigsty | 5.9 MiB | postgresql-16-pg-bestmatch_0.0.1-1PIGSTY~noble_arm64.deb |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_15 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_15 | 0.0.1 | el8.x86_64 | pigsty | 6.7 MiB | pg_bestmatch_15-0.0.1-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_15 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_15 | 0.0.1 | el8.aarch64 | pigsty | 6.5 MiB | pg_bestmatch_15-0.0.1-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_15 | 0.0.2 | el9.x86_64 | pigsty | 7.0 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_15 | 0.0.1 | el9.x86_64 | pigsty | 6.4 MiB | pg_bestmatch_15-0.0.1-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_15 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_15 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_15 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_15-0.0.2-1PIGSTY.el10.aarch64.rpm |
postgresql-15-pg-bestmatch | 0.0.1 | d12.x86_64 | pigsty | 5.4 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~bookworm_amd64.deb |
postgresql-15-pg-bestmatch | 0.0.1 | d12.aarch64 | pigsty | 5.2 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~bookworm_arm64.deb |
postgresql-15-pg-bestmatch | 0.0.1 | u22.x86_64 | pigsty | 6.1 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~jammy_amd64.deb |
postgresql-15-pg-bestmatch | 0.0.1 | u22.aarch64 | pigsty | 6.0 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~jammy_arm64.deb |
postgresql-15-pg-bestmatch | 0.0.1 | u24.x86_64 | pigsty | 6.1 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~noble_amd64.deb |
postgresql-15-pg-bestmatch | 0.0.1 | u24.aarch64 | pigsty | 5.9 MiB | postgresql-15-pg-bestmatch_0.0.1-1PIGSTY~noble_arm64.deb |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_14 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_14 | 0.0.1 | el8.x86_64 | pigsty | 6.7 MiB | pg_bestmatch_14-0.0.1-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_14 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_14 | 0.0.1 | el8.aarch64 | pigsty | 6.5 MiB | pg_bestmatch_14-0.0.1-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_14 | 0.0.2 | el9.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_14 | 0.0.1 | el9.x86_64 | pigsty | 6.4 MiB | pg_bestmatch_14-0.0.1-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_14 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_14 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_14 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_14-0.0.2-1PIGSTY.el10.aarch64.rpm |
postgresql-14-pg-bestmatch | 0.0.1 | d12.x86_64 | pigsty | 5.4 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~bookworm_amd64.deb |
postgresql-14-pg-bestmatch | 0.0.1 | d12.aarch64 | pigsty | 5.2 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~bookworm_arm64.deb |
postgresql-14-pg-bestmatch | 0.0.1 | u22.x86_64 | pigsty | 6.1 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~jammy_amd64.deb |
postgresql-14-pg-bestmatch | 0.0.1 | u22.aarch64 | pigsty | 6.0 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~jammy_arm64.deb |
postgresql-14-pg-bestmatch | 0.0.1 | u24.x86_64 | pigsty | 6.1 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~noble_amd64.deb |
postgresql-14-pg-bestmatch | 0.0.1 | u24.aarch64 | pigsty | 5.9 MiB | postgresql-14-pg-bestmatch_0.0.1-1PIGSTY~noble_arm64.deb |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
pg_bestmatch_13 | 0.0.2 | el8.x86_64 | pigsty | 7.2 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_13 | 0.0.1 | el8.x86_64 | pigsty | 6.7 MiB | pg_bestmatch_13-0.0.1-1PIGSTY.el8.x86_64.rpm |
pg_bestmatch_13 | 0.0.2 | el8.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_13 | 0.0.1 | el8.aarch64 | pigsty | 6.5 MiB | pg_bestmatch_13-0.0.1-1PIGSTY.el8.aarch64.rpm |
pg_bestmatch_13 | 0.0.2 | el9.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_13 | 0.0.1 | el9.x86_64 | pigsty | 6.4 MiB | pg_bestmatch_13-0.0.1-1PIGSTY.el9.x86_64.rpm |
pg_bestmatch_13 | 0.0.2 | el9.aarch64 | pigsty | 6.9 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el9.aarch64.rpm |
pg_bestmatch_13 | 0.0.2 | el10.x86_64 | pigsty | 6.9 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el10.x86_64.rpm |
pg_bestmatch_13 | 0.0.2 | el10.aarch64 | pigsty | 7.0 MiB | pg_bestmatch_13-0.0.2-1PIGSTY.el10.aarch64.rpm |
postgresql-13-pg-bestmatch | 0.0.1 | d12.x86_64 | pigsty | 5.4 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~bookworm_amd64.deb |
postgresql-13-pg-bestmatch | 0.0.1 | d12.aarch64 | pigsty | 5.2 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~bookworm_arm64.deb |
postgresql-13-pg-bestmatch | 0.0.1 | u22.x86_64 | pigsty | 6.1 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~jammy_amd64.deb |
postgresql-13-pg-bestmatch | 0.0.1 | u22.aarch64 | pigsty | 6.0 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~jammy_arm64.deb |
postgresql-13-pg-bestmatch | 0.0.1 | u24.x86_64 | pigsty | 6.1 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~noble_amd64.deb |
postgresql-13-pg-bestmatch | 0.0.1 | u24.aarch64 | pigsty | 6.0 MiB | postgresql-13-pg-bestmatch_0.0.1-1PIGSTY~noble_arm64.deb |
Source
pig build get pg_bestmatch; # get pg_bestmatch source code
pig build dep pg_bestmatch; # install build dependencies
pig build pkg pg_bestmatch; # build extension rpm or deb
pig build ext pg_bestmatch; # build extension rpmsInstall
To add the required PGDG / PIGSTY upstream repository, use:
pig repo add pgsql -u # add PGDG + Pigsty repo and update cache (leave existing repos)Install this extension with:
pig ext install pg_bestmatch; # install by extension name, for the current active PG version
pig ext install pg_bestmatch; # install via package alias, for the active PG version
pig ext install pg_bestmatch -v 18; # install for PG 18
pig ext install pg_bestmatch -v 17; # install for PG 17
pig ext install pg_bestmatch -v 16; # install for PG 16
pig ext install pg_bestmatch -v 15; # install for PG 15
pig ext install pg_bestmatch -v 14; # install for PG 14
pig ext install pg_bestmatch -v 13; # install for PG 13Create this extension with:
CREATE EXTENSION pg_bestmatch CASCADE SCHEMA bm_catalog;Usage
- repo: https://github.com/tensorchord/pg_bestmatch.rs
- benchmark: https://hazyresearch.stanford.edu/blog/2024-05-20-m2-bert-retrieval
How does it work?
- Create an BM25 statistics based on your document set by
bm25_create(table_name, column_name, statistic_name);. It will create a materilized view to record the stats. - Generate document sparse vector by
bm25_document_to_svector(statistic_name, passage) - For query, generate query sparse vector
bm25_query_to_svector(statistic_name, query) - Calculate the score by dot product between the query sparse vector and the document sparse vector
- Currently we use huggingface tokenizer with
bert-base-uncasedvocabulary set to tokenize words. Might support more configuration on tokenizer in the future.
Install
CREATE EXTENSION pg_bestmatch;
SET search_path TO public, bm_catalog;Example
Here is an example workflow demonstrating the usage of this extension with the example of Stanford LoCo benchmark.
- Load the dataset. Here is a script for you if you want to experience
pg_bestmatchwith the dataset.
wget https://huggingface.co/api/datasets/hazyresearch/LoCoV1-Documents/parquet/default/test/0.parquet -O documents.parquet
wget https://huggingface.co/api/datasets/hazyresearch/LoCoV1-Queries/parquet/default/test/0.parquet -O queries.parquetimport pandas as pd
from sqlalchemy import create_engine
import numpy as np
from psycopg2.extensions import register_adapter, AsIs
def adapter_numpy_float64(numpy_float64):
return AsIs(numpy_float64)
def adapter_numpy_int64(numpy_int64):
return AsIs(numpy_int64)
def adapter_numpy_float32(numpy_float32):
return AsIs(numpy_float32)
def adapter_numpy_int32(numpy_int32):
return AsIs(numpy_int32)
def adapter_numpy_array(numpy_array):
return AsIs(tuple(numpy_array))
register_adapter(np.float64, adapter_numpy_float64)
register_adapter(np.int64, adapter_numpy_int64)
register_adapter(np.float32, adapter_numpy_float32)
register_adapter(np.int32, adapter_numpy_int32)
register_adapter(np.ndarray, adapter_numpy_array)
db_url = "postgresql://localhost:5432/pg_bestmatch_test"
engine = create_engine(db_url)
def load_documents():
df = pd.read_parquet("documents.parquet")
df.to_sql("documents", engine, if_exists='replace', index=False)
def load_queries():
df = pd.read_parquet("queries.parquet")
df['answer_pids'] = df['answer_pids'].apply(lambda x: str(x[0]))
df.to_sql("queries", engine, if_exists='replace', index=False)
load_documents()
load_queries()- Create BM25 statistics for the
documentstable.
SELECT bm25_create('documents', 'passage', 'documents_passage_bm25', 0.75, 1.2);- Add an embedding column to the
documentsandqueriestables and update the embeddings for documents and queries.
ALTER TABLE documents ADD COLUMN embedding svector; -- for pgvecto.rs users
ALTER TABLE documents ADD COLUMN embedding sparsevec; -- for pgvector users
UPDATE documents SET embedding = bm25_document_to_svector('documents_passage_bm25', passage)::svector; -- for pgvecto.rs users
UPDATE documents SET embedding = bm25_document_to_svector('documents_passage_bm25', passage, 'pgvector')::sparsevec; -- for pgvector users- (Optional) Create a vector index on the sparse vector column.
CREATE INDEX ON documents USING vectors (embedding svector_dot_ops); -- for pgvecto.rs users
CREATE INDEX ON documents USING ivfflat (embedding sparsevec_ip_ops); -- for pgvector users- Perform a vector search to find the most relevant documents for each query.
ALTER TABLE queries ADD COLUMN embedding svector; -- for pgvecto.rs users
ALTER TABLE queries ADD COLUMN embedding sparsevec; -- for pgvector users
UPDATE queries SET embedding = bm25_query_to_svector('documents_passage_bm25', query)::svector; -- for pgvecto.rs users
UPDATE queries SET embedding = bm25_query_to_svector('documents_passage_bm25', query, 'pgvector')::sparsevec; -- for pgvector users
SELECT sum((array[answer_pids] = array(SELECT pid FROM documents WHERE queries.dataset = documents.dataset ORDER BY queries.embedding <#> documents.embedding LIMIT 1))::int) FROM queries;This workflow showcases how to leverage BM25 text queries and vector search in PostgreSQL using this extension. The Top 1 recall of BM25 on this dataset is 0.77. If you reproduce the result, your operations are correct.
Comparison with pg_search
pg_bestmatch.rsonly provides methods for generating sparse vectors and does not support index-based search (which can be achieved by pgvecto.rs or pgvector).pg_searchperforms BM25 retrieval via the externaltantivyengine, which may have limitations when combined with transactions, filters, or JOIN operations. Sincepg_bestmatch.rsis entirely native to Postgres, it offers full compatibility with these operations inside postgres.
Reference
tokenize- Description: Tokenizes an input string into individual tokens.
- Example:
SELECT tokenize('i have an apple'); -- result: {i,have,an,apple}
bm25_create- Description: Creates BM25 statistics for a specified table and column.
- Usage:
SELECT bm25_create('documents', 'passage', 'documents_passage_bm25'); - Parameters:
table_name: Name of the table.column_name: Name of the column.stat_name: Name of the BM25 statistics.b: BM25 parameter (default 0.75).k: BM25 parameter (default 1.2).
bm25_refresh- Description: Updates the BM25 statistics to reflect any changes in the underlying data.
- Usage:
SELECT bm25_refresh('documents_passage_bm25'); - Parameters:
stat_name: Name of the BM25 statistics to update.
bm25_drop- Description: Deletes the BM25 statistics for a specified table and column.
- Usage:
SELECT bm25_drop('documents_passage_bm25'); - Parameters:
stat_name: Name of the BM25 statistics to delete.
bm25_document_to_svector- Description: Converts document text into a sparse vector representation.
- Usage:
SELECT bm25_document_to_svector('documents_passage_bm25', 'document_text'); - Parameters:
stat_name: Name of the BM25 statistics.document_text: The text of the document.style: Emitspgvecto.rs-style sparse vector orpgvector-style sparse vector.
bm25_query_to_svector- Description: Converts query text into a sparse vector representation.
- Usage:
SELECT bm25_query_to_svector('documents_passage_bm25', 'We begin, as always, with the text.'); - Parameters:
stat_name: Name of the BM25 statistics.query_text: The text of the query.style: Emitspgvecto.rs-style sparse vector orpgvector-style sparse vector.
Last updated on