pg_tiktoken

pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL

Overview

ID    Extension    Package      Version  Category  License     Language
1870  pg_tiktoken  pg_tiktoken  0.0.1    RAG       Apache-2.0  Rust

Attribute  Has Binary  Has Library  Need Load  Has DDL  Relocatable  Trusted
--s-d--    No          Yes          No         Yes      No           No

Relationships

See Also: vectorize, pg_summarize, pg4ml, pgml, vector, vchord, vectorscale, pg_graphql

Packages

Type  Repo    Version  PG Major Compatibility  Package Pattern            Dependencies
EXT   PIGSTY  0.0.1    18, 17, 16, 15, 14      pg_tiktoken                -
RPM   PIGSTY  0.0.1    18, 17, 16, 15, 14      pg_tiktoken_$v             -
DEB   PIGSTY  0.0.1    18, 17, 16, 15, 14      postgresql-$v-pg-tiktoken  -

Linux / PG    PG18          PG17          PG16          PG15          PG14
el8.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
el8.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
el9.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
el9.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
el10.x86_64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
el10.aarch64  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
d12.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
d12.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
d13.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
d13.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
u22.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
u22.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
u24.x86_64    PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1
u24.aarch64   PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1  PIGSTY 0.0.1

Source

pig build pkg pg_tiktoken;		# build rpm/deb

Install

Make sure the PGDG and PIGSTY repositories are available:

pig repo add pgsql -u   # add both repos and update the package cache

Install this extension with pig:

pig install pg_tiktoken;		# install via package name, for the active PG version

pig install pg_tiktoken -v 18;   # install for PG 18
pig install pg_tiktoken -v 17;   # install for PG 17
pig install pg_tiktoken -v 16;   # install for PG 16
pig install pg_tiktoken -v 15;   # install for PG 15
pig install pg_tiktoken -v 14;   # install for PG 14

Create this extension with:

CREATE EXTENSION pg_tiktoken;

Usage

pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL. Source: README.md

pg_tiktoken is a PostgreSQL extension that provides input tokenization using OpenAI’s tiktoken library. It allows you to count and encode tokens directly in SQL, which is useful for managing input length limits when working with OpenAI models.


Functions

tiktoken_count

Count the number of tokens for a given encoding or model:

SELECT tiktoken_count('p50k_edit', 'A long time ago in a galaxy far, far away');
 tiktoken_count
----------------
             11
(1 row)
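A typical use of the count is to check a prompt against a model's context limit before sending a request. The following is a minimal client-side sketch in Python, assuming the count has already been fetched from tiktoken_count; the function names remaining_tokens and fits_budget, and the context_limit and reserved_for_output parameters, are hypothetical and not part of pg_tiktoken, which does not know any model's limits.

```python
def remaining_tokens(prompt_tokens: int, context_limit: int,
                     reserved_for_output: int = 0) -> int:
    """Return how many tokens are left after the prompt and an output reserve.

    prompt_tokens is the value returned by tiktoken_count for the prompt.
    context_limit and reserved_for_output are hypothetical, model-specific
    budgets supplied by the caller. A negative result means the prompt
    does not fit.
    """
    if context_limit <= 0 or reserved_for_output < 0 or prompt_tokens < 0:
        raise ValueError("invalid token budget")
    return context_limit - reserved_for_output - prompt_tokens


def fits_budget(prompt_tokens: int, context_limit: int,
                reserved_for_output: int = 0) -> bool:
    """True if the prompt plus the output reserve fits in the context limit."""
    return remaining_tokens(prompt_tokens, context_limit, reserved_for_output) >= 0


# 11 is the count returned by the p50k_edit example query above.
print(remaining_tokens(11, context_limit=100, reserved_for_output=50))  # 39
print(fits_budget(11, context_limit=10))  # False
```

The same comparison can also be made inside PostgreSQL by filtering on the result of tiktoken_count in a WHERE clause, which avoids fetching oversized text to the client.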

tiktoken_encode

Get the token IDs for a given encoding or model:

SELECT tiktoken_encode('cl100k_base', 'A long time ago in a galaxy far, far away');
                  tiktoken_encode
----------------------------------------------------
 {32,1317,892,4227,304,264,34261,3117,11,3117,3201}
(1 row)
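Because tiktoken_encode returns the token IDs as an array, a client can split a long input into fixed-size windows that each respect a token limit. Below is a minimal Python sketch, assuming the array has already been fetched into a Python list; chunk_token_ids, max_tokens and overlap are hypothetical names, and turning the chunks back into text would need a separate decoder that is not part of this sketch.

```python
from typing import List


def chunk_token_ids(ids: List[int], max_tokens: int,
                    overlap: int = 0) -> List[List[int]]:
    """Split a token-ID list into windows of at most max_tokens IDs.

    Consecutive windows share `overlap` IDs so that context is not cut
    abruptly at window boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(ids), step):
        chunks.append(ids[start:start + max_tokens])
        if start + max_tokens >= len(ids):
            break  # this window already reaches the end of the input
    return chunks


# Token IDs from the cl100k_base example above.
ids = [32, 1317, 892, 4227, 304, 264, 34261, 3117, 11, 3117, 3201]
print(chunk_token_ids(ids, max_tokens=4))
print(chunk_token_ids(ids, max_tokens=4, overlap=1))
```

Without overlap the windows partition the input exactly, so their lengths add up to the original token count; a non-zero overlap trades a few duplicated tokens for continuity between windows.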

Both tiktoken_count and tiktoken_encode accept either an encoding name or an OpenAI model name as the first argument.


Supported Models

Encoding name        OpenAI models
cl100k_base          ChatGPT models, text-embedding-ada-002
p50k_base            Code models, text-davinci-002, text-davinci-003
p50k_edit            Edit models like text-davinci-edit-001, code-davinci-edit-001
r50k_base (or gpt2)  GPT-3 models like davinci
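Since the first argument of both functions may be either an encoding name or a model name, the table can be read as a lookup from model to encoding. The Python sketch below is hypothetical: it covers only the model names spelled out in the table (the "like" wording implies there are more), treats gpt2 as an alias of r50k_base as the table lists it, and does not reproduce the extension's actual resolution logic.

```python
from typing import Optional

# Encoding names from the table (gpt2 is listed as an alternative to r50k_base).
ENCODINGS = {"cl100k_base", "p50k_base", "p50k_edit", "r50k_base", "gpt2"}

# Only the model names explicitly listed in the table; the real set is larger.
MODEL_TO_ENCODING = {
    "text-embedding-ada-002": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "text-davinci-edit-001": "p50k_edit",
    "code-davinci-edit-001": "p50k_edit",
    "davinci": "r50k_base",
}


def resolve_encoding(name: str) -> Optional[str]:
    """Return the encoding for an encoding or model name, or None if unknown."""
    if name in ENCODINGS:
        return "r50k_base" if name == "gpt2" else name
    return MODEL_TO_ENCODING.get(name)


print(resolve_encoding("p50k_edit"))         # p50k_edit
print(resolve_encoding("text-davinci-003"))  # p50k_base
print(resolve_encoding("gpt2"))              # r50k_base
```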