Working with AI data stored in S3-compatible object storage
The following examples demonstrate how to use the pgai functions with S3-compatible object storage. You can use them as is, because they use a publicly accessible example S3 bucket. Alternatively, you can prepare your own S3-compatible object storage bucket with some test data and try the steps in this section with that data.
These examples use image data and an appropriate image encoder model instead of text data. You can also use plain text data on object storage, similar to the examples in Working with AI data in Postgres.
Creating a retriever
Start by creating a retriever that uses images stored on S3-compatible object storage as its source. To do this, call the pgai.create_s3_retriever function with the following parameters:
- The retriever_name identifies and references the retriever; set it to image_embeddings for this example.
- The schema_name is the schema where the source table is located.
- The model_name is the name of the embeddings encoder model for similarity data; set it to clip-vit-base-patch32 to use the open encoder model for image data from Hugging Face.
- The data_type is the type of data in the source table, either img or text; set it to img.
- The bucket_name is the name of the S3 bucket where the data is stored; set it to torsten.
- The prefix is the prefix of the objects in the bucket; set it to an empty string because you want all the objects in that bucket.
- The endpoint_url is the URL of the S3 endpoint; set it to https://s3.us-south.cloud-object-storage.appdomain.cloud to access the public example bucket.
Together, these values form the arguments of a single SQL call to pgai.create_s3_retriever.
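The following is a minimal sketch, not the official pgai example. It builds the pgai.create_s3_retriever call from Python as a parameterized query for a DB-API driver that uses %s placeholders, such as psycopg. It makes two assumptions. First, the function takes positional arguments in the order the parameters are listed. Second, the schema is public, because the text gives no schema value. The helper name create_s3_retriever_call is hypothetical.

```python
# Minimal sketch (hypothetical helper): build the pgai.create_s3_retriever
# call as a parameterized query for a DB-API driver with %s placeholders.
# ASSUMPTION: positional argument order matches the documented parameter list.

PUBLIC_ENDPOINT = "https://s3.us-south.cloud-object-storage.appdomain.cloud"


def create_s3_retriever_call(
    retriever_name: str,
    schema_name: str,
    model_name: str,
    data_type: str,
    bucket_name: str,
    prefix: str,
    endpoint_url: str,
) -> tuple[str, tuple[str, ...]]:
    """Return (query, params) for pgai.create_s3_retriever."""
    # The documentation states data_type is either 'img' or 'text'.
    if data_type not in ("img", "text"):
        raise ValueError(f"data_type must be 'img' or 'text', got {data_type!r}")
    query = "SELECT pgai.create_s3_retriever(%s, %s, %s, %s, %s, %s, %s)"
    params = (
        retriever_name,
        schema_name,
        model_name,
        data_type,
        bucket_name,
        prefix,
        endpoint_url,
    )
    return query, params


# Values from the example; the schema name 'public' is an ASSUMPTION.
query, params = create_s3_retriever_call(
    retriever_name="image_embeddings",
    schema_name="public",
    model_name="clip-vit-base-patch32",
    data_type="img",
    bucket_name="torsten",
    prefix="",  # empty prefix: include every object in the bucket
    endpoint_url=PUBLIC_ENDPOINT,
)
```

With a psycopg cursor you would then run cur.execute(query, params). Passing the values as parameters lets the driver handle quoting, including the empty prefix string.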
Refreshing the retriever
Next, run the pgai.refresh_retriever function.
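Here is a minimal sketch in the same style, assuming pgai.refresh_retriever takes the retriever name as its only argument. The helper name refresh_retriever_call is hypothetical.

```python
# Minimal sketch (hypothetical helper): build the pgai.refresh_retriever call.
# ASSUMPTION: the function takes the retriever name as its only argument.


def refresh_retriever_call(retriever_name: str) -> tuple[str, tuple[str]]:
    """Return (query, params) that refresh the named retriever."""
    if not retriever_name:
        raise ValueError("retriever_name must not be empty")
    return "SELECT pgai.refresh_retriever(%s)", (retriever_name,)


# Refresh the retriever created in the previous step.
query, params = refresh_retriever_call("image_embeddings")
```

As before, run it with cur.execute(query, params) on a DB-API cursor such as psycopg's.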
Retrieving data
Finally, run the pgai.retrieve_via_s3 function with the required parameters to retrieve the top K most relevant (that is, most similar) AI data items. Be aware that the object type is currently limited to image and text files.
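This section does not list the parameters of pgai.retrieve_via_s3, so the following sketch rests on an assumption. It assumes the argument list is the retriever name, the number of results K, the bucket name, the key of the query object, and the endpoint URL, in that order. Check this against the pgai function reference before relying on it. The helper name retrieve_via_s3_call and the object key example.jpg are hypothetical. Selecting with SELECT * FROM avoids assuming a result column name.

```python
# Minimal sketch (hypothetical helper): build a pgai.retrieve_via_s3 call.
# ASSUMPTION: argument order is (retriever name, top K, bucket, object key,
# endpoint URL). Verify against the pgai reference before use.

PUBLIC_ENDPOINT = "https://s3.us-south.cloud-object-storage.appdomain.cloud"


def retrieve_via_s3_call(
    retriever_name: str,
    top_k: int,
    bucket_name: str,
    object_key: str,
    endpoint_url: str,
) -> tuple[str, tuple[object, ...]]:
    """Return (query, params) asking for the top_k items most similar to object_key."""
    # K must be at least 1 for a "top K" request to make sense.
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    query = "SELECT * FROM pgai.retrieve_via_s3(%s, %s, %s, %s, %s)"
    params = (retriever_name, top_k, bucket_name, object_key, endpoint_url)
    return query, params


# 'example.jpg' is a HYPOTHETICAL object key used only for illustration.
query, params = retrieve_via_s3_call(
    "image_embeddings", 1, "torsten", "example.jpg", PUBLIC_ENDPOINT
)
```

Executing this with cur.execute(query, params) and then cur.fetchall() would return the matching rows. The shape of each row depends on the actual function definition.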