Building AI-powered Article Embeddings with Chroma and GPT-4

This guide demonstrates how to use Chroma, a developer-centric embedding database, along with GPT-4, a state-of-the-art language model. By following these steps, you can harness the power of Chroma and GPT-4 to enable similarity-based search, recommendation systems, and more.

Before proceeding with this guide, make sure you have the following prerequisites in place:

  1. Docker installed on your machine.
  2. An OpenAI API key.

To get started with Chroma, follow the steps below:

Run the following command to install Chroma as a dependency in your project:

npm install --save chromadb

Import the ChromaClient from the `chromadb` package and create a new instance of the client:

import { ChromaClient } from 'chromadb';
const client = new ChromaClient();

Before using Chroma, you need to connect the client to a running backend. You can either connect to a hosted Chroma instance or run the backend on your local machine. To run it locally:

  • Clone the Chroma repository from GitHub:
git clone https://github.com/chroma-core/chroma.git

  • Navigate to the cloned directory:
cd chroma

  • Start the Chroma backend using Docker Compose (make sure Docker is running on your machine first):

docker-compose up -d --build

Note: If you encounter any build issues, ask in the active Community Discord, where most issues are resolved quickly.
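With the backend up, you can point the client at it explicitly. A minimal sketch: the Docker Compose setup above serves Chroma on port 8000 by default, and the `path` value below is an assumption that should match your local setup.

```javascript
import { ChromaClient } from 'chromadb';

// Connect to the locally running Chroma backend started via Docker Compose.
// Port 8000 is the default; adjust if you changed the compose configuration.
const client = new ChromaClient({ path: "http://localhost:8000" });
```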

Collections are used to store embeddings, documents, and metadata in Chroma. To create a collection, use the createCollection method of the Chroma client. Provide a name for the collection and an optional embedding function if you want to generate embeddings from text. Here's an example using OpenAI's text-embedding-ada-002 model for embedding:

import { OpenAIEmbeddingFunction } from 'chromadb';

// Reads your OpenAI API key from an environment variable.
const embedder = new OpenAIEmbeddingFunction({ openai_api_key: process.env.OPENAI_API_KEY });
const collection = await client.createCollection({ name: "my_collection", embeddingFunction: embedder });

You can add text documents to the collection using the add method; Chroma handles tokenization, embedding, and indexing automatically. You can add raw text documents:

await collection.add({
  ids: ["id1", "id2"],
  metadatas: [{ "source": "my_source" }, { "source": "my_source" }],
  documents: ["This is a document", "This is another document"],
});

Or by adding pre-computed embeddings:

await collection.add({
  ids: ["id1", "id2"],
  embeddings: [[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
  metadatas: [{ "source": "my_source" }, { "source": "my_source" }],
  documents: ["This is a document", "This is another document"],
});
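To build intuition for what "most similar" means when querying, here is a plain-JavaScript sketch of cosine similarity, one common measure of closeness between embedding vectors. This is illustrative only; Chroma's actual index uses an approximate nearest-neighbor structure that is far more efficient than a pairwise scan.

```javascript
// Cosine similarity: dot product of two vectors divided by the product
// of their magnitudes. Returns 1 for identical directions, 0 for
// orthogonal vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A query embedding scored this way against every stored embedding, sorted descending, gives the same ranking a brute-force similarity search would produce.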

You can query the collection to retrieve the most similar results based on a list of query texts or query embeddings. Use the query method of the collection object. Here's an example:

const results = await collection.query({
  nResults: 2,
  queryTexts: ["This is a query document"],
});
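To bring GPT-4 into the loop, you can feed the retrieved documents into a chat completion as context. A minimal sketch using Node's built-in fetch against OpenAI's Chat Completions endpoint; `buildPrompt` and `askGpt4` are hypothetical helper names, and the prompt format is just one reasonable choice.

```javascript
// Hypothetical helper: combine retrieved documents into a single prompt.
function buildPrompt(question, documents) {
  const context = documents.map((doc, i) => `[${i + 1}] ${doc}`).join("\n");
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Send the prompt to GPT-4 via OpenAI's Chat Completions REST API.
// Requires Node 18+ (built-in fetch) and OPENAI_API_KEY in the environment.
async function askGpt4(question, documents) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: buildPrompt(question, documents) }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
```

You could then call `askGpt4(question, results.documents[0])`, since query returns matched documents grouped per query text, grounding the model's answer in the articles retrieved from Chroma.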

Finally, deploy the project to Vercel:

1. First, create a new GitHub repository and push your local changes.

2. Deploy it to Vercel. Ensure you add all environment variables that you configured earlier to Vercel during the import process.

And that's it! By following these steps, you can integrate Chroma and OpenAI GPT-4 into your application, allowing you to leverage powerful AI-powered article embeddings for various use cases.

Good luck with your AI-powered project!
