Core Concepts
This guide explains the fundamental concepts and components of Ragify.js.
Documents and Chunks
Documents
A document in Ragify.js is the basic unit of information. It consists of:
interface Document {
  text: string; // The actual content
  metadata?: Record<string, unknown>; // Optional metadata
}
Chunks
Documents are automatically split into chunks for better processing and retrieval:
interface Chunk {
  text: string; // The chunk content
  metadata?: Record<string, unknown>; // Inherited from parent document
}
The chunking process:
- Splits text into sentences
- Preserves sentence boundaries
- Maintains context through overlap
- Preserves metadata
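The steps above can be sketched as follows. This is a minimal illustration rather than Ragify.js's actual implementation: the names `SourceDocument`, `TextChunk`, `splitSentences`, and `chunkDocument`, the `maxSentences` and `overlap` parameters, and the regex-based sentence splitter are all assumptions made for the example.

```typescript
// Hypothetical types mirroring the Document and Chunk shapes described above.
interface SourceDocument {
  text: string;
  metadata?: Record<string, unknown>;
}

interface TextChunk {
  text: string;
  metadata?: Record<string, unknown>;
}

// Naive sentence splitter (assumption): a sentence ends at '.', '!' or '?'.
function splitSentences(text: string): string[] {
  const matches: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Groups whole sentences into chunks; consecutive chunks share `overlap` sentences.
function chunkDocument(doc: SourceDocument, maxSentences = 3, overlap = 1): TextChunk[] {
  if (maxSentences < 1 || overlap < 0 || overlap >= maxSentences) {
    throw new Error("Require maxSentences >= 1 and 0 <= overlap < maxSentences");
  }
  const sentences = splitSentences(doc.text);
  const step = maxSentences - overlap;
  const chunks: TextChunk[] = [];
  for (let start = 0; start < sentences.length; start += step) {
    const group = sentences.slice(start, start + maxSentences);
    // Each chunk carries the parent document's metadata.
    chunks.push({ text: group.join(" "), metadata: doc.metadata });
    if (start + maxSentences >= sentences.length) break; // this group reached the end
  }
  return chunks;
}

const exampleChunks = chunkDocument(
  { text: "One. Two. Three. Four. Five.", metadata: { source: "demo" } },
  3,
  1
);
// exampleChunks[0].text is "One. Two. Three."
// exampleChunks[1].text is "Three. Four. Five."
```

The shared sentence ("Three.") is what carries context from one chunk into the next. A production splitter would also need to handle abbreviations and decimal numbers, which this regex does not.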
Embeddings
Embeddings are vector representations of text that capture semantic meaning.
OpenAI Embeddings
Ragify.js uses OpenAI’s embedding models:
- text-embedding-3-small: 1536 dimensions
- text-embedding-3-large: 3072 dimensions
- text-embedding-ada-002: 1536 dimensions
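Since every vector produced by a given model has a fixed length, it can help to keep these values in one place and check vectors against them. The sketch below records the dimensions listed above in a lookup table; the names `MODEL_DIMENSIONS`, `EmbeddingModel`, and `assertDimension` are hypothetical and not part of Ragify.js.

```typescript
// Dimensions as listed above for each OpenAI embedding model.
const MODEL_DIMENSIONS = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536,
};

type EmbeddingModel = keyof typeof MODEL_DIMENSIONS;

// Hypothetical helper: throws if a vector's length does not match the model's dimension.
function assertDimension(model: EmbeddingModel, vector: number[]): void {
  const expected = MODEL_DIMENSIONS[model];
  if (vector.length !== expected) {
    throw new Error(
      `${model} produces ${expected}-dimensional vectors, got ${vector.length}`
    );
  }
}

// A zero-filled placeholder stands in for a real embedding here.
const placeholder: number[] = [];
for (let i = 0; i < 3072; i++) placeholder.push(0);

assertDimension("text-embedding-3-large", placeholder); // passes silently
```

Passing the same placeholder with `"text-embedding-3-small"` would throw, because that model's vectors have 1536 entries.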
interface EmbeddingProvider {
  generateEmbeddings: (texts: string[]) => Promise<number[][]>;
  getDimension: () => Promise<number>;
}
Vector Store
The vector store (Qdrant) manages the storage and retrieval of embeddings.
Features
- Efficient similarity search
- Metadata filtering
- Batch operations
- Scalable storage
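To make the first two features concrete, here is a small in-memory sketch of cosine-similarity search with a metadata filter. It does not use Qdrant or its client at all: `StoredPoint`, `cosineSimilarity`, and `searchPoints` are hypothetical names, and a real vector store would use an index rather than the linear scan shown here.

```typescript
interface StoredPoint {
  vector: number[];
  metadata: Record<string, unknown>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Linear scan: keep points whose metadata equals every key in `filter`,
// then rank the survivors by similarity to the query, highest first.
function searchPoints(
  points: StoredPoint[],
  query: number[],
  filter: Record<string, unknown> = {}
): { point: StoredPoint; score: number }[] {
  const keys = Object.keys(filter);
  return points
    .filter((p) => keys.every((k) => p.metadata[k] === filter[k]))
    .map((p) => ({ point: p, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score);
}

const points: StoredPoint[] = [
  { vector: [1, 0], metadata: { lang: "en", id: "a" } },
  { vector: [0, 1], metadata: { lang: "en", id: "b" } },
  { vector: [1, 0], metadata: { lang: "de", id: "c" } },
];
const hits = searchPoints(points, [1, 0], { lang: "en" });
// hits[0] is point "a" (score 1), hits[1] is point "b" (score 0); "c" is filtered out
```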
Collection Structure
interface CollectionConfig {
  vectors: {
    size: number; // Embedding dimension
    distance: "Cosine" | "Euclid" | "Dot" | "Manhattan";
  };
}
Querying
Query Process
- Convert query text to embedding
- Search vector store for similar embeddings
- Return relevant documents with scores
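The three steps above can be sketched end to end. Everything here is illustrative: `embedText` is a toy word-count embedder standing in for a real embedding model (which would be called asynchronously), and `queryIndex`, `IndexedChunk`, and `ScoredText` are names assumed for the example. The `topK` and `threshold` parameters play the same role as the query options described in the next section.

```typescript
// Toy embedder (assumption): counts occurrences of a tiny fixed vocabulary,
// then scales to unit length so that a dot product equals cosine similarity.
const VOCABULARY = ["vector", "search", "chunk", "metadata"];

function embedText(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  const counts = VOCABULARY.map((term) => words.filter((w) => w === term).length);
  const norm = Math.sqrt(counts.reduce((sum, c) => sum + c * c, 0));
  return norm === 0 ? counts : counts.map((c) => c / norm);
}

interface IndexedChunk {
  text: string;
  vector: number[];
}

interface ScoredText {
  text: string;
  score: number;
}

function queryIndex(
  index: IndexedChunk[],
  queryText: string,
  topK = 5,
  threshold = 0
): ScoredText[] {
  // 1. Convert the query text to an embedding.
  const queryVector = embedText(queryText);
  // 2. Score every stored vector against the query.
  const scored = index.map((item) => ({
    text: item.text,
    score: item.vector.reduce((sum, v, i) => sum + v * queryVector[i], 0),
  }));
  // 3. Drop weak matches, sort by score, and return the best topK.
  return scored
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const corpus = ["vector search basics", "chunk metadata rules", "cooking pasta"];
const index = corpus.map((text) => ({ text, vector: embedText(text) }));
const results = queryIndex(index, "how does vector search work", 2, 0.5);
// Only "vector search basics" clears the 0.5 threshold, with a score of about 1.
```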
Query Options
interface QueryOptions {
  topK?: number; // Number of results
  threshold?: number; // Minimum similarity score
}
Result Format
interface QueryResult {
  text: string; // Document text
  score: number; // Similarity score
  metadata?: Record<string, unknown>; // Document metadata
}
Architecture Overview
Best Practices
- Document Size
  - Keep documents focused and concise
  - Use metadata for organization
  - Consider chunk size for context
- Chunking
  - Adjust chunk size based on content
  - Use overlap for context preservation
  - Consider sentence boundaries
- Querying
  - Use appropriate similarity thresholds
  - Consider result count
  - Leverage metadata filtering
- Performance
  - Batch operations when possible
  - Use appropriate embedding models
  - Monitor vector store size
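As one way to apply the batching advice, texts can be grouped into fixed-size batches before they are sent for embedding or written to the store. The helper name `toBatches` and the batch size used below are illustrative assumptions, not Ragify.js settings.

```typescript
// Splits an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("Batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const texts = ["a", "b", "c", "d", "e"];
const batches = toBatches(texts, 2);
// batches is [["a", "b"], ["c", "d"], ["e"]]
```

Each batch can then be passed to a single embedding or upsert call instead of issuing one request per text.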