RAG Agents in Calabi AI Builder
Retrieval-Augmented Generation (RAG) is the technique of supplying a language model with relevant excerpts from your own documents at query time, so it can answer questions grounded in your organization's actual knowledge — not just its training data. Calabi AI Builder provides a visual, no-code environment for building production RAG agents.
What Is RAG?
A plain LLM has no knowledge of your internal documents, policies, data dictionaries, or proprietary research. RAG solves this by:
- Indexing — breaking your documents into small chunks, converting each chunk into a numerical vector (embedding), and storing those vectors in a vector database.
- Retrieval — when a user asks a question, converting the question into the same vector space and finding the most semantically similar document chunks.
- Augmentation — injecting those relevant chunks into the LLM's prompt as context.
- Generation — the LLM generates an answer that is grounded in the retrieved content.
This dramatically reduces hallucination, keeps responses up-to-date with your latest documents, and provides attributable source citations.
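The four stages can be sketched end to end in a few lines. This is a toy illustration: a bag-of-words vector stands in for a real embedding model, an in-memory list stands in for the vector store, and the document chunks are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector. Real pipelines use a model
    such as text-embedding-3-small instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each chunk and store the vectors.
chunks = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are due by the 5th of each month.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the question, rank chunks by similarity.
question = "How many vacation days do employees get?"
q_vec = embed(question)
ranked = sorted(index, key=lambda pair: cosine(q_vec, pair[1]), reverse=True)

# 3. Augmentation: inject the top chunk into the prompt as context.
context = ranked[0][0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# 4. Generation: the prompt would now be sent to the LLM.
print(context)
```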
RAG Pipeline Architecture
Document Loaders
Document loaders bring your content into the AI Builder pipeline. Choose the loader that matches your document source.
PDF Loader
Supported extensions: .pdf
Settings:
- Usage: "One document per file" or "One document per page"
- Split pages: If enabled, each page becomes a separate document
- OCR: Enable for scanned PDFs (adds processing time)
CSV Loader
Supported extensions: .csv
Settings:
- Column: Which column(s) to include in the text chunk
- Separator: Comma (default), tab, semicolon
- Include metadata columns: Select columns to attach as metadata (not embedded, but returnable)
Web Scraper Loader
Settings:
- URL: The page or sitemap URL to scrape
- Scrape Type: "Single page", "Entire site (sitemap)", "Crawl links"
- Max depth: How many link levels to follow (for crawl mode)
- Include selectors: CSS selectors to include (e.g., "article.content")
- Exclude selectors: CSS selectors to exclude (e.g., "nav, footer")
S3 Loader
Settings:
- Bucket: S3 bucket name
- Prefix: Folder prefix (e.g., "documents/hr/")
- File types: PDF, TXT, DOCX, CSV
- AWS Region: Region where the bucket is hosted
- Credentials: AWS credential (configured in AI Builder secrets)
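Together, Prefix and File types narrow which objects get loaded. A sketch of the equivalent filtering logic, with a hardcoded listing standing in for the actual S3 bucket contents:

```python
def select_keys(keys, prefix, extensions):
    """Keep only objects under the folder prefix with an allowed file type."""
    exts = tuple(e.lower() for e in extensions)
    return [k for k in keys if k.startswith(prefix) and k.lower().endswith(exts)]

# Stand-in for the object keys returned by listing the bucket.
listing = [
    "documents/hr/employee_handbook.pdf",
    "documents/hr/notes.tmp",
    "documents/finance/budget_guidelines.docx",
]

selected = select_keys(
    listing,
    prefix="documents/hr/",
    extensions=[".pdf", ".txt", ".docx", ".csv"],
)
print(selected)  # only the HR handbook matches both filters
```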
Supported Document Formats
| Format | Loader | Notes |
|---|---|---|
| PDF (.pdf) | PDF Loader | Supports OCR for scanned documents |
| Word (.docx) | Docx Loader | Preserves heading structure |
| Plain text (.txt) | Text Loader | UTF-8 encoding |
| CSV | CSV Loader | Each row can become a document |
| Excel (.xlsx) | Excel Loader | One sheet per document |
| Markdown (.md) | Markdown Loader | Structure-aware splitting |
| HTML | Web Loader / HTML Loader | Strips tags, preserves text |
| JSON | JSON Loader | Configurable field extraction |
| Confluence | Confluence Loader | Authenticated API pull |
| Notion | Notion Loader | Authenticated API pull |
Text Splitting Strategies
Text splitters determine how documents are chunked before embedding. The chunking strategy significantly impacts retrieval quality.
| Splitter | Strategy | Best For |
|---|---|---|
| Recursive Character | Splits on `"\n\n"`, `"\n"`, `" "`, then `""` in order until chunks are small enough | General-purpose; works well for most documents |
| Character | Splits on a single separator (default: `"\n\n"`) | Documents with consistent paragraph breaks |
| Token | Splits at token boundaries (respects LLM context window) | When you need exact token counts |
| Markdown | Splits at markdown heading boundaries | Technical documentation, README files |
| Code | Language-aware splitting for code files | Source code indexing |
| HTML | Splits at HTML tag boundaries | Web-scraped content |
Recommended settings for most knowledge base use cases:
| Parameter | Recommended Value | Rationale |
|---|---|---|
| `chunk_size` | 1,000 characters | Fits enough context in one chunk without losing focus |
| `chunk_overlap` | 200 characters | Prevents information loss at chunk boundaries |
| Splitter | Recursive Character | Handles mixed document styles gracefully |
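With those settings, chunking works like a sliding window: each chunk is 1,000 characters, and each new chunk starts 800 characters after the previous one, so adjacent chunks share 200 characters. A minimal sketch (the real Recursive Character splitter additionally prefers to break at paragraph and sentence boundaries):

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Slide a window of chunk_size characters, stepping forward by
    chunk_size - chunk_overlap so adjacent chunks share boundary text."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "".join(str(i % 10) for i in range(2500))  # a 2,500-character stand-in document
chunks = split_text(doc)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The 200-character overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.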
Vector Stores
A vector store is a specialized database optimized for fast similarity search over high-dimensional embedding vectors.
| Vector Store | Backend | When to Use |
|---|---|---|
| Postgres (pgvector) | Calabi metadata DB | Default for all Calabi deployments. No external service required. |
| Pinecone | Pinecone SaaS | Extremely large knowledge bases (>10M chunks) needing sub-100ms retrieval. |
| Weaviate | Self-hosted / SaaS | When you need hybrid (keyword + vector) search. |
| Qdrant | Self-hosted | High-performance on-premise deployments. |
| Chroma | In-process | Development and testing only; not for production. |
Calabi's default is Postgres pgvector. It is provisioned automatically with every Calabi Enterprise deployment and requires no additional configuration for knowledge bases under ~1M document chunks.
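For reference, a pgvector similarity search boils down to a single SQL query. This sketch assumes the table has `content` and `embedding` columns (the actual schema AI Builder creates may differ); `<=>` is pgvector's cosine-distance operator, so `1 - distance` gives the similarity score:

```python
# Sketch of the SQL a retriever issues against a pgvector table.
# Table and column names here are assumptions for illustration.
top_k = 4
query = """
SELECT content, 1 - (embedding <=> %(query_vec)s) AS similarity
FROM hr_policy_store
ORDER BY embedding <=> %(query_vec)s
LIMIT %(top_k)s;
"""
print(query.strip())
```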
Creating a Vector Store in AI Builder
- Open a chatflow in AI Builder.
- Drag a Postgres Vector Store node onto the canvas.
- In the configuration drawer:
  - Table name: Give the store a unique name (e.g., `hr_policy_store`).
  - Embedding model: Select the embedding node connected upstream.
  - Operation: `Upsert` (index new documents) or `Similarity Search` (query mode).
- Connect a Document Loader → Text Splitter → Embedding → Vector Store for indexing.
- Connect a Vector Store → Retriever for query time.
Embeddings
Embeddings are numerical representations of text that capture semantic meaning. Similar texts produce similar vectors, enabling semantic search.
| Embedding Model | Provider | Dimensions | Cost Profile | Best For |
|---|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1,536 | Low | General-purpose; excellent quality-to-cost ratio |
| `text-embedding-3-large` | OpenAI | 3,072 | Higher | Maximum accuracy for complex domain knowledge |
| `text-embedding-ada-002` | OpenAI | 1,536 | Low | Legacy; use 3-small for new projects |
| `nomic-embed-text` | Calabi Local Models | 768 | Free (local compute) | Air-gapped environments, sensitive data |
| `mxbai-embed-large` | Calabi Local Models | 1,024 | Free (local compute) | On-premise deployments needing good quality |
| `amazon.titan-embed-text-v2` | AWS Bedrock | 1,024 | Pay-per-token | Customers standardized on AWS |
The embedding model used during indexing and the one used during query-time retrieval must be identical. Switching models requires re-indexing all documents.
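The Dimensions column explains why: vectors from different models live in spaces of different sizes and geometries, so comparing them is undefined. A toy cosine-similarity function makes the failure explicit (the dimension counts are the examples from the table above):

```python
def cosine_similarity(a, b):
    """Cosine similarity of two vectors; only defined within one embedding space."""
    if len(a) != len(b):
        raise ValueError("vectors are from different embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

small_vec = [0.1] * 1536   # e.g. a text-embedding-3-small vector
large_vec = [0.1] * 3072   # e.g. a text-embedding-3-large vector

try:
    cosine_similarity(small_vec, large_vec)
except ValueError as err:
    print(err)  # mixing models at index vs. query time fails like this
```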
Similarity Search Configuration
The Vector Store Retriever node controls how documents are retrieved.
| Parameter | Default | Description |
|---|---|---|
| Top K | 4 | Number of chunks to retrieve per query. Increase for broader context; decrease for precision. |
| Similarity Threshold | 0.7 | Minimum cosine similarity score (0–1). Chunks below this threshold are excluded. |
| Search Type | similarity | similarity (pure vector), mmr (Maximum Marginal Relevance — reduces duplicate chunks) |
| Fetch K (MMR only) | 20 | Number of candidates to fetch before MMR re-ranking |
| Lambda (MMR only) | 0.5 | Balance between relevance (1.0) and diversity (0.0) |
Tuning guidance:
- If answers are incomplete → increase Top K to 6–8.
- If unrelated chunks are included → increase Similarity Threshold to 0.75–0.85.
- If multiple chunks repeat the same information → switch to MMR search type.
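Conceptually, MMR picks chunks one at a time, each pick maximizing `lambda * relevance - (1 - lambda) * redundancy`, where redundancy is the chunk's highest similarity to anything already selected. A sketch over precomputed similarity scores (the numbers are invented to show two near-duplicate chunks):

```python
def mmr(query_sims, doc_sims, k, lam=0.5):
    """Select k document indices by Maximum Marginal Relevance.

    query_sims[i]  : similarity of doc i to the query
    doc_sims[i][j] : similarity between docs i and j
    """
    selected = []
    candidates = list(range(len(query_sims)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; doc 2 is less relevant but distinct.
query_sims = [0.9, 0.88, 0.6]
doc_sims = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sims, doc_sims, k=2))  # picks 0, then 2 over the duplicate 1
```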
Building a Company Knowledge Base Agent
Step 1: Prepare Your Documents
Organize your documents by domain. For a company knowledge base, a typical structure:
documents/
├── hr/
│ ├── employee_handbook.pdf
│ ├── leave_policy.pdf
│ └── code_of_conduct.pdf
├── finance/
│ ├── expense_policy.pdf
│ └── budget_guidelines.pdf
└── engineering/
├── architecture_overview.pdf
└── on_call_runbook.pdf
Upload all files to an S3 bucket or use the AI Builder document upload interface.
Step 2: Create the Indexing Flow
- In AI Builder, click + New Chatflow → Template → Document Q&A.
- Configure the Document Loader for your source (S3 or upload).
- Set Text Splitter: Recursive Character, chunk 1000, overlap 200.
- Connect to your embedding model (e.g., `text-embedding-3-small`).
- Connect to a Postgres Vector Store, name it `company_kb`.
- Click Upsert to index all documents. Monitor progress in the logs panel.
Step 3: Create the Query Flow
- Add a Chat Prompt Template with this system message:

  You are a helpful assistant for Acme Corp employees.
  Answer questions using only the information in the context below.
  If the answer is not in the context, say "I don't have that information
  in the knowledge base — please contact HR directly."

  Context:
  {context}

- Connect the Postgres Vector Store (in Similarity Search mode) → Retriever → Chat Prompt Template.
- Add your LLM (ChatOpenAI or Chat (Local Models)).
- Add Redis Memory for session persistence.
- Save and test.
Step 4: Deploy and Share
- Click API Endpoint to get the chatflow URL.
- See Embedding AI Builder Chatflows for integration options (iframe, Slack bot, Teams bot, widget).
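A minimal client call might look like the following. The URL shape, the `question` field, and the `overrideConfig.sessionId` key are assumptions modeled on common chatflow prediction APIs, so verify them against what the API Endpoint dialog shows:

```python
import json
from urllib import request

# Assumed endpoint shape; copy the real URL from the API Endpoint dialog.
CHATFLOW_URL = "https://calabi.example.com/api/v1/prediction/<chatflow-id>"

def build_payload(question, session_id="demo"):
    """Request body for the chatflow; sessionId keys the Redis Memory session.
    Both field names are assumptions for this sketch."""
    return {"question": question, "overrideConfig": {"sessionId": session_id}}

def ask_kb(question, session_id="demo"):
    """POST a question to the deployed knowledge-base chatflow."""
    req = request.Request(
        CHATFLOW_URL,
        data=json.dumps(build_payload(question, session_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# ask_kb("How many vacation days do I get per year?")
```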
Re-Indexing Documents
When source documents are updated, re-index to keep the knowledge base current:
- Open the indexing version of the chatflow.
- Click Upsert — by default, existing vectors for the same source file are replaced (upsert semantics).
- Alternatively, use a Calabi Automate trigger to schedule automatic re-indexing:
  - Trigger: S3 file upload event
  - Action: HTTP Request → AI Builder Upsert API
  - Schedule: Nightly at 02:00 UTC
Related Pages
- Building Chatflows — Canvas overview and all node types
- Local Models — Use local embedding and LLM models for privacy
- Embedding AI Builder Chatflows — Deployment options