ChatBees tops RAG quality leaderboard

ChatBees
2 min read · Apr 3, 2024


ChatBees, a serverless LLM platform, achieved a 4.6/5 average RAG quality score in a benchmark published by Tonic.ai, topping the leaderboard ahead of other popular platforms such as Cohere and the OpenAI Assistants API.

Even better, we achieved this score without any client-side configuration: every ChatBees customer gets this state-of-the-art RAG system out of the box. You can verify our results here.

[Figure: The detailed test scores, ChatBees vs. OpenAI]

[Figure: The average scores of the top RAG products]

Not Just Benchmarks: Try It on Your Data!

Ultimately, what matters most are the outcomes. While we’re thrilled to be at the forefront in accuracy, we’re most excited to help businesses leverage LLMs! Sign up now and try it for yourself!

About ChatBees

ChatBees is built by a team of industry experts and leaders with decades of experience building scalable enterprise cloud, data, and search platforms at top-tier tech companies like Snowflake, Google, Microsoft, Uber, and EMC.

We built ChatBees to help organizations tackle the challenge of running LLM apps in production. ChatBees offers:

  1. State-of-the-art RAG pipeline out of the box
  2. Simple APIs to build your own LLM application in just a few lines of code (see the sketch below)
  3. No DevOps or tuning required to run in production
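
To make point 2 concrete, here is a minimal sketch of an end-to-end ChatBees app, assembled from the same chatbees client calls used in the benchmark walkthrough below; the API key, account ID, file name, and question are placeholders:

import chatbees as cb

# Placeholder credentials; substitute your own API key and account ID.
cb.init(api_key="my_api_key", account_id="my_account_id")

# Create a collection, ingest a document, and ask a question against it.
col = cb.create_collection(cb.Collection(name="my_docs"))
col.upload_document("./handbook.txt")
answer, _ = col.ask("What is our vacation policy?")
print(answer)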

Benchmarking RAG quality with Tonic.ai

The benchmark from Tonic consists of a series of needle-in-a-haystack retrieval challenges designed to evaluate a RAG system’s accuracy and completeness. You can find the complete Jupyter notebook with the full setup and evaluation code here.
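
For context, a Tonic Validate benchmark is a list of question/reference-answer pairs. Here is a minimal sketch of how such a benchmark can be constructed; the question and answer below are made-up stand-ins, and the real ones live in the linked notebook. The resulting benchmark object is what the evaluation loop in step 2 iterates over:

from tonic_validate import Benchmark

# Stand-in needle-in-a-haystack question and reference answer; the
# actual benchmark items come from the linked notebook.
benchmark = Benchmark(
    questions=["What color is the sky in the third essay?"],
    answers=["Blue."],
)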

We highlight the relevant code snippets for setting up ChatBees below. You can explore the full feature set of ChatBees here.

1. Get or create the test collection

import chatbees as cb
import shortuuid

def get_shared_collection():
    """
    Use the shared collection that already has all essays.
    """
    cb.init(account_id="QZCLEHTW")
    return cb.Collection(name="tonic_rag_test")

def create_collection():
    """
    Create a new collection and upload all essays.
    """
    cb.init(api_key="my_api_key", account_id="my_account_id")

    col = cb.create_collection(cb.Collection(name='tonic_test' + shortuuid.uuid()))
    col.upload_document('./all_essays_in_single_file.txt')
    return col

collection = get_shared_collection()
# If you prefer to create a new collection to try, call create_collection():
# collection = create_collection()

2. Run retrieval challenges

from tonic_validate import (
    ValidateScorer,
    Benchmark,
    BenchmarkItem,
    LLMResponse,
    Run,
)
from tqdm import tqdm

def get_cb_rag_response(benchmark_item: BenchmarkItem, collection):
    prompt = benchmark_item.question
    response, _ = collection.ask(prompt)
    return response

# `benchmark` is the Tonic Validate benchmark of questions, constructed
# as sketched earlier (see the linked notebook for the real items).
raw_cb_responses = []
for x in tqdm(benchmark.items):
    raw_cb_responses.append(get_cb_rag_response(x, collection))

3. Evaluate ChatBees responses

from tonic_validate.metrics import AnswerSimilarityMetric

cb_responses = [
    LLMResponse(
        llm_answer=r, llm_context_list=[], benchmark_item=bi
    )
    for r, bi in zip(raw_cb_responses, benchmark.items)
]

scorer = ValidateScorer([AnswerSimilarityMetric()])
cb_run = scorer.score_run(cb_responses, parallelism=5)
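
Once the run finishes, the aggregate score can be read off the returned Run object; a minimal sketch, assuming tonic_validate’s overall_scores attribute (a dict keyed by metric name):

# Print the average answer-similarity score across all benchmark items.
print(cb_run.overall_scores)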
