Swanand Potnis — AI Engineer

Open Source Library

llm-behave

pytest plugin for behavioral testing of LLM applications

Stop using LLMs to judge LLMs. llm-behave brings semantic assertions, tone detection, and drift detection to your CI pipeline — all with an offline 80MB model. No API costs. No circular dependencies.

Semantic matching — response.mentions("refund") understands meaning, not just strings
Tone & intent detection — response.tone("professional") without an LLM judge
Drift detection for CI — catch behavioral regressions before they hit production
100% offline — sentence-transformer runs on CPU, ~32ms per assertion
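
As a rough illustration of the idea behind semantic matching (not llm-behave's actual implementation), the sketch below compares embedding vectors by cosine similarity and passes when the score clears a threshold. The `mentions` helper, the 0.6 threshold, and the toy 3-dimensional vectors are all assumptions made for this example; in practice the vectors would come from a sentence-transformer model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def mentions(response_vec: Sequence[float],
             concept_vec: Sequence[float],
             threshold: float = 0.6) -> bool:
    """Hypothetical helper: True when the two embeddings are close enough."""
    return cosine_similarity(response_vec, concept_vec) >= threshold


# Toy "embeddings", hand-picked for illustration only (not model output).
refund_concept = [0.9, 0.1, 0.0]
money_back_reply = [0.8, 0.3, 0.1]   # points in nearly the same direction
weather_reply = [0.0, 0.2, 0.9]      # points somewhere else entirely

print(mentions(money_back_reply, refund_concept))
print(mentions(weather_reply, refund_concept))
```

The threshold is the main tuning knob in a setup like this: a lower value tolerates looser paraphrases, a higher one makes the assertion stricter.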

# Install the library
$ pip install "llm-behave[semantic]"

test_support.py

# Write semantic tests, not string matchers
def test_support_response(llm):
    response = llm.ask(
        "How do I get a refund?"
    )

    # Understands meaning, not just words
    assert response.mentions("refund policy")
    assert response.tone("professional")
    assert response.intent("helpful")

# Detect drift across model updates
def test_no_regression(drift):
    drift.assert_consistent(baseline="v1.json")
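
To make the drift idea concrete, here is a minimal sketch of a baseline comparison. It is not how llm-behave works internally: the baseline layout (a JSON object mapping each prompt to a recorded response), the `find_drift` helper, the 0.7 threshold, and the sample responses are all assumptions, and `difflib` stands in for an embedding-based similarity so the example runs with the standard library alone.

```python
import json
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real check would compare embeddings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_drift(baseline: dict, current: dict,
               threshold: float = 0.7,
               similarity=text_similarity) -> list:
    """Return prompts whose current response is missing or too far from the baseline."""
    drifted = []
    for prompt, old_response in baseline.items():
        new_response = current.get(prompt)
        if new_response is None or similarity(old_response, new_response) < threshold:
            drifted.append(prompt)
    return drifted


# Hypothetical baseline content, shaped as {prompt: recorded response}.
baseline = json.loads(
    '{"How do I get a refund?": "You can request a refund within 30 days."}'
)
stable = {"How do I get a refund?": "You can request a refund within 30 days of purchase."}
changed = {"How do I get a refund?": "Our office is closed on weekends."}

print(find_drift(baseline, stable))
print(find_drift(baseline, changed))
```

Passing the similarity function as a parameter keeps the comparison logic independent of the scoring method, so a semantic scorer could be swapped in without touching the loop.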

In Production

Live Applications

Real tools solving real problems. Try them.

LIVE

30+ active users

DocGrit

SaaS API documentation generator. Transform OpenAPI specs into beautiful, interactive docs in seconds.

Node.js/Express, Supabase, Google Gemini, Railway

LIVE

AI-Powered Data Quality

Data Quality Detective

Upload a CSV, get instant AI-powered data quality scoring and actionable insights.

Python, Streamlit, Pandas
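
For a sense of what a data quality score can be built from, the sketch below computes two simple signals for a CSV: the share of empty cells and the share of duplicate rows. This is an illustration under assumptions, not the app's scoring logic: the `quality_report` function, the scoring formula, and the sample data are invented here, and the standard-library `csv` module is used in place of Pandas to keep the example dependency-free.

```python
import csv
import io


def quality_report(csv_text: str) -> dict:
    """Score a CSV from 0 to 100 using missing cells and duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)
    if not rows or not columns:
        return {"rows": 0, "missing_ratio": 0.0, "duplicate_ratio": 0.0, "score": 0.0}

    # A cell counts as missing when it is absent, empty, or whitespace only.
    missing = sum(
        1 for row in rows for col in columns if not (row.get(col) or "").strip()
    )

    # A row counts as a duplicate when an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    missing_ratio = missing / (len(rows) * len(columns))
    duplicate_ratio = duplicates / len(rows)
    # Assumed formula: each problem type scales the score down multiplicatively.
    score = 100 * (1 - missing_ratio) * (1 - duplicate_ratio)
    return {
        "rows": len(rows),
        "missing_ratio": missing_ratio,
        "duplicate_ratio": duplicate_ratio,
        "score": score,
    }


sample = "id,name,email\n1,Ana,ana@example.com\n2,Ben,\n2,Ben,\n3,,cy@example.com\n"
report = quality_report(sample)
print(report)
```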

Work

Projects & Case Studies

AI implementations, open-source tools, and technical deep-dives.

Live on ClawHub

LLM Regression Monitor

Automated behavioral testing & alerting for LLM apps. Catches prompt drift daily and alerts on Slack, Discord, WhatsApp, or email — before users notice.

Python, llm-behave, OpenClaw, ClawHub

90% Accuracy

AI Bill Shock Detector

ML anomaly detection system for unexpected billing events. Real-time alerts preventing revenue loss.

Python, Scikit-Learn, Streamlit
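
One simple way to flag an unexpected bill is a robust outlier test. The sketch below uses the modified z-score (median and median absolute deviation) as a stand-in; the project itself lists Scikit-Learn, and its actual model is not shown here. The `flag_bill_shocks` function, the 3.5 threshold, and the sample amounts are assumptions for illustration.

```python
import statistics


def flag_bill_shocks(amounts, threshold: float = 3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(x - median) for x in amounts)
    if mad == 0:
        # At least half the values equal the median, so any other value stands out.
        return [i for i, x in enumerate(amounts) if x != median]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - median) / mad > threshold
    ]


# Hypothetical monthly bill amounts with one obvious spike.
monthly_bills = [42.0, 45.5, 41.0, 44.0, 43.5, 310.0, 42.5]
print(flag_bill_shocks(monthly_bills))
```

Using the median rather than the mean keeps a single large spike from inflating the baseline it is being compared against.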

Enterprise RAG

Chat Assistant – RAG

Private RAG implementation with PDF ingestion. 100% data privacy with instant knowledge retrieval.

LangChain, OpenAI, FAISS

10x Faster

InsightIQ

Upload CSV/Excel, get GPT-4 powered business insights and automated reports instantly.

Python, Gradio, GPT-4

Multi-Agent

Accessible AI Agents

Multi-agent AI system for automated digital accessibility checks and content remediation.

Python, Agents, Automation

Prompt Engineering Playground

Comprehensive demos of CoT, few-shot, RAG, and agent prompting techniques with benchmarks.

Python, OpenAI, Jupyter

MCP Server

mcp-llm-behave

MCP server wrapping llm-behave using FastMCP with stdio transport. Run behavioral LLM tests from any MCP-compatible client.

Python, FastMCP, MCP, stdio

ClearFeed

AI-curated news aggregation with smart categorization, real-time updates, and consensus ratings.

Next.js, AI/ML, APIs

Toolkit

Skills & Stack

Tools I use to get things done.

Databricks Generative AI Fundamentals Certified 2024

AI & LLMs

LangChain

OpenAI

Google Gemini

Scikit-Learn

Hugging Face

RAG / FAISS

MCP / FastMCP

sentence-transformers

Languages

Python

SQL

JavaScript

NumPy

Data

Pandas

Matplotlib

Jupyter

Web & Infra

FastAPI

Flask

React

Next.js

Docker

Streamlit

Cloud & Databases

AWS

GCP

Supabase

MySQL

Railway

Available for projects

About

Hi, I'm Swanand.

I'm an AI Engineer based in India. I build production AI systems — from LLM integrations and RAG pipelines to testing infrastructure and deployed APIs. Not notebooks that sit on GitHub, but tools with real users.

Today I maintain llm-behave (an open-source testing library for LLM apps), run production applications with active users, and ship code daily. My background is in data engineering and analytics.

Databricks Generative AI Certified (2024)

2 live applications with active users

Open-source library author (PyPI)

Actively seeking full-time AI engineering roles

Let's Talk

Contact

Let's build something.

Actively looking for full-time AI engineering roles. Also open to freelance projects and interesting collaborations — AI systems, LLM integrations, and production tooling.

India — Remote worldwide

Actively seeking full-time roles

Also available for freelance projects

I build AI tools that work in production.
