AI Engineer building production AI systems, from LLM pipelines to deployed APIs. I build things that actually ship.
pytest plugin for behavioral testing of LLM applications
Stop using LLMs to judge LLMs. llm-behave brings semantic assertions, tone detection, and drift detection to your CI pipeline — all with an offline 80MB model. No API costs. No circular dependencies.
response.mentions("refund") understands meaning, not just strings
response.tone("professional") without an LLM judge
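A check like `response.mentions()` can match meaning rather than substrings. A common way to do this is to embed both the response and the target concept with a sentence-embedding model, then compare the two vectors by cosine similarity against a threshold. The sketch below assumes that approach. It is not llm-behave's actual implementation: the function name `semantically_mentions` and the 0.7 threshold are hypothetical, and the three-dimensional vectors stand in for real model embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantically_mentions(response_vec: list[float], concept_vec: list[float], threshold: float = 0.7) -> bool:
    """Hypothetical check: pass if the response embedding lies close to the concept embedding."""
    return cosine(response_vec, concept_vec) >= threshold


# Toy 3-d vectors standing in for the output of a real sentence-embedding model.
concept = [0.9, 0.1, 0.0]     # "refund policy"
on_topic = [0.8, 0.2, 0.1]    # "You can return items within 30 days for your money back."
off_topic = [0.0, 0.1, 0.95]  # "Our office is closed on Sundays."

print(semantically_mentions(on_topic, concept))   # True
print(semantically_mentions(off_topic, concept))  # False
```

The on-topic sentence never contains the word "refund". With real embeddings, the match comes from semantic closeness rather than a shared substring.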
# Install the library (quoted so shells like zsh don't expand the brackets)
$ pip install "llm-behave[semantic]"

# Write semantic tests, not string matchers
def test_support_response(llm):
    response = llm.ask("How do I get a refund?")
    # Understands meaning, not just words
    assert response.mentions("refund policy")
    assert response.tone("professional")
    assert response.intent("helpful")

# Detect drift across model updates
def test_no_regression(drift):
    drift.assert_consistent(baseline="v1.json")
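One way a drift check like `drift.assert_consistent(baseline="v1.json")` could work is to store an embedding of each prompt's answer from the baseline model, then flag any prompt whose new answer has moved too far away in embedding space. This is a hypothetical sketch, not llm-behave's code. `find_drift`, the JSON layout (prompt mapped to vector), and the 0.85 threshold are all assumptions, and the vectors are toy stand-ins for real embeddings.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_drift(baseline: dict[str, list[float]], current: dict[str, list[float]], threshold: float = 0.85) -> list[str]:
    """Return prompts whose current answer is missing or has drifted from the baseline answer."""
    drifted = []
    for prompt, base_vec in baseline.items():
        if prompt not in current or cosine(base_vec, current[prompt]) < threshold:
            drifted.append(prompt)
    return drifted


# Hypothetical baseline file contents: prompt -> embedding of the v1 model's answer.
baseline = json.loads('{"How do I get a refund?": [0.9, 0.1, 0.0], "What are your hours?": [0.1, 0.9, 0.1]}')
current = {
    "How do I get a refund?": [0.85, 0.15, 0.05],  # answer still on topic
    "What are your hours?": [0.7, 0.1, 0.6],       # answer has wandered
}
print(find_drift(baseline, current))  # ['What are your hours?']
```

A test would then assert that `find_drift(...)` returns an empty list, failing CI when a model update changes answers beyond the threshold.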
Real tools solving real problems. Try them.
SaaS API documentation generator. Transform OpenAPI specs into beautiful, interactive docs in seconds.
Upload a CSV, get instant AI-powered data quality scoring and actionable insights.
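As a rough illustration of what a data quality score can measure, the sketch below computes two rule-based metrics for a CSV: completeness (share of non-empty cells) and uniqueness (share of distinct rows), then averages them. The metric choice, equal weighting, and the `quality_score` name are assumptions for illustration only; the tool above is AI-powered and its actual scoring is not described here.

```python
import csv
import io


def quality_score(csv_text: str) -> dict[str, float]:
    """Rule-based completeness/uniqueness score for a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2 or not rows[0]:
        return {"completeness": 0.0, "uniqueness": 0.0, "score": 0.0}
    header, data = rows[0], rows[1:]
    width = len(header)
    # Cells beyond the header width are ignored; missing trailing cells count as empty.
    filled = sum(1 for row in data for cell in row[:width] if cell.strip())
    completeness = filled / (width * len(data))
    # Share of data rows that are not exact duplicates of an earlier row.
    uniqueness = len({tuple(row) for row in data}) / len(data)
    return {
        "completeness": round(completeness, 3),
        "uniqueness": round(uniqueness, 3),
        "score": round((completeness + uniqueness) / 2, 3),
    }


sample = "id,name,email\n1,Ana,ana@example.com\n2,,bo@example.com\n1,Ana,ana@example.com\n"
print(quality_score(sample))
# {'completeness': 0.889, 'uniqueness': 0.667, 'score': 0.778}
```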
AI implementations, open-source tools, and technical deep-dives.
Live on ClawHub
90% Accuracy
ML anomaly detection system for unexpected billing events. Real-time alerts preventing revenue loss.
View on GitHub ↗
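The billing anomaly detector's model is not described here. As a simple, swapped-in baseline for spotting unexpected charges, the sketch below uses the modified z-score based on the median absolute deviation (MAD). It stays robust when a single large outlier would otherwise inflate a plain mean and standard deviation and mask itself. `flag_anomalies` and the 3.5 cutoff (a commonly used value for this score) are assumptions.

```python
import statistics


def flag_anomalies(amounts: list[float], cutoff: float = 3.5) -> list[int]:
    """Return indices of charges whose modified z-score exceeds the cutoff."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        # MAD is zero when most values are identical; this sketch then reports nothing.
        return []
    # 0.6745 rescales MAD so the score is comparable to a standard z-score for normal data.
    return [i for i, x in enumerate(amounts) if abs(0.6745 * (x - median) / mad) > cutoff]


daily_charges = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
print(flag_anomalies(daily_charges))  # [9]
```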
Enterprise RAG
Private RAG implementation with PDF ingestion. 100% data privacy with instant knowledge retrieval.
View on GitHub ↗
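The retrieval step of a RAG pipeline can be sketched as: score stored document chunks against the query, keep the top k, and place them in the prompt as context. The sketch below swaps in bag-of-words cosine similarity for the dense embeddings a production system would typically use, and assumes the PDF text has already been extracted and split into chunks. `retrieve` and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Ground the question in the retrieved chunks only."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Refunds are issued within 14 days of a return request.",
    "The office is open Monday to Friday.",
    "Return requests must include the original receipt.",
]
print(build_prompt("How long do refunds take after a return request?", chunks, k=1))
```

Because both retrieval and prompt assembly run locally, only the final prompt needs to reach whichever model answers it; keeping that model local too is what makes a fully private setup possible.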
10x Faster
Upload CSV/Excel, get GPT-4-powered business insights and automated reports instantly.
View on GitHub ↗
Multi-agent AI system for automated digital accessibility checks and content remediation.
View on GitHub ↗
Comprehensive demos of CoT, few-shot, RAG, and agent prompting techniques with benchmarks.
View on GitHub ↗
MCP server wrapping llm-behave using FastMCP with stdio transport. Run behavioral LLM tests from any MCP-compatible client.
View on GitHub ↗
AI-curated news aggregation with smart categorization, real-time updates, and consensus ratings.
Tools I use to get things done.
I'm an AI Engineer based in India. I build production AI systems — from LLM integrations and RAG pipelines to testing infrastructure and deployed APIs. Not notebooks that sit on GitHub, but tools with real users.
Today I maintain llm-behave (an open-source testing library for LLM apps), run production applications with active users, and ship code daily. Background in data engineering and analytics.
Actively looking for full-time AI engineering roles. Also open to freelance projects and interesting collaborations — AI systems, LLM integrations, and production tooling.