OpenClaw

llm-evaluator

LLM-as-a-Judge evaluator built on Langfuse. Scores traces on relevance, accuracy, hallucination, and helpfulness, using GPT-5-nano as the judge. Supports single-trace scoring, batch backfill, and a test mode. Integrates with the Langfuse dashboard for observability. Triggers: evaluate trace, score quality, check accuracy, backfill scores, test evaluator, LLM judge.
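To make the four-criterion scoring concrete, here is a minimal sketch of the judge-side plumbing: building a rubric prompt and validating the judge's reply. It is not the skill's actual code. The function names, the prompt wording, the JSON reply format, and the 0.0 to 1.0 scale are all assumptions for illustration; only the four criteria come from the description above. The calls to the judge model and to Langfuse are deliberately left out.

```python
import json

# The four criteria named in the skill description.
CRITERIA = ("relevance", "accuracy", "hallucination", "helpfulness")


def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a rubric prompt (hypothetical wording) for the judge model."""
    rubric = ", ".join(CRITERIA)
    return (
        "Rate the answer on each criterion from 0.0 to 1.0.\n"
        f"Criteria: {rubric}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with a JSON object keyed by criterion."
    )


def parse_judge_reply(reply: str) -> dict[str, float]:
    """Parse the judge's JSON reply and check every criterion is present.

    Assumes a 0.0-1.0 scale; raises KeyError for a missing criterion and
    ValueError for an out-of-range score.
    """
    raw = json.loads(reply)
    scores: dict[str, float] = {}
    for name in CRITERIA:
        value = float(raw[name])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score out of range: {value}")
        scores[name] = value
    return scores
```

Validating the reply before recording anything keeps a malformed or partial judge response from being stored as a score; in batch backfill that matters more, since one bad reply should fail a single trace rather than corrupt the run.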

2.8k stars
openclaw/skills · skills/aiwithabidi/llm-evaluator-pro · March 14, 2026

Install command

python "$CODEX_HOME/skills/.system/skill-installer/scripts/install-skill-from-github.py" --repo openclaw/skills --path skills/aiwithabidi/llm-evaluator-pro