Detroit Sports Chatbot

A conversational AI chatbot that answers questions about the Lions, Tigers, Red Wings, and Pistons using live ESPN data — powered by your choice of Claude or Groq.


Unique Value: Combines real-time ESPN data with AI tool use so the model decides when to fetch live scores, standings, injuries, and more — then streams the answer word by word.

Demo

Detroit Sports Chatbot screenshot showing a conversation about the Detroit Tigers
Live conversation asking about the Detroit Tigers — the AI pulled real ESPN data and streamed the response word by word.

Problem

Built to go beyond static sports data apps. The goal was to learn AI tool use (function calling), prompt engineering with measurable evaluation, and how to deploy a Python web app with secure API key handling. Detroit sports fans deserve a chatbot that actually knows what happened last night — not just a generic wrapper.

Challenge-Based Learning

Challenge: Get an AI model to reliably call the right ESPN API tool based on a user's natural language question, then stream a useful, accurate answer.
Approach: Defined 15 ESPN tool schemas for both Anthropic and Groq, built a tool use loop that handles multi-step responses, and used an automated eval pipeline to score and improve the system prompt from v1 to v4.
Outcome: A 28% improvement in response quality (3.2 → 4.1 out of 5), live on Render, with real Detroit sports data available on demand.

Project Snapshot

  • Platform: Web app (Streamlit)
  • Stack: Python · Streamlit · Anthropic Claude API · Groq API · ESPN unofficial API
  • Focus: AI tool use, prompt engineering, streaming, secure deployment
  • Team: Solo
  • Role: Full-stack developer, AI engineer, prompt engineer
  • Timeline: April 2026

Role

Solo developer and AI engineer responsible for the full stack: Streamlit UI, Anthropic and Groq API integration, ESPN tool definitions, prompt engineering, eval pipeline, and Render deployment.

My Contributions

  • Defined 15 ESPN API tool schemas compatible with both Anthropic and Groq (OpenAI-compatible format)
  • Built a tool use loop so the model can call multiple ESPN tools in a single response
  • Implemented streaming responses in Streamlit for word-by-word output
  • Added a provider switcher in the sidebar (Claude Sonnet ↔ Groq Llama 3.3 70b)
  • Engineered and iterated the system prompt using an automated eval pipeline (eval.py)
  • Added 30-second ESPN response caching, rate limiting (10 req/min), and secure server-side API keys
  • Deployed on Render free tier with environment variable configuration
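The word-by-word streaming above comes down to feeding a generator into Streamlit's `st.write_stream`. A minimal sketch of the idea, with the Streamlit call shown as a comment (in the real app the words come incrementally from the provider's streaming API rather than from a finished string, so this is a simplified illustration):

```python
import time
from typing import Iterator

def stream_words(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield a response one word at a time, the shape st.write_stream expects."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)  # a small pause makes the streaming effect visible

# In the Streamlit app this generator would be consumed with:
#   st.write_stream(stream_words(full_response, delay=0.02))
chunks = list(stream_words("The Tigers won 5-3 last night."))
print("".join(chunks).strip())
```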

Key Features

  • 15 ESPN API tools, including live scores, standings, schedule, injuries, roster, news, team stats, transactions, depth chart, leaders, play-by-play, and box score
  • Dual AI provider support — switch between Claude Sonnet and Groq Llama 3.3 70b in the sidebar
  • Tool use — the AI decides which ESPN tool(s) to call based on the question, and shows which tool is running in real time
  • Word-by-word streaming responses
  • Suggested starter questions shown on first load
  • 30-second ESPN response cache to reduce redundant API calls
  • Rate limiting (10 requests/minute) with user-friendly error messages
  • API keys stored server-side only — never exposed to the browser

Architecture

The project is split into four focused files:

  • app.py — Streamlit UI, sidebar provider selector, rate limiting, friendly error handling
  • chatbot.py — Anthropic and Groq API logic, tool use loop, streaming
  • sports_tools.py — All 15 ESPN API functions + tool schemas for both providers. Includes a _fetch_espn() helper with 30-second caching and error handling
  • eval.py — Automated prompt evaluation pipeline using Groq to score responses 1–5

How Tool Use Works

When a user asks a question, the AI model decides whether to call an ESPN tool, which tool, and with what arguments — all automatically. The tool use loop handles cases where the model wants to call multiple tools in a single response.

TOOL USE LOOP — chatbot.py Model decides which ESPN tool to call; loop handles multi-step responses.
while response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        # Call the matching ESPN function
        result = sports_tools.call_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        })

    # Feed every tool result back to the model in a single user turn
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(model=model, tools=tools, messages=messages, ...)

ESPN Tool Schema

Each of the 15 tools is defined with a schema compatible with both Anthropic's and Groq's API formats. The model reads these schemas to decide which tool fits the user's question.

EXAMPLE TOOL SCHEMA — sports_tools.py Schema for the live scores tool — one of 15 defined.
{
    "name": "get_scoreboard",
    "description": "Get live or recent scores for a Detroit sports team.",
    "input_schema": {
        "type": "object",
        "properties": {
            "sport": {
                "type": "string",
                "enum": ["football", "baseball", "basketball", "hockey"],
                "description": "The sport to fetch scores for."
            }
        },
        "required": ["sport"]
    }
}
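Groq uses the OpenAI-compatible tool format, so each Anthropic-style schema can be converted mechanically: the `input_schema` becomes `parameters` inside a `function` wrapper. A sketch of that conversion (the helper name is illustrative):

```python
def to_openai_tool(anthropic_tool: dict) -> dict:
    """Wrap an Anthropic-style tool schema in the OpenAI/Groq tool format."""
    return {
        "type": "function",
        "function": {
            "name": anthropic_tool["name"],
            "description": anthropic_tool["description"],
            "parameters": anthropic_tool["input_schema"],
        },
    }
```

Defining each tool once and converting on the fly avoids maintaining 15 schemas twice.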

Prompt Engineering — Automated Eval Pipeline

Instead of guessing whether a system prompt was good, I built eval.py — an automated pipeline that uses an LLM to both run the chatbot and grade its responses on a 1–5 scale. This gave a measurable signal to guide each iteration. The pipeline originally used Claude, then was updated to support Groq for free local testing.

v1 — Basic prompt  3.2 / 5
v2 — Added XML examples & bad examples  3.6 / 5
v3 — Fixed grader context, edge case handling  3.9 / 5
v4 — Added output format rules  4.1 / 5  ✓

28% improvement from v1 to v4 — measured, not estimated.

EVAL PIPELINE — eval.py Runs the chatbot on test questions and auto-grades each response 1–5.
def grade_response(question, response):
    grading_prompt = f"""
    Grade this chatbot response 1-5 for accuracy, helpfulness, and tone.
    Question: {question}
    Response: {response}
    Return only a number 1-5.
    """
    result = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": grading_prompt}]
    )
    return int(result.choices[0].message.content.strip())

scores = [grade_response(q, run_chatbot(q)) for q in test_questions]
print(f"Average score: {sum(scores) / len(scores):.1f} / 5")

Key Decisions

  • Chose Streamlit for fast iteration on the UI without needing a separate frontend framework
  • Supported both Anthropic and Groq to compare providers and avoid vendor lock-in
  • Used ESPN's unofficial API rather than a paid sports data service to keep it free and open
  • Cached ESPN responses for 30 seconds to balance freshness with API rate limits
  • Stored all API keys server-side — never passed to the browser — to keep credentials safe
  • Built the eval pipeline with the same Groq model used in the chatbot to keep the feedback loop cheap and fast

Outcome

A fully deployed, production-ready sports chatbot that handles live Detroit sports questions with AI tool use and streaming. The prompt engineering work produced a measurable 28% quality improvement — a skill directly applicable to any AI product. The dual-provider setup demonstrates working knowledge of both the Anthropic and OpenAI-compatible API formats.

What I Learned

  • AI tool use / function calling with Anthropic and Groq APIs
  • Streaming responses in a Streamlit web app
  • Prompt engineering with measurable, automated evaluation
  • Deploying a Python web app to Render with secure environment variables
  • ESPN unofficial API integration and response caching
  • Rate limiting and production error handling

Next Iteration

  • Add support for all NFL/NBA/MLB/NHL teams, not just Detroit
  • Persist conversation history across sessions
  • Add a voice input option
  • Upgrade to a paid sports data API for more reliable coverage
  • Build a custom UI to move beyond Streamlit's default look