Detroit Sports Chatbot

A conversational AI chatbot that answers questions about the Lions, Tigers, Red Wings, and Pistons using live ESPN data — powered by your choice of Claude or Groq.


Unique Value: Combines real-time ESPN data with AI tool use so the model decides when to fetch live scores, standings, injuries, and more — then streams the answer word by word.

Demo

Detroit Sports Chatbot screenshot showing a conversation about the Detroit Tigers
Live conversation asking about the Detroit Tigers — the AI pulled real ESPN data and streamed the response word by word.

Problem

Built to go beyond static sports data apps. The goal was to learn AI tool use (function calling), prompt engineering with measurable evaluation, and how to deploy a Python web app with secure API key handling. Detroit sports fans deserve a chatbot that actually knows what happened last night — not just a generic wrapper.

Challenge-Based Learning

Challenge: Get an AI model to reliably call the right ESPN API tool based on a user's natural language question, then stream a useful, accurate answer.
Approach: Defined 15 ESPN tool schemas for both Anthropic and Groq, built a tool use loop that handles multi-step responses, and used an automated eval pipeline to score and improve the system prompt from v1 to v4.
Outcome: A 28% improvement in response quality (3.2 → 4.1 out of 5), live on Render, with real Detroit sports data available on demand.

Project Snapshot

  • Platform: Web app (Streamlit)
  • Stack: Python · Streamlit · Anthropic Claude API · Groq API · ESPN unofficial API
  • Focus: AI tool use, prompt engineering, streaming, secure deployment
  • Team: Solo
  • Role: Full-stack developer, AI engineer, prompt engineer
  • Timeline: April 2026

Role

Solo developer and AI engineer responsible for the full stack: Streamlit UI, Anthropic and Groq API integration, ESPN tool definitions, prompt engineering, eval pipeline, and Render deployment.

My Contributions

  • Defined 15 ESPN API tool schemas compatible with both Anthropic and Groq (OpenAI-compatible format)
  • Built a tool use loop so the model can call multiple ESPN tools in a single response
  • Implemented streaming responses in Streamlit for word-by-word output
  • Added a provider switcher in the sidebar (Claude Sonnet ↔ Groq Llama 3.3 70b)
  • Engineered and iterated the system prompt using an automated eval pipeline (eval.py)
  • Added 30-second ESPN response caching, rate limiting (10 req/min), and secure server-side API keys
  • Deployed on Render free tier with environment variable configuration
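The word-by-word streaming above comes down to feeding a generator into Streamlit's `st.write_stream`. A minimal sketch of the idea, with the Streamlit call shown as a comment (in the real app the words come incrementally from the provider's streaming API rather than from a finished string, so this is a simplified illustration):

```python
import time
from typing import Iterator

def stream_words(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield a response one word at a time, the shape st.write_stream expects."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)  # a small pause makes the streaming effect visible

# In the Streamlit app this generator would be consumed with:
#   st.write_stream(stream_words(full_response, delay=0.02))
chunks = list(stream_words("The Tigers won 5-3 last night."))
print("".join(chunks).strip())
```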

Key Features

  • 15 ESPN API tools, including live scores, standings, schedule, injuries, roster, news, team stats, transactions, depth chart, leaders, play-by-play, and box score
  • Dual AI provider support — switch between Claude Sonnet and Groq Llama 3.3 70b in the sidebar
  • Tool use — the AI decides which ESPN tool(s) to call based on the question, and shows which tool is running in real time
  • Word-by-word streaming responses
  • Suggested starter questions shown on first load
  • 30-second ESPN response cache to reduce redundant API calls
  • Rate limiting (10 requests/minute) with user-friendly error messages
  • API keys stored server-side only — never exposed to the browser

Architecture

The project is split into four focused files:

  • app.py — Streamlit UI, sidebar provider selector, rate limiting, friendly error handling
  • chatbot.py — Anthropic and Groq API logic, tool use loop, streaming
  • sports_tools.py — All 15 ESPN API functions + tool schemas for both providers. Includes a _fetch_espn() helper with 30-second caching and error handling
  • eval.py — Automated prompt evaluation pipeline using Groq to score responses 1–5

How Tool Use Works

When a user asks a question, the AI model decides whether to call an ESPN tool, which tool, and with what arguments — all automatically. The tool use loop handles cases where the model wants to call multiple tools in a single response.

TOOL USE LOOP — chatbot.py Model decides which ESPN tool to call; loop handles multi-step responses.
while response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        # Call the matching ESPN function
        result = sports_tools.call_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        })

    # Feed every tool result back to the model in a single user turn
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(model=model, tools=tools, messages=messages, ...)

ESPN Tool Schema

Each of the 15 tools is defined with a schema compatible with both Anthropic's and Groq's API formats. The model reads these schemas to decide which tool fits the user's question.

EXAMPLE TOOL SCHEMA — sports_tools.py Schema for the live scores tool — one of 15 defined.
{
    "name": "get_scoreboard",
    "description": "Get live or recent scores for a Detroit sports team.",
    "input_schema": {
        "type": "object",
        "properties": {
            "sport": {
                "type": "string",
                "enum": ["football", "baseball", "basketball", "hockey"],
                "description": "The sport to fetch scores for."
            }
        },
        "required": ["sport"]
    }
}
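Groq uses the OpenAI-compatible tool format, so each Anthropic-style schema can be converted mechanically: the `input_schema` becomes `parameters` inside a `function` wrapper. A sketch of that conversion (the helper name is illustrative):

```python
def to_openai_tool(anthropic_tool: dict) -> dict:
    """Wrap an Anthropic-style tool schema in the OpenAI/Groq tool format."""
    return {
        "type": "function",
        "function": {
            "name": anthropic_tool["name"],
            "description": anthropic_tool["description"],
            "parameters": anthropic_tool["input_schema"],
        },
    }
```

Defining each tool once and converting on the fly avoids maintaining 15 schemas twice.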

Prompt Engineering — Automated Eval Pipeline

Instead of guessing whether a system prompt was good, I built eval.py — an automated pipeline that uses an LLM to both run the chatbot and grade its responses on a 1–5 scale. This gave a measurable signal to guide each iteration. The pipeline originally used Claude, then was updated to support Groq for free local testing.

v1 — Basic prompt  3.2 / 5
v2 — Added XML examples & bad examples  3.6 / 5
v3 — Fixed grader context, edge case handling  3.9 / 5
v4 — Added output format rules  4.1 / 5  ✓

28% improvement from v1 to v4 — measured, not estimated.

EVAL PIPELINE — eval.py Runs the chatbot on test questions and auto-grades each response 1–5.
def grade_response(question, response):
    grading_prompt = f"""
    Grade this chatbot response 1-5 for accuracy, helpfulness, and tone.
    Question: {question}
    Response: {response}
    Return only a number 1-5.
    """
    result = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": grading_prompt}]
    )
    return int(result.choices[0].message.content.strip())

scores = [grade_response(q, run_chatbot(q)) for q in test_questions]
print(f"Average score: {sum(scores) / len(scores):.1f} / 5")

Key Decisions

  • Chose Streamlit for fast iteration on the UI without needing a separate frontend framework
  • Supported both Anthropic and Groq to compare providers and avoid vendor lock-in
  • Used ESPN's unofficial API rather than a paid sports data service to keep it free and open
  • Cached ESPN responses for 30 seconds to balance freshness with API rate limits
  • Stored all API keys server-side — never passed to the browser — to keep credentials safe
  • Built the eval pipeline with the same Groq model used in the chatbot to keep the feedback loop cheap and fast

Outcome

A fully deployed, production-ready sports chatbot that handles live Detroit sports questions with AI tool use and streaming. The prompt engineering work produced a measurable 28% quality improvement — a skill directly applicable to any AI product. The dual-provider setup demonstrates working knowledge of both the Anthropic and OpenAI-compatible API formats.

What I Learned

  • AI tool use / function calling with Anthropic and Groq APIs
  • Streaming responses in a Streamlit web app
  • Prompt engineering with measurable, automated evaluation
  • Deploying a Python web app to Render with secure environment variables
  • ESPN unofficial API integration and response caching
  • Rate limiting and production error handling

Next Iteration

  • Add support for all NFL/NBA/MLB/NHL teams, not just Detroit
  • Persist conversation history across sessions
  • Add a voice input option
  • Upgrade to a paid sports data API for more reliable coverage
  • Build a custom UI to move beyond Streamlit's default look