Build an AI Image-Caption Generator on Azure App Service with Streamlit and GPT-4o-mini
TulikaC walks through building a Python app with Azure AI Vision and GPT-4o-mini to generate image captions, deployed securely to Azure App Service using Streamlit for the UI.
This guide demonstrates how to build a cloud-native application that takes an uploaded image and instantly produces a natural one-line caption, combining Azure AI services with a Python-based stack.
Overview
- Image upload: User submits an image via Streamlit UI.
- Azure AI Vision: Extracts descriptive tags with confidence scores from the image.
- Azure OpenAI (GPT-4o-mini): Receives tags and generates a fluent image caption.
- Streamlit: Provides a simple, Python-based frontend perfect for fast iteration and sharing.
Sample Code and Infra Templates
What are these components?
- Streamlit: Open-source Python framework for rapid development of data/AI apps with an intuitive interface.
- Azure AI Vision (Vision API): Analyzes the uploaded image and returns descriptive tags with confidence scores for downstream processing.
- Azure OpenAI (GPT-4o-mini): Turns the extracted tags into a fluent, natural-sounding one-line caption.
How the App Works
- Image upload: Streamlit UI lets users upload any photo.
- Tag extraction: App sends the image to Azure AI Vision and receives a high-confidence tag list.
- Caption generation: Tags are sent to Azure OpenAI’s GPT-4o-mini, which creates a natural-sounding one-line caption.
- Results: The caption is displayed in the Streamlit browser app (see the sketch after this list).
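This sketch reuses the extract_tags and generate_caption helper names from the code walkthrough further down; the Streamlit widgets and labels are illustrative rather than copied from the sample, so treat it as a minimal approximation of the flow.
# Minimal sketch of the end-to-end flow (widgets and labels are illustrative)
import streamlit as st
from utils.vision import extract_tags
from utils.openai_caption import generate_caption

st.title("AI Image Caption Generator")
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
    image_bytes = uploaded_file.read()
    st.image(image_bytes)
    with st.spinner("Analyzing image..."):
        tags = extract_tags(image_bytes)      # Azure AI Vision tag extraction
        caption = generate_caption(tags)      # GPT-4o-mini caption from tags
    st.success(caption)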
Prerequisites
- Azure subscription (Sign Up)
- Azure CLI (Install guide)
- Azure Developer CLI (azd) (Install guide)
- Python 3.10+
- Visual Studio Code (optional)
- Streamlit (for local runs)
- Managed Identity enabled on App Service (Overview)
Resources to Deploy
You can provision resources either manually or using the provided azd template.
Deployed Components
- Azure App Service (Linux, Python)
- Azure AI Foundry/OpenAI with a GPT-4o-mini deployment
- Azure AI Vision (Computer Vision API)
- Managed Identity for secure service authentication
Quick Deploy with Azure Developer CLI
git clone https://github.com/Azure-Samples/appservice-ai-samples
cd appservice-ai-samples/image_caption_app
azd auth login
azd up
These commands provision all required resources and deploy the app automatically.
Manual Setup Steps
- Create Azure AI Vision resource (note the endpoint).
- Deploy OpenAI resource and set up GPT-4o-mini deployment.
- Deploy App Service and enable system-assigned Managed Identity.
- Assign correct RBAC roles:
- Cognitive Services OpenAI User (OpenAI)
- Cognitive Services User (Vision)
- Add application settings (endpoints/deployment names) and deploy the code; see the CLI sketch after this list.
- Configure the startup command (manual path):
streamlit run app.py --server.port 8000 --server.address 0.0.0.0
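Roughly, the manual path maps to Azure CLI commands like the following; resource names, IDs, and application setting names below are placeholders and assumptions, so check them against your own resources and the sample's configuration.
# Grant the App Service's system-assigned identity access to the AI resources
PRINCIPAL_ID=$(az webapp identity show --name <app-name> --resource-group <rg> --query principalId -o tsv)
az role assignment create --assignee $PRINCIPAL_ID --role "Cognitive Services OpenAI User" --scope <openai-resource-id>
az role assignment create --assignee $PRINCIPAL_ID --role "Cognitive Services User" --scope <vision-resource-id>

# Application settings (names are assumptions) and the Streamlit startup command
az webapp config appsettings set --name <app-name> --resource-group <rg> --settings VISION_ENDPOINT=<vision-endpoint> AZURE_OPENAI_ENDPOINT=<openai-endpoint> AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
az webapp config set --name <app-name> --resource-group <rg> --startup-file "streamlit run app.py --server.port 8000 --server.address 0.0.0.0"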
Core Code Flow Walkthrough
Top-level (app.py)
tags = extract_tags(image_bytes)
caption = generate_caption(tags)
Vision API Call (utils/vision.py)
response = requests.post(VISION_API_URL, headers=headers, params=PARAMS, data=image_bytes, timeout=30)
response.raise_for_status()
analysis = response.json()
tags = [t.get('name') for t in analysis.get('tags', []) if t.get('name') and t.get('confidence', 0) > 0.6]
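For context, here is one way the constants used above (VISION_API_URL, PARAMS, headers) could be defined. The API version, setting names, and the key-versus-token branch are assumptions for illustration, not the sample's exact code.
# Illustrative setup for the Vision call above (names and API version are assumptions)
import os
import requests  # used by the requests.post call shown above
from azure.identity import DefaultAzureCredential

VISION_ENDPOINT = os.environ["VISION_ENDPOINT"]            # e.g. https://<resource>.cognitiveservices.azure.com
VISION_API_URL = f"{VISION_ENDPOINT}/vision/v3.2/analyze"  # Analyze endpoint; version may differ
PARAMS = {"visualFeatures": "Tags"}                        # request only tags

VISION_KEY = os.environ.get("VISION_KEY")                  # set only for local key-based runs
if VISION_KEY:
    headers = {"Ocp-Apim-Subscription-Key": VISION_KEY,
               "Content-Type": "application/octet-stream"}
else:
    # Managed Identity / Entra ID: request a Cognitive Services access token
    token = DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default")
    headers = {"Authorization": f"Bearer {token.token}",
               "Content-Type": "application/octet-stream"}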
Caption Generation (utils/openai_caption.py)
tag_text = ", ".join(tags)
prompt = f"""
You are an assistant that generates vivid, natural-sounding captions for images. Create a one-line caption for an image that contains the following: {tag_text}.
"""
response = client.chat.completions.create(
model=DEPLOYMENT_NAME,
messages=[{"role": "system", "content": "You are a helpful AI assistant."}, {"role": "user", "content": prompt.strip()}],
max_tokens=60,
temperature=0.7
)
return response.choices[0].message.content.strip()
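The client used above can be built with Managed Identity through a bearer token provider, roughly as follows. The environment variable names and API version are assumptions; for local key-based runs, api_key could be passed instead of the token provider.
# Illustrative client setup (variable names and api_version are assumptions)
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

DEPLOYMENT_NAME = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,  # Managed Identity / Entra ID, no key in config
    api_version="2024-06-01",
)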
Security and Authentication
- By default, Managed Identity is enabled, so the app can securely authenticate to Azure resources via Microsoft Entra ID, with no secrets in config.
- For local tests without Managed Identity, key-based authentication is possible by supplying credentials as environment variables.
Running Locally (optional)
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Set environment variables for endpoints and deployments (plus keys if not using MI)
streamlit run app.py
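As a rough example, the environment variables for a local run might look like this (bash); the names follow the sketches above and may differ from the sample. Keys are only needed if you are not signed in with an identity that holds the RBAC roles.
export VISION_ENDPOINT="https://<your-vision-resource>.cognitiveservices.azure.com"
export AZURE_OPENAI_ENDPOINT="https://<your-openai-resource>.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
# Optional: key-based auth when Managed Identity / az login is not available
export VISION_KEY="<vision-key>"
export AZURE_OPENAI_API_KEY="<openai-key>"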
Repository Structure
- App code & Streamlit UI: /image_caption_app/
- Infrastructure as code (Bicep): /image_caption_app/infra/
Extension Ideas
- Add object detection, OCR, or brand detection to enhance prompts for captioning.
- Save images and metadata to Blob Storage and Cosmos DB; build a gallery feature.
- Implement performance optimizations (caching, token usage tracking).
- Hook up Application Insights for observability and monitoring.
Further Learning
Author: TulikaC
This post appeared first on “Microsoft Tech Community”. Read the entire article here