TulikaC walks through building a Python app with Azure AI Vision and GPT-4o-mini to generate image captions, deployed securely to Azure App Service using Streamlit for the UI.

Build an AI Image-Caption Generator on Azure App Service with Streamlit and GPT-4o-mini

This guide shows how to build a cloud-native app that takes an uploaded image and returns a natural, one-line caption. It combines Azure AI Vision and Azure OpenAI (GPT-4o-mini) with a Python/Streamlit front end hosted on Azure App Service.

Overview

Image upload: The user submits an image through the Streamlit UI.
Azure AI Vision: Extracts descriptive tags with confidence scores from the image.
Azure OpenAI (GPT-4o-mini): Receives tags and generates a fluent image caption.
Streamlit: Provides a simple, Python-based frontend perfect for fast iteration and sharing.

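Sample Code and Infra Templates

The sample code and Bicep infrastructure templates are in the Azure-Samples/appservice-ai-samples repository on GitHub, in the image_caption_app folder.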

What are these components?

Streamlit: Open-source Python framework for rapid development of data/AI apps with an intuitive interface.
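Azure AI Vision (Image Analysis API): Analyzes images and returns visual features, including descriptive tags with confidence scores that this app passes on for captioning.
Azure OpenAI (GPT-4o-mini): Hosted language model that turns the tag list into a fluent one-line caption.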

How the App Works

Image upload: Streamlit UI lets users upload any photo.
Tag extraction: The app sends the image to Azure AI Vision, receives descriptive tags with confidence scores, and keeps only the tags with confidence above 0.6.
Caption generation: Tags are sent to Azure OpenAI’s GPT-4o-mini, which creates a natural-sounding one-line caption.
Results: The caption is displayed in the Streamlit app in the browser.

Prerequisites

Azure subscription
Azure CLI
Azure Developer CLI (azd)
Python 3.10+
Visual Studio Code (optional)
Streamlit (for local runs)
Managed Identity enabled on App Service

Resources to Deploy

You can provision resources either manually or using the provided azd template.

Deployed Components

Azure App Service (Linux, Python)
Azure AI Foundry/OpenAI with a GPT-4o-mini deployment
Azure AI Vision (Computer Vision API)
Managed Identity for secure service authentication

Quick Deploy with Azure Developer CLI

```bash
git clone https://github.com/Azure-Samples/appservice-ai-samples
cd appservice-ai-samples/image_caption_app
azd auth login
azd up
```

azd up provisions all required Azure resources from the Bicep templates and deploys the app, so this path needs no manual resource setup.

Manual Setup Steps

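Create an Azure AI Vision resource and note its endpoint.
Create an Azure OpenAI (or Azure AI Foundry) resource and deploy the GPT-4o-mini model; note the endpoint and deployment name.
Create an App Service (Linux, Python) and enable its system-assigned Managed Identity.
Assign these RBAC roles to the App Service's managed identity:
- Cognitive Services OpenAI User (on the OpenAI resource)
- Cognitive Services User (on the Vision resource)
Add application settings for the endpoints and deployment name, then deploy the code.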
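App Service exposes application settings to the app as environment variables, so one way to read the endpoints and deployment name is a small loader that fails fast when a setting is missing. This is a minimal sketch; the setting names (VISION_ENDPOINT, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, VISION_KEY, AZURE_OPENAI_KEY) are hypothetical and may not match the names the sample's templates define.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Configuration read from App Service application settings (exposed as env vars)."""
    vision_endpoint: str
    openai_endpoint: str
    openai_deployment: str
    # Keys are optional: leave them unset to authenticate with Managed Identity.
    vision_key: Optional[str] = None
    openai_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings, raising a clear error if a required one is missing."""
    # Hypothetical setting names; align them with your App Service configuration.
    required = ("VISION_ENDPOINT", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing application settings: {', '.join(missing)}")
    return Settings(
        vision_endpoint=env["VISION_ENDPOINT"].rstrip("/"),
        openai_endpoint=env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        openai_deployment=env["AZURE_OPENAI_DEPLOYMENT"],
        vision_key=env.get("VISION_KEY") or None,
        openai_key=env.get("AZURE_OPENAI_KEY") or None,
    )
```

Failing at startup with the list of missing names is usually easier to diagnose than an HTTP error on the first upload.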

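For the manual path, set the App Service startup command. Streamlit listens on port 8501 by default, so the command binds it to port 8000, which the App Service Python (Linux) container expects, and to 0.0.0.0 so it accepts traffic from outside the container:

```bash
streamlit run app.py --server.port 8000 --server.address 0.0.0.0
```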

Core Code Flow Walkthrough

Top-level (app.py)

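```python
# image_bytes holds the contents of the image uploaded through the Streamlit UI
tags = extract_tags(image_bytes)      # Azure AI Vision: image -> list of tags
caption = generate_caption(tags)      # Azure OpenAI: tags -> one-line caption
```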
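The two calls in app.py form a two-stage pipeline. A minimal sketch of that flow as a testable function, assuming extract_tags and generate_caption are passed in; caption_image is a hypothetical name, and the empty-tag fallback is an added assumption, not something the sample is confirmed to do:

```python
from typing import Callable, List


def caption_image(
    image_bytes: bytes,
    extract_tags: Callable[[bytes], List[str]],
    generate_caption: Callable[[List[str]], str],
    fallback: str = "No confident tags were found for this image.",
) -> str:
    """Run the two-stage flow: Vision tags first, then a GPT-4o-mini caption."""
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    tags = extract_tags(image_bytes)
    # Skip the OpenAI call when no tag passed the confidence filter; an empty
    # list would otherwise produce a prompt with nothing to describe.
    if not tags:
        return fallback
    return generate_caption(tags)
```

In the Streamlit page, the bytes could come from st.file_uploader, for example uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"]), then caption_image(uploaded.getvalue(), extract_tags, generate_caption) once uploaded is not None. Passing the two functions in keeps the pipeline testable without calling Azure.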

Vision API Call (utils/vision.py)

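```python
import requests

def extract_tags(image_bytes):
    # VISION_API_URL, headers, and PARAMS are defined elsewhere in utils/vision.py
    response = requests.post(VISION_API_URL, headers=headers, params=PARAMS, data=image_bytes, timeout=30)
    response.raise_for_status()  # raise on HTTP errors such as 401/403
    analysis = response.json()
    # Keep only named tags with confidence above 0.6
    return [t["name"] for t in analysis.get("tags", []) if t.get("name") and t.get("confidence", 0) > 0.6]
```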
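The snippet relies on VISION_API_URL, headers, and PARAMS defined elsewhere. Because it reads analysis.get('tags', []), the response shape matches the Image Analysis v3.2 analyze operation with visualFeatures=Tags. The sketch below assumes that API version; build_vision_request and filter_tags are hypothetical helper names, not the sample's code.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: Image Analysis v3.2, whose JSON response has a top-level "tags" list.
ANALYZE_PATH = "/vision/v3.2/analyze"


def build_vision_request(
    endpoint: str, token: Optional[str] = None, key: Optional[str] = None
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, params) for posting raw image bytes for tag analysis."""
    headers = {"Content-Type": "application/octet-stream"}
    if token:
        # Microsoft Entra ID access token, e.g. obtained via the managed identity.
        headers["Authorization"] = f"Bearer {token}"
    elif key:
        # Key-based authentication for local runs without a managed identity.
        headers["Ocp-Apim-Subscription-Key"] = key
    else:
        raise ValueError("Provide either an Entra ID token or a resource key")
    return endpoint.rstrip("/") + ANALYZE_PATH, headers, {"visualFeatures": "Tags"}


def filter_tags(analysis: dict, min_confidence: float = 0.6) -> List[str]:
    """Keep the names of tags whose confidence is above the threshold."""
    return [
        t["name"]
        for t in analysis.get("tags", [])
        if t.get("name") and t.get("confidence", 0) > min_confidence
    ]
```

With Managed Identity, the token could come from azure-identity: DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default").token. Entra ID tokens are accepted only on the resource's custom-subdomain endpoint (https://<name>.cognitiveservices.azure.com), not on a shared regional endpoint.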

Caption Generation (utils/openai_caption.py)

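```python
def generate_caption(tags):
    # client (an AzureOpenAI instance) and DEPLOYMENT_NAME are defined elsewhere in utils/openai_caption.py
    tag_text = ", ".join(tags)
    prompt = f"""
You are an assistant that generates vivid, natural-sounding captions for images. Create a one-line caption for an image that contains the following: {tag_text}.
"""
    response = client.chat.completions.create(
        model=DEPLOYMENT_NAME,  # on Azure OpenAI this is the deployment name, not the model name
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": prompt.strip()},
        ],
        max_tokens=60,    # a one-line caption needs only a few dozen tokens
        temperature=0.7,  # some variety in wording
    )
    return response.choices[0].message.content.strip()
```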
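Because max_tokens=60 caps the reply, it can help to notice when a caption was cut off and to keep a running token count, which the extension ideas later in this post also suggest. A minimal sketch, assuming the openai Python SDK's response objects, where usage carries prompt_tokens and completion_tokens and each choice has a finish_reason; UsageTracker is a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTracker:
    """Accumulates token counts from chat completion responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    truncated: int = 0

    def record(self, response: Any) -> None:
        usage = getattr(response, "usage", None)
        if usage is not None:
            self.prompt_tokens += usage.prompt_tokens
            self.completion_tokens += usage.completion_tokens
        # finish_reason "length" means the reply stopped at the max_tokens limit.
        if response.choices and response.choices[0].finish_reason == "length":
            self.truncated += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Calling tracker.record(response) just before the return statement is enough; a rising truncated count suggests raising max_tokens or tightening the prompt.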

Security and Authentication

By default, the app authenticates to Azure AI Vision and Azure OpenAI with its Managed Identity through Microsoft Entra ID, so no keys or other secrets are stored in configuration.
For local testing without a managed identity, you can use key-based authentication by supplying the keys as environment variables.
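One way to switch between the two modes is to choose the authentication arguments for the openai SDK's AzureOpenAI client based on whether a key is present. A minimal sketch; openai_auth_kwargs and the AZURE_OPENAI_KEY setting name are hypothetical, while DefaultAzureCredential and get_bearer_token_provider come from the azure-identity package:

```python
import os
from typing import Any, Dict, Mapping


def openai_auth_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Use a key when one is set locally; otherwise use Microsoft Entra ID."""
    key = env.get("AZURE_OPENAI_KEY")  # hypothetical setting name
    if key:
        return {"api_key": key}
    # Imported lazily so key-based local runs do not need azure-identity installed.
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    # In App Service, DefaultAzureCredential resolves to the managed identity.
    provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    return {"azure_ad_token_provider": provider}
```

The client could then be created with AzureOpenAI(azure_endpoint=..., api_version=..., **openai_auth_kwargs()), filling in your endpoint and an API version your resource supports. DefaultAzureCredential also works locally after az login, provided your account holds the Cognitive Services OpenAI User role.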

Running Locally (optional)

```bash
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Set environment variables for endpoints and deployments (plus keys if not using MI)

streamlit run app.py
```

Repository Structure

App code & Streamlit UI: /image_caption_app/
Infrastructure as code (Bicep): /image_caption_app/infra/

Extension Ideas

Add object detection, OCR, or brand detection to enhance prompts for captioning.
Save images and metadata to Blob Storage and Cosmos DB; build a gallery feature.
Implement performance optimizations (caching, token usage tracking).
Hook up Application Insights for observability and monitoring.
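For the caching idea, identical uploads can reuse an earlier caption instead of calling Vision and GPT-4o-mini again. A minimal sketch, assuming an in-process LRU cache keyed by the SHA-256 of the image bytes; CaptionCache is a hypothetical helper, and each App Service instance would keep its own copy:

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class CaptionCache:
    """In-memory LRU cache of captions keyed by the SHA-256 of the image bytes."""

    def __init__(self, compute: Callable[[bytes], str], max_entries: int = 128):
        self._compute = compute
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes) -> str:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        caption = self._compute(image_bytes)
        self._entries[key] = caption
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return caption
```

Streamlit reruns app.py on every interaction, so the cache object has to outlive a rerun, for example by creating it inside a function decorated with st.cache_resource. Alternatively, decorating the captioning function with st.cache_data lets Streamlit hash the bytes argument itself.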


Author: TulikaC

This post appeared first on “Microsoft Tech Community”.