Browse Azure Community (393)
malaikanazim announces that Azure StandardV2 NAT Gateway now supports outbound ping using ICMP Echo Request/Reply, enabling basic reachability checks and faster troubleshooting for workloads that egress through NAT Gateway without needing per-VM public IPs or extra configuration.
yairgil explains how the Azure Copilot Observability Agent in Azure Monitor helps teams investigate AKS incidents by correlating metrics, logs, traces, Kubernetes events, and change history into an evidence-backed root-cause narrative with recommended next steps.
Karl-WE breaks down June 2026 changes to Azure Local licensing and the Azure Local Solutions hardware ecosystem, including new host fee tiers (S2D, external storage/SAN, and disconnected operations) and the shift from a 3-tier to 2-tier hardware catalog model. The post also clarifies key acronyms and support implications for existing deployments.
Efrat Ben Porat announces the general availability of dynamic thresholds for Azure Monitor log search alerts, which use machine learning to learn normal behavior from historical query results and automatically adapt alert thresholds over time. The post includes practical examples for AKS pod restart spikes and Azure Resource Graph inventory drift detection.
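To make "learning normal behavior from historical query results" concrete, here is a minimal sketch. It assumes a simple statistical band (mean plus or minus a multiple of the standard deviation over a recent window), which is a swapped-in stand-in and not the ML model Azure Monitor actually uses. The `sensitivity` and `window` parameters and the restart counts are hypothetical.

```python
from statistics import mean, stdev


def dynamic_bounds(history, sensitivity=3.0, window=168):
    """Learn a 'normal' band from recent history as mean +/- sensitivity * stdev.

    Illustrative stand-in only; Azure Monitor's dynamic thresholds use their own model.
    """
    recent = history[-window:]  # only recent points, so the band follows changing behavior
    if len(recent) < 2:
        raise ValueError("need at least two historical points")
    mu, sigma = mean(recent), stdev(recent)
    return mu - sensitivity * sigma, mu + sensitivity * sigma


def is_anomalous(value, history, sensitivity=3.0, window=168):
    """True when the new value falls outside the learned band."""
    low, high = dynamic_bounds(history, sensitivity, window)
    return not low <= value <= high


# Hypothetical hourly AKS pod-restart counts returned by a log query
restarts = [2, 3, 2, 4, 3, 2, 3, 3]
print(is_anomalous(3, restarts))   # False
print(is_anomalous(15, restarts))  # True
```

Because the band is recomputed from the most recent `window` points, a sustained shift in behavior gradually moves the threshold instead of alerting forever.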
Efrat Ben Porat announces the general availability of Simple log alerts in Azure Monitor, a new alert type that evaluates each matching log row individually and supports Basic Logs—making it easier to keep lower-cost telemetry plans while still alerting quickly on important events.
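The difference between evaluating each matching row individually and the classic aggregated count threshold can be sketched in a few lines. This models only the evaluation semantics, not any Azure Monitor API; the row shape and the `Level` column are hypothetical.

```python
from typing import Callable, Iterable

Row = dict[str, str]


def per_row_alerts(rows: Iterable[Row], matches: Callable[[Row], bool]) -> list[Row]:
    """Simple-alert style: every matching row produces its own alert."""
    return [row for row in rows if matches(row)]


def aggregated_alert(rows: Iterable[Row], matches: Callable[[Row], bool], threshold: int) -> bool:
    """Classic style: a single alert only if the number of matches reaches a threshold."""
    return sum(1 for row in rows if matches(row)) >= threshold


def is_error(row: Row) -> bool:
    return row["Level"] == "Error"


logs = [
    {"Level": "Error", "Message": "payment failed"},
    {"Level": "Info", "Message": "heartbeat"},
    {"Level": "Error", "Message": "disk full"},
]
print(len(per_row_alerts(logs, is_error)))  # 2
print(aggregated_alert(logs, is_error, 5))  # False
```

With per-row evaluation, each important event surfaces on its own even when the total count would never cross an aggregate threshold.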
azinh17 breaks down how Azure achieved a top MLPerf Training v6.0 result for Llama 3.1 405B, training at extreme scale across 8,192 GPUs. The post focuses on the cluster and network architecture choices—NVLink scale-up domains, Azure’s MRC fabric, and topology-aware parallelism mapping—that kept step time stable as the system scaled.
Anavi Nahar rounds up Azure Databricks announcements and sessions from Databricks Data + AI Summit 2026, focusing on tighter interoperability with Microsoft’s data stack (OneLake, ADLS) and governed access via Unity Catalog, plus new integrations like the Excel add-in, SharePoint ingestion, and OneLake catalog federation.
Jamesdld23 explains how to avoid the 230-second HTTP timeout in Azure Functions by splitting long-running sync work into an HTTP “request” function that enqueues a message and a queue-triggered function that performs the job, with practical PowerShell and Azure CLI examples plus Entra ID-based auth hardening.
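The request/worker split can be illustrated in-process. In this sketch, Python's `queue.Queue` and a thread stand in for an Azure Storage queue and the queue-triggered function. It is not Azure Functions binding code, and the post itself uses PowerShell. The 202-style response and the payload fields are assumptions for illustration.

```python
import queue
import threading
import uuid

jobs: "queue.Queue[dict | None]" = queue.Queue()  # stand-in for the storage queue
results: dict[str, str] = {}


def request_handler(payload: dict) -> dict:
    """'Request' function: enqueue the work and return immediately, well under the HTTP timeout."""
    job_id = str(uuid.uuid4())
    jobs.put({"id": job_id, "payload": payload})
    return {"status": 202, "jobId": job_id}


def worker() -> None:
    """'Queue-triggered' function: performs the long-running job outside the HTTP request."""
    while True:
        msg = jobs.get()
        if msg is None:  # sentinel used only to stop this demo worker
            jobs.task_done()
            break
        # Placeholder for the long-running synchronous work
        results[msg["id"]] = f"processed {msg['payload']['task']}"
        jobs.task_done()


t = threading.Thread(target=worker)
t.start()
resp = request_handler({"task": "export"})
jobs.put(None)
t.join()
print(resp["status"], results[resp["jobId"]])  # 202 processed export
```

The caller gets an immediate acknowledgement plus a job ID it can use to check status later, so no single HTTP call has to stay open for the full duration of the work.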
yalavi explains how the Azure Copilot observability agent runs “deep investigations” to troubleshoot incidents by correlating telemetry across application, infrastructure, and platform layers, and by producing an evidence-backed narrative with clear mitigations rather than a single best-guess answer.
GeertVanTeylingen outlines a zero-copy pattern for making enterprise file data usable by modern AI and analytics platforms, using Azure NetApp Files as the system of record and Microsoft OneLake shortcuts to expose that data without migration or duplication.
GeertVanTeylingen explains how to build an enterprise RAG “knowledge pipeline” that can index and retrieve file-based content in place (no copy/migration) using Microsoft OneLake, Azure AI Search, and Azure OpenAI for embeddings and grounded answers with citations.
kinfey shows how to build a cloud-native evaluation harness for Azure AI Foundry skills using Foundry Hosted Agents, combining deterministic validators, an LLM judge that returns structured JSON, and a multi-turn adversarial attacker to catch regressions and compare models side by side.
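One deterministic piece of such a harness is validating the LLM judge's structured JSON before trusting it. The sketch below assumes a hypothetical verdict schema (`score` 1 to 5, `passed`, `reason`); the post's actual schema and Foundry APIs are not shown here.

```python
import json


def parse_verdict(raw: str) -> dict:
    """Deterministically validate an LLM judge's JSON reply (hypothetical schema)."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        raise ValueError(f"score must be an integer in 1..5, got {score!r}")
    if not isinstance(data.get("passed"), bool):
        raise ValueError("passed must be a boolean")
    if not isinstance(data.get("reason"), str) or not data["reason"].strip():
        raise ValueError("reason must be a non-empty string")
    return {"score": score, "passed": data["passed"], "reason": data["reason"]}


def pass_rate(raw_replies: list[str]) -> float:
    """Share of validated verdicts that passed; useful for comparing two models side by side."""
    verdicts = [parse_verdict(r) for r in raw_replies]
    return sum(v["passed"] for v in verdicts) / len(verdicts)


verdict = parse_verdict('{"score": 4, "passed": true, "reason": "cites the right doc"}')
print(verdict["passed"])  # True
```

Rejecting malformed judge output outright keeps a flaky judge from silently turning into false passes or false regressions.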
RohitMadhavKrishnan introduces ArchAngel, an educational AI coding assistant designed to bring a team’s engineering standards directly into the IDE, so junior developers get constructive feedback while they write code. The post outlines the core idea, a reference architecture, and the Microsoft-centric stack used to ground guidance in “golden repos.”
BhaktiRath95 walks through common failure modes when running AI/ML inference workloads on Azure Container Apps, including slow model startup, probe timeouts, OOM kills, and GPU initialization problems. The post provides concrete probe settings, Python/FastAPI patterns, and Log Analytics queries to diagnose and fix issues methodically.
Dirk Brinkmann shows how to turn Azure Savings Plan recommendations into defensible, hour-by-hour data by exporting the underlying PAYG usage series and alternative commitment levels from the Azure Cost Management Benefit Recommendations API, using a companion PowerShell script that outputs CSV, Markdown, and JSON files.
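Once the hourly PAYG series is exported, each alternative commitment level can be priced hour by hour. This Python sketch is not the companion PowerShell script. It assumes a single flat discount rate for all eligible usage, which is a simplification since real savings plan rates vary by SKU and region, and the usage values are hypothetical.

```python
def hourly_cost(payg: float, commitment: float, discount: float) -> float:
    """Cost of one hour under an hourly savings-plan commitment (flat-discount assumption)."""
    discounted = payg * (1 - discount)
    if discounted <= commitment:
        return commitment  # the commitment is billed even when under-used
    # The commitment absorbs commitment / (1 - discount) of PAYG-priced usage;
    # the remainder is billed at PAYG rates.
    return commitment + payg - commitment / (1 - discount)


def total_costs(usage: list[float], commitments: list[float], discount: float) -> dict[float, float]:
    """Total cost of each candidate commitment over the hourly PAYG series."""
    return {c: sum(hourly_cost(u, c, discount) for u in usage) for c in commitments}


usage = [12.0, 12.0, 24.0, 6.0]  # hypothetical PAYG $/hour series
print(total_costs(usage, [0.0, 9.0, 12.0], discount=0.25))
# {0.0: 54.0, 9.0: 48.0, 12.0: 56.0}
```

In this toy series, a commitment of 9 beats PAYG, while 12 costs more than PAYG because the low-usage hours leave part of the commitment unused. That per-hour breakdown is what makes a recommendation defensible.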
viviandiec announces general availability of OpenTelemetry (OTel) Guest OS metrics for Azure VMs and Arc-enabled Servers, plus an updated Azure Monitor VM experience. The post explains what metrics are available, how OTel compares to Log Analytics-based metrics, and how to use PromQL and Grafana dashboards for troubleshooting at scale.
Sokuma announces the general availability of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in Azure Monitor, outlining how teams can track customer-experience reliability with SLI authoring, SLO tracking, error budgets, and burn rate–based alerting in a single Azure Monitor workflow.
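The error budget and burn rate arithmetic behind SLO alerting can be sketched using the common SRE definitions. Azure Monitor's exact computation may differ, and the request counts are hypothetical.

```python
def error_budget(slo_target: float) -> float:
    """Allowed fraction of bad events, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(good: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    1.0 means the budget would be used up exactly at the end of the SLO window;
    values above 1.0 mean it runs out early.
    """
    if total == 0:
        return 0.0
    observed_error_rate = (total - good) / total
    return observed_error_rate / error_budget(slo_target)


def budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Fraction of the window's error budget still unspent (negative once exceeded)."""
    allowed_bad = error_budget(slo_target) * total
    return 1.0 - (total - good) / allowed_bad


# Hypothetical window: 10,000 requests against a 99.9% SLO
print(round(burn_rate(9950, 10000, 0.999), 6))         # 5.0
print(round(budget_remaining(9995, 10000, 0.999), 6))  # 0.5
```

A burn-rate alert typically fires when this ratio stays above a chosen multiple over a short window, which catches fast budget consumption long before the budget is actually gone.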
Sokuma announces the general availability of Azure Monitor Metrics Export using data collection rules (DCRs), highlighting how to continuously stream platform metrics to Azure Storage, Event Hubs, or Log Analytics with multidimensional metrics support, metric-name filtering, and typical end-to-end latency of about three minutes.
Sunita_AZ0708 explains how to run Ansys Discovery on Azure using NVads V710 v5 GPU VMs, including a reference architecture, right-sizing guidance for fractional GPUs, and validation results across fluid, thermal, and structural simulation scenarios.
Rafia Aqil explains how to diagnose and respond when Azure Databricks clusters can’t start or scale due to Azure regional VM capacity constraints, including what to send to Microsoft support, which VM families to switch to, and longer-term design choices like instance pools, serverless compute, and multi-region deployments.
ShubhamSachdeva99 explains how to switch built-in connector connections at runtime in Azure Logic Apps Standard by making the service provider action’s connectionName dynamic, enabling a single workflow to route to different SFTP/SQL/Service Bus endpoints per team or environment.
TulikaC introduces new Azure CLI commands for listing and viewing Azure App Service for Linux startup logs, making it easier to diagnose container initialization issues, runtime startup failures, warmup probe problems, and slot-specific startup behavior directly from the command line.
BhaktiRath95 breaks down why Azure Container Apps can feel “slow to start” in production, separating true cold starts from scaling delays and resource throttling. It includes concrete fixes like minReplicas tuning, KEDA rule adjustments, probe configuration, image-size reduction, and practical .NET and Django startup optimizations backed by Log Analytics and Application Insights queries.
j_folberth explains how to deploy Azure AI Foundry Hosted Agents directly from a source-code ZIP instead of a container image, including the deployment lifecycle, an azd-based workflow, and a reusable GitHub Action that posts to the Foundry data plane and polls until the new agent version becomes active.
Mahesh Sundaram announces a public preview in Azure Monitor that lets platform teams collect Azure resource platform logs at scale using Data Collection Rules (DCRs), replacing per-resource diagnostic settings with a centralized, policy-driven model that supports governance, cost control, and modern identity-based access.
Heather Poulsen shares an optimization playbook for running agentic AI workloads in production on Azure, focusing on keeping multi-agent orchestration reliable while controlling token costs and latency. It highlights practical techniques like inference routing, prompt compression, RAG tuning, caching, and FinOps-style capacity planning.
Heather Poulsen outlines a governance-first blueprint for building scalable agentic AI systems, focusing on how to embed consistent controls and quality checks across user interactions, agent orchestration, integrations, data, and models so systems can scale without losing trust and oversight.
Heather Poulsen shares an overview of an event session on designing Azure AI Landing Zones as a production-ready foundation for deploying AI applications and AI agents at scale, with guardrails for networking, identity, security, governance, and cost control using Microsoft’s recommended architecture frameworks.
Rafia_Aqil outlines a reference architecture for ingesting both streaming and batch data through Microsoft Fabric into Azure Databricks, using OneLake/ADLS and a medallion (Bronze/Silver/Gold) layout. The post breaks down five Fabric-to-Databricks integration paths and calls out security, governance, and monitoring considerations.
brauerblogs announces a two-day “Path to Production for Agents” webinar series (July 27–28) focused on moving agentic AI from prototypes to production, covering governance, landing-zone architecture, AgentOps practices, security risks like prompt injection, and cost/performance optimization with Azure Monitor and Microsoft Foundry.
BhaktiRath95 walks through common startup and deployment failures in Azure Container Apps and Container App Jobs for .NET and Django workloads, showing what the errors look like in logs, why they happen, and the concrete CLI, configuration, and code changes that fix them.
Mayunk Jain summarizes the Azure App Service announcements from Microsoft Build 2026, including a new “Easy AI experience” with built-in MCP, GA of Isolated v4 for App Service Environments, and Managed Instance improvements for modernizing legacy apps (including IIS) with better diagnostics and deployment workflows.
sunayanasingh explains how Azure Monitor now supports exemplars so teams can jump from Prometheus/OpenTelemetry metric spikes to the exact OpenTelemetry trace in Application Insights, using Azure Managed Grafana for visualization and trace linking.
Johnson Shi provides an operational guide to running a geo-replicated Azure Container Registry (ACR) for high availability, explaining how global endpoints, regional endpoints, and dedicated data endpoints behave during incidents, throttling, and DNS changes, with concrete Azure CLI steps for setup, routing control, and troubleshooting.
shashankamalladi announces the general availability of Network Security Perimeter (NSP) support for Azure Service Bus, including availability in Azure Government regions. The post explains how NSP provides a centralized security boundary with default-deny communication, explicit inbound/outbound rules, and diagnostic logging for audit and compliance.
jordanselig announces a public preview feature that lets Azure App Service expose an existing REST API as a Model Context Protocol (MCP) server using only an OpenAPI spec. The post covers how the platform generates MCP tools, how to configure it, and what to consider for authentication and safe exposure.
Sally Dabbah explains how to turn Synapse/ADF/Microsoft Fabric pipeline failures into structured, queryable telemetry by sending standardized failure events into Azure Monitor Log Analytics via the Logs Ingestion API and a Data Collection Rule, enabling KQL-based analysis, alerting, and reliability reporting across environments and datasets.
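A standardized failure event might look like the following minimal sketch. The column names are hypothetical and must match whatever stream the Data Collection Rule declares. Custom tables commonly carry a `TimeGenerated` column. Only the JSON array body that the Logs Ingestion API accepts is built here; the authenticated upload itself (for example with the `azure-monitor-ingestion` package's `LogsIngestionClient`) is left out.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class PipelineFailureEvent:
    """Hypothetical standardized schema; columns must match the DCR stream declaration."""
    TimeGenerated: str
    Platform: str      # e.g. "Synapse", "ADF", or "Fabric"
    Environment: str   # e.g. "dev" or "prod"
    PipelineName: str
    RunId: str
    ErrorCode: str
    ErrorMessage: str


def failure_event(platform, environment, pipeline, run_id, code, message, when=None):
    """Normalize one platform-specific failure into the shared schema."""
    ts = (when or datetime.now(timezone.utc)).isoformat()
    return PipelineFailureEvent(ts, platform, environment, pipeline, run_id, code, message)


def to_ingestion_body(events: list[PipelineFailureEvent]) -> str:
    """Serialize events as the JSON array of records sent to the ingestion endpoint."""
    return json.dumps([asdict(e) for e in events])


evt = failure_event(
    "ADF", "prod", "daily_sales_load", "run-123", "2200", "Copy activity timed out",
    when=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
body = to_ingestion_body([evt])
print(body)
```

Using one schema for Synapse, ADF, and Fabric failures is what allows a single KQL query or alert rule to cover every environment and dataset.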
anandranjan explains a practical AKS pattern for keeping secret values out of YAML and CI/CD by using Azure Key Vault with the Secrets Store CSI Driver and AKS Workload Identity. It covers the identity flow, required AKS/Azure setup, workload onboarding YAML, and common troubleshooting points around federation, labels, mounts, and permissions.
Alex-wdy explains how Azure CLI 2.86.0+ speeds up slow enterprise-scale az login by skipping post-auth subscription enumeration across many tenants and subscriptions. The post introduces --skip-subscription-discovery (and --skip-sub), targeted --subscription on login, and when to use (or avoid) these flags.
amolravande explains how to run agent-generated Python safely by combining Agent Governance Toolkit (AGT) policy enforcement with Azure Container Apps Sandboxes, using per-session microVM isolation plus a fail-closed egress proxy to reduce the blast radius of untrusted code.