Content by Vishnu Charan TJ (3)
Vishnu Charan TJ explains how streaming LLM weights directly from Azure Blob Storage into GPU memory with Run:AI Model Streamer can cut inference cold-start times by up to roughly 6x, reducing idle GPU spend and improving autoscaling behavior for vLLM and SGLang deployments.
Vishnu Charan TJ explains the latest enhancements in adlfs, which help data professionals connect Python-based AI and ML workloads to Azure Blob and Data Lake Storage efficiently, and covers real-world framework integrations and best practices.
Vishnu Charan TJ details the general availability of network security perimeters for Azure Storage, showing how centralized network controls can secure PaaS resources and prevent data exfiltration.