320: AWS Cost MCP: Your Billing Data Now Speaks Human
Welcome to episode 320 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are coming to you from Justin’s echo chamber and bringing all the latest in AI and cloud news, including updates to Google’s antitrust case, AWS Cost MCP, new regions, updates to EKS, Veo, and Claude, and more! Let’s get into it.
Titles we almost went with this week:
- Breaking Bad Bottlenecks: AWS Cooks Up Faster Container Pulls
- The Bucket List: Finding Your Lost Storage Dollars
- State of Denial: Terraform Finally Stops Saving Your Passwords
- Three Stages of Azure Grief: Development, Preview, and Launch
- Ground Control to Major Cloud: Microsoft Launches Planetary Computer Pro
- Veo Vidi Vici: Google Conquers Video Editing
- Red Alert: AWS Makes Production Accounts Actually Look Dangerous
- Amazon EKS Discovers the F5 Key
- Chaos Theory Meets ChatGPT: When Your Reliability Data Gets an AI Therapist
- Breaking Bad (Services): How AI Helps You Find What’s Already Broken
- Breaking Up is Hard to Cloud: Gemini Moves Back In
- Intel Inside Your Secrets: TDX Takes Over Google Cloud
- Lord of the Regions: The Return of the Kiwi
- All Blacks and All Stacks: AWS Goes Full Kiwi
- Azure Forecast: 100% Chance of Budget Alert Storms
- Google Keeps Its Cloud Together: A $2.5T Near Miss
- Shell We Dance? AWS Makes CLI Scripting Less Painful
- AWS Finally Admits Nobody Remembers All Those CLI Commands
- Cache Me If You Claude
- Your AWS Console gets its Colors, just don’t choose red shirts
- Amazon Q walks into a bar and tells MCP to order it a beer… The bartender sighs and mutters, “at least ChatGPT just hallucinates its beer”
- Ryan’s shitty scripts, now as an AWS CLI library
A big thanks to this week’s sponsor:
We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.
General News
00:57 Google Dodges a $2.5T Breakup
- We have breaking news – and it’s good news for Google.
- Google successfully avoided a potential $2.5 trillion breakup following antitrust proceedings, maintaining its current corporate structure despite regulatory pressure.
- The decision represents a significant outcome for Big Tech antitrust cases, potentially setting a precedent for how regulators approach market dominance issues in the cloud and technology sectors.
- Cloud customers and partners can expect business continuity with Google Cloud Platform services, avoiding potential disruptions that could have resulted from a corporate restructuring.
- The ruling may influence how other major cloud providers structure their businesses and approach regulatory compliance, particularly around bundling services and market competition.
- Enterprise customers relying on Google’s integrated ecosystem of cloud, advertising, and productivity tools can continue their current architectures without concerns about service separation.
- You just KNOW Microsoft is super mad about this.
AI Is Going Great – Or How ML Makes Money
02:16 Introducing GPT-Realtime
- OpenAI’s GPT-Realtime introduces real-time processing capabilities to GPT models, reducing latency for interactive applications and enabling more responsive AI experiences in cloud environments.
- The technology leverages optimized model inference and architectural changes to deliver sub-second response times, making it suitable for live customer service, real-time translation, and interactive coding assistants.
- Cloud providers can integrate GPT-Realtime through new API endpoints, offering developers the ability to build applications that require immediate AI responses without traditional batch processing delays.
- This development addresses a key limitation in current LLM deployments where response latency has restricted use cases in time-sensitive applications like live streaming, gaming, and financial trading systems.
- For businesses running AI workloads in the cloud, GPT-Realtime could reduce infrastructure costs by eliminating the need for pre-processing queues and enabling more efficient resource utilization through streaming inference.
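For the curious, here’s a minimal sketch of what a streaming session looks like, assuming the beta realtime interface in the openai Python SDK; the model name and event types here are assumptions, so check the current docs before relying on them.

```python
import asyncio

from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def main() -> None:
    # "gpt-realtime" is a placeholder model name; confirm the exact identifier.
    async with client.beta.realtime.connect(model="gpt-realtime") as conn:
        # Text-only for brevity; the same session can negotiate audio in/out.
        await conn.session.update(session={"modalities": ["text"]})
        await conn.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Say hello in one sentence."}],
            }
        )
        await conn.response.create()
        # Tokens stream back as events instead of one batch response.
        async for event in conn:
            if event.type == "response.text.delta":
                print(event.delta, end="", flush=True)
            elif event.type == "response.done":
                break


asyncio.run(main())
```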
02:58 Matt – “More AI scam calling coming your way.”
Cloud Tools
04:14 Terraform provider for Google Cloud 7.0 is now GA
- Terraform Google Cloud provider 7.0 introduces ephemeral resources and write-only attributes that prevent sensitive data, such as access tokens and passwords, from being stored in state files, addressing a major security concern for infrastructure teams.
- The provider now supports over 800 resources and 300 data sources with 1.4 billion downloads, making it one of the most comprehensive infrastructure-as-code tools for Google Cloud Platform management.
- New validation logic catches configuration errors during Terraform plan rather than apply, providing fail-fast behavior that makes deployments more predictable and reduces failed infrastructure changes.
- Breaking changes in 7.0 align the provider with Google Cloud’s latest APIs and mark functionally required attributes as mandatory in schemas, requiring teams to review upgrade guides before migrating from version 6.
- The ephemeral resource feature leverages Terraform 1.10+ capabilities to handle temporary credentials, such as service account access tokens, through write-only attributes that are never persisted to the state file. This solves the long-standing problem of secret management in GitOps workflows.
05:19 Ryan – “I like the ephemeral resources; I think it’s a neat model for handling sensitive information and stuff you don’t want to store. It’s kind of a neat process.”
06:50 How to get fast, easy insights with the Gremlin MCP Server
- Gremlin’s MCP Server connects chaos engineering data to LLMs like ChatGPT or Claude, enabling teams to query their reliability testing results using natural language to uncover insights about service dependencies, test coverage gaps, and which services to test next.
- The server architecture consists of three components: the LLM client, a containerized MCP server that interfaces with Gremlin’s API, and the Gremlin API itself – designed for read-only operations to prevent accidental system damage during data exploration.
- This solves the problem of making sense of complex reliability testing data by allowing engineers to ask plain-English questions like “Which of my services should I test next?” instead of manually analyzing test results and metrics.
- The tool requires a Gremlin account with REST API key, an AI interface that supports MCP servers like Claude Desktop, and Node.js 22+ – making it accessible to teams already using Gremlin for chaos engineering.
- During internal beta testing at Gremlin, the MCP server helped uncover production-impacting bugs before release, demonstrating its practical value for improving service reliability through AI-assisted data analysis.
07:38 Ryan – “It’s amazing they limited this to read-only commands, the API. I don’t know why they did that…it’s kind of neat to see the interaction model with different services.”
AWS
09:21 Introducing Seekable OCI Parallel Pull mode for Amazon EKS | Containers
- AWS introduces SOCI Parallel Pull mode for EKS to address container image pull bottlenecks, particularly for AI/ML workloads where images can exceed 10GB and take several minutes to download using traditional methods.
- The feature parallelizes both the download and unpacking phases, utilizing multiple HTTP connections per layer for downloads and concurrent CPU cores for unpacking, to achieve up to 60% faster pull times compared to standard containerd configurations.
- SOCI Parallel Pull is built into recent Amazon EKS Optimized AMIs for Amazon Linux 2023 and Bottlerocket, with configurable parameters for download concurrency (recommended 10-20 for ECR), chunk size (16MB recommended), and unpacking parallelism based on your instance resources.
- The solution trades reduced pull times for higher network, CPU, and storage utilization, requiring optimized EBS volumes with 1000 MiB/s throughput or instance store NVMe disks for optimal performance on instances like m6i.8xlarge.
- This directly impacts deployment responsiveness and cluster scaling operations, with container startup time reductions from nearly 2 minutes to 45 seconds for a 10GB Deep Learning Container, making it particularly valuable for organizations running large-scale AI/ML workloads on EKS.
- What Matt was remembering: https://aws.amazon.com/about-aws/whats-new/2023/11/aws-fargate-amazon-ecs-tasks-selectively-leverage-soci/
10:24 Justin – “I personally don’t use all the CPU memory or the network of most of my container instances. So yes, that’s a willing trade-off I’m willing to make.”
13:13 AWS Management Console now supports assigning a color to an AWS account for easier identification
- AWS Management Console now allows admins to assign colors to accounts (like red for production, yellow for testing) that appear in the navigation bar, replacing the need to memorize account numbers for identification across multi-account environments.
- The feature addresses a common pain point for organizations managing multiple AWS accounts for different workloads, business units, or environments by providing instant visual differentiation when switching between accounts.
- Implementation requires admin privileges to set colors through the Account menu, and users need either the AWSManagementConsoleBasicUserAccess managed policy or the custom uxc:GetAccountColor permission to view the assigned colors.
- This quality-of-life improvement reduces the risk of accidental changes in the wrong environment and speeds up context switching for engineers and operators who regularly work across multiple AWS accounts.
- The feature is available now in all public regions at no additional cost, representing AWS’s continued focus on console usability improvements for enterprise customers managing complex multi-account architectures.
14:57 Matt – “I use it for Chrome, and that’s always where I’ve identified different users depending on where it was. I kind of like it where it’s something that can be set.”
17:07 AWS Transfer Family introduces Terraform support for deploying SFTP connectors
- AWS Transfer Family now supports Terraform deployment for SFTP connectors, enabling Infrastructure as Code automation for file transfers between S3 and remote SFTP servers. This extends beyond the existing SFTP server endpoint support to include the connector functionality.
- SFTP connectors provide fully managed, low-code file copying between S3 and remote SFTP servers, and the new Terraform module allows programmatic provisioning with dependencies and customizations in a single deployment.
- The module includes end-to-end examples for automating file transfer workflows using schedule or event triggers, eliminating manual configuration errors and providing repeatable, scalable deployments.
- This addresses a common enterprise need for automated file transfers between cloud storage and legacy SFTP systems, particularly useful for organizations migrating to the cloud or maintaining hybrid architectures.
- The Terraform module is available on GitHub at github.com/aws-ia/terraform-aws-transfer-family with documentation at registry.terraform.io/modules/aws-ia/transfer-family/aws/latest.
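The module itself is HCL, but once a connector exists, transfers can be kicked off programmatically. Here’s a minimal sketch using boto3’s Transfer Family client; the connector ID and bucket paths are placeholders.

```python
import boto3  # pip install boto3

transfer = boto3.client("transfer")

# Push a file from S3 out to the remote SFTP server through the connector.
response = transfer.start_file_transfer(
    ConnectorId="c-1234567890abcdef0",  # placeholder connector ID
    SendFilePaths=["/my-example-bucket/outbound/report.csv"],
)
print("Transfer started:", response["TransferId"])

# The reverse direction pulls files from the SFTP server into S3:
# transfer.start_file_transfer(
#     ConnectorId="c-1234567890abcdef0",
#     RetrieveFilePaths=["/remote/inbound/invoice.csv"],
#     LocalDirectoryPath="/my-example-bucket/inbound",
# )
```

A schedule or event trigger invoking a call like this is the pattern the module’s end-to-end examples automate.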
18:57 Ryan – “You know you’re getting deep into enterprise orchestration in terms of your customer base when you’re doing stuff like this, because this is ROUGH. “
19:20 Amazon EKS introduces on-demand insights refresh
- Amazon EKS now allows on-demand refresh of cluster insights, letting customers immediately verify if their applied recommendations and configuration changes have taken effect instead of waiting for periodic automatic checks.
- This feature addresses a key pain point during Kubernetes upgrades by providing instant feedback on whether required changes have been properly implemented, reducing the time between making changes and validating them.
- The insights system checks for issues like deprecated APIs before version upgrades and provides specific remediation steps, with the refresh capability now available in all commercial AWS regions.
- For DevOps teams managing multiple EKS clusters, this eliminates the guesswork and waiting periods during maintenance windows, particularly useful when performing rolling upgrades across environments.
- The feature integrates with existing EKS cluster management workflows at no additional cost, accessible through the EKS console or API as documented at docs.aws.amazon.com/eks/latest/userguide/cluster-insights.html.
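The refresh itself is triggered from the EKS console or API per the docs above; reading the results uses the existing insights APIs. A quick sketch with boto3 (the cluster name is a placeholder):

```python
import boto3  # pip install boto3

eks = boto3.client("eks", region_name="us-west-2")

# List upgrade-readiness insights for a cluster.
insights = eks.list_insights(
    clusterName="prod-cluster",  # placeholder
    filter={"categories": ["UPGRADE_READINESS"]},
)

for summary in insights["insights"]:
    detail = eks.describe_insight(
        clusterName="prod-cluster",
        id=summary["id"],
    )["insight"]
    # After applying a fix and requesting a refresh, the status should flip
    # from WARNING/ERROR to PASSING without waiting for the periodic check.
    print(detail["name"], "-", detail["insightStatus"]["status"])
```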
20:41 Amazon Q Developer now supports MCP admin control
- Amazon Q Developer adds centralized admin control for Model Context Protocol (MCP) servers, allowing organizations to enable or disable MCP functionality across all Q Developer clients from the AWS console.
- The feature provides session-level enforcement, checking admin settings at startup and every 24 hours during runtime, ensuring consistent policy application across VSCode, JetBrains, Visual Studio, Eclipse, and the Q Developer CLI.
- Organizations gain granular control over external resource access through MCP servers, addressing security concerns by preventing users from adding unauthorized servers when the functionality is disabled.
- This update positions Q Developer as a more enterprise-ready AI coding assistant by giving IT administrators the governance tools needed to manage AI-powered development environments at scale.
- The control mechanism operates at no additional cost and integrates with existing Q Developer subscriptions, making it immediately available to current enterprise customers without deployment overhead.
21:33 Ryan – “This future is going to be a little weird, you know, as we sort it out. You think about like chatbots and being able to sort of create infrastructure there and then, kind of bypassing a lot of the permissions and stuff. This is kind of the same problem, but magnified a lot more. And so like, it’s going to be interesting to see how companies adapt.”
22:48 Introducing Amazon EC2 I8ge instances
- AWS launches I8ge instances with Graviton4 processors delivering 60% better compute performance than previous Graviton2 storage-optimized instances, plus 120TB of local NVMe storage – the highest density among Graviton-based storage instances.
- The new third-generation AWS Nitro SSDs provide 55% better real-time storage performance per TB with 60% lower I/O latency compared to I4gn instances, making them ideal for latency-sensitive workloads like real-time databases and streaming analytics.
- I8ge instances scale up to 48xlarge with 1,536 GiB memory and offer 300 Gbps networking bandwidth – the highest among storage-optimized EC2 instances – addressing the needs of data-intensive applications requiring both storage density and network throughput.
- Currently available only in US East (Ohio), US East (N. Virginia), and US West (Oregon), limiting deployment options for global workloads compared to other EC2 instance families.
- The combination of high storage density, improved I/O performance, and Graviton4 efficiency positions these instances for cost-effective deployment of search clusters, time-series databases, and real-time analytics platforms that previously required multiple instances or external storage.
PLUS
New general-purpose Amazon EC2 M8i and M8i Flex instances are now available | AWS News Blog
- AWS launches M8i and M8i-Flex instances with custom Intel Xeon 6 processors running at 3.9 GHz all-core turbo, delivering up to 15% better price-performance and 2.5x memory bandwidth compared to M7i generation.
- M8i-Flex offers a 5% lower price point for workloads that don’t need sustained CPU performance, reaching full CPU performance 95% of the time while maintaining compatibility with existing applications.
- Performance gains include 60% faster NGINX web serving, 30% faster PostgreSQL database operations, and 40% faster AI deep learning recommendation models compared to the previous generation.
- New sixth-generation AWS Nitro Cards provide 2x network and EBS bandwidth with configurable 25% allocation adjustments between network and storage, improving database query processing and logging speeds.
- Available in 4 regions (US East Virginia/Ohio, US West Oregon, Europe Spain) with sizes up to 384 vCPUs and 1.5TB memory, including bare metal options and SAP certification for enterprise workloads.
29:30 Now Open — AWS Asia Pacific (New Zealand) Region | AWS News Blog
- AWS launches its 38th global region in New Zealand (ap-southeast-6) with three availability zones, representing a NZD 7.5 billion investment that’s expected to contribute NZD 10.8 billion to New Zealand’s GDP and create 1,000 jobs annually.
- The region addresses data residency requirements for New Zealand organizations and government agencies operating under the country’s cloud-first policy, with AWS supporting 143 security standards, including PCI DSS, HIPAA, and GDPR compliance certifications.
- New Zealand customers like MATTR, Xero, and Thematic are already leveraging AWS services, including Amazon Bedrock for generative AI applications, with the region powered by renewable energy through an agreement with Mercury New Zealand from day one.
- AWS has been building infrastructure in New Zealand since 2013, including CloudFront edge locations, an Auckland Local Zone for single-digit millisecond latency, and Direct Connect locations, with this full region launch completing their local infrastructure footprint.
- The launch brings AWS to 120 Availability Zones across 38 regions globally, with strong local partner ecosystem support from companies like Custom D, Grant Thornton Digital, MongoDB, and Parallo serving New Zealand customers.
30:54 Announcing a new open source project for scenario-focused AWS CLI scripts
- AWS launched an open source project providing tested shell scripts for over 60 AWS services, addressing the common challenge of writing error-handling and cleanup logic when using the AWS CLI for infrastructure automation.
- The AWS Developer Tutorials project on GitHub includes end-to-end scripts with built-in resource tracking and cleanup operations, reducing the time developers spend debugging CLI commands and preventing orphaned resources.
- Developers can generate new scripts in as little as 15 minutes using generative AI tools like Amazon Q Developer CLI, leveraging existing documentation to create working scripts through an iterative test-and-improve process.
- Each script comes with tutorials explaining the AWS service API interactions, making it easier for teams to understand and modify scripts for their specific use cases rather than starting from scratch.
- The project accepts community contributions and provides instructions for generating new scripts, potentially building a comprehensive library of production-ready CLI automation patterns across AWS services.
- We hereby nominate Ryan’s shitty scripts to the community as a contribution. You’re welcome, world.
31:56 Ryan – “I will definitely give it a look. It’s kind of strange, because most of the contributions right now are very specific to tutorials, like trying to learn a new Amazon service, and there’s very little documentation on what error handling and advanced sorts of logic are built into these scripts. All of the documentation is just directing you at Q and saying, Hey Q, build me a thing that looks like that.”
33:15 Simplified Cache Management for Anthropic’s Claude models in Amazon Bedrock
- Amazon Bedrock simplifies prompt caching for Claude models by automatically identifying and reusing the longest previously cached prefix, eliminating manual cache point management for developers using Claude 3.5 Haiku, Claude 3.7, and Claude 4.
- The update reduces token consumption and costs since cache read tokens don’t count toward tokens-per-minute (TPM) quotas, making multi-turn conversations and research assistants more economical to operate.
- Developers now only need to set a single cache breakpoint at the end of their request instead of tracking multiple cache segments, significantly reducing implementation complexity for applications with repetitive context (see the sketch after this list).
- This feature addresses a common pain point in LLM applications where repeated context (like system prompts or document analysis) previously required manual cache management logic that was error-prone and time-consuming.
- Available immediately in all regions supporting these Claude models on Bedrock, with implementation details in the Amazon Bedrock Developer Guide for teams looking to optimize their existing Claude deployments.
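A minimal sketch of the single-breakpoint pattern via the Converse API in boto3; the model ID is a placeholder, so check which Claude versions support caching in your region.

```python
import boto3  # pip install boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

LONG_SYSTEM_PROMPT = "You are a research assistant. " + "<many KB of shared context>"

response = bedrock.converse(
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # placeholder model ID
    system=[
        {"text": LONG_SYSTEM_PROMPT},
        # One cache breakpoint at the end of the reusable prefix; Bedrock now
        # matches the longest previously cached prefix automatically.
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize the key findings."}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
# usage reports cacheReadInputTokens / cacheWriteInputTokens, so you can
# confirm that repeat calls are actually hitting the cache.
print(response["usage"])
```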
34:07 Ryan – “I’m just really glad I don’t have to create any applications that need to be this focused on token usage. It sounds painful.”
GCP
35:02 Google Workspace announces new gen AI features and a no-cost option for Vids
- Google Vids now includes generative AI capabilities powered by Veo 3 that can transform static images into short videos, available to paid Workspace customers and Google AI Pro/Ultra subscribers.
- This positions Google against competitors like Microsoft’s Clipchamp and Adobe’s AI video tools by integrating video creation directly into the productivity suite.
- The basic Vids editor without AI features launches as a no-cost option for consumers, marking Google’s first free video editing tool within Workspace. This creates a clear freemium model where basic editing is free, but AI-powered features like avatars and automatic transcript trimming require paid subscriptions.
- The Veo 3 integration represents Google’s latest attempt to embed its foundational AI models across productivity tools, similar to how Gemini powers other Workspace features.
- This could benefit marketing teams, educators, and content creators who need quick video content from existing image assets.
- The feature addresses the growing demand for video content in business communications and training materials, where users often have images but lack video production skills or resources. The automatic transcript trim feature particularly targets corporate training and documentation use cases.
- Pricing remains tied to existing Workspace tiers rather than separate charges, making it accessible to current enterprise customers without additional procurement processes. The instructional “Vids on Vids” series suggests Google expects significant adoption and wants to reduce the learning curve.
- Expect shenanigans.
36:34 Gemini is now available anywhere | Google Cloud Blog
- Google now offers Gemini AI models on-premises through Google Distributed Cloud (GDC), allowing organizations with strict data sovereignty requirements to run advanced AI workloads in their own data centers without compromising security or compliance.
- The platform includes Gemini 2.5 Flash and Pro models, supports NVIDIA Hopper and Blackwell GPUs, and provides managed infrastructure with automatic scaling, load balancing, and confidential computing capabilities for both CPUs and GPUs.
- This positions Google against AWS Outposts and Azure Stack, but with a specific focus on AI workloads – offering a complete AI stack including Vertex AI services, pre-built agents, and support for custom models alongside Gemini.
- Key customers include Singapore government agencies (CSIT, GovTech, HTX) and KDDI in Japan, highlighting the appeal to the public sector and regulated industries that need AI capabilities while maintaining complete control over sensitive data.
- The offering comes in two variants: GDC air-gapped (now generally available) for completely isolated environments and GDC connected (in preview) for hybrid scenarios, though pricing details are not disclosed and require contacting Google directly, which means expensive. Don’t say we didn’t warn you.
38:18 Justin – “I 100% expect this is going to be very expensive. I mean, connected and managed Kubernetes for containers and VMs on a 1U half-depth ruggedized server is $415 per node per month with a five-year commitment.”
39:41 Container-optimized compute delivers autoscaling for Autopilot | Google Cloud Blog
- GKE Autopilot’s new container-optimized compute platform delivers up to 7x faster pod scheduling by using dynamically resizable VMs and pre-provisioned compute capacity that doesn’t impact billing since customers only pay for requested resources.
- The platform addresses a common pain point where autoscaling could take several minutes, forcing users to implement costly workarounds like balloon pods to hold unused capacity for rapid scaling scenarios.
- Built-in high-performance HPA profile provides 3x faster calculations and supports up to 1000 HPA objects, making it particularly suitable for web applications and services requiring gradual scaling with 2 CPU or less.
- Available in GKE Autopilot 1.32 or later with the general-purpose compute class, though not recommended for one-pod-per-node deployments or batch workloads.
- This positions GKE competitively against EKS and AKS by solving the cold start problem for containerized workloads without requiring manual capacity planning or paying for idle resources.
40:38 Ryan – “Imagine my surprise when I found out that using GKE autopilot didn’t handle node-level cold start. It was so confusing, so I was like, wait, what? Because you’ve been able to do that on EKS for so long. I was confused. Why do I need to care about node provisioning and size when I have zero access or really other interactions at that node level using autopilot? So it is kind of strange, but glad to see they fixed it.”
41:23 From clicks to clusters: Confidential Computing expands with Intel TDX | Google Cloud Blog
- Google expands Confidential Computing with Intel TDX across multiple services, including Confidential VMs, GKE Nodes, and Confidential Space, now available in 10 regions with 21 zones.
- The technology creates hardware-isolated trust domains that encrypt workloads in memory during processing, addressing the security gap beyond traditional at-rest and in-transit encryption.
- Confidential VMs with NVIDIA H100 GPUs on A3 instances combine Intel TDX for CPU protection with NVIDIA Confidential Computing for GPU security, enabling secure AI/ML workloads during training and inference.
- Available in three zones (europe-west4-c, us-central1-a, us-east5-a) with the a3-highgpu-1g machine type.
- Confidential GKE Nodes with Intel TDX work on both GKE Standard and Autopilot without code changes, allowing containerized workloads to remain encrypted in memory. Configuration can be set at the cluster or node pool level via CLI, API, UI, or Terraform.
- Confidential Space now supports Intel TDX hardware in addition to AMD, enabling multi-party data collaboration and federated learning use cases. Customers like Symphony and Duality use it for isolating customer data from privileged insiders and privacy-preserving ML, respectively.
- Intel’s Tiber Trust Authority attestation service now offers a free tier for third-party verification of Confidential VMs and Confidential Space workloads. This provides stronger separation of duties and security guarantees beyond Google’s built-in attestation.
43:07 Eventarc Advanced orchestrates complex microservices environments | Google Cloud Blog
- Eventarc Advanced is now GA, evolving from Eventarc Standard to handle complex event-driven architectures with centralized message bus management, real-time filtering and transformation, and multi-format payload support (Avro, JSON, Protobuf). This positions GCP competitively against AWS EventBridge and Azure Event Grid by offering built-in transformation capabilities and Envoy-based routing.
- The service introduces a Publish API for ingesting custom and third-party messages in CloudEvents format, enabling organizations to connect existing systems without major refactoring. The centralized message bus provides per-message fine-grained access control and integrates with Cloud Logging for observability.
- Key use cases include large-scale microservices orchestration, IoT data streaming for AI workloads, and hybrid/multi-cloud deployments where event routing across different environments is critical. The example order processing system demonstrates practical filtering (routing new orders to notification services) and transformation (high-value orders to fraud detection).
- Future integration with Service Extensions will allow custom code insertion into the data path, and planned Model Armor support suggests Google is positioning this for AI agent communication scenarios. This aligns with GCP’s broader push into AI infrastructure and agentic architectures.
- While pricing details aren’t provided in the announcement, the serverless nature suggests pay-per-use pricing similar to other GCP eventing services. Organizations should evaluate whether the advanced features justify potential cost increases over Eventarc Standard for their specific use cases.
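As a sketch of what publishing might look like, here’s a structured-mode CloudEvent built with the cloudevents Python SDK and POSTed over HTTP. The bus endpoint URL and auth token are placeholders, since the announcement doesn’t spell out the wire details; only the CloudEvents encoding itself is standard.

```python
import requests  # pip install requests cloudevents
from cloudevents.http import CloudEvent, to_structured

# Build a CloudEvents-format message, which the Publish API expects.
event = CloudEvent(
    {
        "type": "com.example.order.created",
        "source": "//orders/prod",
        "subject": "order/o-123",
    },
    {"orderId": "o-123", "totalCents": 420000},
)

headers, body = to_structured(event)  # structured JSON encoding

# Placeholders: the real message-bus URL and credentials come from your
# Eventarc Advanced setup.
BUS_ENDPOINT = "https://example-eventarc-bus.example.com/v1/publish"
ACCESS_TOKEN = "<oauth-token>"

resp = requests.post(
    BUS_ENDPOINT,
    headers={**headers, "Authorization": f"Bearer {ACCESS_TOKEN}"},
    data=body,
)
resp.raise_for_status()
```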
44:20 Ryan – “So OpenAI is going for real-time inference, and Google is going to be event-based. It seems like two very different directions. I like the event-driven architecture; it’s something I continue to use in most of the apps that I’m developing and creating. I think that having the ability to do something at a larger scale and coordinating across an entire business is pretty handy.”
Azure
45:22 Agent Factory: Top 5 agent observability best practices for reliable AI | Microsoft Azure Blog
- Azure AI Foundry introduces comprehensive agent observability capabilities that extend beyond traditional metrics, logs, and traces to include AI-specific evaluations and governance features for monitoring autonomous AI agents throughout their lifecycle.
- The platform provides built-in agent evaluators that assess critical behaviors like intent resolution, task adherence, tool call accuracy, and response completeness, with seamless integration into CI/CD pipelines through GitHub Actions and Azure DevOps extensions.
- Azure’s AI Red Teaming Agent automates adversarial testing to identify security vulnerabilities before production deployment, simulating attacks on both individual agents and complex multi-agent workflows to validate production readiness.
- The solution differentiates from traditional observability tools by addressing the non-deterministic nature of AI agents, offering model leaderboards for selection, continuous evaluation capabilities, and integration with Azure Monitor for real-time production monitoring with customizable dashboards and alerts.
- Enterprise customers like EY, Accenture, and Veeam are already using these features to ensure their AI agents meet quality, safety, and compliance standards, with particular emphasis on regulatory frameworks like the EU AI Act through integrations with Microsoft Purview, Credo AI, and Saidot.
47:31 Matt – “It just feels like we’re saying it’s this revolutionary thing, but really it’s something we have to approach from a slightly different angle. It’s the difference between, hey, we have an API and now we have a UI, and users can do things slightly differently… It’s just the evolution of a tool.”
49:04 Generally Available: Azure App Service – New Premium v4 Offering
- Azure App Service Premium v4 brings NVMe local storage and memory-optimized configurations to both Windows and Linux workloads, addressing performance bottlenecks for I/O-intensive applications like content management systems and e-commerce platforms.
- The new tier runs on Azure’s latest hardware with faster processors, positioning it competitively against AWS’s compute-optimized instances and GCP’s N2 series while maintaining App Service’s PaaS simplicity.
- Starting configurations at 1 vCPU and 4GB RAM make Premium v4 accessible for smaller production workloads that need enhanced performance without jumping to dedicated VM solutions.
- This release signals Microsoft’s continued investment in App Service as enterprises increasingly adopt PaaS for mission-critical applications, particularly those requiring consistent low-latency performance.
- Premium v4 fills the gap between standard App Service tiers and isolated environments, giving customers a middle-ground option for applications that need better performance but don’t require full network isolation.
52:47 Public Preview: Microsoft Planetary Computer Pro
- Microsoft Planetary Computer Pro enters public preview as a geospatial data platform that ingests, manages, and disseminates location-based data for enterprise Data & AI workflows, targeting organizations that need to process satellite imagery and environmental datasets at scale.
- The platform integrates with Azure’s existing data services to accelerate geospatial insights, positioning Microsoft to compete with AWS’s Earth on AWS and Google Earth Engine by offering enterprise-grade tools for climate modeling, agriculture monitoring, and urban planning applications.
- Key capabilities include streamlined data ingestion pipelines for various geospatial formats and built-in processing tools that reduce the complexity of working with petabyte-scale Earth observation data.
- Target customers include government agencies, environmental organizations, and enterprises in agriculture, insurance, and logistics sectors that require planetary-scale data analysis for decision-making.
- While pricing details aren’t provided in the preview announcement, the platform likely follows Azure’s consumption-based model, with costs scaling based on data storage, compute resources, and API calls for geospatial processing.
53:55 Matt – “I just want to play with the satellites.”
54:24 Microsoft cloud customers hit by messed-up migration • The Register
- Microsoft’s migration from the legacy Microsoft Online Subscription Program (MOSP) to the Microsoft Customer Agreement caused incorrect cost calculations that triggered false budget alerts, with some customers seeing forecast increases of over 1000% despite no actual billing impact.
- Those poor FinOps people.
- The incident highlights risks in Azure’s account migration processes where automated systems can send panic-inducing alerts even when actual invoices remain unaffected, creating unnecessary administrative burden.
- Microsoft’s support response drew criticism as users reported difficulty reaching human support and some claimed their forum comments were being deleted, raising questions about Azure’s customer communication during service disruptions.
- This follows other recent Azure security and operational issues, including Storm-0501 ransomware attacks and Pentagon concerns about China-based support staff, suggesting potential systemic challenges in Azure’s operational management.
- For cloud architects, this emphasizes the importance of understanding the difference between forecast alerts and actual billing, and maintaining direct billing verification processes rather than relying solely on automated notifications.
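One way to do that direct verification is to query actual (invoiced) cost rather than trusting the forecast engine. A sketch with the azure-mgmt-costmanagement SDK at subscription scope; the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential  # pip install azure-identity
from azure.mgmt.costmanagement import CostManagementClient  # pip install azure-mgmt-costmanagement

client = CostManagementClient(DefaultAzureCredential())

scope = "/subscriptions/<subscription-id>"  # placeholder

# "ActualCost" returns what you've really been charged, independent of
# whatever the forecast/budget-alert engine currently believes.
result = client.query.usage(
    scope,
    {
        "type": "ActualCost",
        "timeframe": "MonthToDate",
        "dataset": {
            "granularity": "Daily",
            "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        },
    },
)

for row in result.rows:
    print(row)
```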
56:26 Generally Available: Azure Ultra Disk Price Reduction
- Azure Ultra Disks now cost less in multiple regions, making sub-millisecond latency storage more accessible for demanding enterprise workloads like SAP HANA, SQL Server, and Oracle databases.
- Ultra Disks deliver up to 160,000 IOPS and 4,000 MB/s throughput per disk with consistent performance, positioning them as Azure’s answer to AWS io2 Block Express and GCP Extreme Persistent Disks.
- The price reduction targets performance-critical applications where storage latency directly impacts business operations, though specific discount percentages weren’t disclosed in the announcement.
- This regional pricing strategy suggests Microsoft is testing market response before potentially expanding discounts to other regions, following similar patterns seen with premium storage tiers.
- Enterprise customers running latency-sensitive workloads should evaluate whether migrating to Central US for Ultra Disk deployments offers meaningful cost savings compared to their current storage configurations.
Closing
And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod