Google Next 2026 GKE - 01 Overview, The Big Picture - Google Cloud Premier Partner MegazoneSoft

Part 1 Overview: The Big Picture of GKE at Next 2026

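At Google Cloud Next 2026, the Google Kubernetes Engine (GKE) announcements were tied together under one theme: automating AI/ML infrastructure. New features landed in every area, from inference serving and training acceleration to workload isolation and cross-cluster communication. This post is the index for the series: it sorts the 10 new features covered in the following parts into four axes, namely inference infrastructure, training infrastructure, security and isolation, and networking.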

How the series is organized

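The inference infrastructure axis breaks down bottlenecks on the large language model (LLM) serving path into four stages: gateway, memory, startup time, and autoscaling. The training infrastructure axis groups accelerator communication, storage, and reinforcement learning workload patterns. The security and isolation axis splits into two strands, workload sandboxing and infrastructure sealing, and the networking axis is covered in a bonus installment on Cloud Service Mesh's evolution toward Ambient.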

Figure 1. A map of the 9 new features covered in this series, organized along 4 axes.

Category	Part	Keywords
Inference infrastructure	Part 2	GKE Inference Gateway, KV cache-aware routing, Model Armor
Inference infrastructure	Part 3	KV cache tiering, paged attention, memory tiering
Inference infrastructure	Part 4	Image Streaming, intent-based autoscaling, ProvisioningRequest
Training infrastructure	Part 5	NCCL, RDMA over Converged Ethernet, Rail-aligned, A4X
Training infrastructure	Part 6	Hyperdisk ML, Managed Lustre, Cloud Storage FUSE, Run:ai Model Streamer
Training infrastructure	Part 7	Reinforcement learning workloads, OpenTelemetry, golden signals
Security and isolation	Part 8	Agent Sandbox, gVisor, Hypercluster, sealed configuration
Security and isolation	Part 8	TEE, Titanium Intelligence enclave, NVIDIA Confidential Computing
Networking	Bonus	Cloud Service Mesh, Ambient, sidecar-less, East-West

Inference: from gateway to autoscaling

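Part 2 covers GKE Inference Gateway, which routes on model server metrics such as key-value cache (KV cache) utilization and handles per-LoRA-adapter affinity and Model Armor policies in one place. Part 3 is about tiering that KV cache across graphics processing unit (GPU) memory, host memory, and disk, alongside paged attention and prefix caching. Part 4 reduces the startup time of inference Pods: Image Streaming streams container images, and the cluster autoscaler accepts intent-based capacity requests through the ProvisioningRequest CRD.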
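To make the idea of metric-aware routing concrete, here is a minimal sketch in Python. It is not the gateway's actual algorithm: the `Replica` type, its fields, the `pick_replica` function, and the tie-breaking order are all hypothetical, chosen only to illustrate "prefer a replica that already holds the LoRA adapter, then pick the one with the least KV cache pressure".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical snapshot of one model-server replica's metrics."""
    name: str
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_depth: int             # requests waiting on this replica
    loaded_adapters: set[str] = field(default_factory=set)


def pick_replica(replicas: list[Replica], adapter: str | None = None) -> Replica:
    """Pick a replica, preferring ones that already hold the LoRA adapter."""
    if not replicas:
        raise ValueError("no replicas available")
    # Adapter affinity (assumed policy): narrow to replicas with the adapter loaded.
    candidates = [r for r in replicas if adapter in r.loaded_adapters] if adapter else []
    if not candidates:
        candidates = replicas
    # Among candidates, take the lowest KV cache utilization, then the shortest queue.
    return min(candidates, key=lambda r: (r.kv_cache_utilization, r.queue_depth))


replicas = [
    Replica("pod-a", 0.82, 3, {"summarize"}),
    Replica("pod-b", 0.35, 1, set()),
    Replica("pod-c", 0.60, 0, {"summarize", "translate"}),
]
print(pick_replica(replicas).name)               # pod-b
print(pick_replica(replicas, "summarize").name)  # pod-c
```

Without an adapter the least-loaded replica wins; with an adapter the choice is restricted to replicas that would not need to load it first, which is the trade-off adapter affinity is meant to capture.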

Training: accelerators, storage, and reinforcement learning

Part 5 covers GPU networking in AI Hypercomputer. A4X and A3 Ultra use GPUDirect RDMA over RDMA over Converged Ethernet (RoCE), where RDMA stands for remote direct memory access, and the NVIDIA Collective Communications Library (NCCL) and the Rail-aligned topology unfold across sub-block, block, and cluster tiers. Part 6 covers storage for loading model weights, comparing Cloud Storage FUSE, Managed Lustre, Hyperdisk ML, and Run:ai Model Streamer. Part 7 observes reinforcement learning workloads with OpenTelemetry, capturing the golden signals (latency, traffic, errors, saturation) with metrics such as rl.loop.duration and rl.train.mfu.
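As a rough illustration of what those two metrics might carry, the sketch below records a loop duration and a model FLOPs utilization (MFU) value per iteration. Only the metric names come from the text above; the `MetricLog` class and `record_iteration` function are hypothetical stand-ins for a real exporter, and the MFU estimate assumes the common approximation of about 6 FLOPs per parameter per trained token, which may not match how the actual tutorial computes it.

```python
from __future__ import annotations


def model_flops_utilization(params: float, tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Estimate MFU as achieved training FLOPs divided by hardware peak FLOPs."""
    # Assumption: ~6 FLOPs per parameter per token (forward plus backward pass).
    achieved = 6.0 * params * tokens_per_second
    return achieved / peak_flops_per_second


class MetricLog:
    """Hypothetical in-memory stand-in for a metrics exporter."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = {}

    def record(self, name: str, value: float) -> None:
        self.samples.setdefault(name, []).append(value)


def record_iteration(log: MetricLog, duration_s: float, tokens: int,
                     params: float, peak_flops_per_second: float) -> None:
    """Record one loop iteration: a latency-style and a saturation-style signal."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    log.record("rl.loop.duration", duration_s)
    mfu = model_flops_utilization(params, tokens / duration_s, peak_flops_per_second)
    log.record("rl.train.mfu", mfu)


log = MetricLog()
# Illustrative numbers only: a 7e9-parameter model, 200,000 tokens in 2 seconds,
# on hardware with an assumed aggregate peak of 1e16 FLOP/s.
record_iteration(log, duration_s=2.0, tokens=200_000, params=7e9,
                 peak_flops_per_second=1e16)
print(log.samples["rl.loop.duration"][0])
print(round(log.samples["rl.train.mfu"][0], 2))
```

Reading the duration as a latency signal and MFU as a saturation signal is one plausible mapping onto the golden signals, not necessarily the one Part 7 uses.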

Security: workload isolation and infrastructure sealing

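The security axis is a single installment, Part 8, but it splits into two strands. GKE Agent Sandbox provides workload-level isolation: it runs LLM-generated code with gVisor-based kernel isolation and sub-second provisioning from a warm pool. GKE Hypercluster provides infrastructure-level sealing: linked runners use a separate Virtual Private Cloud (VPC) and OS image, and under sealed configuration a trusted execution environment (TEE) is built from the Titanium Intelligence enclave on Tensor Processing Units (TPUs) and NVIDIA Confidential Computing on GPUs.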
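The warm pool idea can be sketched generically: keep a few sandboxes started in advance so that a request claims a ready one instead of waiting for a cold start. The Python below is a conceptual model only; `WarmPool`, its methods, and the sandbox naming are hypothetical and do not reflect the Agent Sandbox API.

```python
from __future__ import annotations

from collections import deque
from itertools import count


class WarmPool:
    """Hypothetical pool that keeps pre-started sandboxes ready to claim."""

    def __init__(self, target_size: int) -> None:
        self._ids = count(1)
        self._target = target_size
        self._ready: deque[str] = deque()
        self.refill()

    def _cold_start(self) -> str:
        # Stand-in for the slow path: creating and booting an isolated sandbox.
        return f"sandbox-{next(self._ids)}"

    def refill(self) -> None:
        """Top the pool back up to its target size (done off the request path)."""
        while len(self._ready) < self._target:
            self._ready.append(self._cold_start())

    def claim(self) -> tuple[str, bool]:
        """Return (sandbox_id, was_warm); fall back to a cold start when empty."""
        if self._ready:
            return self._ready.popleft(), True
        return self._cold_start(), False


pool = WarmPool(target_size=2)
first = pool.claim()   # served from the warm pool
second = pool.claim()  # served from the warm pool
third = pool.claim()   # pool is empty, so this one pays the cold-start cost
pool.refill()          # replenish in the background for later requests
print(first, second, third)
```

The point of the design is that the expensive start happens in `refill`, away from the request, so a claim against a non-empty pool only hands over an existing sandbox.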

Networking: the service mesh evolves to Ambient

The bonus installment looks at how Cloud Service Mesh handles East-West traffic. Next 2026 emphasized the sidecar-less mode, that is, Ambient mode: applying the mesh's routing and security without attaching an Envoy proxy to every Pod.

How to read this series

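If you operate inference, start with Parts 2 to 4; if you are designing a training cluster, Parts 5 to 7; if you are reviewing your security model, Part 8; and if you are evaluating networking standards, the bonus installment. The second-to-last section of each part, “Connections to this series”, again provides a map to the adjacent parts.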

References

Google Cloud, “AI infrastructure at Next 26”, https://cloud.google.com/blog/ko/products/compute/ai-infrastructure-at-next26/
Google Cloud, “Google Cloud Next 2026 wrap-up”, https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2026-wrap-up
Google Cloud Documentation, “About AI/ML model inference on GKE”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/machine-learning/inference
Google Cloud Documentation, “Best practices for optimizing large language model inference with GPUs on GKE”, https://docs.cloud.google.com/kubernetes-engine/docs/best-practices/machine-learning/inference/llm-optimization
Google Cloud Documentation, “Use Image streaming to pull container images”, https://docs.cloud.google.com/kubernetes-engine/docs/how-to/image-streaming
Google Cloud Documentation, “About GKE cluster autoscaling”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
Google Cloud Documentation, “GPU networking overview”, https://docs.cloud.google.com/ai-hypercomputer/docs/networking-overview
Google Cloud Documentation, “GKE storage overview”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/storage-overview
Google Cloud Documentation, “Monitor reinforcement learning workloads on GKE”, https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/monitor-reinforcement-learning-workloads
Google Cloud Documentation, “About GKE Agent Sandbox”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/machine-learning/agent-sandbox
Google Cloud Documentation, “About GKE Hypercluster”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/hypercluster-overview
Google Cloud Documentation, “GKE Service networking overview”, https://docs.cloud.google.com/kubernetes-engine/docs/concepts/service-networking

