A Complete Guide to Deploying GPU-Based ECS Container Applications Using the AWS CDK


Ever feel like deploying applications is a bit like solving a Rubik’s Cube blindfolded? Well, strap in, because we’re diving into the mystical realm of GPU-based ECS containers. And because life is too short for dull technical guides, we’re adding humor and a sprinkle of sarcasm!

Today, we’ll walk through setting up a GPU cluster, creating GPU containers, and auto-scaling them on custom metrics, all with the AWS CDK. Grab your favorite beverage, and let’s get this party started!

Setting Up GPU Clusters: Where Titans Dwell

First things first, you need a solid foundation: a GPU cluster. Think of it as the ground floor of your tech mansion. If setting up an ECS cluster were easy, every developer would be doing it. Between managing VPCs, fiddling with security groups, and picking the right instance types, you might honestly feel like you’re trying to pull a rabbit out of a hat.

Here’s how we did it painlessly:

Challenges:

  1. Instance Configuration: Which instance types actually support GPUs? (Hint: not your average t2.micro.)
  2. Security Groups: Handling network traffic without accidentally opening your cluster to the entire internet.
  3. User Data Scripts: Initializing instances with the correct configurations for ECS and GPU support.

Solution:

import * as cdk from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';

function createCluster(stack: cdk.Stack, props: any) {
  // Security group for the container instances; outbound traffic only by default.
  const ecsSG = new ec2.SecurityGroup(stack, 'SecurityGroupEcsEc2', {
    vpc: props.vpc,
    allowAllOutbound: true,
  });

  const cluster = new ecs.Cluster(stack, 'EcsCluster', {
    vpc: props.vpc,
    containerInsights: true,
  });

  // Launch template with a GPU instance type and the ECS GPU-optimized AMI.
  const launchTemplate = new ec2.LaunchTemplate(stack, 'LaunchTemplate', {
    launchTemplateName: 'gpu-instance-template',
    instanceType: new ec2.InstanceType('g4dn.4xlarge'), // NVIDIA T4 GPU instance
    machineImage: ecs.EcsOptimizedImage.amazonLinux2(ecs.AmiHardwareType.GPU),
    // Register the instance with our cluster and turn on GPU support in the ECS agent.
    userData: ec2.UserData.custom(`#!/bin/bash
echo ECS_CLUSTER=${cluster.clusterName} >> /etc/ecs/ecs.config
echo "ECS_ENABLE_GPU_SUPPORT=true" >> /etc/ecs/ecs.config
`),
    role: props.iamInstanceRole,
    securityGroup: ecsSG,
    blockDevices: [{
      deviceName: '/dev/xvda',
      volume: ec2.BlockDeviceVolume.ebs(200, {
        encrypted: true,
        volumeType: ec2.EbsDeviceVolumeType.GP3,
      }),
    }],
  });

  // Auto Scaling group that supplies EC2 capacity to the cluster.
  const asg = new autoscaling.AutoScalingGroup(stack, 'ASG', {
    vpc: props.vpc,
    minCapacity: 1,
    maxCapacity: 7,
    desiredCapacity: 1,
    launchTemplate,
    vpcSubnets: props.vpc.selectSubnets({
      subnetGroupName: 'Private',
    }),
  });
  asg.connections.addSecurityGroup(ecsSG);

  // Let ECS manage scaling and instance termination protection for the ASG.
  const capacityProvider = new ecs.AsgCapacityProvider(stack, 'AsgCapacityProvider', {
    autoScalingGroup: asg,
    enableManagedTerminationProtection: true,
    enableManagedScaling: true,
    targetCapacityPercent: 100,
  });

  cluster.addAsgCapacityProvider(capacityProvider);

  return {
    cluster,
    CapacityProvider: capacityProvider,
  };
}
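For context, here is roughly how you might wire this helper into a stack. The VPC layout and the instance role below are assumptions standing in for whatever networking and IAM setup your project already has; treat it as a sketch, not gospel:

import * as iam from 'aws-cdk-lib/aws-iam';

// Hypothetical wiring; the VPC and role stand in for your own setup.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'GpuEcsStack');

const vpc = new ec2.Vpc(stack, 'Vpc', {
  subnetConfiguration: [
    { name: 'Public', subnetType: ec2.SubnetType.PUBLIC },
    { name: 'Private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  ],
});

// Instance role so the ECS agent can talk to the ECS control plane.
const iamInstanceRole = new iam.Role(stack, 'InstanceRole', {
  assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role'),
  ],
});

const gpuCluster = createCluster(stack, { vpc, iamInstanceRole });

Note that the subnet group name 'Private' has to match the name used in createCluster’s selectSubnets call.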

Creating the GPU Container: Giving Life to the Beast

With the cluster configured, think of it as a brand-new fancy oven. Now let’s pour in the batter: this step creates the GPU container that actually does the work.

Challenges:

  1. Logging: If it’s not logged, did it even happen?
  2. Memory and CPU Configuration: You don’t want your container to run out of resources mid-task.
  3. Environment Variables: Your container won’t figure out AWS regions on its own (sad to say, it isn’t sentient… yet).

Solution:

import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as ecr from "aws-cdk-lib/aws-ecr";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as logs from "aws-cdk-lib/aws-logs";

function createEcsService(stack: cdk.Stack, props: any) {
  // ECR repository names must be lowercase.
  const repo = ecr.Repository.fromRepositoryName(
    stack,
    "Repository",
    "gpu-repo-name"
  );
  const image = ecs.ContainerImage.fromEcrRepository(repo, "latest");

  const taskDefinition = new ecs.Ec2TaskDefinition(stack, "TaskDefinition", {
    taskRole: props.taskRole,
    networkMode: ecs.NetworkMode.AWS_VPC,
  });

  const logGroup = new logs.LogGroup(stack, "LogGroup", {
    retention: logs.RetentionDays.ONE_MONTH,
    removalPolicy: cdk.RemovalPolicy.DESTROY,
  });

  const gpuEc2Container = taskDefinition.addContainer("GPUContainer", {
    image,
    memoryReservationMiB: 14336, // soft memory limit: 14 GiB
    cpu: 4096, // 4 vCPUs
    gpuCount: 1, // reserve one GPU for this container
    logging: ecs.LogDriver.awsLogs({
      streamPrefix: "LogStream",
      logGroup,
    }),
    environment: {
      AWS_REGION: "us-east-1",
      NVIDIA_DRIVER_CAPABILITIES: "all",
    },
  });

  gpuEc2Container.addPortMappings({ containerPort: 8080 });

  const ecsSG = new ec2.SecurityGroup(stack, "SecurityGroup", {
    vpc: props.vpc,
    allowAllOutbound: true,
  });

  const ecsService = new ecs.Ec2Service(stack, "GPUService", {
    cluster: props.Cluster.cluster,
    taskDefinition,
    securityGroups: [ecsSG],
    minHealthyPercent: 100,
    maxHealthyPercent: 200,
    desiredCount: 1,
    enableExecuteCommand: true,
    capacityProviderStrategies: [
      {
        capacityProvider: props.Cluster.CapacityProvider.capacityProviderName,
        weight: 1,
        base: 1,
      },
    ],
  });

  return ecsService;
}
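To tie the two steps together, a call like the following would do. The task role here is hypothetical; give it whatever permissions your workload actually needs:

// Hypothetical wiring; reuses the stack, vpc, and gpuCluster from the previous step.
const taskRole = new iam.Role(stack, 'TaskRole', {
  assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
});

const gpuService = createEcsService(stack, {
  vpc,
  taskRole,
  Cluster: gpuCluster, // the { cluster, CapacityProvider } object returned by createCluster
});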

Now you have a container that’s smart enough to find its way around, but not too smart to start an AI uprising.

Auto-Scaling Containers: Because More is More

We’ve set up the cluster and the container. But what happens when your traffic spikes faster than a caffeine-loaded squirrel on a sugar high? You auto-scale! This way, you avoid disaster when Facebook-like traffic hits your Twitter-like application.

Challenges:

  1. Custom Metrics: CPU and Memory are so last season.
  2. Cooldowns and Scaling Steps: You don’t want your auto-scaling to behave like a yo-yo.
  3. Load Balancers: Properly distributing incoming requests like a well-oiled machine.

Solution:

import * as appscaling from "aws-cdk-lib/aws-applicationautoscaling";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";

function setupAutoScaling(stack: cdk.Stack, ecsService: ecs.Ec2Service, props: any) {
  const scaling = ecsService.autoScaleTaskCount({ minCapacity: 1, maxCapacity: 4 });

  // Scale out based on the number of in-flight SQS messages.
  scaling.scaleOnMetric("ScaleOutOnQueueSize", {
    metric: props.SQS.sqsTopic.Queue.metricApproximateNumberOfMessagesNotVisible({
      period: cdk.Duration.minutes(1),
      statistic: "max",
    }),
    adjustmentType: appscaling.AdjustmentType.EXACT_CAPACITY,
    scalingSteps: [
      { change: 2, lower: 150, upper: 300 },
      { change: 3, lower: 300, upper: 450 },
      { change: 4, lower: 450 }, // 450+ messages: run the maximum of 4 tasks
    ],
    cooldown: cdk.Duration.seconds(60),
  });

  // Scale back in as the queue drains.
  scaling.scaleOnMetric("ScaleInOnQueueSize", {
    metric: props.SQS.sqsTopic.Queue.metricApproximateNumberOfMessagesNotVisible({
      period: cdk.Duration.minutes(1),
      statistic: "max",
    }),
    adjustmentType: appscaling.AdjustmentType.EXACT_CAPACITY,
    scalingSteps: [
      { change: 1, lower: 0, upper: 150 },
      { change: 2, lower: 150, upper: 300 },
    ],
    cooldown: cdk.Duration.seconds(60),
  });
}

function setupLoadBalancer(stack: cdk.Stack, ecsService: ecs.Ec2Service, props: any) {
  const targetGroup = new elbv2.ApplicationTargetGroup(stack, "TargetGroup", {
    targets: [ecsService],
    protocol: elbv2.ApplicationProtocol.HTTP,
    vpc: props.vpc,
    port: 80,
    healthCheck: {
      port: "8080", // the container port we mapped earlier
      path: "/",
      healthyThresholdCount: 2,
      unhealthyThresholdCount: 10,
      timeout: cdk.Duration.seconds(4),
      interval: cdk.Duration.seconds(5),
      healthyHttpCodes: "200",
    },
  });

  // Route requests for gpu.example.com to the GPU service.
  new elbv2.ApplicationListenerRule(stack, "ListenerRule", {
    listener: props.Alb.httpslistener,
    priority: 1,
    action: elbv2.ListenerAction.forward([targetGroup]),
    conditions: [elbv2.ListenerCondition.hostHeaders(["gpu.example.com"])],
  });
}
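Assuming an existing ALB with an HTTPS listener (the props.Alb.httpslistener above), wiring the last two helpers together might look like this sketch. The queue, certificate ARN, and listener names are all placeholders for your own resources:

import * as acm from 'aws-cdk-lib/aws-certificatemanager';
import * as sqs from 'aws-cdk-lib/aws-sqs';

// Hypothetical wiring; the queue, certificate ARN, and hostname are placeholders.
const jobQueue = new sqs.Queue(stack, 'JobQueue');

const alb = new elbv2.ApplicationLoadBalancer(stack, 'Alb', { vpc, internetFacing: true });
const httpslistener = alb.addListener('HttpsListener', {
  port: 443,
  certificates: [
    acm.Certificate.fromCertificateArn(stack, 'Cert',
      'arn:aws:acm:us-east-1:123456789012:certificate/placeholder'),
  ],
  // Requests that match no listener rule get a 404 instead of hitting a target.
  defaultAction: elbv2.ListenerAction.fixedResponse(404),
});

// Match the props shapes the helpers above expect.
setupAutoScaling(stack, gpuService, { SQS: { sqsTopic: { Queue: jobQueue } } });
setupLoadBalancer(stack, gpuService, { vpc, Alb: { httpslistener } });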

Now your containers will auto-scale gracefully, without being all “I WANNA SCALE UP! I WANNA SCALE DOWN!” every few minutes.

Closing Thoughts

Deploying GPU-based ECS containers doesn’t have to be a journey through Mordor. With this guide, you’ve got a GPS, a flashlight, and a sense of humor to guide you through. Remember: when in doubt, always blame the network.

Happy deploying! And if anyone asks, just tell them your containers are busier than a one-legged man in a butt-kicking contest.

Note: Always test your infrastructure as code changes in a safe environment before deploying to production.

❤️ Follow Siddhanth Dwivedi aka mafiaguy for more such awesome blogs.
