2024-10-09

FinOps in Practice: 38% AWS Bill Reduction in 6 Weeks

How systematic cost analysis and targeted infrastructure changes cut a client's AWS spend by $47.5K monthly. A practical breakdown of what actually worked.

The Problem

Our client—a mid-sized SaaS platform—had been scaling aggressively for 18 months. Their AWS bill hit $125K per month, and finance started asking uncomfortable questions. When they reached out to LavaPi, they expected we'd find some quick wins and call it a day. We did find wins, but the real issue ran deeper: no one was systematically watching infrastructure costs, and architectural decisions weren't informed by actual spend data.

The first meeting revealed the core problem: developers weren't aware of the cost implications of their choices of instance type, storage class, or data transfer pattern. There was no visibility into which team was spending how much on which service, and no automated way to catch waste.

The Audit: Finding the Money

Reserved Instance Underutilization

We ran their CloudTrail logs and cost allocation tags through custom analysis. The first finding: they'd purchased $60K in Reserved Instances but were only using 58% of capacity. Another $8K in RIs sat completely idle because workloads had shifted.

python
# Simple cost tracker using boto3 and CE API
import boto3
from datetime import datetime, timedelta

ce = boto3.client('ce')
start = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
end = datetime.now().strftime('%Y-%m-%d')

response = ce.get_cost_and_usage(
    TimePeriod={'Start': start, 'End': end},
    Granularity='DAILY',
    Metrics=['UnblendedCost'],
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'SERVICE'},
        {'Type': 'TAG', 'Key': 'Environment'}
    ]
)

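# Print each day's cost per (service, Environment tag) group.
# The API returns amounts as strings, so convert before formatting.
for result in response['ResultsByTime']:
    day = result['TimePeriod']['Start']
    for group in result['Groups']:
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"{day} {group['Keys']}: ${amount:,.2f}")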

We right-sized their RIs against actual usage patterns, converting the unused capacity to Savings Plans (more flexible) and canceling redundant purchases. That alone saved $18K monthly.
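The utilization check behind this finding can be sketched roughly as follows. This is a minimal illustration, not the analysis we ran for the client: it assumes a response dictionary shaped like the output of the Cost Explorer `get_reservation_utilization` call, and the function name, the 80% threshold, and the sample numbers (other than the 58% figure quoted above) are hypothetical.

```python
def low_utilization_periods(response, threshold_pct=80.0):
    """Return (period_start, utilization_pct, unused_hours) for every
    period whose RI utilization falls below threshold_pct."""
    flagged = []
    for period in response.get("UtilizationsByTime", []):
        total = period["Total"]
        # Cost Explorer returns numeric values as strings
        pct = float(total["UtilizationPercentage"])
        if pct < threshold_pct:
            flagged.append(
                (period["TimePeriod"]["Start"], pct, float(total["UnusedHours"]))
            )
    return flagged


# Hypothetical sample data, shaped like a get_reservation_utilization response.
# With live data this would come from something like:
#   boto3.client("ce").get_reservation_utilization(
#       TimePeriod={"Start": "2024-08-01", "End": "2024-10-01"},
#       Granularity="MONTHLY")
sample_response = {
    "UtilizationsByTime": [
        {
            "TimePeriod": {"Start": "2024-08-01", "End": "2024-09-01"},
            "Total": {"UtilizationPercentage": "58", "UnusedHours": "3024"},
        },
        {
            "TimePeriod": {"Start": "2024-09-01", "End": "2024-10-01"},
            "Total": {"UtilizationPercentage": "91", "UnusedHours": "648"},
        },
    ]
}

flagged_periods = low_utilization_periods(sample_response)
for start, pct, unused in flagged_periods:
    print(f"{start}: {pct:.0f}% utilized, {unused:,.0f} unused RI hours")
```

A threshold like this is a judgment call. A lower value surfaces only badly mismatched reservations, while a higher one flags anything short of near-full use.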

NAT Gateway Overprovisioning

Their architecture had data flowing through three NAT gateways in production when one would suffice. NAT processing charges were running $12K per month. A redesign using a shared gateway and VPC endpoints for S3 cut this to $2,100.

bash
# Check NAT gateway data volume (bytes out to destination), which drives data processing charges
aws cloudwatch get-metric-statistics \
  --namespace AWS/NatGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-12345 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-31T23:59:59Z \
  --period 86400 \
  --statistics Sum
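To turn the byte counts from a query like the one above into a dollar figure, a rough estimate can be sketched as below. It assumes a data processing rate of $0.045 per GB (the us-east-1 list price at the time of writing; check your own region), treats a GB as 1024³ bytes, and uses a hypothetical function name and sample datapoints. Because it sums only one metric, it is likely a lower bound on the real processing charge.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def estimate_nat_processing_cost(datapoints, rate_per_gb=0.045):
    """Estimate NAT data processing charges from CloudWatch datapoints.

    datapoints: list of dicts with a 'Sum' key in bytes, as found in the
    'Datapoints' list of a get-metric-statistics response.
    rate_per_gb: assumed per-GB processing price in USD.
    """
    total_bytes = sum(dp["Sum"] for dp in datapoints)
    return total_bytes / BYTES_PER_GB * rate_per_gb


# Hypothetical daily sums: 100 GB and 50 GB
sample_datapoints = [
    {"Sum": 100 * BYTES_PER_GB},
    {"Sum": 50 * BYTES_PER_GB},
]

estimated_cost = estimate_nat_processing_cost(sample_datapoints)
print(f"Estimated NAT processing charge: ${estimated_cost:,.2f}")
```

S3 traffic routed through a gateway VPC endpoint does not pass through the NAT gateway at all, which is why moving it there removes those bytes from this estimate.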

Compute Right-Sizing

They were running t3.xlarge instances for background jobs that only needed t3.small. Their data warehouse was running 24/7 even though it was only queried during business hours. By implementing scheduled stopping and moving batch jobs to smaller instances, we cut EC2 costs from $38K to $28K monthly.
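The scheduling logic for a business-hours-only workload can be sketched as follows. The hours (08:00 to 18:00, Monday to Friday) and the function names are assumptions for illustration, not the client's actual schedule. In practice the decision function would be called from a scheduled job that starts or stops the relevant instances.

```python
from datetime import datetime


def should_be_running(now, start_hour=8, end_hour=18):
    """True if `now` falls inside the assumed business-hours window
    (weekdays, start_hour inclusive to end_hour exclusive)."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start_hour <= now.hour < end_hour


def scheduled_fraction(start_hour=8, end_hour=18, days_per_week=5):
    """Fraction of the week the workload runs under the schedule,
    compared with running 24/7."""
    return (end_hour - start_hour) * days_per_week / (24 * 7)


fraction = scheduled_fraction()
print(f"Running {fraction:.0%} of the week instead of 100%")
print(should_be_running(datetime(2024, 10, 9, 10, 0)))   # a Wednesday morning
print(should_be_running(datetime(2024, 10, 12, 10, 0)))  # a Saturday morning
```

The compute saving is roughly proportional to the hours removed, though storage attached to stopped instances continues to be billed.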

Unused Data Transfer

Cross-region replication was enabled for "disaster recovery purposes" but never actually tested. Removing it and implementing a proper (and cheaper) backup strategy saved $7K monthly.

Implementation and Governance

Cost reduction only sticks if teams stay accountable. We set up three things:

  1. Cost allocation tags on every resource—no exceptions. This gives visibility into which team, environment, and project is spending what.

  2. Weekly cost reports sent to engineering leads, showing trends and unusual spikes.

  3. Cost-aware architecture decisions. We documented the per-unit costs of common services (per GB of data transfer, per million requests, etc.) so developers could make informed tradeoffs during design.

typescript
// Example: cost-aware caching decision in TypeScript
const cacheConfig = {
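  // Approximate list prices at the time of writing; check current AWS pricing for your region.
  // DynamoDB: about $0.25 per million read request units (on-demand)
  // CloudFront: about $0.085 per GB transferred out, plus a small per-request fee
  // Decision: use CloudFront for frequently accessed assets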
  ttl: 86400, // 24 hours
  strategy: 'cloudfront' // not direct DynamoDB
};
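The first two governance items, tag enforcement and spike reporting, can be sketched in a few lines as well. This is an illustrative outline under stated assumptions: the required tag keys, the 25% spike threshold, and both function names are hypothetical, and a real report would pull its numbers from Cost Explorer rather than a hard-coded list.

```python
from statistics import mean

# Hypothetical tagging policy: every resource must carry these keys
REQUIRED_TAGS = {"Team", "Environment", "Project"}


def missing_required_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys absent from a resource's tag dict."""
    return sorted(required - set(resource_tags))


def flag_cost_spikes(weekly_costs, threshold=1.25):
    """Return indices of weeks whose cost exceeds `threshold` times the
    average of all preceding weeks."""
    spikes = []
    for i in range(1, len(weekly_costs)):
        baseline = mean(weekly_costs[:i])
        if weekly_costs[i] > threshold * baseline:
            spikes.append(i)
    return spikes


# Hypothetical inputs
untagged = missing_required_tags({"Team": "payments"})
spike_weeks = flag_cost_spikes([100, 105, 98, 160])

print(f"Missing tags: {untagged}")
print(f"Weeks flagged as spikes: {spike_weeks}")
```

Comparing each week against the running average keeps the check simple, but a team with strong seasonality would probably want a rolling window instead.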

Results

In six weeks, their bill dropped from $125K to $77.5K—a 38% reduction. The bigger win: they now understand their costs and can make smart decisions about scaling without assuming AWS bills always climb.
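As a quick check on the arithmetic, the figures quoted in this article can be reconciled as below. Only numbers stated above are used, and the labels are shorthand for the four audit findings. The itemized findings add up to a little less than the headline drop, and the remainder is not broken out here.

```python
bill_before = 125_000
bill_after = 77_500

total_saved = bill_before - bill_after            # 47_500
reduction_pct = total_saved / bill_before * 100

# Monthly savings per finding, taken from the audit sections above
itemized = {
    "Reserved Instance right-sizing": 18_000,
    "NAT gateway redesign": 12_000 - 2_100,
    "EC2 right-sizing and scheduling": 38_000 - 28_000,
    "Cross-region replication removal": 7_000,
}
itemized_total = sum(itemized.values())           # 44_900

print(f"Total monthly reduction: ${total_saved:,} ({reduction_pct:.0f}%)")
print(f"Itemized findings:       ${itemized_total:,}")
print(f"Not itemized:            ${total_saved - itemized_total:,}")
```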

FinOps isn't just about finding hidden waste. It's about making cost visible, embedding it into decisions, and treating infrastructure efficiency the same way you'd treat code quality.


LavaPi Team

Digital Engineering Company
