At goML, we design and build cutting-edge Generative AI, AI/ML, and Data Engineering solutions that empower businesses to unlock the full potential of their data, drive intelligent automation, and deliver transformative AI-powered experiences. Our mission is to bridge the gap between advanced AI research and real-world enterprise applications, helping organizations innovate faster, make smarter decisions, and scale seamlessly.
We are looking for a Senior Cloud & DevOps Engineer with deep expertise in AWS, cloud architecture, and large-scale infrastructure management. In this role, you will lead the design, implementation, and optimization of secure, scalable, and high-performance cloud platforms that power mission-critical AI/ML and GenAI workloads.
Why You? Why Now?
As AI adoption accelerates across industries, building resilient, automated, and cost-efficient cloud infrastructure is more critical than ever. This role is ideal for someone who thrives on solving complex cloud challenges, driving DevOps excellence, and enabling engineering teams to deliver at scale with speed, reliability, and confidence.
What You’ll Do (Key Responsibilities)
First 30 Days: Strategic Onboarding
- Gain a deep understanding of goML’s AI/ML and GenAI architecture, pipelines, and infrastructure
- Assess existing cloud environments, DevOps practices, and deployment workflows
- Identify gaps, risks, and opportunities for optimization across infrastructure and processes
- Collaborate with engineering and leadership teams to align on priorities and roadmap
First 60 Days: Leadership & Execution
- Architect, design, and manage scalable AWS infrastructure across ECS, EKS, Lambda, EC2, VPC, S3, and API Gateway
- Lead the implementation and optimization of CI/CD pipelines (Jenkins, GitHub Actions, AWS CodePipeline)
- Establish Infrastructure as Code (IaC) standards using Terraform, AWS CDK, or CloudFormation
- Enable and optimize AI/ML workloads using services like Amazon Bedrock and Amazon SageMaker
- Drive automation across infrastructure, deployments, and operational workflows
- Implement robust monitoring, alerting, and observability frameworks
First 180 Days: Ownership & Scale
- Own and evolve the DevOps and cloud architecture strategy for enterprise-scale AI deployments
- Drive performance optimization, cost efficiency, and system resilience initiatives
- Establish best practices for Kubernetes and container orchestration at scale
- Strengthen cloud security, governance, and compliance frameworks
- Mentor junior engineers and elevate DevOps maturity across teams
- Lead incident management, root cause analysis, and long-term reliability improvements
What You Bring (Qualifications & Skills)
Must-Have
- 8+ years of experience in Cloud/DevOps engineering roles
- Strong expertise in AWS cloud architecture and services (ECS, EKS, Lambda, EC2, VPC, API Gateway, S3, CloudWatch, Load Balancers)
- Proven experience designing and managing large-scale, production-grade cloud systems
- Deep hands-on experience with Infrastructure as Code (Terraform, AWS CDK, or CloudFormation)
- Strong proficiency in Docker and Kubernetes (designing and managing clusters at scale)
- Expertise in building and optimizing CI/CD pipelines
- Strong scripting/programming skills (Python, Bash)
- Solid understanding of cloud security, networking, observability, and reliability engineering
- Excellent problem-solving, stakeholder communication, and leadership skills
Nice-to-Have
- AWS Certified DevOps Engineer or AWS Certified Solutions Architect (Professional level preferred)
- Experience with AI/ML infrastructure (Amazon Bedrock, Amazon SageMaker)
- Certified Kubernetes Administrator (CKA)
- Experience with FinOps (cost optimization strategies)
- Exposure to multi-cloud or hybrid cloud environments
Why Work With Us?
- Remote-first culture with collaboration opportunities in Coimbatore
- Work on cutting-edge AI/ML & GenAI infrastructure challenges
- High ownership and impact in building enterprise-grade systems
- Competitive compensation, growth opportunities, and ESOPs