👋 Hello! I'm Dhruv. Nice to meet you! 😁
I'm a CKA & ICA Certified DevOps/SRE/Platform Engineer with over 4 years of hands-on experience in Kubernetes, Docker, and AWS cloud services. I specialize in Istio, K8s Operators, optimizing workflows, and implementing cost-saving strategies.
About me 🤷🏻‍♂️
I’m a DevOps Engineer with a Bachelor's degree from Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT). Originally from Porbandar, Gujarat, I am currently based in Pune. I am passionate about automation, optimization, and enhancing system reliability. Outside of work, I enjoy swimming 🏊🏻, hitting the gym 🏋🏻, playing the guitar 🎸, and diving into self-help and mythology books 📚.
Feel free to explore my Blogs, check out my Certifications on Credly, or Book a 1-on-1 on Topmate 😄.
Skills 🧑🏻‍💻
Languages
- Python
- Bash - Zsh
- Go
- C++
Cloud Tech
- Amazon Web Services
- DigitalOcean
- Kubernetes
- Docker
- Istio (Service Mesh)
- Cloudflare
Automations
- Helm
- Terraform
- AWS CloudFormation
CI | CD
- GitLab Pipelines
- Jenkins
- GitLeaks
Logging and Monitoring
- Prometheus
- Alertmanager
- Grafana
- PagerDuty
- Datadog
- Sumo Logic
- Loki (Log Aggregation)
Soft Skills
- Flexibility and Adaptability
- Teamwork
- Proactivity
- Critical Thinking
- Problem Solving
- Always Learning
Tools & Systems
- Linux
- Git
- GitHub
- Slack Workflows
- Jira
- WordPress
- Apache Airflow
- Infisical
- Microsoft Entra ID
Experience 🚀
- Senior DevOps Engineer
ZZAZZ AI
• Kubernetes Architecture & Observability:
  - Owned and architected a secure Kubernetes platform from scratch across 2 environments, supporting ~150 microservices and serving ~15K RPS per cluster; introduced RBAC, network policies, namespace isolation, and multi-region clusters for GDPR compliance and release safety.
  - Drove an OSS-first platform strategy over vendor tooling across observability, CI/CD, certificate management, and orchestration; adopted Helm, cert-manager, Kubernetes Operators, and the LGTM stack, cutting licensing costs and avoiding vendor lock-in.
  - Built the platform-wide observability stack from the ground up using Prometheus, Grafana, Alertmanager, and Loki; delivered team-specific alerting and dashboards covering nodes, pods, workloads, compute, and networking.
  - Standardized deployment patterns for 150+ services with reusable Helm templates, reducing setup overhead and powering scalable self-service delivery.
• Self-Hosted GitLab & CI/CD at Scale:
  - Established and operationalized self-hosted GitLab for 50+ users, ~300 projects, and 20k+ pipeline runs per month; transferred 250+ repositories from Bitbucket with zero data loss and added DR automation with RPO ~24h / RTO ~1h.
  - Streamlined CI/CD on Kubernetes-based runners, cutting container image build times by 60–70%, adding GitLeaks security scans, and introducing deployment guardrails such as central kill switches for safer releases.
• ZTNA & Infrastructure Security:
  - Architected centralized identity and access management by rolling out SSO and SCIM across 45+ tools and applications, standardizing authentication and automating user lifecycle management organization-wide.
  - Strengthened platform access controls by rolling out Cloudflare ZTNA across 500+ droplets, onboarding staging and production services behind private-network access patterns.
  - Enabled safer developer access by adding Cloudflared + ZTNA authenticated test endpoints for Kubernetes services and rolling out Cloudflare Warp for 35+ developers and contractors.
  - Unified multi-cloud operations across DigitalOcean, AWS, and Cloudflare with OAuth-enabled tooling and resource tagging for governance and cost accountability.
• Argus — AI Ops Chatbot:
  - Developed Argus, an internal AI Ops chatbot on Kubernetes, Prometheus, and Grafana MCP servers, reducing MTTR by 75% and unlocking self-serve cluster debugging.
  - Launched an admin dashboard for chatbot monitoring, token cost analytics, and usage insights.
• Cloudflare DNS & Workers:
  - Simplified ingress and edge routing with dynamic DNS allocation and a single shared load balancer, eliminating 60+ load balancers and enabling faster, standardized ingress creation.
  - Transitioned 15+ frontend services to Cloudflare Workers and Pages, strengthening edge delivery and standardizing end-to-end CI/CD automation for frontend releases.
Also served as a 24x7 on-call engineer across cloud, platform, and infrastructure, supporting developer enablement, production incident response, and end-to-end platform ownership.
- Software Development Engineer-2 (DevOps)
Mindtickle
• Istio Service Mesh:
  - Reduced inter-AZ data transfer costs by 70% through Istio locality-aware traffic management and expanded mesh visibility with Grafana dashboards for both control plane and data plane observability.
  - Delivered zero-downtime Istio upgrades in production by introducing a canary upgrade strategy and transitioning mesh deployments to Helm-based version control.
  - Operated Istio service mesh across multiple clusters serving 500+ microservices, strengthening traffic control and production reliability for platform-managed services.
• Cost-Saving Initiatives:
  - Replatformed Kubernetes workloads onto AWS Graviton-based instances through ARM image build and deployment pipelines, generating $270,000 in savings.
  - Automated EC2 and EBS cleanup to remove redundant capacity, driving $100,000+ annual savings.
• Isolated Sandbox Testing:
  - Designed an isolated sandbox testing model for Kubernetes-native applications based on Shift Left Testing, driving a 40% reduction in error rates before production rollout.
  - Used Kubernetes Operators and Istio reliability features to make the solution repeatable and production-like, with plans to open-source the approach.
• Chaos Testing Enhancements:
  - Engineered an in-house chaos testing mechanism using Kubernetes Operators and Istio fault injection to validate resiliency across 50+ microservices.
  - Extended resilience testing to infrastructure level with AWS Fault Injection Simulator on clusters spanning 150+ nodes.
• Repository Migration:
  - Developed migration and validation tooling to move 1000+ repositories from GitHub to GitLab across 5 language ecosystems, saving around $40,000 annually.
• Slack Workflows and Automation:
  - Automated production escalation setup with a Slack workflow that integrated Jira, PagerDuty, and Google Meet, reducing manual coordination during incidents.
  - Implemented a custom Slackbot for access management and routine developer tasks; cut manual workload by 30% and saved the team 50+ hours monthly.
• DevOps Help-Desk:
  - Developed an internal portal to manage and route developer on-call requests, supporting 400+ requests/month through Jira Service Management and Slack integrations.
  - Introduced SLAs and reporting for recurring issue analysis, streamlining support prioritization and reducing tickets by 10% month-over-month.
- Platform Architect (12,000+ Sites)
Freelance Project
• Platform Architecture & Cost Optimization:
  - Designed and delivered a Kubernetes-based hosting platform for 12,000+ internal websites, reducing hosting costs by ~$25k/month, from ~$30k to ~$5k.
  - Engineered the platform's database layer, observability, security, and access controls end-to-end, with automated DNS, SSL certificate management, and monitoring at 12,000-site scale.
  - Optimized traffic-based performance with Redis + local caching and autoscaling.
• Automation & Operational Tooling:
  - Developed automation tooling for bulk content push and theme/plugin management at scale using custom scripts and WP-CLI.
  - Exposed API endpoints to automate operational website actions such as article publishing and plugin-driven workflows across the fleet.
  - Coordinated delivery for demo and client-critical sites, handling access, plugin procurement, timelines, and end-to-end execution.