Senior SRE Engineer
-
FlexAI
September 2025 - Present
Building production-grade infrastructure for an AI-as-a-Service platform, orchestrating multi-cloud GPU compute for training, inference, and other AI workloads at scale.
Key Achievements:
- Architected and built stable, zero-maintenance Kubernetes infrastructure spanning multi-cloud and bare metal environments (AWS, Azure, Sesterce, Scaleway)
- Set up and configured FluxCD for GitOps-based continuous deployment, enabling declarative infrastructure management and automated reconciliation across clusters
- Created custom Helm charts for application deployments, standardizing packaging and configuration management across multi-cluster environments
- Designed and delivered a FlexAI-managed, customer-facing product enabling “Bring Your Own Compute” on AWS and Azure: defined the architecture, built core services, and streamlined deployment workflows
- Brought up, configured, and performance-tuned GPU-based compute infrastructure (NVIDIA A10G, H100, AMD MI300), ensuring high reliability and throughput across distributed AI workloads
- Built gRPC services for cross-cluster workload scheduling, enabling efficient orchestration and coordination of AI jobs across geographically distributed Kubernetes clusters
- Designed and implemented custom Kubernetes operators for workload management, automating complex orchestration patterns for AI training and inference jobs
- Integrated and managed GPU Kubernetes operators including nvidia-device-plugin and ROCm gpu-operator, ensuring proper resource allocation and comprehensive monitoring of GPU utilization
- Investigated and deployed low-latency networking solutions and edge cluster serving (Skupper) for distributed AI workloads
- Contributed to a highly scalable data layer with cross-cloud and cross-region support, using JuiceFS for distributed storage across heterogeneous infrastructure
- Established SRE best practices by building robust observability pipelines and monitoring systems, delivering actionable metrics and distributed tracing to accelerate debugging and improve service reliability
- Collaborated closely with runtime teams to ensure AI workloads run smoothly, addressing infrastructure bottlenecks and optimizing resource utilization
- Actively reviewed design documents across teams to ensure cross-functional and business alignment on architectural decisions
Lead Backend/DevOps Engineer
-
Perplex
April 2025 - September 2025
Led backend architecture and infrastructure operations for an on-chain trading platform serving DeFi protocols and traders.
Key Achievements:
- Designed and implemented complete infrastructure architecture from scratch, including network topology, security policies, and scalability frameworks
- Provisioned multi-cloud infrastructure using Terraform across DigitalOcean and OVHcloud, ensuring cost optimization and geographic redundancy
- Integrated Ansible for automated configuration management and deployed Kubernetes clusters with ArgoCD for GitOps-based continuous deployment
- Architected and implemented high-performance trading backend systems in Golang, handling real-time price feeds and order execution
- Built robust on-chain data ingestion pipelines processing millions of transactions daily with sub-second latency requirements
- Established system architecture patterns for high-availability trading infrastructure with zero-downtime deployments
- Implemented comprehensive monitoring and alerting systems for trading operations and infrastructure health
Solution Architect
-
Skilld
August 2021 - April 2025
Designed and implemented scalable backend systems and cloud-native solutions with a focus on distributed architectures.
Key Achievements:
- Architected and developed high-performance microservices in Go using domain-driven design principles
- Built event-driven backend systems using Apache Kafka and NATS, processing millions of messages daily
- Designed and maintained multi-cluster Kubernetes infrastructure supporting 50+ microservices
- Implemented comprehensive API architectures with REST and gRPC services achieving 99.9% uptime
- Developed stream processing pipelines using Apache Spark for real-time data analytics and transformation
- Built monitoring and observability systems using Prometheus, Grafana, and distributed tracing
- Established database optimization strategies with PostgreSQL and Redis, reducing query times by 60%
- Provisioned and managed cloud infrastructure using Terraform on DigitalOcean and OVHcloud
- Implemented GitOps workflows using ArgoCD for automated deployments and configuration management
- Built comprehensive CI/CD pipelines reducing deployment time from hours to minutes
- Implemented SSO integration and security hardening practices for production systems
Blockchain Engineer
-
ChainsAtlas
March 2024 - April 2025
Specialized in blockchain backend development and cross-chain protocol implementation.
Key Projects:
- Architected and implemented multi-chain bridge protocols in Go connecting EVM-compatible chains with XRPL
- Developed smart contracts in Solidity and deployed across multiple EVM blockchains
- Built comprehensive backend systems for smart contract interactions, including automated deployment pipelines
- Implemented cryptographic validation systems for cross-chain asset transfers and multi-signature operations
- Designed blockchain-agnostic APIs enabling seamless protocol deployment across multiple networks
- Built robust transaction processing engines with advanced retry mechanisms and comprehensive error handling
- Developed real-time blockchain monitoring systems with alerting for transaction failures and network issues
- Contributed to RWAFi protocol development, building smart contracts in Solidity for institutional-grade real-world asset infrastructure that bridges DeFi liquidity with traditional finance security
- Worked on blockchain wallet extension development, adding EVM-based blockchain support to enhance multi-chain wallet functionality and cross-chain asset management
Developed scalable data ingestion and processing systems.
Achievements:
- Developed a scalable data ingestion pipeline using Python/Flask and PostgreSQL
- Implemented containerized microservices using Docker Compose and Kubernetes
- Created an automated testing and deployment pipeline using GitHub Actions
- Designed a real-time metrics collection system