Hi, I'm

Dhia Gharsallaoui

Senior SRE Engineer - AI Infrastructure

A passionate SRE engineer building production-grade AI infrastructure at scale. I specialize in Kubernetes orchestration, GPU infrastructure, multi-cloud deployments, and high-performance distributed systems for AI workloads.

About Me

A Senior SRE Engineer with 5+ years of experience specializing in AI infrastructure and cloud-native systems. I architect and implement production-grade GPU infrastructure, Kubernetes platforms, and multi-cloud orchestration for AI workloads at scale.

I’m passionate about 🤖 AI infrastructure, ☸️ Kubernetes orchestration, 🎮 GPU computing, and ⚡ high-performance distributed systems. My approach combines deep technical expertise with hands-on implementation of mission-critical AI platforms.

Currently building infrastructure at FlexAI (an AI-as-a-Service platform), architecting multi-cloud GPU compute, and contributing to cloud-native open-source projects.

Technologies I work with:
  • GPU Infrastructure (Nvidia H100, A10G, AMD MI300)
  • Kubernetes Orchestration
  • Kubernetes Operators Development
  • AI Workload Orchestration
  • Training & Inference Pipelines
  • Multi-Cloud GPU Computing
  • Bare Metal GPU Clusters
  • nvidia-device-plugin
  • ROCm gpu-operator
  • Go (Golang) - Expert
  • Python - Production
  • Rust
  • Scala
  • TypeScript
  • Bash/Shell Scripting
  • Terraform IaC
  • Pulumi IaC
  • Ansible Configuration
  • ArgoCD GitOps
  • FluxCD GitOps
  • Helm Charts
  • Docker Containerization
  • AWS
  • Azure
  • Scaleway
  • gRPC Services
  • REST APIs
  • Microservices Architecture
  • Event-Driven Systems
  • Domain-Driven Design
  • High Availability Design
  • Skupper Edge Networking
  • Low-Latency Networking
  • Cross-Cluster Communication
  • Service Discovery
  • Load Balancing
  • Prometheus & Grafana
  • OpenTelemetry
  • Distributed Tracing
  • Loki Log Aggregation
  • Alerting & Incident Response
  • Performance Monitoring
  • SLI/SLO Implementation
  • Apache Kafka
  • NATS
  • PostgreSQL
  • TimescaleDB
  • Redis Caching
  • Apache Druid
  • JuiceFS Distributed Storage
  • Data Modeling
  • GitHub Actions
  • GitLab CI/CD
  • Jenkins Pipelines

Experience

Senior SRE Engineer - FlexAI
September 2025 - Present

Building production-grade infrastructure for an AI-as-a-Service platform, orchestrating multi-cloud GPU compute for training, inference, and other AI workloads at scale.

Key Achievements:

  • Architected and built stable, zero-maintenance Kubernetes infrastructure spanning multi-cloud and bare metal environments (AWS, Azure, Sesterce, Scaleway)
  • Set up and configured FluxCD for GitOps-based continuous deployment, enabling declarative infrastructure management and automated reconciliation across clusters
  • Created custom Helm charts for application deployments, standardizing packaging and configuration management across multi-cluster environments
  • Designed and delivered a FlexAI-managed, customer-facing product enabling “Bring Your Own Compute” on AWS and Azure: defined the architecture, built core services, and streamlined deployment workflows
  • Brought up, configured, and performance-tuned GPU-based compute infrastructure (Nvidia A10G, H100, AMD MI300), ensuring high reliability and throughput across distributed AI workloads
  • Built gRPC services for cross-cluster workload scheduling, enabling efficient orchestration and coordination of AI jobs across geographically distributed Kubernetes clusters
  • Designed and implemented custom Kubernetes operators for workload management, automating complex orchestration patterns for AI training and inference jobs
  • Integrated and managed GPU Kubernetes operators including nvidia-device-plugin and ROCm gpu-operator, ensuring proper resource allocation and comprehensive monitoring of GPU utilization
  • Investigated and deployed low-latency networking solutions and edge cluster serving (Skupper) for distributed AI workloads
  • Contributed a highly scalable data layer with cross-cloud and cross-region support, using JuiceFS for distributed storage across heterogeneous infrastructure
  • Established SRE best practices by building robust observability pipelines and monitoring systems, delivering actionable metrics and distributed tracing to accelerate debugging and improve service reliability
  • Collaborated closely with runtime teams to ensure AI workloads run smoothly, addressing infrastructure bottlenecks and optimizing resource utilization
  • Actively reviewed design documents across teams to ensure cross-functional and business alignment on architectural decisions

Lead Backend/DevOps Engineer - Perplex
April 2025 - September 2025

Led backend architecture and infrastructure operations for an on-chain trading platform serving DeFi protocols and traders.

Key Achievements:

  • Designed and implemented complete infrastructure architecture from scratch, including network topology, security policies, and scalability frameworks
  • Provisioned multi-cloud infrastructure using Terraform across DigitalOcean and OVHcloud, ensuring cost optimization and geographic redundancy
  • Integrated Ansible for automated configuration management and deployed Kubernetes clusters with ArgoCD for GitOps-based continuous deployment
  • Architected and implemented high-performance trading backend systems in Golang, handling real-time price feeds and order execution
  • Built robust on-chain data ingestion pipelines processing millions of transactions daily with sub-second latency requirements
  • Established system architecture patterns for high-availability trading infrastructure with zero-downtime deployments
  • Implemented comprehensive monitoring and alerting systems for trading operations and infrastructure health

Solution Architect - Skilld
August 2021 - April 2025

Designed and implemented scalable backend systems and cloud-native solutions with a focus on distributed architectures.

Key Achievements:

  • Architected and developed high-performance microservices in Go using domain-driven design principles
  • Built event-driven backend systems using Apache Kafka and NATS, processing millions of messages daily
  • Designed and maintained multi-cluster Kubernetes infrastructure supporting 50+ microservices
  • Implemented comprehensive API architectures with REST and gRPC services achieving 99.9% uptime
  • Developed stream processing pipelines using Apache Spark for real-time data analytics and transformation
  • Built monitoring and observability systems using Prometheus, Grafana, and distributed tracing
  • Established database optimization strategies with PostgreSQL and Redis, reducing query times by 60%
  • Provisioned and managed cloud infrastructure using Terraform on DigitalOcean and OVHcloud
  • Implemented GitOps workflows using ArgoCD for automated deployments and configuration management
  • Built comprehensive CI/CD pipelines reducing deployment time from hours to minutes
  • Implemented SSO integration and security hardening practices for production systems

Blockchain Engineer - ChainsAtlas
March 2024 - April 2025

Specialized in blockchain backend development and cross-chain protocol implementation.

Key Projects:

  • Architected and implemented multi-chain bridge protocols in Go connecting EVM-compatible chains with XRPL
  • Developed smart contracts in Solidity and deployed across multiple EVM blockchains
  • Built comprehensive backend systems for smart contract interactions, including automated deployment pipelines
  • Implemented cryptographic validation systems for cross-chain asset transfers and multi-signature operations
  • Designed blockchain-agnostic APIs enabling seamless protocol deployment across multiple networks
  • Built robust transaction processing engines with advanced retry mechanisms and comprehensive error handling
  • Developed real-time blockchain monitoring systems with alerting for transaction failures and network issues
  • Contributed to RWAFi protocol development, building smart contracts in Solidity for institutional-grade real-world asset infrastructure that bridges DeFi liquidity with traditional finance security
  • Worked on blockchain wallet extension development, adding EVM-based blockchain support to enhance multi-chain wallet functionality and cross-chain asset management

Software Engineer - idatase GmbH
January 2024 - March 2024

Developed scalable data ingestion and processing systems.

Achievements:

  • Developed scalable data ingestion pipeline using Python/Flask and PostgreSQL
  • Implemented containerized microservices using Docker Compose and Kubernetes
  • Created automated testing and deployment pipeline using GitHub Actions
  • Designed real-time metrics collection system

Education

2020 - 2021
Master's Degree in Computer Science
MINES ParisTech
Minor in Mathematics. Focus on Distributed Systems and Cloud Computing.

2018 - 2021
Engineer's Degree in Industrial Engineering
ENIT
Minor in Data Science. Specialization in Big Data Analytics.

Projects

Mage-ai
Python Apache Druid LDAP
Implemented LDAP authentication, Apache Druid integration, and a centralized logging system for this open-source data pipeline tool with 8K+ GitHub stars.

Go-mail
Go OpenPGP Middleware
Developed a middleware architecture pattern and implemented OpenPGP encryption middleware with comprehensive test coverage.

Go-RTE
Go API Client Library
Created a Go client library for RTE APIs that provides a clean, uniform way to interact with the different API endpoints.

AuthGuard
Go Nginx Authentication Caching Monitoring
Lightweight, high-performance authentication service designed for nginx's auth_request module. Provides composable authentication with pluggable providers, built-in caching, and comprehensive monitoring.

Go ElevenLabs
Go Text-to-Speech API Client Streaming
Production-grade Go client library for the ElevenLabs Text-to-Speech API. Built with idiomatic Go practices, comprehensive error handling, and full support for streaming audio generation.

Kratos Admin UI
React TypeScript Ory Kratos Identity Management
A modern, responsive admin interface for the Ory Kratos identity management system. Features identity management, session monitoring, an analytics dashboard, and schema inspection.

Get in Touch

Interested in collaboration or have a project in mind? Feel free to reach out!