
Onesmus Dzidzai Maenzanise

Cloud Infrastructure & Software Engineer

"I build and maintain production systems that cannot afford to fail."

System Status: Operational
System Uptime: 99.9%+
Daily Throughput: 300K+
Incident Response: <15 min
Integration Nodes: 12
System Load: 82%
Cloud Ready | PCI DSS Compliant | Distributed Systems | Observability | Kubernetes | ArgoCD | Grafana
// System Profile

Production Systems Overview

I operate in live production environments, not isolated development sandboxes. My work centers on maintaining reliability, observability, and fault handling across distributed systems that process hundreds of thousands of transactions daily.

I specialize in stabilizing real-world systems under load, managing cross-organization integrations, and ensuring that every component, from API gateways to database clusters, performs predictably at scale.

My approach combines disciplined software engineering with infrastructure operations, treating production environments as the single source of truth.

System Specifications
Environment: Production Systems
Domain: Fintech / Telecom
Scale: High-volume production systems
Reliability Focus: Observability, Fault Tolerance, RCA
Compliance: PCI DSS, Secure Design
// Daily Tooling

Production Stack

Tools I operate daily in production environments. Every component is battle-tested under real traffic and real incidents.

Kubernetes
Operational

Container orchestration across EKS clusters. Pod management, rollouts, resource scaling, and health checks in production.

Clusters: Production
Deployments: Rolling
Health: Stable
ArgoCD
Operational

GitOps delivery for all production deployments. Declarative sync policies, automated rollbacks, and application health monitoring.

Sync Status: Healthy
Strategy: GitOps
Rollbacks: Automated
Grafana
Operational

Unified observability dashboards for system health, application performance, and infrastructure metrics across all environments.

Dashboards: Custom
Data Sources: Multiple
Alerts: Configured
Prometheus
Operational

Metrics collection and alerting for distributed systems. Custom alert rules, time-series analysis, and incident trigger pipelines.

Collection: Real-time
Alert Rules: Custom
Retention: Long-term
AWS Lambda
Operational

Serverless compute for event-driven workloads. Integrated with API Gateway, SQS, and S3 for scalable production processing.

Runtimes: Node.js / Go
Triggers: Event-driven
Scale: Auto
Terraform
Operational

Infrastructure as Code for provisioning and managing cloud resources. Modular configurations, state management, and pipeline-driven deployments.

Provider: AWS
State: Remote
Workflow: CI/CD
// System Modules

Capabilities

Each capability is a production-tested system module, battle-hardened across fintech and telecom environments that handle real traffic and real failures.

  • AWS Lambda - serverless compute
  • Docker / Kubernetes - container orchestration
  • Cloud hosting - multi-environment deployments
  • CI/CD pipelines - automated build & deploy
  • Deployment automation - zero-downtime releases
  • Infrastructure troubleshooting - root cause analysis
  • Golang / Python / Node.js / Java - polyglot engineering
  • REST / GraphQL APIs - service interfaces
  • Integration systems - cross-platform connectivity
  • PostgreSQL / MySQL / MongoDB - relational & document stores
  • Performance tuning - query optimization, indexing
  • Data pipelines - ETL, streaming, batch processing
  • PCI DSS - compliant payment environments
  • OAuth2 / JWT - authentication & authorization
  • Secure system design - defense in depth
  • Incident response - real-time production triage
  • Root Cause Analysis - systematic failure investigation
  • Monitoring dashboards - metrics, logs, traces
  • Kubernetes cluster deployments and container orchestration in production
  • Deployment pipelines, rollback strategies, and uptime management
  • Pod scaling, resource management, and health checks
  • GitOps-based deployment workflows using ArgoCD
  • Declarative application delivery to production infrastructure
  • Sync policies, rollback management, and deployment observability
  • Grafana dashboards for system health and performance monitoring
  • Metrics visualization across distributed services
  • Alert configuration and incident response support
// Production History

Experience

Jan 2024 – Present

Intermediate Software Engineer

APS Holdings

  • Supported and maintained cloud-based payment and integration platforms at production scale
  • Second-line incident response: investigating failed transactions, degraded integrations, and performance issues
  • Led integrations between internal platforms and external partner systems
  • Built and maintained Spring Boot microservices for partner-facing API integrations
  • Took technical ownership of cross-organization integration workstreams involving telecom partners
  • Worked within PCI DSS requirements, managing authentication, access control, and audit needs
  • Contributed to CI/CD pipelines and containerized deployments
Same company
Jan 2024 – Jan 2025

Junior Software Engineer

APS Holdings

  • Developed and supported Golang-based backend services in live production environments
  • Assisted with production incident investigations and post-incident fixes
  • Worked with Kubernetes-based deployment pipelines, improving deployment speed and rollback reliability
  • Supported operational dashboards and reporting used by finance and operations teams
May 2023 – Dec 2023

Software Engineer

CartShare (Remote)

  • Developed backend services and integrations for e-commerce payment and data systems
  • Built and supported AWS Lambda-based workflows and analytics pipelines
  • Implemented automated deployment workflows to reduce time-to-market and deployment errors
  • Supported operational issues related to data consistency and transaction processing
Sep 2022 – Jan 2024

Software Engineer in Training

WeThinkCode_

  • Built full-stack applications using Python, Java, JavaScript, and SQL
  • Worked in agile teams delivering production-style projects with testing and documentation expectations
  • Achieved high test coverage and developed disciplined debugging and review practices
// Engineering Doctrine

How I Think About Systems

01

Systems must fail safely, not silently.

A failure you know about is a failure you can respond to. Silent failures cascade into outages. Every component should degrade gracefully and report its state.

02

Observability is not optional.

You can’t fix what you can’t observe. Metrics, logs, and traces are what let you operate a production system, and they are also how you judge whether a system is ready for production in the first place. If a component isn’t observable, it isn’t ready for production.

03

Complexity must be controlled, not eliminated blindly.

A distributed system will never have zero complexity, so the goal is managed complexity: clear boundaries, explicitly defined interfaces, and deliberate trade-offs.

04

Production is the only real environment.

Development and staging are approximations of production. Production is where your system meets real load, real data, and real failures, and where you learn how it actually behaves. Design your system with production in mind.

05

Reliability is a feature, not an afterthought.

Every architectural decision has reliability implications. Redundancy, retry strategies, circuit breakers, and backpressure are not optional add-ons; they are core design requirements.

// System Evolution

Education

B.S. Software Development

Expected 2026

BYU-Idaho

Bachelor's degree program in Software Development, focused on software engineering principles, data structures, and system design.

National Certificate: Information Technology (Systems Development) NQF 5

Aug 2022 – Jan 2024

WeThinkCode_

National certificate in systems development covering programming, system analysis, and software development methodologies. Completed through an intensive peer-led engineering program with emphasis on testing, agile delivery, debugging, and real-world software development practices.

// Contact Channel

Contact

Available for infrastructure-focused roles and production engineering opportunities.