
Onesmus Dzidzai Maenzanise

Cloud Infrastructure & Software Engineer

"I build and maintain production systems that cannot afford to fail."


System Status: Operational

SYSTEM UPTIME: 99.9%+

DAILY THROUGHPUT: 300K+

INCIDENT RESPONSE: <15 min

INTEGRATION NODES: 12

System Load: 82%

Cloud Ready, PCI DSS Compliant, Distributed Systems, Observability, Kubernetes, ArgoCD, Grafana


// System Profile

Production Systems Overview

I operate in live production environments, not isolated development sandboxes. My work centers on maintaining reliability, observability, and fault handling across distributed systems that process hundreds of thousands of transactions daily.

I specialize in stabilizing real-world systems under load, managing cross-organization integrations, and ensuring that every component, from API gateways to database clusters, performs predictably at scale.

My approach combines disciplined software engineering with infrastructure operations, treating production environments as the single source of truth.

System Specifications

▸Environment: Production Systems
▸Domain: Fintech / Telecom
▸Scale: High-volume production systems
▸Reliability Focus: Observability, Fault Tolerance, RCA
▸Compliance: PCI DSS, Secure Design

// Daily Tooling

Production Stack

Tools I operate daily in production environments. Every component is battle-tested under real traffic and real incidents.

Kubernetes

Operational

Container orchestration across EKS clusters. Pod management, rollouts, resource scaling, and health checks in production.

Clusters: Production

Deployments: Rolling

Health: Stable

ArgoCD

Operational

GitOps delivery for all production deployments. Declarative sync policies, automated rollbacks, and application health monitoring.

Sync Status: Healthy

Strategy: GitOps

Rollbacks: Automated

Grafana

Operational

Unified observability dashboards for system health, application performance, and infrastructure metrics across all environments.

Dashboards: Custom

Data Sources: Multiple

Alerts: Configured

Prometheus

Operational

Metrics collection and alerting for distributed systems. Custom alert rules, time-series analysis, and incident trigger pipelines.

Collection: Real-time

Alert Rules: Custom

Retention: Long-term

AWS Lambda

Operational

Serverless compute for event-driven workloads. Integrated with API Gateway, SQS, and S3 for scalable production processing.

Runtimes: Node.js / Go

Triggers: Event-driven

Scale: Auto

Terraform

Operational

Infrastructure as Code for provisioning and managing cloud resources. Modular configurations, state management, and pipeline-driven deployments.

Provider: AWS

State: Remote

Workflow: CI/CD

// System Modules

Capabilities

Each capability is a production-tested system module, battle-hardened across fintech and telecom environments handling real traffic and real failures.

▸AWS Lambda - serverless compute
▸Docker / Kubernetes - container orchestration
▸Cloud hosting - multi-environment deployments

▸CI/CD pipelines - automated build & deploy
▸Deployment automation - zero-downtime releases
▸Infrastructure troubleshooting - root cause analysis

▸Golang / Python / Node.js / Java - polyglot engineering
▸REST / GraphQL APIs - service interfaces
▸Integration systems - cross-platform connectivity

▸PostgreSQL / MySQL / MongoDB - relational & document stores
▸Performance tuning - query optimization, indexing
▸Data pipelines - ETL, streaming, batch processing

▸PCI DSS - compliant payment environments
▸OAuth2 / JWT - authentication & authorization
▸Secure system design - defense in depth

▸Incident response - real-time production triage
▸Root Cause Analysis - systematic failure investigation
▸Monitoring dashboards - metrics, logs, traces

▸Kubernetes cluster deployments and container orchestration in production
▸Deployment pipelines, rollback strategies, and uptime management
▸Pod scaling, resource management, and health checks

▸GitOps-based deployment workflows using ArgoCD
▸Declarative application delivery to production infrastructure
▸Sync policies, rollback management, and deployment observability

▸Grafana dashboards for system health and performance monitoring
▸Metrics visualization across distributed services
▸Alert configuration and incident response support

// Production History

Experience

Promoted↑

Jan 2024 – Present

Intermediate Software Engineer

APS Holdings

Supported and maintained cloud-based payment and integration platforms at production scale
Second-line incident response: investigating failed transactions, degraded integrations, and performance issues
Led integrations between internal platforms and external partner systems
Built and maintained Spring Boot microservices for partner-facing API integrations
Took technical ownership of cross-organization integration workstreams involving telecom partners
Worked within PCI DSS requirements, managing authentication, access control, and audit needs
Contributed to CI/CD pipelines and containerized deployments

Same company

Jan 2024 – Jan 2025

Junior Software Engineer

APS Holdings

Developed and supported Golang-based backend services in live production environments
Assisted with production incident investigations and post-incident fixes
Worked with Kubernetes-based deployment pipelines, improving deployment speed and rollback reliability
Supported operational dashboards and reporting used by finance and operations teams

May 2023 – Dec 2023

Software Engineer

CartShare (Remote)

Developed backend services and integrations for e-commerce payment and data systems
Built and supported AWS Lambda-based workflows and analytics pipelines
Implemented automated deployment workflows to reduce time-to-market and deployment errors
Supported operational issues related to data consistency and transaction processing

Sep 2022 – Jan 2024

Software Engineer in Training

WeThinkCode_

Built full-stack applications using Python, Java, JavaScript, and SQL
Worked in agile teams delivering production-style projects with testing and documentation expectations
Achieved high test coverage and developed disciplined debugging and review practices

// Engineering Doctrine

How I Think About Systems

01

Systems must fail safely, not silently.

A failure you know about is a failure you can respond to. Silent failures cascade into outages. Every component should degrade gracefully and report its state.
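A minimal Go sketch of this principle, for illustration only. The names `lookupRate`, `Result`, `Status`, and the two sample sources are hypothetical and not taken from any production codebase. When the primary source fails, the function falls back to a last-known-good value, but it marks the result as degraded and keeps the cause, so the failure stays visible to callers and monitoring.

```go
package main

import (
	"errors"
	"fmt"
)

// Status describes how a result was produced, so degradation is visible.
type Status string

const (
	StatusOK       Status = "ok"
	StatusDegraded Status = "degraded"
)

// Result carries the value together with the state it was produced in.
type Result struct {
	Rate   float64
	Status Status
	Cause  error // non-nil only when Status is StatusDegraded
}

// errUpstream is a hypothetical failure used by the example.
var errUpstream = errors.New("upstream rate service unavailable")

// lookupRate tries the primary source. On failure it falls back to a
// last-known-good value and reports the degradation instead of hiding it.
func lookupRate(primary func() (float64, error), lastKnownGood float64) Result {
	rate, err := primary()
	if err != nil {
		return Result{Rate: lastKnownGood, Status: StatusDegraded, Cause: err}
	}
	return Result{Rate: rate, Status: StatusOK}
}

// Two stand-in sources: one healthy, one failing.
func healthySource() (float64, error) { return 18.25, nil }
func failingSource() (float64, error) { return 0, errUpstream }

func main() {
	for _, src := range []func() (float64, error){healthySource, failingSource} {
		res := lookupRate(src, 18.0)
		fmt.Printf("rate=%.2f status=%s cause=%v\n", res.Rate, res.Status, res.Cause)
	}
}
```

The design choice here is that the fallback path returns data and state together, so a caller cannot consume the degraded value without also receiving the signal that it is degraded.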

02

Observability is not optional.

You can’t fix what you can’t observe. Metrics, logs, and traces are needed both to maintain a production system and to judge whether a system is ready for production in the first place. If something is not observable, it is not ready for production.
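As a rough illustration, the sketch below wraps an operation so that every call records a count, an error count, and elapsed time. It uses only the Go standard library; the `Metrics` type, its `Observe` and `Report` methods, and the operation name `charge_card` are hypothetical. A real service would typically export these values to a metrics backend rather than keep them in memory.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errExample is a hypothetical failure used by the example.
var errExample = errors.New("simulated failure")

// Metrics holds in-memory counters for one operation.
type Metrics struct {
	mu        sync.Mutex
	Calls     int
	Errors    int
	TotalTime time.Duration
}

// Observe runs op, records its outcome and duration, and returns
// op's error unchanged so instrumentation never alters behavior.
func (m *Metrics) Observe(op func() error) error {
	start := time.Now()
	err := op()
	elapsed := time.Since(start)

	m.mu.Lock()
	defer m.mu.Unlock()
	m.Calls++
	m.TotalTime += elapsed
	if err != nil {
		m.Errors++
	}
	return err
}

// Report formats the current counters as a single log-friendly line.
func (m *Metrics) Report(name string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return fmt.Sprintf("op=%s calls=%d errors=%d total_time=%s",
		name, m.Calls, m.Errors, m.TotalTime)
}

func main() {
	m := &Metrics{}
	m.Observe(func() error { return nil })
	m.Observe(func() error { return errExample })
	fmt.Println(m.Report("charge_card"))
}
```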

03

Complexity must be controlled, not eliminated blindly.

The goal in a distributed system is managed complexity. A distributed system will never have zero complexity, but it can have clear boundaries, explicitly defined interfaces, and deliberate trade-offs.

04

Production is the only real environment.

Development and staging are approximations of production. Production is where you find out whether your system behaves the way you expect under real load, real data, and real failures. Design your system with production in mind.

05

Reliability is a feature, not an afterthought.

Every architectural decision has implications for reliability. Redundancy, retry strategies, circuit breakers, and back pressure are not optional add-ons; they are core design requirements.
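Two of these mechanisms can be sketched in a few lines of Go. This is a simplified illustration, not a production implementation: `Retry`, `Breaker`, `Sleeper`, `NoSleep`, and `errTransient` are hypothetical names, the breaker is not safe for concurrent use, and it never moves to a half-open state (a real breaker would add a cool-down period and a probe call).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errTransient   = errors.New("transient failure") // hypothetical error for the example
	ErrCircuitOpen = errors.New("circuit open: call rejected")
)

// Sleeper abstracts time.Sleep so the backoff delay can be skipped in tests.
type Sleeper func(time.Duration)

// NoSleep is a Sleeper that returns immediately.
var NoSleep Sleeper = func(time.Duration) {}

// Retry runs op up to attempts times, doubling the delay after each failure.
func Retry(attempts int, base time.Duration, sleep Sleeper, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

// Breaker rejects calls once maxFailures consecutive failures have occurred.
type Breaker struct {
	maxFailures int
	failures    int
}

// Call runs op unless the breaker is open; a success resets the failure count.
func (b *Breaker) Call(op func() error) error {
	if b.failures >= b.maxFailures {
		return ErrCircuitOpen
	}
	if err := op(); err != nil {
		b.failures++
		return err
	}
	b.failures = 0
	return nil
}

func main() {
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := Retry(5, 10*time.Millisecond, time.Sleep, flaky)
	fmt.Println("retry result:", err, "calls:", calls)

	b := &Breaker{maxFailures: 2}
	for i := 0; i < 3; i++ {
		fmt.Println("breaker call:", b.Call(func() error { return errTransient }))
	}
}
```

Retries absorb short-lived faults, while the breaker stops a persistently failing dependency from being hammered with further calls. In practice the two are usually combined with a retry budget so that retries do not amplify load during an outage.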

// System Evolution

Education

B.S. Software Development

Expected 2026

BYU-Idaho

Bachelor's degree program in Software Development, focused on software engineering principles, data structures, and system design.

National Certificate: Information Technology (Systems Development) NQF 5

Aug 2022 – Jan 2024

WeThinkCode_

National certificate in systems development covering programming, system analysis, and software development methodologies. Completed through an intensive peer-led engineering program with emphasis on testing, agile delivery, debugging, and real-world software development practices.

// Contact Channel