Permanent
Platform Engineer (AWS, GitHub Actions, Heroku CI) (JHB)
About this role
IT/Computer - Other IT/Computer
Johannesburg
FULL TIME
Job Summary
IT - Software Development ~ IT - Infrastructure
Johannesburg - Gauteng - South Africa
ENVIRONMENT:
A provider of cutting-edge Financial Tools in Joburg seeks the technical expertise of a Platform Engineer to manage Heroku pipelines, CI/CD, review apps, and production environments. You will also operate Celery workers and queues, monitor their health, and handle missed task check-ins; manage Cloudflare for DNS, edge security, and performance optimisation; and collaborate with Developers to streamline workflows and educate them on secure coding practices. The ideal candidate must have 3+ years' experience operating production apps on Heroku, AWS, DigitalOcean, or similar; hands-on experience with CI/CD pipelines using GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals; and monitoring and incident response experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
DUTIES:
Reliability & Operations -
* Own uptime, performance, and monitoring for all production applications.
* Manage Heroku pipelines, CI/CD, review apps, and production environments.
* Operate Celery workers and queues, monitor health, and handle missed task check-ins.
* Define and track service level objectives (SLOs) (availability, latency, task success rate).
* Maintain runbooks and a centralised wiki for incident response, and lead post-mortems.
* Run periodic disaster recovery drills and coordinate Penetration Tests.
Platform Engineering -
* Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
* Manage daily backups and ensure restore tests and disaster recovery runbooks are in place.
* Standardise infrastructure (Terraform or scripts for DO/AWS; app.json for Heroku).
* Manage Cloudflare for DNS, edge security, and performance optimisation.
* Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
* Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.
Developer Experience & CI/CD -
* Maintain CI pipelines with type checking, linting, and security scanning.
* Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
* Support Developers with tooling for local/staging environments and build self-service dashboards (e.g., Celery queue status).
* Collaborate with Developers to streamline workflows and educate on secure coding practices.
Security & Compliance -
* Own vulnerability management and dependency patching cadence.
* Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
* Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
* Contribute evidence and responses for security questionnaires and SOC 2 audits.
* Maintain a security pack with architecture, sub-processors, and DR/backup processes.
Monitoring & Alerting -
* Configure Sentry ownership rules, Cron Monitors, and release health.
* Centralise metrics/logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus/New Relic).
* Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
* Conduct capacity planning and track resource usage trends.
Vendor & External Services -
* Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure service level agreements (SLAs) and performance targets are met.
* Assess new tools/services to enhance platform capabilities (e.g., observability, security).
* Track costs, security posture, and integration quality for all third-party services.
REQUIREMENTS:
Must-Haves -
* Cloud Infrastructure Management: 3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
* CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
* Monitoring & incident response: Experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
* Security Fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
* Disaster recovery & backups: Experience implementing and operating automated backups, restore testing, and writing/maintaining incident runbooks.
* Communication & collaboration: Ability to document processes clearly and work closely with Developers in a small team.
Strong Plus -
* Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
* Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
* Scaling & cost optimisation: Capacity planning, performance tuning, and managing infra spend.
* Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
* Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.
Nice-to-Haves -
* Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).
* Proficiency in Python; familiarity with Django/Flask.
* Experience with DNS/CDN/edge security (e.g., Cloudflare).
* Observability platforms (Prometheus, Grafana, New Relic).
* Static analysis and code quality tools (mypy, Bandit, SonarQube).
* Prior exposure to multi-tenant SaaS environments.
Recruiter: Datafin