Permanent
Junior Site Reliability Engineer
Closing date not listed
About this role
Engineering - Engineering Management
Gauteng
CONTRACT
Job Summary
About the role
The Junior Site Reliability Engineer contributes to the reliability, scalability, and performance of the company’s cloud-native infrastructure and production services.
Responsibilities:
System Monitoring & Observability
* Configure and maintain monitoring tools (e.g., Prometheus, Datadog) to track key system metrics (latency, traffic, errors, saturation).
* Create and refine dashboards and alerts to ensure rapid detection of anomalies and potential outages.
* Assist in implementing distributed tracing and structured logging to improve debugging and performance analysis.
Incident Response & Management
* Participate in a 24/7 on-call rotation as a secondary responder, escalating issues to senior team members as needed.
* Follow incident response playbooks to diagnose and mitigate production incidents, aiming to restore service within defined SLOs.
* Contribute to blameless post-incident reviews by documenting timelines, root causes, and action items to prevent recurrence.
Automation & Infrastructure as Code
* Develop and maintain automation scripts (Python, Go, or Bash) to streamline repetitive operational tasks such as certificate rotation, user access management, and log rotation.
* Assist in managing cloud infrastructure using IaC tools (Terraform, CloudFormation) to ensure consistent, version-controlled, and repeatable deployments.
* Support CI/CD pipeline improvements (GitLab CI, GitHub Actions, Jenkins) to enable safe and efficient application deployments.
Capacity Planning & Performance Tuning
* Collect and analyse resource usage trends (CPU, memory, storage, network) to help forecast capacity needs and recommend scaling actions.
* Work with development teams to conduct load testing and identify performance bottlenecks.
Collaboration & Knowledge Sharing
* Partner with software engineers to implement service level indicators (SLIs) and define realistic service level objectives (SLOs).
* Document system architecture, operational runbooks, and common troubleshooting steps to empower the wider team.
* Actively participate in the team's agile ceremonies, providing input on reliability risks for upcoming features.
Desired Skills:
* Container Orchestration: Hands-on experience with Kubernetes (cluster administration, Helm charts, pod autoscaling) or Docker Swarm.
* Programming & Scripting: Proficiency in at least one high-level language (Python, Go) for automation and tooling; comfort with shell scripting.
* CI/CD Pipelines: Familiarity with building and maintaining deployment pipelines, including canary deployments, feature flags, and rollback strategies.
* Observability Stack: Experience with Prometheus, Grafana, Loki, Tempo, or the ELK Stack (Elasticsearch, Logstash, Kibana).
* Networking: Working knowledge of load balancers (NGINX, HAProxy), DNS management, firewalls, and TCP/IP troubleshooting.
* Cloud Platforms: Exposure to AWS (EC2, EKS, RDS, S3) or equivalent cloud provider services.
* Security Best Practices: Understanding of identity and access management (IAM), secrets management (Vault, AWS Secrets Manager), and basic security hardening.
Minimum Requirements:
* Unemployed South African youth aged 18 to 34.
* Must not have previously participated in the YES programme.
* Matric.
* Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field; or equivalent demonstrable experience in a systems or software engineering role.
Certifications & Licenses (Desired but not required):
* AWS Certified Solutions Architect – Associate or equivalent cloud certification.
* Certified Kubernetes Administrator (CKA) or CKAD.
* Any SRE-related or DevOps training certifications.
Technical Fundamentals:
* Solid understanding of Linux/Unix operating systems (filesystems, process management, networking stack).
* Familiarity with at least one cloud provider (AWS, GCP, or Azure) and its core compute, storage, and networking services.
* Basic understanding of version control systems (Git) and collaborative development workflows (pull requests, code reviews).
Personal Characteristics:
* Problem-solving mindset: Able to break down complex issues into manageable components and systematically debug under pressure.
* Ownership: Takes responsibility for tasks from inception to completion, with a focus on quality and reliability.
* Curiosity: Passionate about learning new technologies and staying current with industry best practices in SRE and cloud engineering.
* Communication: Effectively conveys technical concepts to both technical and non-technical stakeholders, and documents work clearly.
* Team collaboration: Works well in a cross-functional team environment, providing and receiving constructive feedback.
Please consider your application unsuccessful if you have not heard from the Signa Opportunity team within two weeks of submitting it.
Signa Opportunity
Recruiter