DevOps Observability Engineer - Jobs at Teamwork Vietnam Ltd

Job level

Experienced (Non-Manager)

Salary

Job Description

We are looking for a Senior DevOps Observability Engineer who is passionate about building, operating, and evolving modern observability platforms at scale. In this role, you will be a key technical authority for monitoring, logging, and alerting systems used in production environments supporting European (French) operations.

You will work hands-on with Prometheus, Grafana, and Loki, taking ownership of observability architecture, advanced dashboards, automation, and complex incident troubleshooting. This is an excellent opportunity for a senior engineer who enjoys combining deep technical expertise, automation, and operational excellence.

Key Responsibilities

Design, build, and maintain high‑quality dashboards in Grafana that provide clear, actionable insights for engineering and operations teams.
Develop, customize, and integrate Prometheus exporters tailored to application, infrastructure, and business requirements.
Act as Level 3 (L3) support for complex monitoring and observability incidents, including root cause analysis and post‑incident improvements.
Own and continuously improve the observability stack:
- Metrics: Prometheus
- Visualization: Grafana
- Logs: Loki
Automate deployments, upgrades, and operations of observability components and exporters using Ansible.
Improve alerting quality by reducing noisy, non‑actionable alerts and raising the signal‑to‑noise ratio.
Collaborate closely with DevOps, SRE, and Platform teams to embed observability into system design.
Proactively identify scalability, performance, and reliability risks within monitoring and logging platforms.
Ensure high availability, resilience, and performance of observability services in production environments.

Job Requirements

Required Skills & Experience

Strong hands‑on expertise with Prometheus, Grafana, and Loki, including:
- Architecture and integration
- Customization and optimization
- Advanced troubleshooting in production
Solid scripting and automation skills (Bash, Python, or similar).
Proven experience using Ansible for configuration management and deployment automation.
Strong automation mindset, with a focus on reliability, repeatability, and operational efficiency.
Experience supporting production environments with high availability requirements.
Ability to work independently during French business hours (13:00–22:00 VN time).

Nice to Have

Experience with GitOps practices and tools (e.g. Argo CD).
Exposure to container platforms such as Rancher.
Practical experience with Kubernetes and cloud‑native architectures.
Familiarity with SRE concepts (SLIs, SLOs, error budgets).
Experience operating observability stacks at scale in enterprise environments.

Profile We’re Looking For

Deep passion for observability, monitoring, and logging.
Senior‑level problem solver who enjoys tackling complex production issues.
Autonomous, proactive, and detail‑oriented.
Comfortable taking technical ownership and driving improvements end‑to‑end.
Strong communication skills, especially when explaining complex technical topics clearly.

More Information

Degree: Bachelor's
Age: No limit
Type of employment: Permanent

You should be skilled in

Site Reliability Engineer, Infrastructure Engineer, Systems Engineer, Platform Engineer, Automation Engineer, DevOps
