DevOps Observability Engineer

Teamwork Vietnam Ltd

3rd Floor, Anna Building, Quang Trung Software Park, District 12, Ho Chi Minh City

Job level

Experienced (Non-Manager)

Job Description

We are looking for a Senior DevOps Observability Engineer who is passionate about building, operating, and evolving modern observability platforms at scale. In this role, you will be a key technical authority for monitoring, logging, and alerting systems used in production environments supporting European (French) operations.

You will work hands-on with Prometheus, Grafana, and Loki, taking ownership of observability architecture, advanced dashboards, automation, and complex incident troubleshooting. This is an excellent opportunity for a senior engineer who enjoys combining deep technical expertise, automation, and operational excellence.

Key Responsibilities

  • Design, build, and maintain high‑quality dashboards in Grafana that provide clear, actionable insights for engineering and operations teams.
  • Develop, customize, and integrate Prometheus exporters tailored to application, infrastructure, and business requirements.
  • Act as Level 3 (L3) support for complex monitoring and observability incidents, including root cause analysis and post‑incident improvements.
  • Own and continuously improve the observability stack:
    • Metrics: Prometheus
    • Visualization: Grafana
    • Logs: Loki
  • Automate deployments, upgrades, and operations of observability components and exporters using Ansible.
  • Improve alerting quality by cutting noisy, non‑actionable alerts and raising the signal‑to‑noise ratio.
  • Collaborate closely with DevOps, SRE, and Platform teams to embed observability into system design.
  • Proactively identify scalability, performance, and reliability risks within monitoring and logging platforms.
  • Ensure high availability, resilience, and performance of observability services in production environments.

Job Requirements

Required Skills & Experience

  • Strong hands‑on expertise with Prometheus, Grafana, and Loki, including:
    • Architecture and integration
    • Customization and optimization
    • Advanced troubleshooting in production
  • Solid scripting and automation skills (Bash, Python, or similar).
  • Proven experience using Ansible for configuration management and deployment automation.
  • Strong automation mindset, with a focus on reliability, repeatability, and operational efficiency.
  • Experience supporting production environments with high availability requirements.
  • Ability to work independently during French business hours (13:00–22:00 VN time).

Nice to Have

  • Experience with GitOps practices and tools (e.g. Argo CD).
  • Exposure to container platforms such as Rancher.
  • Practical experience with Kubernetes and cloud‑native architectures.
  • Familiarity with SRE concepts (SLIs, SLOs, error budgets).
  • Experience operating observability stacks at scale in enterprise environments.

Profile We’re Looking For

  • Deep passion for observability, monitoring, and logging.
  • Senior‑level problem solver who enjoys tackling complex production issues.
  • Autonomous, proactive, and detail‑oriented.
  • Comfortable taking technical ownership and driving improvements end‑to‑end.
  • Strong communication skills, especially when explaining complex technical topics clearly.

More Information

  • Degree: Bachelor's
  • Age: No limit
  • Type of employment: Permanent

CareerViet.vn - The world's largest job and recruitment network

CareerViet Joint Stock Company. Head office: 139 Pasteur, Võ Thị Sáu Ward, District 3, Ho Chi Minh City

Tax code: 0303284985. Date of issue: 25/04/2013. Issued by: Department of Planning and Investment of Ho Chi Minh City. Phone: (84.28) 3822-6060. Email: contact@careerviet.vn