Company Overview
Leading e-commerce company
Job Description
Are you interested in building the next generation of Internet services that touch hundreds of millions of users across the globe every day? Our client is one of the leading e-commerce companies in the world. Their mission is to empower people and society through the Internet, and they aim to become the No.1 Internet Service Company in the world. Our client's development unit is the core of the entire group and drives its business today. You will join a diverse global team and play a central role in its technology innovations.
The Cloud Platform Department is thinking big: it builds large-scale, highly available infrastructure and platforms to empower the Group's ecosystem worldwide. The department is looking for a site reliability engineer to work for the Data Solutions Group.
The Data Solutions Group has a huge mission: to provide data-related XaaS platforms to group companies all over the world. We are looking for people who are passionate about working on a global scale.
Continuously re-evaluate the existing architecture, infrastructure, and processes, and take action to improve them
Guide the team to new technologies and best practices
Develop new functions and maintain the automated provisioning system and configuration management system
Automate operations for the existing system platform
Handle incidents and troubleshoot issues (this includes being part of the 24x7 team)
Work with team members in different time zones
Over 3 years of experience as a Linux system administrator (RHEL, CentOS, or Ubuntu)
Over 2 years of experience writing Groovy scripts and using Jenkins
Over 2 years of experience writing Ansible playbooks or Chef cookbooks
Over 3 years of experience managing one of the following products: MySQL, Cassandra, Oracle, Couchbase, Hadoop, or Kafka
Over 2 years of experience developing REST or RPC-based APIs
Deep understanding of networking protocols (TCP/IP, SSH, DHCP, HTTP, HTTPS, DNS, GOSSIP), packet structure and load balancing equipment
Deep understanding of monitoring technologies (such as Prometheus, Nagios, and Grafana) and incident handling processes
Operational experience with Kubernetes or Docker
Excellent written and verbal communication skills
Very strong will to automate everything
Strong eagerness to learn new technologies
Ability to work effectively with team members in different time zones
Enjoy working in situations where you constantly feel on the edge of a cliff