Reliability Engineer – Systems
Fantastic opportunity with a very successful company with cutting-edge technology and a staff that includes some of the brightest quantitative talent in the world.
Reliability Engineering is a versatile group of full-stack engineers on the front line, maintaining and expanding the capabilities of the company's many and varied systems. The team works in the space between traditional systems administration and development and seeks to merge the capabilities of both disciplines.
Responsibilities:
– Provide primary engineering and operational support for multiple large distributed software applications, including both the foundational infrastructure for the company and components of our trading environments.
– Improve all aspects of system and software reliability, including better monitoring, alerting, and documentation.
– Engage with the software engineering teams on support issues and improvements to tools, processes, and software.
– Act as a conduit between infrastructure and development teams, being sympathetic to the concerns and priorities of both.
– Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
– Educate the company on DevOps best practices.
Qualifications:
– A bachelor’s degree in computer science or another highly technical, scientific discipline.
– In-depth knowledge and experience in at least one of: host-based networking, Linux/UNIX administration, storage technologies, systems programming, distributed systems, databases, or cloud computing, along with a strong desire to learn more.
– Ability to program (structured and OO) in one or more high-level languages (such as Python, Java, C/C++, Go).
– The ability to leverage off-the-shelf and open-source systems and utilities to provision production systems in a variety of domains, especially for multi-tenant use.
– A proven track record of automation and a systematic approach to solving problems.
– The ability to take high-level software and infrastructure requirements from idea to high-quality implementation.
– A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
– The ability to understand the inherent trade-offs between various software architectures as they relate to performance, resiliency/fault tolerance, load balancing, and data consistency.
– Ability to profile and debug applications in real time.
Additional Qualifications Preferred:
– Experience with public cloud technologies (ex: EC2, GCS).
– Experience with authentication and encryption technologies like TLS, Kerberos, and GSSAPI.
– Experience with the analysis of network packet captures and an understanding of the OSI model, BGP, and multicast routing.
– Experience with Linux kernel and OS tunables as well as building custom kernels.
– Experience with automated configuration management tools such as Ansible, Chef, Puppet, SaltStack.
– Experience with distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks such as Mesos, Kubernetes, or YARN.
– Experience with enterprise messaging systems and concepts (ex: Kafka, JMS, MQ Series).
– Experience with observability and monitoring tools (ex: New Relic, Datadog, Prometheus, Nagios, VictorOps, Splunk).
– Experience with data visualization tools (ex: Kibana, Grafana, Tableau).
*Will Relocate The Right Candidate! Will Sponsor Visas. Will only consider candidates from top-tier computer science universities and/or individuals with a stellar GPA. Bachelor’s and/or Master’s degree from a top computer science program with a GPA of 3.5 or higher. PhD preferred. Top computer science program preferred (Carnegie Mellon University, Massachusetts Institute of Technology/MIT, Stanford University, University of California-Berkeley, Cornell University, University of Illinois-Urbana-Champaign, Princeton University, University of Washington, University of Texas-Austin, Georgia Institute of Technology, California Institute of Technology, University of Wisconsin-Madison, University of Michigan-Ann Arbor, etc.).
If you are an employer and recruiting for similar IT professionals / positions, please contact our Technical Recruiters at Next Step Systems.
We are a national IT Recruiting Firm / Agency specializing in full-time direct hire Information Technology employment opportunities.
“PLEASE DO NOT APPLY” If You Are A Consulting Firm, Third Party Recruiter Or Seeking Corp-To-Corp; W-2 Direct Hire Only.