Their work is fast-paced and expansive. They build models, coalesce data sources, interpret results, and build services and occasionally products that enhance clients’ ability to derive value from data and that upgrade their decision-making capabilities.
Solutions feature the latest in data science tools, machine learning / A.I. algorithms, software engineering disciplines, and analytical techniques to make an extraordinary impact on clients and societies. Take the opportunity to operate at the intersection of “cool tech” and real-world problems faced by some of the world's leading companies.
- You will work with people as passionate and awesome as yourself - our client has a "no jerks" policy
- You will get a variety of tech, industries, projects, and clients
- You will deliver work that has real impact on how our clients do business
- You'll be invested in
- You'll be helped to grow your career while remaining hands-on and technical
- You will work in smaller, more agile, flatter teams than is the norm elsewhere
- You will be empowered and have more autonomy and responsibilities than almost anywhere else
- You will help recruit your future colleagues
- You will be offered competitive compensation and benefits
The Data Engineering Tech Lead is the universal translator between IT, business, software engineers, and Data Scientists, working directly with clients and project teams. S/he works to understand the business problem being solved and provides the data required to do so, delivering at the pace of the consulting teams and iterating data to ensure quality as understandings crystallize.
Historical focus has been on high-performance SQL data marts for batch analytics, but clients are now driving toward new data stores and cluster-based architectures to enable streaming analytics and scaling beyond current terabyte-level capabilities. Your ability to tune high-performance pipelines using Scala or similar will help to rapidly deploy some of the latest machine learning frameworks and other advanced analytical techniques at scale.
You will serve as a keystone on larger projects, enabling delivery of solutions hand-in-hand with consultants, data science specialists, and software engineers.
Key Role Attributes
- Understand the overall problem being solved and what flows into it
- Create and implement data engineering solutions using modern software engineering practices
- Scale up from “laptop-scale” to “cluster-scale” problems, in terms of infrastructure as well as problem structure and technique
- Deliver tangible value very rapidly, working with diverse teams of varying backgrounds
- Codify best practices for future reuse in the form of accessible, reusable patterns, templates, and code bases
- Technical background in computer science, data science, machine learning, artificial intelligence, statistics or other quantitative and computational science
- A compelling track record of designing and deploying large-scale technical solutions that deliver tangible, ongoing value
- Direct experience building and deploying complex production systems that implement modern data science methods robustly and at scale
- Comfort in environments where large projects are time-boxed and therefore consequential design decisions may need to be made and acted upon rapidly
- Fluency with cluster computing environments and their associated technologies, and a deep understanding of how to balance computational considerations with theoretical properties of potential solutions
- Ability to context-switch, to provide support to dispersed teams which may need an “expert hacker” to unblock an especially challenging technical obstacle
- Demonstrated ability to deliver technical projects with a team, often working under tight time constraints to deliver value
- An ‘engineering’ mindset, willing to make rapid, pragmatic decisions to improve performance, accelerate progress or magnify impact; recognizing that the ‘perfect’ should not be the enemy of the ‘good’
- Comfort with working with distributed teams on code-based deliverables, using version control systems and code reviews
- Demonstrated expertise working with and maintaining open-source data analysis platforms, including but not limited to:
  - Pandas, Scikit-Learn, Matplotlib, TensorFlow, Jupyter and other Python data tools
  - Spark (Scala and PySpark), HDFS, Kafka and other high-volume data tools
  - SQL and NoSQL storage tools, such as MySQL, Postgres, Cassandra, MongoDB and ElasticSearch
- Demonstrated fluency in modern programming languages for data science, covering a wide gamut from data storage and engineering frameworks through to machine learning libraries
- Deep understanding of the architecture, performance characteristics and limitations of modern storage and computational frameworks, with experience implementing solutions that leverage: HDFS/Hive; Spark/MLlib; Kafka, etc.
- A history of compelling side projects or contributions to the open-source community is valued but not required
- Willingness to travel as required for cases (~25%)
CEI No: R1112169 | Licence No: 07C3147