As a key member of the local Data Science team, you will be a trusted partner to the Senior Data Scientist and various stakeholders on data-related issues.
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional/non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies on AWS and GCP.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secured across national boundaries through multiple data centers and AWS regions.
- Create data tools for the analytics and data science teams that help them build and optimize our product into an innovative industry leader.
- Perform any other ad hoc duties assigned when appropriate.
- Degree in Computer Science, Statistics, or Information Systems with at least 5 years of experience in a Data Engineer role
- Advanced working knowledge of SQL, experience with relational databases and query authoring, and working familiarity with a variety of databases
- Experience building and optimizing ‘big data’ pipelines, architectures, and data sets
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with GCP cloud services
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented/functional scripting languages: Python, Java, C++, Scala, etc.
- Strong project management and organizational skills
- Proven ability to build strong working relationships, both internal and external to the organization
- Strong analytic skills related to working with unstructured datasets