Data Engineer
Pune | Day shift | Remote | 3 years of experience
Machintel is a leading B2B marketing services company that helps businesses achieve their marketing goals through innovative, data-driven strategies. We are seeking a Data Engineer to join our dynamic team. This is an exciting opportunity to make a significant impact in a rapidly evolving industry.
Skills
- A Bachelor’s degree and a minimum of 3 years of relevant experience as a data engineer
- Hands-on deployment experience with Hadoop/Spark, Scala, MySQL, Redshift, and AWS or other cloud-based platforms
- Comfortable writing code in Python, Ruby, Perl, or an equivalent scripting language
- Experience with Cosmos/Scope, SQL, or Hadoop
- At least 3 years of professional work experience programming in Python, Java, or Scala
- 2+ years of experience with distributed computing frameworks such as Apache Spark and Hadoop
Responsibilities
- Design and develop ETL (extract-transform-load) processes to validate and transform data, calculate metrics and attributes, and populate data models using Hadoop, Spark, SQL, and other technologies
- Work with cloud technologies such as Amazon S3 and cloud-hosted databases
- Lead by example, demonstrating best practices for code development and optimization, unit testing, CI/CD, performance testing, capacity planning, documentation, monitoring, alerting, and incident response to ensure data availability, quality, usability, and performance
- Use programming languages such as SAS, R, Python, and SQL to create automated data gathering, cleansing, reporting, and visualization processes
- Implement systems for tracking data quality, usage, and consistency
- Design and develop new data products using the languages and frameworks listed above
- Monitor and maintain system health and security
- Oversee administration of, and improvements to, the source control and deployment processes
- Prepare unit tests for all work to be released to our live environment (including data validation scripts for data set releases or changes)
- Tune database performance based on monitoring
- Design and implement data products using Hadoop technologies
- Maintain clear documentation of process flow diagrams and best practices
- Design and implement multi-source data channels and ETL processes
- Work with AWS services such as EMR, Athena, Glue, Redshift, and Lambda