Job Description: We are seeking an experienced Data Engineer to join our team. In this role, you will be responsible for building, maintaining, and optimizing data pipelines and architectures that support data-driven decision-making across the organization. You will work closely with data scientists, analysts, and other stakeholders to ensure that data is accessible, reliable, and ready for analysis.
Responsibilities:
- Design, develop, and maintain scalable and efficient data pipelines to collect, process, and store large volumes of data from multiple sources.
- Build and maintain ETL (Extract, Transform, Load) processes that turn raw data into clean, structured, analysis-ready datasets.
- Design and implement data warehouses, data lakes, and other storage solutions to ensure data is stored efficiently and is easily accessible for analysis.
- Develop and optimize database schemas and structures to support high-performance queries and analytics.
- Integrate data from a variety of sources, including internal systems, APIs, and third-party tools, into a centralized data platform.
- Ensure data integrity and consistency across multiple systems.
- Automate data workflows and optimize data processing for performance and scalability.
- Perform data quality checks and implement monitoring tools to ensure the reliability and accuracy of data pipelines.
- Work with data scientists and analysts to understand data requirements and design solutions that meet their needs.
- Collaborate with business teams to translate their data requirements into timely, accurate, and actionable deliverables.
- Maintain clear and comprehensive documentation for data architecture, pipeline designs, and data models.
- Stay current with the latest trends and technologies in data engineering and data management.
- Continuously improve the performance, reliability, and scalability of data pipelines and storage solutions.
Experience & Skills:
- Proven experience as a Data Engineer or in a similar role, with expertise in building and maintaining large-scale data pipelines.
- Strong proficiency in SQL and experience working with relational databases (e.g., PostgreSQL, MySQL, SQL Server).
- Hands-on experience with data processing frameworks like Apache Spark, Hadoop, or Flink.
- Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform and related data tools (e.g., Amazon Redshift, Google BigQuery).
- Experience with ETL and workflow orchestration tools such as Apache Airflow, Talend, Informatica, or similar.
- Proficiency in programming languages such as Python, Java, or Scala for automating data tasks and building data pipelines.
- Familiarity with data storage technologies such as HDFS, NoSQL databases (e.g., MongoDB, Cassandra), or data lakes.
Certifications (Preferred):
- Relevant cloud certifications (e.g., AWS Certified Big Data – Specialty, Google Cloud Professional Data Engineer) are a plus.
Additional Skills (Preferred):
- Experience with containerization technologies like Docker and Kubernetes for deploying data pipelines.
- Understanding of data modeling, data governance, and data security best practices.
- Knowledge of data visualization tools and platforms (e.g., Tableau, Power BI) is a plus.
Educational Background:
A degree in Computer Science, Information Technology, Data Engineering, or a related field, or equivalent work experience.