Job Title: Data Engineer
Location: San Francisco, CA
Required Clearance: Secret
Salary: Competitive
We are looking for a skilled Data Engineer with a strong focus on AI and machine learning to join our dynamic team. The ideal candidate will play a critical role in designing, implementing, and optimizing data pipelines and infrastructure to support our AI-driven initiatives. You will collaborate closely with data scientists, machine learning engineers, and software developers to ensure robust data architecture and seamless integration of AI models.
Key Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL processes to support AI and machine learning workflows.
- Collaborate with data scientists and machine learning engineers to understand data requirements and translate them into technical solutions.
- Implement data processing, transformation, and integration solutions to ensure high-quality, clean, and usable data for model training and evaluation.
- Optimize data storage and retrieval processes for performance and efficiency.
- Develop and maintain data infrastructure on cloud platforms (e.g., AWS, Google Cloud, Azure) to support AI model deployment and scaling.
- Implement monitoring and alerting mechanisms to ensure data pipeline reliability and availability.
- Ensure compliance with data governance and security standards throughout the data lifecycle.
- Stay current with industry trends and advancements in data engineering, AI, and machine learning.
- Participate in code reviews and team meetings, and contribute to a collaborative development environment.
- Document data architecture, processes, and workflows comprehensively.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer with a focus on AI and machine learning projects.
- Strong proficiency in at least one language such as Python, Scala, Java, or SQL.
- Experience with big data technologies and frameworks such as Hadoop, Spark, and Kafka.
- Solid understanding of data modeling, database design, and data warehousing concepts.
- Experience with cloud-based data services (e.g., AWS S3, Redshift, EMR, and Glue; Google BigQuery; Azure Data Lake).
- Familiarity with AI and machine learning frameworks and libraries such as TensorFlow, PyTorch, and scikit-learn.
- Strong problem-solving skills and attention to detail.
- Excellent communication and teamwork skills.
Preferred Qualifications:
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
- Knowledge of DevOps practices and CI/CD pipelines.