Data Engineer (Python, Big Data)
7th Floor, VinMec International Clinic, Times City, Hai Ba Trung, Ha Noi
Not specified
2021-09-13 -> 2021-09-14
- Bachelor's degree in a related field or a combination of education and relevant experience
- Knowledge, Skills and Abilities
- Proficiency in Python
- Experience with the Hadoop ecosystem, including HDFS, MapReduce, YARN, HBase, ZooKeeper, and Spark
- Experience building large-scale data processing systems (batch and stream processing)
- Experience with Apache Spark, preferably PySpark
- Understanding of SLAs and the ability to meet timelines for support activities
- Experience with data warehousing
- Experience in ETL
- Experience in data management and data integration
- Experience with SQL and NoSQL databases
- Ability to collaborate with team members.
- Adequate English communication skills.
- VinBrain is a company funded by Vingroup, the largest conglomerate in Vietnam by market capitalization. Our mission is to perform cutting-edge research and development of AI, Machine Learning, and Deep Learning technologies and products that will improve healthcare systems and quality of life. At VinBrain, we believe the greatest promise of Big Data lies in healthcare. We believe that by solving unique and challenging problems at the intersection of (medical) Big Data, AI, IoT, and IoP (Internet of People), we can improve outcomes for patients around the globe. We have assembled seasoned researchers, engineers, and entrepreneurs from world-class companies such as Microsoft, Amazon, Adobe, and Google to build the platform and services to achieve this. If you are motivated to be part of something special and to be a catalyst for improving lives and building healthier communities, we want to talk to you.
- We are looking for an individual to help our team develop, operate, and drive a scalable and resilient data platform based on the Hadoop ecosystem to address business requirements:
- Ensure industry best practices around data pipelines, metadata management, data quality, data governance and data privacy
- Design and implement business-specific large-scale data processing pipelines
- Work with complex data structures; manipulate, cleanse, and transform data to derive insights from it
- Ingest data from files, streams, and databases, and process it with PySpark and Kafka
- Develop efficient code using Spark and other Big Data technologies for the various use cases built on the platform
- Maintain operational excellence, ensuring high availability and platform stability. Act as a liaison with various teams across the workplace.