Introduction
I'm a senior ML engineer with strong knowledge of both backend technologies and data science. I follow best practices and consistently deliver high-quality projects. I enjoy continually improving my coding and architecture skills, and I like working in a team and sharing knowledge with my teammates.
Details
Name:
Zhiwei Zhang
Hobby:
Programming, Swimming, Movies, Traveling
Location:
Munich, Germany
Education
Technical University of Munich
October 2020 - Present
Master - Computer Science (part-time)
Munich, Germany
Stanford University
October 2017 - November 2017
Course - Machine Learning
This course provided a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics included: supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks); unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); and best practices in machine learning (bias/variance theory, innovation process in machine learning and AI). I also learned how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Coursera, Online
Certification
University of Freiburg
October 2013 - March 2017
Bachelor - Computer Science
The program was mainly based on C++ and Python programming, but I also learned about SQL, UML, machine learning and more. During my studies, I specialized in information retrieval, where I learned how to build a search engine, how semantic search and indexing work, and why object-oriented, web, and user-experience design matter.
Freiburg, Germany
University of Siegen
April 2013 - September 2013
Language Course - German
Learned German and prepared for the German language test.
Siegen, Germany
Central University for Nationalities
September 2007 - July 2011
Bachelor - Philosophy
The program focused mainly on the history of philosophy and religion. I specialized in the philosophy of science.
Beijing, China
Career
Allianz SE
January 2021 - Present
Senior Data Scientist/ML Engineer/Solution Architect
Responsibilities:
- Researching and implementing Machine Learning (ML) algorithms.
- Designing, developing and deploying large-scale ML systems on private and public cloud platforms.
- Independently managing and leading ML system development projects and making technical decisions.
- Working with the team and stakeholders to translate business needs into data science questions and deliver practical solutions with positive impact.
- Collaboratively designing, developing and maintaining Allianz's ML tools following best practices.
- Cross-validating models to ensure their generalisability.
- Ensuring good code quality in project development through code reviews and pair programming.
- Delegating tasks to Data Scientists to ensure the successful completion of projects.
- Monitoring the performance of Data Scientists and providing them with practical guidance as needed.
- Suggesting ways in which the insights obtained can inform business strategies.
- Promoting a culture of data-driven decision making within the organization.
Impacts:
- Designed and helped the team establish a working mechanism modelled on the Tezos blockchain community, which lets the whole team work as a self-organised open-source community, improving problem solving and reducing politics.
- Actively participated in the technical part of the hiring process and was responsible for designing the technical challenges, such as the coding challenge and the ML system design challenge.
- Led a team of data scientists and engineers to design, develop and deploy a large-scale Natural Language Processing (NLP) system on the customer’s public cloud platform. The NLP system contains multiple components:
  - a highly scalable and efficient real-time website crawler implemented with Scrapy and FastAPI;
  - an efficient Spark pipeline that processes millions of crawled websites in hours, with steps such as text cleaning, language detection with fastText, text lemmatisation with spaCy, bigram generation, and term reduction through a linear correlation check;
  - an automatic ML training pipeline with Louvain clustering and multiple ML models (classic ML algorithms, BERT, RNN) stacked in a funnel approach;
  - a highly scalable ML model-serving RESTful API service using FastAPI.
- As a solution architect, collaborated with the customer’s IT experts to set up Spark in AWS and migrate the developed solution into the customer’s system.
Munich, Germany
Allianz SE
January 2019 - December 2020
Data Scientist/ML Engineer
Responsibilities:
- Researching and implementing ML algorithms.
- Developing and deploying large-scale ML systems on private and public cloud platforms.
- Verifying data quality and performing statistical analysis.
- Collaboratively designing, developing and maintaining Allianz's ML tools following best practices.
- Formulating and managing data-driven projects geared towards furthering the business’s interests.
- Suggesting ways in which the insights obtained can inform business strategies.
- Cross-validating models to ensure their generalisability.
- Delegating tasks to Junior Data Scientists and mentoring them with practical guidance.
Impacts:
- Jointly designed, implemented and published multiple Python cookiecutter templates:
  - a PyPI-standard package template;
  - a RESTful API service template using FastAPI;
  - a Helm chart template for deploying RESTful API services with Kubernetes in Azure and AWS;
  - a Nomad job file template for deploying JupyterHub, API services, Spark clusters and more on a private cloud platform.
- Designed, developed and deployed a technology radar for the whole Allianz Group, based on the Thoughtworks one.
- Implemented an internal Go package for stress testing API services, based on the open-source project Vegeta.
- Worked with data scientists on the development of a large Optical Character Recognition (OCR) system:
  - Jointly developed an ML pipeline containing image segmentation via U-Net, image cropping, and OCR via pytesseract.
  - Designed, developed and deployed a RESTful API service to serve the ML pipeline.
  - Implemented the whole testing CI/CD pipeline, including unit, integration and end-to-end tests.
  - Led the team through the project restructuring phase to clear technical debt.
- Designed an ML system for email classification based on Kafka streaming events, and developed the MVP version with the DistilBERT base model (uncased).
- Designed and implemented a Kafka-based task queue management Python package.
- Developed a feedback interface and an ML model management interface for a news article classification ML system.
Munich, Germany
Ernst & Young GmbH
April 2017 - December 2018
Data Scientist/Full Stack Python Developer
Responsibilities:
- Developing and deploying ML software systems, including the user interface.
- Designing and developing EY's Python libraries following best practices.
- Identifying relevant data sources for business needs.
- Generating information and insights from data sets and identifying trends and patterns.
- Analysing, exploring and visualising data to prepare reports for executive and project teams.
- Preparing data for ML model training.
- Mentoring working students in project development.
Impacts:
- Designed and implemented a server-side processing table library for Django/Flask projects.
- Designed and implemented an asynchronous task-management sub-system using Celery and RabbitMQ.
- Collaborated with data scientists on the development of a large Django software system for automating global compensations.
- Worked in a team to design and develop the POC version of an ML system, which helped the team win the project campaign.
Stuttgart, Germany
Freelance
August 2011 - March 2013
Tourist guide
Beijing, China
Abilities
Programming Languages
- Python (10 years)
- C++ (5 years)
Frameworks/Libraries
- numpy
- pandas
- scikit-learn
- xgboost
- spacy
- PyTorch
- tensorflow
- pytesseract
- opencv-python
- PySpark
- dask
- SQLAlchemy
- Django
- Flask
- FastAPI
- Scrapy
- cookiecutter
- celery
- kafka-python
- libxml (C++)
- libsvm (C++)
- boost (C++)
- vegeta (golang)
Version Control
- Git
- SVN
CI/CD
- GitLab CI
- GitHub Actions
- Bitbucket Pipelines
- Jenkins
Databases
- Postgres
- MySQL
- Microsoft SQL Server
- Oracle Database
- Redis
- Apache Cassandra
Others
- Azure
- AWS
- Nomad
- Kubernetes
- Docker
- Docker Compose
- Kafka
- RabbitMQ
- Bash
- PowerShell
Languages
- Chinese
- English
- German
- Spanish
- Japanese
Projects
ytb-downloader
November 2022 – Present
Python Terminal Package
Downloads YouTube videos and converts them to MP3.
pyckage-cookiecutter
September 2021 – Present
Python Backend Package
A cookiecutter template that generates the full project structure for a PyPI-standard package, including predefined CI/CD pipelines for GitLab, GitHub and Bitbucket.
kafkatasque
September 2019 – Present
Python Backend Package
A task management package based on kafka-python.
named_enum
November 2018 – Present
Python Backend Package
A basic Python package. It provides several enumeration classes that extend Python's default Enum class with additional functionality.
sspdatatables
December 2018 – Present
Python Frontend Package
A Python package for Django projects. It uses the DataTables package and provides a Django-friendly interface for implementing server-side processing tables.