Introduction

I'm a senior ML engineer with solid knowledge of both backend technologies and data science. I love following best practices and continuously delivering high-quality projects. I am always working to improve my coding and architecture skills. I also enjoy working in a team and regularly exchanging knowledge with teammates.

Details

Name:
Zhiwei Zhang
Hobbies:
Programming, Swimming, Movies, Traveling
Location:
Munich, Germany

Education


Technical University of Munich

October 2020 - Present

Master - Computer Science (part-time)
Munich, Germany

Stanford University

October 2017 - November 2017

Course - Machine Learning
This course provided a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics included:
  • Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks).
  • Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning).
  • Best practices in machine learning (bias/variance theory, innovation process in machine learning and AI).
I also learned how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Coursera, Online | Certification

University of Freiburg

October 2013 - March 2017

Bachelor - Computer Science
The program was mainly based on C++ and Python programming, but I also learned about SQL, UML, machine learning, and more. I specialized in information retrieval, where I learned how to build a search engine, semantic search, and indexing, as well as the importance of OOP, web, and user experience design.
Freiburg, Germany

University of Siegen

April 2013 - September 2013

Language Course - German
Learned German and prepared for the German language test.
Siegen, Germany

Central University for Nationalities

September 2007 - July 2011

Bachelor - Philosophy
The program focused mainly on the history of philosophy and religion. I specialized in scientific philosophy.
Beijing, China

Career


Allianz SE

January 2021 - Present

Senior Data Scientist/ML Engineer/Solution Architect
Responsibilities:
  • Researching and implementing Machine Learning (ML) algorithms.
  • Designing, developing, and deploying large, scalable ML systems on private/public cloud platforms.
  • Independently managing and leading ML system development projects and making technical decisions.
  • Working with the team and stakeholders to translate business needs into data science questions and deliver practical solutions with positive impact.
  • Collaboratively designing, developing, and maintaining Allianz's ML tools following best practices.
  • Cross-validating models to ensure their generalisability.
  • Ensuring good code quality in project development through code reviews and pair programming.
  • Delegating tasks to data scientists to ensure the successful completion of projects.
  • Monitoring the performance of data scientists and providing them with practical guidance as needed.
  • Suggesting ways in which the insights obtained might be used to inform business strategies.
  • Promoting a culture of data-driven decision-making within the organization.
Impacts:
  • Designed, and helped the team establish, a working mechanism inspired by the Tezos blockchain community, which lets the whole team work as a self-organised open-source community for problem solving and reducing internal politics.
  • Actively participated in the technical part of the hiring process and was responsible for designing the technical challenges, such as the coding challenge and the ML system design challenge.
  • Led a team of data scientists and engineers to design, develop, and deploy a large, scalable Natural Language Processing (NLP) system on the customer’s public cloud platform. The NLP system contains multiple components:
      • a highly scalable and efficient online real-time website crawler implemented with Scrapy and FastAPI;
      • an efficient Spark pipeline consisting of multiple steps (text cleaning, language detection with fastText, text lemmatisation with spaCy, bigram generation, and term reduction through linear correlation checks) that processes millions of crawled websites in hours;
      • an automatic ML training pipeline with Louvain clustering and multiple ML models (classic ML algorithms, BERT, RNN) stacked in a funnel approach;
      • a highly scalable RESTful model-serving API service using FastAPI.
  • As solution architect, collaborated with the customer’s IT experts to set up Spark on AWS and migrate the developed solution into the customer’s system.
Munich, Germany

Allianz SE

January 2019 - December 2020

Data Scientist/ML Engineer
Responsibilities:
  • Researching and implementing ML algorithms.
  • Developing and deploying large, scalable ML systems on private/public cloud platforms.
  • Verifying data quality and performing statistical analysis.
  • Collaboratively designing, developing, and maintaining Allianz's ML tools following best practices.
  • Formulating and managing data-driven projects geared towards furthering the business’s interests.
  • Suggesting ways in which the insights obtained might be used to inform business strategies.
  • Cross-validating models to ensure their generalisability.
  • Delegating tasks to junior data scientists and mentoring them with practical guidance.
Impacts:
  • Jointly designed, implemented, and published multiple Python cookiecutter templates:
      • a PyPI-standard package template;
      • a RESTful API service template using FastAPI;
      • a Helm chart template for deploying RESTful API services with Kubernetes on Azure and AWS;
      • a Nomad job file template for deploying JupyterHub, API services, Spark clusters, etc. on a private cloud platform.
  • Designed, developed, and deployed a technology radar for the whole Allianz Group, based on the Thoughtworks Technology Radar.
  • Implemented an internal Go package for stress-testing API services, based on the open-source project Vegeta.
  • Worked collaboratively with data scientists on the development of a large Optical Character Recognition (OCR) system:
      • jointly developed an ML pipeline comprising image segmentation via U-Net, image cropping, and OCR via pytesseract;
      • designed, developed, and deployed a RESTful API service to serve the ML pipeline;
      • implemented the complete testing CI/CD pipeline, including unit, integration, and end-to-end tests;
      • led the team through the project restructuring phase to clean up technical debt.
  • Designed an ML system for email classification based on Kafka streaming events, and developed the MVP version with the DistilBERT base model (uncased).
  • Designed and implemented a Kafka-based task queue management Python package.
  • Developed a feedback-providing interface and an ML model management interface for a news article classification ML system.
Munich, Germany

Ernst & Young GmbH

April 2017 - December 2018

Data Scientist/Full Stack Python Developer
Responsibilities:
  • Developing and deploying ML software systems, including user interfaces.
  • Designing and developing EY's Python libraries following best practices.
  • Identifying relevant data sources for business needs.
  • Generating information and insights from data sets and identifying trends and patterns.
  • Analysing, exploring, and visualising data to prepare reports for executive and project teams.
  • Preparing data for ML model training.
  • Mentoring working students in project development.
Impacts:
  • Designed and implemented a server-side processing table library for Django/Flask projects.
  • Designed and implemented an asynchronous task-management subsystem using Celery and RabbitMQ.
  • Collaborated with data scientists on the development of a large Django-based software system for automating global compensation.
  • Worked in a team to design and develop the POC version of an ML system, which helped the team win the project campaign.
Stuttgart, Germany

Freelance

August 2011 - March 2013

Tourist guide
Beijing, China

Abilities


Programming Languages

  • Python (10 years)
  • C++ (5 years)

Frameworks/Libraries

  • numpy
  • pandas
  • scikit-learn
  • xgboost
  • spacy
  • Pytorch
  • tensorflow
  • pytesseract
  • opencv-python
  • PySpark
  • dask
  • SQLAlchemy
  • Django
  • Flask
  • FastAPI
  • Scrapy
  • cookiecutter
  • celery
  • kafka-python
  • libxml (C++)
  • libsvm (C++)
  • boost (C++)
  • vegeta (golang)

Version Control

  • Git
  • SVN

CI/CD

  • GitLab-CI
  • GitHub-Actions
  • Bitbucket Pipelines
  • Jenkins

Databases

  • Postgres
  • MySQL
  • Microsoft SQL Server
  • Oracle Database
  • redis
  • Apache Cassandra

Others

  • Azure
  • AWS
  • Nomad
  • Kubernetes
  • Docker
  • Docker-compose
  • Kafka
  • RabbitMQ
  • Bash
  • Powershell

Languages

  • Chinese
  • English
  • German
  • Spanish
  • Japanese

Projects


ytb-downloader

November 2022 – Present

Python Terminal Package
Downloads YouTube videos and converts them to MP3.


pyckage-cookiecutter

September 2021 – Present

Python Backend Package
A cookiecutter template that generates a full project structure (including a predefined CI/CD pipeline for GitLab, GitHub, and Bitbucket) for creating a PyPI-standard package.


kafkatasque

September 2019 – Present

Python Backend Package
A kafka-python based task management package.


named_enum

November 2018 – Present

Python Backend Package
A basic Python package providing several enumeration classes that extend Python's built-in Enum class with additional functionality.


sspdatatables

December 2018 – Present

Python Frontend Package
A Python package for Django projects. It builds on DataTables and provides a Django-friendly interface for implementing server-side processing tables.