My Projects:

Visual Question Rewriting for Increasing Response Rate

I completed this project as my Master's project. When browsing image-related content online, questions or topics that contain more detail and emotion are more likely to receive attention. Unfortunately, people who are not good at writing this way are at a disadvantage. To address this, I proposed a new NLP task named Visual Question Rewriting (VQR), which rewrites a bland question into a detailed and emotional one according to its paired image. The VQR task requires multi-modal understanding.

View more on Github or in the paper.

Created a VQR dataset of 4k triples, each consisting of a bland sentence, an emotional sentence, and an image. All data were collected from Houzz.com and then cleaned by de-emotionalizing and simplifying the original sentences.
Trained a Seq2Seq-based baseline model with attention over both the text sequence and the visual features extracted by the encoder and an object detector.
Designed a more advanced Transformer-based model that takes the visual information as conditional input. Language generation is treated as a sentence completion task, as in UniLM.
Both automatic evaluation and Amazon Mechanical Turk evaluation show the advantage of the Transformer+Vis model.

Virtual Wardrobe – Outfit Recommendation based on Style Match

I completed this project during my Google internship. The goal is to help users find great outfits that can be built from the apparel in their own wardrobe.

Recommended style-compatible apparel from the user’s virtual wardrobe to build outfits around the user’s chosen item
Created a dataset of 200M apparel triples for training the apparel style compatibility model, developed a neural network with triplet loss in TensorFlow, and achieved a 0.94 AUC score on the test dataset with the best-performing model
Designed a C++ pipeline that uses the trained model to select style-compatible apparel from the wardrobe that forms good outfits with the user’s chosen apparel, and to retrieve visual examples of all potential outfits from a style corpus of 100M images
Built multiple demos for the whole project, with multithreaded compatibility selection and low-latency image retrieval
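
The triplet loss behind the compatibility model can be sketched in a few lines. This is a minimal illustration in plain Python, not the TensorFlow implementation from the project; the squared Euclidean distance, the margin of 0.2, the function names, and the toy embeddings are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    The loss reaches zero once the compatible item is closer to the anchor
    than the incompatible item by at least `margin` (an assumed value).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D embeddings: a chosen item, a compatible item, a clashing item.
chosen, compatible, clashing = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_loss(chosen, compatible, clashing))  # triple already satisfied
print(triplet_loss(chosen, clashing, compatible))  # triple violated, positive loss
```

Minimizing this loss over many triples pulls compatible items together in the embedding space, so compatibility can later be scored by distance alone.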

Data Augmentation for Rare Traffic Signs to Boost the Detection Performance

This project was part of my Sensetime internship. When training a traffic sign detection model for self-driving vehicles, it is hard to create a dataset in which all categories are balanced, so I researched generating synthesized images for rare categories. We started by applying style transfer to traffic object generation.

Because of company policy, no details or code can be shared publicly.

Initiated research on style transfer to generate rare traffic signs in real traffic scenes, providing a more balanced dataset for the downstream traffic sign detection task
Analyzed the distribution of traffic signs from 6 categories in 500k images of real traffic scenes, created synthesized images by replacing original traffic signs with rare ones, and balanced the categories
Implemented WCT (whitening and coloring transform) and a local smoothing algorithm in PyTorch to transfer style from the context to the traffic sign objects in the synthesized images
Achieved a 94% recall rate in traffic sign detection using the augmented balanced images (compared with 35% using the original imbalanced dataset and 98% using a real balanced dataset, all at the same 0.1 FPPI)

NVIDIA AICity Challenge 2019 (CVPRW 2019)

In this challenge, I collaborated with students and professors from three different universities. Our algorithms ranked 8th, 13th, and 3rd on the three tracks respectively, among dozens of teams from academia and industry. During the competition, I held weekly meetings, was responsible for track-3, and contributed to the other two tracks.

View more on Github or in the paper.

Collaborated with professors and students from three universities; held and participated in weekly meetings; independently completed the track-3 task and contributed to track-1 and track-2
Polished the detection and tracking framework I proposed in last year's Challenge
Trained an FPN-R-FCN vehicle detection model on more vehicle data and improved the classification model
Replaced the simple IoU strategy with the DaSiamRPN single object tracking algorithm to obtain more accurate timestamps when backtracking in the original video
Achieved a competitive result of 0.7027 F1-score and 7.4679 RMSE on a much harder dataset than last year's challenge
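
For context, a "simple IoU strategy" usually means linking a box to the box in a neighboring frame that has the highest intersection-over-union. The sketch below shows that kind of baseline under assumed conventions: boxes as (x1, y1, x2, y2) tuples, a 0.5 threshold, and hypothetical function names. It is not the challenge code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(target, candidates, threshold=0.5):
    """Return the index of the candidate overlapping `target` most, or None.

    Candidates whose IoU falls below `threshold` (an assumed value) are ignored.
    """
    best_index, best_score = None, threshold
    for index, box in enumerate(candidates):
        score = iou(target, box)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical boxes from two consecutive frames.
previous_boxes = [(20, 20, 30, 30), (1, 1, 10, 10)]
print(match_by_iou((0, 0, 10, 10), previous_boxes))
```

Matching on overlap alone fails when a vehicle moves quickly or is occluded between frames, which is one reason to swap it for a dedicated single object tracker.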

Advanced Driving Assistance System (ADAS)

This is a project from my internship at Sensetime. Our ADAS system is used by Honda for self-driving. My job was to improve the performance of the detection models on traffic objects, including traffic lights (four states), traffic signs (20 categories), and PVB (pedestrian, vehicle, bike). In addition, because of limited computational resources, I fused all the models into one by using shared convolution layers while maintaining detection performance.

This project cannot be put on Github because of company policy.

Devised a multi-task architecture in Caffe to accomplish three traffic object detection tasks simultaneously, namely PVB (Pedestrian, Vehicle and Bike) detection, traffic light detection, and traffic sign detection
Shared one backbone across the three detectors, each consisting of an RPN and Fast R-CNN; the backbone was updated with normalized gradients coming from the different detectors
Exploited mimic learning to refine the detectors towards the performance of the individually trained networks
Met Honda’s requirements with significantly increased training and inference speed as well as reduced GPU memory usage

Understanding Vehicle Density in Traffic Surveillance

To understand traffic, the number of vehicles on the road is key to describing the degree of congestion. However, detection models struggle in hard cases such as traffic jams or bad weather. Therefore, borrowing from crowd density estimation methods, which work under extreme congestion, I designed an FCN with an Inception-v3 backbone to predict a density map. To obtain an accurate estimate, I used two losses during training, one on the density map and one on the bias. The bias is the difference between the ground-truth total count and the count obtained by summing the predicted density map. The final model outperforms the detection-based method and runs in real time.

View more on Github.

Used a density prediction model that is more robust to traffic jams and low-quality video to obtain the total number of vehicles in real time
Designed an FCN with Inception-v3 as the backbone to predict the density map
Trained the model with two losses that constrain the density map and the total count respectively
Achieved 94.2% accuracy at a speed of 20fps and outperformed the detection-based model
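
The two training losses can be illustrated as follows. This is a minimal sketch in plain Python on nested lists rather than the project's training code; the mean squared error on the map, the absolute value on the bias, the weight of 0.1, and all names are assumptions.

```python
def density_losses(pred_map, gt_map, gt_count, count_weight=0.1):
    """Return (map_loss, count_loss, total_loss) for one image.

    map_loss   : mean squared error between predicted and ground-truth maps.
    count_loss : absolute bias, i.e. |ground-truth count - sum of predicted map|.
    count_weight is an assumed balancing factor between the two terms.
    """
    pred = [value for row in pred_map for value in row]
    gt = [value for row in gt_map for value in row]
    map_loss = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    bias = gt_count - sum(pred)
    count_loss = abs(bias)
    return map_loss, count_loss, map_loss + count_weight * count_loss


# Toy 2x2 example: one vehicle whose mass is predicted in slightly the wrong place.
predicted = [[0.5, 0.5], [0.0, 0.0]]
ground_truth = [[1.0, 0.0], [0.0, 0.0]]
print(density_losses(predicted, ground_truth, gt_count=1.0))
```

In the toy example the count is exactly right, so only the map term is non-zero; a prediction of all zeros would be penalized by both terms.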

Unsupervised Anomaly Detection for Traffic Surveillance (CVPRW 2018)

This work was for NVIDIA AI CITY CHALLENGE 2018 track-2, traffic anomaly detection in surveillance video. We identified a property shared by all broken-down vehicles: whenever an anomaly happens, it leads to at least one stopped vehicle, which becomes part of the video background. Based on this finding, we designed a framework, and our algorithm ranked 2nd in the final competition.

View more on Github.

You can also find the demo video here, and the paper here.

Designed a novel unsupervised system to detect abnormal vehicles in traffic surveillance videos that adapts to various scenes without special treatment
Used MOG2 to capture the background frames and applied Faster-RCNN to detect vehicles in them, since abnormal vehicles stay in the video background for a long time; trained a VGG classifier to eliminate falsely detected bounding boxes
Trained a ReID model with a ResNet backbone and triplet loss to perform similarity comparison in cases of camera movement or vehicles waiting at a traffic light
Designed a decision model to determine whether a detected vehicle is an anomaly according to the bounding boxes obtained above
Achieved 0.81 F1-score and 10.2 RMSE on the traffic anomaly detection dataset of the NVIDIA AI CITY Challenge

License Plate Recognition Based on Segmentation and Multi-Label Classification

Almost all existing license plate recognition algorithms can only be used in homogeneous scenes. Therefore, I built a license plate recognition system that recognizes Chinese license plates in multi-oriented images and handles a wide variety of scenes with great robustness. The system works in real time under real-life conditions without any special modification.

View more on Github.

Contact me if you want to view more or get my data.

Built a license plate recognition system to detect and recognize multi-oriented Chinese license plates in various scenes in real time with great robustness
Trained a segmentation model to perform pixel-wise classification and obtain the pure license plate area on wild vehicle images, avoiding the background interference caused by using detection to locate the license plate
Obtained the quadrilateral envelope from the convex hull of the license plate area, and then transformed the quadrilateral into a rectangle through perspective transformation
Designed a CNN with CTC loss to recognize license plate characters end-to-end on rectangular license plate images; trained the classification model on 100k simulated images and fine-tuned on real data due to the lack of annotation
Achieved 98.1% recognition accuracy from vehicle image to vehicle ID at a speed of 50fps
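
The quadrilateral-to-rectangle step is a standard perspective (homography) transform. The sketch below solves for the eight homography parameters from four corner correspondences using only the standard library. The corner coordinates, the 136x36 output size, and the function names are assumptions for illustration, and a real pipeline would more likely rely on an image library for the warp.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Eight parameters h0..h7 mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Apply the homography to a single point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical plate corners (clockwise from top-left) and target rectangle.
quad = [(12.0, 8.0), (118.0, 20.0), (110.0, 58.0), (5.0, 44.0)]
rect = [(0.0, 0.0), (136.0, 0.0), (136.0, 36.0), (0.0, 36.0)]
h = homography(quad, rect)
print([warp_point(h, x, y) for x, y in quad])
```

Each corner of the quadrilateral lands on the matching rectangle corner, and every interior pixel is mapped by the same eight parameters, which is what rectifies a tilted plate before recognition.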

Denoising for Cosmic Microwave Background (CMB) Signal with Auto-Encoders

In this project, my target is to remove heavy noise from the data, which is called 'all'. Following a residual-connection approach, the noise signal is obtained from the first autoencoder, whose input is the 'all' signal. The difference between the 'all' signal and the 'noise' signal is then the input of the next autoencoder. The stack of autoencoders gave good results.

View more on Github.

Designed an Auto-Encoder network using residual connections and dense connections that efficiently reduces the background noise
Trained two models with the same architecture, using MSE and MAE losses, to learn the noise from the original images and to learn the CMB signal from the noise-free images respectively; connected the two models using a residual connection
Achieved around 450 MSE and 8 MAE on the evaluation dataset and a decent PSD compared with traditional methods

Traffic Sign Detection

This project is about detecting traffic signs on the highway. Considering both accuracy and speed, I chose Faster-RCNN for the task.

View more on Github.

Contact me if you want to view more or get my data.

Designed a traffic sign detection system with high accuracy and fast speed
Annotated 10k+ bounding boxes on 5k+ images captured in various scenes
Trained a Faster-RCNN detection model with 80% of the labeled data and achieved 97.4% accuracy on the test dataset
Deployed the traffic sign detection system on Windows using Qt and hosted the detection model on a Linux server

Lane Detection for Moving Vehicles

I completed this project during my internship at Samsung China. Because the program runs on mobile devices, runtime and size are equally important. The main idea is to find the lines that could be lanes and then track how the weights on the lane locations change. From this, we can identify the lane and tell whether the car is changing lanes.

View more on Github.

Designed an algorithm for detecting lanes through an in-vehicle camera in real time with high accuracy
Implemented the algorithm in C++
Optimized algorithm performance and reduced detection time with more efficient data structures and simpler logic

Video Face Recognition based on Deep Learning

This project was my first contact with deep learning. As the project went on, I became familiar with the details of deep learning. I actually spent plenty of time on environment setup and debugging. Seeing my face recognition model run was a big motivation.

Contact me if you want to view more. The project is too large to put on Github because of the MFC code.

Completed a demo that shows the identified faces appearing in the demo video
Designed a convolutional neural network for face recognition
Trained the CNN and tuned its parameters in Caffe, using the CASIA WebFace database
Gained a deeper understanding of the roles of the different layers of a CNN
Achieved 96% accuracy on face recognition, and the demo performs well

Handwritten Fuzzy Numeric Characters Recognition

A simple but important project for me, which prompted me to explore deep learning and the broader machine learning area. I came up with a way to classify the numbers in pictures: divide every picture into parts, each of which is used to fit a Gaussian distribution. The EM method fits this need well.

View more on Github.

Built a model that can recognize fuzzy numbers in pictures
Selected the EM algorithm to complete the recognition work considering the features of numeric characters
Implemented the design in C++
Achieved 93% average accuracy across the different character images
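
As a generic illustration of the EM method for Gaussian fitting, the sketch below fits a one-dimensional Gaussian mixture in plain Python. It is not the project's C++ code: the 1-D setting, the two components, the initialization, the variance floor, and all names are assumptions chosen to keep the example short.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=50, min_var=1e-3):
    """Fit a k-component (k >= 2) 1-D Gaussian mixture with EM.

    Means start evenly spaced between min and max; min_var is an assumed
    floor that keeps a component from collapsing onto a single point.
    """
    lo, hi = min(data), max(data)
    means = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    variances = [max(((hi - lo) / k) ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([value / total for value in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)
    return weights, means, variances


# Toy data: two well-separated groups of values.
samples = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
print(fit_gmm_1d(samples))
```

On the toy data the two component means settle near the centers of the two groups; the same alternation of E-step and M-step applies when each image part is modeled by its own Gaussian.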

Hi! This is Jiayi Wei

Let's Get In Touch!