Have you ever thought about the limits of what Deep Learning and AI can do? Eight years ago, AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, which was incredible at the time thanks to CNNs and GPU training. Three years later, in 2015, ResNet-152 reached an error of only 3.57%, better than the roughly 5% human error on this task. Also in 2015, DCGAN showed the world that Deep Learning algorithms can not only classify objects but also create new ones. In 2017, the transformer architecture introduced by Google not only brought faster training and better results than LSTMs in the NLP field, but also challenged the Computer Vision world with its attention module.

Thanks to everyone who has contributed to the Deep Learning field, we have more and more applications that we could not have imagined before, which brings up today's topic: generating realistic face images from hand-drawn sketches. Before anything, credit goes to the researchers, and thanks for the open-source code. I will briefly introduce the model architecture, then we'll go hands-on so you can deploy this model locally!

Model Architecture

![]()

As shown in the architecture above, the model is separated into three parts: the Component Embedding (CE) module, the Feature Mapping (FM) module, and the Image Synthesis (IS) module. An input hand-sketched face image of size 512 by 512 is first decomposed into five components: "left-eye", "right-eye", "nose", "mouth", and "remainder". The "eye"s, "nose", and "mouth" are cropped out with square windows of size 128, 168, and 192 respectively, while the "remainder" is literally the remaining part of the sketch. The five components are then encoded by five auto-encoders with latent descriptors of 512 dimensions.

These feature vectors are treated as point samples of the underlying component manifolds, and the hand-drawn sketch is refined by projecting each of its parts onto the corresponding component manifold using K nearest neighbors, as shown in the Manifold Projection part; projecting the individual feature vectors onto the manifolds increases their plausibility. In the FM module, instead of decoding each component vector back to a sketch image and compositing at the component level, the authors fuse the decoded component feature maps into one complete face representation and then generate the whole image at once, which gives more consistent results in terms of both local details and global style. Given the combined feature maps, the IS module converts them into a realistic face image using a conditional GAN architecture.

They also applied a two-stage training method: in stage 1, only the five individual auto-encoders in the CE module are trained, each on its own component sketches. Then, in stage 2, the parameters trained in stage 1 are fixed, and the FM and IS modules are trained together to generate face images under a GAN objective.

![]() ![]()

Implementation

Before anything, if you are just looking for a quick test drive of the application, make sure to check out the project homepage, where the original authors provide a web-based demo interface. The motivation for starting this blog came from my attempt to follow along with the original GitHub repository.

Since Jittor requires Ubuntu >= 16.04, while most people are on MacOS/Windows, one of the easiest options is to use Docker to build a working environment from scratch.

System: MacOS/Windows/Linux (any OS that supports Docker). RAM: 8 GB minimum; the running model takes up around 6 GB.
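To make the decomposition step concrete, here is a minimal NumPy sketch of the fixed-window cropping. Only the window sizes (128, 168, 192 on a 512×512 sketch) come from the paper; the window positions below are my own illustrative placements, not values from the original code.

```python
import numpy as np

# (x, y, size) per component: x/y are illustrative top-left corners,
# sizes are the ones from the paper (eyes 128, nose 168, mouth 192).
WINDOWS = {
    "left-eye":  (108, 156, 128),
    "right-eye": (255, 156, 128),
    "nose":      (182, 232, 168),
    "mouth":     (169, 301, 192),
}

def decompose(sketch: np.ndarray) -> dict:
    """Split a 512x512 grayscale sketch into the five component crops."""
    assert sketch.shape[:2] == (512, 512)
    parts = {}
    mask = np.ones(sketch.shape, dtype=bool)
    for name, (x, y, s) in WINDOWS.items():
        parts[name] = sketch[y:y + s, x:x + s]
        mask[y:y + s, x:x + s] = False
    # "remainder" is the sketch with the four component windows zeroed out
    parts["remainder"] = np.where(mask, sketch, 0)
    return parts
```

In the real pipeline each crop is then fed to its own auto-encoder; here the function just returns the raw pixel windows.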
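The manifold projection step can be approximated with a small NumPy sketch: find the K nearest-neighbour codes among feature vectors of real training sketches, then replace the query code with their best interpolation. Note this is a simplification under my own assumptions; the paper solves a locally-linear, constrained interpolation problem, while the version below uses plain unconstrained least squares.

```python
import numpy as np

def project_to_manifold(z: np.ndarray, bank: np.ndarray, k: int = 10) -> np.ndarray:
    """Project a 512-d component code onto the component manifold,
    approximated by an interpolation of its k nearest neighbours.

    z    : (512,)  query feature vector from the hand-drawn sketch
    bank : (N, 512) feature vectors of training samples
    """
    dists = np.linalg.norm(bank - z, axis=1)   # distance to every sample
    neigh = bank[np.argsort(dists)[:k]]        # (k, 512) nearest codes
    # Least-squares interpolation weights reconstructing z from the
    # neighbours, normalised to sum to 1 (assumes the sum is non-zero).
    w, *_ = np.linalg.lstsq(neigh.T, z, rcond=None)
    w = w / w.sum()
    return w @ neigh                           # projected, more plausible code
```

Projecting each component code this way pulls an implausible hand-drawn part toward the space of real component sketches before the FM/IS modules see it.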
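The two-stage schedule boils down to freezing the CE parameters before stage 2. A toy sketch of that idea in PyTorch (the original code uses Jittor, and the real modules are convolutional networks; the `nn.Linear` stand-ins and shapes below are placeholders of my own):

```python
import torch
import torch.nn as nn

# Toy stand-ins: five CE encoders producing 512-d descriptors, and a
# combined FM+IS stack. Real networks are conv auto-encoders and a GAN.
ce_encoders = nn.ModuleDict({
    name: nn.Linear(64, 512)
    for name in ["left_eye", "right_eye", "nose", "mouth", "remainder"]
})
fm_is = nn.Linear(512, 64)

# Stage 2: fix everything learned in stage 1 (the CE auto-encoders)...
for p in ce_encoders.parameters():
    p.requires_grad_(False)

# ...and hand the optimizer only the FM/IS parameters.
optimizer = torch.optim.Adam(
    [p for p in fm_is.parameters() if p.requires_grad], lr=2e-4
)
```

With the CE weights frozen, GAN training in stage 2 cannot disturb the component manifolds learned in stage 1.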
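For the Docker route, a typical starting point looks like the command below. The container name and mount path are placeholder choices of my own; the only requirements taken from the text are an Ubuntu >= 16.04 base (for Jittor) and roughly 8 GB of memory for the running model.

```shell
# Start an Ubuntu 16.04 container to build the Jittor environment in.
# Jittor requires Ubuntu >= 16.04; the model needs ~6 GB of RAM, hence
# the 8 GB memory limit. Name and mount path are placeholders.
docker run -it \
    --name deepfacedrawing \
    -m 8g \
    -v "$(pwd)":/workspace \
    ubuntu:16.04 /bin/bash
```

From inside the container you can then install Python, Jittor, and the repository's dependencies as on a native Ubuntu machine.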