Deep Learning Project — Anime Illustration Colorization — Part 2

Originally posted on: https://medium.com/mlearning-ai/anime-illustration-colorization-with-deep-learning-part-2-62b1068ef734

sketch source (left): 🐧 by Hitsu

Introduction

This is an update on my previous article, Anime Illustration Colorization with Deep Learning — Part 1. If you have not read it yet, I recommend reading it first and then coming back.

Since posting the previous article, I have made a few changes to try to improve the model's performance.

1. Cleaning The Dataset

Since I scraped the dataset automatically from the internet, it contains some useless images.

originally gray images (artworks’ author: しらび)
sketches in different colors (artworks’ author: Silver)

Although I filtered out some unwanted images by keywords and tags while scraping, there are still some outliers (gray images, sketches, product catalogs, tutorial or announcement images, and so on).

For gray images, I simply wrote a script to automatically remove them from the dataset.
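
For reference, here is a minimal sketch of such a check in Python (using NumPy and Pillow); the function name and the tolerance threshold are illustrative, not the exact values I used:

    import numpy as np
    from PIL import Image

    def is_gray_image(path, tolerance=8):
        # Treat an RGB image as "gray" when its three channels are almost identical.
        # `tolerance` is an illustrative threshold; tune it for your own dataset.
        img = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        max_diff = max(np.abs(r - g).max(), np.abs(g - b).max(), np.abs(r - b).max())
        return max_diff <= tolerance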

But for the others, the only way I could think of was to skim through all the images and remove them manually.

2. Updating Color Hints Input

Instead of color dots only, I updated the color hint images to contain both dots and lines at random positions, with colors taken from the corresponding positions of the ground-truth image.

I also changed the background from white to black so that white hints can be drawn on it. Although this means black hints can no longer be drawn, it is fine because pure black (#000000) is rarely used in paintings except for lines and borders.

A better solution to this issue would be a transparent background (RGB → RGBA). However, that would unnecessarily increase the dataset size and training time for this project, so for simplicity I decided to use a black background instead.
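
To make the hint generation concrete, here is a rough sketch of how such hint images could be produced; the dot size, counts, and stroke lengths are illustrative values, not the exact ones used in this project:

    import random
    import numpy as np

    def make_hint_image(color_img, n_dots=20, n_lines=5):
        # color_img: the ground-truth illustration as an (H, W, 3) NumPy array.
        h, w, _ = color_img.shape
        hints = np.zeros_like(color_img)                     # black background
        for _ in range(n_dots):                              # small square "dots"
            y, x = random.randrange(h), random.randrange(w)
            hints[max(0, y - 2):y + 3, max(0, x - 2):x + 3] = color_img[y, x]
        for _ in range(n_lines):                             # short horizontal strokes
            y, x = random.randrange(h), random.randrange(w)
            x2 = min(w, x + random.randrange(10, 40))
            hints[y, x:x2] = color_img[y, x:x2]
        return hints

In practice the strokes could also be drawn at random angles or follow short curves; the key point is that every hint pixel copies the color of the ground truth at the same location.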

3. Increasing Image Size in Training

Originally, I trained the Pix2Pix model with an input size of 256×256. Now I have simply increased it to 512×512, and here are some results after training for 100 epochs:

(From left to right: Input image, Input hints, Ground truth, Predicted result)

source: ♥ by ふわり
source: 春ですね by necömi
source: 先輩? by 皐月まゆり
source: 天使 by ファジョボレ
source: ☆ by はなこ

4. Testing with Arbitrary Size Images

Since both the U-Net and the PatchGAN in our Pix2Pix model are fully convolutional, there should be no limitation on the input size. But remember that our U-Net contains concatenation layers, so we have to make sure that the inputs of every concatenation layer have the same shape. Otherwise, we may run into a ConcatOp: Dimensions of inputs should match error.

We are using padding="same" in every convolutional and transposed convolutional layer. The output size of each is calculated as follows:

  • Convolution layer with padding="same":
size_out = ceil(size_in / stride)
  • Transposed convolution layer with padding="same":
size_out = size_in * stride

The tricky part here is the ceil(size_in / stride). If the input size of a layer is not divisible by the stride, the ceiling operator rounds the result up, which often leaves that layer's output with a different size from its concatenation partner on the up-sampling side.

Input shape of the deepest concatenation
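
Here is a quick numerical illustration of the mismatch, using a hypothetical 500-pixel side and three stride-2 layers:

    import math

    size, down_sizes = 500, []
    for _ in range(3):                  # stride-2 convolutions with padding="same"
        size = math.ceil(size / 2)
        down_sizes.append(size)         # [250, 125, 63]

    up = down_sizes[-1]
    for _ in range(3):                  # stride-2 transposed convolutions
        up *= 2                         # 63 -> 126 -> 252 -> 504

    # 126 cannot be concatenated with its 125-sized skip connection, and the
    # mismatch propagates all the way up the decoder.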

To cope with this issue, we need to pad or crop the input image so that the feature-map size stays divisible by the stride at every down-sampling layer. Assuming that the stride lengths of all the down-sampling Conv layers are the same, the input height and width should satisfy the following conditions:

  • height % (stride ^ N) = 0
  • width % (stride ^ N) = 0

Where N is the number of down-sampling layers.

As you may notice, the deeper the U-Net, the more padding the image may need, and the larger the padded input can become. With very limited hardware resources, it is easy to get a Resource exhausted: OOM when allocating tensor with shape [...] error when feeding in a very large image. So I limited the number of down-sampling layers to 7, such that with strides=2, we only have to pad the image so that its height and width are divisible by 2^7 = 128.
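
Here is a minimal sketch of that padding step, assuming the input is a (height, width, channels) NumPy array; the function name and the choice of edge padding are my own for illustration:

    import numpy as np

    def pad_to_multiple(image, multiple=128):
        # Pad the bottom/right so that height and width become divisible by `multiple`.
        h, w = image.shape[:2]
        pad_h = (multiple - h % multiple) % multiple
        pad_w = (multiple - w % multiple) % multiple
        padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
        return padded, (h, w)           # keep the original size to crop the output back

After inference, the prediction can simply be cropped back to the original (h, w) to discard the padded region.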

The following are some results after training for 120 epochs:

source: MEGA PEER by アマガイタロー
source: 俺の姉がこんなに可愛いわけがない by 日向あずり
source: 夏!海!水着! by 希望つばめ
source: 「フランに会いにきてくれる?」 by りいちゅ@しの唄
source: IA by Hitsu
source: Magic of Love by フライ
source: ☆〜(ゝ。∂) by 誘拐禁止
source: Halloween! by U35(うみこ)
source: いちご大福ちゃん by 森倉円
source: ルミノシティ08 金剛四姉妹物語(仮) by かにビーム
source: IA by U35(うみこ)
source: 真ちゃん誕。 by 白クマシェイク
source: コミティア111 by しらび
source: 莉嘉ちゃそ~ by 日向あずり
source: リベッチオちゃん by そらなにいろ
source: 秋 by Hiten
source: 美琴ちゃん by DSマイル
source: レムりん by あやみ
source: 花雲りん by TwinBox
source: 🐧 by Hitsu
source: 無題 by Bison倉鼠
source: ✿ by Bison倉鼠
source: ッス!! by 白クマシェイク

5. Conclusion

The results are quite impressive to me, but obviously there is still a lot of room for improvement. The following is a checklist of future work for this project:

  • Reduce colors leaking into surrounding areas
  • Try other model architectures

Once I have time and ideas, I will come back to this project. Thank you for reading.
