What AI can do and cannot do
What is AI image recognition, which cannot be achieved without human support?
150,000 pages of Osamu Tezuka's manuscript data were prepared so the AI could learn what "Tezuka-like" characters are. These were passed through recognition software at Future University Hakodate and were classified and tagged as "frames," "speech balloons," "faces," and "bodies." To improve the quality of the AI-generated images, the staff repeated a process of trial and error until the ideal images were generated, for example by flipping images to double the dataset, loading only female characters, and so on.
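As a rough illustration of the tagging step, the classified regions might be organized as records like the following. The tag names come from the article; the data structure, field names, and bounding-box values are purely hypothetical assumptions for illustration.

```python
# Hypothetical sketch of how tagged manuscript regions could be organized.
# Tags ("frame", "speech_balloon", "face", "body") are from the article;
# the record layout and bounding boxes are illustrative assumptions.
from collections import Counter

# Each record: one region detected on a manuscript page.
tagged_regions = [
    {"page": 1, "tag": "frame",          "bbox": (0, 0, 800, 600)},
    {"page": 1, "tag": "speech_balloon", "bbox": (120, 40, 300, 160)},
    {"page": 1, "tag": "face",           "bbox": (400, 200, 520, 330)},
    {"page": 2, "tag": "body",           "bbox": (100, 100, 400, 580)},
]

def regions_by_tag(regions, tag):
    """Collect all regions with a given tag, e.g. to build a
    face-only training set for the character generator."""
    return [r for r in regions if r["tag"] == tag]

tag_counts = Counter(r["tag"] for r in tagged_regions)
faces = regions_by_tag(tagged_regions, "face")
```

Filtering by tag in this way is what makes later experiments, such as training only on faces or only on female characters, possible.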
What is GAN deep learning, which produces characters as close as possible to Osamu Tezuka’s?
At the core of character generation is an AI technology called GAN (Generative Adversarial Networks), which can generate images as close to real as possible by learning what "realness" is. A GAN consists of two networks. The learning process itself is "adversarial": one network, the discriminator, judges whether generated data is close to the real input data, while the other network, the generator, produces increasingly convincing fake data so that the discriminator cannot distinguish it from the real data. Through repeated cycles of generation and discrimination, the generator learns how to produce data similar to the real data. Although this system was originally used at Kioxia to improve semiconductor design and manufacturing quality, it was applied to character generation for this project.
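The adversarial loop described above can be sketched on a toy problem. The following is a minimal 1-D GAN in numpy, not the project's actual system: a linear "generator" learns to match a simple real distribution while a logistic "discriminator" tries to tell the two apart. All sizes, distributions, and learning rates are illustrative assumptions.

```python
# Minimal 1-D GAN sketch: discriminator vs. generator, trained by the
# alternating updates described in the text. Toy setup, not the real system.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: x = a*z + b, initially far from the real distribution.
a, b = 1.0, 0.0
# Discriminator: D(x) = sigmoid(w*x + c)
w, c = 0.1, 0.0
lr = 0.05

for step in range(3000):
    z = rng.standard_normal(64)
    real = 3.0 + 0.5 * rng.standard_normal(64)   # "real" data: N(3, 0.5)
    fake = a * z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    dr, df = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * (np.mean((dr - 1.0) * real) + np.mean(df * fake))
    c -= lr * (np.mean(dr - 1.0) + np.mean(df))

    # Generator update (non-saturating loss): push D(fake) toward 1.
    df = sigmoid(w * fake + c)
    dL_dx = (df - 1.0) * w          # gradient of -log D(fake) w.r.t. fake x
    a -= lr * np.mean(dL_dx * z)
    b -= lr * np.mean(dL_dx)

print(f"generated mean after training: {b:.2f} (real mean is 3.0)")
```

After training, the generator's output mean has drifted toward the real data's mean, which is exactly the "repeated generation and discrimination" dynamic the article describes, just in one dimension instead of images.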
However, the challenges of AI mean it isn't an easy process, which is also what makes it interesting. For "TEZUKA 2020", the GAN system that had studied "Tezuka-like" properties was further combined with two other types of AI technologies to attempt character generation. This resulted in the following three systems.
In the initially proposed GAN, an image of the entire face was generated in a single pass, so the details were never fully complete. For this reason, the team attempted to generate more complete images using a newer GAN method in which fine details such as the eyes, nose, and mouth are gradually generated from rough depictions such as contours. Plausible results were obtained after initially training on about 4,500 images; the training images were then flipped left and right, and with this augmented data the system learned from about 18,000 samples, roughly four times as many. This was a major step forward, because more training data generally means better accuracy.
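The left-right flip augmentation mentioned above is simple to sketch: each training image is mirrored horizontally, doubling the dataset at no labeling cost. The toy array shapes below are illustrative assumptions.

```python
# Sketch of horizontal-flip augmentation: mirror each image left-right
# and keep both copies, doubling the training set.
import numpy as np

def augment_with_flips(images):
    """Return the original images plus their horizontal mirrors."""
    flipped = [np.fliplr(img) for img in images]
    return images + flipped

rng = np.random.default_rng(0)
dataset = [rng.random((64, 64)) for _ in range(4)]   # 4 toy "face" images
augmented = augment_with_flips(dataset)

print(len(dataset), "->", len(augmented))   # 4 -> 8
```

Because manga faces are broadly left-right symmetric, the mirrored copies are still valid "Tezuka-like" faces, which is what makes this cheap augmentation effective.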
However, the results were still far from ideal. Next, the team decided to use only the female characters from Tezuka's manga and restart learning from scratch. The results were quite good, probably because the female characters often followed relatively well-defined design patterns. After that, various other approaches were tried, such as training on mixed character images.
Ultimately, the decision was made to introduce a technique called "transfer learning." Starting from an AI that had already learned what human facial features are from hundreds of thousands of data points, the system then studied Tezuka's manga characters as additional learning. This enabled the reliable generation of "Tezuka-like" characters, and the project began to make great progress. "PHAEDO" emerged from this process.
What is "transfer learning," the key to problem-solving here?
Transfer learning is a technique for adapting a model trained in one domain to another domain. This may sound abstract, but in this project it meant taking "a model that has learned the structure of human faces" and adapting it, through additional learning, into "a model that has learned Tezuka's manga art style."
In the initial attempts, the team tried to create new characters by having an AI learn only from Tezuka's manga characters; with transfer learning, an AI already capable of generating human faces is instead tasked with learning Tezuka's character art.
Essentially, the more data an AI learns from, the higher the quality of the data it generates. Using Tezuka's manga alone, the dataset was limited to some tens of thousands of images, but adding real human facial images expanded the training data to several hundred thousand samples. This dramatic increase in learning data, made possible by transfer learning, was the key to success.
Mixing different individual features enabled the generation of more attractive characters.
By combining individual features of different characters, we found that characters with unprecedented characteristics could be created. The mechanism creates a number of variations by gradually changing the mixing ratio between the two characters. Here's what it actually looks like, along with a GIF animation.
Here is an image extracted from a character we generated. You can see that the details of the characteristics change while the overall atmosphere is maintained.
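The mixing mechanism described above can be sketched as interpolation between two characters' latent codes: each intermediate ratio yields an in-between variant, and stepping the ratio gradually produces the animation-like series. The latent size and variable names below are placeholder assumptions; in the real system each blended code would be fed to the trained generator.

```python
# Sketch of character mixing as latent interpolation: vary the blend
# ratio between two latent codes to get a series of in-between variants.
import numpy as np

rng = np.random.default_rng(0)
z_char_a = rng.standard_normal(128)    # latent code of character A (assumed)
z_char_b = rng.standard_normal(128)    # latent code of character B (assumed)

def blend(z1, z2, ratio):
    """Linear interpolation: ratio 0.0 -> pure z1, ratio 1.0 -> pure z2."""
    return (1.0 - ratio) * z1 + ratio * z2

# Gradually shift the mixing ratio; each blended code would be passed
# to the trained generator to render one frame of the variation series.
variations = [blend(z_char_a, z_char_b, r) for r in np.linspace(0.0, 1.0, 9)]
```

Because the blend is gradual, neighboring variants differ only in fine details while the overall atmosphere carries over, matching what the generated images show.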