SNU ECE Professor Kyoung Mu Lee
Image features acquired through deep learning
Similar to images created in the brain
Anticipating the scenes AI will imagine and visualize
Human intelligence is a process of receiving external information through the sensory organs and then processing, analyzing, and assessing it in the brain. Visual information is known to account for more than 80% of this process. I have long studied computer vision, the field concerned with how to give computers human-like visual intelligence, and even so, the recent pace of progress in image analysis and understanding through machine learning has far exceeded my expectations. Understanding an image is a high-level cognitive process in which the entire situation is grasped: not only whether objects are present and what their types and properties are, but also how they interact with one another.
Analyzing and understanding images is closely related to the problem of image synthesis. When we were young, we listened to our grandmothers' old tales or read novels and imagined the situations and scenes of the unfolding stories. Could machines, too, be capable of such imagination?
The images we see are stored in digitized form. For instance, in a 10×10 grayscale image, each of the 100 pixels can take one of 256 brightness values, so the total number of possible images is 256¹⁰⁰, or about 6×10²⁴⁰. If we were to view all of these images on a TV, it would take about 7×10²³¹ years. For the high-definition (HD) color images we actually see, the number of possible images is practically infinite.
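To make the arithmetic above concrete, here is a minimal Python sketch. The function names are hypothetical, and the display rate of 30 frames per second is an assumption (the article does not state one); with that rate, the result matches the roughly 7×10²³¹ years quoted above.

```python
import math

# Hypothetical helpers illustrating the article's 10x10, 256-level example.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
FRAMES_PER_SECOND = 30  # assumption: a typical TV frame rate; not stated in the article


def count_images(width: int, height: int, levels: int) -> int:
    """Number of distinct images: each of width*height pixels independently takes one of `levels` values."""
    return levels ** (width * height)


def viewing_years(num_images: int, fps: float = FRAMES_PER_SECOND) -> float:
    """Years needed to show every image once, one image per frame."""
    return num_images / fps / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = count_images(10, 10, 256)  # 256**100 == 2**800, an exact integer
    print(f"possible images: {n:.1e}")  # possible images: 6.7e+240
    print(f"log10: {math.log10(n):.2f}")  # log10: 240.82
    print(f"viewing time: {viewing_years(n):.1e} years")  # viewing time: 7.0e+231 years
```

Python's arbitrary-precision integers keep the image count exact; only the final division by the frame rate and year length is done in floating point, which is safe here because 6.7×10²⁴⁰ is well within the range of a double.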
Fortunately, however, only a comparatively small fraction of these images are real-world images that are meaningful to us, and such images are known to form continuous manifolds in image space. The shapes and properties of these manifolds serve as the prior knowledge that makes human recognition, understanding, and inference possible. That is why researchers in computer vision and machine learning have long studied image manifolds and tried to achieve human-like visual cognition by modeling images in a low-dimensional feature space. Recent deep learning methods learn this image feature space from datasets instead of deriving it through mathematical modeling. The image features acquired through deep learning resemble the hierarchical, semantic image features generated in the human brain, and with them, tasks that were previously difficult, such as object recognition, higher-level image analysis, and converting images into sentences, have become possible.
Research on Generative Adversarial Networks (GANs), which synthesize realistic images using an image feature space learned through deep learning, has recently been advancing actively and rapidly. For example, an image can be converted into an image of a different form, and when a simple sentence or keywords are entered, the computer synthesizes a matching image. AI can thus create drawings or images on its own, much as children who are given words such as ‘summer’ or ‘space’ and asked to draw will visualize their imagination. If AI can read novels and express them in images, or generate videos and movies from scripts, it will soon be possible to watch AI-made movies in theaters. And if AI also learns what people find moving or funny, commercial success will be guaranteed as well.
However, as with humans, artificial intelligence fundamentally acquires its capabilities through learning. The characteristics of the environment and data used for learning can therefore affect the results. For example, for me and older generations, the word ‘hometown (고향)’ brings to mind rural farms or old alleys, whereas the younger generation will picture apartment complexes or streets lined with private academies. I am rather curious what AI would envision if asked to imagine its childhood dreams.
Translated by: Jee Hyun Lee, English Editor of Electrical and Computer Engineering, email@example.com