Manuscript received March 5, 2025; revised April 8, 2025; accepted June 19, 2025; published August 19, 2025.
Abstract—With the rapid development of artificial intelligence, text-to-image generation has attracted widespread attention for its potential to enhance human-computer interaction, increase the credibility of visual content, and create novel works of art. This paper proposes Improved AttnGAN, a Generative Adversarial Network (GAN) model based on an attention mechanism, to address the challenges existing methods face in handling complex text input, improving image clarity and realism, and strengthening semantic consistency between text and image. By introducing the SimAM attention mechanism and optimizing the AttnGAN architecture, the model achieves significant improvements in both image quality and diversity. Experimental results show that Improved AttnGAN outperforms StackGAN, DM-GAN, MirrorGAN, DF-GAN, AttnGAN, and other models, with clear advantages in image quality and realism.

Keywords—image generation, Chinese text, GAN, self-attention mechanism, AttnGAN, Improved AttnGAN

Cite: Yongxia Hu and Dong-Hyun Kim, "Research on Controllable Image Generation Technology for Chinese Text," Journal of Image and Graphics, Vol. 13, No. 4, pp. 452-458, 2025.

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC-BY-4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.