Deep Learning Algorithms in the Development of Generative AI Models for Automated Content Creation

Authors

  • Agung Yuliyanto Nugroho, Universitas Cendekia Mitra Indonesia

DOI:

https://doi.org/10.59059/mutiara.v3i5.2804

Keywords:

Content, Deep Learning, Generative Adversarial Network, Generative AI, Transformer

Abstract

The rapid advancement of artificial intelligence (AI) has introduced a new paradigm in informatics known as Generative AI. One of the key driving forces behind this innovation is the application of deep learning algorithms, which can emulate human cognitive patterns to automatically generate text, images, audio, and video. This study aims to analyze how deep learning algorithms, particularly Generative Adversarial Networks (GANs) and Transformer-based models (such as GPT and diffusion models), are utilized in developing generative AI systems for automated content creation. The research employs a literature review of recent studies, a comparative analysis of generative models, and a performance evaluation based on quality, creativity, and computational efficiency. The findings reveal that Transformer-based models exhibit greater adaptability in understanding semantic context and produce more realistic content than traditional GAN models. However, challenges such as overfitting, data bias, and high computational resource demands remain major obstacles to large-scale implementation. This study concludes that optimizing deep learning algorithms, supported by ethical considerations and careful data management, will be crucial to developing generative AI that is both effective and responsible within the modern informatics ecosystem.
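The scaled dot-product attention mechanism at the core of the Transformer models discussed in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy with randomly generated matrices, not code or data from the study itself:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # token-to-token similarity
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights

# Toy example: a "sequence" of 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)             # (3, 4): one context-mixed vector per token
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

The row-wise softmax is what lets each output token attend to every other token in the sequence, which underlies the semantic-context adaptability the abstract attributes to Transformer-based models.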

Published

2025-10-21

How to Cite

Agung Yuliyanto Nugroho. (2025). Deep Learning Algorithms in the Development of Generative AI Models for Automated Content Creation. Mutiara : Jurnal Penelitian Dan Karya Ilmiah, 3(5), 111–122. https://doi.org/10.59059/mutiara.v3i5.2804
