A hybrid approach to visually oriented generation of culinary recipes based on convolutional neural networks and large language models

Minukhin S.; Shaposhnyk M.

Please use this identifier to cite or link to this item: https://repository.hneu.edu.ua/handle/123456789/39144

Title:	A hybrid approach to visually oriented generation of culinary recipes based on convolutional neural networks and large language models
Authors:	Minukhin S.; Shaposhnyk M.
Keywords:	Convolutional neural networks large language models classification culinary food ingredients recipe generation image
Publication date:	2026
Bibliographic description:	Minukhin S. A hybrid approach to visually oriented generation of culinary recipes based on convolutional neural networks and large language models / S. Minukhin, M. Shaposhnyk // Herald of Khmelnytskyi national university. – 2026. – Issue 2(363).
Abstract:	This article presents a hybrid approach to image-grounded recipe generation that combines computer vision and natural language processing. By integrating multi-label Convolutional Neural Networks with Large Language Models, the architecture addresses the difficulty of mapping pixel-level features onto culinary text. To reconcile the mismatch in granularity between dish-level categorization and ingredient-level composition, the research prioritizes semantic fidelity. The study first analyzed the limitations of conventional single-label classification and then re-engineered the DenseNet-121 architecture to accommodate concurrent output streams for ingredient identification. The visual component, built on transfer learning and trained on the Food-101 dataset, uses cost-sensitive optimization to improve detection accuracy. Text generation is performed by the Llama 3.1 8B model, guided through In-Context Learning and evaluated with the BLEU, ROUGE, and Cosine Similarity metrics. The experimental results support the framework's effectiveness: the refined detector achieved a Recall of 0.91. When visual context was integrated into structured prompts, the mean Cosine Similarity rose to 0.765, a marked improvement over established baselines in capturing nuanced dish variations. The proposed hybrid approach bridges the semantic gap between visual data and textual generation. Explicitly injecting detected ingredients into the LLM context enables the creation of instance-specific recipes rather than template-based outputs, significantly reducing AI hallucinations and increasing the relevance of the results.
URI (Uniform Resource Identifier):	https://repository.hneu.edu.ua/handle/123456789/39144
Appears in collections:	Articles (ІС)
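
The pipeline summarized in the abstract (multi-label ingredient detection, a structured prompt carrying the detected ingredients, and a similarity score against a reference recipe) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the function names, the 0.5 threshold, the prompt wording, and the example probabilities are hypothetical, and the bag-of-words cosine similarity is a simple stand-in for whatever text representation the authors actually used.

```python
import math
from collections import Counter


def select_ingredients(probs: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label decision: keep every ingredient whose independent
    probability clears the threshold (hypothetical default of 0.5),
    ordered from most to least confident."""
    kept = [name for name, p in probs.items() if p >= threshold]
    return sorted(kept, key=lambda name: -probs[name])


def build_prompt(dish: str, ingredients: list[str]) -> str:
    """Assemble a structured prompt that injects the detected ingredients
    into the LLM context (the wording is illustrative only)."""
    return "\n".join([
        f"Dish: {dish}",
        "Detected ingredients: " + ", ".join(ingredients),
        "Write a recipe for this dish using the detected ingredients.",
    ])


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.
    Returns 0.0 when either text has no tokens."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical detector output for one image.
example_probs = {"tomato": 0.92, "basil": 0.71, "beef": 0.08}
example_ingredients = select_ingredients(example_probs)
example_prompt = build_prompt("caprese salad", example_ingredients)
```

In this sketch the prompt returned by `build_prompt` would be sent to the language model, and `cosine_similarity` would compare the generated recipe with a reference text. Thresholding each label independently is what distinguishes the multi-label setting from single-label classification, where only the top class would be kept.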

Files in this item:

File	Description	Size	Format
(363)+VKNU-TS-2026-N2_p418-434.pdf		1.51 MB	Adobe PDF

All items in the electronic resources archive are protected by copyright, with all rights reserved.