Please use this identifier to cite or link to this item: https://repository.hneu.edu.ua/handle/123456789/39144
Full metadata record
DC Field | Value | Language
dc.contributor.author | Minukhin S. | -
dc.contributor.author | Shaposhnyk M. | -
dc.date.accessioned | 2026-03-27T07:59:04Z | -
dc.date.available | 2026-03-27T07:59:04Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Minukhin S. A hybrid approach to visually oriented generation of culinary recipes based on convolutional neural networks and large language models / S. Minukhin, M. Shaposhnyk // Herald of Khmelnytskyi national university. – 2026. – Issue 2(363). | uk_UA
dc.identifier.uri | https://repository.hneu.edu.ua/handle/123456789/39144 | -
dc.description.abstract | This article delineates a hybrid approach for visually anchored recipe synthesis, orchestrating a confluence of computer vision and natural language processing. By integrating multi-label Convolutional Neural Networks with Large Language Models, the architecture remediates the inherent opacity found when mapping pixel-level abstractions onto culinary discourse. To rectify the resolution divergence between monolithic dish categorization and granular ingredient composition, this research prioritizes semantic fidelity. The investigative trajectory involved diagnosing the constraints of orthodox single-label classification and subsequently re-engineering the DenseNet-121 topology to accommodate concurrent streams for ingredient identification. Grounded in transfer learning, the visual engine—trained on the Food-101 corpus—utilizes cost-sensitive optimization to sharpen detection accuracy. Linguistic synthesis proceeds via the Llama 3.1 8B model, instrumented through In-Context Learning and validated through BLEU, ROUGE, and Cosine Similarity benchmarks. Empirical evidence underscores the framework's efficacy; the refined detector yielded a Recall of 0.91. When visual context was integrated into structured prompts, the mean Cosine Similarity rose to 0.765, marking a significant leap in capturing nuanced dish variations compared to established baselines. The proposed hybrid approach successfully bridges the semantic gap between visual data and textual generation. Explicitly injecting detected ingredients into the LLM context enables the creation of instance-specific recipes rather than template-based outputs, significantly mitigating AI hallucinations and increasing the relevance of the results. | uk_UA
dc.language.iso | en | uk_UA
dc.subject | Convolutional neural networks | uk_UA
dc.subject | large language models | uk_UA
dc.subject | classification | uk_UA
dc.subject | culinary food | uk_UA
dc.subject | ingredients | uk_UA
dc.subject | recipe | uk_UA
dc.subject | generation | uk_UA
dc.subject | image | uk_UA
dc.title | A hybrid approach to visually oriented generation of culinary recipes based on convolutional neural networks and large language models | uk_UA
dc.type | Article | uk_UA
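The abstract describes injecting CNN-detected ingredients into a structured LLM prompt and scoring the generated text with Cosine Similarity, among other metrics. The sketch below illustrates that idea in minimal form; the prompt wording, function names, and the bag-of-words vectors (standing in for the embeddings used in the article) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def build_prompt(dish: str, ingredients: list[str]) -> str:
    """Assemble an In-Context-Learning-style prompt that grounds the LLM
    in the ingredients detected by the multi-label CNN (illustrative wording)."""
    joined = ", ".join(ingredients)
    return (
        f"You are a chef. The photographed dish is: {dish}.\n"
        f"Detected ingredients: {joined}.\n"
        "Write a step-by-step recipe that uses only these ingredients."
    )

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts; a simple stand-in for
    the embedding-based similarity metric reported in the article."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical detector output for one image.
prompt = build_prompt("margherita pizza", ["tomato", "mozzarella", "basil"])
score = cosine_similarity(
    "slice the tomato and mozzarella then add basil",
    "slice tomato and mozzarella, top with fresh basil",
)
```

Constraining the prompt to the detected ingredient list is what makes the output instance-specific rather than template-based, which is the mechanism the abstract credits for reducing hallucinations.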
Appears in collections: Статті (ІС) [Articles (IS)]

Files in this item:
File | Description | Size | Format
(363)+VKNU-TS-2026-N2_p418-434.pdf | | 1,51 MB | Adobe PDF


All items in the electronic resources repository are protected by copyright, with all rights reserved.