Please use this identifier to cite or link to this item: https://repository.hneu.edu.ua/handle/123456789/38819
Full metadata record:
dc.contributor.author: Venhrina O.
dc.date.accessioned: 2026-02-24T12:41:23Z
dc.date.available: 2026-02-24T12:41:23Z
dc.date.issued: 2025
dc.identifier.citation: Venhrina O. Чи надають RAG-системи перевагу? Емпіричне дослідження ефективності ChatGPT та NotebookLM при створенні тестів з академічних дисциплін / O. Venhrina // Transformation of the scientific area in the context of contemporary challenges : Scientific monograph. – Riga, Latvia : Baltija Publishing, 2025. – P. 89-106.
dc.identifier.uri: https://repository.hneu.edu.ua/handle/123456789/38819
dc.description.abstract: In the context of the rapid digitalization of education, automating the creation of assessment materials has become a critical task for educators. The emergence of Large Language Models (LLMs) has opened new perspectives for generating test items; however, the question of selecting the optimal toolkit remains unresolved within the educational community. Specifically, there is a need for empirical verification of the hypothesis that specialized tools with a Retrieval-Augmented Generation (RAG) architecture, such as NotebookLM, provide a significant advantage in quality and reliability over universal chatbots (e.g., ChatGPT) when used by subject teachers who lack complex prompt-engineering skills.

Objective. The study aims to conduct a comparative analysis of the quality, structural correctness, and cognitive depth of multiple-choice test questions generated under three different AI usage scenarios.

Methods. The experimental basis was the educational material of the "Database Organization and Storage" discipline (topics: SQL DDL, DML, Aggregation). A pool of 90 test questions was generated across three scenarios: (A) generation in ChatGPT based solely on the topic title; (B) generation in ChatGPT based on uploaded lecture notes; (C) generation in NotebookLM based on an uploaded source. Three independent experts rated the resulting content on a 5-point scale against the following criteria: factual correctness, relevance to the topic, and quality of distractors (incorrect answer options). Additionally, questions were classified according to a simplified Bloom's taxonomy. Statistical hypotheses were verified with the non-parametric Kruskal-Wallis H-test and Pearson's chi-squared test of independence.

Results. Statistical analysis revealed no significant differences between the three scenarios for any of the quality criteria. A pronounced "ceiling effect" was recorded for the "relevance" criterion (mean scores 4.8–4.99), indicating the high competence of base models on standard academic topics even without provided context. At the same time, NotebookLM demonstrated technical instability when generating content in Ukrainian, specifically omitting individual words in question formulations, which led to significantly higher variability in "correctness" scores compared to the stable results of ChatGPT. Analysis using Bloom's taxonomy confirmed that switching to a RAG system does not automatically increase the cognitive complexity of tasks: the majority of questions in all groups remained at the levels of remembering and understanding. Furthermore, generating plausible distractors remains a weak point for all the examined AI tools.

Conclusions. The findings refute the assumption of an unconditional advantage of RAG systems for generating tests in standardized disciplines. For an educator, using universal chatbots with simple prompts is the most effective approach in terms of the time-to-quality ratio. Specialized tools such as NotebookLM are advisable primarily for working with unique authorial materials; however, they require increased attention to the verification of each generated question.
dc.language.iso: uk_UA
dc.subject: RAG systems
dc.subject: ChatGPT
dc.subject: NotebookLM
dc.subject: distractors
dc.subject: correctness and relevance
dc.title: Чи надають RAG-системи перевагу? Емпіричне дослідження ефективності ChatGPT та NotebookLM при створенні тестів з академічних дисциплін
dc.type: Book chapter
Appears in collections: Monographs (КІТ)

Files in this item:
Venhrina_Scientific monograph_2025.pdf (1.21 MB, Adobe PDF)


All items in this electronic repository are protected by copyright; all rights reserved.
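The abstract reports comparing expert scores across the three generation scenarios with the Kruskal-Wallis H-test and checking the scenario-by-Bloom-level distribution with Pearson's test of independence. As a minimal sketch of that workflow (not the author's actual code, and with entirely made-up placeholder ratings and counts), the two tests can be run with SciPy as follows:

```python
# Illustrative sketch of the statistical analysis described in the abstract.
# All numbers below are hypothetical placeholders, NOT the study's data.
from scipy.stats import kruskal, chi2_contingency

# Hypothetical 5-point expert ratings for one criterion, per scenario
scores_a = [5, 4, 5, 5, 4, 5]  # Scenario A: ChatGPT, topic title only
scores_b = [4, 5, 5, 4, 5, 5]  # Scenario B: ChatGPT + lecture notes
scores_c = [5, 5, 4, 5, 4, 4]  # Scenario C: NotebookLM + uploaded source

# Non-parametric comparison of the three independent samples
h_stat, p_kw = kruskal(scores_a, scores_b, scores_c)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_kw:.3f}")

# Hypothetical contingency table: Bloom level (rows: remembering,
# understanding, applying) by scenario (columns: A, B, C)
bloom_counts = [[14, 13, 15],
                [12, 13, 11],
                [ 4,  4,  4]]
chi2, p_chi, dof, expected = chi2_contingency(bloom_counts)
print(f"Pearson chi-squared = {chi2:.3f}, dof = {dof}, p = {p_chi:.3f}")
```

With near-identical groups, as in these placeholder data, both p-values come out well above 0.05, mirroring the "no significant differences" pattern the study reports for its real ratings.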