Please use this identifier to cite or link to this item: https://repository.hneu.edu.ua/handle/123456789/38819
Full metadata record:
dc.contributor.author: Venhrina O.
dc.date.accessioned: 2026-02-24T12:41:23Z
dc.date.available: 2026-02-24T12:41:23Z
dc.date.issued: 2025
dc.identifier.citation: Venhrina O. Чи надають RAG-системи перевагу? Емпіричне дослідження ефективності ChatGPT та NotebookLM при створенні тестів з академічних дисциплін / O. Venhrina // Transformation of the scientific area in the context of contemporary challenges : Scientific monograph. – Riga, Latvia : Baltija Publishing, 2025. – P. 89-106.
dc.identifier.uri: https://repository.hneu.edu.ua/handle/123456789/38819
dc.description.abstract: In the context of the rapid digitalization of education, automating the creation of assessment materials has become a critical task for educators. The emergence of Large Language Models (LLMs) has opened new perspectives for generating test items; however, the question of selecting the optimal toolkit remains unresolved within the educational community. Specifically, there is a need for empirical verification of the hypothesis that specialized tools with a Retrieval-Augmented Generation (RAG) architecture, such as NotebookLM, provide a significant advantage in quality and reliability over universal chatbots (e.g., ChatGPT) when used by subject teachers who lack complex prompt-engineering skills.

Objective. The study aims to conduct a comparative analysis of the quality, structural correctness, and cognitive depth of multiple-choice test questions generated under three different AI usage scenarios.

Methods. The experimental basis was the educational material of the "Database Organization and Storage" discipline (topics: SQL DDL, DML, Aggregation). A pool of 90 test questions was generated across three scenarios: (A) generation in ChatGPT based solely on the topic title; (B) generation in ChatGPT based on uploaded lecture notes; (C) generation in NotebookLM based on an uploaded source. Three independent experts rated the resulting content on a 5-point scale against the following criteria: factual correctness, relevance to the topic, and quality of distractors (incorrect answer options). Additionally, questions were classified according to a simplified Bloom's taxonomy. Statistical hypotheses were verified with the non-parametric Kruskal-Wallis H-test and Pearson's chi-squared test of independence.

Results. Statistical analysis revealed no significant differences between the three scenarios for any of the quality criteria. A pronounced "ceiling effect" was recorded for the "relevance" criterion (mean scores 4.8–4.99), indicating the high competence of base models on standard academic topics even without provided context. At the same time, NotebookLM demonstrated technical instability when generating content in Ukrainian, specifically omitting individual words in question formulations, which led to significantly higher variability in "correctness" scores compared to the stable results of ChatGPT. Analysis using Bloom's taxonomy confirmed that switching to a RAG system does not automatically increase the cognitive complexity of tasks: the majority of questions in all groups remained at the levels of remembering and understanding. Furthermore, generating plausible distractors remains a weak point for all the examined AI tools.

Conclusions. The findings refute the assumption of an unconditional advantage of RAG systems for generating tests in standardized disciplines. For an educator, using universal chatbots with simple prompts is the most effective approach in terms of the time-to-quality ratio. Specialized tools such as NotebookLM are advisable primarily for working with unique authorial materials; however, they require increased attention to the verification of each generated question.
dc.language.iso: uk_UA
dc.subject: RAG systems
dc.subject: ChatGPT
dc.subject: NotebookLM
dc.subject: distractors
dc.subject: correctness and relevance
dc.title: Чи надають RAG-системи перевагу? Емпіричне дослідження ефективності ChatGPT та NotebookLM при створенні тестів з академічних дисциплін
dc.type: Book chapter
Appears in collections: Monographs (КІТ)

Files in this item:
Venhrina_Scientific monograph_2025.pdf (1.21 MB, Adobe PDF)


All items in this electronic repository are protected by copyright; all rights reserved.
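The abstract reports comparing expert scores across the three generation scenarios with the Kruskal-Wallis H-test and checking the scenario-by-Bloom-level distribution with Pearson's test of independence. As a minimal sketch of that workflow (not the author's actual code, and with entirely made-up placeholder ratings and counts), the two tests can be run with SciPy as follows:

```python
# Illustrative sketch of the statistical analysis described in the abstract.
# All numbers below are hypothetical placeholders, NOT the study's data.
from scipy.stats import kruskal, chi2_contingency

# Hypothetical 5-point expert ratings for one criterion, per scenario
scores_a = [5, 4, 5, 5, 4, 5]  # Scenario A: ChatGPT, topic title only
scores_b = [4, 5, 5, 4, 5, 5]  # Scenario B: ChatGPT + lecture notes
scores_c = [5, 5, 4, 5, 4, 4]  # Scenario C: NotebookLM + uploaded source

# Non-parametric comparison of the three independent samples
h_stat, p_kw = kruskal(scores_a, scores_b, scores_c)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_kw:.3f}")

# Hypothetical contingency table: Bloom level (rows: remembering,
# understanding, applying) by scenario (columns: A, B, C)
bloom_counts = [[14, 13, 15],
                [12, 13, 11],
                [ 4,  4,  4]]
chi2, p_chi, dof, expected = chi2_contingency(bloom_counts)
print(f"Pearson chi-squared = {chi2:.3f}, dof = {dof}, p = {p_chi:.3f}")
```

With near-identical groups, as in these placeholder data, both p-values come out well above 0.05, mirroring the "no significant differences" pattern the study reports for its real ratings.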