The explainability of shallow AI-generated text classification models via parts removing

Peredrii O.; Gorokhovatskyi O.

Please use this identifier to cite or link to this item: https://repository.hneu.edu.ua/handle/123456789/40084

Title:	The explainability of shallow AI-generated text classification models via parts removing
Authors:	Peredrii O.; Gorokhovatskyi O.
Keywords:	explainability; black-box; shallow ANN; perturbation; AI-generated content; human-written content; text chunk; text classification; explainability index
Issue Date:	2026
Citation:	Peredrii O. The explainability of shallow AI-generated text classification models via parts removing / O. Peredrii, O. Gorokhovatskyi // Системи управління, навігації та зв’язку [Control, Navigation and Communication Systems]. – 2026. – № 2. – P. 153–159.
Abstract:	In this paper, we address the explainability of ANN-based classification of AI-generated and human-written text chunks in Ukrainian IT-domain texts. The objective is to investigate whether perturbation-based modifications of text chunks, namely the removal of sentences, words, and word combinations, can help in finding explanations. We used five shallow ANN models (with an average accuracy of about 0.88) and tested them on a sample document containing human-written text and fragments generated with GPT-5, Gemini 2.5 Flash, and Claude Sonnet 4.5. The experiments showed that it is hard to find a single sentence or word whose removal flips the classification result. We propose an explainability index that measures the total influence of all perturbed samples on the classification result while accounting for the fact that shorter perturbations are more valuable.
URI:	https://repository.hneu.edu.ua/handle/123456789/40084
Appears in Collections:	Articles (ІКТ)
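
The sentence-removal perturbation mentioned in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the naive sentence splitter, the 0.5 decision threshold, the `toy_predict` stand-in classifier, and in particular the length-based weighting in `weighted_influence`. The record does not give the paper's explainability index formula, so this score is not the authors' index; it only shows one way that removing less text could be weighted more heavily.

```python
import re
from typing import Callable, List, Tuple

Predictor = Callable[[str], float]  # returns P(text is AI-generated)

def split_sentences(chunk: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]

def remove_one_sentence(chunk: str) -> List[Tuple[str, str]]:
    """Return (removed_sentence, perturbed_chunk) for every sentence."""
    sentences = split_sentences(chunk)
    return [
        (removed, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, removed in enumerate(sentences)
    ]

def find_flips(chunk: str, predict: Predictor, threshold: float = 0.5) -> List[str]:
    """Sentences whose removal changes the predicted class."""
    base_label = predict(chunk) >= threshold
    return [
        removed
        for removed, perturbed in remove_one_sentence(chunk)
        if (predict(perturbed) >= threshold) != base_label
    ]

def weighted_influence(chunk: str, predict: Predictor) -> float:
    """Hypothetical length-weighted score (NOT the paper's index)."""
    base = predict(chunk)
    total = 0.0
    for removed, perturbed in remove_one_sentence(chunk):
        shift = abs(base - predict(perturbed))
        # Assumed weighting: the less text removed, the larger the weight.
        weight = 1.0 - len(removed) / len(chunk)
        total += weight * shift
    return total

def toy_predict(text: str) -> float:
    """Stand-in classifier: share of words equal to 'delve'."""
    words = text.lower().split()
    hits = sum(w.strip(".,!?") == "delve" for w in words)
    return hits / max(len(words), 1)
```

In practice `toy_predict` would be replaced by a trained model's probability output, and the same loop could be repeated for word-level and word-combination removal.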

Files in This Item:

File	Description	Size	Format
24.pdf		607,81 kB	Adobe PDF
