Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words

Savytska L.; Turgut Sübay; Vnukova N.; Bezugla I.; Pyvovarov V.

Please use this identifier to cite or link to this item: http://repository.hneu.edu.ua/handle/123456789/28893

Full metadata record

DC Field	Value	Language
dc.contributor.author	Savytska L.	-
dc.contributor.author	Turgut Sübay	-
dc.contributor.author	Vnukova N.	-
dc.contributor.author	Bezugla I.	-
dc.contributor.author	Pyvovarov V.	-
dc.date.accessioned	2023-02-13T17:58:43Z	-
dc.date.available	2023-02-13T17:58:43Z	-
dc.date.issued	2022	-
dc.identifier.citation	Savytska L. Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words / L. Savytska, Turgut Sübay, N. Vnukova et al. // CEUR Workshop Proceedings. – 2022. – Vol. 3171. – P. 161–176. https://ceur-ws.org/Vol-3171/paper17.pdf	ru_RU
dc.identifier.uri	http://repository.hneu.edu.ua/handle/123456789/28893	-
dc.description.abstract	This study calculates the similarity between words in the Turkish language using word representation techniques. Word2Vec is a model that represents words as vectors. The model is trained on articles from the Turkish Wikipedia dump as the corpus, and cosine similarity is then used to determine the similarity value. The open-source Python programming language and the Gensim library are used to obtain high-quality word vectors with Word2Vec and to calculate the cosine similarity of the vectors. The Continuous Bag-of-Words (CBOW) algorithm is used to train the word vectors. The cosine similarity values in the results are derived from the weights (dimension values) of the vectors. A window size of 10 and a vector dimension of 300 are used. Increasing the number of training cycles makes the vector values more accurate. The corpus is trained for five cycles (epochs) with the same parameters. The Turkish corpus contains more than 161 million words. The vocabulary (unique words) obtained from the corpus contains more than 367 thousand words. A corpus of this size makes it possible to conduct high-quality semantic and morphological analysis and arithmetic operations on the word vectors.	ru_RU
dc.language.iso	en	ru_RU
dc.subject	NLP	ru_RU
dc.subject	Word2Vec	ru_RU
dc.subject	word vectors	ru_RU
dc.subject	cosine similarity	ru_RU
dc.subject	word embedding	ru_RU
dc.subject	semantic relations	ru_RU
dc.subject	formal (structural) relations	ru_RU
dc.subject	Turkish language	ru_RU
dc.title	Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words	ru_RU
dc.type	Article	ru_RU
Appears in Collections:	Статті (МСіФП)
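
The configuration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The keyword names in `W2V_PARAMS` assume the Gensim 4.x `Word2Vec` constructor, and the helper names (`train_word2vec`, `cosine_similarity`, `analogy`) and the three-dimensional toy vectors are hypothetical, chosen only to show how cosine similarity and vector arithmetic work. Real vectors from the model would have 300 dimensions.

```python
import math

# Hyperparameters reported in the abstract; keyword names assume Gensim 4.x.
W2V_PARAMS = {
    "vector_size": 300,  # vector dimension
    "window": 10,        # context window size
    "sg": 0,             # 0 selects CBOW (1 would select skip-gram)
    "epochs": 5,         # training passes over the corpus
}


def train_word2vec(sentences):
    """Train a CBOW model on an iterable of token lists (requires gensim)."""
    from gensim.models import Word2Vec  # imported lazily: optional dependency
    return Word2Vec(sentences=sentences, **W2V_PARAMS)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (illustration only, not taken from the paper).
TOY_VECTORS = {
    "kral": [1.0, 1.0, 0.0],     # king
    "erkek": [0.0, 1.0, 0.0],    # man
    "kadın": [0.0, 0.0, 1.0],    # woman
    "kraliçe": [1.0, 0.0, 1.0],  # queen
    "çocuk": [0.0, 1.0, 1.0],    # child
}


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))
```

With a trained Gensim model, the equivalent lookups would normally go through `model.wv.similarity(w1, w2)` and `model.wv.most_similar(positive=[...], negative=[...])`. The pure-Python helpers above only make the underlying arithmetic explicit.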

Files in This Item:

File	Description	Size	Format
Стаття_Савицька _Л_Внукова _Н_paper17.pdf		1.11 MB	Adobe PDF	View/Open

All items in the electronic resource archive are protected by copyright, with all rights reserved.