Article type: Research article
Authors
1 PhD, Department of General Linguistics, Faculty of Literature and Humanities, Bu-Ali Sina University, Hamedan, Iran
2 Department of Computer Science, Faculty of Statistics, Mathematics and Computer Science, Allameh Tabataba'i University, Tehran, Iran
Abstract
The sentences that people write carry valuable information that can be used to identify the author's gender. Deep learning algorithms for natural language processing, in turn, help uncover hidden patterns in text. This research designs a system for the Persian language that identifies an author's gender by fine-tuning the parameters of the ParsBert model. To this end, a corpus of 5000 documents labeled with gender tags is first prepared, and the gender identification system is then designed and evaluated with 10-fold cross-validation. Experimental results show an F-measure of 76.5% on the gender identification task. The proposed method is also compared with classic machine learning methods, and it obtains better results than the LSTM model. Comparing the corpus prepared in this research with the corpus prepared in previous research on gender identification shows an improvement in the system's performance. Thus, the main findings of this research are the need for recent deep learning methods such as the ParsBert model and the need for appropriate data. The creation of a gender-annotated corpus of 5000 documents is also one of the most significant contributions of this research.
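The evaluation protocol named in the abstract (10-fold cross-validation scored by F-measure) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `"female"` label string, the per-fold averaging, and the majority-class stand-in classifier are all assumptions made for illustration. In practice `train_fn` would wrap the fine-tuning of a model such as ParsBert.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[str], str]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return k (train_indices, test_indices) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(texts: Sequence[str], labels: Sequence[str]) -> Classifier:
    """Hypothetical stand-in for a trained model: always predicts the most common label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda text: most_common


def cross_validate(
    texts: Sequence[str],
    labels: Sequence[str],
    train_fn: Callable[[Sequence[str], Sequence[str]], Classifier],
    k: int = 10,
    positive: str = "female",  # assumed label name
) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the per-fold F-measure."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        model = train_fn([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        preds = [model(texts[i]) for i in test_idx]
        scores.append(f_measure([labels[i] for i in test_idx], preds, positive))
    return sum(scores) / len(scores)
```

With a corpus of 5000 documents, each of the 10 folds holds out 500 documents for testing and trains on the remaining 4500. Averaging the per-fold scores is one common convention, and the abstract does not state which averaging the authors used.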
Keywords