Yağmur Öztürk

About me

Hello!
I am Yağmur, a PhD student in NLP and a teaching assistant at the CRIT research lab, Marie & Louis Pasteur University.
Feel free to contact me with any questions at yagmur.ozturk@edu.univ-fcomte.fr.

Research areas

Morphosemantics, semantics, word formation, derivational morphology, linguistic modelling, rule-based linguistic processing, ontologies

Research Project

Contrastive Analysis of Turkish–French Legal Language and AI Applications

Franco-Turkish collaborative project – currently being established (submission to TÜBİTAK, the Scientific and Technological Research Council of Turkey)

I am involved in a research project focusing on the semantic and translation-oriented analysis of legal terminology in Turkish and French. The project aims to:

extract and analyse legal terms from specialised corpora;
study their meanings and their equivalents in the other language;
model these data within a computer-assisted translation framework;
develop a dataset for training AI models applied to legal translation.

This project is closely aligned with my work on morphosemantics and linguistic modelling.

Thesis

Modelling Turkish Nominal Derivation: From Linguistic Formalisation to Morphosemantic Knowledge Representation

Supervisors

Izabella THOMAS, UFC
Snejana GADJEVA, INALCO

Funding

Presidency for Turks Abroad and Related Communities (YTB)

Funded by YTB (Ministry of Culture and Tourism, Turkey) from November 2020 to November 2021.
https://www.ytb.gov.tr/en

Abstract

Turkish derivational morphology exhibits remarkable complexity: as an agglutinative language, Turkish allows multiple suffixes to interact systematically to build meaning. Despite this rich system, existing sources remain fragmented and inconsistent in their coverage of derivational morphemes. Current inventories rely on flat, list-based structures that cannot capture which morphemes derive nouns from nouns (N-to-N) or what semantic values they encode.

To address this gap, this thesis tackles a foundational challenge: how can we move beyond existing morpheme inventories to enable morphosemantic analysis and formally model derivational semantics in a computationally tractable way? To answer this question, we develop two interconnected resources: DerivBaseTR, a multidimensional morphological database, and Semantürk, a semantic ontology for Turkish derivation.

We first cross-reference fifteen heterogeneous sources (grammars, academic studies, and language-learning materials) to build UNITuM, a unified inventory of 131 morphemes traditionally classified as N-to-N in Turkish. Because these traditional classifications conflate N-to-N derivation with derivation into other categories (e.g., adjectives, adverbs), we apply three systematic filters to isolate genuine N-to-N morphemes: grammatical category tests, productivity thresholds, and semantic transparency criteria. This selection retains 36 N-to-N morphemes.
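The three-stage selection can be pictured as a chain of predicates applied in the order listed, each one narrowing the inventory further. The sketch below assumes that ordering, and it also assumes simple numeric operationalisations that the abstract does not give: a count of attested derivatives as the productivity measure and a "transparent share" of derivatives as the transparency measure. The class, field names, thresholds and toy morphemes (`m1` to `m4`) are all hypothetical and do not come from UNITuM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMorpheme:
    form: str                 # placeholder identifier (hypothetical)
    base_category: str        # category of the bases it attaches to ("N", "V", ...)
    output_category: str      # category of its derivatives ("N", "Adj", "Adv", ...)
    derivative_count: int     # attested derivatives, used here as a productivity proxy
    transparent_share: float  # share of derivatives with a compositional meaning


# Hypothetical thresholds, not the values used in the thesis.
MIN_DERIVATIVES = 10
MIN_TRANSPARENT_SHARE = 0.5


def passes_category_test(m):
    # Keep only morphemes that take nouns as bases and yield nouns.
    return m.base_category == "N" and m.output_category == "N"


def passes_productivity_threshold(m):
    return m.derivative_count >= MIN_DERIVATIVES


def passes_transparency_criterion(m):
    return m.transparent_share >= MIN_TRANSPARENT_SHARE


FILTERS = (passes_category_test, passes_productivity_threshold, passes_transparency_criterion)


def apply_filters(inventory):
    """Apply the filters in order; return the survivors and the inventory size after each stage."""
    survivors = list(inventory)
    sizes = [len(survivors)]
    for keep in FILTERS:
        survivors = [m for m in survivors if keep(m)]
        sizes.append(len(survivors))
    return survivors, sizes


# Invented toy inventory for illustration only.
TOY_INVENTORY = [
    CandidateMorpheme("m1", "N", "N", 25, 0.8),    # passes all three filters
    CandidateMorpheme("m2", "N", "Adj", 40, 0.9),  # fails the category test
    CandidateMorpheme("m3", "N", "N", 3, 0.9),     # fails the productivity threshold
    CandidateMorpheme("m4", "N", "N", 30, 0.2),    # fails the transparency criterion
]

if __name__ == "__main__":
    kept, sizes = apply_filters(TOY_INVENTORY)
    print([m.form for m in kept], sizes)
```

On the toy inventory only `m1` survives, and the stage sizes are `[4, 3, 2, 1]`. Recording the size after each stage mirrors how a reduction such as 131 to 36 can be traced back to the individual filters.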

For semantic annotation, we develop Semantürk by adapting Démonette-2's ontological framework (Huguin et al., 2023) to Turkish morphemes. This adaptation, informed by evaluation experiments, refines semantic categories using consolidated descriptions from UNITuM. We define 40 additional semantic categories beyond those in Démonette-2 and validate them through a controlled annotation experiment on a custom corpus of 100 derived nouns, each presented with definitional cues.
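Agreement in an annotation experiment of this kind is commonly measured with Krippendorff's alpha, which is what the "Kα" notation in the next paragraph usually denotes. The sketch below computes alpha for nominal labels from a coincidence matrix. It assumes that each annotator assigns exactly one category per derived noun. The function name, the input format and the labels `AGENT` and `PLACE` are hypothetical. The thesis may have used an existing library or a different variant of the metric.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list per annotated item (here, per derived noun) with the
    labels given by the annotators who coded it; missing judgements are simply
    left out. Items with fewer than two labels cannot be paired and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels from two different
    # annotators on the same item contributes 1 / (m_u - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n (number of pairable values).
    marginals = Counter()
    for (c, _k), weight in coincidences.items():
        marginals[c] += weight
    n = sum(marginals.values())

    # alpha = 1 - D_o / D_e, rewritten as
    # 1 - (n - 1) * sum_{c != k} o_ck / sum_{c != k} n_c * n_k
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    toy = [["AGENT", "AGENT"], ["AGENT", "PLACE"], ["PLACE", "PLACE"]]
    print(round(krippendorff_alpha_nominal(toy), 3))
```

With two annotators who agree on two of three toy nouns, alpha is 4/9 (about 0.444). That is well below raw agreement (2/3) because alpha discounts the agreement expected by chance. Perfect agreement gives exactly 1.0.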

The annotation experiments we conduct demonstrate Semantürk's reliability: inter-annotator agreement remains consistently high (Krippendorff's α ≈ 0.83) and annotator confidence is strong (mean: 2.89/3), indicating that the ontology's categories are both interpretable and robust. Unlike flat tagsets, Semantürk is a hierarchically structured, OWL 2-compliant ontology, enabling both logical reasoning and empirical annotation. DerivBaseTR, a relational database structured around four linkable dimensions (morphemes, lexical instantiations, semantic annotations, and bibliographic documentation), implements these 36 morphemes with their Semantürk annotations. Its multidimensional architecture goes beyond the flat, list-based organisation of previous resources. This structure supports diverse empirical queries. One of them reveals a strong correlation between semantic polysemy and morphological productivity: morphemes whose meanings span multiple semantic categories tend to produce more derivatives.
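The four dimensions described above map naturally onto linked tables. The sketch below uses Python's standard-library `sqlite3` module. Its schema is hypothetical: table and column names are illustrative and are not DerivBaseTR's actual design. The toy rows are real Turkish derivatives, but their category labels are placeholders rather than Semantürk categories. A single join profiles each morpheme by its number of derivatives and its number of distinct semantic categories, and a polysemy/productivity correlation can then be computed from that profile. Pearson's r is used here for simplicity. The abstract does not say which coefficient underlies the reported correlation.

```python
import math
import sqlite3

# Hypothetical schema: one table per dimension, linked by foreign keys.
SCHEMA = """
CREATE TABLE morpheme (id INTEGER PRIMARY KEY, form TEXT NOT NULL UNIQUE);
CREATE TABLE lexeme (                  -- lexical instantiations (derivatives)
    id INTEGER PRIMARY KEY,
    form TEXT NOT NULL,
    base TEXT NOT NULL,
    morpheme_id INTEGER NOT NULL REFERENCES morpheme(id)
);
CREATE TABLE semantic_annotation (     -- one category label per derivative
    lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
    category TEXT NOT NULL
);
CREATE TABLE bib_source (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE morpheme_source (         -- bibliographic documentation
    morpheme_id INTEGER NOT NULL REFERENCES morpheme(id),
    source_id INTEGER NOT NULL REFERENCES bib_source(id)
);
"""

# Toy data: (derivative, base, placeholder category) per morpheme.
TOY = {
    "-CI": [("sütçü", "süt", "AGENT"), ("kapıcı", "kapı", "AGENT")],
    "-DAş": [("yoldaş", "yol", "COMPANION")],
    "-lIk": [("gözlük", "göz", "INSTRUMENT"), ("tuzluk", "tuz", "CONTAINER"),
             ("kardeşlik", "kardeş", "QUALITY")],
}

# Per-morpheme profile: productivity (derivatives) and polysemy (categories).
PROFILE_QUERY = """
SELECT m.form,
       COUNT(DISTINCT l.id)       AS n_derivatives,
       COUNT(DISTINCT a.category) AS n_categories
FROM morpheme AS m
LEFT JOIN lexeme AS l ON l.morpheme_id = m.id
LEFT JOIN semantic_annotation AS a ON a.lexeme_id = l.id
GROUP BY m.id
ORDER BY m.form
"""


def build_toy_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO bib_source (id, citation) VALUES (1, 'placeholder source')")
    for form, derivatives in TOY.items():
        mid = con.execute("INSERT INTO morpheme (form) VALUES (?)", (form,)).lastrowid
        con.execute("INSERT INTO morpheme_source VALUES (?, 1)", (mid,))
        for lex, base, category in derivatives:
            lid = con.execute(
                "INSERT INTO lexeme (form, base, morpheme_id) VALUES (?, ?, ?)",
                (lex, base, mid)).lastrowid
            con.execute("INSERT INTO semantic_annotation VALUES (?, ?)", (lid, category))
    return con


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    con = build_toy_db()
    rows = con.execute(PROFILE_QUERY).fetchall()
    for form, n_derivatives, n_categories in rows:
        print(form, n_derivatives, n_categories)
    print("r =", round(pearson([row[2] for row in rows], [row[1] for row in rows]), 3))
```

On the toy rows the query returns `('-CI', 2, 1)`, `('-DAş', 1, 1)` and `('-lIk', 3, 3)`, giving r = √3/2 ≈ 0.87. With only three morphemes this illustrates the mechanics, not a meaningful statistic. Keeping each dimension in its own table is what lets one query cross morphemes, derivatives and annotations without duplicating data.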

This work addresses fundamental questions about the position of derivational morphology between lexicon and syntax. Beyond Turkish linguistics, DerivBaseTR and Semantürk establish methodological principles for creating sustainable linguistic resources that remain reusable across diverse research contexts and theoretical perspectives. This opens avenues for applications ranging from automatic morpheme analysis to semantic parsing in morphologically rich languages.
