Evaluating language understanding in Danish language models - based on semantic dictionaries

Authors

  • Bolette Sandford Pedersen
  • Nathalie C. Hau Sørensen
  • Sussi Olsen
  • Sanni Nimb

DOI:

https://doi.org/10.7146/nys.v1i65.143072

Keywords:

Danish language models, evaluation, language understanding, semantic dictionaries, benchmark, ChatGPT

Abstract

This article describes how we developed a set of datasets, a so-called benchmark, for evaluating different aspects of language understanding in Danish language models. Our assumption is that the knowledge already described in a number of existing Danish dictionaries can be regarded as the 'ground truth' for the semantics of the Danish vocabulary. Our method therefore consists of 'turning' the semantic dictionaries around and using them to generate a benchmark that tests the models' ability to understand Danish. More specifically, we examine how well the models i) understand synonymy, near-synonymy, and semantic association, ii) draw inferences from conceptual knowledge and from the inheritance of properties from superordinate to subordinate concepts, iii) make correct inferences about specific actions and events, iv) distinguish between the central senses of a word in context, and v) handle positive and negative connotation, or 'sentiment', in running text. We test our datasets on ChatGPT 3.5 turbo and ChatGPT 4.0 and find that the datasets have an appropriate level of difficulty relative to what the models can handle, although ChatGPT 4.0 achieves very good results on several of the datasets.
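
The idea of turning a dictionary around to produce test items can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the item format, and the names `TOY_LEXICON`, `make_synonym_items` and `accuracy` are all assumptions made for illustration, and it covers only the synonymy case. Each dictionary entry becomes a multiple-choice question whose correct answer is a listed synonym and whose distractors are drawn from other entries.

```python
import random

# Hypothetical toy lexicon (illustrative entries, not data from the article):
# each headword maps to synonyms as a dictionary might list them.
TOY_LEXICON = {
    "glad": ["lykkelig", "fornøjet"],
    "hurtig": ["rask", "kvik"],
    "bil": ["automobil", "vogn"],
    "hus": ["bolig", "bygning"],
}


def make_synonym_items(lexicon, n_distractors=2, seed=0):
    """Turn dictionary entries into multiple-choice synonym questions."""
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible
    items = []
    for headword, synonyms in sorted(lexicon.items()):
        correct = synonyms[0]
        # Distractors come from other entries and must not be listed
        # as synonyms of the current headword.
        pool = sorted(
            word
            for other, other_syns in lexicon.items()
            if other != headword
            for word in [other, *other_syns]
            if word not in synonyms
        )
        options = [correct, *rng.sample(pool, n_distractors)]
        rng.shuffle(options)
        items.append({"word": headword, "options": options, "answer": correct})
    return items


def accuracy(items, predict):
    """Share of items where predict(word, options) returns the gold answer."""
    hits = sum(predict(it["word"], it["options"]) == it["answer"] for it in items)
    return hits / len(items)


items = make_synonym_items(TOY_LEXICON)
```

In practice `predict` would wrap a call to the language model under evaluation; here any function taking a word and a list of options can stand in for it.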

Published

2024-07-10

Citation

Sandford Pedersen, B., Hau Sørensen, N. C., Olsen, S., & Nimb, S. (2024). Evaluering af sprogforståelsen i danske sprogmodeller - med udgangspunkt i semantiske ordbøger [Evaluating language understanding in Danish language models - based on semantic dictionaries]. NyS, Nydanske Sprogstudier, 1(65), 8–40. https://doi.org/10.7146/nys.v1i65.143072

Section

Articles