Current position

I'm coordinating phonetic transcription for the ORTOFON corpus project at the Spoken Corpora Section of the Institute of the Czech National Corpus, Faculty of Arts, Charles University in Prague. I also help out with various data processing tasks and functionality prototyping (mainly using Python, Perl and R).


For more, check out my GitHub profile.

  • KonText Layout Switcher • a Tampermonkey/Greasemonkey script to customize the interface of the KonText corpus manager (see link for a screenshot)
  • MluvKonk • an experimental viewer for spoken corpus concordances as exported from KonText, which tries to give more visual cues as to the structure of the dialogue; similar functionality in KonText itself is hopefully coming up soon!
  • CNC Maps • an interactive web interface for displaying and manipulating concordances from the ORAL series corpora on a map. Also features a component for browsing dialect recordings along with descriptions.
  • TransVer • a transcription verifier for our current spoken data collection project at the Czech National Corpus. Written in Clojure, using the wonderful seesaw wrapper library around the horrible Swing GUI toolkit.
  • my MA thesis (in Czech) -- Perceptual sensitivity to music and speech stimuli in the frequency and temporal domains • if you're interested in the affinities between the ways humans process language and music, check it out!
  • EXMARaLDA/EXAKT tutorial • a short tutorial on using the EXAKT corpus concordancer tool. Mainly intended for internal use, but you might find something useful in there. If you speak Czech, that is :) Suggestions for improvement welcome!
  • PraatEdit • a code editor for scripting the Praat speech analysis software environment, with syntax highlighting, written as a Java-learning project -- so expect bugs :)


pdf (in Czech; update: November 2016)



  • Petra Klimešová, Zuzana Komrsková, Marie Kopřivová, and David Lukeš. Avenues for corpus-based research on informal spoken Czech. In Piotr Pęzik and Jacek Tadeusz Waliński, editors, Language, Corpora and Cognition, volume 51 of Łódź Studies in Language, 145–162. Peter Lang Edition, 2017.
    [ Bibtex ] [ PDF ]

  • Pavel Šturm and David Lukeš. Fonotaktická analýza obsahu slabik na okrajích českých slov v mluvené a psané řeči. Slovo a slovesnost, 2017. To appear in issue 2/2017.
    [ Bibtex ]


  • David Lukeš and Zuzana Komrsková. Strategies for automatic morphological tagging of non-standard Czech. Conference presentation at the Slavic Spoken Corpora Workshop, Slavisches Seminar der Universität Freiburg i. Breisgau, October 2016.
    [ Bibtex ] [ Slides ]

  • Adrian Zasina, Anna Řehořková, David Lukeš, Petra Poukarová, Václav Cvrček, and Zuzana Komrsková. Multidimenzionální analýza češtiny. Pilotní studie. Conference presentation at Korpusová lingvistika Praha 2016, ÚČNK FF UK, September 2016.
    [ Bibtex ] [ Slides ]


  • Hana Goláňová, Marie Kopřivová, David Lukeš, and Martin Štěpán. Kartografické a geografické zpracování dat z mluvených korpusů. Korpus – gramatika – axiologie, pages 42–54, 2015.
    [ Bibtex ]

  • Petra Klimešová, Zuzana Komrsková, Marie Kopřivová, and David Lukeš. Slovo to v mluvených korpusech ČNK, jeho prefixace a reduplikace. Časopis pro moderní filologii, 97(1):21–30, 2015.
    [ Bibtex ]

  • David Lukeš. Increasing speed and consistency of phonetic transcription of spoken corpora using ASR technology. In Federica Formato and Andrew Hardie, editors, Corpus Linguistics 2015 – Abstract Book, 222–223. UCREL, 2015.
    [ Bibtex ] [ Slides ]

  • David Lukeš. New tools for working with the ORAL series corpora of spoken Czech: AchSynku and MluvKonk. In Katarína Gajdošová and Adriána Žáková, editors, Natural Language Processing, Corpus Linguistics, Lexicography, 90–101. RAM-Verlag, 2015. Eighth International Conference. Proceedings of SLOVKO 2015.
    [ Bibtex ] [ PDF ] [ Slides ]

  • David Lukeš, Petra Klimešová, Zuzana Komrsková, and Marie Kopřivová. Experimental tagging of the ORAL series corpora: insights on using a stochastic tagger. In Pavel Král and Václav Matoušek, editors, TSD 2015, LNAI 9302, 342–350. Springer International Publishing, 2015.
    [ Bibtex ] [ Poster ]


  • Marie Kopřivová, Hana Goláňová, Petra Klimešová, Zuzana Komrsková, and David Lukeš. Multi-tier transcription of informal spoken Czech: the ORTOFON corpus approach. In Complex Visibles Out There. Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure, Olomouc Modern Language Series, Vol. 4, 529–544. Univerzita Palackého, 2014.
    [ Bibtex ] [ PDF ]

  • Marie Kopřivová, Petra Klimešová, Hana Goláňová, and David Lukeš. Mapping diatopic and diachronic variation in spoken Czech: the ORTOFON and DIALEKT corpora. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 376–382. European Language Resources Association (ELRA), 2014.
    [ Bibtex ] [ PDF ] [ Poster ]

  • David Lukeš. Perspektivy fonetické anotace v korpusech mluveného jazyka. In Korpusová lingvistika Praha 2014 – Abstrakty, 125–127. Ústav Českého národního korpusu, 2014. Work-in-progress presentation.
    [ Bibtex ] [ PDF ]

  • David Lukeš, Dita Fejlová, and Radek Skarnitzl. Variability of Czech alveolar plosives: a locus equation perspective. AUC Philologica, Phonetica Pragensia, XIII(1):21–32, 2014.
    [ Bibtex ]


  • Dita Fejlová, David Lukeš, and Radek Skarnitzl. Formant contours in Czech vowels: speaker-discriminating potential. In INTERSPEECH-2013, 3182–3186. 2013.
    [ Bibtex ]