Current position

I'm coordinating phonetic transcription for the ORTOFON corpus project at the Spoken Corpora Section of the Institute of the Czech National Corpus, Faculty of Arts, Charles University in Prague. I also help out with various data processing tasks and functionality prototyping (mainly using Python, Perl and R).


Check out my GitHub profile.

Writing elsewhere

If you're interested in the affinities between people's abilities to process language and music, check out my MA thesis (in Czech): Perceptual sensitivity to music and speech stimuli in the frequency and temporal domains. The repository also contains the perceptual test in Praat, which is the meat of the research reported in the thesis.

Back when I thought we'd be using EXAKT for querying our spoken corpora, I wrote a short tutorial about EXMARaLDA/EXAKT. It was mainly intended for internal use, but you might find something useful in there. If you speak Czech, that is :)


PDF (in Czech; updated November 2016)



  • Petra Klimešová, Zuzana Komrsková, Marie Kopřivová, and David Lukeš. Avenues for corpus-based research on informal spoken Czech. In Piotr Pęzik and Jacek Tadeusz Waliński, editors, Language, Corpora and Cognition, volume 51 of Łódź Studies in Language, 145–162. Peter Lang Edition, 2017.
    [ Bibtex ] [ PDF ]

  • Pavel Šturm and David Lukeš. Fonotaktická analýza obsahu slabik na okrajích českých slov v mluvené a psané řeči. Slovo a slovesnost, 2017. To appear in issue 2/2017.
    [ Bibtex ]


  • David Lukeš and Zuzana Komrsková. Strategies for automatic morphological tagging of non-standard Czech. Conference presentation at the Slavic Spoken Corpora Workshop, Slavisches Seminar der Universität Freiburg i. Breisgau, October 2016.
    [ Bibtex ] [ Slides ]

  • Adrian Zasina, Anna Řehořková, David Lukeš, Petra Poukarová, Václav Cvrček, and Zuzana Komrsková. Multidimenzionální analýza češtiny. Pilotní studie. Conference presentation at Korpusová lingvistika Praha 2016, ÚČNK FF UK, September 2016.
    [ Bibtex ] [ Slides ]


  • Hana Goláňová, Marie Kopřivová, David Lukeš, and Martin Štěpán. Kartografické a geografické zpracování dat z mluvených korpusů. Korpus – gramatika – axiologie, pages 42–54, 2015.
    [ Bibtex ]

  • Petra Klimešová, Zuzana Komrsková, Marie Kopřivová, and David Lukeš. Slovo to v mluvených korpusech ČNK, jeho prefixace a reduplikace. Časopis pro moderní filologii, 97(1):21–30, 2015.
    [ Bibtex ]

  • David Lukeš. Increasing speed and consistency of phonetic transcription of spoken corpora using ASR technology. In Federica Formato and Andrew Hardie, editors, Corpus Linguistics 2015 – Abstract Book, 222–223. UCREL, 2015.
    [ Bibtex ] [ Slides ]

  • David Lukeš. New tools for working with the ORAL series corpora of spoken Czech: AchSynku and MluvKonk. In Katarína Gajdošová and Adriána Žáková, editors, Natural Language Processing, Corpus Linguistics, Lexicography, 90–101. RAM-Verlag, 2015. Eighth International Conference. Proceedings of SLOVKO 2015.
    [ Bibtex ] [ PDF ] [ Slides ]

  • David Lukeš, Petra Klimešová, Zuzana Komrsková, and Marie Kopřivová. Experimental tagging of the ORAL series corpora: insights on using a stochastic tagger. In Pavel Král and Václav Matoušek, editors, TSD 2015, LNAI 9302, 342–350. Springer International Publishing, 2015.
    [ Bibtex ] [ Poster ]


  • Marie Kopřivová, Hana Goláňová, Petra Klimešová, Zuzana Komrsková, and David Lukeš. Multi-tier transcription of informal spoken Czech: the ORTOFON corpus approach. In Complex Visibles Out There. Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure, Olomouc Modern Language Series, Vol. 4, 529–544. Univerzita Palackého, 2014.
    [ Bibtex ] [ PDF ]

  • Marie Kopřivová, Petra Klimešová, Hana Goláňová, and David Lukeš. Mapping diatopic and diachronic variation in spoken Czech: the ORTOFON and DIALEKT corpora. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 376–382. European Language Resources Association (ELRA), 2014.
    [ Bibtex ] [ PDF ] [ Poster ]

  • David Lukeš. Perspektivy fonetické anotace v korpusech mluveného jazyka. In Korpusová lingvistika Praha 2014 – Abstrakty, 125–127. Ústav Českého národního korpusu, 2014. Work-in-progress presentation.
    [ Bibtex ] [ PDF ]

  • David Lukeš, Dita Fejlová, and Radek Skarnitzl. Variability of Czech alveolar plosives: a locus equation perspective. AUC Philologica, Phonetica Pragensia, XIII(1):21–32, 2014.
    [ Bibtex ]


  • Dita Fejlová, David Lukeš, and Radek Skarnitzl. Formant contours in Czech vowels: speaker-discriminating potential. In INTERSPEECH-2013, 3182–3186. 2013.
    [ Bibtex ]