LD&C Special Publication No. 9: Language Documentation and Conservation in Europe

  • Item
    Foreword
    (University of Hawai'i Press, 2016-02) Ferreira, Vera ; Bouda, Peter
  • Item
    Brief considerations about language policy: An European assessment
    (University of Hawai'i Press, 2016-02) Carvalho Vicente, Paulo ; Carvalho Vicente, Francisco
    The rise of language policy worldwide is a consequence of a globalized world and open borders. Even countries with relative cultural homogeneity now face new challenges arising from massive migration flows and from growing awareness of endangered languages and cultures, notably in Europe. This is evident across the Old Continent, where diversity has always been a distinct value. In this paper we reflect on the scope of cultural identity and multilingualism to shed new light on language policy and thereby refresh our understanding of what is already a decisive public policy for the European peoples.
  • Item
    Bridging divides: A proposal for integrating the teaching, research and revitalization of Nahuatl
    (University of Hawai'i Press, 2016-02) Olko, Justyna ; Sullivan, John
    This paper discusses major historical, cultural, linguistic, social and institutional factors contributing to the shift and endangerment of the Nahuatl language in Mexico. As a practical proposal, we discuss our strategy for its revitalization, as well as a series of projects and activities we have been carrying out for the last several years. Crucial to this approach are several complementary elements: interdisciplinary research, including documentary work, as well as investigation of both the historical and the present state of Nahuatl language and culture; integration of both Western and native-speaking indigenous researchers as equal partners and the provision of space for indigenous methodologies; creation of teaching programs for native and non-native speakers oriented toward the preparation of language materials; and close collaboration with indigenous communities in developing community-based programs. The operability of this strategy will depend greatly on our ability to foster collaboration across academic, social, and ideological boundaries, to integrate theory, methodology and program implementation, and to efficiently combine grassroots and top-down approaches. An important aim is to restore the culture of literacy in Nahuatl through our monolingual Totlahtol series, publishing works from all variants of the language and encompassing all genres of writing. We also strive to strengthen the historical and cultural identity of native speakers by facilitating their access to the alphabetical texts written by their ancestors during the colonial era.
  • Item
    The first Mirandese text-to-speech system 
    (University of Hawai'i Press, 2016-02) Ferreira, José Pedro ; Chesi, Cristiano ; Baldewijns, Daan ; Braga, Daniela ; Dias, Miguel ; Correia, Margarita
    This paper describes the creation of base NLP resources and tools for an under-resourced minority language spoken in Portugal, Mirandese, in the context of the generation of a text-to-speech system, a collaborative citizenship project between Microsoft, ILTEC, and ALM – Associaçon de la Lhéngua Mirandesa. Development efforts encompassed the compilation of a large textual corpus, definition of a complete phone-set, development of a tokenizer, inflector, TN and GTP modules, and creation of a large phonetic lexicon with syllable segmentation, stress mark-up, and POS. The TTS system will provide an open access web interface freely available to the community, along with the other resources. We took advantage of mature tools, resources, and processes already available for phylogenetically close languages, which allowed us to cut development time and resources to a great extent, a solution that can be viable for other lesser-spoken languages in a similar situation.
  • Item
    BaTelÒc: A text base for the Occitan language
    (University of Hawai'i Press, 2016-02) Bras, Myriam ; Vergez-Couret, Marianne
    Language Documentation, as defined by Himmelmann (2006), aims at compiling and preserving linguistic data for studies in linguistics, literature, history, ethnology, and sociology. This initiative is vital for endangered languages such as Occitan, a Romance language spoken in southern France and in several valleys of Spain and Italy. The documentation of a language concerns all its modalities, covering spoken and written language, various registers and so on. Nowadays, Occitan documentation mostly consists of data from linguistic atlases, virtual libraries from the modern to the contemporary period, and text bases for the Middle Ages. BaTelÒc is a text base for the modern and contemporary periods. With the aim of creating a wide coverage of text collections, BaTelÒc gathers not only written literary texts (prose, drama and poetry) but also other genres such as technical texts and newspapers. Enough material is already available to foresee a text base of hundreds of millions of words. BaTelÒc aims not only at documenting Occitan; it is also designed to provide tools to explore texts (different criteria for corpus selection, concordance tools and more complex enquiries with regular expressions). As for linguistic analysis, the second step is to enrich the corpora with annotations. Natural Language Processing of endangered languages such as Occitan is very challenging. It is not possible to transpose existing models for resource-rich languages directly, partly because of the spelling, dialectal variations, and lack of standardization. With BaTelÒc we aim at providing corpora and lexicons for the development of basic natural language processing tools, namely OCR and a Part-of-Speech tagger based on tools initially designed for machine translation and which take variation into account.