Romanenko Aleksei Nikolaevich. Robust Speech Recognition for Low-Resource Languages



  • Title:
  • Romanenko Aleksei Nikolaevich. Robust Speech Recognition for Low-Resource Languages
  • Alternative title:
  • Романенко Олексій Миколайович Робасте розпізнавання мови для низько-ресурсних мов
  • Number of pages:
  • 314
  • University:
  • National Research University ITMO
  • Year of defense:
  • 2020
  • Short description:

    DISSERTATION TABLE OF CONTENTS

    Candidate of Sciences Romanenko Aleksei Nikolaevich

    Contents



    Abstract



    Synopsis



    1 Introduction



    1.1 Current State of Speech Recognition



    1.2 Speech Recognition for Low-Resource Languages



    1.3 Thesis contribution



    1.4 Outline



    2 Background



    2.1 Structure of modern ASR system



    2.2 Signal processing and feature extraction



    2.3 Acoustic Modelling



    2.3.1 Acoustic Model Based on GMM-HMM Structure



    2.3.2 Acoustic Model Based on Artificial Neural Networks



    2.3.3 DNN-HMM Acoustic Model



    2.3.4 Sequence-Discriminative Training of DNN-HMM Acoustic Model



    2.3.5 Recurrent Neural Network for Acoustic Modelling



    2.3.6 Error Backpropagation Through Time



    2.3.7 Deep Recurrent Networks



    2.3.8 Bidirectional Recurrent Networks



    2.3.9 Acoustic models based on RNN



    2.4 Language Modelling



    2.5 Decoding



    2.6 Summary



    3 Related works



    3.1 Feature Engineering



    3.2 Acoustic Models



    3.3 Language models



    3.4 Auxiliary Techniques



    3.5 Summary



    4 Novel Approaches and Universal Methodology



    4.1 Selection of Basic Acoustic Features



    4.2 Acoustic Modelling



    4.2.1 Initial GMM-HMM Models Training Pipeline



    4.2.2 Multi-Language Speaker-Dependent Bottleneck Extractors Training



    4.2.3 Final GMM-HMM Models Training



    4.2.4 DNN-HMM Acoustic Models



    4.2.5 Acoustic features combination



    4.2.6 Audio augmentation techniques



    4.2.7 Combination for acoustic modelling



    4.3 Language Modelling



    4.3.1 N-gram Based Language Model Training



    4.3.2 Addition of web-text data



    4.3.3 Generating new text data



    4.3.4 Neural Network Based Language Model Training



    4.4 Summary



    5 Data and tools



    5.1 Rationale for the choice of language sets



    5.2 Closed dataset



    5.3 Open datasets



    5.4 Third-Party Tools



    5.4.1 The Kaldi Speech Recognition Toolkit



    5.4.3 The char-rnn tool



    5.4.4 The SRI Language Modeling Toolkit



    5.4.5 The RNNLM Toolkit



    5.5 Summary



    6 Evaluation



    6.1 Acoustic Modeling



    6.1.1 Initial GMM-HMM Acoustic Models Training



    6.1.2 Features Selection for GMM-HMM Acoustic Models



    6.1.3 Multilingual Bottleneck Features



    6.1.4 Features Combination for NN Acoustic Models



    6.1.5 Audio Data Augmentation



    6.1.6 Sequence Training of NN Acoustic Models



    6.2 Language modelling



    6.3 Composition of the Final ASR System



    6.4 Combination of Models and Comparison with State-of-the-Art Results



    6.5 Summary



    7 Conclusion and Future Directions



    7.1 Summary



    7.2 Thesis Contributions



    7.2.1 Theoretical



    7.2.2 Practical



    7.2.3 Experimental



    7.3 Future Directions



    References



    Appendix



    A - Acoustic Modelling



    B - Language Modelling



    C - Final Models



    D - Combination of Models and Comparison with State-of-the-Art Results



    List of Figures



    List of Tables



    List of Own Publications

