# Roboy Sonosco

Roboy Sonosco (from Lat. sonus - sound, and nōscō - I know, recognize) is a library for speech recognition based on deep learning models.

## Installation

The supported OS is Ubuntu 18.04 LTS (although it should work fine on other distributions).
The supported Python version is 3.6+.
The supported CUDA version is 10.0.
The supported PyTorch version is 1.0.

---

Install CUDA 10.0 from the [NVIDIA website](https://developer.nvidia.com/cuda-10.0-download-archive). Make sure that your local gcc, g++ and cmake versions are not older than the ones used to compile your OS kernel.

You will need to download the latest [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive) release for CUDA 10.0.
Unzip it (substitute the exact filename you downloaded):
```
tar -xzvf cudnn-10.0-linux-x64-v7.tgz
```
Then copy the headers and libraries into your CUDA installation:
```
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```

| 26 | + |
| 27 | +**All of the following steps you may perform inside [Anaconda](https://www.anaconda.com/) or [virtualenv](https://virtualenv.pypa.io/en/latest/)** |
| 28 | + |
| 29 | +Install [PyTorch](https://pytorch.org/get-started/locally/). For your particular configuration, you may want to build it from the [sources](https://github.com/pytorch/pytorch). |
| 30 | + |
Install SeanNaren's fork of the Warp-CTC bindings. **Deprecated**: this will be updated to use PyTorch's [built-in](https://pytorch.org/docs/stable/nn.html#torch.nn.CTCLoss) CTC loss.
```
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc; mkdir build; cd build; cmake ..; make
export CUDA_HOME="/usr/local/cuda"
cd ../pytorch_binding && python setup.py install
```

Install torchaudio:
```
sudo apt-get install sox libsox-dev libsox-fmt-all
git clone https://github.com/pytorch/audio.git
cd audio && python setup.py install
```

If you want decoding to support beam search with an optional language model, install [ctcdecode](https://github.com/parlance/ctcdecode):
```
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
```

Finally, clone this repository and install its dependencies from within it:
```
pip install -r requirements.txt
```

### Mixed Precision
If you want to use mixed-precision training, install [NVIDIA Apex](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/):

```
git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .
```

## Usage

### Dataset

To create a dataset, you must create a CSV manifest file containing the locations of the training data, in the following format:

```
/path/to/audio.wav,/path/to/text.txt
/path/to/audio2.wav,/path/to/text2.txt
...
```
Each line pairs an audio file with its transcript file. There is an example in the `examples` directory.

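A manifest in this format can be generated with a short script. Here is a minimal sketch, assuming each `.wav` file sits next to a `.txt` transcript with the same stem (the `build_manifest` helper and the directory layout are illustrative assumptions, not part of the library):

```python
import csv
from pathlib import Path

def build_manifest(data_dir, manifest_path):
    """Pair each .wav file with the .txt transcript of the same stem
    and write one `audio,transcript` line per pair (hypothetical helper)."""
    data_dir = Path(data_dir)
    pairs = []
    for wav in sorted(data_dir.glob("**/*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():  # skip audio files that have no transcript
            pairs.append((str(wav), str(txt)))
    with open(manifest_path, "w", newline="") as f:
        csv.writer(f).writerows(pairs)
    return len(pairs)
```

For example, `build_manifest("data/train", "train_manifest.csv")` would write one line per audio/transcript pair found under `data/train`.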
### Training, Testing and Inference

All three scripts are run in the same way:

```
python3 train.py --config /path/to/config/file.yaml
python3 test.py --config /path/to/config/file.yaml
python3 infer.py --config /path/to/config/file.yaml
```
The scripts are initialised via configuration files.

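Each script only needs the path to its configuration file on the command line. A sketch of what such an entry point might look like (the `--config` flag comes from the commands above; everything else here is an illustrative assumption, not the actual script):

```python
import argparse

def parse_args(argv=None):
    """Parse the command line shared by train.py / test.py / infer.py
    (hypothetical sketch of the scripts' entry point)."""
    parser = argparse.ArgumentParser(description="Sonosco script")
    parser.add_argument("--config", required=True,
                        help="Path to a YAML configuration file")
    return parser.parse_args(argv)
```

The parsed `args.config` path is then handed to whatever loads the YAML configuration.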
#### Configuration

The configuration file contains the arguments for ModelWrapper initialisation, as well as extra parameters, for example:

```
train:
  ...
  log-dir: 'logs'                   # Location for log files
  def-dir: 'examples/checkpoints/'  # Default location to save/load models
  model-name: 'asr_final.pth'       # File name to save the best model
  sample-rate: 16000                # Sample rate
  window: 'hamming'                 # Window type for spectrogram generation
  batch-size: 32                    # Batch size for training
  checkpoint: True                  # Enables checkpoint saving of the model
  ...
```
You can find more configuration examples, with descriptions, in the `config` directory.
| 106 | + |
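A file like this parses into a plain nested dictionary. A minimal sketch of reading it with PyYAML (an assumption — the scripts may use a different loader; the keys below are taken from the example above):

```python
import yaml  # PyYAML, assumed available

# Inline stand-in for the contents of a config file like the one above.
config_text = """
train:
  log-dir: 'logs'
  batch-size: 32
  checkpoint: True
"""

# safe_load parses the YAML into nested dicts without executing arbitrary tags.
config = yaml.safe_load(config_text)
train_cfg = config["train"]
```

In a real script, `config_text` would instead come from `open(args.config)`.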
## Acknowledgements

This project is partially based on SeanNaren's [deepspeech.pytorch](https://github.com/SeanNaren/deepspeech.pytorch) repository.