
Commit 3ab9500

Added description in README
1 parent 96b8f7e commit 3ab9500

2 files changed

Lines changed: 109 additions & 2 deletions


README.md

Lines changed: 107 additions & 0 deletions
@@ -1,2 +1,109 @@
# Roboy Sonosco

Roboy Sonosco (from Lat. sonus, "sound", and nōscō, "I know, recognize") is a library for speech recognition based on deep-learning models.

## Installation

The supported OS is Ubuntu 18.04 LTS (the library should, however, work fine on other distributions).
The supported Python version is 3.6+.
The supported CUDA version is 10.0.
The supported PyTorch version is 1.0.

---

Install CUDA 10.0 from the [NVIDIA website](https://developer.nvidia.com/cuda-10.0-download-archive). Make sure that your local gcc, g++ and cmake versions are not older than the ones used to compile your OS kernel.
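A quick way to compare is to print the compiler the kernel was built with alongside your local tools (a minimal sketch; on Linux, `/proc/version` reports the kernel's build compiler):

```shell
# Show the compiler the running kernel was built with, then the local tools
cat /proc/version
for tool in gcc g++ cmake; do
  command -v "$tool" >/dev/null && "$tool" --version | head -n 1 \
    || echo "$tool is not installed"
done
```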
You will need to download the latest [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive) release for CUDA 10.0.
Unzip it (substituting the exact file name you downloaded):
```
tar -xzvf cudnn-10.0-linux-x64-v7.tgz
```
Then run:
```
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```

**All of the following steps can be performed inside [Anaconda](https://www.anaconda.com/) or a [virtualenv](https://virtualenv.pypa.io/en/latest/).**
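For example, an isolated environment could be set up like this (a sketch using the standard-library `venv` module; the environment name is arbitrary):

```shell
# Create and activate an isolated environment (conda works just as well)
python3 -m venv sonosco-env
. sonosco-env/bin/activate
python --version   # should report 3.6 or newer
```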
28+
29+
Install [PyTorch](https://pytorch.org/get-started/locally/). For your particular configuration, you may want to build it from the [sources](https://github.com/pytorch/pytorch).
30+
31+
Install SeanNaren's fork for Warp-CTC bindings. **Deprecated**: will be updated to use [built-in](https://pytorch.org/docs/stable/nn.html#torch.nn.CTCLoss) functions.
32+
```
33+
git clone https://github.com/SeanNaren/warp-ctc.git
34+
cd warp-ctc; mkdir build; cd build; cmake ..; make
35+
export CUDA_HOME="/usr/local/cuda"
36+
cd ../pytorch_binding && python setup.py install
37+
```
38+
39+
Install pytorch audio:
40+
```
41+
sudo apt-get install sox libsox-dev libsox-fmt-all
42+
git clone https://github.com/pytorch/audio.git
43+
cd audio && python setup.py install
44+
```
45+
46+
If you want decoding to support beam search with an optional language model, install [ctcdecode](https://github.com/parlance/ctcdecode):
47+
```
48+
git clone --recursive https://github.com/parlance/ctcdecode.git
49+
cd ctcdecode && pip install .
50+
```
51+
52+
Clone this repo and run this within the repo:
53+
```
54+
pip install -r requirements.txt
55+
```
56+
57+
### Mixed Precision
58+
If you want to use mixed precision training, you have to install [NVIDIA Apex](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/):
59+
60+
```
61+
git clone --recursive https://github.com/NVIDIA/apex.git
62+
cd apex && pip install .
63+
```
64+
65+
## Usage
66+
67+
### Dataset
68+
69+
To create a dataset you must create a CSV manifest file containing the locations of the training data. This has to be in the format of:
70+
71+
```
72+
/path/to/audio.wav,/path/to/text.txt
73+
/path/to/audio2.wav,/path/to/text2.txt
74+
...
75+
```
76+
There is an example in examples directory.
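For instance, if audio files and transcripts sit in parallel directories with matching base names (a hypothetical layout, purely for illustration), a manifest could be generated like this:

```shell
# Hypothetical layout: wav/foo.wav has its transcript in txt/foo.txt
mkdir -p wav txt
touch wav/sample1.wav txt/sample1.txt   # stand-in files for the demo
for wav in wav/*.wav; do
  base=$(basename "$wav" .wav)
  echo "$PWD/$wav,$PWD/txt/$base.txt"
done > manifest.csv
cat manifest.csv
```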
77+
78+
### Training, Testing and Inference
79+
80+
Fundamentally, you can run the scripts the same way:
81+
82+
```
83+
python3 train.py --config /path/to/config/file.yaml
84+
python3 test.py --config /path/to/config/file.yaml
85+
python3 infer.py --config /path/to/config/file.yaml
86+
```
87+
The scripts are initialised via configuration files.
88+
89+
#### Configuration
90+
91+
Configuration file contains arguments for ModelWrapper initialisation as well as extra parameters. Like this:
92+
93+
```
94+
train:
95+
...
96+
log-dir: 'logs' # Location for log files
97+
def-dir: 'examples/checkpoints/', # Default location to save/load models
98+
model-name: 'asr_final.pth' # File name to save the best model
99+
sample-rate: 16000 # Sample rate
100+
window: 'hamming' # Window type for spectrogram generation
101+
batch-size: 32 # Batch size for training
102+
checkpoint: True # Enables checkpoint saving of model
103+
...
104+
```
105+
More configuration examples with descriptions you may find in the config directory.
106+
107+
## Acknowledgements
108+
109+
This project is partially based on SeanNaren's [deepspeech.pytorch](https://github.com/SeanNaren/deepspeech.pytorch) repository.

config/train.yaml

Lines changed: 2 additions & 2 deletions
```
@@ -4,8 +4,8 @@ train:
   labels-path: 'examples/labels.json' # Contains all characters for transcription
   log-dir: 'logs' # Location for log files
   def-dir: 'examples/checkpoints/', # Default location to save/load models
-  model-name: 'deepspeech_final.pth' # File name to save the best model
-  load-from: 'deepspeech_final.pth' # File name containing a checkpoint to continue/finetune
+  model-name: 'asr_final.pth' # File name to save the best model
+  load-from: 'asr_final.pth' # File name containing a checkpoint to continue/finetune

   sample-rate: 16000 # Sample rate
   window-size: 0.02 # Window size for spectrogram in seconds
```
