rocAL RNNT dataloading in Python
rocAL supports the RNNT speech recognition model through audio readers and other functions that can be used with PyTorch.
All the functions used for RNNT dataloading are available in the amd.rocal.fn module. See Using rocAL with Python API for more details about this module.
All the augmentations used in the RNNT dataloader pipeline are available as part of rocAL. These augmentations need to be plugged into the rocAL PyTorch dataloader to run the training. PyTorch samples can be found in the rocAL GitHub repository.
Note
The rocAL GitHub repository does not host the entire RNNT dataloader source.
| Function | Description | Details |
|---|---|---|
| | Resamples an audio signal. | Resampling is achieved by applying a sinc filter with a Hann window. The extent is controlled by the function’s |
| | Detects leading and trailing silences. | Returns the beginning and length of the non-silent region. Compares the short-term power calculated for the window length of the signal with a silence cut-off threshold. The signal is considered to be silent when the short-term power in decibels is less than the cut-off threshold in decibels. |
| | Slices the input. | The slice is specified by an anchor and a shape for the slice. |
| | Applies a preemphasis filter to the input. | The filter used is |
| | Produces a spectrogram from a 1D audio signal. | |
| | Converts a spectrogram to a mel spectrogram. | Conversion is done by applying a bank of triangular filters where the frequency dimension is selected from the input layout. |
| | Converts a magnitude to decibels. | The conversion is done using |
| | Normalizes an input. | Normalization is done by removing the mean and dividing by the standard deviation. |
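
The silence-detection and slice rows of the table can be illustrated with a minimal NumPy sketch. It does not use rocAL and is not the rocAL implementation. The helper names `find_nonsilent_region` and `slice_1d`, the default values, and the choice to measure power relative to the loudest window are all assumptions made for illustration.

```python
import numpy as np


def find_nonsilent_region(signal, cutoff_db=-60.0, window_length=2048):
    """Return (begin, length) of the non-silent region of a 1D signal.

    Hypothetical helper, not the rocAL function. Power is measured in dB
    relative to the loudest window, which is an assumption of this sketch.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return 0, 0
    window_length = max(1, min(window_length, signal.size))

    # Short-term power: mean of the squared samples in each sliding window,
    # computed from a cumulative sum so each window is a single subtraction.
    cumulative = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    power = (cumulative[window_length:] - cumulative[:-window_length]) / window_length

    reference = power.max()
    if reference <= 0.0:
        return 0, 0  # the whole signal is digital silence

    # Convert power to decibels; clamp the ratio to avoid log10(0).
    power_db = 10.0 * np.log10(np.maximum(power / reference, 1e-20))

    # A window is silent when its power in dB is below the cut-off in dB.
    loud = np.flatnonzero(power_db >= cutoff_db)
    if loud.size == 0:
        return 0, 0
    begin = int(loud[0])
    end = int(loud[-1]) + window_length  # the last loud window ends here
    return begin, end - begin


def slice_1d(signal, anchor, shape):
    """Return the slice that starts at `anchor` and spans `shape` samples."""
    return signal[anchor:anchor + shape]


# Usage: silence, then noise, then silence again.
rng = np.random.default_rng(0)
audio = np.zeros(16000)
audio[4000:12000] = rng.normal(scale=0.1, size=8000)

begin, length = find_nonsilent_region(audio, cutoff_db=-60.0, window_length=256)
trimmed = slice_1d(audio, anchor=begin, shape=length)
```

Because detection works on whole windows, the reported region can extend up to one window length beyond the true signal on either side. A shorter window tightens the bounds at the cost of a noisier power estimate.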
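
The mel conversion described in the table, a bank of triangular filters applied along the frequency dimension, can be sketched in NumPy as follows. This is an illustration under stated assumptions, not rocAL code. The function names are hypothetical, the widely used formula mel = 2595 * log10(1 + f / 700) is assumed for the mel scale, and the spectrogram is assumed to have frequency on axis 0.

```python
import numpy as np


def hz_to_mel(freq_hz):
    # Assumed mel-scale formula; other variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def triangular_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Build an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    if f_max is None:
        f_max = sample_rate / 2.0

    # n_filters triangles need n_filters + 2 edges, evenly spaced in mel.
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    )
    # Center frequency of every FFT bin from 0 Hz up to Nyquist.
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)

    bank = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        left, center, right = edges_hz[i:i + 3]
        rising = (bin_hz - left) / (center - left)
        falling = (right - bin_hz) / (right - center)
        # The triangle is the lower of the two slopes, floored at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


# Usage: a dummy power spectrogram with frequency on axis 0.
bank = triangular_filter_bank(n_filters=8, n_fft=512, sample_rate=16000)
spectrogram = np.ones((257, 5))        # (frequency bins, frames)
mel_spectrogram = bank @ spectrogram   # (mel bands, frames)
```

If the input layout places time first, the same matrix can be applied with `spectrogram @ bank.T`.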
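
The normalization step, removing the mean and dividing by the standard deviation, is shown below as a short NumPy sketch. The helper name `normalize_features` and the small `epsilon` guard against division by zero are assumptions of this example, not part of the rocAL API.

```python
import numpy as np


def normalize_features(x, axis=None, epsilon=1e-8):
    """Subtract the mean and divide by the standard deviation along `axis`.

    Hypothetical helper; `epsilon` is an added safeguard for constant inputs.
    """
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / (std + epsilon)


# Usage: normalize each feature row independently.
features = np.array([[1.0, 2.0, 3.0],
                     [10.0, 20.0, 30.0]])
normalized = normalize_features(features, axis=1)
```

After this step each row has approximately zero mean and unit standard deviation, regardless of its original scale.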