44100 Hz - common default sample rate (frames, i.e. samples per channel, per second)
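To make the sample rate concrete, here is a minimal sketch of the arithmetic it implies. The function names are hypothetical and only illustrate the relationship between duration, sample count, and the highest representable frequency (the Nyquist limit).

```python
SAMPLE_RATE = 44100  # samples per second (Hz)


def num_samples(duration_seconds, sample_rate=SAMPLE_RATE):
    """Number of amplitude values in a mono clip of the given length."""
    return int(round(duration_seconds * sample_rate))


def nyquist_frequency(sample_rate=SAMPLE_RATE):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate / 2


three_seconds = num_samples(3.0)  # 3 s of mono audio at 44100 Hz
nyquist = nyquist_frequency()     # half of the sample rate
```

So even a short clip is a long vector of raw amplitudes, which is part of why raw audio is awkward to feed to a model directly.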
How can we approach audio data?
- Raw amplitudes - hard to extract features
- Spectrograms - we already know some models that can deal with them
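As a rough illustration of the second option, the sketch below builds a magnitude spectrogram by hand: slice the signal into overlapping frames, apply a Hann window, and take a plain DFT of each frame. The frame size, hop, and test tones are assumptions chosen for the example; in practice torchaudio or librosa would do this far faster.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Magnitude spectrogram: a list of frames, each with frame_size // 2 + 1 bins."""
    # Periodic Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        # Naive DFT; only non-negative frequencies are kept for a real signal.
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * cmath.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


SAMPLE_RATE = 8000  # assumed rate for this toy signal
# Toy signal: 500 Hz tone followed by a 2000 Hz tone.
low = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(512)]
high = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(low + high)

# Strongest bin in the first and last frame (bin width = 8000 / 64 = 125 Hz).
first_peak = max(range(len(spec[0])), key=spec[0].__getitem__)
last_peak = max(range(len(spec[-1])), key=spec[-1].__getitem__)
```

The result is a 2D time-by-frequency grid, so the peak bin moves from the 500 Hz row to the 2000 Hz row over time. That image-like layout is what lets existing models work on it.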
Time domain vs Frequency domain
Fast Fourier Transform (FFT)
FFT is all you need!
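To show how the FFT moves a signal from the time domain to the frequency domain, here is a minimal recursive radix-2 sketch in plain Python. It assumes the input length is a power of two; `dominant_frequency` and the test tone are hypothetical helpers for illustration, not part of any library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest bin below the Nyquist limit."""
    spectrum = fft(signal)
    half = len(signal) // 2
    magnitudes = [abs(c) for c in spectrum[:half]]
    peak_bin = max(range(half), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / len(signal)


SAMPLE_RATE = 8000  # assumed rate for this toy signal
N = 1024
TONE_HZ = 1000.0    # lands exactly on bin 128 (1000 * 1024 / 8000)
tone = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(N)]
peak_hz = dominant_frequency(tone, SAMPLE_RATE)
```

In the time domain the tone is 1024 oscillating amplitudes. In the frequency domain it collapses to a single strong peak at the tone's frequency.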
MFCC (Mel-frequency cepstral coefficients)
LSTM architecture
torchaudio or librosa
S3PRL - toolkit with audio/speech processing models (recognition and so on)
Squeezeformer