Danchik Audio processing

Created time: Mar 5, 2023 09:18 AM
Progress: Done
Category: Programming
Source: Tet-A-Tet
44100 Hz - the common default sample rate (samples, i.e. frames, per second)
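
A quick sketch of what that sample rate means in practice, assuming a mono clip. The helper names `num_samples` and `nyquist` are made up for illustration:

```python
SAMPLE_RATE = 44_100  # samples per second (Hz)


def num_samples(duration_s: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples in a mono clip of the given length."""
    return round(duration_s * sample_rate)


def nyquist(sample_rate: int = SAMPLE_RATE) -> float:
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate / 2


three_seconds = num_samples(3.0)  # 3 s of audio at 44100 Hz
limit_hz = nyquist()              # half the sample rate
```

So even a few seconds of audio is a very long vector, which is part of why raw amplitudes are awkward to work with directly.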

How can we approach audio data?

  1. Raw amplitudes (the waveform itself) - hard to extract features from directly
  2. Spectrograms - image-like, and we already know some models that can deal with them
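
A minimal sketch of going from option 1 to option 2: slice the waveform into overlapping frames and take the magnitude spectrum of each one (a short-time Fourier transform). This uses a naive DFT from the standard library so it runs anywhere; the frame size, hop, and the 1000 Hz test tone are arbitrary assumptions, and a real pipeline would probably use a library routine instead.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_size=64, hop=32):
    """Short-time Fourier transform magnitudes: one row per time frame."""
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # Hann window to reduce leakage at the frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, x in enumerate(frame)
        ]
        rows.append(dft_magnitudes(windowed))
    return rows


# Toy input (hypothetical values): a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(signal)

# Each bin covers sample_rate / frame_size = 125 Hz.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

The result is a 2-D grid (time frames by frequency bins), which is why image-style models can be applied to it.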

Time domain vs Frequency domain

Fast Fourier Transform (FFT)
FFT is all you need!
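
To make the time domain vs frequency domain idea concrete, here is a small sketch of a recursive radix-2 FFT (Cooley-Tukey) in plain Python. It is only an illustration under assumptions: the input length must be a power of two, and the two-tone test signal and its sample rate are made up.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Time domain: one second of a 50 Hz tone plus a quieter 120 Hz tone
# (hypothetical values; sample_rate == n makes bin k equal k Hz).
sample_rate = 1024
n = 1024
signal = [
    math.sin(2 * math.pi * 50 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 120 * t / sample_rate)
    for t in range(n)
]

# Frequency domain: keep the bins below the Nyquist frequency.
magnitudes = [abs(c) for c in fft(signal)[: n // 2]]
top_two = sorted(sorted(range(len(magnitudes)), key=magnitudes.__getitem__)[-2:])
```

In the time domain the two tones are mixed together in every sample; in the frequency domain they show up as two separate peaks, stored in `top_two` as bin indices.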
MFCC (Mel-frequency cepstral coefficients)
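
A rough sketch of how MFCCs are usually computed for one frame: power spectrum, triangular mel filterbank, log, then a DCT. This is a simplified stdlib-only version under several assumptions: it uses one common mel formula (2595 * log10(1 + f / 700)), an unnormalized DCT-II, no windowing or pre-emphasis, and made-up parameter values. Library implementations differ in these details, so the numbers will not match theirs exactly.

```python
import cmath
import math


def hz_to_mel(hz):
    """One common mel-scale formula (assumption: the 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for the non-negative-frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for left, center, right in zip(edges, edges[1:], edges[2:]):
        bank.append([
            max(0.0, min((f - left) / (center - left),
                         (right - f) / (right - center)))
            for f in bin_hz
        ])
    return bank


def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(n_out)
    ]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so an empty filter does not produce log(0).
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(filt, spectrum)), 1e-10))
        for filt in bank
    ]
    return dct_ii(log_energies, n_coeffs)


# Toy frame (hypothetical values): 256 samples of a 440 Hz sine at 8000 Hz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
```

The output is a short vector per frame (13 numbers here), which is much more compact than the raw spectrum.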
LSTM architecture
 
torchaudio or librosa
 
S3PRL - toolkit with audio processing models (speech recognition and so on)
Squeezeformer