How to: clean up audio quality in a video

TL;DR

Extract audio to mp3 using ffmpeg.
Clean the audio with machine learning (e.g. https://podcast.adobe.com/enhance or, if you want to get your hands dirty, https://github.com/shahules786/mayavoz).
Replace audio in original video with ffmpeg.

Backstory

I've been on an AI/ML kick recently and have been trying out a ton of new tech and products. I recently discovered Modal, which seems like AWS Lambda for machine learning (but with super fast config/deploy and a great developer experience). I signed up for their service and was greeted by an action-packed Loom whose audio sounded totally blown out. I looked up the author on Twitter and saw this tweet:

So I figured I could put some goodwill into the community and help their onboarding experience by making the video sound a bit better.

Step 1: Download the video from Loom

There's a button in the UI. Just click it. :)

Step 2: Extract audio from video with ffmpeg

$ ffmpeg -i loom_original.mp4 blown_out_audio.mp3

If command lines aren't your thing, you can also use something like Audio Extractor.
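
MP3 is a lossy format, so if you'd rather hand the cleanup step an uncompressed copy, here is a rough sketch of the same extraction to WAV. It assumes your cleanup tool accepts WAV input; blown_out_audio.wav is a file name made up for this example, and the script simply skips the work if ffmpeg or the video is missing.

```shell
#!/bin/sh
# Sketch only: extract just the audio track as uncompressed WAV.
# loom_original.mp4 is the file from Step 1; blown_out_audio.wav is a
# hypothetical name chosen for this example.
src="loom_original.mp4"
wav="blown_out_audio.wav"

extract_audio() {
    # -vn skips the video stream; pcm_s16le writes 16-bit PCM (plain WAV).
    ffmpeg -i "$1" -vn -c:a pcm_s16le "$2"
}

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$src" ]; then
    extract_audio "$src" "$wav"
else
    echo "Skipping: need ffmpeg on PATH and $src in the current directory."
fi
```

The WAV will be much larger than the MP3, so this is only worth it if upload size isn't a concern.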

Step 3: Clean up audio

Drag and drop blown_out_audio.mp3 onto https://podcast.adobe.com/enhance and wait about 3 minutes. This auto-saves as blown_out_audio (enhanced).wav.

If you don't want to use a third-party service, you could also set up your own pipeline around an ML model like this one: https://github.com/shahules786/mayavoz.
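
If you go the local-model route, you will probably need to convert the audio into whatever format the model expects first. Here is a sketch of that prep step with ffmpeg. The mono, 16 kHz settings are an assumption on my part (a common choice for speech models), not something taken from the mayavoz docs, so check the model you pick; model_input.wav is a made-up file name.

```shell
#!/bin/sh
# Sketch only: convert the extracted audio to mono 16 kHz WAV for a local model.
# ASSUMPTION: mono and 16000 Hz are placeholders; use what your model expects.
src="blown_out_audio.mp3"
model_input="model_input.wav"   # hypothetical file name
rate=16000

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$src" ]; then
    # -ac 1 downmixes to one channel; -ar sets the output sample rate.
    ffmpeg -i "$src" -ac 1 -ar "$rate" "$model_input"
else
    echo "Skipping: need ffmpeg on PATH and $src in the current directory."
fi
```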

Step 4: Replace audio in original video

$ ffmpeg -i loom_original.mp4 -i "blown_out_audio (enhanced).wav" -c:v copy -map 0:v:0 -map 1:a:0 new_hotness.mp4
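
In that command, -c:v copy passes the video through untouched, and the two -map flags pick the video from the first input and the audio from the second. If you want a bit more control, here is a sketch of the same swap with the audio encoding spelled out and a quick duration check afterwards. The 192k bitrate is an arbitrary pick, and the duration helper is just something I named for this example.

```shell
#!/bin/sh
# Sketch only: same audio swap, with the audio settings made explicit.
video="loom_original.mp4"
audio="blown_out_audio (enhanced).wav"
out="new_hotness.mp4"

# Print a media file's duration in seconds using ffprobe.
duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$video" ] && [ -f "$audio" ]; then
    # -c:v copy keeps the video as-is; -c:a aac re-encodes the WAV to AAC
    # at 192 kb/s (arbitrary choice); -shortest stops at the shorter input.
    ffmpeg -i "$video" -i "$audio" -map 0:v:0 -map 1:a:0 \
        -c:v copy -c:a aac -b:a 192k -shortest "$out"
    echo "original: $(duration "$video")s, new: $(duration "$out")s"
else
    echo "Skipping: need ffmpeg plus both input files in the current directory."
fi
```

If the two durations are noticeably different, the enhanced audio probably doesn't line up with the video and is worth a listen before you share it.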

Conclusion

We are in a golden age of machine learning. I expect thousands of services to launch in the next year to help with generating, manipulating, editing, and enhancing images, text, audio, and video. Given the amount of open source code and models that already exist, these services will likely end up as completely free add-ons to existing products or as incredibly cheap and accessible products of their own.

References

A Complete Guide to Speech Enhancement