
How to clean a transcript based on specific knowledge

A practical way to make automatic YouTube transcripts more reliable when names, locations, and wording conventions matter.


A common problem with automatic YouTube transcripts is that they are often inaccurate with proper names, company names, or wording and pronunciation that depend on where the speaker is from.

One way to fix that is to run the transcript through an LLM with a prompt that addresses those issues specifically.
For instance, for proper names:

## Spellings

Below is a list of names that may be mentioned in the video, followed by their incorrect spellings. Based on the context, if you find out that they are mentioned, either with these incorrect spellings or any other variations, fix the spelling. If only a part of that name is mentioned, fix the spelling of that part only. Add the corresponding link only the first time they are mentioned.

Additionally, if you find out that a word has been transcribed incorrectly based on the context, you should also fix it, even if it's not in the list below.

### People

* [Franco Betteo](https://www.linkedin.com/in/franco-betteo-3230bb96/) (author of the video): "Be Teo", "Ve Teo", "Betheo", "Be Theo"

This way, when the LLM encounters one of the listed variations of my last name, or a similar one, it fixes the spelling and, on the first mention, adds the link to my LinkedIn profile in Markdown format.
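If the list of names changes from video to video, the prompt section can be generated from data instead of edited by hand. The following is a minimal Python sketch, not the author's or Silver.dev's implementation: `KnownName` and `build_spellings_section` are hypothetical names introduced for illustration, and the instruction text is a shortened version of the prompt shown above.

```python
from dataclasses import dataclass


@dataclass
class KnownName:
    """A correctly spelled name plus misspellings seen in transcripts (hypothetical helper)."""
    correct: str
    misspellings: list[str]
    link: str = ""  # optional URL to attach on the first mention
    note: str = ""  # optional context, e.g. "author of the video"


# Shortened version of the instructions from the prompt example above.
INSTRUCTIONS = (
    "Below is a list of names that may be mentioned in the video, followed by "
    "their incorrect spellings. If they are mentioned, with these spellings or "
    "any other variation, fix the spelling. Add the corresponding link only the "
    "first time they are mentioned."
)


def build_spellings_section(people: list[KnownName]) -> str:
    """Render the '## Spellings' prompt section as Markdown text."""
    lines = ["## Spellings", "", INSTRUCTIONS, "", "### People", ""]
    for person in people:
        # Use a Markdown link when a URL is known, plain text otherwise.
        label = f"[{person.correct}]({person.link})" if person.link else person.correct
        if person.note:
            label += f" ({person.note})"
        variants = ", ".join(f'"{m}"' for m in person.misspellings)
        lines.append(f"* {label}: {variants}")
    return "\n".join(lines)


section = build_spellings_section([
    KnownName(
        correct="Franco Betteo",
        misspellings=["Be Teo", "Ve Teo", "Betheo", "Be Theo"],
        link="https://www.linkedin.com/in/franco-betteo-3230bb96/",
        note="author of the video",
    )
])
```

Here `section` holds the same bullet as the hand-written example, so adding a new person only means adding another `KnownName` entry.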

The same logic works for company names or any other specific words and phrases that might appear: expressions from your country, British vs. American English, or pronunciations that the transcript can fail to capture (for example, accents that vary between US states).

And since we are already using an LLM to fix those specifics, we can also ask for more general cleanup, such as:

* Convert all mentions of years and quantities to numeric format.
* Convert "por ciento" (Spanish for "percent") to its symbol, "%".

You can be as creative as you want and include as many conventions as you would like the transcript to follow. Applied across a batch of transcripts, this pattern can give you consistently structured transcripts in your desired format.
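As a rough illustration of the bulk workflow, the sketch below combines a spellings section with general formatting rules into one prompt and applies it to several transcripts. Everything here is an assumption made for illustration: the function names, the "## Formatting" heading, and the opening instruction line are not from the original example, and `complete` is a hypothetical callable that you would implement around whichever LLM client you use.

```python
from typing import Callable

# General conventions from the list above; extend with your own.
GENERAL_RULES = [
    "Convert all mentions of years and quantities to numeric format.",
    'Convert "por ciento" to its symbol, "%".',
]


def build_cleaning_prompt(spellings_section: str, rules: list[str]) -> str:
    """Join the spellings section and the formatting rules into one prompt."""
    rule_lines = "\n".join(f"* {rule}" for rule in rules)
    return (
        # Assumed opening instruction, not part of the original prompt.
        "Clean the transcript you are given. Return only the cleaned transcript.\n\n"
        f"{spellings_section}\n\n"
        "## Formatting\n\n"
        f"{rule_lines}"
    )


def clean_transcripts(
    transcripts: dict[str, str],
    prompt: str,
    complete: Callable[[str, str], str],
) -> dict[str, str]:
    """Apply the same prompt to every transcript.

    `complete(prompt, transcript)` is a hypothetical wrapper around your LLM
    client; it should return the cleaned transcript text.
    """
    return {video_id: complete(prompt, text) for video_id, text in transcripts.items()}
```

Passing the LLM call in as a function keeps the sketch independent of any particular provider and lets you try the pipeline with a stand-in function before spending tokens on real requests.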

Thanks to Silver.dev for the open code example of how they use this.



Conventions Mentioned In This Post

  • Proper-name spelling normalization with first-mention links
  • Context-aware correction for words not in the predefined list
  • Formatting normalization, such as numeric years and percentage symbols

Not affiliated with YouTube or Google.
© 2026 downloadyoutubetranscripts.com
