I've been trying to convert some truly ugly VOBSUBs to SRT subtitles. I've tried a few different tools that I see circulating, without any success. Almost all of the recommendations I've seen are many years old.

VobSub - I can begin the OCR process, but it seems to fail to detect any spaces. I have to enter entire lines for them to be recorded correctly, and the algorithm doesn't seem to learn and improve the detection, making this unusable. I've tried changing the space detection sensitivity to both ends of the range without any difference.

SubRip - This works the best of the tools I've tried, but it entirely freezes and crashes at 51%. This could be a problem with the source files, so I've been trying to get one of the other tools working.

Avidemux - I read that there should be an option to convert VOBSUB under "tools".

Has anyone else had success with these tools? Are there any other tools I should be using?

Edit: I've found success using SubtitleEdit. It worked great and I recommend it for anyone else struggling with VobSubs.

I just recently did this and it is much more laborious and time-consuming than you think. It's not as simple as putting it through one program and that's that. Here is what I had to go through to get perfect subtitles for an anime I ripped from DVD that had VOBSUBs (Dragon Ball Z Dragon Box Set). This is going to be a really high-level explanation.

1. Extract all VOBSUBs from the files with ffmpeg, then OCR the VOBSUBs to SRT. Do this with whatever software you like; I used vobsub2srt, which uses tesseract as the OCR backend. Now you have a bunch of SRT files for all your videos, but they will have loads of spelling errors (no matter how good the OCR is), and also OCR errors (such as I being recognized as a |, or incorrect spacing, or incorrect newlines).

2. To fix the OCR errors, you first need to recognize all the common problems. Go through the SRT files and look for common issues, such as I being written as | or i, or S being written as 5 or vice versa. Create a filter with whatever find/replace program you want (I use sed), and replace all the common issues in the SRT files. My final filter file had over 200 replacements, and I had to figure out each one by hand. And they have to be complex regex filters, not just a simple "replace 5 with S", because that will screw up everything. They need to be something like "Replace 5 followed by letters with S followed by the same letters".

3. Now that most of your common OCR issues are fixed, you still have plenty of spelling mistakes to fix. To do that, use a spell checking program like aspell (or whatever you want) to go through each SRT file.
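
To make the "complex regex filters" idea from the reply concrete, here is a minimal sketch in Python rather than sed. The two rules, the `OCR_RULES` list, and the `fix_line` / `fix_srt` names are hypothetical and only for illustration; they are not the 200-rule filter file mentioned in the reply, and a real filter would have to be built up by reading your own SRT files.

```python
import re

# Hypothetical example rules (not the original poster's filter file).
OCR_RULES = [
    # "5" at the start of a word and followed by letters becomes "S" plus the
    # same letters (5omeone -> Someone). The ordinal "5th" is left alone, and
    # digits inside numbers such as "15" or "2005" never match.
    (re.compile(r"\b5(?!th\b)(?=[A-Za-z])"), "S"),
    # A "|" standing where a word would start, followed by whitespace, an
    # apostrophe, punctuation, or end of line, is treated as a misread "I".
    (re.compile(r"(?<!\S)\|(?=[\s'.,!?]|$)"), "I"),
]


def fix_line(line: str) -> str:
    """Apply every OCR rule, in order, to one line of subtitle text."""
    for pattern, replacement in OCR_RULES:
        line = pattern.sub(replacement, line)
    return line


def fix_srt(srt_text: str) -> str:
    """Fix subtitle text lines while leaving cue numbers and timings untouched."""
    fixed = []
    for line in srt_text.splitlines():
        is_index = line.strip().isdigit()  # a bare number is assumed to be a cue index
        is_timing = "-->" in line          # e.g. 00:00:05,000 --> 00:00:07,500
        fixed.append(line if is_index or is_timing else fix_line(line))
    return "\n".join(fixed)


if __name__ == "__main__":
    sample = "1\n00:00:05,000 --> 00:00:07,500\n5omeone said | can't, so |'m 5th."
    print(fix_srt(sample))
```

A blanket "replace 5 with S" would have corrupted both the timing line and "5th" in the sample; anchoring each rule to its context is what keeps the replacements safe, which is why each rule ends up being worked out by hand.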