
What is the best way to extract closed captions from movie files?

I need to extract closed caption information from movie files. I have tried ccextractor, but it does not seem to work.

I captured a video stream (with closed captions in it), saved it to a file, and then ran ccextractor, but it can't find anything!

My video samples are below:

http://dl.dropbox.com/u/10244901/gsd.mpg

http://dl.dropbox.com/u/10244901/gsd_b.mpg

First try:

cvlc -I dummy v4l2:///dev/video1:width=720:height=480:norm=ntsc:standard=ntsc:pixelformat=2:aspect-ratio=4\:3:channel=0 --sout "#transcode{vcodec=mp2v}:standard{access=file,mux=dummy,dst=gsd.mpg}"

lzzz@ideiatu:~/Downloads/ccextractor.0.64/linux$ ./ccextractor gsd.mpg 
CCExtractor 0.64, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: gsd.mpg
[Raw Mode: Broadcast] [Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
Teletext page: Autodetect]
Start credits text: [None]
Creating gsd.srt

-----------------------------------------------------------------
Opening file: gsd.mpg
File seems to be an elementary stream, enabling ES mode
Analyzing data in general mode


New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 03 - 25] [progressive: yes]

133%  |  01:40
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Total frames time:      00:01:41:200  (2530 frames at 25.00fps)

Min PTS:                00:00:00:000
Max PTS:                00:01:41:200
Length:                 00:01:41:200

Initial GOP time:       00:00:00:000
Final GOP time:         00:01:40:800+10F
Diff. GOP length:       00:01:40:800+10F    (00:01:41:133)
Done, processing time = 0 seconds
This is beta software. Report issues to cfsmp3 at gmail...

Second try:

cvlc -I dummy gsd.mpg --sout "#standard{access=file,mux=ts,dst=gsd_b.mpg}"



lzzz@ideiatu:~/Downloads/ccextractor.0.64/linux$ ./ccextractor gsd_b.mpg
CCExtractor 0.64, Carlos Fernandez Sanz, Volker Quetschke.
--------------------------------------------------------------------------
Input: gsd_b.mpg
[Raw Mode: Broadcast] [Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
Teletext page: Autodetect]
Start credits text: [None]
Creating gsd_b.srt

-----------------------------------------------------------------
Opening file: gsd_b.mpg
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Decode captions from MPEG-2 video stream [0x02]  -  PID: 68

New PID found: 68


New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 03 - 25] [progressive: yes]

100%  |  00:00
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Total frames time:      00:01:41:040  (2526 frames at 25.00fps)

Min PTS:                02:59:52:437
Max PTS:                02:59:52:677
Length:                 00:00:00:240

Initial GOP time:       00:00:00:000
Final GOP time:         00:01:40:800 +6F
Diff. GOP length:       00:01:40:800 +6F    (00:01:41:000)
Done, processing time = 0 seconds
This is beta software. Report issues to cfsmp3 at gmail...
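
The "Total frames time" lines in the two logs above can be cross-checked against the reported frame counts. The following is only a minimal sketch and is not part of ccextractor: the helper name `frames_to_time` is hypothetical, and it assumes an integer frame rate, as with the 25 fps reported in both runs.

```shell
# Hypothetical helper: convert a frame count at an integer frame rate
# into HH:MM:SS:mmm, the layout used by the "Total frames time" line.
frames_to_time() {
    frames=$1
    fps=$2
    ms=$(( frames * 1000 / fps ))   # total duration in milliseconds
    printf '%02d:%02d:%02d:%03d\n' \
        $(( ms / 3600000 )) \
        $(( ms / 60000 % 60 )) \
        $(( ms / 1000 % 60 )) \
        $(( ms % 1000 ))
}

frames_to_time 2530 25   # first run:  prints 00:01:41:200
frames_to_time 2526 25   # second run: prints 00:01:41:040
```

Both results match the logs, so the frame totals are internally consistent, whereas the PTS-based length of 00:00:00:240 reported in the second run is not.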
lzzzfelipe asked Nov 25 '12 20:11




1 Answer

Some movies don't carry their captions as a separate, hidden stream. Instead, the subtitles are hardcoded into the video, meaning they are part of the picture itself and cannot be separated from it.

You can try searching online for a standalone subtitle file for the movie.
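
One rough way to tell the two cases apart is to check whether ccextractor wrote any caption text at all. The sketch below assumes the default behaviour shown in the question's logs, where `./ccextractor gsd.mpg` creates `gsd.srt`. The helper name `srt_has_captions` is hypothetical, and it treats a file as containing captions only if it holds at least one letter, since the cue numbers and timestamps of an .srt file contain none.

```shell
# Hypothetical helper: succeed only if the .srt file exists, is non-empty,
# and contains at least one alphabetic character (i.e. caption text,
# not just cue numbers and timestamps).
srt_has_captions() {
    [ -s "$1" ] && grep -q '[[:alpha:]]' "$1"
}

# Run after: ./ccextractor gsd.mpg   (writes gsd.srt, as in the logs)
if srt_has_captions gsd.srt; then
    echo "captions found"
else
    echo "no captions extracted"
fi
```

An empty result does not prove the subtitles are hardcoded; it only shows that this particular file has no caption data ccextractor could read.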

Zikato answered Nov 08 '22 15:11