I need to extract closed caption information from movie files. I have tried ccextractor,
but it does not seem to work.
I captured a video stream (with closed captions in it), saved it to a file, and then ran ccextractor on it
... but it can't find anything!
My video samples are below:
http://dl.dropbox.com/u/10244901/gsd.mpg
http://dl.dropbox.com/u/10244901/gsd_b.mpg
First try:
cvlc -I dummy v4l2:///dev/video1:width=720:height=480:norm=ntsc:standard=ntsc:pixelformat=2:aspect-ratio=4\:3:channel=0 --sout "#transcode{vcodec=mp2v}:standard{access=file,mux=dummy,dst=gsd.mpg}"
lzzz@ideiatu:~/Downloads/ccextractor.0.64/linux$ ./ccextractor gsd.mpg
CCExtractor 0.64, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: gsd.mpg
[Raw Mode: Broadcast] [Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
Teletext page: Autodetect]
Start credits text: [None]
Creating gsd.srt
-----------------------------------------------------------------
Opening file: gsd.mpg
File seems to be an elementary stream, enabling ES mode
Analyzing data in general mode
New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 03 - 25] [progressive: yes]
133% | 01:40
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0
Total frames time: 00:01:41:200 (2530 frames at 25.00fps)
Min PTS: 00:00:00:000
Max PTS: 00:01:41:200
Length: 00:01:41:200
Initial GOP time: 00:00:00:000
Final GOP time: 00:01:40:800+10F
Diff. GOP length: 00:01:40:800+10F (00:01:41:133)
Done, processing time = 0 seconds
This is beta software. Report issues to cfsmp3 at gmail...
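Before blaming ccextractor, it may help to check whether the captured file contains any caption payload at all. The following is a minimal diagnostic sketch in Python. It is not part of ccextractor, and the function names are hypothetical. It assumes the captions, if present, travel as MPEG-2 picture user data, either ATSC A/53 `GA94` cc_data or DVD-style `CC` user data. It also decodes the first sequence header, which is presumably where the `[720 * 480] [AR: 02 - 4:3] [FR: 03 - 25]` line in the log comes from.

```python
import sys

# frame_rate_code -> frames per second (MPEG-2 video, ISO/IEC 13818-2)
FRAME_RATES = {1: 24000 / 1001, 2: 24.0, 3: 25.0, 4: 30000 / 1001,
               5: 30.0, 6: 50.0, 7: 60000 / 1001, 8: 60.0}
# aspect_ratio_information -> display aspect ratio (1 = square samples)
ASPECT_RATIOS = {1: "1:1", 2: "4:3", 3: "16:9", 4: "2.21:1"}

SEQUENCE_HEADER = b"\x00\x00\x01\xb3"
USER_DATA = b"\x00\x00\x01\xb2"


def parse_sequence_header(data: bytes):
    """Return (width, height, aspect, fps) from the first sequence header, or None."""
    pos = data.find(SEQUENCE_HEADER)
    if pos < 0 or pos + 8 > len(data):
        return None
    b = data[pos + 4:pos + 8]
    width = (b[0] << 4) | (b[1] >> 4)           # horizontal_size_value, 12 bits
    height = ((b[1] & 0x0F) << 8) | b[2]        # vertical_size_value, 12 bits
    aspect = ASPECT_RATIOS.get(b[3] >> 4, "?")  # aspect_ratio_information, 4 bits
    fps = FRAME_RATES.get(b[3] & 0x0F)          # frame_rate_code, 4 bits
    return width, height, aspect, fps


def count_caption_user_data(data: bytes):
    """Count user_data blocks that look like caption payloads."""
    counts = {"atsc_ga94": 0, "dvd_cc": 0, "other": 0}
    pos = data.find(USER_DATA)
    while pos >= 0:
        body = data[pos + 4:pos + 9]
        if body[:4] == b"GA94" and body[4:5] == b"\x03":
            counts["atsc_ga94"] += 1  # ATSC A/53 user_data_type_code 0x03 (cc_data)
        elif body[:2] == b"CC":
            counts["dvd_cc"] += 1     # DVD-style 'CC' caption user data
        else:
            counts["other"] += 1      # some other user data
        pos = data.find(USER_DATA, pos + 4)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()  # whole file in memory; fine for a short sample
    print("sequence header:", parse_sequence_header(data))
    print("caption user data:", count_caption_user_data(data))
```

Run it as, for example, `python inspect_es.py gsd.mpg` (the script name is hypothetical). If both caption counters stay at zero, there is no caption data in these two forms for ccextractor to find, which would point at the capture/transcode step rather than at ccextractor itself.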
Second try:
cvlc -I dummy gsd.mpg --sout "#standard{access=file,mux=ts,dst=gsd_b.mpg}"
lzzz@ideiatu:~/Downloads/ccextractor.0.64/linux$ ./ccextractor gsd_b.mpg
CCExtractor 0.64, Carlos Fernandez Sanz, Volker Quetschke.
--------------------------------------------------------------------------
Input: gsd_b.mpg
[Raw Mode: Broadcast] [Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
Teletext page: Autodetect]
Start credits text: [None]
Creating gsd_b.srt
-----------------------------------------------------------------
Opening file: gsd_b.mpg
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Decode captions from MPEG-2 video stream [0x02] - PID: 68
New PID found: 68
New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 03 - 25] [progressive: yes]
100% | 00:00
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0
Total frames time: 00:01:41:040 (2526 frames at 25.00fps)
Min PTS: 02:59:52:437
Max PTS: 02:59:52:677
Length: 00:00:00:240
Initial GOP time: 00:00:00:000
Final GOP time: 00:01:40:800 +6F
Diff. GOP length: 00:01:40:800 +6F (00:01:41:000)
Done, processing time = 0 seconds
This is beta software. Report issues to cfsmp3 at gmail...
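The second run finds the video on PID 68, but its `Min PTS`/`Max PTS` (on the 90 kHz clock) span only 240 ms, although 2526 frames were counted. To cross-check those figures independently, here is a minimal Python sketch. It walks the 188-byte transport packets, counts packets per PID, reads the PTS from each PES header that starts in a packet, and prints the range in the same HH:MM:SS:mmm form. Assumptions: the file is a plain 188-byte-packet TS, PTS wraparound is ignored, and all names are hypothetical. It is not how ccextractor computes its numbers, only a second opinion.

```python
import sys
from collections import Counter

TS_PACKET_SIZE = 188
PTS_CLOCK = 90000  # PTS ticks per second (matches "Clock frequency: 90000")


def iter_ts_packets(data: bytes):
    """Yield (pid, payload_unit_start, payload) for each 188-byte TS packet."""
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:  # sync byte missing; a real demuxer would resync
            continue
        pusi = bool(pkt[1] & 0x40)
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        afc = (pkt[3] >> 4) & 0x03  # adaptation_field_control
        start = 4
        if afc in (2, 3):  # skip the adaptation field
            start += 1 + pkt[4]
        yield pid, pusi, (pkt[start:] if afc in (1, 3) else b"")


def parse_pes_pts(payload: bytes):
    """Return the 33-bit PTS of a PES header at the start of payload, or None."""
    if len(payload) < 14 or payload[:3] != b"\x00\x00\x01":
        return None
    if (payload[6] & 0xC0) != 0x80 or not (payload[7] & 0x80):
        return None  # no optional PES header, or PTS flag not set
    p = payload[9:14]
    return ((((p[0] >> 1) & 0x07) << 30) | (p[1] << 22)
            | ((p[2] >> 1) << 15) | (p[3] << 7) | (p[4] >> 1))


def format_pts(ticks: int) -> str:
    """Format 90 kHz ticks as HH:MM:SS:mmm."""
    ms = ticks * 1000 // PTS_CLOCK
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}:{ms:03d}"


def summarize(data: bytes):
    """Count packets per PID and track the (min, max) PTS seen on each PID."""
    counts = Counter()
    pts_range = {}
    for pid, pusi, payload in iter_ts_packets(data):
        counts[pid] += 1
        if pusi:
            pts = parse_pes_pts(payload)
            if pts is not None:
                lo, hi = pts_range.get(pid, (pts, pts))
                pts_range[pid] = (min(lo, pts), max(hi, pts))
    return counts, pts_range


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts, pts_range = summarize(f.read())
    for pid in sorted(counts):
        line = f"PID {pid} (0x{pid:X}): {counts[pid]} packets"
        if pid in pts_range:
            lo, hi = pts_range[pid]
            line += f", PTS {format_pts(lo)} .. {format_pts(hi)} ({format_pts(hi - lo)})"
        print(line)
```

If the PTS range on PID 68 reported by this sketch is also tiny, the timestamps written by the remux are suspect; if it spans roughly 100 seconds, the short `Length` is more likely a ccextractor-side reading.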
Hardcoded subtitles are subtitles that have been burnt (embedded) into the video image. They appear on screen throughout the video and cannot be turned on or off. The only way to extract hardcoded subtitles is to use optical character recognition (OCR).
Extract your subtitle file: subtitle files usually download as ZIP archives, so you need to extract the subtitle file itself. On Windows, double-click the ZIP folder, click Extract at the top of the window, click Extract all, and then click Extract at the bottom of the window that appears.
Some movies don't carry a separate caption track at all; instead, the subtitles are hardcoded into the video, meaning they are part of the picture itself and cannot be separated from it.
You can also try searching online for a standalone subtitle file for the movie.