A proof-of-concept for creating a digital representation of a CTA-608 (CEA/EIA-608) waveform, aka "analog line-21 captions" encoded in the video signal.
Warning
This is NOT reference code. This has only been tested with FFmpeg's readeia608, which is not a Line 21 reference decoder.
I could not find an open-source analog line-21 encoder out there. I'm not sure why I really needed one, but there wasn't one.
There's probably a good reason why an open-source analog line-21 encoder does not exist... A VBI would not typically be visible to a digital user, nor would the HBI; these concepts do not appear in digital video. But I felt that FFmpeg's readeia608 decoder was missing an analog line-21 encoder companion.
Ok, sometimes you will see the analog line-21 waveform when digitizing a source into a full-frame 486-line picture, but the VBI should not really appear in a digital 480-line picture. The team over at vhs-decode does care about the VBI and preserving the full frame, including VBI and HBI, but the average user does not need analog line-21 captions. If you have found this script, though, you probably know all that already.
This repo could be used as a basis for:
- Enhancing or correcting a vhs-decode capture.
- Retro-style video creation.
- Creating analog line-21 captions on the top-row of a SVCD.
This script just takes a simple hardcoded array of CTA-608 two-byte words (the same as used in SCC) and encodes those two-byte words into a series of sequential video frames, to simulate Line-21 analog closed captions. It is a badly written proof of concept.
- You can't set the timings.
- You can't pipe stuff in. It is a proof of concept.
- You can't send the script an SCC file. It is not an SCC converter, processor or encoder.
- This script does not deal with DTVCCs, MPEG-2 Picture User Data or H.264 SEI side data. It is for analog line-21s. Analog line-21s predate digital storage and transmission such as DTVCCs, which were subsequent mechanisms & technologies for carrying those same bytes through MPEG headers. If you have found this repository looking for the inclusion of CTA-608 in MPEG headers (aka 608-compatibility-bytes), see projects such as CC_MUX (DVD) and libcaption (DTVCC), referenced later in this file.
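As background on those two-byte words: each CTA-608 byte carries 7 data bits plus an odd-parity bit in bit 7. A minimal sketch of applying that parity (the helper name is my own, not from the script):

```python
def with_odd_parity(b: int) -> int:
    """Set bit 7 so that the 8-bit byte contains an odd number of 1 bits."""
    b &= 0x7F                       # keep the 7 data bits
    ones = bin(b).count("1")
    return b if ones % 2 else b | 0x80

# The RCL control byte 0x14 has two 1-bits (even), so it gains the
# parity bit and becomes 0x94 -- matching the familiar "94 20" in SCC.
print(hex(with_odd_parity(0x14)))
```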
The Line 21s in this script are probably in the wrong place when compared to real-world implementations, but the script can be tweaked to change the height of the pseudo-VBI. If you want a pseudo-VBI of six lines, adjust as you see fit.
It gets a little tricky between digital vertical line numbering and analog-scanline naming conventions. I make no apologies. The common locations of various similar payloads in the VBI were...
field1 | field2 | data |
---|---|---|
14 | 277 | VITC |
16 | 279 | VITC Backup |
21 | 284 | EIA-608 Line 21 |
22 | 285 | Wide-screen Signaling |
This naive script willingly ignores inconveniences like interlaced fields, setup/pedestal, IRE levels, sine waves, color range, colorspace and plenty more. It is "ones and zeros, baby". The pixels are either 0 or 255, but the script can be modified to produce 1-254 (SDI legal), or 16-235 (limited-range luma), or any value of intensity to simulate setup/pedestal when combined with an HBI.
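For instance, remapping the full-swing 0/255 pixels to another range is a one-line linear map; this sketch (function name and defaults are my own illustration, not part of the script) shows the idea:

```python
def remap_level(value: int, lo: int = 16, hi: int = 235) -> int:
    """Linearly remap a full-range (0-255) pixel value into [lo, hi]."""
    return lo + round(value * (hi - lo) / 255)

# Full-range black/white become limited-range luma levels.
print(remap_level(0), remap_level(255))   # 16 235
```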
Analog horizontal scanlines do not operate in a currency of pixels; they operate in time. Fortunately, Table 2, Line 21 Waveform Timing, of the EIA-608 spec (now free to access) lists timings relative to the duration of the Data Bit 'D' within the Start Bits.
Section | Timing |
---|---|
Clock Run-In (B) | 6.5 * D (but let's use 7 * D) |
Clock Run-In to Third Start Bit (C) | 2 * D |
Data Bit (D) | 1 * D |
Data Characters (E) | 16 * D |
That gets us to 26 * D. Since the Clock Run-In needs to be a sinusoidal wave of frequency D to produce 01010101010101, that needs 14 pixels. Thus D needs to be a minimum of 2 pixels.
In "The Closed Captioning Handbook", Gary Robson writes "The Clock Run-In signal is 7 full cycles of a 0.5034965 MHz sine wave centered around the 25 IRE level, lasting 12.91 µs." Robson was a member of the 608 and 708 working group, so we'll use 7 * D as the Clock Run-In.
Section | Pixels (D = 2 pixels) |
---|---|
Clock Run-In (B) | 7 * D = 14 pixels (could get away with 2 * 6.5 = 13) |
Clock Run-In to Third Start Bit (C) | 2 * D = 4 pixels |
Data Bit (D) | 1 * D = 2 pixels |
Data Characters (E) | 16 * D = 32 pixels |
You could probably get away with a total of just 51 pixels minimum if you use 1010101010101 as the Clock Run-In (B) of 6.5 * D = 6.5 * 2 pixels = 13 pixels. 13 + 4 + 2 + 32 = an absolute bare minimum of 51 pixels. But that's a nasty number, so we'll use a full 01010101010101 as the run-in, giving us a minimum of 52.
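Under those assumptions (D = 2 pixels, "ones and zeros" levels, LSB-first data bits with parity already applied), the 52-pixel layout can be sketched roughly as below. This is my own illustration of the arithmetic, not reference behaviour or the script's actual code:

```python
def line21_pixels(byte1: int, byte2: int) -> list:
    """Build a 52-pixel, 0/1 approximation of the Line 21 waveform.

    Assumes D = 2 pixels per data bit, and that odd parity has already
    been applied to byte1/byte2 (bit 7 is the parity bit).
    """
    D = 2
    # Clock Run-In (B): 7 full cycles, one pixel per half-cycle -> 14 pixels.
    pixels = [0, 1] * 7
    # Start bits 0, 0, 1: the first two cover span C (2 * D = 4 pixels),
    # the third is the data bit 'D' itself (1 * D = 2 pixels).
    for bit in (0, 0, 1):
        pixels += [bit] * D
    # Data Characters (E): two bytes, least-significant bit first,
    # 16 bits * D = 32 pixels.
    for byte in (byte1, byte2):
        for i in range(8):
            pixels += [(byte >> i) & 1] * D
    return pixels

print(len(line21_pixels(0x94, 0x2C)))   # 52
```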
Once the 52-pixel waveform is encoded as an image, the script scales it to a scanline 640 square pixels wide. It is a bit irritating that a 52-pixel payload is mathematically imperfect when placed on a 640-square-pixel scanline (but given that analog line-21 608s rely on sine waves, it is within the realms of error).
- There must be a more elegant way of pre & post padding to a harmonic of 640 (64?), so that it scales pixel-perfect to 640.
- 51 is not much better than 52. And decoders may not expect the scanline to start immediately on a 1.
It is expected that a user will encode the 640x1 scanline into non-square pixels.
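The stretch itself is just nearest-neighbour resampling. A pure-Python sketch of what a NEAREST resize from 52 to 640 samples does (a hand-rolled illustration, not the script's actual Pillow call):

```python
def stretch_nearest(src: list, width: int) -> list:
    """Nearest-neighbour stretch of a 1-D scanline to `width` samples."""
    return [src[i * len(src) // width] for i in range(width)]

line52 = [0, 1] * 7 + [0] * 38      # placeholder 52-pixel waveform
line640 = stretch_nearest(line52, 640)
print(len(line640))   # 640
```

Because 640 / 52 is not an integer, some source pixels are repeated 12 times and others 13, which is the imperfection mentioned above.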
I understand that Python PIL/pillow is RGB only. Of course, it would be preferable to operate in YUV (yuvio? imageio?). But "if all you got is a hammer, the whole world looks like a nail".
PIL it shall be, we'll let FFmpeg take the pain of color formats.
I'm not a developer. I don't speak Python. This is my first Python script. I don't know Python best practice nor any naming or coding conventions. If there is an error, reader, you'll be better qualified than I to fix it. The code will probably run faster with pypy3 rather than python3.
Yeah, there is a bit of duplication in the code. If anyone wants to produce a function to dedupe the commands for field 1/2, they are more than welcome.
There is no dependency checking, no error checking, no type checking, everything is a string (rather than Python3 integers in 'bytes'). The script is just a series of commands.
The four magic documents that explain how to construct 608 pairs are...
- Title-47, from The Man.
- CTA Line 21 Data Services (ANSI/CTA-608-E S-2019). Now freely available from CTA. Contains useful implementation details, including extended character sets which are not covered in Title-47.
- The Closed Captioning Handbook, ROBSON, Gary D (Elsevier). Note: this hard-to-find book is an excellent reference. Robson was involved in the development of both 608 and 708. This book is a great companion to Title-47 and CTA-608-E.
- McPoodle's CC codes and CC characters. Credit to McPoodle for reverse-engineering SCCs before "accessibility-standards became accessible to the public". It should be cross-referenced to Title-47 and CTA-608-E, and should not be considered reference material (since it is reverse engineered), but is a useful quick guide. It may contain implementation errors.
Other useful material...
- FFmpeg's readeia608 documentation and source code. Credit: Paul.
- sccyou. A bash script for converting line-21s to SCC. Credit: Dave Rice & Paul.
- The History of Closed Captions by Chris Lott.
- Digital Video and HD, Algorithms & Interfaces. POYNTON, Charles. (Morgan Kaufmann)
- libzvbi
- A blog on Decoding Closed Captioning
- Closed Captioning: More Ingenious than You Know
... and SCTE, SMPTE, ATSC, the good people at WGBH, the National Captioning Institute TeleCaption I, II, 3000, 4000 & VR-100 devices. The crew at ld/vhs-decode. And, of course, Team FFmpeg.
Although this proof-of-concept does not aim to deal with the digital representation of CTA-608 within MPEG2 Picture User Data nor H.264 SEI side-data, the following are useful resources on DTVCCs.
- ATSC 1.0 A/53
- CTA Digital Television Closed Captioning (ANSI/CTA-708-E S-2023), aka DTVCCs
- libcaption for inserting DTVCCs and examples for muxing 608-compatibility-bytes in H.264 SEI side-data. Credit: Matt Szatmary, formerly at Twitch, now over at mux.com. At the time of writing, there are no known, actively maintained, open-source projects that easily allow for DTVCCs in MPEG-2 Picture User Data, although it may be possible to modify the H.264 examples included in libcaption.
The SVCD Specification states that analog line-21 captions can be included on the top pixel row of an NTSC SVCD. An SVCD player would be expected to modulate this on line-21 of an analog output.
"V.3.3 Special Information in the MPEG video signal. If bit[3] of the Status Flags entry of the file INFO.SD is set to one, then the top pixel row of the MPEG picture can contain special information. In this case the top pixel row is intended to be displayed at line 21 of the video output signal for NTSC. This Special Information is used for Closed Caption in USA."
Until the DVD-Video Format Book is publicly released in early 2025, it is unclear whether the DVD-Video specification supports analog line-21 captions on a video scanline, similar to SVCD. Given that NTSC DVD-Video does not allow video signalling outside of the 720x480 NTSC frame, it seems that support for analog line-21s would be unlikely - unless a top-row workaround via a similar mechanism to SVCD is utilized.
- Where Closed Captioning is supported in DVD-Video, the data is stored as picture user data header and the line-21 output is regenerated/modulated on an analog output. See McPoodle's CC_MUX for a reverse-engineered interpretation of real-world implementations. In The Closed Captioning Handbook, Robson suggests that Closed Captioning support in DVD-Video was somewhat an afterthought. "At the last minute, support was thrown into the DVD specifications for embedded line 21 captioning... Unfortunately, not all commercial players support the line 21 captioning capability..."
- It is noted that although the mechanism for DVD-Video and ATSC 1.0 are similar, they differ in implementation. The DVD-Video specification pre-dates ATSC 1.0 by several years.
From https://github.com/zapping-vbi/zvbi. Output from this repository has not been tested against zvbi-ntsc-cc.
```
$ zvbi-ntsc-cc -h
CCDecoder 0.13 -- Closed Caption and XDS decoder
Copyright (C) 2003-2007 Mike Baker, Mark K. Kim, Michael H. Schimek
<[email protected]>; Based on code by [email protected].
This program is licensed under GPL 2 or later. NO WARRANTIES.

Usage: zvbi-ntsc-cc [options]
Options:
-? | -h | --help | --usage  Print this message and exit
-1 ... -4 | --cc1-file ... --cc4-file filename
                            Append caption channel CC1 ... CC4 to this file
-b | --no-webtv             Do not print WebTV links
-c | --cc                   Print Closed Caption (includes WebTV)
-d | --device filename      VBI device [/dev/vbi]
-f | --filter type[,type]*  Select XDS info: all, call, desc, length,
                            network, rating, time, timecode, timezone,
                            title. Multiple -f options accumulate. [all]
-k | --keyword string       Break caption line at this word (broken?).
                            Multiple -k options accumulate.
-l | --channel number       Select caption channel 1 ... 4 [no filter]
-p | --plain-ascii          Print plain ASCII, else insert VT.100 color,
                            italic and underline control codes
-r | --raw line-number      Dump raw VBI data
-s | --sentences            Decode caption by sentences
-v | --verbose              Increase verbosity
-w | --window               Open debugging window (with -r option)
-x | --xds                  Print XDS info
-C | --cc-file filename     Append all caption to this file [stdout]
-R | --semi-raw             Dump semi-raw VBI data (with -r option)
-X | --xds-file filename    Append XDS info to this file [stdout]
```
Installation of zvbi-ntsc-cc via homebrew, for macOS (and possibly linuxbrew?)...

```
$ brew info lescanauxdiscrets/tap/zvbi
==> lescanauxdiscrets/tap/zvbi: stable 0.2.35
http://zapping.sourceforge.net/
Installed
/opt/homebrew/Cellar/zvbi/0.2.35 (25 files, 1.4MB) *
  Built from source on 2024-12-18 at 09:46:31
From: https://github.com/lescanauxdiscrets/homebrew-tap/blob/HEAD/Formula/zvbi.rb
```
- Rec. ITU-R BT.1119-2, Wide-screen Signalling for Broadcasting (Signalling for wide-screen and other enhanced television parameters).
- Wide Screen Signaling could be simulated in the same way; however, encoding WSS in an NTSC 525 digital stream would need careful consideration, since in bit 7, NTSC WSS signals whether the frame is a reference frame. This would rely on either prior knowledge (such as FFmpeg's `force_key_frames`) or a predictable reference-frame cadence (such as a fixed GOP/sub-GOP) prior to generation of the waveform.
- vhs-decode's Wide Screen Signaling wiki