John Harding - Insert Catchy Title Here

Insert witty phrase/saying here.

Saturday, November 08, 2008

Subtitles on AppleTV and iPhone

I've been trying to keep an eye on the various discussions around getting AppleTV and the iPhone to display subtitles. Some of the content in the iTunes Store has subtitles, but Apple has not published any documentation on how to add your own. It's kind of a pain to track, as there are a lot of older articles that discuss burning subtitles into the video itself. Burning them in has the nice advantage of working directly off the bitmap subtitles ripped from DVDs, but it makes it awkward to share a single video file between AppleTV and iPhone. It also means you can't turn them off, and if you use the zoom feature on the iPhone, you've got to be extra careful where you position them.

There are a few mentions of the fact that the subtitle tracks are basically just standard 3GP timed text tracks, but with the 'hdlr' atom handler_type set to 'sbtl' instead of 'text'. MP4Box has support for 3GP timed text:
MP4Box -add subtitle_track.ttxt video.mp4
It uses its own file format (ttxt) for the timed text source, but can convert from SRT or SUB.
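For anyone who wants to try the handler change by hand, here is a minimal Python sketch of flipping the handler_type from 'text' to 'sbtl' in an existing file. It assumes the standard 'hdlr' layout (4 bytes of version/flags, 4 bytes pre_defined, then the 4-byte handler_type) and only looks inside moov/trak/mdia. The function names are made up for this example, and changing the handler alone may not be enough for every player.

```python
import struct

# Boxes we descend into to reach each track's media handler.
CONTAINERS = {b"moov", b"trak", b"mdia"}


def iter_boxes(data, start, end):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit size stored right after the type
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield box_type, pos + header, pos + size
        pos += size


def retag_text_tracks(data):
    """Return (patched_bytes, count) with every 'text' handler set to 'sbtl'."""
    buf = bytearray(data)
    count = 0
    pending = [(0, len(buf))]
    while pending:
        start, end = pending.pop()
        for box_type, payload, box_end in iter_boxes(buf, start, end):
            if box_type in CONTAINERS:
                pending.append((payload, box_end))
            elif box_type == b"hdlr":
                # Skip version/flags (4 bytes) and pre_defined (4 bytes).
                field = payload + 8
                if buf[field:field + 4] == b"text":
                    buf[field:field + 4] = b"sbtl"
                    count += 1
    return bytes(buf), count
```

To use it, read the whole file with open(path, "rb").read(), pass the bytes to retag_text_tracks, and write the result to a new file. The replacement is the same length as the original value, so no box sizes or chunk offsets need to change.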

What's bizarre is that Quicktime, AppleTV, and iTunes all behave slightly differently and are more or less sensitive to different aspects.

File Extension | text vs. sbtl | Set Language | Quicktime | iPhone | AppleTV
--- | --- | --- | --- | --- | ---
mp4 | text | No | Ignored | Ignored | Ignored
m4v | text | No | Displayed, regardless of subtitle/caption setting | Ignored | Ignored
mp4 | text | lang=en | Ignored | Ignored | Ignored
m4v | text | lang=en | Displayed, regardless of subtitle/caption setting. Different font than without language set. | Ignored | Ignored
mp4 | sbtl | No | Ignored | Displayed, with subtitle control. Language shows as "undetermined" | Ignored
m4v | sbtl | No | Displayed, with subtitle control. Text not sized appropriately. | Displayed, with subtitle control. Language shows as "undetermined" | Not displayed. Can bring up Subtitle menu, but language shows as a second "Off" item, with both checked. Selecting has no effect.
mp4 | sbtl | lang=en | Ignored | Displayed, with subtitle control. Language shows as "English" | Ignored
m4v | sbtl | lang=en | Displayed, with subtitle control. Text not sized appropriately. | Displayed, with subtitle control. Language shows as "English" | Displayed, with subtitle control. Language shows as "English"

Basically:
  • Quicktime only works with the .m4v file extension, does not require a language to be set, and handles 'text' differently than 'sbtl'.
  • AppleTV only works with the .m4v file extension, requires a language to be set, and only works with 'sbtl'.
  • iPhone works with both the .mp4 and .m4v file extensions, does not require a language to be set, and only works with 'sbtl'.
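The three rules above can be written down as a small lookup, which is handy for sanity-checking a file before copying it to a device. This is only a sketch of my test results, not anything documented by Apple, and the function name is made up for the example.

```python
def players_showing_subtitles(extension, handler, language=None):
    """Return the set of players that displayed subtitles in the tests above.

    extension: "mp4" or "m4v" (a leading dot is allowed)
    handler:   the hdlr handler_type, "text" or "sbtl"
    language:  track language such as "en", or None if not set
    """
    ext = extension.lower().lstrip(".")
    shown = set()
    # Quicktime: .m4v only, either handler type, language optional.
    if ext == "m4v" and handler in ("text", "sbtl"):
        shown.add("Quicktime")
    # iPhone: either extension, 'sbtl' only, language optional.
    if ext in ("mp4", "m4v") and handler == "sbtl":
        shown.add("iPhone")
    # AppleTV: .m4v only, 'sbtl' only, language required.
    if ext == "m4v" and handler == "sbtl" and language:
        shown.add("AppleTV")
    return shown
```

Calling players_showing_subtitles("m4v", "sbtl", "en") returns all three players, which is the only combination in the table that does.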

I think what's happening with Quicktime is that it's attempting to use positioning information, of which I haven't specified any. Will look at that next to see if it's possible to make something that plays properly in all 3 cases.

Note: The still frames for the text/m4v files show the caption on the iPhone, even though it doesn't display during playback. I assume this is because iTunes generates the still frame.

The linked post indicates that you need to change the alternate_group in the subtitle's tkhd atom from 0x0000 to 0x0002. My testing did not show any change in handling with or without this modification in Quicktime, AppleTV, iPhone (firmware 2.1), or iPod Touch (firmware 2.1).
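If you want to repeat that experiment, the following Python sketch sets alternate_group on every track whose handler is 'sbtl'. It assumes the standard 'tkhd' layout, where alternate_group sits 34 bytes into the payload for version 0 and 46 bytes in for version 1. The function names and the default group value of 2 are just for this example. As noted, the change made no visible difference in my tests.

```python
import struct


def child_boxes(data, start, end):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit size stored right after the type
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield box_type, pos + header, pos + size
        pos += size


def find_box(data, start, end, wanted):
    """Return (payload_start, box_end) of the first matching child, or None."""
    for box_type, payload, box_end in child_boxes(data, start, end):
        if box_type == wanted:
            return payload, box_end
    return None


def set_subtitle_alternate_group(data, group=2, handlers=(b"sbtl",)):
    """Return (patched_bytes, count) after updating matching tracks."""
    buf = bytearray(data)
    changed = 0
    moov = find_box(buf, 0, len(buf), b"moov")
    if moov is None:
        return bytes(buf), 0
    for box_type, payload, box_end in child_boxes(buf, *moov):
        if box_type != b"trak":
            continue
        tkhd = find_box(buf, payload, box_end, b"tkhd")
        mdia = find_box(buf, payload, box_end, b"mdia")
        hdlr = find_box(buf, mdia[0], mdia[1], b"hdlr") if mdia else None
        if tkhd is None or hdlr is None:
            continue
        # handler_type follows version/flags (4) and pre_defined (4).
        if bytes(buf[hdlr[0] + 8:hdlr[0] + 12]) not in handlers:
            continue
        version = buf[tkhd[0]]
        # Version 1 uses 64-bit times and duration, which shifts the field.
        offset = 46 if version == 1 else 34
        struct.pack_into(">h", buf, tkhd[0] + offset, group)
        changed += 1
    return bytes(buf), changed
```

As with any in-place patch, the file size does not change, so the rest of the file is left untouched.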

I had to force a software update on my AppleTV to get it to handle the files properly, even though I was already on Take 2. The symptom was that holding the "play" button would blank the screen. It worked fine after the update.

Update:
Things work slightly better if you convert the SRT file to ttxt first:
MP4Box -ttxt subtitles.srt
It appears that if you just add an SRT file to a video, MP4Box makes the text box for the subtitles the same size as the video. Apparently the iPhone and AppleTV ignore the placement, while Quicktime respects it. So what you need to do is:
  • Convert from SRT to ttxt as listed
  • Modify the <TextStreamHeader> element in the resulting ttxt file: width=[width of video] height=[20% of video height] translation_x=0 translation_y=[video height - height]
Quicktime appears to disregard the font size, and instead adjusts the font based on the height of the displayed text area - enough to fit 2 lines of text. This means the amount of text you can fit is basically a function of the width of the text display.
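The header edit from the steps above can be scripted rather than done by hand. Here is a rough Python sketch that works out the four values and writes them into the <TextStreamHeader> element. The function names are made up for the example, and it assumes the ttxt file is plain XML with a single TextStreamHeader element carrying these attributes.

```python
import xml.etree.ElementTree as ET


def subtitle_box(video_width, video_height, percent=20):
    """Compute a text box spanning the bottom `percent` of the video frame."""
    height = video_height * percent // 100
    return {
        "width": video_width,
        "height": height,
        "translation_x": 0,
        "translation_y": video_height - height,
    }


def patch_ttxt_header(ttxt_xml, video_width, video_height, percent=20):
    """Return the ttxt document with TextStreamHeader resized and repositioned."""
    root = ET.fromstring(ttxt_xml)
    # Match on the local name in case the element carries a namespace.
    header = next(
        (el for el in root.iter() if el.tag.split("}")[-1] == "TextStreamHeader"),
        None,
    )
    if header is None:
        raise ValueError("no TextStreamHeader element found")
    for name, value in subtitle_box(video_width, video_height, percent).items():
        header.set(name, str(value))
    return ET.tostring(root, encoding="unicode")
```

For a 720x480 video, subtitle_box(720, 480) gives width 720, height 96, translation_x 0, and translation_y 384. Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so it is worth checking that MP4Box still accepts the rewritten file.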