Is there any way to host subtitles in a git repository for easier synchronization?

Jimmy_Wu · August 15, 2021, 8:35pm

I spent 3.5 hours today to update 79 subtitle files! There’s got to be a better way!!!

I think there’s some sort of S3-like data storage infrastructure support for Open edX, right? Is there any way to perhaps store subtitles in a checked-out repository on this S3 storage and then point videos at that, so that subtitles can be updated just by a git pull? I’m really open to any other sort of creative ways to make this easier, because this is going to break our backs in terms of support as we grow…

arbrandes · August 16, 2021, 1:28pm

Since you mention git, in the past I’ve worked around many Studio bottlenecks such as the one you describe by authoring entire courses directly in OLX, version-controlling the source in git, then simply reimporting the course when needed. This works nicely because reimporting a course will basically just overwrite everything.

There even used to be a graphical way to point a course to a git repository and reimport it straight from there in the “system dashboard”, but the dashboard itself is gone now (plus, it never really worked very well). But there is an API to do imports. At the time, we wrote olx-utils to do it remotely, as well as allowing some DRYing of the XML with a template engine.

Addendum: you could configure a git hook to publish a course whenever there is a git push, effectively turning publishing courses into a continuous deployment flow.
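
For illustration, here is a rough sketch of the decision logic such a hook could use, assuming a server-side `post-receive` hook written in Python. Git feeds that hook one `<old> <new> <ref>` line per updated ref on stdin. The function names, the `refs/heads/main` branch, and the `publish` callable are all hypothetical; the actual import step is deliberately left out and would need to be wired to whatever import mechanism your installation offers.

```python
def pushed_refs(hook_input):
    """Parse post-receive stdin: one "<old> <new> <ref>" line per updated ref."""
    refs = []
    for line in hook_input.splitlines():
        parts = line.split()
        if len(parts) == 3:
            refs.append(tuple(parts))
    return refs


def should_publish(hook_input, branch="refs/heads/main"):
    """True if the push updated (and did not delete) the publishing branch."""
    for _old, new, ref in pushed_refs(hook_input):
        # A deleted ref is reported with an all-zero new object name.
        if ref == branch and set(new) != {"0"}:
            return True
    return False


def run_hook(hook_input, publish):
    """Call publish() once if the watched branch was pushed.

    publish is a hypothetical callable that would build the course
    tarball and import it; it is not part of git or Open edX.
    """
    if should_publish(hook_input):
        publish()
        return True
    return False
```

In a real hook script you would call something like `run_hook(sys.stdin.read(), publish_course)`, where `publish_course` is your own function.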

Addendum 2: you can check if any of this would work for you by exporting a course into a tarball from Studio, extracting the OLX, making the changes you want (presumably to subtitles), tarballing everything back, then reimporting. If you get the desired results, then this would warrant further investigation.
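
If you want to script that round trip rather than do it by hand, something along these lines might work as a starting point. It is only a sketch: it assumes the export is a gzip-compressed tarball, and `repack_with_subtitle` and its parameters are made-up names. It covers the extract, modify, and re-tar steps only; the export and the reimport still happen in Studio.

```python
import tarfile
import tempfile
from pathlib import Path


def repack_with_subtitle(export_tar, new_srt, member_path, output_tar):
    """Extract an exported course tarball, overwrite one subtitle, re-tar it.

    member_path is the subtitle's path inside the archive,
    e.g. "course/static/<video-id>-en.srt".
    """
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        # Unpack the export (assumed to be a .tar.gz).
        with tarfile.open(export_tar, "r:gz") as tar:
            tar.extractall(work)

        target = work / member_path
        if not target.is_file():
            raise FileNotFoundError(f"{member_path} not found in {export_tar}")
        # Replace the subtitle with the edited version.
        target.write_bytes(Path(new_srt).read_bytes())

        # Re-create the archive with the same top-level layout.
        with tarfile.open(output_tar, "w:gz") as tar:
            for entry in sorted(work.iterdir()):
                tar.add(entry, arcname=entry.name)
    return output_tar
```

The resulting file is what you would then upload through the Studio import page.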

Jimmy_Wu · October 15, 2021, 2:36pm

This is an interesting idea. I will look into it.

Jimmy_Wu · October 15, 2021, 7:14pm

At least to a first approximation, this doesn’t seem to work. I exported a class. Modified a subtitle file (course/static/0eabdd4c-5be9-4876-88e6-75d4384696dc-en.srt) so that it had a clear change in the first line of the subtitles. Tar.gz’ed it back up. Imported the class. Went and looked at the subtitles for the video I modified, but they did not contain the modification.

I didn’t really understand what you meant by DRYing the XML, so do I need to poke some other timestamps or files or something to maybe make the import process think the thing I’m uploading is new (because maybe it looked entirely the same from an XML perspective so it decided there were no changes needed?)

Rohan · April 7, 2022, 2:36pm

@arbrandes now that I learned that YouTube transcript import is broken for the foreseeable future, I went looking for other ways to deal with transcript importing. I observed the same thing as @Jimmy_Wu, but then I did another test:

First I created a completely different class, edited the file in place, and then uploaded the tar.gz to the 2nd class. But it showed no change. Suspicious that this might have to do with both classes using the same video ID, I made a change in the 1st class’s transcript and refreshed the second class, and indeed it changed too. This showed that video IDs are treated as unique but also shared between classes, and thus need to be changed between classes.

I went and did a find and replace on the unique ID used in the .srt file (0eabdd4c-5be9-4876-88e6-75d4384696dc in Jimmy’s example above, which would also correspond to the video ID). I found that if I changed the file name from 0eabdd4c-5be9-4876-88e6-75d4384696dc-en.srt to e.g. foobar-en.srt, and then changed 0eabdd4c-5be9-4876-88e6-75d4384696dc everywhere to foobar (it seems to show up only in a file like course/vertical/b63cdf2a9bde418db24e426823344516.xml), then the first time I imported it over a test course, it seemed to correctly get its own transcript file. However, if I then made another edit to that foobar-en.srt file and imported the updated tar.gz over the existing 2nd class content, the change did not appear.
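
For anyone who wants to repeat that experiment, the rename plus find-and-replace can be sketched roughly like this. It assumes the extracted export has a `static/` directory holding the `.srt` files and that the ID only needs replacing in `.xml` files; `rename_video_id` is just an illustrative name. As described above, this only made a difference on the first import in my tests.

```python
from pathlib import Path


def rename_video_id(course_dir, old_id, new_id):
    """Rename <old_id>-*.srt files and rewrite old_id in the course XML.

    Returns the list of paths that were changed.
    """
    course = Path(course_dir)
    changed = []

    # Rename transcript files such as static/<old_id>-en.srt.
    for srt in sorted((course / "static").glob(f"{old_id}-*.srt")):
        renamed = srt.with_name(srt.name.replace(old_id, new_id, 1))
        srt.rename(renamed)
        changed.append(renamed)

    # Replace the ID wherever it appears in the OLX (e.g. vertical/*.xml).
    for xml in sorted(course.rglob("*.xml")):
        text = xml.read_text(encoding="utf-8")
        if old_id in text:
            xml.write_text(text.replace(old_id, new_id), encoding="utf-8")
            changed.append(xml)
    return changed
```

For example, `rename_video_id("course", "0eabdd4c-5be9-4876-88e6-75d4384696dc", "foobar")` run against the extracted export, before tarring it back up.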

This suggests to me that the import process is doing some sort of “does this video ID already exist? If so, don’t update it” type check (which explains both Jimmy’s observations, and mine). Do you know if this is accurate?

(If so, it would suggest in order to store things in a repository I’d also need some script which would re-randomize all video IDs before import. That wouldn’t work for me, since I need a human-readable name for transcript files like foobar-en.srt so that people can know which one corresponds to the transcript they want to fix. Also, deleting and recreating the class on every transcript update wouldn’t be viable, because I’d lose all the forum posts & grades etc)
