How to obtain course content data

Hello,

I was wondering if anyone could advise us on the best way to obtain course content data for use in a machine learning project. Specifically, we would like to obtain the raw HTML files of the units (vertical blocks) that are displayed in the course content iframe.

We have attempted to use the Course Blocks REST API but are not finding the raw unit text there.

Thank you!

@dave @BrianMesick

Hi Courtney,

You should be able to get the URL pieces for all of the vertical blocks from the Course Blocks API. You can then visit those links as an authenticated user to get the rendered HTML. The final links should look like this:

http://your.site/xblock/block-v1:OpenedX+DemoX+DemoCourse+type@vertical+block@ef7db790b1964645aee3b6ea9df76be7

I can’t recall whether the API gives you the full link or just the block ID. If it’s the latter, you may need to prepend everything before block-v1 (the site root plus /xblock/), but it should be doable.
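
As a rough sketch of that prepending step, something like the following might work. The function names are hypothetical, and the shape of the blocks mapping (block ID mapped to a dict with a "type" key) is an assumption about what you get back from the Course Blocks API, so check it against your actual response. Fetching each URL would still need an authenticated session, which is not shown here.

```python
def build_xblock_url(site_root, block_ref):
    """Return the render URL for one block (hypothetical helper)."""
    # If the API already returned a full link, use it unchanged.
    if block_ref.startswith(("http://", "https://")):
        return block_ref
    # Otherwise prepend the site root and the /xblock/ path seen in the example link.
    return f"{site_root.rstrip('/')}/xblock/{block_ref}"


def vertical_block_urls(site_root, blocks):
    """Build URLs for vertical blocks only.

    `blocks` is assumed to map block IDs to dicts that carry a "type" key.
    """
    return [
        build_xblock_url(site_root, block_id)
        for block_id, info in blocks.items()
        if info.get("type") == "vertical"
    ]
```

For example, calling build_xblock_url("http://your.site", "block-v1:OpenedX+DemoX+DemoCourse+type@vertical+block@ef7db790b1964645aee3b6ea9df76be7") should give the link shown above.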