Library import: PDF to library.xml

OlenaSaf · January 30, 2025, 1:16am

Hi, I have PDF files (400+ pages) that I need to import into the library. Is there a simple or automated way to convert them to library.xml in the proper format?

braden · February 3, 2025, 11:44pm

There are many ways to do that, but I wouldn’t call it “simple”.

First, you need to convert each of the PDFs to text. If they are mostly plain text, you could write a Python script that uses a PDF library to extract the text and then convert it to OLX. If the documents have complex structure and/or images, you will likely want to use Python (or another scripting or batch-processing tool) to convert the PDFs to images, and then write a script that uses a multi-modal LLM to convert the images to OLX. To get good results, you'll have to convert the first few manually and include them in your prompt as examples ("few-shot prompting").
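For the plain-text route, here is a minimal sketch of the idea, assuming the third-party `pypdf` package for extraction and assuming one HTML component per PDF page is an acceptable granularity. The function names and the `page_0001`-style url_names are hypothetical choices for illustration, not anything Open edX requires. It does not cover the image/LLM route.

```python
import html
import re


def extract_pages(pdf_path: str) -> list[str]:
    """Extract plain text per page (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the helpers below work without pypdf
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def page_to_html(text: str) -> str:
    """Turn one page of extracted text into simple, escaped HTML paragraphs."""
    # Treat blank lines as paragraph breaks; join wrapped lines within a paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(
        "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )


def pages_to_components(pages: list[str], prefix: str = "page") -> list[tuple[str, str, str]]:
    """Return (url_name, display_name, html_body) for every non-empty page.

    The naming scheme here is a hypothetical example, not an Open edX convention.
    """
    components = []
    for number, text in enumerate(pages, start=1):
        body = page_to_html(text)
        if body:  # skip pages with no extractable text (e.g. scanned images)
            components.append((f"{prefix}_{number:04d}", f"Page {number}", body))
    return components
```

Usage would be something like `pages_to_components(extract_pages("book.pdf"))`. Pages that come back empty are often scanned images, which is where the image + multi-modal LLM approach becomes necessary.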

Then, you need to import that OLX into your Open edX instance. For legacy libraries, you can create a .tar.gz file in the correct format, using an exported library as an example. Legacy libraries don't support static asset files like images, though. For the newest (Sumac) release of Open edX, the new libraries feature doesn't yet have import/export support, but it does have a REST API and a Python API that you can use to import each component and its associated asset files, such as images.
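For the legacy-library route, a sketch of packing HTML components into a .tar.gz might look like the following. The archive layout (`library/library.xml` listing `<html url_name="..."/>` children, plus `library/html/<url_name>.xml` and `.html` files) is an assumption modeled on how legacy library exports are commonly laid out; compare it against an export from your own instance before importing. The `org`, `library`, and `display_name` values are placeholders, and each component is assumed to be a `(url_name, display_name, html_body)` tuple.

```python
import io
import tarfile
from xml.sax.saxutils import quoteattr


def _add_file(tar: tarfile.TarFile, path: str, text: str) -> None:
    """Write an in-memory UTF-8 text file into the open archive."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(name=path)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_library_archive(out_path, org, library, display_name, components):
    """Pack (url_name, display_name, html_body) tuples into a legacy-library-style .tar.gz.

    The directory layout is an assumption; verify it against a real library export.
    """
    children = []
    with tarfile.open(out_path, "w:gz") as tar:
        for url_name, title, body in components:
            # library.xml references each component by url_name.
            children.append(f"  <html url_name={quoteattr(url_name)}/>")
            # Per-component OLX pointing at a separate .html file holding the content.
            _add_file(
                tar,
                f"library/html/{url_name}.xml",
                f"<html filename={quoteattr(url_name)} display_name={quoteattr(title)}/>\n",
            )
            _add_file(tar, f"library/html/{url_name}.html", body)
        root = (
            f'<library xblock-family="xblock.v1" org={quoteattr(org)} '
            f"library={quoteattr(library)} display_name={quoteattr(display_name)}>\n"
            + "\n".join(children)
            + "\n</library>\n"
        )
        _add_file(tar, "library/library.xml", root)
```

For example, `build_library_archive("pdf_import.tar.gz", "DemoX", "pdf_import", "PDF import", components)` with hypothetical values. As noted above, images would not carry over into a legacy library this way.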

Related topics
- Is there export and import of libraries (AI Library Builder)? (Architecture; how-to, tutor, sumac) · 3 replies · 90 views · September 29, 2025
- Bulk Problem Importer to Unit Page (Instructional Design) · 3 replies · 353 views · May 26, 2024
- Batch question entry to the library (Educators; how-to) · 5 replies · 723 views · April 6, 2021
- How to Import multiple Courses/Libraries? (Authoring) · 20 replies · 2923 views · May 7, 2022
- How to create the courses of openedx using data of excel sheet or any anther docs (Tutor Help; how-to, devstack, tutor, olive) · 2 replies · 623 views · March 20, 2023