MongoDB - Find Block Types in Modulestore

I’m trying to figure out how I can search the MongoDB openedx collections and traverse my way up the parents (vertical → sequential → chapter → course) from a given html block_type that I found in the modulestore.definitions collection.

For this example I’ve located the html block_type of _id: ObjectId('66ba230533475de206c3a260') using this query.

{
block_type: "html", 
"fields.data": RegExp(/The video below shows the simulation of a differential drive robot implementing a line-following controller in CoppeliaSim using Python./)
}

Searching the modulestore.structures collection for that blocks.definition value times out. Is this the right way to find the structure documents that contain this html block?

{
"blocks.block_type": "html",
 "blocks.definition": ObjectId('66ba230533475de206c3a260')
}

cc @dave @braden

Hi @Zachary_Trabookis!

If I understood correctly, we’ve done this for Panorama to extract the course structures in tabular form from MongoDB. We first extracted all the blocks, and then implemented a recursive function to find all the parents of each one. Take a look at the code, it might help you work it out.
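A minimal sketch of that "collect the blocks, then walk up the parents" idea, applied to one structure document already loaded as a Python dict. This is not Panorama's code. It assumes each entry in `blocks` carries `block_type`, `block_id`, `definition`, and a `fields.children` list of `[block_type, block_id]` pairs; check that against your own documents. The sample data and the names `build_parent_map`, `path_to_root`, and `find_ancestry` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

BlockKey = Tuple[str, str]  # (block_type, block_id)


def build_parent_map(structure: dict) -> Dict[BlockKey, BlockKey]:
    """Map each child block key to the key of the block that lists it as a child."""
    parents: Dict[BlockKey, BlockKey] = {}
    for block in structure["blocks"]:
        parent_key = (block["block_type"], block["block_id"])
        # Assumed layout: fields.children is a list of [block_type, block_id] pairs.
        for child_type, child_id in block.get("fields", {}).get("children", []):
            parents[(child_type, child_id)] = parent_key
    return parents


def path_to_root(key: BlockKey, parents: Dict[BlockKey, BlockKey]) -> List[BlockKey]:
    """Recursively follow parent links from `key` up to the root block."""
    parent: Optional[BlockKey] = parents.get(key)
    if parent is None:
        return [key]
    return [key] + path_to_root(parent, parents)


def find_ancestry(structure: dict, definition_id) -> List[BlockKey]:
    """Return the chain html -> ... -> course for the block using `definition_id`."""
    parents = build_parent_map(structure)
    for block in structure["blocks"]:
        if block.get("definition") == definition_id:
            return path_to_root((block["block_type"], block["block_id"]), parents)
    return []  # no block in this structure points at that definition


# Hypothetical toy structure; real documents use ObjectId values for `definition`.
sample_structure = {
    "blocks": [
        {"block_type": "course", "block_id": "course", "definition": "d0",
         "fields": {"children": [["chapter", "c1"]]}},
        {"block_type": "chapter", "block_id": "c1", "definition": "d1",
         "fields": {"children": [["sequential", "s1"]]}},
        {"block_type": "sequential", "block_id": "s1", "definition": "d2",
         "fields": {"children": [["vertical", "v1"]]}},
        {"block_type": "vertical", "block_id": "v1", "definition": "d3",
         "fields": {"children": [["html", "h1"]]}},
        {"block_type": "html", "block_id": "h1", "definition": "d4", "fields": {}},
    ]
}

ancestry = find_ancestry(sample_structure, "d4")
```

Building the parent map once per structure keeps the upward walk cheap, since the documents only store links from parent to child.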


@Zachary_Trabookis: You have the data relationship correct, so the query you’re talking about will work in that sense. But it’s going to be extremely expensive, since it’s going to full-scan every structure document, which means you’re looking at every historical version of every course in order to see which ones have a pointer to that definition. And it’s possible that many, many of them will. If you added that three months ago to a course, then every unpruned version of that course since that time will have a pointer to it. If you re-ran a course that had that, both the old and new runs have references to it. If it was in a v1 library and you used it in a course with a LibraryContentBlock, then both the library and the borrowing course will have references.

Do you mind stepping back a bit and explaining what your end goal is? There might be easier ways to go about it.

@dave
We’re trying to figure out which course this html content resides in. When we searched for it manually in the CMS course where the content should be, we noticed that it had disappeared. I wasn’t the one editing the course, so I was asked to check whether the course uploader had put the content in a different course by mistake or deleted it.

They were last editing the course at the beginning of August 2024, so maybe we could limit the search to a date range on edited_on. I suspect this still won’t help, though, because it has to check the edited_on values in every document, as you mentioned.

{
"blocks.edit_info.edited_on": {
    $gte: ISODate("2024-08-01T00:00:00.000Z"),
    $lt: ISODate("2024-08-11T00:00:00.000Z")
  }
}

I believe there are some ways you can optimize this, though I don’t remember the specific functions/syntax. I find myself re-reading through the MongoDB docs literally every time I want to query MongoDB for anything.

The ObjectIds that MongoDB creates have a timestamp encoded into them. Structure documents are never modified after they are written, except as part of the pruning process. So if you know the date range, you should be able to craft a range query on the structures’ _id values, which is indexed and won’t require inspecting the actual contents to narrow down the records.
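As a rough illustration of the ObjectId range idea, here is a standard-library-only sketch that turns a date into the smallest ObjectId hex string for that moment. It relies on the documented ObjectId layout, where the first 4 bytes are a big-endian Unix timestamp in seconds. The function names are hypothetical, and the dates are just the window mentioned above. If you have pymongo installed, `bson.ObjectId.from_datetime` does the same job.

```python
from datetime import datetime, timezone
from typing import Tuple


def objectid_floor(moment: datetime) -> str:
    """Smallest possible ObjectId (as 24 hex chars) generated at `moment`.

    `moment` must be timezone-aware; a naive datetime would be read as local time.
    """
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    seconds = int(moment.timestamp())
    # 4-byte timestamp followed by 8 zero bytes for the remaining fields.
    return f"{seconds:08x}" + "0" * 16


def objectid_time(oid_hex: str) -> datetime:
    """Decode the creation time embedded in an ObjectId hex string."""
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)


def id_range_bounds(start: datetime, end: datetime) -> Tuple[str, str]:
    """Hex bounds for an `_id` range query covering start <= created < end."""
    return objectid_floor(start), objectid_floor(end)


lower, upper = id_range_bounds(
    datetime(2024, 8, 1, tzinfo=timezone.utc),
    datetime(2024, 8, 11, tzinfo=timezone.utc),
)
```

Each bound then needs wrapping in ObjectId(...) before it goes into the filter, along the lines of `_id: { $gte: ObjectId(lower), $lt: ObjectId(upper) }`, possibly combined with the blocks.definition condition so that only documents inside the window are inspected.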

Also, limiting your query to stop at the first hit might help, if you’re not doing that already.
