I have a question about the data packages delivered to edX partners as .mongo files, specifically the shared field _id. I already asked partner support, but they pointed me towards this community, so I hope you can help!
The edX documentation states the following:
The 12-byte MongoDB unique ID for this collection. Like all MongoDB IDs, the IDs are monotonically increasing and the first four bytes are a timestamp.
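To make the quoted description concrete, here is a minimal sketch of how the timestamp could be read back from such an ID. It assumes the ID is available as the usual 24-character hex string; the function name `objectid_timestamp` is just made up for illustration and is not part of any edX or MongoDB tooling.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time encoded in a MongoDB ObjectId hex string."""
    # A 12-byte ObjectId is written as 24 hex characters.
    if len(oid_hex) != 24:
        raise ValueError("expected a 24-character hex string")
    # The first four bytes (8 hex characters) are a big-endian count of
    # seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Because this timestamp reflects when the ID was generated, an ID that is only created on insert would carry the insert time, not the time of the original event.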
We are currently setting up our own MongoDB instance for these files and would like to avoid any duplicate events. We tested a few files and tried to upload duplicates into our database, but it seems the unique ID is generated when the event is stored; it isn’t already present in the files we receive.
We worry that we might get duplicate events if an export also contains events we have already stored in the database. In the past we have received data packages that contained more data than we asked for, or data we already had, hence the question.
So the question is: what can we do to prevent duplicate events in our database? We were thinking along the lines of composing our own unique ID with the user_id and timestamp, but are open to other/better suggestions!
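To illustrate what we have in mind, here is a rough sketch of a deterministic ID built from user_id and timestamp. It assumes every event is a dict that carries both fields under exactly those names; the helpers `make_event_id` and `deduplicate` are hypothetical names of our own, and hashing is just one possible way to combine the two values.

```python
import hashlib
import json


def make_event_id(event: dict) -> str:
    """Derive a deterministic ID from an event's user_id and timestamp."""
    # Assumption: "user_id" and "timestamp" exist on every event and together
    # identify it. Two distinct events from the same user at the same
    # timestamp would collide under this scheme.
    key = json.dumps(
        [event.get("user_id"), event.get("timestamp")],
        separators=(",", ":"),
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(events):
    """Keep the first event seen for each derived ID."""
    unique = {}
    for event in events:
        # setdefault leaves an existing entry untouched, so repeats are dropped.
        unique.setdefault(make_event_id(event), event)
    return unique
```

If the derived value were stored as _id, MongoDB's unique index on _id should reject a second insert of the same event, and an upsert keyed on that _id would make re-importing an overlapping export idempotent. If user_id plus timestamp turns out not to be unique enough, hashing the whole event document might be a safer variant.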
I hope our question is clear, but let me know if you need more information!