Hello! I hope you are all doing well in these strange times.
I’ve been trying to get Open edX Insights up and running, but it has been tough. About two months ago we managed to get all the Analytics tasks running in a development environment. Over the summer we tried to move it to production, and everything works there except Learner Analytics.
In Insights, it looks like the app cannot match student data with the course. The message reads literally “No student information is compatible with your course”.
In MySQL, the module_engagement table is filled with student data, so at least that part works.
Elasticsearch, on the other hand, contains no data.
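To confirm from outside Insights that the index really is empty, a small script could ask Elasticsearch for a document count. This is only a minimal sketch: the host http://localhost:9200/ and the alias roster are taken from the task parameters in the log further down, the helper names count_url and count_documents are made up for illustration, and it assumes the _count endpoint is reachable without authentication.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7, as used by the pipeline virtualenv


def count_url(host, alias):
    """Build the URL of the Elasticsearch _count endpoint for an index or alias."""
    return host.rstrip("/") + "/" + alias + "/_count"


def count_documents(host="http://localhost:9200/", alias="roster", timeout=5):
    """Return the number of documents behind the alias.

    Host and alias default to the values shown in the task parameters;
    adjust them if your deployment differs. A missing alias raises an HTTP error.
    """
    response = urlopen(count_url(host, alias), timeout=timeout)
    try:
        return json.loads(response.read().decode("utf-8"))["count"]
    finally:
        response.close()
```

Calling count_documents() and getting 0, or an HTTP 404 error for a missing alias, would agree with the empty result described above.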
When we run the Learner Analytics pipeline tasks, we hit this error, “No module named mechanize”:
2020-09-07 18:56:22,572 ERROR 127896 [luigi-interface] worker.py:213 - [pid 127896] Worker Worker(salt=014115435, workers=1, host=analytics-openedx.ti.uam.es, username=hadoop, pid=127896, sudo_user=root) failed ModuleEngagementRosterIndexTask(source=["hdfs://localhost:9000/data/"], expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, host=["http://localhost:9200/"], date=2020-09-02, obfuscate=False, scale_factor=1, alias=roster, number_of_shards=5)
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/edx/analytics/tasks/common/elasticsearch_load.py", line 414, in run
    super(ElasticsearchIndexTask, self).run()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 781, in run
    self.job_runner().run_job(self)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 622, in run_job
    run_and_track_hadoop_job(arglist, tracking_url_callback=job.set_tracking_url)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 390, in run_and_track_hadoop_job
    return track_process(arglist, tracking_url_callback, env)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 380, in track_process
    (tracking_url, e), out, err)
HadoopJobError: Streaming job failed with exit code 1. Additionally, an error occurred when fetching data from http://analytics-openedx.ti.uam.es:8088/proxy/application_1598962284049_0248/: No module named mechanize
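The wording “Additionally, an error occurred when fetching data from” suggests that the mechanize import failed while luigi was trying to fetch failure details from the tracking URL, so it may be hiding the streaming job's real error rather than causing it. That reading is an assumption on my part. A quick way to check whether the module is importable at all is a sketch like the following, run with the same interpreter the pipeline uses; the helper name module_available is made up for illustration.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True
```

Running print(module_available("mechanize")) with the virtualenv's Python (presumably /var/lib/analytics-tasks/analyticstack/venv/bin/python, judging by the traceback paths) would show whether the package is missing there.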
Attached is edx_analytics.log.zip (273.4 KB) with the full log.
It looks like a configuration problem, but I’m not sure what is missing.
Thanks in advance for any help!