Unable to get Learner Analytics up and running

Hello! I hope you are all doing well in these strange times.

I’ve been trying to get Open edX Insights up and running, but it has been tough. About two months ago we managed to get all the Analytics tasks running in a development environment. During the summer we tried to move it to production, and in fact everything works there except Learner Analytics.

In Insights, it looks like the app cannot match student data with the course. The message reads, literally, “No student information is compatible with your course”.

In MySQL, the table module_engagement is filled with student data, so that’s something.

In Elasticsearch, however, there is no data.
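For anyone who wants to reproduce the "no data" check without a screenshot, here is a minimal sketch that asks Elasticsearch how many documents sit behind an index or alias through its `_count` endpoint. The host `http://localhost:9200/` and the alias `roster` are taken from the task parameters in the log further down; the helper names `count_url`, `parse_count` and `fetch_count` are made up for this example, and the script assumes Python 3 on a machine that can reach the Elasticsearch host.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def count_url(host, alias):
    """Build the URL of the Elasticsearch _count endpoint for an index or alias."""
    return urljoin(host.rstrip("/") + "/", alias + "/_count")


def parse_count(body):
    """Extract the document count from the JSON body of a _count response."""
    return int(json.loads(body)["count"])


def fetch_count(host, alias, timeout=10):
    """Query Elasticsearch and return the number of documents behind the alias."""
    with urlopen(count_url(host, alias), timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count("http://localhost:9200/", "roster")` should return 0 if the index exists but is empty. If the alias does not exist at all, Elasticsearch normally answers with an HTTP error, which `urlopen` raises as an exception.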

However, when running the Learner Analytics pipeline tasks, we run into this error, “No module named mechanize”:

2020-09-07 18:56:22,572 ERROR 127896 [luigi-interface] worker.py:213 - [pid 127896] Worker Worker(salt=014115435, workers=1, host=analytics-openedx.ti.uam.es, username=hadoop, pid=127896, sudo_user=root) failed    ModuleEngagementRosterIndexTask(source=["hdfs://localhost:9000/data/"], expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, host=["http://localhost:9200/"], date=2020-09-02, obfuscate=False, scale_factor=1, alias=roster, number_of_shards=5)
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/edx/analytics/tasks/common/elasticsearch_load.py", line 414, in run
    super(ElasticsearchIndexTask, self).run()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 781, in run
    self.job_runner().run_job(self)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 622, in run_job
    run_and_track_hadoop_job(arglist, tracking_url_callback=job.set_tracking_url)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 390, in run_and_track_hadoop_job
    return track_process(arglist, tracking_url_callback, env)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 380, in track_process
    (tracking_url, e), out, err)
HadoopJobError: Streaming job failed with exit code 1. Additionally, an error occurred when fetching data from http://analytics-openedx.ti.uam.es:8088/proxy/application_1598962284049_0248/: No module named mechanize
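
Going by the wording of the last line, the missing mechanize module came up while the failure details were being fetched from the tracking URL, in addition to the streaming job itself exiting with code 1. A quick way to see whether a given interpreter can import mechanize is sketched below. The helper names `is_importable` and `missing_modules` are hypothetical, and the check only means something if it is run with the same Python that launches the tasks (presumably the virtualenv under /var/lib/analytics-tasks/analyticstack/venv shown in the traceback, which is an assumption). It uses nothing beyond a plain import, so it should behave the same on Python 2.7 and Python 3.

```python
def is_importable(module_name):
    """Return True if this interpreter can import the named module."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


def missing_modules(module_names):
    """Return the subset of module names that cannot be imported."""
    return [name for name in module_names if not is_importable(name)]
```

For example, `missing_modules(["mechanize"])` returns `["mechanize"]` when the module is not installed for that interpreter, and an empty list when it is.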

Attached is edx_analytics.log.zip (273.4 KB) with the full log.

In any case, it seems to be a configuration issue, but I’m not sure what is missing.

Thanks for any help in advance!

Hi @Yago, I am facing the same error. Could you please share your solution? Thanks in advance.