AnswerDistributionToMySQLTask fails to run in analytics-pipeline

Hi all,

I'm working with analytics-pipeline and get the following error after running this task:

launch-task AnswerDistributionToMySQLTaskWorkflow --local-scheduler --remote-log-level DEBUG --include '"[\"*tracking.log*\"]"' --src '"[\"hdfs://localhost:9000/data\"]"' --dest '"[\"/tmp/answer_dist\"]"' --mapreduce-engine local --name test_task

The output:

ProblemCheckEvent(name=test_task, src=["[", "\"", "h", "d", "f", "s", ":", "/", "/", "l", "o", "c", "a", "l", "h", "o", "s", "t", ":", "9", "0", "0", "0", "/", "d", "a", "t", "a", "\"", "]"], dest="[\"/tmp/answer_dist\"]", include=["[", "\"", "*", "t", "r", "a", "c", "k", "i", "n", "g", ".", "l", "o", "g", "*", "\"", "]"], manifest=None)
Traceback (most recent call last):
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 781, in run
    self.job_runner().run_job(self)
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 683, in run_job
    for i in luigi.task.flatten(job.input_hadoop()):
  File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/mapreduce.py", line 134, in input_hadoop
    return convert_to_manifest_input_if_necessary(self.manifest_id, super(MapReduceJobTask, self).input_hadoop())
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 796, in input_hadoop
    return luigi.task.getpaths(self.requires_hadoop())
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 819, in getpaths
    return struct.output()
  File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/pathutil.py", line 104, in output
    return [task.output() for task in self.requires()]
  File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/pathutil.py", line 78, in generate_file_list
    yield ExternalURL(filepath)
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task_register.py", line 99, in __call__
    h[k] = instantiate()
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task_register.py", line 80, in instantiate
    return super(Register, cls).__call__(*args, **kwargs)
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 436, in __init__
    self.task_id = task_id_str(self.get_task_family(), self.to_str_params(only_significant=True))
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 480, in to_str_params
    params_str[param_name] = params[param_name].serialize(param_value)
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/parameter.py", line 255, in serialize
    return str(x)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 63: ordinal not in range(128)
INFO 8254 [luigi-interface] worker.py:501 - Informed scheduler that task ProblemCheckEvent______tmp_answer___________________None_f1b13602f3 has status FAILED
INFO 8254 [luigi-interface] worker.py:401 - Worker Worker(salt=398503439, workers=1, host=testing-virtual-machine, username=root, pid=8254, sudo_user=testing) was stopped. Shutting down Keep-Alive thread
INFO 8254 [luigi-interface] interface.py:208 -
===== Luigi Execution Summary =====
Scheduled 5 tasks of which:

  • 2 present dependencies were encountered:
    • 1 ExternalURL(url=/home/testing/edx-analytics-pipeline/mysql_creds.json)
    • 1 PathSetTask(…)
  • 1 failed:
    • 1 ProblemCheckEvent(…)
  • 2 were left pending, among these:
    • 2 had failed dependencies:
      • 1 AnswerDistributionPerCourse(…)
      • 1 AnswerDistributionToMySQLTaskWorkflow(…)

This progress looks :( because there were failed tasks
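In the ProblemCheckEvent line of the output, `src` and `include` are printed as lists of single characters rather than as lists of paths and patterns. The Python 3 sketch below shows one way that shape can arise from the quoting used in the command, assuming (this thread does not confirm it) that the parameter string is JSON-decoded once and the result is then iterated as a list: the doubly quoted argument decodes to a plain string, and iterating a string yields its characters.

```python
import json

# The --src value as bash passes it to the program: the outer single quotes
# are stripped, but the backslashes and the inner double quotes are kept.
raw_src = '"[\\"hdfs://localhost:9000/data\\"]"'

# One json.loads() call turns this JSON *string* literal into a plain str.
decoded_once = json.loads(raw_src)  # '["hdfs://localhost:9000/data"]'

# Treating that str as a list iterates over it character by character,
# which matches the src=["[", "\"", "h", ...] shape in the log.
as_list = list(decoded_once)

# A second decode recovers the one-element list the command presumably meant.
decoded_twice = json.loads(decoded_once)

print(json.dumps(as_list[:4]))  # ["[", "\"", "h", "d"]
print(decoded_twice)            # ['hdfs://localhost:9000/data']
```

Whether the pipeline parses these parameters exactly this way is an assumption; the sketch only shows that the logged lists are consistent with a single decode of the quoted arguments.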

The error is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 63: ordinal not in range(128)
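The message means that a unicode value containing ő (U+0151) reached `str()`. The `u'\u0151'` form points to Python 2, where `str()` on a unicode object encodes it with the default ASCII codec, so any character above code point 127 fails; "position 63" is the index of that character in the value being converted. The sketch below is written for Python 3, where the same failure has to be triggered explicitly with `.encode("ascii")`. The helper names and the sample value are hypothetical, chosen only to mirror the message.

```python
def find_non_ascii(value):
    """Return (index, character) pairs for every non-ASCII character in value."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]


def ascii_error_for(value):
    """Return the UnicodeEncodeError that ASCII-encoding value raises, or None.

    Under Python 2, str(value) on a unicode object performs this same
    ASCII encoding implicitly.
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None


# Hypothetical value: 63 ASCII characters followed by "ő" (U+0151).
sample = "x" * 63 + "\u0151.log"

err = ascii_error_for(sample)
print(err)                     # same wording as the traceback, in Python 3 form
print(find_non_ascii(sample))  # [(63, 'ő')]
```

Adding a line such as `print(repr(x))` just before `return str(x)` in `luigi/parameter.py` would show the full value whose character at position 63 is the problem.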

I have read "Analytics pipeline: Failed to run task AnswerDistributionWorkflow", but I still cannot fix it.

Please help me resolve this problem. Thanks!

Hi @Henry,

Take a look at this part of the error message:
  File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/parameter.py", line 255, in serialize
    return str(x)

You need to add some code there that prints the value being serialized, so you can see which one fails. In my experience, file names on your local machine should contain only ASCII characters. In my case, I renamed the file and it worked as expected.
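Following the suggestion to rename files, a small script can list every file or directory under a given directory whose name contains a non-ASCII character. This is a minimal Python 3 sketch; `non_ascii_paths` is a hypothetical helper, and which directory to scan (for example, the local copy of the tracking logs) is up to you. It walks the local filesystem only, so files already stored in HDFS would need to be listed with HDFS tools instead.

```python
import os


def non_ascii_paths(root):
    """Yield paths under root whose file or directory name has a non-ASCII character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Any code point above 127 would trip Python 2's implicit ASCII encoding.
            if any(ord(ch) > 127 for ch in name):
                yield os.path.join(dirpath, name)
```

For example, `for path in non_ascii_paths("/path/to/logs"): print(path)` prints each offending name (the directory here is a placeholder), which you can then rename.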

Hope you can fix it soon. Good luck!