Failure running task AnswerDistributionToMySQLTaskWorkflow in analytics-pipeline

Hi all,

I'm working with analytics-pipeline and I get the following error after running this task:

launch-task AnswerDistributionToMySQLTaskWorkflow \
    --local-scheduler \
    --remote-log-level DEBUG \
    --include '"[\"*tracking.log*\"]"' \
    --src '"[\"hdfs://localhost:9000/data\"]"' \
    --dest '"[\"/tmp/answer_dist\"]"' \
    --mapreduce-engine local \
    --name test_task

the result:

ProblemCheckEvent(name=test_task, src=["[", "\"", "h", "d", "f", "s", ":", "/", "/", "l", "o", "c", "a", "l", "h", "o", "s", "t", ":", "9", "0", "0", "0", "/", "d", "a", "t", "a", "\"", "]"], dest="[\"/tmp/answer_dist\"]", include=["[", "\"", "*", "t", "r", "a", "c", "k", "i", "n", "g", ".", "l", "o", "g", "*", "\"", "]"], manifest=None)
Traceback (most recent call last):
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 194, in run
new_deps = self._run_get_new_deps()
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
task_gen = self.task.run()
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 781, in run
self.job_runner().run_job(self)
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 683, in run_job
for i in luigi.task.flatten(job.input_hadoop()):
File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/mapreduce.py", line 134, in input_hadoop
return convert_to_manifest_input_if_necessary(self.manifest_id, super(MapReduceJobTask, self).input_hadoop())
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 796, in input_hadoop
return luigi.task.getpaths(self.requires_hadoop())
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 819, in getpaths
return struct.output()
File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/pathutil.py", line 104, in output
return [task.output() for task in self.requires()]
File "/home/testing/edx-analytics-pipeline/edx/analytics/tasks/common/pathutil.py", line 78, in generate_file_list
yield ExternalURL(filepath)
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task_register.py", line 99, in __call__
h[k] = instantiate()
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task_register.py", line 80, in instantiate
return super(Register, cls).__call__(*args, **kwargs)
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 436, in __init__
self.task_id = task_id_str(self.get_task_family(), self.to_str_params(only_significant=True))
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/task.py", line 480, in to_str_params
params_str[param_name] = params[param_name].serialize(param_value)
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/parameter.py", line 255, in serialize
return str(x)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 63: ordinal not in range(128)
INFO 8254 [luigi-interface] worker.py:501 - Informed scheduler that task ProblemCheckEvent______tmp_answer___________________None_f1b13602f3 has status FAILED
INFO 8254 [luigi-interface] worker.py:401 - Worker Worker(salt=398503439, workers=1, host=testing-virtual-machine, username=root, pid=8254, sudo_user=testing) was stopped. Shutting down Keep-Alive thread
INFO 8254 [luigi-interface] interface.py:208 -
===== Luigi Execution Summary =====
Scheduled 5 tasks of which:

  • 2 present dependencies were encountered:
    • 1 ExternalURL(url=/home/testing/edx-analytics-pipeline/mysql_creds.json)
    • 1 PathSetTask(…)
  • 1 failed:
    • 1 ProblemCheckEvent(…)
  • 2 were left pending, among these:
    • 2 had failed dependencies:
      • 1 AnswerDistributionPerCourse(…)
      • 1 AnswerDistributionToMySQLTaskWorkflow(…)

This progress looks :( because there were failed tasks

The error is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 63: ordinal not in range(128)

I referenced the article "Analytics pipeline: Failed to run task AnswerDistributionWorkflow", but I could not fix it.

Please help me to resolve this problem. Thanks!

Hi @Henry ,

Take a look at the error message at:
File "/home/testing/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/parameter.py", line 255, in serialize
return str(x)
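That str(x) is what raises the exception on Python 2 whenever x is a unicode value containing a non-ASCII character, for example:

>>> str(u'\u0151')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 0: ordinal not in range(128)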

You have to insert some code there to print what the value is. In my experience, this happens when a file name on your local machine contains non-ASCII characters; file names should be ASCII only. In my case, I renamed the file and it worked as expected.
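A minimal sketch of that temporary debug print, assuming the Parameter.serialize body shown in the traceback (Python 2); remove it once you have found the offending value:

def serialize(self, x):
    try:
        return str(x)
    except UnicodeEncodeError:
        # Temporary debug aid: show which parameter value (e.g. which file path)
        # contains the non-ASCII character, then re-raise the original error.
        print(repr(x))
        raise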

Hope you can fix it soon. Good luck

Thanks @Nguyen_Truong_Thin for your reply.
Following your guide, I fixed the above error, but now I get another error:

Traceback (most recent call last):
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 194, in run
new_deps = self._run_get_new_deps()
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
task_gen = self.task.run()
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 781, in run
self.job_runner().run_job(self)
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 525, in run_job
subprocess.call(run_cmd)
File "/usr/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

I also referred to "Analytics pipeline: Failed to run task AnswerDistributionWorkflow - #2 by Nguyen_Truong_Thin", where @Nguyen_Truong_Thin mentions that the error may be related to the tracking.log file.

My tracking.log file is located at:

Found 1 items
-rw-r--r--   1 hadoop supergroup     119801 2021-09-15 19:53 /data/tracking.log
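(That listing comes from an HDFS directory listing along the lines of the following; the exact invocation may differ on your setup:)

hdfs dfs -ls /data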

My override.cfg:

[event-logs]
pattern = [".*tracking.log.*"]
source = hdfs://localhost:9000/data/
expand_interval = 30 days

My command to execute the task:

launch-task AnswerDistributionToMySQLTaskWorkflow \
    --local-scheduler \
    --remote-log-level DEBUG \
    --include '".*tracking.log.*"' \
    --src '"hdfs://localhost:9000/data/"' \
    --dest '"hdfs://localhost:9000/tmp/answer_dist"' \
    --n-reduce-tasks 1 \
    --name test_task

I'm still stuck on the above error. Thanks!

Hi @Henry ,

Take a look at the error message: No such file or directory
That means either the configuration is incorrect or the file /data/tracking.log does not exist in HDFS.
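To double-check the file side, a couple of quick checks could look like this (a rough sketch; adjust the URI to your cluster):

# List the directory through the same URI the pipeline uses
hdfs dfs -ls hdfs://localhost:9000/data/
# Confirm the file is actually readable (print only the first line)
hdfs dfs -cat hdfs://localhost:9000/data/tracking.log | head -n 1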

You have shown me that the file already exists, so the issue must be the configuration. But which configuration?

There are 2 places that need to be checked:

  1. event-logs in override.cfg
    My configuration was:
[event-logs]
pattern = [".*?.log.*"]
source = ["hdfs://localhost:9000/data/logs/tracking/"]
expand_interval = 2 days
  2. The command for the AnswerDistributionToMySQLTaskWorkflow task; at my end it was:
launch-task AnswerDistributionToMySQLTaskWorkflow \
    --local-scheduler \
    --remote-log-level DEBUG \
    --include '"[\".*?.log-.*\"]"' \
    --src '"[\"hdfs://localhost:9000/data/\"]"' \
    --dest /tmp/answer_dist \
    --n-reduce-tasks 1 \
    --name test_task

Working with Insights requires patience.
Hope you can fix it soon!