Thanks @Nguyen_Truong_Thin for your reply.
Following your guide, I fixed the above error, but now I get another one:
Traceback (most recent call last):
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 194, in run
new_deps = self._run_get_new_deps()
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
task_gen = self.task.run()
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 781, in run
self.job_runner().run_job(self)
File "/edx/app/edx-analytics-pipeline/venvs/edx-analytics-pipeline/src/luigi/luigi/contrib/hadoop.py", line 525, in run_job
subprocess.call(run_cmd)
File "/usr/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
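For context, this `OSError` is raised by `subprocess` itself when the executable it is asked to run does not exist, before any pipeline logic runs, which suggests the `hadoop` binary is not on PATH for the user running the task. A minimal sketch of the same failure mode (the binary name below is made up):

```python
import subprocess

# subprocess raises OSError with errno 2 (ENOENT, "No such file or
# directory") when the executable itself cannot be found -- the same
# failure as in the traceback above.
try:
    subprocess.call(["no-such-binary-xyz"])
except OSError as exc:
    print(exc.errno)  # 2
```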
I also referred to Analytics pipeline: Failed to run task AnswerDistributionWorkflow - #2 by Nguyen_Truong_Thin, where @Nguyen_Truong_Thin mentions that the error may be related to the tracking.log file.
My tracking.log file is located at:
Found 1 items
-rw-r--r-- 1 hadoop supergroup 119801 2021-09-15 19:53 /data/tracking.log
My override.cfg:
[event-logs]
pattern = [".*tracking.log.*"]
source = hdfs://localhost:9000/data/
expand_interval = 30 days
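As a sanity check on the config (a sketch, not pipeline code): the `pattern` value is an ordinary regex, and it does match the HDFS path listed above:

```python
import re

# pattern from override.cfg, checked against the tracking log's HDFS path
pattern = re.compile(r".*tracking.log.*")
print(bool(pattern.match("/data/tracking.log")))  # True
```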
My command to execute the task:
launch-task AnswerDistributionToMySQLTaskWorkflow \
--local-scheduler \
--remote-log-level DEBUG \
--include '".*tracking.log.*"' \
--src '"hdfs://localhost:9000/data/"' \
--dest '"hdfs://localhost:9000/tmp/answer_dist"' \
--n-reduce-tasks 1 \
--name test_task
I'm still stuck on the above error. Thanks!
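In case it helps with diagnosis: since the `OSError` comes from `subprocess` failing to exec the job command, one quick check (a sketch; whether `hadoop` is the missing binary is my assumption) is whether the hadoop client is on PATH for the user running `launch-task`:

```shell
# [Errno 2] from subprocess usually means the executed binary is missing.
# Check that the hadoop client is visible to the pipeline user:
if command -v hadoop >/dev/null 2>&1; then
    echo "hadoop found at: $(command -v hadoop)"
else
    echo "hadoop NOT found on PATH"
fi
```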