Error running AnswerDistributionWorkflow

Hi All,

I recently deployed the Ironwood LMS and managed to install Insights on a different server. I am now at the point of running the analytics tasks but got stuck on the first one. Can anyone help me understand this error, please?

DEBUG:edx.analytics.tasks.launchers.local:Loading override configuration 'override.cfg'...
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 40

I cannot figure out which JSON file it is talking about. The output.json file is shown below:

{
    "username": "pipeline001",
    "host": "localhost",
    "password": "passwordxxxx",
    "port": 3306
}
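
The file itself parses cleanly with Python's json module, so I don't think the file content is malformed (a minimal check; the path matches the --credentials argument used in the command below):

    import json

    # Quick sanity check: parse the credentials file directly.
    # A malformed file would raise ValueError here.
    with open('/edx/etc/edx-analytics-pipeline/output.json') as creds_file:
        creds = json.load(creds_file)
    print(sorted(creds.keys()))  # expect: host, password, port, username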

Regards,
Neville

Looks like it’s trying to read your override.cfg

If I don't specify override.cfg, as below, I still get an error:
export UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
remote-task AnswerDistributionWorkflow --host localhost --user ubuntu --remote-name analyticstack --wait \
    --local-scheduler --verbose \
    --src [hdfs://localhost:9000/data] \
    --dest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/dest \
    --name $UNIQUE_NAME \
    --output-root hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/course \
    --include ["tracking.log.gz*"] \
    --manifest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/manifest.txt \
    --base-input-format "org.edx.hadoop.input.ManifestTextInputFormat" \
    --lib-jar [hdfs://localhost:9000/edx-analytics-pipeline/packages/edx-analytics-hadoop-util.jar] \
    --n-reduce-tasks 1 \
    --marker hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/marker \
    --credentials /edx/etc/edx-analytics-pipeline/output.json

The exit status is different this time:
DEBUG:edx.analytics.tasks.launchers.local:Configuration file 'override.cfg' does not exist!
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 4

Any comments about this error would be a great help. Thanks!

@nevilleonline I don't think it's a JSON file that's causing the error. I think it's a JSON-formatted task parameter, and I can see two in your command which aren't valid JSON: --src and --lib-jar.

You need to put quotes around those strings:

--src ["hdfs://localhost:9000/data"]
--lib-jar ["hdfs://localhost:9000/edx-analytics-pipeline/packages/edx-analytics-hadoop-util.jar"]

See the AnswerDistributionWorkflow task documentation.
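
You can reproduce what luigi does with those values in a few lines. Per the parameter.py frame in your traceback, list parameters are parsed with list(json.loads(x)), so the raw command-line value has to be valid JSON (a minimal sketch using your --src value):

    import json

    # luigi parses list-valued parameters as list(json.loads(x)) (see the
    # parameter.py frame in the traceback), so the value must be valid JSON.
    for raw in ('["hdfs://localhost:9000/data"]',  # quoted strings: valid JSON
                '[hdfs://localhost:9000/data]'):   # bare strings: not JSON
        try:
            print('%s -> %s' % (raw, list(json.loads(raw))))
        except ValueError as err:
            print('%s -> ValueError: %s' % (raw, err))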

Thanks, Jill, for your response. I tried it with the quotes and it gives the same error.

export UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
remote-task AnswerDistributionWorkflow --host localhost --user ubuntu --remote-name analyticstack --skip-setup --wait \
    --local-scheduler --verbose \
    --src ["hdfs://localhost:9000/data"] \
    --dest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/dest \
    --name $UNIQUE_NAME \
    --output-root hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/course \
    --include ["tracking.log.gz"] \
    --manifest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/manifest.txt \
    --base-input-format "org.edx.hadoop.input.ManifestTextInputFormat" \
    --lib-jar ["hdfs://localhost:9000/edx-analytics-pipeline/site-packages/edx-analytics-hadoop-util.jar"] \
    --n-reduce-tasks 1 \
    --marker hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/marker \
    --credentials /edx/etc/edx-analytics-pipeline/output.json

The error is below:

DEBUG:edx.analytics.tasks.launchers.local:Loading override configuration 'override.cfg'...
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 40

Is there anything else you can see that I am not seeing?

The error logs don't help, and I can't see from your command what's tripping it up.

This is really hacky, but you could try modifying the luigi code shown in the traceback above so that it prints the param_name when parsing fails, letting you see which parameter is causing the issue, e.g.:

    def _get_task_kwargs(self):
        """
        Get the local task arguments as a dictionary. The return value is in
        the form ``dict(my_param='my_value', ...)``
        """
        res = {}
        for (param_name, param_obj) in self._get_task_cls().get_params():
            attr = getattr(self.known_args, param_name)
            if attr:
                # Wrap the parse call so a failing parameter names itself
                try:
                    res.update(((param_name, param_obj.parse(attr)),))
                except ValueError as err:
                    # Print the parameter name and the raw value that failed to parse
                    print("Error parsing parameter %s, value=%s" % (param_name, attr))
                    raise err
        return res
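
One more thing worth checking, though I'm not sure exactly how remote-task forwards its arguments: in a POSIX shell, the double quotes in --src ["hdfs://..."] are consumed by the shell itself during word splitting, so luigi may still receive the bare, non-JSON value [hdfs://...]. Python's shlex module follows the same splitting rules, so you can see the effect locally:

    import shlex

    # The shell strips unprotected double quotes during word splitting,
    # so the program still receives a bare, non-JSON token:
    print(shlex.split('--src ["hdfs://localhost:9000/data"]'))
    # ['--src', '[hdfs://localhost:9000/data]']

    # Wrapping the whole value in single quotes preserves the inner quotes:
    print(shlex.split("--src '[\"hdfs://localhost:9000/data\"]'"))
    # ['--src', '["hdfs://localhost:9000/data"]']

If that's what's happening, quoting the whole value, as in --src '["hdfs://localhost:9000/data"]', should get valid JSON through to luigi. And if you do patch _get_task_kwargs on the remote host, keep --skip-setup on your remote-task call; I believe the setup step would otherwise re-install the code and overwrite your edit.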