Error running AnswerDistributionWorkflow

Hi All,

I recently deployed the Ironwood LMS and managed to install Insights on a different server. I am now at the point of running the analytics tasks but got stuck on the first one. Can anyone help me understand this error, please?

DEBUG:edx.analytics.tasks.launchers.local:Loading override configuration 'override.cfg'...
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 40

I cannot figure out which JSON file it is talking about. The output.json file is shown below:

{
    "username": "pipeline001",
    "host": "localhost",
    "password": "passwordxxxx",
    "port": 3306
}
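
The file itself parses cleanly with Python's json module, so I don't think the file content is malformed (a minimal check; the path matches the --credentials argument used in the command below):

    import json

    # Quick sanity check: parse the credentials file directly.
    # A malformed file would raise ValueError here.
    with open('/edx/etc/edx-analytics-pipeline/output.json') as creds_file:
        creds = json.load(creds_file)
    print(sorted(creds.keys()))  # expect: host, password, port, username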

Regards,
Neville

Looks like it’s trying to read your override.cfg

If I don't specify override.cfg, as below, I still get an error:
export UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
remote-task AnswerDistributionWorkflow --host localhost --user ubuntu --remote-name analyticstack --wait \
    --local-scheduler --verbose \
    --src [hdfs://localhost:9000/data] \
    --dest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/dest \
    --name $UNIQUE_NAME \
    --output-root hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/course \
    --include ["tracking.log.gz*"] \
    --manifest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/manifest.txt \
    --base-input-format "org.edx.hadoop.input.ManifestTextInputFormat" \
    --lib-jar [hdfs://localhost:9000/edx-analytics-pipeline/packages/edx-analytics-hadoop-util.jar] \
    --n-reduce-tasks 1 \
    --marker hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/marker \
    --credentials /edx/etc/edx-analytics-pipeline/output.json

The exit status is different this time:
DEBUG:edx.analytics.tasks.launchers.local:Configuration file 'override.cfg' does not exist!
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 4

Any comments about this error would be a great help. Thanks!

@nevilleonline I don't think it's a JSON file that's causing the error. I think it's a JSON-formatted task parameter, and I can see two in your command which aren't valid JSON: --src and --lib-jar.

You need to put quotes around those strings:

--src ["hdfs://localhost:9000/data"]
--lib-jar ["hdfs://localhost:9000/edx-analytics-pipeline/packages/edx-analytics-hadoop-util.jar"]

See the AnswerDistributionWorkflow task documentation.
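
You can reproduce what luigi does with those values in a few lines. Per the parameter.py frame in your traceback, list parameters are parsed with list(json.loads(x)), so the raw command-line value has to be valid JSON (a minimal sketch using your --src value):

    import json

    # luigi parses list-valued parameters as list(json.loads(x)) (see the
    # parameter.py frame in the traceback), so the value must be valid JSON.
    for raw in ('["hdfs://localhost:9000/data"]',  # quoted strings: valid JSON
                '[hdfs://localhost:9000/data]'):   # bare strings: not JSON
        try:
            print('%s -> %s' % (raw, list(json.loads(raw))))
        except ValueError as err:
            print('%s -> ValueError: %s' % (raw, err))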

Thanks, Jill, for your response. I tried it with the quotes and it gives the same error.

export UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
remote-task AnswerDistributionWorkflow --host localhost --user ubuntu --remote-name analyticstack --skip-setup --wait \
    --local-scheduler --verbose \
    --src ["hdfs://localhost:9000/data"] \
    --dest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/dest \
    --name $UNIQUE_NAME \
    --output-root hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/course \
    --include ["tracking.log.gz"] \
    --manifest hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/manifest.txt \
    --base-input-format "org.edx.hadoop.input.ManifestTextInputFormat" \
    --lib-jar ["hdfs://localhost:9000/edx-analytics-pipeline/site-packages/edx-analytics-hadoop-util.jar"] \
    --n-reduce-tasks 1 \
    --marker hdfs://localhost:9000/tmp/pipeline-task-scheduler/AnswerDistributionWorkflow/$UNIQUE_NAME/marker \
    --credentials /edx/etc/edx-analytics-pipeline/output.json

The error is below:

DEBUG:edx.analytics.tasks.launchers.local:Loading override configuration 'override.cfg'...
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/retcodes.py", line 74, in run_with_retcodes
    worker = luigi.interface._run(argv)['worker']
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/interface.py", line 248, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 116, in get_task_obj
    return self._get_task_cls()(**self._get_task_kwargs())
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/cmdline_parser.py", line 133, in _get_task_kwargs
    res.update(((param_name, param_obj.parse(attr)),))
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/parameter.py", line 940, in parse
    return list(json.loads(x, object_pairs_hook=_FrozenOrderedDict))
  File "/usr/lib/python2.7/json/__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Connection to localhost closed.
Exiting with status = 40

Is there anything else you can see that I am not seeing?

The error logs don't help, and I can't see from your command what's tripping it up.

This is really hacky, but you could try modifying the luigi code shown in the traceback above so that it prints the param_name when parsing fails, letting you see which parameter is causing the issue, e.g.:

    def _get_task_kwargs(self):
        """
        Get the local task arguments as a dictionary. The return value is in
        the form ``dict(my_param='my_value', ...)``
        """
        res = {}
        for (param_name, param_obj) in self._get_task_cls().get_params():
            attr = getattr(self.known_args, param_name)
            if attr:
                # Wrap the parse call so a failing parameter names itself
                try:
                    res.update(((param_name, param_obj.parse(attr)),))
                except ValueError as err:
                    # Print the parameter name and the raw value that failed to parse
                    print("Error parsing parameter %s, value=%s" % (param_name, attr))
                    raise err
        return res
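
One more thing worth checking, though I'm not sure exactly how remote-task forwards its arguments: in a POSIX shell, the double quotes in --src ["hdfs://..."] are consumed by the shell itself during word splitting, so luigi may still receive the bare, non-JSON value [hdfs://...]. Python's shlex module follows the same splitting rules, so you can see the effect locally:

    import shlex

    # The shell strips unprotected double quotes during word splitting,
    # so the program still receives a bare, non-JSON token:
    print(shlex.split('--src ["hdfs://localhost:9000/data"]'))
    # ['--src', '[hdfs://localhost:9000/data]']

    # Wrapping the whole value in single quotes preserves the inner quotes:
    print(shlex.split("--src '[\"hdfs://localhost:9000/data\"]'"))
    # ['--src', '["hdfs://localhost:9000/data"]']

If that's what's happening, quoting the whole value, as in --src '["hdfs://localhost:9000/data"]', should get valid JSON through to luigi. And if you do patch _get_task_kwargs on the remote host, keep --skip-setup on your remote-task call; I believe the setup step would otherwise re-install the code and overwrite your edit.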