Error when running remote task ModuleEngagementWorkflowTask for Insights

Hope you guys are doing well. I've gotten all the remote tasks to run except the last one, ModuleEngagementWorkflowTask. I installed Insights using the single-server installation method on Confluence.

Using the following command:

remote-task --host localhost --user ubuntu --remote-name analyticstack --skip-setup --wait ModuleEngagementWorkflowTask \
--date $(date +%Y-%m-%d -d "2021-04-28") \
--indexing-tasks 5 \
--throttle 0.5 \
--n-reduce-tasks 1

I’m getting the following error:

2021-04-28 04:28:30,005 ERROR 17422 [luigi-interface] rpc.py:134 - Failed connecting to remote scheduler 'http://localhost:8082'
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/rpc.py", line 129, in _fetch
    response = self._fetcher.fetch(full_url, body, self._connect_timeout)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/rpc.py", line 85, in fetch
    resp = self.session.get(full_url, data=body, timeout=timeout)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8082): Max retries exceeded with url: /api/ping (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c27ce7d90>: Failed to establish a new connection: [Errno 111] Connection refused',))
2021-04-28 04:28:30,006 INFO 17422 [luigi-interface] rpc.py:126 - Retrying attempt 2 of 3 (max)
2021-04-28 04:28:30,007 INFO 17422 [luigi-interface] rpc.py:116 - Wait for 30 seconds
2021-04-28 04:29:00,032 ERROR 17422 [luigi-interface] rpc.py:134 - Failed connecting to remote scheduler 'http://localhost:8082'
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/rpc.py", line 129, in _fetch
    response = self._fetcher.fetch(full_url, body, self._connect_timeout)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/rpc.py", line 85, in fetch
    resp = self.session.get(full_url, data=body, timeout=timeout)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8082): Max retries exceeded with url: /api/ping (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c27d0bdd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
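"Connection refused" (errno 111) on localhost:8082 means nothing was listening on that port, i.e. no Luigi central scheduler (luigid) was running when the task started. The minimal Python 3 sketch below, assuming a system python3 rather than the pipeline's Python 2.7 venv, checks whether a TCP port accepts connections before you launch the task. The function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111), timeouts and DNS failures.
        return False
```

For example, port_is_open("localhost", 8082) returning False would reproduce the situation in the log above.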

I fixed the port 8082 issue by running luigid --port 8082 in another shell on the instance, but now I'm getting the following:

2021-04-28 04:39:30,141 WARNING 18183 [edx.analytics.tasks.util.elasticsearch_target] elasticsearch_target.py:88 - ConnectionError((<urllib3.connection.HTTPConnection object at 0x7fa79c289910>, u'Connection to 172.17.0.1 timed out. (connect timeout=60)')) caused by: ConnectTimeoutError((<urllib3.connection.HTTPConnection object at 0x7fa79c289910>, u'Connection to 172.17.0.1 timed out. (connect timeout=60)'))

I checked with systemctl whether an Elasticsearch instance is running on the Insights server, and there isn't. Does anyone have tips on resolving this?
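The warning above is a connect timeout to 172.17.0.1, so no Elasticsearch node answered at the address the pipeline is configured to use. As a quick check that does not depend on systemctl, the Python 3 sketch below requests an Elasticsearch root URL and returns the decoded JSON (which normally includes the node's version) or None if the node is unreachable. The helper name elasticsearch_info is hypothetical, and the sketch assumes a system python3.

```python
import json
import urllib.request
from typing import Optional


def elasticsearch_info(url: str, timeout: float = 5.0) -> Optional[dict]:
    """GET an Elasticsearch root URL and return the decoded JSON body, or None.

    None means the node could not be reached (refused, timed out, HTTP error)
    or answered with something that is not JSON.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # urllib.error.URLError and socket timeouts are OSError subclasses;
        # json.JSONDecodeError is a ValueError subclass.
        return None
```

For example, elasticsearch_info("http://172.17.0.1:9200/") returning None would match the timeout above. Port 9200 is Elasticsearch's default HTTP port and is an assumption here, since the log does not show which port the pipeline tried.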

You're doing great to get this far, @c3ho! The analytics system isn't designed for single-instance deployments, so a lot is missing. OpenCraft has done some exploration into how to make analytics run on a single (Ubuntu) instance, and this is one of several changes that would need to be made to the playbooks and configuration to fully automate it.

I think you can also fix the port 8082 scheduler issue by passing --local-scheduler to ModuleEngagementWorkflowTask; see the task docs.
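One way to keep the invocation reproducible is to build it as an argument list, which avoids shell-quoting surprises and makes the --local-scheduler switch explicit. The Python 3 sketch below rebuilds the command from the question with --local-scheduler placed right after the task name; that position follows the suggestion to pass the flag to ModuleEngagementWorkflowTask and is an assumption, so check the task docs. The helper names are hypothetical, and the literal date replaces the original $(date +%Y-%m-%d -d "2021-04-28"), which expands to the same string.

```python
import shlex
from typing import List


def build_remote_task_argv(date: str, local_scheduler: bool = True) -> List[str]:
    """Rebuild the remote-task command from the question as an argv list."""
    argv = [
        "remote-task",
        "--host", "localhost",
        "--user", "ubuntu",
        "--remote-name", "analyticstack",
        "--skip-setup",
        "--wait",
        "ModuleEngagementWorkflowTask",
    ]
    if local_scheduler:
        # Luigi option: use an in-process scheduler instead of luigid on port 8082.
        # Placing it after the task name is an assumption based on the advice above.
        argv.append("--local-scheduler")
    argv += [
        "--date", date,
        "--indexing-tasks", "5",
        "--throttle", "0.5",
        "--n-reduce-tasks", "1",
    ]
    return argv


def as_shell_command(argv: List[str]) -> str:
    """Join argv into a copy-pasteable shell command (requires Python 3.8+)."""
    return shlex.join(argv)
```

Printing as_shell_command(build_remote_task_argv("2021-04-28")) gives a single line you can paste into the shell; passing the list to subprocess.run(argv, check=True) would run it directly.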

You can deploy Elasticsearch using the edx:configuration Ansible role; see, e.g., this ironwood.2 patch. Then you'll need to tell the pipeline which host and port to use to contact Elasticsearch, e.g. as done here in devstack.cfg.
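The pipeline's Elasticsearch address goes in an [elasticsearch] section whose host value is written as a bracketed, quoted list. Assuming that value is parsed as a JSON list of strings (which the bracket-and-quote syntax suggests, but which is not confirmed here), the Python 3 sketch below reads it the same way so you can catch a malformed entry before running the task. The function name read_elasticsearch_hosts is hypothetical.

```python
import configparser
import json
from typing import List


def read_elasticsearch_hosts(cfg_text: str) -> List[str]:
    """Return [elasticsearch] host, assumed to be a JSON list of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("elasticsearch", "host")
    hosts = json.loads(raw)  # raises ValueError if the value is not valid JSON
    if not isinstance(hosts, list) or not all(isinstance(h, str) for h in hosts):
        raise ValueError("[elasticsearch] host must be a JSON list of strings, got %r" % raw)
    return hosts
```

A bare value such as host = 127.0.0.1:9200 (no brackets or quotes) raises ValueError here, which is the kind of mistake this check is meant to surface.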

@jill Thanks a lot for the quick reply! I'll look into it right away. I've read a ton of your replies to other people struggling with the installation as well, and they've been extremely helpful!


Solution works!

Wanted to note for other people:
There should now be an elasticsearch.service. When you run systemctl status elasticsearch.service, a few addresses show up.

I took the publish_address from the http row and used that value for host in the override.cfg file in /var/lib/analytics-tasks/analyticstack/repo, so it looks something like this:

[elasticsearch]
host = ["<publish_address_value>"]
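If your status output (or the Elasticsearch log) contains the http publish_address line, the lookup above can be scripted. The Python 3 sketch below assumes the line looks like either "publish_address {127.0.0.1:9200}" or the older "publish_address {inet[/127.0.0.1:9200]}" form; both formats are assumptions about Elasticsearch's log output, not text copied from this thread. It skips lines that do not mention http (such as the transport row) and renders the [elasticsearch] section with the address as a one-element JSON list. The function names are hypothetical.

```python
import json
import re
from typing import Optional

# Matches "publish_address {127.0.0.1:9200}" and "publish_address {inet[/127.0.0.1:9200]}".
_PUBLISH_RE = re.compile(r"publish_address \{(?:inet\[[^/\]]*/)?([^\]}]+)\]?\}")


def http_publish_address(status_text: str) -> Optional[str]:
    """Return the publish_address from the first line mentioning http, or None."""
    for line in status_text.splitlines():
        if "http" not in line.lower():
            continue  # skip the transport row and unrelated lines
        match = _PUBLISH_RE.search(line)
        if match:
            return match.group(1)
    return None


def elasticsearch_override(address: str) -> str:
    """Render an [elasticsearch] section whose host is a one-element JSON list."""
    return "[elasticsearch]\nhost = %s\n" % json.dumps([address])
```

Review the rendered section before pasting it into override.cfg; if the status output shows several http addresses, the sketch simply takes the first one.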

Hooray! Glad you sorted this out, @c3ho!
