Hello there,
As I can see in the docs, analytics is installed by default with Open edX starting from the Ironwood release, but after I installed Open edX, analytics is not running on port 18110. So I followed the docs and tried to install it, but it looks like the docs only cover installing it on an AWS instance? I want to install it directly on my VPS.
I tried with this script: https://openedx.atlassian.net/wiki/spaces/OpenOPS/pages/43385371/edX+Analytics+Installation
I updated the required values, but I am receiving an error:
TASK [aws : Gather ec2 facts for use in other roles] **************************************************************************************************
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
to retry, use: --limit @/root/configuration/playbooks/analytics_single.retry
PLAY RECAP ********************************************************************************************************************************************
localhost : ok=34 changed=7 unreachable=0 failed=1
After that I receive another error:
GATHERING FACTS ***************************************************************
previous known host file not found
fatal: [localhost] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
TASK: [luigi | configuration directory created] *******************************
FATAL: no hosts matched or all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/root/task.retry
localhost : ok=0 changed=0 unreachable=1 failed=0
My configs are:
#!/bin/bash
LMS_HOSTNAME="http://xxx.xxx.xxx.243"
INSIGHTS_HOSTNAME="http://xxx.xxx.xxx.243:8110/" # Change this to the externally visible domain and scheme for your Insights install, ideally HTTPS
DB_USERNAME="xxxxxx"
DB_HOST="localhost"
DB_PASSWORD="xxxxxxxxx"
DB_PORT="3306"
Yep, currently AWS is the only officially supported environment for analytics deployments, because of all the pieces required to run the analytics pipeline, which feeds data into Insights (see architecture diagram). We at OpenCraft set up analytics on AWS a lot for clients, so we've assembled some documentation for how to do this, but beware that it's not straightforward: openedx-deployment.doc.opencraft.com, under Analytics.
However, AWS is cost-prohibitive for a lot of deployments, and people with small- and medium-sized LMS user bases don't really need the massively-scaled infrastructure that Open edX's AWS analytics deployment provides. There are a couple of options.
Figures
@john and Appsembler built Figures, which provides some of the data reporting available in Open edX Insights/analytics.
Since it runs in the same python environment as the LMS, it's much easier to install, use, and contribute to.
Depending on which version of Open edX you're running, I'd totally recommend trying it out to see if it meets your needs. They're happy to accept pull requests too, if you find bugs or have features you want to add!
OpenStack Analytics
OpenCraft is working on enhancing our Open edX deployment service (Ocim) to make it possible to run Insights and the Analytics Pipeline on a single OpenStack (OVH) instance.
The timeline for completing this isn't yet known, so nothing has been upstreamed or properly documented yet. But I can share what we've done so far, and you're welcome to use what you like. Again, beware: it's not a simple process.
Also note: we use S3 buckets for cost and authentication reasons, but you can use any HDFS-friendly location (a sketch follows below).
I based my configuration branch on our ironwood.2 release branch; cf. the changes made.
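To make the S3/HDFS point concrete: the pipeline picks up its input and output locations from its override.cfg, so switching between S3 and plain HDFS is mostly a matter of the URLs you put there. The section and key names below follow the sample configs shipped with edx-analytics-pipeline, but treat them as assumptions and check the config generated on your own box:

[hive]
# output location for the pipeline's Hive warehouse; any HDFS-friendly URL works
warehouse_path = hdfs://localhost:9000/edx-analytics-pipeline/warehouse/
# warehouse_path = s3://my-analytics-bucket/warehouse/  (S3 equivalent; bucket name is hypothetical)

[event-logs]
# where the pipeline looks for tracking logs
source = hdfs://localhost:9000/data/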
Hello @jill! Thank you for replying. Sadly I am not able to use AWS, but I got some help on Slack from @sambapete (big thanks to him), so I disabled the AWS-related tasks this way:
AWS_GATHER_FACTS: false
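One way to pass that variable is as an extra var on the playbook command line, using the JSON form so the value stays a real boolean (the invocation mirrors the one used later in this thread; adjust paths to your setup):

ansible-playbook -i localhost, -c local analytics_single.yml -e '{"AWS_GATHER_FACTS": false}'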
Then I solved another problem with the database migration, and now I am stuck on the following task:
Task analytics_pipeline : enable Hadoop services
Placement: configuration/playbooks/roles/analytics_pipeline/tasks/main.yml:136
Error message: Could not find the requested service ['hdfs-namenode', 'hdfs-datanode', 'yarn-resourcemanager', 'yarn-nodemanager', 'yarn-proxyserver', 'mapreduce-historyserver']
The hadoop user is there and everything seems good… I am trying to solve this for now.
A quick google search for that error suggests that there's an issue with the ansible service module on some systems. If you add daemon_reload: yes to that task as suggested here, does it help?
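Something along these lines, as a sketch only (the real task in configuration/playbooks/roles/analytics_pipeline/tasks/main.yml is structured differently; the unit list is taken from your error message):

- name: enable Hadoop services
  systemd:
    name: "{{ item }}"
    enabled: yes
    state: started
    daemon_reload: yes  # re-read systemd unit files before enabling/starting
  with_items:
    - hdfs-namenode
    - hdfs-datanode
    - yarn-resourcemanager
    - yarn-nodemanager
    - yarn-proxyserver
    - mapreduce-historyserver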
Hi @jill! Actually I just enabled those services manually (sketched below) and commented out the check there, because even after adding daemon_reload: yes and trying a lot of things it was not working… So I just bypassed those steps, fixed some other things, and finally it seems like everything went well and the installation is done. Now I am having problems with setting up the authentication: it redirects me to 127.0.0.1:8000 even after updating insights.yml and lms.env.json with the public IP address… I added the trusted client in my admin dashboard etc. too… this is weird…
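For reference, enabling them by hand looks roughly like this (unit names are the ones from the error above; they have to match whatever unit files the playbook actually installed):

sudo systemctl daemon-reload
sudo systemctl enable --now hdfs-namenode hdfs-datanode yarn-resourcemanager \
    yarn-nodemanager yarn-proxyserver mapreduce-historyserver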
Are there any updated or clearer steps to fix this? I believe I did what is in the docs.
Most of the steps there are already done… The problem is with this redirect to 127.0.0.1:8000; I cannot find where I should update it to make the redirect go to the public IP and not 127.0.0.1.
I really appreciate your reply!
I solved that issue, sadly by editing /edx/app/insights/edx_analytics_dashboard/analytics_dashboard/settings/base.py directly. It looks like restarting Insights does not load any changes… (very weird… anyway).
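(For anyone landing here later: the redirect target normally comes from the OIDC URL root in /edx/etc/insights.yml, and Insights is restarted via supervisor. The key name below is an assumption based on Ironwood-era configs, so verify it against your own file.)

# /edx/etc/insights.yml (assumed key name; check your file)
SOCIAL_AUTH_EDX_OIDC_URL_ROOT: "http://xxx.xxx.xxx.243/oauth2"

# restart Insights so the new config is read
sudo /edx/bin/supervisorctl restart insights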
Right now I am having another issue, which is: invalid_request The requested redirect didn't match the client settings.
I tried the troubleshooting section here: https://openedx-deployment.doc.opencraft.com/en/latest/analytics/insights/#oauth2
The links are all good… but I am still getting that error…
Yep, OAuth is tricky. Note that it's not Open edX making this hard; the Django social authentication settings have to be exactly right.
I need more information about your config to debug this… can you post your /edx/etc/insights.yml, your full LMS URL, the LMS_BASE_SCHEME from /edx/etc/lms.yml, and the values in the /admin/oauth2/client/ entry created for Insights? (with keys and secrets redacted, of course) There's a mismatch in there somewhere.
Hello @jill, here are my files and everything… I am totally tired of this…:
My client config, with everything visible (I don't care anymore about the keys and secrets… I will remove this afterwards…):
@ettayeb_mohamed Hey, looks like you sorted it out? What was the fix?
I was able to register a new account on your LMS, and was able to authenticate. Getting a 403 on the Insights home page, but that's usual (unfortunately) if the pipeline tasks haven't run yet.
Hello @jill,
I think that the problem was with the Insights version… something like the LMS working with OIDC and Insights with OAuth2 (/complete/edx-oidc vs. /complete/edx-oauth2/).
The solution was to add some variables to the ansible-playbook command, this way: ansible-playbook -i localhost, -c local analytics_single.yml --extra-vars "INSIGHTS_LMS_BASE=<LMS DOMAIN> INSIGHTS_VERSION=open-release/ironwood.master ANALYTICS_API_VERSION=open-release/ironwood.master"
I think it was installing another version of Insights, that's it…
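To make the OIDC/OAuth2 mismatch concrete: with an Ironwood-era Insights using the OIDC backend, the client registered in the LMS Django admin at /admin/oauth2/client/ has to point back at the Insights host, roughly like this (host, scheme and port are placeholders taken from this thread; adjust to your install):

URL:          http://xxx.xxx.xxx.243:18110/
Redirect URI: http://xxx.xxx.xxx.243:18110/complete/edx-oidc/
Client type:  Confidential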
Right now all is good, I even solved all the Hadoop problems etc.… but I cannot run the pipeline tasks and I don't know why:
(pipeline) root@vps759767:~/edx-analytics-pipeline# remote-task --host localhost --user root --remote-name analyticstack --skip-setup --wait ImportEnrollmentsIntoMysql --interval 2016 --local-scheduler
Parsed arguments = Namespace(branch='release', extra_repo=None, host='localhost', job_flow_id=None, job_flow_name=None, launch_task_arguments=['ImportEnrollmentsIntoMysql', '--interval', '2016', '--local-scheduler'], log_path=None, override_config=None, package=None, private_key=None, python_version=None, remote_name='analyticstack', repo=None, secure_config=None, secure_config_branch=None, secure_config_repo=None, shell=None, skip_setup=True, sudo_user='hadoop', user='root', vagrant_path=None, verbose=False, virtualenv_extra_args=None, wait=True, wheel_url=None, workflow_profiler=None)
Running commands from path = /root/pipeline/share/edx.analytics.tasks
Remote name = analyticstack
Running command = ['ssh', '-tt', '-o', 'ForwardAgent=yes', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PasswordAuthentication=no', '-o', 'User=root', '-o', 'ConnectTimeout=10', 'localhost', "sudo -Hu hadoop /bin/bash -c 'cd /var/lib/analytics-tasks/analyticstack/repo && . $HOME/.bashrc && . /var/lib/analytics-tasks/analyticstack/venv/bin/activate && launch-task ImportEnrollmentsIntoMysql --interval 2016 --local-scheduler'"]
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
/bin/bash: line 0: cd: /var/lib/analytics-tasks/analyticstack/repo: No such file or directory
Connection to localhost closed.
Exiting with status = 1
The pipeline tasks want to read the tracking logs from HDFS (or S3, when configured to read from there), so you should sync your tracking logs to that HDFS store periodically.
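For example, something like the following, assuming a native install where tracking logs live under /edx/var/log/tracking and the pipeline's event-logs source points at /data in HDFS (both paths are assumptions; use whatever your pipeline config actually references):

# copy the LMS tracking logs into HDFS as the hadoop user
sudo -u hadoop hdfs dfs -mkdir -p /data
sudo -u hadoop hdfs dfs -put -f /edx/var/log/tracking/tracking.log* /data/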
Actually that was the main problem!!! There was nothing in that HDFS store!
When I ran that playbook I didn't receive any error, so I thought that all was good.
I reran it manually, restarted the tasks, and everything worked well. Then I ran the sync db command and finally everything is working and the dashboard is there!!!
Big thanks to you @jill!!!