Installing insights

ettayeb_mohamed · February 3, 2020, 10:51am

Hello there,
As I can see in the docs, analytics is installed by default with Open edX starting from the Ironwood release. However, after I installed Open edX, analytics was not working on port 18110, so I followed the docs and tried to install it. It looks like the docs only cover installing it on an AWS instance, but I want to install it directly on my VPS.
I tried with this script:
https://openedx.atlassian.net/wiki/spaces/OpenOPS/pages/43385371/edX+Analytics+Installation

I updated the required values, but I am receiving an error:

TASK [aws : Gather ec2 facts for use in other roles] **************************************************************************************************
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
to retry, use: --limit @/root/configuration/playbooks/analytics_single.retry

PLAY RECAP ********************************************************************************************************************************************
localhost : ok=34 changed=7 unreachable=0 failed=1

After that, I receive another error:

GATHERING FACTS ***************************************************************
previous known host file not found
fatal: [localhost] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

TASK: [luigi | configuration directory created] *******************************
FATAL: no hosts matched or all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
to retry, use: --limit @/root/task.retry

localhost : ok=0 changed=0 unreachable=1 failed=0

My configs are:

#!/bin/bash

LMS_HOSTNAME="http://xxx.xxx.xxx.243"
INSIGHTS_HOSTNAME="http://xxx.xxx.xxx.243:8110/" # Change this to the externally visible domain and scheme for your Insights install, ideally HTTPS
DB_USERNAME="xxxxxx"
DB_HOST="localhost"
DB_PASSWORD="xxxxxxxxx"
DB_PORT="3306"

Could anyone help?

jill · February 11, 2020, 9:09am

Hi @ettayeb_mohamed ! Thanks for posting your question, since installing analytics is a source of frustration for a lot of people.

Not sure where you saw this, but analytics isn’t installed by default with Open edX. edX has made a lot of improvements to the devstack since it moved to Docker, so development with the analytics pipeline is now supported by default on the Docker devstack, but AFAIK the production deployment still requires separate deployment steps.

ettayeb_mohamed:

but it looks like the docs are only to install it on an aws instance? but i want to install it on my vps directly.
…
TASK [aws : Gather ec2 facts for use in other roles] **************************************************************************************************
fatal: [localhost]: FAILED!

Yep, currently AWS is the only officially supported environment for analytics deployments, because of all the pieces required to run the analytics pipeline, which feeds data into Insights (see architecture diagram). We at OpenCraft set up analytics on AWS a lot for clients, so we’ve assembled some documentation for how to do this, but beware that it’s not straightforward: openedx-deployment.doc.opencraft.com, under Analytics.

However, AWS is cost-prohibitive for a lot of deployments, and people with small- and medium-sized LMS user bases don’t really need the massively-scaled infrastructure that Open edX’s AWS analytics deployment provides. There are a couple of options.

Figures
@john and Appsembler built Figures, which provides some of the data reporting available in Open edX Insights/analytics.

Since it runs in the same python environment as the LMS, it’s much easier to install, use, and contribute to.

Depending on which version of Open edX you’re running, I’d totally recommend trying it out to see if it meets your needs. They’re happy to accept pull requests too, if you find bugs or have features you want to add!

OpenStack Analytics

OpenCraft are working on enhancing our Open edX deployment service (Ocim) to make it possible to run Insights and the Analytics Pipeline on a single OpenStack (OVH) instance.

The timeline for completing this isn’t yet known, so nothing has been upstreamed or properly documented yet. But I can share what we’ve done so far, and you’re welcome to use what you like. Again beware: it’s not a simple process.

Also note: we use S3 buckets for cost and authentication reasons, but you can use any hdfs-friendly locations.

Based my configuration branch on our ironwood.2 release branch, cf changes made

Deployed using this modified playbook and these ansible variables:

Ansible variables

Replace FIXMEs with real values.

SANDBOX_ENABLE_CERTIFICATES: false
SANDBOX_ENABLE_ANALYTICS_API: true
SANDBOX_ENABLE_INSIGHTS: true
SANDBOX_ENABLE_PIPELINE: true
INSIGHTS_NGINX_PORT: 80

# packages required to install and run the pipeline
analytics_pipeline_debian_pkgs:
  - "mysql-server-5.6"
  - python-mysqldb
  - libpq-dev

NGINX_INSIGHTS_APP_EXTRA: |
  # Use /status instead of /heartbeat endpoint to keep Ocim provisioning happy
  rewrite ^/heartbeat$ /status;

# Allows hadoop/hdfs to write to our S3 bucket.
HADOOP_CORE_SITE_EXTRA_CONFIG:
  fs.s3.awsAccessKeyId: "{{ AWS_ACCESS_KEY_ID }}"
  fs.s3.awsSecretAccessKey: "{{ AWS_SECRET_ACCESS_KEY }}"
  fs.s3.region: us-east-1   # FIXME: should be a variable
  fs.s3.impl: "org.apache.hadoop.fs.s3native.NativeS3FileSystem"

# Use our mysql database for the hive database
HIVE_METASTORE_DATABASE_HOST: "{{ EDXAPP_MYSQL_HOST }}"
HIVE_METASTORE_DATABASE_NAME: hive
HIVE_METASTORE_DATABASE_USER:  # FIXME
HIVE_METASTORE_DATABASE_PASSWORD: # FIXME

HIVE_SITE_EXTRA_CONFIG:
  datanucleus.autoCreateSchema: true
  datanucleus.autoCreateTables: true
  datanucleus.fixedDatastore: true

# EDXAPP Variables needed by config below
EDXAPP_LMS_ROOT_URL: "{{ EDXAPP_LMS_BASE_SCHEME | default('https') }}://{{ EDXAPP_LMS_BASE }}"
ANALYTICS_API_LMS_BASE_URL: "{{ EDXAPP_LMS_ROOT_URL }}"

# ANALYTICS_API Variables needed by playbooks
ANALYTICS_API_EMAIL_HOST: localhost
ANALYTICS_API_EMAIL_HOST_PASSWORD: ''
ANALYTICS_API_EMAIL_HOST_USER: ''
ANALYTICS_API_EMAIL_PORT: 25
# ANALYTICS_API_GIT_IDENTITY: '{{ COMMON_GIT_IDENTITY }}'
ANALYTICS_API_LANGUAGE_CODE: en-us
ANALYTICS_API_PIP_EXTRA_ARGS: --use-wheel --no-index --find-links=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise/Python-2.7
ANALYTICS_API_SERVICE_CONFIG:
  ANALYTICS_DATABASE: reports
  API_AUTH_TOKEN: # FIXME
  DATABASES: '{{ ANALYTICS_API_DATABASES }}'
  # nb: using default localhost elasticsearch
  EMAIL_PORT: '{{ ANALYTICS_API_EMAIL_PORT }}'
  LANGUAGE_CODE: en-us
  SECRET_KEY: '{{ ANALYTICS_API_SECRET_KEY }}'
  STATICFILES_DIRS: []
  STATIC_ROOT: '{{ COMMON_DATA_DIR }}/{{ analytics_api_service_name }}/staticfiles'
  TIME_ZONE: UTC
# This password must be 40 characters or fewer
ANALYTICS_API_USER_PASSWORD: # FIXME
ANALYTICS_API_USERS:
  apiuser001: '{{ ANALYTICS_API_USER_PASSWORD }}'
  dummy-api-user: # FIXME

# INSIGHTS Variables needed by playbooks
INSIGHTS_APPLICATION_NAME: "Insights {{ EDXAPP_PLATFORM_NAME }}"
INSIGHTS_BASE_URL: # FIXME
INSIGHTS_CMS_COURSE_SHORTCUT_BASE_URL: https://{{ EDXAPP_CMS_BASE }}/course
INSIGHTS_CMS_NGINX_PORT: '{{ EDXAPP_PLATFORM_NAME }}'
INSIGHTS_CSRF_COOKIE_NAME: crsftoken
# INSIGHTS_DATABASES stanza defined above
INSIGHTS_DATA_API_AUTH_TOKEN: '{{ ANALYTICS_API_USER_PASSWORD }}'
INSIGHTS_DOC_BASE: http://edx-insights.readthedocs.org/en/latest
INSIGHTS_DOC_LOAD_ERROR_URL: http://edx-insights.readthedocs.org/en/latest/Reference.html#error-conditions
INSIGHTS_FEEDBACK_EMAIL: dashboard@example.com
INSIGHTS_GUNICORN_EXTRA: ''
INSIGHTS_GUNICORN_WORKERS: '8'
INSIGHTS_LANGUAGE_COOKIE_NAME: language
INSIGHTS_LMS_BASE: https://{{ EDXAPP_LMS_BASE }}
INSIGHTS_LMS_COURSE_SHORTCUT_BASE_URL: https://{{ EDXAPP_LMS_BASE }}/courses
INSIGHTS_MKTG_BASE: 'https://{{ EDXAPP_LMS_BASE }}'
# credentials should be auto-generated, not hardcoded here.
INSIGHTS_OAUTH2_KEY: # FIXME
INSIGHTS_OAUTH2_SECRET: # FIXME
INSIGHTS_OAUTH2_URL_ROOT: https://{{ EDXAPP_LMS_BASE }}/oauth2
INSIGHTS_OPEN_SOURCE_URL: http://code.edx.org/
INSIGHTS_PLATFORM_NAME: '{{ EDXAPP_PLATFORM_NAME }}'
INSIGHTS_PRIVACY_POLICY_URL: 'https://{{ EDXAPP_LMS_BASE }}/edx-privacy-policy'
INSIGHTS_SESSION_COOKIE_NAME: sessionid
INSIGHTS_SOCIAL_AUTH_REDIRECT_IS_HTTPS: true
INSIGHTS_SUPPORT_EMAIL: support@example.com

Made some minor mods to the analytics pipeline, cf diff, and used that branch to run the pipeline.

Used this configuration for the pipeline:

override.cfg

Replace THINGS-IN-ALL-CAPS with real values.

[hive]
warehouse_path = s3://BUCKET-NAME-HERE/analytics/warehouse/

[database-export]
database = CLIENT-PREFIX-HERE_reports
credentials = s3://BUCKET-NAME-HERE/analytics/config/output.json

[database-import]
database = CLIENT-PREFIX-HERE_edxapp
credentials = s3://BUCKET-NAME-HERE/analytics/config/input.json
destination = s3://BUCKET-NAME-HERE/analytics/warehouse/

[otto-database-import]
database = CLIENT-PREFIX-HERE_ecommerce
credentials = s3://BUCKET-NAME-HERE/analytics/config/input.json

[map-reduce]
engine = hadoop
marker = s3://BUCKET-NAME-HERE/analytics/marker/
lib_jar = [
    "hdfs://localhost:9000/lib/hadoop-aws-2.7.2.jar",
    "hdfs://localhost:9000/lib/aws-java-sdk-1.7.4.jar"]

[event-logs]
pattern = [".*tracking.log-(?P<date>[0-9]+).*"]
expand_interval = 30 days
source = ["s3://BUCKET-NAME-HERE/CLIENT-PREFIX-HERE/logs/tracking/"]

[event-export]
output_root = s3://BUCKET-NAME-HERE/analytics/event-export/output/
environment = simple
config = s3://BUCKET-NAME-HERE/analytics/event_export/config.yaml
gpg_key_dir = s3://BUCKET-NAME-HERE/analytics/event_export/gpg-keys/
gpg_master_key = master@key.org
required_path_text = FakeServerGroup

[event-export-course]
output_root = s3://BUCKET-NAME-HERE/analytics/event-export-by-course/output/

[manifest]
threshold = 500
input_format = org.edx.hadoop.input.ManifestTextInputFormat
lib_jar = s3://BUCKET-NAME-HERE/analytics/packages/edx-analytics-hadoop-util.jar
path = s3://BUCKET-NAME-HERE/analytics/manifest/

[user-activity]
overwrite_n_days = 10
output_root = s3://BUCKET-NAME-HERE/analytics/activity/

[answer-distribution]
valid_response_types = customresponse,choiceresponse,optionresponse,multiplechoiceresponse,numericalresponse,stringresponse,formularesponse
    
[enrollments]
interval_start = 2017-01-01
overwrite_n_days = 3
blacklist_date = 2001-01-01
blacklist_path = s3://BUCKET-NAME-HERE/analytics/enrollments-blacklist/

[enrollment-reports]
src = s3://BUCKET-NAME-HERE/CLIENT-PREFIX-HERE/logs/tracking/
destination = s3://BUCKET-NAME-HERE/analytics/enrollment_reports/output/
offsets = s3://BUCKET-NAME-HERE/analytics/enrollment_reports/offsets.tsv
blacklist = s3://BUCKET-NAME-HERE/analytics/enrollment_reports/course_blacklist.tsv
history = s3://BUCKET-NAME-HERE/analytics/enrollment_reports/enrollment_history.tsv

[course-summary-enrollment]
# JV - course catalog is optional, and was causing CourseProgramMetadataInsertToMysqlTask errors.
# enable_course_catalog = true
enable_course_catalog = false

[financial-reports]
shoppingcart-partners = {"DEFAULT": "edx"}

[geolocation]
geolocation_data = s3://BUCKET-NAME-HERE/analytics/packages/GeoIP.dat
 
[location-per-course]
interval_start = 2017-01-01
overwrite_n_days = 3

[calendar]
interval = 2017-01-01-2030-01-01

[videos]
dropoff_threshold = 0.05
allow_empty_insert = true
overwrite_n_days = 3

[elasticsearch]
host = ["http://localhost:9200/"]

[module-engagement]
alias = roster_1_2
number_of_shards = 5
overwrite_n_days = 3
allow_empty_insert = true

[ccx]
enabled = false

[problem-response]
report_fields = [
    "username",
    "problem_id",
    "answer_id",
    "location",
    "question",
    "score",
    "max_score",
    "correct",
    "answer",
    "total_attempts",
    "first_attempt_date",
    "last_attempt_date"]
report_output_root = s3://BUCKET-NAME-HERE/analytics/reports/

[edx-rest-api]
# Create using (the first URL does not matter):
# ./manage.py lms --settings=devstack create_oauth2_client  \
#   http://localhost:9999 \
#   http://localhost:9999/complete/edx-oidc/  \
#   confidential \
#   --client_name "Analytics Pipeline" \
#   --client_id oauth_id \
#   --client_secret oauth_secret \
#   --trusted
client_id = oauth_id
client_secret = oauth_secret
auth_url = https://LMS_URL_HERE/oauth2/access_token/

[course-list]
api_root_url = https://LMS_URL_HERE/api/courses/v1/courses/

[course-blocks]
api_root_url = https://LMS_URL_HERE/api/courses/v1/blocks/

Then, the analytics tasks can be run on the local machine using this script. Schedule it to run daily via cron to keep your data updated.

pipeline.sh

Replace the variables in the FIXME block with real values.

#!/bin/bash

# Acquire lock using this script itself as the lockfile.
# If another pipeline task is already running, then exit immediately.
exec 200<$0
flock -n 200 || { echo "`date` Another pipeline task is already running."; exit 1; }

# Run as hadoop user
. $HOME/hadoop/hadoop_env
. $HOME/venvs/pipeline/bin/activate
cd $HOME/pipeline

export OVERRIDE_CONFIG=$HOME/override.cfg

HIVE='hive'
HDFS="hadoop fs"

# FIXME set these variables
FROM_DATE=2017-01-01
NUM_REDUCE_TASKS=12
TRACKING_LOGS_S3_BUCKET="s3://TRACKING-LOG-BUCKET-GOES-HERE"
TRACKING_LOGS_S3_PATH="$TRACKING_LOGS_S3_BUCKET/logs/tracking/"
HADOOP_S3_BUCKET="$TRACKING_LOGS_S3_BUCKET"  # bucket/path for temporary/intermediate storage
HADOOP_S3_PATH="$HADOOP_S3_BUCKET/analytics"
HDFS_ROOT="$HADOOP_S3_PATH"
TASK_CONFIGURATION_S3_BUCKET="$TRACKING_LOGS_S3_BUCKET"  # bucket/path containing task configuration files
TASK_CONFIGURATION_S3_PATH="$TASK_CONFIGURATION_S3_BUCKET/analytics/packages/"
# /FIXME set these variables

END_DATE=$(date +"%Y-%m-%d")
INTERVAL="$FROM_DATE-$END_DATE"
REMOTE_TASK="launch-task"
WEEKS=10
ADD_PARAMS=""
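LOCKFILE=/tmp/pipeline-tasks.lock

if [ -f "$LOCKFILE" ]; then
        echo "This script is already running."
        exit
else
        touch "$LOCKFILE"
        # Remove the lockfile even if the script aborts before reaching the end.
        trap 'rm -f "$LOCKFILE"' EXIT
fi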

DO_SHIFT=0
getopts e:w:p: PARAM
while [ $? -eq 0 ]; do
        case "$PARAM" in
                (e)
                        echo "Using end_date: $OPTARG"
                        END_DATE=$OPTARG
                        DO_SHIFT=$(( $DO_SHIFT + 2 ))
                        ;;
                (w)
                        echo "Using WEEKS=$OPTARG"
                        WEEKS=$OPTARG
                        DO_SHIFT=$(( $DO_SHIFT + 2 ))
                        ;;
                (p)
                        echo "Using file pattern: $OPTARG"
                        ADD_PARAMS="--pattern '$OPTARG'"
                        DO_SHIFT=$(( $DO_SHIFT + 2 ))
                        ;;
        esac
        getopts e:w:p: PARAM
done

if [ $DO_SHIFT -gt 0 ]; then
        shift $DO_SHIFT
fi

if [ "$1x" != "x" ]; then
        echo "Adding parameters: $*"
        # Append rather than overwrite, so a --pattern set via -p is kept.
        ADD_PARAMS="$ADD_PARAMS $*"
fi

# Run history tasks once to bootstrap new deployments.
RUN_ENROLLMENTS_HISTORY=0
RUN_GEOGRAPHY_HISTORY=0
RUN_LEARNER_ANALYTICS_HISTORY=0

# Run incremental tasks daily
RUN_ENROLLMENTS=1
RUN_PERFORMANCE=1
RUN_GEOGRAPHY=1
RUN_ENGAGEMENT=1
RUN_VIDEO=1
RUN_LEARNER_ANALYTICS=1

# Run engagement task if today is a Monday
if [ $(date +%u) -eq 1 ]; then
        RUN_ENGAGEMENT=1
fi

if [ ! -d /tmp/$END_DATE ]; then
        mkdir /tmp/$END_DATE
fi


if [ $RUN_ENROLLMENTS_HISTORY -gt 0 ]; then

   # http://edx-analytics-pipeline-reference.readthedocs.io/en/latest/running_tasks.html#history-task
   $REMOTE_TASK CourseEnrollmentEventsTask \
     --interval "$INTERVAL" \
     --local-scheduler \
     --overwrite \
     --n-reduce-tasks $NUM_REDUCE_TASKS \
     $ADD_PARAMS > /tmp/$END_DATE/CourseEnrollmentEventsTask.log 2>&1
fi

if [ $RUN_ENROLLMENTS -gt 0 ]; then

  # https://groups.google.com/d/msg/openedx-ops/pCuzvbG1OyA/FehWsxTgBwAJ
  # Since Ginkgo, using a persistent Hive metastore causes issues with the enrolments summary data.
  # The workaround is to delete the previously calculated summary data.
  $HIVE -e 'USE default;DROP TABLE IF EXISTS course_grade_by_mode;' \
      >> /tmp/$END_DATE/cleanup.log 2>&1
  $HDFS -rm -r $HDFS_ROOT/warehouse/course_grade_by_mode/* \
      >> /tmp/$END_DATE/cleanup.log 2>&1
  $HIVE -e 'USE default;DROP TABLE IF EXISTS course_meta_summary_enrollment;' \
    >> /tmp/$END_DATE/cleanup.log 2>&1
  $HDFS -rm -r $HDFS_ROOT/warehouse/course_meta_summary_enrollment/* \
      >> /tmp/$END_DATE/cleanup.log 2>&1

  $REMOTE_TASK ImportEnrollmentsIntoMysql \
    --interval "$INTERVAL" \
    --local-scheduler \
    --overwrite \
    --overwrite-n-days 1 \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    $ADD_PARAMS > /tmp/$END_DATE/ImportEnrollmentsIntoMysql.log 2>&1
fi

if [ $RUN_PERFORMANCE -gt 0 ]; then

  NOW=`date +%s`
  ANSWER_DIST_S3_BUCKET=$HADOOP_S3_PATH/intermediate/answer_dist/$NOW

  $REMOTE_TASK AnswerDistributionWorkflow \
    --local-scheduler \
    --src "[\"$TRACKING_LOGS_S3_PATH\"]" \
    --dest "$ANSWER_DIST_S3_BUCKET" \
    --name AnswerDistributionWorkflow \
    --output-root "$HADOOP_S3_PATH/grading_reports/" \
    --include "[\"*tracking.log*.gz\"]" \
    --manifest "$ANSWER_DIST_S3_BUCKET/manifest.txt" \
    --base-input-format "org.edx.hadoop.input.ManifestTextInputFormat" \
    --lib-jar "[\"$TASK_CONFIGURATION_S3_PATH/edx-analytics-hadoop-util.jar\"]" \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    --marker "$ANSWER_DIST_S3_BUCKET/marker" \
    $ADD_PARAMS > /tmp/$END_DATE/AnswerDistributionWorkflow.log 2>&1
fi

if [ $RUN_GEOGRAPHY_HISTORY -gt 0 ]; then

  # http://edx-analytics-pipeline-reference.readthedocs.io/en/latest/running_tasks.html#id6
  $REMOTE_TASK LastDailyIpAddressOfUserTask \
    --local-scheduler \
    --interval $INTERVAL \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    $ADD_PARAMS > /tmp/$END_DATE/LastDailyIpAddressOfUserTask.log 2>&1
fi

if [ $RUN_GEOGRAPHY -gt 0 ]; then

  $REMOTE_TASK InsertToMysqlLastCountryPerCourseTask \
    --local-scheduler \
    --interval-end $END_DATE \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    --overwrite \
    $ADD_PARAMS > /tmp/$END_DATE/InsertToMysqlLastCountryPerCourseTask.log 2>&1
fi

if [ $RUN_ENGAGEMENT -gt 0 ]; then

  WEEKS=24

  $REMOTE_TASK InsertToMysqlCourseActivityTask \
    --local-scheduler \
    --end-date $END_DATE \
    --weeks $WEEKS \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    $ADD_PARAMS > /tmp/$END_DATE/CourseActivityWeeklyTask.log 2>&1
fi

if [ $RUN_VIDEO -gt 0 ]; then
  $REMOTE_TASK InsertToMysqlAllVideoTask \
    --local-scheduler \
    --interval $INTERVAL \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    $ADD_PARAMS > /tmp/$END_DATE/InsertToMysqlAllVideoTask.log 2>&1
fi

if [ $RUN_LEARNER_ANALYTICS_HISTORY -gt 0 ]; then

  # http://edx-analytics-pipeline-reference.readthedocs.io/en/latest/running_tasks.html#id12
  $REMOTE_TASK ModuleEngagementIntervalTask \
    --local-scheduler \
    --interval $INTERVAL \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    --overwrite-from-date $END_DATE \
    --overwrite-mysql \
    $ADD_PARAMS > /tmp/$END_DATE/ModuleEngagementIntervalTask.log 2>&1
fi

if [ $RUN_LEARNER_ANALYTICS -gt 0 ]; then

  $REMOTE_TASK ModuleEngagementWorkflowTask \
    --local-scheduler \
    --date $END_DATE \
    --indexing-tasks 5 \
    --throttle 0.5 \
    --n-reduce-tasks $NUM_REDUCE_TASKS \
    $ADD_PARAMS > /tmp/$END_DATE/ModuleEngagementWorkflowTask.log 2>&1
fi

rm -f $LOCKFILE
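
For the daily cron schedule mentioned above, a minimal sketch of a crontab entry might look like the following. The script path, the log path, and the 02:00 run time are all assumptions for illustration and are not part of the original setup.

```shell
#!/bin/bash
# Hypothetical locations: adjust to wherever pipeline.sh actually lives.
PIPELINE_SCRIPT="${PIPELINE_SCRIPT:-/home/hadoop/pipeline.sh}"
CRON_LOG="${CRON_LOG:-/home/hadoop/pipeline-cron.log}"

# Print a crontab entry that runs the script every day at 02:00
# and appends its output to a log file.
pipeline_cron_entry() {
    printf '0 2 * * * /bin/bash %s >> %s 2>&1\n' "$PIPELINE_SCRIPT" "$CRON_LOG"
}

pipeline_cron_entry

# To install the entry for the current user without dropping existing ones:
#   (crontab -l 2>/dev/null; pipeline_cron_entry) | crontab -
```

Since the script already exits early when another run holds the lock, a long-running pipeline overlapping the next scheduled start should simply skip that run.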

ettayeb_mohamed · February 11, 2020, 9:24am

Hello @jill! Thank you for replying. Sadly, I am not able to use AWS, but I got some help on Slack from @sambapete (big thanks to him), so I disabled the AWS-related tasks this way:

AWS_GATHER_FACTS: false
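
One way to apply that setting without editing the playbooks is a small overrides file passed with `-e @file`. This is only a sketch: the overrides file name is hypothetical, and the playbook invocation is copied from the command used later in this thread.

```shell
#!/bin/bash
# Hypothetical location for local variable overrides.
OVERRIDES_FILE="${OVERRIDES_FILE:-${TMPDIR:-/tmp}/analytics-overrides.yml}"

# Write a YAML vars file that turns off EC2 fact gathering.
write_overrides() {
    cat > "$1" <<'EOF'
# Skip the "Gather ec2 facts" task when not running on AWS.
AWS_GATHER_FACTS: false
EOF
}

write_overrides "$OVERRIDES_FILE"

# Print the command instead of running it, so it can be reviewed first.
echo "ansible-playbook -i localhost, -c local analytics_single.yml -e @$OVERRIDES_FILE"
```

A vars file keeps the value a real YAML boolean, which avoids any ambiguity about how a `key=value` string passed on the command line is interpreted.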

Then I solved another problem with the database migration, and now I am stuck on the following task:

Task analytics_pipeline : enable Hadoop services
Placement: configuration/playbooks/roles/analytics_pipeline/tasks/main.yml:136
Error message: Could not find the requested service ['hdfs-namenode', 'hdfs-datanode', 'yarn-resourcemanager', 'yarn-nodemanager', 'yarn-proxyserver', 'mapreduce-historyserver

The hadoop user is there and everything seems good… I am still trying to solve this.

jill · February 14, 2020, 5:11am

Glad you’re making progress here @ettayeb_mohamed!

A quick Google search for that error suggests that there’s an issue with the Ansible service module on some systems. If you add daemon_reload: yes to that task as suggested here, does it help?

ettayeb_mohamed · February 14, 2020, 9:26am

Hi @jill! Actually, I just enabled those services manually and commented out the check there, because even after adding daemon_reload: yes and trying a lot of other things it was still not working. So I bypassed those steps and fixed some other things, and finally it seems like everything went well and the installation is done. Now I am having problems setting up authentication: it redirects me to 127.0.0.1:8000 even after updating insights.yml and lms.env.json with the public IP address. I also added the trusted client in my admin dashboard. This is weird…
Are there any updated or clearer steps to fix this? I believe I did what the docs describe.
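
For anyone repeating the manual workaround, a rough sketch of enabling the services by hand follows. The service names are taken from the error message earlier in the thread; that they are installed as systemd units, and the `DRY_RUN` switch itself, are assumptions. By default the script only prints the commands.

```shell
#!/bin/bash
# Service names as listed in the Ansible error message.
HADOOP_SERVICES="hdfs-namenode hdfs-datanode yarn-resourcemanager yarn-nodemanager yarn-proxyserver mapreduce-historyserver"
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_hadoop_services() {
    # Make systemd re-read unit files before touching the services.
    run systemctl daemon-reload
    for svc in $HADOOP_SERVICES; do
        run systemctl enable "$svc"
        run systemctl start "$svc"
    done
}

enable_hadoop_services
```

Running it with `DRY_RUN=0` as root would execute the commands for real.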

jill · February 14, 2020, 9:54am

Do any of these tips help?

https://openedx-deployment.doc.opencraft.com/en/latest/analytics/insights/#oauth2

ettayeb_mohamed · February 14, 2020, 2:12pm

Most of the steps there are already done. The problem is this redirect to 127.0.0.1:8000: I cannot find where I should update it so that the redirection goes to the public IP and not 127.0.0.1.

jill · February 16, 2020, 11:35pm

There are a couple of places where redirect URLs are specified during authentication:

/edx/etc/insights.yml – the SOCIAL_AUTH_EDX_OIDC_* variables.
The LMS Django Admin, URL ending in /admin/oauth2/client/: the redirect URI
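
To compare the two places listed above, a small sketch like this can help. The `/complete/<backend>/` path shape comes from the URLs quoted elsewhere in this thread; the Insights hostname is a placeholder, and the function names are made up for illustration.

```shell
#!/bin/bash
# Build the redirect URI that the LMS OAuth2 client entry is expected to hold.
# $1: externally visible Insights base URL
# $2: auth backend name (edx-oidc by default, edx-oauth2 on other versions)
expected_redirect_uri() {
    local base="${1%/}"            # drop one trailing slash, if present
    local backend="${2:-edx-oidc}"
    printf '%s/complete/%s/\n' "$base" "$backend"
}

# Show only the URL-type OIDC settings (no keys or secrets) from insights.yml.
show_oidc_urls() {
    grep -E '^SOCIAL_AUTH_EDX_OIDC_(ISSUER|URL_ROOT|LOGOUT_URL):' "${1:-/edx/etc/insights.yml}"
}

# Placeholder hostname: substitute the real Insights URL.
expected_redirect_uri "http://insights.example.com:8110/"
```

The printed value can then be compared character by character with the redirect URI stored under /admin/oauth2/client/, and `show_oidc_urls` can be run on the server to confirm the LMS URLs on the Insights side.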

ettayeb_mohamed · February 17, 2020, 1:59pm

I really appreciate your reply!
I solved that issue, sadly by editing /edx/app/insights/edx_analytics_dashboard/analytics_dashboard/settings/base.py directly. It looks like restarting Insights does not load any changes… (very weird… anyway).
Right now I am having another issue, which is:
invalid_request The requested redirect didn't match the client settings.
I tried the troubleshooting section here: https://openedx-deployment.doc.opencraft.com/en/latest/analytics/insights/#oauth2
The links are all good, but I am still getting that error…

jill · February 17, 2020, 9:58pm

Yep, OAuth is tricky. Note that it’s not Open edX making this hard; the Django social authentication settings have to be exactly right.

I need more information about your config to debug this… can you post your /edx/etc/insights.yml, your full LMS URL, the LMS_BASE_SCHEME from /edx/etc/lms.yml, and the values in the /admin/oauth2/client/ created for Insights? (with keys and secrets redacted of course) There’s a mismatch in there somewhere.
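
One mismatch that is easy to check mechanically is the URL scheme versus the SOCIAL_AUTH_REDIRECT_IS_HTTPS flag. The helper below is only an illustrative sketch (the function name and the example URL are made up); it does not read any real configuration.

```shell
#!/bin/bash
# Return 0 when the URL's scheme agrees with the HTTPS flag ("true"/"false"),
# 1 when they disagree, and 2 when the URL is not http(s) at all.
scheme_matches_flag() {
    local url="$1" is_https="$2"
    case "$url" in
        https://*) [ "$is_https" = "true" ] ;;
        http://*)  [ "$is_https" = "false" ] ;;
        *)         return 2 ;;
    esac
}

# Placeholder redirect URI and flag value.
if scheme_matches_flag "http://insights.example.com/complete/edx-oidc/" "false"; then
    echo "scheme and flag agree"
else
    echo "scheme and flag disagree"
fi
```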

ettayeb_mohamed · February 18, 2020, 4:18pm

Hello @jill, here are my files and everything… I am totally tired of this…
My client config, with everything in clear (I don’t care anymore about keys and secrets… I will remove this afterwards):

Here is the important part of the insights.yml:

SOCIAL_AUTH_EDX_OIDC_ID_TOKEN_DECRYPTION_KEY: 92fb605d041bfaa8e8f69ccb4abfb620e3f7c35a
SOCIAL_AUTH_EDX_OIDC_ISSUER: http://51.91.253.243/oauth2
SOCIAL_AUTH_EDX_OIDC_KEY: 3d7050fb2085a2c2a325
SOCIAL_AUTH_EDX_OIDC_LOGOUT_URL: http://51.91.253.243/logout
SOCIAL_AUTH_EDX_OIDC_SECRET: 92fb605d041bfaa8e8f69ccb4abfb620e3f7c35a
SOCIAL_AUTH_EDX_OIDC_URL_ROOT: http://51.91.253.243/oauth2
SOCIAL_AUTH_REDIRECT_IS_HTTPS: false

Here is the important part from my lms.env.json:

"JWT_EXPIRATION": 30, 
"JWT_ISSUER": "http://51.91.253.243/oauth2", 
"JWT_PRIVATE_SIGNING_KEY": null, 
"LANGUAGE_CODE": "en", 
"LANGUAGE_COOKIE": "openedx-language-preference", 
"LMS_BASE": "51.91.253.243", 
"LMS_INTERNAL_ROOT_URL": "http://51.91.253.243", 
"LMS_ROOT_URL": "http://51.91.253.243", 
…
…
"OAUTH_DELETE_EXPIRED": true,
"OAUTH_ENFORCE_SECURE": false,
"OAUTH_EXPIRE_CONFIDENTIAL_CLIENT_DAYS": 365,
"OAUTH_EXPIRE_PUBLIC_CLIENT_DAYS": 30,
"OAUTH_OIDC_ISSUER": "http://51.91.253.243/oauth2",

jill · February 20, 2020, 12:21am

@ettayeb_mohamed Hey, looks like you sorted it out? What was the fix?

I was able to register a new account on your LMS, and was able to authenticate. Getting a 403 on the Insights home page, but that’s usual (unfortunately) if the pipeline tasks haven’t run yet.

ettayeb_mohamed · February 20, 2020, 2:12pm

Hello @jill,
I think the problem was with the Insights version: the LMS was working with OIDC and Insights with OAuth2 ( /complete/edx-oidc VS /complete/edx-oauth2/ ).
The solution was to add some variables to the ansible-playbook command, this way:
ansible-playbook -i localhost, -c local analytics_single.yml --extra-vars "INSIGHTS_LMS_BASE=<LMS DOMAIN> INSIGHTS_VERSION=open-release/ironwood.master ANALYTICS_API_VERSION=open-release/ironwood.master"
I think it was installing another version of Insights, that’s it…
Right now all is good, and I even solved all the Hadoop problems, but I cannot run the pipeline tasks and I don’t know why:

(pipeline) root@vps759767:~/edx-analytics-pipeline# remote-task --host localhost --user root --remote-name analyticstack --skip-setup --wait ImportEnrollmentsIntoMysql --interval 2016 --local-scheduler
Parsed arguments = Namespace(branch='release', extra_repo=None, host='localhost', job_flow_id=None, job_flow_name=None, launch_task_arguments=['ImportEnrollmentsIntoMysql', '--interval', '2016', '--local-scheduler'], log_path=None, override_config=None, package=None, private_key=None, python_version=None, remote_name='analyticstack', repo=None, secure_config=None, secure_config_branch=None, secure_config_repo=None, shell=None, skip_setup=True, sudo_user='hadoop', user='root', vagrant_path=None, verbose=False, virtualenv_extra_args=None, wait=True, wheel_url=None, workflow_profiler=None)
Running commands from path = /root/pipeline/share/edx.analytics.tasks
Remote name = analyticstack
Running command = ['ssh', '-tt', '-o', 'ForwardAgent=yes', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PasswordAuthentication=no', '-o', 'User=root', '-o', 'ConnectTimeout=10', 'localhost', "sudo -Hu hadoop /bin/bash -c 'cd /var/lib/analytics-tasks/analyticstack/repo && . $HOME/.bashrc && . /var/lib/analytics-tasks/analyticstack/venv/bin/activate && launch-task ImportEnrollmentsIntoMysql --interval 2016 --local-scheduler'"]
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
/bin/bash: line 0: cd: /var/lib/analytics-tasks/analyticstack/repo: No such file or directory
Connection to localhost closed.
Exiting with status = 1

ettayeb_mohamed · February 24, 2020, 3:41pm

I am getting this error when I run the tasks:

jill · February 27, 2020, 3:56am

There’s an error in your screenshot that could be the culprit:

Required argument: -input

Are there any tracking logs under hdfs://localhost:9000/data/ that match the configured pattern .*tracking.log.*?
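
A quick way to answer that question from a shell is sketched below. The pattern and the HDFS location are the ones quoted above; the sample file names in the loop are hypothetical, and `list_hdfs_data` only works where the `hadoop` CLI is on the PATH.

```shell
#!/bin/bash
# Pattern as quoted above (a regular expression, not a shell glob).
TRACKING_PATTERN='.*tracking.log.*'

# Return 0 when a file name matches the configured pattern.
matches_tracking_pattern() {
    printf '%s\n' "$1" | grep -Eq "$TRACKING_PATTERN"
}

# List what is actually stored in HDFS (requires the hadoop CLI).
list_hdfs_data() {
    hadoop fs -ls "${1:-hdfs://localhost:9000/data/}"
}

# Hypothetical file names, only to show how the pattern behaves.
for name in tracking.log tracking.log-20200227.gz edx.log; do
    if matches_tracking_pattern "$name"; then
        echo "$name: matches"
    else
        echo "$name: does not match"
    fi
done
```

If `list_hdfs_data` shows an empty directory, the tasks have no input to process regardless of the pattern.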

ettayeb_mohamed · February 27, 2020, 12:04pm

I have a tracking.log file under /edx/var/log/tracking, and that’s it!
Should hdfs://localhost:9000/data/ be pointed there, I think?

jill · February 27, 2020, 10:58pm

The pipeline tasks read the tracking logs from HDFS (or S3, when configured to read from there), so you should sync your tracking logs to that HDFS store periodically.

The analytics devstack does this with a cron job, see analytics_pipeline playbook.
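
A minimal sketch of such a sync is shown below. It is not the devstack's actual cron job: the local and HDFS paths are the ones mentioned in this thread, the `DRY_RUN` switch is an addition for safe testing, and the script only prints the commands by default.

```shell
#!/bin/bash
TRACKING_DIR="${TRACKING_DIR:-/edx/var/log/tracking}"
HDFS_DEST="${HDFS_DEST:-hdfs://localhost:9000/data/}"
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

sync_tracking_logs() {
    local f
    for f in "$TRACKING_DIR"/tracking.log*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if [ "$DRY_RUN" = "1" ]; then
            echo "hadoop fs -put -f $f $HDFS_DEST"
        else
            # -f overwrites a copy that already exists in HDFS.
            hadoop fs -put -f "$f" "$HDFS_DEST"
        fi
    done
}

sync_tracking_logs
```

Run as the hadoop user with `DRY_RUN=0`, for example from cron shortly before the pipeline tasks start, so that the tasks always see the latest logs.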

ettayeb_mohamed · February 28, 2020, 11:45am

Actually, that was the main problem!!! There was nothing in that HDFS store!
When I ran that playbook I didn’t receive any error, so I thought that all was good.
I reran that step manually and restarted the tasks, and everything worked well. Then I ran the sync db command, and finally everything is working and the dashboard is there!!!
Big thanks to you @jill!!!

ettayeb_mohamed · February 28, 2020, 11:46am

I will prepare a full guide in the next few days and share it with you.

jill · February 28, 2020, 11:33pm

Thank you for your persistence @ettayeb_mohamed! So pleased you got it working.
