Help running native install inside a GitHub runner for CI

I’m trying to make a basic CI for the native installation steps. The installation is running fine until the script starts installing discovery. It fails when attempting to run make production-requirements. Any help would be appreciated, thanks!

fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["make", "production-requirements"], "delta": "0:01:03.643773", "end": "2021-04-26 14:14:49.603882", "msg": "non-zero return code", "rc": 2, "start": "2021-04-26 14:13:45.960109", 
"stderr": "WARNING: You are using pip version 20.0.2; however, version 21.1 is available.\n
You should consider upgrading via the '/edx/app/discovery/venvs/discovery/bin/python -m pip install --upgrade pip' command.\n
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.13 (node_modules/watchpack-chokidar2/node_modules/fsevents):\nnpm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@1.2.13: wanted {\"os\":\"darwin\",\"arch\":\"any\"} (current: {\"os\":\"linux\",\"arch\":\"x64\"})\nnpm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.3.1 (node_modules/fsevents):\nnpm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.3.1: wanted {\"os\":\"darwin\",\"arch\":\"any\"} (current: {\"os\":\"linux\",\"arch\":\"x64\"})\n\n\n
┌──────────────────────────────────────────────────────────┐\n
│                 npm update check failed                  │\n
│           Try running with sudo or get access            │\n
│           to the local update config store via           │\n
│ sudo chown -R $USER:$(id -gn $USER) /home/runner/.config │\n
└──────────────────────────────────────────────────────────┘\n

How to reproduce it:

  1. Add the action to any repository (see PR)
  2. Go to the Actions tab, select the action and “Run Workflow”

I also ran /edx/bin/supervisorctl status at the end of the action, and this is the output (note that discovery is not listed):

analytics_api                    RUNNING   pid 111475, uptime 0:18:17
cms                              RUNNING   pid 99139, uptime 0:31:41
ecommerce                        RUNNING   pid 107834, uptime 0:20:17
ecomworker                       RUNNING   pid 109626, uptime 0:19:35
edxapp_worker:cms_default_1      RUNNING   pid 87489, uptime 0:45:47
edxapp_worker:cms_high_1         RUNNING   pid 87493, uptime 0:45:46
edxapp_worker:lms_default_1      RUNNING   pid 87500, uptime 0:45:44
edxapp_worker:lms_high_1         RUNNING   pid 87506, uptime 0:45:43
edxapp_worker:lms_high_mem_1     RUNNING   pid 87512, uptime 0:45:42
forum                            RUNNING   pid 166742, uptime 0:01:39
insights                         RUNNING   pid 115660, uptime 0:13:01
lms                              RUNNING   pid 101683, uptime 0:30:24

Maybe as a workaround we could try adding this to config.yml:

edx_django_service_user: runner

Or maybe this, in order to keep the other services on their current config:

discovery_user: runner

Good shout! I’ll try it out.

Why is this an issue in CI, but not for other users? If I understand correctly, the problem comes from the fact that we are trying to use an npm cache in /home/runner/.config, which is not writable by the discovery user. Is my interpretation correct?

I’ve tried it but the git clone step fails.

TASK [git_clone : Checkout code over https] ************************************
failed: [localhost] (item={'PROTOCOL': 'https', 'DOMAIN': 'github.com', 'PATH': 'edx', 'REPO': 'course-discovery.git', 'VERSION': 'open-release/lilac.master', 'DESTINATION': '/edx/app/discovery/discovery', 'SSH_KEY': None}) 
=> {"ansible_loop_var": "item", "changed": false, "cmd": "/usr/bin/git clone --origin origin --depth 1 --branch open-release/lilac.master https://github.com/edx/course-discovery.git /edx/app/discovery/discovery", 
"item": {"DESTINATION": "/edx/app/discovery/discovery", "DOMAIN": "github.com", "PATH": "edx", "PROTOCOL": "https", "REPO": "course-discovery.git", "SSH_KEY": null, "VERSION": "open-release/lilac.master"}, 
"msg": "fatal: could not create work tree dir '/edx/app/discovery/discovery': Permission denied", "rc": 128, "stderr": "fatal: could not create work tree dir '/edx/app/discovery/discovery': Permission denied\n", "stderr_lines": ["fatal: could not create work tree dir '/edx/app/discovery/discovery': Permission denied"], "stdout": "", "stdout_lines": []}

That’s my understanding too. It’s this step and specifically the part where it’s using bower to install some packages.

The question boils down to: why is the npm cache for the discovery user located in /home/runner/.config? Is it because the $HOME variable is incorrectly defined at this point?
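
To make the $HOME hypothesis concrete, here is a minimal Python sketch. It assumes the npm update check picks its config store the XDG way (XDG_CONFIG_HOME if set, otherwise $HOME/.config); I have not verified that against the npm version installed here. The function name and the /edx/app/discovery home directory are hypothetical, used only for illustration.

```python
import posixpath


def update_check_config_dir(env):
    """Return the per-user config directory an XDG-style tool would use.

    Assumption: the tool prefers XDG_CONFIG_HOME and falls back to
    $HOME/.config. Returns None when neither variable is set.
    """
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return xdg
    home = env.get("HOME")
    if home:
        return posixpath.join(home, ".config")
    return None


# Case 1: the process runs as discovery but inherited HOME from the CI user.
inherited = {"USER": "discovery", "HOME": "/home/runner"}

# Case 2: HOME was reset for the service user (hypothetical home directory).
reset = {"USER": "discovery", "HOME": "/edx/app/discovery"}

print(update_check_config_dir(inherited))
print(update_check_config_dir(reset))
```

If the first case matches what happens in the runner, the discovery user ends up pointing at a directory owned by runner, which would explain the "npm update check failed" box in the log. Printing $HOME from inside the failing task would confirm or rule this out.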

Ah I see, makes sense. I will look into why it would be different from a vanilla Ubuntu install.