I am following the edX Analytics installation guide at https://openedx.atlassian.net/wiki/spaces/OpenOPS/pages/43385371/edX+Analytics+Installation
and running the following remote task:
remote-task --host localhost --repo https://github.com/edx/edx-analytics-pipeline --user ubuntu --override-config $HOME/edx-analytics-pipeline/config/devstack.cfg --wheel-url http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise --remote-name analyticstack --wait TotalEventsDailyTask --interval 2015 --output-root hdfs://localhost:9000/output/ --local-scheduler
This worked on a previous EC2 instance, but I had to terminate that server after I broke something on it. On a fresh install on a new EC2 instance, the task now fails with the following error:
2021-04-15 17:56:27,750 ERROR 5809 [luigi-interface] worker.py:213 - [pid 5809] Worker Worker(salt=227197670, workers=1, host=ec2-ip-address-here, username=hadoop, pid=5809, sudo_user=ubuntu) failed TotalEventsDailyTask(source=["hdfs://localhost:9000/data/"], interval=2015, expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d, output_root=hdfs://localhost:9000/output/)
Traceback (most recent call last):
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 194, in run
new_deps = self._run_get_new_deps()
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
task_gen = self.task.run()
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 781, in run
self.job_runner().run_job(self)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 625, in run_job
luigi.contrib.hdfs.HdfsTarget(output_hadoop).move_dir(output_final)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/target.py", line 150, in move_dir
self.fs.rename_dont_move(self.path, path)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/abstract_client.py", line 55, in rename_dont_move
return super(HdfsFileSystem, self).rename_dont_move(path, dest)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/target.py", line 174, in rename_dont_move
self.move(path, dest)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 93, in move
self.mkdir(parent_dir)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 166, in mkdir
self.call_check(cmd)
File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 68, in call_check
raise hdfs_error.HDFSCliError(command, p.returncode, stdout, stderr)
HDFSCliError: Command ['/edx/app/hadoop/hadoop/bin/hadoop', 'fs', '-mkdir', '-p', 'hdfs://localhost:9000'] failed [exit code 255]
---stdout---
---stderr---
-mkdir: Fatal internal error
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2207)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1300)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.hadoop.fs.shell.Mkdir.processNonexistentPath(Mkdir.java:73)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
------------
2021-04-15 17:56:27,761 INFO 5809 [luigi-interface] worker.py:501 - Informed scheduler that task TotalEventsDailyTask__Y_m_d_0_w_2_d_0_h_0_m__2015_f795eaa25e has status FAILED
2021-04-15 17:56:27,810 INFO 5809 [luigi-interface] worker.py:401 - Worker Worker(salt=227197670, workers=1, host=ip-172-31-68-208, username=hadoop, pid=5809, sudo_user=ubuntu) was stopped. Shutting down Keep-Alive thread
2021-04-15 17:56:27,812 INFO 5809 [luigi-interface] interface.py:208 -
===== Luigi Execution Summary =====
Scheduled 2 tasks of which:
* 1 present dependencies were encountered:
- 1 PathSelectionByDateIntervalTask(source=["hdfs://localhost:9000/data/"], interval=2015, expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d)
* 1 failed:
- 1 TotalEventsDailyTask(...)
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
Connection to localhost closed.
Exiting with status = 30
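
One thing I notice in the HDFSCliError above: the -mkdir target is hdfs://localhost:9000, which has a scheme and authority but no path at all, whereas my --output-root is hdfs://localhost:9000/output/. I am only guessing that this bare URI is what triggers the NullPointerException in fixRelativePart; I have not confirmed it. A minimal Python 3 sketch of the difference (has_path_component is a hypothetical helper of mine, not part of luigi or Hadoop):

```python
from urllib.parse import urlparse


def has_path_component(uri):
    """Return True if the URI names a path such as "/output/",
    and False if it is only scheme://host:port with nothing after it."""
    return urlparse(uri).path != ""


# Target of the failing 'hadoop fs -mkdir -p' call, copied from the error line.
failing_target = "hdfs://localhost:9000"
# Value I passed as --output-root.
output_root = "hdfs://localhost:9000/output/"

print(has_path_component(failing_target))  # False
print(has_path_component(output_root))     # True
```

If the guess is right, the thing to track down would be why the directory passed to mkdir ends up as the bare filesystem URI instead of a path under /output/.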