Error When Running Remote Task For Insights (Ubuntu 16.04)

I am following the guide here: https://openedx.atlassian.net/wiki/spaces/OpenOPS/pages/43385371/edX+Analytics+Installation

and running the following remote task:

remote-task --host localhost --repo https://github.com/edx/edx-analytics-pipeline --user ubuntu --override-config $HOME/edx-analytics-pipeline/config/devstack.cfg --wheel-url http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise --remote-name analyticstack --wait TotalEventsDailyTask --interval 2015 --output-root hdfs://localhost:9000/output/ --local-scheduler

I got this working on a previous EC2 instance, but had to terminate that server after breaking some things on it. On my new EC2 instance, the same command now produces the following error:

2021-04-15 17:56:27,750 ERROR 5809 [luigi-interface] worker.py:213 - [pid 5809] Worker Worker(salt=227197670, workers=1, host=ec2-ip-address-here, username=hadoop, pid=5809, sudo_user=ubuntu) failed    TotalEventsDailyTask(source=["hdfs://localhost:9000/data/"], interval=2015, expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d, output_root=hdfs://localhost:9000/output/)
Traceback (most recent call last):
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 781, in run
    self.job_runner().run_job(self)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hadoop.py", line 625, in run_job
    luigi.contrib.hdfs.HdfsTarget(output_hadoop).move_dir(output_final)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/target.py", line 150, in move_dir
    self.fs.rename_dont_move(self.path, path)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/abstract_client.py", line 55, in rename_dont_move
    return super(HdfsFileSystem, self).rename_dont_move(path, dest)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/target.py", line 174, in rename_dont_move
    self.move(path, dest)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 93, in move
    self.mkdir(parent_dir)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 166, in mkdir
    self.call_check(cmd)
  File "/var/lib/analytics-tasks/analyticstack/venv/src/luigi/luigi/contrib/hdfs/hadoopcli_clients.py", line 68, in call_check
    raise hdfs_error.HDFSCliError(command, p.returncode, stdout, stderr)
HDFSCliError: Command ['/edx/app/hadoop/hadoop/bin/hadoop', 'fs', '-mkdir', '-p', 'hdfs://localhost:9000'] failed [exit code 255]
---stdout---

---stderr---
-mkdir: Fatal internal error
java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2207)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1300)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
        at org.apache.hadoop.fs.shell.Mkdir.processNonexistentPath(Mkdir.java:73)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
------------
2021-04-15 17:56:27,761 INFO 5809 [luigi-interface] worker.py:501 - Informed scheduler that task   TotalEventsDailyTask__Y_m_d_0_w_2_d_0_h_0_m__2015_f795eaa25e   has status   FAILED
2021-04-15 17:56:27,810 INFO 5809 [luigi-interface] worker.py:401 - Worker Worker(salt=227197670, workers=1, host=ip-172-31-68-208, username=hadoop, pid=5809, sudo_user=ubuntu) was stopped. Shutting down Keep-Alive thread
2021-04-15 17:56:27,812 INFO 5809 [luigi-interface] interface.py:208 -
===== Luigi Execution Summary =====

Scheduled 2 tasks of which:
* 1 present dependencies were encountered:
    - 1 PathSelectionByDateIntervalTask(source=["hdfs://localhost:9000/data/"], interval=2015, expand_interval=0 w 2 d 0 h 0 m 0 s, pattern=[".*tracking.log.*"], date_pattern=%Y%m%d)
* 1 failed:
    - 1 TotalEventsDailyTask(...)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

Connection to localhost closed.
Exiting with status = 30
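
The failing step can be reproduced by hand outside of Luigi, which might help narrow this down. Here is a rough set of checks for the new instance, assuming the default analyticstack layout (the hadoop user and the /edx/app/hadoop path come from the traceback above; adjust if yours differ):

sudo su - hadoop
# Confirm the NameNode and DataNode processes are actually running
jps
# Confirm HDFS answers at all on this instance
/edx/app/hadoop/hadoop/bin/hadoop fs -ls hdfs://localhost:9000/
# The exact command Luigi ran: note there is no path after the port,
# which appears to be what trips the NullPointerException in fixRelativePart
/edx/app/hadoop/hadoop/bin/hadoop fs -mkdir -p hdfs://localhost:9000
# The same mkdir with an explicit path should succeed if the NameNode is healthy
/edx/app/hadoop/hadoop/bin/hadoop fs -mkdir -p hdfs://localhost:9000/output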

Hi @c3ho, I'm getting the same error. Have you managed to fix it yet? If so, how?

Try this: "hadoop namenode -format"
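
In case it helps anyone else hitting this, a rough sketch of the full sequence around that command (run as the hadoop user; the sbin path is a guess based on the /edx/app/hadoop/hadoop location in the traceback, and the local tracking-log path is just a placeholder, so adjust both for your install). Note that formatting the namenode wipes all HDFS metadata, so anything already stored in HDFS will be lost:

# Stop HDFS before reformatting
/edx/app/hadoop/hadoop/sbin/stop-dfs.sh
# Reformat the namenode (destroys existing HDFS contents)
hadoop namenode -format
# Start HDFS again
/edx/app/hadoop/hadoop/sbin/start-dfs.sh
# Recreate the directories the pipeline expects
hadoop fs -mkdir -p hdfs://localhost:9000/data hdfs://localhost:9000/output
# Re-upload your tracking logs (replace the local path with wherever yours live)
hadoop fs -put /path/to/tracking.log* hdfs://localhost:9000/data/
# Then re-run the remote-task command from the original post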