I uploaded my tracking logs from 2018-11-10 to 2019-11-22 to Hadoop (http://localhost:9000/data/) and then ran ImportEnrollmentsIntoMysql
like this:
$ launch-task ImportEnrollmentsIntoMysql --local-scheduler --interval 2015-01-01-2019-11-22
My override.cfg:
[hive]
release = apache
database = default
warehouse_path = hdfs://localhost:9000/edx-analytics-pipeline/warehouse/
[map-reduce]
engine = hadoop
remote_log_level = INFO
[database-export]
database = reports
credentials = /edx/etc/edx-analytics-pipeline/output.json
[database-import]
database = edxapp
credentials = /edx/etc/edx-analytics-pipeline/input.json
destination = hdfs://localhost:9000/edx-analytics-pipeline/warehouse/
[map-reduce]
engine = hadoop
marker = hdfs://localhost:9000/edx-analytics-pipeline/marker/
[event-logs]
pattern = [".*tracking.log.*"]
expand_interval = 2 days
source = ["hdfs://localhost:9000/data/"]
[manifest]
threshold = 500
input_format = org.edx.hadoop.input.ManifestTextInputFormat
lib_jar = hdfs://localhost:9000/edx-analytics-pipeline/packages/edx-analytics-hadoop-util.jar
path = hdfs://localhost:9000/edx-analytics-pipeline/manifest/
[enrollments]
interval_start = 2015-01-01
interval_end = 2019-11-23
source = ["hdfs://localhost:9000/data"]
pattern = [".*"]
warehouse_path = hdfs://localhost:9000/edx-analytics-pipeline/warehouse/
overwrite_n_days = 0
n_reduce_tasks = 8
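Note that [map-reduce] appears twice in the config above. I have not confirmed how the pipeline itself parses the file, so the following is only a minimal sketch, assuming Python 3's standard configparser, of how to check what a repeated section merges into. The helper name merged_section is hypothetical, and CONFIG_TEXT is just an excerpt of my override.cfg:

```python
import configparser

# Excerpt of override.cfg containing the repeated [map-reduce] section.
CONFIG_TEXT = """
[map-reduce]
engine = hadoop
remote_log_level = INFO

[map-reduce]
engine = hadoop
marker = hdfs://localhost:9000/edx-analytics-pipeline/marker/
"""


def merged_section(text, section):
    """Return the effective key/value pairs of a section that may be repeated.

    Hypothetical helper for illustration; the pipeline's own config loader
    may behave differently.
    """
    # strict=False merges duplicate sections instead of raising
    # DuplicateSectionError; interpolation=None leaves '%' characters alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return dict(parser[section])


if __name__ == "__main__":
    for key, value in merged_section(CONFIG_TEXT, "map-reduce").items():
        print(f"{key} = {value}")
```

Under that assumption the two blocks merge into a single section carrying engine, remote_log_level and marker, so the duplication by itself should not drop any setting. A stricter parser would reject the file instead.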
As you can see, it ran successfully:
===== Luigi Execution Summary =====
Scheduled 27 tasks of which:
* 13 present dependencies were encountered:
- 1 CourseEnrollmentPartitionTask(...)
- 1 CourseEnrollmentSummaryPartitionTask(...)
- 1 CourseGradeByModePartitionTask(warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, date=2019-12-08)
- 1 CourseMetaSummaryEnrollmentPartitionTask(warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, date=2019-11-22)
- 1 CourseTableTask(warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/)
...
* 14 ran successfully:
- 1 CourseGradeByModeDataTask(...)
- 1 CourseMetaSummaryEnrollmentDataTask(...)
- 1 CourseMetaSummaryEnrollmentIntoMysql(...)
- 1 EnrollmentByBirthYearDataTask(...)
- 1 EnrollmentByBirthYearToMysqlTask(...)
...
This progress looks :) because there were no failed tasks or missing external dependencies
However, no data was inserted into the MySQL database:
mysql> select * from course_enrollment_daily;
Empty set (0.00 sec)
Can anyone help?