Elasticsearch: maximum shards open

Tutor maple.1, an almost empty installation with only the demo course installed.
When running 'tutor local quickstart' I get this error:

Creating index 'person_20220523_120624'
2022-05-23 12:06:24,175 WARNING 24 [elasticsearch] /openedx/venv/lib/python3.8/site-packages/elasticsearch/connection/base.py:293 - PUT http://elasticsearch:9200/person_20220523_120624 [status:400 request:0.012s]
2022-05-23 12:06:24,175 WARNING 24 [elasticsearch] /openedx/venv/lib/python3.8/site-packages/elasticsearch/connection/base.py:293 - PUT http://elasticsearch:9200/person_20220523_120624 [status:400 request:0.012s]
Traceback (most recent call last):
  File "./manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/openedx/discovery/course_discovery/apps/edx_elasticsearch_dsl_extensions/management/commands/update_index.py", line 87, in handle
    self._update(models, options)
  File "/openedx/discovery/course_discovery/apps/edx_elasticsearch_dsl_extensions/management/commands/update_index.py", line 102, in _update
    alias, new_index_name = self.prepare_backend_index(index)
  File "/openedx/discovery/course_discovery/apps/edx_elasticsearch_dsl_extensions/management/commands/update_index.py", line 231, in prepare_backend_index
    registered_index.create(using=backend)
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch_dsl/index.py", line 279, in create
    return self._get_connection(using).indices.create(
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/client/indices.py", line 123, in create
    return self.transport.perform_request(
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/transport.py", line 381, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/openedx/venv/lib/python3.8/site-packages/elasticsearch/connection/base.py", line 330, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'validation_exception', 'Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;')
ERROR: 1
Error: Command failed with status 1: docker-compose -f /home/eazaika/.local/share/tutor/env/local/docker-compose.yml -f /home/eazaika/.local/share/tutor/env/local/docker-compose.prod.yml --project-name tutor_local -f /home/eazaika/.local/share/tutor/env/local/docker-compose.jobs.yml run --rm discovery-job sh -e -c make migrate
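
As far as I can tell, the failing update_index call is the last step of the discovery init script, which in this setup runs roughly these commands: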

# Development partners
./manage.py create_or_update_partner  \
  --site-id 1 \
  --site-domain discovery.domain:8381 \
  --code dev --name "Open edX - development" \
  --lms-url="http://lms:8000" \
  --studio-url="http://cms:8000" \
  --courses-api-url "http://domain:8000/api/courses/v1/" \
  --organizations-api-url "http://domain:8000/api/organizations/v1/"

# Production partner
./manage.py create_or_update_partner  \
  --site-id 2 \
  --site-domain discovery.domain \
  --code openedx --name "Open edX" \
  --lms-url="http://lms:8000" \
  --studio-url="http://cms:8000" \
  --courses-api-url "https://domain/api/courses/v1/" \
  --organizations-api-url "https://domain/api/organizations/v1/"

./manage.py refresh_course_metadata --partner_code=openedx
./manage.py update_index --disable-change-limit

I found this recipe for increasing the shard limit:

curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "3000" } }'

but what I really need to know is why I get this error in the first place.
Besides, running that curl from the host fails with: Failed to connect to localhost port 9200: ERR_CONNECTION_REFUSED.
Maybe cluster.max_shards_per_node needs to be set in elasticsearch.yml instead, but how?
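
The connection refused is most likely just because a default Tutor install does not publish port 9200 on the host (an assumption about this setup); the same call works from a shell inside the elasticsearch container:

# open a shell inside the running elasticsearch container
tutor local exec elasticsearch bash
# then apply the setting from inside, where port 9200 is reachable
curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.max_shards_per_node": "3000"}}'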

Inside the container, I found plenty of indices:

[elasticsearch@4274dbe558c9 ~]$ curl -s -XGET http://localhost:9200/_cat/indices
yellow open comments_20220512151748907        APwXnVx5QPyOs9wXUR-zDA 1 1 0 0   208b   208b
yellow open comment_threads_20220420110119812 uKVfT362QoWvNaBaehVm1Q 1 1 0 0   208b   208b
yellow open comment_threads_20220421140723216 EsfQXu8eTlGdzBS6iqfHOA 1 1 0 0   208b   208b
yellow open comments_20220518093920630        2jGwAbOqTm6tBHI5Zc726g 1 1 0 0   208b   208b
yellow open course_run_20220516_133057        1JkP7PQsSI6P2iCYiHvskQ 1 1 0 0   208b   208b
yellow open course_run_20220505_094828        uSuSqAARQ8SElHOeo3CRyw 1 1 0 0   208b   208b
yellow open comments_20220516125225845        LxLdChFyR5-BJ1S1yDsncg 1 1 0 0   208b   208b
yellow open comments_20220512110921356        Gl0s-T7TRCWzsGUUwBppqw 1 1 0 0   208b   208b
yellow open course_20220516_133057            kI83gde5SC2ZbAz3cRhPvA 1 1 1 0 14.4kb 14.4kb
...
yellow open course_run_20220420_113413        ZjxMtSfXQRi8elPyvIVMFg 1 1 0 0   208b   208b
yellow open course_run_20220511_112155        cxN3XB8bTuSz-zLNy0jlvw 1 1 0 0   208b   208b
yellow open course_20220512_114458            1-sFkSozRR2ObE-mePi7tg 1 1 1 0 14.4kb 14.4kb
yellow open comment_threads_20220505114159155 ncMxyeIqTwiTejhkgE8Z8g 1 1 0 0   208b   208b
yellow open course_20220420_113412            hhCp0kSXTx-SiDsTinU-rg 1 1 1 0 14.4kb 14.4kb
yellow open course_20220420_092450            XwEdHVm2S3CaY2r64M1zng 1 1 1 0 14.4kb 14.4kb
...
yellow open course_20220414_110636            CFBDHu2wSty-qQ1fNv-pZQ 1 1 1 0 14.4kb 14.4kb
yellow open person_20220505_133544            BZcTxXYDT5mSrjq8slh1yQ 1 1 0 0   208b   208b
yellow open course_20220414_131656            -f7S8i7DROCjTOHfVJ3MPg 1 1 1 0 14.4kb 14.4kb
yellow open comments_20220210110138624        9FYxtr2KSpedQYEpVhBBWA 1 1 0 0   208b   208b
...
yellow open program_20220518_093907           MplZ3lm2RBSIrsXUh9CofA 1 1 0 0   208b   208b
yellow open program_20220421_140709           qCcsU7jSTd6N7ViQjQHfLw 1 1 0 0   208b   208b
yellow open course_run_20220428_154302        KjUkUBRyR3is8-kiSgJgTQ 1 1 0 0   208b   208b
...
yellow open course_20220421_154534            JNVzpDH6Q_WKZ7m0F84ShA 1 1 1 0 14.4kb 14.4kb
yellow open program_20220512_132120           wYIYZUl0Qdm6RGGis2AhxQ 1 1 0 0   208b   208b
yellow open comments_20220516134927127        TX5iPZ85R1efxRxLkNTAOQ 1 1 0 0   208b   208b
yellow open course_run_20220428_124239        _SWpVWPiShCX656JKgJkng 1 1 0 0   208b   208b

And 500 active shards (plus another 500 unassigned):

[elasticsearch@4274dbe558c9 ~]$ curl -XGET 'http://localhost:9200/_cluster/health'
{
  "cluster_name" : "openedx",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 500,
  "active_shards" : 500,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 500,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
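
Those numbers add up exactly to the limit from the error message: 500 active primaries plus 500 unassigned replicas make 1000 open shards, because unassigned replicas still count against cluster.max_shards_per_node. A quick sanity check is to count the lines of the standard _cat/shards output:

curl -s 'http://localhost:9200/_cat/shards' | wc -l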

I have no idea why a 1-node cluster accumulates this many indices and runs out of open shards. Judging by the timestamped names, every run of update_index creates a fresh set of indices and nothing cleans up the old ones, and each index comes with a primary plus an (unassignable) replica shard.

So, the status is yellow, which means only replica shards are affected (the primary data is intact).
We could find all the UNASSIGNED shards and delete their indices, but that is only a temporary fix.
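
They can be listed with the standard _cat/shards API:

curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED
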
The next step of the investigation is to ask the cluster why a shard is unassigned:

curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "course_20220421_065830",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-07-05T07:29:12.391Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "Brs64_C8TwyvSBvm5Imyxw",
      "node_name" : "718d083f80fb",
      "transport_address" : "local ip:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8316706816",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[course_20220421_065830][0], node[Brs64_C8TwyvSBvm5Imyxw], [P], s[STARTED], a[id=gofKLYQRSJSHCKBzYNiFgg]]"
        }
      ]
    }
  ]
}

The same_shard decider explains it: this node already holds the primary copy of the shard, and a replica can never be allocated to the same node as its primary. Since this is a single-node cluster, we need to reduce the number of replicas to 0 (with possible data-loss and performance trade-offs on a real multi-node setup, but it makes the cluster health green):

curl -s -XPUT 'http://localhost:9200/_settings' -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":0}}'

Gotcha!

curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "openedx",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 500,
  "active_shards" : 500,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
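
Note that the _settings call above only changes indices that already exist; indices created later will still default to one replica and reintroduce the same problem. Assuming Elasticsearch 7.8+ (which supports composable index templates), a catch-all template can make zero replicas the default for future indices too; the template name and wildcard pattern below are my own choice:

curl -s -XPUT 'http://localhost:9200/_index_template/zero_replicas' \
  -H 'Content-Type: application/json' \
  -d '{"index_patterns": ["*"], "priority": 0, "template": {"settings": {"number_of_replicas": 0}}}'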

Besides, to keep things clean, we can rebuild the forum search indices from scratch (if the indexed data doesn't matter):

tutor local run forum bash
bin/rake search:rebuild_indices
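
This only rebuilds the forum (comments and comment_threads) indices; the discovery indices can presumably be rebuilt by re-running the update_index command that failed in the first place:

tutor local run discovery ./manage.py update_index --disable-change-limit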
