SAP Knowledge Base Article - Preview

3086497 - Solr collections have all replicas in state "DOWN"

Symptom

This was observed on a Solr 8.4 installation with 4 Solr nodes and collections with 4 or more replicas.

Some collections had all of their replicas in state "DOWN", which made it impossible to retrieve any data from the affected indexes.

Each restart of the Solr pods changes the situation, but it never returns to normal: some collections become available after the restart, while others go down.

Solr logs show that ZooKeeper has no information about the leader for a given collection (the ZooKeeper data node is empty):

2021-01-14 14:08:29.352 ERROR (zkCallback-8-thread-1072) [c:master_electronics_Product_flip s:shard1 r:core_node10 x:master_electronics_Product_flip_shard1_replica_n9] o.a.s.c.SyncStrategy Sync Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 99

The crucial part of this entry:

  • Sync Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 99


Also, there are recurring entries saying:

2021-01-14 14:08:26.735 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.u.PeerSync PeerSync: core=master_electronics_Product_flop_shard1_replica_n11 url=http://solr-1.solr.default.svc.cluster.local:8983/solr  Received 0 versions from http://solr-3.solr.default.svc.cluster.local:8983/solr/master_electronics_Product_flop_shard1_replica_n15/ fingerprint:null
2021-01-14 14:08:26.735 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.u.PeerSync PeerSync: core=master_electronics_Product_flop_shard1_replica_n11 url=http://solr-1.solr.default.svc.cluster.local:8983/solr  Received 98 versions from http://solr-0.solr.default.svc.cluster.local:8983/solr/master_electronics_Product_flop_shard1_replica_n1/ fingerprint:null
2021-01-14 14:08:26.735 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.u.PeerSync PeerSync: core=master_electronics_Product_flop_shard1_replica_n11 url=http://solr-1.solr.default.svc.cluster.local:8983/solr  Our versions are too old. ourHighThreshold=1688866528692273152 otherLowThreshold=1688866843678212096 ourHighest=-1688866529678983168 otherHighest=-1688866846729568256
2021-01-14 14:08:26.735 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.u.PeerSync PeerSync: core=master_electronics_Product_flop_shard1_replica_n11 url=http://solr-1.solr.default.svc.cluster.local:8983/solr DONE. sync failed
2021-01-14 14:08:26.735 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
2021-01-14 14:08:26.738 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2021-01-14 14:08:26.741 INFO  (zkCallback-8-thread-1072) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
2021-01-14 14:08:26.741 WARN  (updateExecutor-5-thread-93-processing-n:solr-1.solr.default.svc.cluster.local:8983_solr x:master_electronics_Product_flop_shard1_replica_n11 c:master_electronics_Product_flop s:shard1 r:core_node12) [c:master_electronics_Product_flop s:shard1 r:core_node12 x:master_electronics_Product_flop_shard1_replica_n11]


The crucial parts of these entries:

  • sync failed
  • Leader's attempt to sync with shard failed, moving to the next candidate
  • There may be a better leader candidate than us - going back into recovery
  • No version found for ephemeral leader parent node, won't remove previous leader registration.
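To see which collections and shards are affected, the messages above can be counted per collection in a Solr log file. The following is a minimal sketch, not an official tool: the marker strings and the "[c:... s:... r:... x:...]" context format are copied from the excerpts in this article, while the function name scan and the file name solr.log are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Substrings copied from the log excerpts in this article (matching is case-sensitive).
MARKERS = (
    "Sync Failed:java.lang.IndexOutOfBoundsException",
    "sync failed",
    "Leader's attempt to sync with shard failed",
    "There may be a better leader candidate than us",
    "No version found for ephemeral leader parent node",
)

# Logging context as it appears in the excerpts:
# [c:<collection> s:<shard> r:<replica> x:<core>]
CONTEXT = re.compile(r"\[c:(?P<collection>\S+) s:(?P<shard>\S+) r:\S+ x:\S+\]")


def scan(lines):
    """Count marker occurrences per (collection, shard).

    Lines without a logging context are skipped.
    """
    hits = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = CONTEXT.search(line)
        if match is None:
            continue
        key = (match.group("collection"), match.group("shard"))
        for marker in MARKERS:
            if marker in line:
                hits[key][marker] += 1
    return hits


def report(hits):
    """Print one block per affected (collection, shard)."""
    for (collection, shard), counts in sorted(hits.items()):
        print(f"{collection} / {shard}")
        for marker, count in counts.items():
            print(f"  {count:5d}  {marker}")
```

For example, with a hypothetical log file named solr.log: open the file, pass the file object to scan(), and pass the result to report(). Collections that repeatedly show the full set of markers are candidates for the leaderless state described here.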


These collections have no elected leader, so they are marked as unavailable and have status "DOWN".
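The leaderless state can also be checked from the cluster state rather than from the logs. The sketch below inspects the JSON returned by the Solr Collections API action CLUSTERSTATUS and lists every shard that has no active leader. It is an illustration under assumptions: the response is assumed to have the usual cluster / collections / shards / replicas layout with the leader flag reported as the string "true", the function names are hypothetical, and the base URL in the usage note is taken from the log excerpts above.

```python
import json
import urllib.request


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS from a Solr node, e.g. base_url = 'http://host:8983/solr'."""
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def shards_without_leader(cluster_status):
    """Return (collection, shard, replica_states) for shards with no active leader.

    Assumes each replica entry carries a "state" field and, for the leader,
    a "leader" field with the string value "true".
    """
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = list(shard.get("replicas", {}).values())
            has_leader = any(
                r.get("leader") == "true" and r.get("state") == "active"
                for r in replicas
            )
            if not has_leader:
                states = sorted(r.get("state", "unknown") for r in replicas)
                problems.append((coll_name, shard_name, states))
    return problems
```

A possible use is to call fetch_cluster_status("http://solr-0.solr.default.svc.cluster.local:8983/solr") and pass the result to shards_without_leader(). A shard reported with every replica state equal to "down" matches the symptom described in this article.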


Product

SAP Commerce 1905 ; SAP Commerce 2005

Keywords

KBA , CEC-COM-CPS-SER , Search , Problem
