NullPointerException in RegionSizeCalculator when using Hive Warehouse Connector on HDInsight (5.1) / Azure Spark

ECSEKI Tibor 5 Reputation points
2025-06-23T13:43:11.2466667+00:00

Hello,

I'm encountering a NullPointerException when running a Spark job on Azure HDInsight (Spark with Hive Warehouse Connector). The job fails during input split generation from an HBase table. The relevant portion of the log is:

25/06/01 06:28:18 ERROR ApplicationMaster [Driver]: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException: null
	at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.getRegionServersOfTable(RegionSizeCalculator.java:104)
	at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.init(RegionSizeCalculator.java:79)
	at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.<init>(RegionSizeCalculator.java:61)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRegionSizeCalculator(TableInputFormatBase.java:605)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.oneInputSplitPerRegion(TableInputFormatBase.java:292)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:255)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:254)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:160)

Environment details:

  • Cluster type: Azure HDInsight 5.1
  • Spark version: 3.3.1.5.1.7.7
  • Hive Warehouse Connector version: 2.1.0.5.1.7.7
  • Job type: Spark job accessing HBase via Hive Warehouse Connector

Upon investigation, this issue appears to be related to the known HBase bug HBASE-28354, where RegionSizeCalculator.getRegionServersOfTable() can throw a NullPointerException if no region servers are returned due to transient issues or misconfiguration.

Follow-up Questions:

  • Is there any known workaround for this in Azure HDInsight environments?
  • When is the fix for HBASE-28354 expected to be incorporated into Azure-managed HBase or HDInsight images?
  • Are there specific versions or image updates planned that already contain this patch?

Thank you in advance for any details or recommendations.

Azure HDInsight

1 answer

  1. Chandra Boorla 14,585 Reputation points Microsoft External Staff Moderator
    2025-06-23T20:48:57.06+00:00

    @ECSEKI Tibor

    Thank you for sharing the detailed logs and highlighting the reference to HBASE-28354. Based on your environment and the stack trace, the issue you're encountering, a NullPointerException during input split generation from an HBase table, is indeed consistent with this known upstream bug.

    This typically occurs when the RegionSizeCalculator tries to retrieve region servers from HBase, but receives a null value, often due to transient service availability or misconfiguration.

    Recommended actions & workarounds

    Here are a few steps and mitigations you can try in your HDInsight 5.1 environment:

    Check HBase Region Server Health - Ensure that all HBase region servers are active and healthy. Use the HBase Master UI to check for any unassigned regions or inactive servers. You can also run hbase hbck to verify the overall consistency of your HBase setup.

    Validate Configuration Parameters - Please confirm that your Hive and HBase configuration settings are correct, especially hive.zookeeper.quorum and hive.metastore.uris. Incorrect or inconsistent values here can lead to connection issues that surface in Spark jobs run through HWC.

    Retry or Delay Job Execution - This error can sometimes result from race conditions during cluster startup. If possible, retry the job after a short delay, or add a health check step to ensure all HBase components are ready before the Spark job begins.
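The retry-with-health-check idea above could be sketched roughly as follows. This is only an illustration in plain Python: `run_with_retry`, `job`, and `is_ready` are hypothetical names, and how readiness is actually checked (for example by probing the HBase Master) is left to the caller as an assumption.

```python
import time


def run_with_retry(job, is_ready, attempts=3, delay_seconds=30.0, sleep=time.sleep):
    """Run job() once is_ready() reports True, retrying after a delay.

    job      -- hypothetical callable that launches the Spark/HBase read
    is_ready -- hypothetical callable returning True when HBase looks healthy
    sleep    -- injectable so the delay can be replaced in tests
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            if is_ready():
                return job()
            last_error = RuntimeError("HBase not ready yet")
        except Exception as exc:  # e.g. the NullPointerException surfaced by Spark
            last_error = exc
        if attempt < attempts:
            sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"job failed after {attempts} attempts") from last_error
```

A fixed delay is used here for simplicity; a growing delay may be a better fit if the cluster takes several minutes to stabilise after startup.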

    Avoid RegionSize-Based Splits (if possible) - If your workload doesn’t require input split optimization based on region size, consider using a simpler input format or adjusting your Spark logic to bypass the region size calculation.
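One possible way to bypass the region size calculation, offered as an untested sketch: upstream HBase's RegionSizeCalculator reads a boolean property, hbase.regionsizecalculator.enable, and Spark copies any spark.hadoop.* setting into the Hadoop configuration. Whether the HBase build on HDInsight 5.1 and the HWC read path honor this property is an assumption that would need to be verified on the cluster. The helper name `spark_submit_args` and the application path are hypothetical.

```python
# Property read by upstream HBase's RegionSizeCalculator (defaults to true).
REGION_SIZE_KEY = "hbase.regionsizecalculator.enable"


def spark_submit_args(app, disable_region_size=True):
    """Build a spark-submit argument list for a hypothetical application.

    When disable_region_size is True, the HBase property is passed through
    Spark's spark.hadoop.* prefix so it lands in the Hadoop configuration.
    """
    args = ["spark-submit"]
    if disable_region_size:
        args += ["--conf", f"spark.hadoop.{REGION_SIZE_KEY}=false"]
    args.append(app)
    return args


# Example: print the command line for a hypothetical job script.
print(" ".join(spark_submit_args("my_job.py")))
```

If the property takes effect, splits are still created one per region; only the size estimates used for split weighting are skipped.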

    Version Compatibility Check - You’re using:

    • Spark 3.3.1.5.1.7.7
    • Hive Warehouse Connector 2.1.0.5.1.7.7
    • HDInsight 5.1

    These versions are compatible, but we recommend staying updated with the HDInsight component versioning page in case newer images are released with important fixes.

    Fix availability in HDInsight

    As of now, the fix for HBASE-28354 has not yet been incorporated into any public HDInsight images, including version 5.1. Microsoft has not yet published an official release timeline for when this patch will be included.

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.

