Thank you for sharing the detailed logs and highlighting the reference to HBASE-28354. Based on your environment and the stack trace, the issue you're encountering, a NullPointerException
during input split generation from an HBase table, is indeed consistent with this known upstream bug.
This typically occurs when the RegionSizeCalculator tries to retrieve region servers from HBase but receives a null value, often due to transient service availability or misconfiguration.
Recommended actions & workarounds
Here are a few steps and mitigations you can try in your HDInsight 5.1 environment:
Check HBase Region Server Health - Ensure that all HBase region servers are active and healthy. Use the HBase Master UI to check for any unassigned regions or inactive servers. You can also run hbase hbck to verify the overall consistency of your HBase setup.
Validate Configuration Parameters - Please confirm that your Hive and HBase configuration settings are correct, especially hive.zookeeper.quorum and hive.metastore.uris. Incorrect or inconsistent values here can lead to connection issues that manifest in Spark jobs via HWC.
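One way to review these two values is to read them directly from the client configuration file. The sketch below assumes the standard Hadoop-style XML layout (property elements with name and value children). The helper names read_properties and missing_properties are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# The two properties named in the step above.
KEYS_TO_CHECK = ("hive.zookeeper.quorum", "hive.metastore.uris")


def read_properties(xml_text, names=KEYS_TO_CHECK):
    """Return {name: value or None} for the requested properties.

    Expects Hadoop-style configuration XML:
    <configuration><property><name>..</name><value>..</value></property></configuration>
    """
    root = ET.fromstring(xml_text)
    found = {name: None for name in names}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in found:
            value = prop.findtext("value")
            found[name] = value.strip() if value is not None else None
    return found


def missing_properties(properties):
    """List the property names that are absent or empty."""
    return [name for name, value in properties.items() if not value]
```

To use it, pass in the contents of your cluster's hive-site.xml (the exact path depends on your installation) and compare the returned values with the ones your Spark job is configured to use. Any name reported by missing_properties is worth checking first.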
Retry or Delay Job Execution - This error can sometimes result from race conditions during cluster startup. If possible, retry the job after a short delay, or add a health check step to ensure all HBase components are ready before the Spark job begins.
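As a rough illustration of the retry and health check idea, here is a minimal sketch. It assumes that a reachable TCP port is an acceptable first readiness signal; the function names, attempt count, and delay are hypothetical defaults rather than recommended values.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(check, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping between failures.

    Returns True as soon as check() is truthy, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

For example, wait_until_ready(lambda: port_is_open("your-zookeeper-host", 2181)) could gate the job submission. The host name is a placeholder, and 2181 is ZooKeeper's default client port, so confirm the value on your cluster. An open port only shows that the service is listening, not that all regions are assigned, so the HBase Master UI check is still worth doing.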
Avoid RegionSize-Based Splits (if possible) - If your workload doesn’t require input split optimization based on region size, consider using a simpler input format or adjusting your Spark job so that it does not go through the region-size calculation.
Version Compatibility Check - You’re using:
- Spark 3.3.1.5.1.7.7
- Hive Warehouse Connector 2.1.0.5.1.7.7
- HDInsight 5.1
These versions are compatible, but we recommend checking the HDInsight component versioning page regularly in case newer images are released with important fixes.
Fix availability in HDInsight
As of now, the fix for HBASE-28354 has not yet been incorporated into any public HDInsight images, including version 5.1. Microsoft has not yet published an official release timeline for when this patch will be included.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.