Thank you for contacting Microsoft Q&A. Below are detailed answers to the questions you raised.
You are exploring how to enable HBase Accelerated Writes in Azure HDInsight and have questions about durability, performance trade-offs, and recovery. Let’s walk through each point:
What happens if the local disk copy fails before syncing to ADLS Gen2?
If the local Write Ahead Log (WAL) copy is corrupted or fails before it syncs to ADLS Gen2, the edits it holds could be lost unless another replica survives. HDInsight reduces this risk by keeping three copies of the WAL on different managed disks, so data is lost only if all three copies fail at the same time, which is very unlikely.
Is there a possibility of data loss?
Yes, but it’s rare. Data loss would only occur if all three WAL copies fail simultaneously, which is unlikely because HDInsight automatically replicates WAL data across multiple disks.
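As a rough back-of-the-envelope illustration of why losing all three WAL copies at once is rare, the sketch below assumes each copy fails independently with the same probability during the window before the sync. That independence is a simplification, and the value 0.001 is purely illustrative, not a figure published for HDInsight or Azure managed disks.

```python
def prob_all_copies_lost(p_single: float, copies: int = 3) -> float:
    """Probability that every copy is lost, assuming independent failures.

    p_single is a hypothetical per-copy failure probability, not a
    measured Azure or HDInsight figure.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_single ** copies


if __name__ == "__main__":
    p = 0.001  # illustrative value only
    for n in (1, 2, 3):
        print(f"{n} copies: {prob_all_copies_lost(p, n):.3e}")
```

Under these assumptions each extra copy multiplies the loss probability by p again, which is why three replicas are so much safer than one. Correlated failures (for example, an outage affecting several disks together) would make the real figure higher.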
Does HDInsight rely only on local WAL?
No. It adds redundancy by replicating the WAL across three disks connected to different VMs, which improves durability.
How does recovery work?
If there’s a failure, HBase uses the WAL to replay recent updates that weren’t flushed to disk. Thanks to replication, the system can recover using these WAL copies.
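To make the replay idea concrete, here is a minimal toy model of write-ahead logging. It is a sketch for illustration only, not HBase's implementation: the class ToyRegion and all of its fields are hypothetical names, and the real system deals with sequence IDs, log splitting, and region reassignment that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    """A toy stand-in for a region; not the HBase implementation."""
    wal: list = field(default_factory=list)          # durable log of edits
    memstore: dict = field(default_factory=dict)     # in memory, lost on crash
    store_files: dict = field(default_factory=dict)  # flushed, durable data
    flushed_seq: int = 0  # highest sequence number already flushed

    def put(self, key, value):
        seq = len(self.wal) + 1
        self.wal.append((seq, key, value))  # append to the log first
        self.memstore[key] = value          # then apply the edit in memory

    def flush(self):
        # Persist in-memory edits and remember how far the log is covered.
        self.store_files.update(self.memstore)
        self.memstore.clear()
        self.flushed_seq = len(self.wal)

    def crash(self):
        # Simulate a failure: everything held only in memory disappears.
        self.memstore.clear()

    def recover(self):
        # Replay only the edits that were logged but never flushed.
        for seq, key, value in self.wal:
            if seq > self.flushed_seq:
                self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key, self.store_files.get(key))


if __name__ == "__main__":
    region = ToyRegion()
    region.put("row1", "a")
    region.flush()
    region.put("row2", "b")  # logged but not yet flushed
    region.crash()
    region.recover()
    print(region.get("row1"), region.get("row2"))
```

In this toy model the unflushed edit survives the crash only because the log itself survived, which is the reason the WAL copies are replicated.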
How does write-latency improvement compare to durability risk?
Accelerated Writes can make writes 4–10 times faster by writing the WAL to premium SSD managed disks instead of directly to cloud storage. This is great for performance, but it does introduce some durability risk if all WAL copies fail. It’s best suited for write-heavy workloads where speed is critical.
What kind of performance gains should you expect?
Expect a substantial boost: writes can be 4–10 times faster, which is especially helpful for applications that handle a lot of writes.
Is Accelerated Writes recommended for all workloads?
Not necessarily. It’s most useful for write-intensive workloads. If your app is mostly read-heavy, you probably don’t need it.
Additional Tips:
- Make sure your cluster has at least three worker nodes for durability and recovery.
- Always flush and disable your HBase tables before deleting or changing the cluster, so that edits still held only in the WAL are persisted and not lost.
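The flush-and-disable tip can be scripted. The sketch below builds the HBase shell commands `flush` and `disable` for a list of tables and, optionally, pipes them to the shell. It assumes the `hbase` command is on the PATH of the node where it runs and that the shell's non-interactive `-n` flag is available in your version; the table names "orders" and "events" are hypothetical placeholders.

```python
import subprocess


def build_flush_disable_script(tables):
    """Return HBase shell commands that flush, then disable, each table."""
    lines = []
    for name in tables:
        if "'" in name:
            raise ValueError(f"unexpected quote in table name: {name!r}")
        lines.append(f"flush '{name}'")
        lines.append(f"disable '{name}'")
    return "".join(line + "\n" for line in lines)


def run_in_hbase_shell(script):
    """Pipe the script to `hbase shell -n` (assumes `hbase` is on PATH)."""
    return subprocess.run(
        ["hbase", "shell", "-n"],
        input=script,
        text=True,
        capture_output=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical table names; print the script for review before running it.
    print(build_flush_disable_script(["orders", "events"]), end="")
```

Printing the script first and reviewing it before calling run_in_hbase_shell is a deliberate choice here, since disabling a table takes it offline for clients.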