How to reduce unnecessarily high memory usage in a Databricks cluster?

Senad Hadzikic 20 Reputation points
2024-05-08T08:58:46.4433333+00:00

We are seeing unnecessarily high memory usage even when nothing is running on the cluster. When the cluster first starts, memory usage is fine, but after I run a script and it finishes executing, memory never returns to the idle (initial) state, even hours after the last execution.

[Screenshot 2024-05-08 at 10.53.08: memory utilization]

Cluster config:
[Screenshot 2024-05-08 at 10.56.09]

Some settings I tried:
[Screenshot 2024-05-08 at 10.56.41]

Spark Config:
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:G1HeapRegionSize=8M
spark.driver.extraJavaOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M -Xloggc:/databricks/driver/logs/gc.log -XX:G1HeapRegionSize=8M -XX:+ExplicitGCInvokesConcurrent
spark.memory.fraction 0.6
spark.memory.storageFraction 0.5
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 10
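To double-check that these settings actually took effect, they can be read back from the running session. A minimal sketch, assuming a Databricks Python notebook where spark is predefined; the key list simply mirrors the config above:

# Print the effective values of the memory and allocation settings above.
# spark.conf.get falls back to the supplied default when a key is unset.
for key in [
    "spark.memory.fraction",
    "spark.memory.storageFraction",
    "spark.dynamicAllocation.enabled",
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.maxExecutors",
]:
    print(key, "=", spark.conf.get(key, "<not set>"))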

Tags: Azure Databricks, Azure Startups

6 answers

  1. Ben Gislason 5 Reputation points
    2024-05-24T12:27:24.49+00:00

    I have a similar problem, and the root question does not seem to be getting answered: why is so much memory being used in the first place? I am experiencing the exact same situation with a very simple join on a 3x9 PySpark DataFrame. As soon as I click run, I see the same memory consumption shown above. The same code ran fine a week ago.

    1 person found this answer helpful.

  2. Luigi Greselin 5 Reputation points
    2024-06-18T07:47:33.34+00:00

    I am experiencing the same problem. I even switched from a 28 GB memory cluster to a 56 GB one, and it still gets completely full after the first run.

    1 person found this answer helpful.

  3. PRADEEPCHEEKATLA 90,241 Reputation points
    2024-05-10T03:29:28.78+00:00

    @Senad Hadzikic - If you want to release cached memory in your Databricks cluster without restarting the cluster itself, you can try the following steps (a combined notebook sketch for the first four follows this list):

    • Use the spark.catalog.clearCache() method to clear the data Spark has cached. It removes all cached tables and DataFrames; you can run it in a notebook cell.
    • Use the dbutils.fs.unmount() method to unmount any mounted file systems you no longer need. Mounted file systems can consume memory, so unmounting them can help free some up. You can run this method in a notebook cell.
    • Use the sync command to flush the file system buffers. You can run this command in a notebook cell.
    • Write to /proc/sys/vm/drop_caches (for example, sync; echo 3 > /proc/sys/vm/drop_caches) to drop the page cache, dentries, and inodes. This can free memory held by the operating system cache, but it requires root access, so you might need to contact your Databricks administrator to run it.
    • Consider using a different type of Databricks cluster. For example, you might try a different instance type or a different number of nodes to see if this improves memory usage.
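    A minimal combined sketch of the first four steps, assuming a Databricks Python notebook where spark and dbutils are predefined; the mount point "/mnt/example" is a hypothetical placeholder, and the drop_caches write will fail without root access:

    import subprocess

    # 1. Clear all tables/DataFrames Spark has cached.
    spark.catalog.clearCache()

    # 2. List current mounts, then unmount any you no longer need.
    #    "/mnt/example" is a hypothetical mount point.
    for mount in dbutils.fs.mounts():
        print(mount.mountPoint, "->", mount.source)
    # dbutils.fs.unmount("/mnt/example")

    # 3. Flush file system buffers on the driver node.
    subprocess.run(["sync"], check=True)

    # 4. Drop the OS page cache, dentries, and inodes. This needs root,
    #    so on most clusters it raises PermissionError unless your
    #    administrator enables it.
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
    except PermissionError as e:
        print("Could not drop OS caches:", e)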

    Note that these steps might not free up all of the memory that is being used by your Databricks cluster, but they can help free up some memory. If you are still experiencing high memory usage after trying these steps, you might need to consider opening a support ticket for further assistance.

    Hope this helps. Do let us know if you have any further queries.


  4. Alex 0 Reputation points
    2024-07-08T00:53:04.0066667+00:00

    Same problem here. I did the same as the user in an earlier answer and increased the cluster memory, but it still shows almost 100% memory utilization long after the notebook script has completed. Ironically, clearing the cache does not help; the only way to resolve it is to restart the cluster.
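    If restarting through the UI every time is a pain, the restart can be scripted. A minimal sketch using the Databricks Clusters REST API (POST /api/2.0/clusters/restart); the host, token, and cluster ID below are hypothetical placeholders:

    import requests

    # Hypothetical values: substitute your workspace URL, a personal
    # access token, and the ID of the cluster to restart.
    HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    TOKEN = "dapiXXXXXXXXXXXXXXXX"
    CLUSTER_ID = "0508-085846-abc123"

    # Restarting the cluster is the one step that reliably returns
    # memory to its initial idle state.
    resp = requests.post(
        f"{HOST}/api/2.0/clusters/restart",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID},
    )
    resp.raise_for_status()
    print("Restart requested for cluster", CLUSTER_ID)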



  5. Alex 0 Reputation points
    2024-08-05T19:59:13.4333333+00:00

    Hi @PRADEEPCHEEKATLA, is anybody at Microsoft looking into this? If not, can you point us to the right Databricks forum or contact? It's been months without a solution.

