The ANALYZE TABLE command is available in Spark 3.5. It collects statistics for a specific table, and the ANALYZE TABLES variant does the same for every table in a schema. The query optimizer uses these statistics (table size, row count and, optionally, column-level statistics) to choose better query plans. Whether the statistics are actually collected, and where they show up, depends on your configuration and on the type of table you are analyzing.
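As a rough illustration of the statement forms involved, here is a minimal sketch that builds the SQL text. The helper names `analyze_table_sql` and `analyze_schema_sql` and the table name `sales.orders` are hypothetical, chosen only for this example; the statement syntax follows the Spark SQL ANALYZE TABLE grammar.

```python
def analyze_table_sql(table, columns=None, noscan=False):
    """Build an ANALYZE TABLE statement (hypothetical helper).

    columns: None for table-level stats only, the string "all" for
    FOR ALL COLUMNS, or a list of column names for FOR COLUMNS.
    noscan:  if True, collect only the size in bytes without scanning rows.
    """
    stmt = f"ANALYZE TABLE {table} COMPUTE STATISTICS"
    if noscan:
        return stmt + " NOSCAN"
    if columns == "all":
        return stmt + " FOR ALL COLUMNS"
    if columns:
        return stmt + " FOR COLUMNS " + ", ".join(columns)
    return stmt


def analyze_schema_sql(schema):
    """Build an ANALYZE TABLES statement covering every table in a schema."""
    return f"ANALYZE TABLES IN {schema} COMPUTE STATISTICS"


# Usage, assuming an existing SparkSession named `spark`:
#   spark.sql(analyze_table_sql("sales.orders", columns="all"))
#   spark.sql(analyze_schema_sql("sales"))
```

Column-level statistics require a full scan and can be expensive on large tables, so it may be worth starting with the table-level form.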
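In your case, if you are not seeing the expected statistics after running ANALYZE TABLE, the cause may be a limitation in how statistics are collected or exposed in your Spark environment. Keep in mind that the statistics from ANALYZE TABLE are read when the query is planned, while Adaptive Query Execution (AQE) re-optimizes the plan using runtime statistics from completed stages. Stale or missing table statistics can therefore give AQE a poor starting plan, even though AQE itself does not recompute them.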
To keep the statistics useful to the optimizer and AQE, run ANALYZE TABLE again after any significant change to the table, such as a MERGE operation. On Databricks, enabling predictive optimization for Unity Catalog managed tables can also help, because it runs ANALYZE automatically and keeps the statistics up to date.
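The "re-analyze after a MERGE" advice can be sketched as a small wrapper. This is only an illustration under stated assumptions: `merge_then_analyze` is a hypothetical helper, `spark` is assumed to be an existing SparkSession (anything with a `sql` method works), and the MERGE statement is whatever you already run.

```python
def merge_then_analyze(spark, merge_sql, table):
    """Run a MERGE, then refresh statistics for the target table.

    Hypothetical helper: `spark` is assumed to be a SparkSession and
    `merge_sql` a complete MERGE INTO statement targeting `table`.
    Returns the statements that were executed, in order.
    """
    analyze_sql = f"ANALYZE TABLE {table} COMPUTE STATISTICS"
    executed = []
    for stmt in (merge_sql, analyze_sql):
        spark.sql(stmt)  # runs eagerly for DDL/DML commands
        executed.append(stmt)
    return executed
```

If predictive optimization is enabled for the table, the explicit ANALYZE step may be redundant; the sketch is meant for setups where statistics are maintained manually.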
If statistics are still missing, check the settings that control statistics usage and AQE in your Spark setup (for example spark.sql.cbo.enabled and spark.sql.adaptive.enabled), and confirm that ANALYZE TABLE actually computed the statistics, for instance by looking for the Statistics entry in the output of DESCRIBE TABLE EXTENDED.
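One way to run these checks together is sketched below. It assumes `spark` is an existing SparkSession; `stats_report` is a hypothetical helper name. It also assumes that DESCRIBE TABLE EXTENDED reports collected statistics in a row whose `col_name` is `Statistics`, which is how open-source Spark presents them; your platform or table format may expose them differently.

```python
def stats_report(spark, table):
    """Summarize optimizer settings and table-level statistics.

    Hypothetical helper. Returns a dict with the two config values and
    the Statistics string from DESCRIBE TABLE EXTENDED (None if absent).
    """
    conf_keys = ["spark.sql.cbo.enabled", "spark.sql.adaptive.enabled"]
    # The second argument is a fallback used if the key has no value.
    report = {key: spark.conf.get(key, "unset") for key in conf_keys}

    rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table}").collect()
    # Assumption: a row named "Statistics" carries e.g. "<n> bytes, <m> rows".
    stats = [r["data_type"] for r in rows if r["col_name"] == "Statistics"]
    report["table_statistics"] = stats[0] if stats else None
    return report
```

A `table_statistics` value of `None` suggests that ANALYZE TABLE has not produced table-level statistics for that table, or that they are stored somewhere this command does not show.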