Send Azure Databricks application logs to Azure Monitor
Note
This article relies on an open source library hosted on GitHub at: https://github.com/mspnp/spark-monitoring.
The original library supports Azure Databricks Runtimes 10.x (Spark 3.2.x) and earlier.
Databricks has contributed an updated version to support Azure Databricks Runtimes 11.0 (Spark 3.3.x) and above on the l4jv2 branch at: https://github.com/mspnp/spark-monitoring/tree/l4jv2.
Please note that the 11.0 release is not backward compatible due to the different logging systems used in the Databricks Runtimes. Be sure to use the correct build for your Databricks Runtime. The library and GitHub repository are in maintenance mode. There are no plans for further releases, and issue support will be best-effort only. For any additional questions regarding the library or the roadmap for monitoring and logging of your Azure Databricks environments, please contact [email protected].
This article shows how to send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the Azure Databricks Monitoring Library, which is available on GitHub.
Prerequisites
Configure your Azure Databricks cluster to use the monitoring library, as described in the GitHub readme.
Note
The monitoring library streams Apache Spark level events and Spark Structured Streaming metrics from your jobs to Azure Monitor. You don't need to make any changes to your application code for these events and metrics.
Send application metrics using Dropwizard
Spark uses a configurable metrics system based on the Dropwizard Metrics Library. For more information, see Metrics in the Spark documentation.
To send application metrics from Azure Databricks application code to Azure Monitor, follow these steps:
1. Build the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file as described in the GitHub readme.
2. Create Dropwizard gauges or counters in your application code. You can use the UserMetricsSystem class defined in the monitoring library. The following example creates a counter named counter1; a usage sketch appears after these steps.

```scala
import org.apache.spark.metrics.UserMetricsSystems
import org.apache.spark.sql.SparkSession

object StreamingQueryListenerSampleJob {

  private final val METRICS_NAMESPACE = "samplejob"
  private final val COUNTER_NAME = "counter1"

  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder
      .getOrCreate

    val driverMetricsSystem = UserMetricsSystems
      .getMetricSystem(METRICS_NAMESPACE, builder => {
        builder.registerCounter(COUNTER_NAME)
      })

    driverMetricsSystem.counter(COUNTER_NAME).inc(5)
  }
}
```
The monitoring library includes a sample application that demonstrates how to use the UserMetricsSystem class.
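For illustration, the following sketch (not part of the library) increments the same counter once per micro-batch of a streaming query, using Spark's built-in rate test source. It assumes only the UserMetricsSystems calls shown above; the object name and the foreachBatch wiring are hypothetical:

```scala
import org.apache.spark.metrics.UserMetricsSystems
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical job object; only the UserMetricsSystems calls mirror the example above.
object CounterPerBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()

    val driverMetricsSystem = UserMetricsSystems
      .getMetricSystem("samplejob", builder => {
        builder.registerCounter("counter1")
      })

    spark.readStream
      .format("rate") // built-in test source that emits rows continuously
      .load()
      .writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        // foreachBatch runs on the driver, where the metrics system was created.
        driverMetricsSystem.counter("counter1").inc(batch.count())
      }
      .start()
      .awaitTermination()
  }
}
```

Because the counter lives on the driver, incrementing it inside foreachBatch (which also executes on the driver) avoids the executor-side serialization problems you would hit if you tried to update it inside a map or foreach over the data itself.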
Send application logs using Log4j
To send your Azure Databricks application logs to Azure Log Analytics using the Log4j appender in the library, follow these steps:
1. Build the spark-listeners-1.0-SNAPSHOT.jar and the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR files as described in the GitHub readme.
2. Create a log4j.properties configuration file for your application. Include the following configuration properties, substituting your application package name and log level where indicated (a filled-in sketch appears after these steps):

```
log4j.appender.A1=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.A1.layout=com.microsoft.pnp.logging.JSONLayout
log4j.appender.A1.layout.LocationInfo=false
log4j.additivity.<your application package name>=false
log4j.logger.<your application package name>=<log level>, A1
```

You can find a sample configuration file here.
3. In your application code, include the spark-listeners-loganalytics project, and import com.microsoft.pnp.logging.Log4jConfiguration into your application code:

```scala
import com.microsoft.pnp.logging.Log4jConfiguration
```
4. Configure Log4j using the log4j.properties file you created in step 2:

```scala
// Load the bundled log4j.properties from your application JAR and
// pass it to the library's configuration helper.
val stream = getClass.getResourceAsStream("<path to file in your JAR file>/log4j.properties")
try {
  Log4jConfiguration.configure(stream)
} finally {
  stream.close()
}
```
5. Add Apache Spark log messages at the appropriate level in your code as required. For example, use the logDebug method to send a debug log message. For more information, see Logging in the Spark documentation. A minimal end-to-end sketch appears after these steps.

```scala
logTrace("Trace message")
logDebug("Debug message")
logInfo("Info message")
logWarning("Warning message")
logError("Error message")
```
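For illustration, here is a filled-in version of the log4j.properties file from step 2. The package name com.contoso.myapp and the DEBUG level are hypothetical placeholders; substitute your own values:

```
# Hypothetical values for illustration: replace com.contoso.myapp and DEBUG
# with your application package name and log level.
log4j.appender.A1=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender
log4j.appender.A1.layout=com.microsoft.pnp.logging.JSONLayout
log4j.appender.A1.layout.LocationInfo=false
log4j.additivity.com.contoso.myapp=false
log4j.logger.com.contoso.myapp=DEBUG, A1
```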
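And a minimal driver sketch that ties steps 3 through 5 together. It assumes log4j.properties is bundled at the root of the application JAR and that the job object mixes in Spark's org.apache.spark.internal.Logging trait to get the logDebug-style methods (as the library's sample job does); treat it as an outline rather than the library's prescribed pattern:

```scala
import com.microsoft.pnp.logging.Log4jConfiguration
import org.apache.spark.internal.Logging

// Hypothetical object name; assumes log4j.properties sits at the JAR root.
object Log4jSampleSketch extends Logging {
  def main(args: Array[String]): Unit = {
    // Configure Log4j before emitting any application log messages (step 4).
    val stream = getClass.getResourceAsStream("/log4j.properties")
    try {
      Log4jConfiguration.configure(stream)
    } finally {
      stream.close()
    }

    // Messages at or above the configured level are routed to Log Analytics
    // through appender A1 (step 5).
    logInfo("Info message")
    logDebug("Debug message")
  }
}
```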
Note
If you use the library in Apache Spark notebooks, any logs that Spark generates during execution of the notebook are automatically sent to Log Analytics.
There is a limitation for custom logging messages from Python with the Spark-configured Log4j: logs can only be sent from the driver node, because executor nodes don't have access to the Java virtual machine from Python.
Run the sample application
The monitoring library includes a sample application that demonstrates how to send both application metrics and application logs to Azure Monitor. To run the sample:
1. Build the spark-jobs project in the monitoring library, as described in the GitHub readme.
2. Navigate to your Databricks workspace and create a new job, as described in Create and run Azure Databricks Jobs.
3. In the job detail page, select Set JAR.
4. Upload the JAR file from /src/spark-jobs/target/spark-jobs-1.0-SNAPSHOT.jar.
5. For Main class, enter com.microsoft.pnp.samplejob.StreamingQueryListenerSampleJob.
6. Select a cluster that is already configured to use the monitoring library. See Configure Azure Databricks to send metrics to Azure Monitor.
When the job runs, you can view the application logs and metrics in your Log Analytics workspace.
Application logs appear under SparkLoggingEvent_CL:
```kusto
SparkLoggingEvent_CL
| where logger_name_s contains "com.microsoft.pnp"
```
Application metrics appear under SparkMetric_CL:
```kusto
SparkMetric_CL
| where name_s contains "rowcounter"
| limit 50
```
Important
After you verify the metrics appear, stop the sample application job.
Next steps
Deploy the performance monitoring dashboard that accompanies this code library to troubleshoot performance issues in your production Azure Databricks workloads.