Примечание
Для доступа к этой странице требуется авторизация. Вы можете попробовать войти или изменить каталоги.
Для доступа к этой странице требуется авторизация. Вы можете попробовать изменить каталоги.
This document provides an example of using Azure PowerShell to run Apache Hive queries in an Apache Hadoop on HDInsight cluster.
Примечание.
This document does not provide a detailed description of what the HiveQL statements that are used in the examples do. For information on the HiveQL that is used in this example, see Use Apache Hive with Apache Hadoop on HDInsight.
Предпосылки
Кластер Apache Hadoop в HDInsight. См. Начало работы с HDInsight на Linux.
Модуль Az для PowerShell установлен.
Выполнение запроса Hive
Azure PowerShell provides cmdlets that allow you to remotely run Hive queries on HDInsight. Internally, the cmdlets make REST calls to WebHCat on the HDInsight cluster.
The following cmdlets are used when running Hive queries in a remote HDInsight cluster:
-
Connect-AzAccount
: Authenticates Azure PowerShell to your Azure subscription. -
New-AzHDInsightHiveJobDefinition
: Creates a job definition by using the specified HiveQL statements. -
Start-AzHDInsightJob
: Sends the job definition to HDInsight and starts the job. Будет возвращен объект задания. -
Wait-AzHDInsightJob
: Uses the job object to check the status of the job. It waits until the job completes or the wait time is exceeded. -
Get-AzHDInsightJobOutput
: Used to retrieve the output of the job. -
Invoke-AzHDInsightHiveJob
: Used to run HiveQL statements. This cmdlet blocks the query completes, then returns the results. -
Use-AzHDInsightCluster
: Sets the current cluster to use for theInvoke-AzHDInsightHiveJob
command.
The following steps demonstrate how to use these cmdlets to run a job in your HDInsight cluster:
Using an editor, save the following code as
hivejob.ps1
.# Login to your Azure subscription $context = Get-AzContext if ($context -eq $null) { Connect-AzAccount } $context #Get cluster info $clusterName = Read-Host -Prompt "Enter the HDInsight cluster name" $creds=Get-Credential -Message "Enter the login for the cluster" #HiveQL #Note: set hive.execution.engine=tez; is not required for # Linux-based HDInsight $queryString = "set hive.execution.engine=tez;" + "DROP TABLE log4jLogs;" + "CREATE EXTERNAL TABLE log4jLogs(t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 'wasbs:///example/data/';" + "SELECT * FROM log4jLogs WHERE t4 = '[ERROR]';" #Create an HDInsight Hive job definition $hiveJobDefinition = New-AzHDInsightHiveJobDefinition -Query $queryString #Submit the job to the cluster Write-Host "Start the Hive job..." -ForegroundColor Green $hiveJob = Start-AzHDInsightJob -ClusterName $clusterName -JobDefinition $hiveJobDefinition -ClusterCredential $creds #Wait for the Hive job to complete Write-Host "Wait for the job to complete..." -ForegroundColor Green Wait-AzHDInsightJob -ClusterName $clusterName -JobId $hiveJob.JobId -ClusterCredential $creds # Print the output Write-Host "Display the standard output..." -ForegroundColor Green Get-AzHDInsightJobOutput ` -Clustername $clusterName ` -JobId $hiveJob.JobId ` -HttpCredential $creds
Откройте новую командную строку Azure PowerShell . Change directories to the location of the
hivejob.ps1
file, then use the following command to run the script:.\hivejob.ps1
When the script runs, you're prompted to enter the cluster name and the HTTPS/Cluster Admin account credentials. You may also be prompted to sign in to your Azure subscription.
When the job completes, it returns information similar to the following text:
Display the standard output... 2012-02-03 18:35:34 SampleClass0 [ERROR] incorrect id 2012-02-03 18:55:54 SampleClass1 [ERROR] incorrect id 2012-02-03 19:25:27 SampleClass4 [ERROR] incorrect id
As mentioned earlier,
Invoke-Hive
can be used to run a query and wait for the response. Use the following script to see how Invoke-Hive works:# Login to your Azure subscription $context = Get-AzContext if ($context -eq $null) { Connect-AzAccount } $context #Get cluster info $clusterName = Read-Host -Prompt "Enter the HDInsight cluster name" $creds=Get-Credential -Message "Enter the login for the cluster" # Set the cluster to use Use-AzHDInsightCluster -ClusterName $clusterName -HttpCredential $creds $queryString = "set hive.execution.engine=tez;" + "DROP TABLE log4jLogs;" + "CREATE EXTERNAL TABLE log4jLogs(t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/example/data/';" + "SELECT * FROM log4jLogs WHERE t4 = '[ERROR]';" Invoke-AzHDInsightHiveJob ` -StatusFolder "statusout" ` -Query $queryString
Выходные данные выглядят так:
2012-02-03 18:35:34 SampleClass0 [ERROR] incorrect id 2012-02-03 18:55:54 SampleClass1 [ERROR] incorrect id 2012-02-03 19:25:27 SampleClass4 [ERROR] incorrect id
Примечание.
For longer HiveQL queries, you can use the Azure PowerShell Here-Strings cmdlet or HiveQL script files. The following snippet shows how to use the
Invoke-Hive
cmdlet to run a HiveQL script file. The HiveQL script file must be uploaded to wasbs://.Invoke-AzHDInsightHiveJob -File "wasbs://<ContainerName>@<StorageAccountName>/<Path>/query.hql"
For more information about Here-Strings, see HERE-STRINGS.
Устранение неполадок
If no information is returned when the job completes, view the error logs. To view error information for this job, add the following to the end of the hivejob.ps1
file, save it, and then run it again.
# Print the output of the Hive job.
Get-AzHDInsightJobOutput `
-Clustername $clusterName `
-JobId $job.JobId `
-HttpCredential $creds `
-DisplayOutputType StandardError
This cmdlet returns the information that is written to STDERR during job processing.
Сводка
As you can see, Azure PowerShell provides an easy way to run Hive queries in an HDInsight cluster, monitor the job status, and retrieve the output.
Дальнейшие действия
For general information about Hive in HDInsight:
Дополнительная информация о других способах работы с Hadoop в HDInsight: