Thanks for the assistance, @LeelaRajeshSayana-MSFT.
When failures happen, there tend to be several in a row. It seems to me like there's some kind of blip in the handling of the Direct Method requests on the Azure side, maybe while a config refresh or something similar happens.
My current theory is that this is an issue at the IoT Hub, because I can't see any failures in my edge module; I only see the successful calls. Judging by the 404 'Not Found' response, the IoT Hub seems to think the module doesn't exist for a short period. This response is coming from the IoT Hub, not from my module, because the request never reaches my module.
It is always 404 348, but after investigating I think the 348 is just the number of bytes in the response and nothing more insightful.
To rule out connectivity issues between our Edge Device and the IoT Hub, I've also tested on an Azure-hosted VM running as an edge device. We see exactly the same sporadic 'Not Found' failures in Direct Method calls when the load tests are pointed at it. We send around 5 Direct Method calls a second.
I didn't set a specific timeout - but since I am getting a 404, I don't expect this to be related to timeouts. If it were, I should be getting a timeout response code, not a 404, right?
Sorry - I don't understand why you're asking if I am able to call the ping Direct Method. I am able to call my actual Direct Method on the vast majority of occasions. But if it does help, yes I can call the 'ping' direct method on the edgeAgent.
We do have a robust retry mechanism implemented, but I want to find out why this is happening at relatively low load. Our system needs to be able to potentially handle hundreds of Direct Methods a second. I need to be sure it can be handled by our module and also by the IoT Hub.
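To give an idea of the kind of retry I mean, here is a minimal sketch. It is not our production code. It assumes the actual Direct Method invocation is wrapped in a function and that a 404 from the hub is surfaced as an exception; the names `invoke_with_retry` and `TransientNotFound` and the backoff values are made up for illustration.

```python
import time


class TransientNotFound(Exception):
    """Hypothetical error: the hub answered 404 for a module that does exist."""


def invoke_with_retry(invoke, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call invoke() and retry with exponential backoff on TransientNotFound.

    `invoke` is any zero-argument callable that performs one Direct Method
    call and raises TransientNotFound when the hub returns 404 (assumption).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except TransientNotFound:
            if attempt == max_attempts:
                raise  # out of attempts, surface the 404 to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A retry like this hides the problem from the caller, but each failed attempt still adds latency, which is why I want to understand the root cause rather than rely on it.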
Here's the class I'm using to invoke the Direct Methods. As you can see, it uses the azure-iot-hub Python module.
