Error in reading large data from REST API

Question

Friday, May 10, 2019 6:55 AM

Hi all,

I am having difficulty reading a large response from a REST API.

In my C# code, I am using RestClient to read from the API:

=============================================

RestClient client = new RestClient(Uri);
RestRequest request = new RestRequest(Method.GET);
request.AddHeader("Accept", "application/zip");
request.AddHeader("authorization", $"apikey {apiKey}");

IRestResponse response = await Task.Run(() => client.Execute(request));
ResponseResult.RawBytes = response.RawBytes;
ResponseResult.Data = ToReferenceData(new MemoryStream(response.RawBytes));

=================================================

The REST API returns a 175 MB zip file, so the above code always fails with an OutOfMemoryException.

Any idea how to get this working?

I have tried HttpClient as well as MemoryStream, but neither worked.

Please help if anybody knows how to get around this.

Thanks,

Gopal

All replies (11)

Friday, May 10, 2019 7:24 AM

Hi Gopal_Shn,

Thank you for posting here.

Since this thread is about RestClient, which is a third-party product, you could post in the following forum.

https://groups.google.com/forum/#!forum/restsharp

**Note:** This response contains a reference to a third party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.

The Visual C# forum discusses and asks questions about the C# programming language, IDE, libraries, samples, and tools.

Best Regards,

Jack



Friday, May 10, 2019 2:04 PM

My gut instinct is that the problem is in your ToReferenceData method, which you didn't show. Who's cleaning up the stream you created around the binary data? HttpClient doesn't have this issue. We transfer files a lot larger than that using HttpClient all the time.

Michael Taylor http://www.michaeltaylorp3.net


Sunday, May 12, 2019 7:42 AM

No, Michael.

ToReferenceData() is shown below:

private ReferenceData ToReferenceData(MemoryStream zip_data)
{
    return new ReferenceData
    {
        zip_data = zip_data
    };
}

In fact, the error is in the response from the REST API itself.

The response shows an OutOfMemoryException, so the program doesn't execute any further.

Thanks!


Sunday, May 12, 2019 10:08 PM

I have no problem with HttpClient and requests of this size. Since the failure happens while the response itself is being processed, I'm going to assume the issue is in RestClient. As Jack mentioned, that is a third-party library. Please post to their GitHub site and see what they say. In the meantime you might consider switching to HttpClient to see if the problem goes away.

Michael Taylor http://www.michaeltaylorp3.net


Monday, May 13, 2019 7:24 AM

Hi,

I have switched to HttpClient now.

The problem was probably with the MemoryStream I was using to hold the response from the API.

I now receive the response as a Stream instead of a MemoryStream,

and I get the response using await client.SendAsync.

Now the problem is:

I need to return this Stream to the calling function and then copy it to blob storage.

However, I am unable to do so. When I try to return it, it says the variable is unassigned.

So I now have a stream that holds the data, but I cannot return it to the calling function.


Monday, May 13, 2019 1:34 PM

A MemoryStream is a Stream; Stream is the abstract base class. A MemoryStream either preallocates an array, dynamically grows an array, or wraps an existing array. In your case you're passing it an array, so it'll just use that. Since no memory is allocated, it wouldn't throw an exception. I suspect the issue is with how the RestClient response object handles the file.
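To make the "wraps an existing array" case concrete, here is a minimal sketch. The class and method names are made up for this illustration and are not from the original code. It changes the array after the stream has been created and checks that the stream sees the change, which can only happen if no copy was made.

```csharp
using System;
using System.IO;

public static class MemoryStreamWrapDemo
{
    // Hypothetical helper: returns true when a MemoryStream built over an
    // existing array reads from that same array rather than a private copy.
    public static bool SharesBackingArray()
    {
        var data = new byte[] { 1, 2, 3 };

        using (var stream = new MemoryStream(data))
        {
            // Modify the array after the stream has been constructed.
            data[0] = 42;

            // The stream reads the modified value, so it did not copy the bytes.
            return stream.ReadByte() == 42;
        }
    }
}
```

So wrapping a byte array costs almost nothing extra; the expensive part is whatever produced the full byte array in the first place.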

"However, unable to do so. When I try to retun this, it says unassigned variable."

We need to see the code.

Michael Taylor http://www.michaeltaylorp3.net


Tuesday, May 14, 2019 12:56 AM

Hi.

Below is my code:

===================================================================

public class APIResponse
    {
        public APIResponse() { }

        public APIResponse(MemoryStream zip_data) {
            this.api_zip_data = zip_data;
        }

        public byte[] RawBytes { get; set; }

        public Stream api_zip_data { get; set;  }

        //public Models.GTFSReferenceData gtfsData { get; set; }

        public string ErrorMessage { get; set; }

        public string StackTrace { get; set; }
    }

public class ResponseReader
{

    public async Task<APIResponse> GetData()
    {
        var ResponseResult = new APIResponse(new MemoryStream());
        HttpClient client;
        HttpRequestMessage request;
        HttpResponseMessage response;
        Stream streamToReadFrom;

        client = new HttpClient();

        request = new HttpRequestMessage(HttpMethod.Get, uri);
        request.Headers.Add("Accept", "application/zip");
        request.Headers.Add("authorization", $"apikey {apiKey}");

        response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);

        using (streamToReadFrom = await response.Content.ReadAsStreamAsync())
        {
            await streamToReadFrom.CopyToAsync(ResponseResult.api_zip_data);

        }

        return ResponseResult;

    }
}

public static class GetAPIData
    {
        [FunctionName("GetAPIData")]
        public static async Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)]HttpRequestMessage req, TraceWriter log)
        {
            try
            {
                // creating config
                var response = new APIResponse();

                // creating a repo
                var blobStorageRepo = new BlobStorageRepository(StorageAccountConnectionString);

                var DataReader = new NSWTransport.OpenData.Reader.ResponseReader($"{config.BaseUri}", config.ApiKey);
                response = await DataReader.GetData();

                /* Save to Blob */

                ZipInputStream zipData = new ZipInputStream(response.api_zip_data);
                ZipEntry zipFileData = zipData.GetNextEntry();

                while (zipFileData != null)
                {
                    using (MemoryStream unZipFileData = new MemoryStream())
                    {
                        StreamUtils.Copy(zipData, unZipFileData, new byte[4096]);
                        await blobStorageRepo.UploadByteContents("APIData", DateTime.Now.ToString("yyyyMMdd") + "/" + zipFileData.Name, unZipFileData.ToArray());

                    }

                    zipFileData = zipData.GetNextEntry();
                }

                return req.CreateResponse(HttpStatusCode.OK, "Success");

            }
            catch (Exception ex)
            {

                return req.CreateResponse(HttpStatusCode.BadRequest, ex.Message);
                // return req.CreateResponse(HttpStatusCode.BadRequest, JsonConvert.SerializeObject(ex.Message), "application / json");

            }

        }
    }

==================================================================

1) The APIResponse class has a Stream property, api_zip_data.

2) The ResponseReader class has the logic to read from the API, store the result in the Stream, and then return the response to the calling function.

3) ResponseReader is instantiated from the main class, GetAPIData, and its GetData() function is invoked.

Below are the issues:

1) If I use MemoryStream instead of Stream to read the API data, it always fails with an OutOfMemoryException.

I also tried other methods such as Readbuffer and ReadByteArray, but all of them fail with the same error.

The API I am trying to invoke returns a 180 MB zip file, which MemoryStream can't handle.

2) I am able to read the API data using Stream and the async read logic.

I validated this by downloading the file to my local laptop, and the zip was valid.

3) However, the challenge is returning this Stream to the calling function.

streamToReadFrom holds the API response data, which I need to copy to ResponseResult.api_zip_data, and then return ResponseResult to the calling function so it can unzip the data and copy it to blob storage.

But it gives an error saying api_zip_data is null.

Initially, when I created the instance of the class, I passed a MemoryStream to its constructor,

var ResponseResult = new APIResponse(new MemoryStream());

but I am not sure whether the MemoryStream is able to initialize the Stream property in the APIResponse class.

====================================================================

I hope it is clear now. How do I solve this?

I have been stuck on this for a long time now.

Please help.


Tuesday, May 14, 2019 1:26 AM

Further to the previous details:

I tested the above code, but it fails with the same memory exception when I try to copy the Stream data to the MemoryStream.

So ideally I would need to copy to a Stream.

But I have not been able to do that, since I cannot instantiate the abstract class Stream, and it always gives a null exception.


Tuesday, May 14, 2019 1:48 AM

You don't want to copy the data from a stream provided by HttpClient into a MemoryStream. There are several reasons for this.

1) You're going to have to allocate a contiguous array big enough to hold that file. Unless you really need a byte array, don't do that.

2) Copying from the HttpClient's stream to a MemoryStream will replicate the data, so a 150 MB file will take up 300 MB in memory.

3) No code should be written that requires a MemoryStream. All stream-based code uses Stream so there is no reason to do this short of being able to release the response.
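As an illustration of the third point, here is a minimal sketch of code written against the abstract Stream type. The class and method names are hypothetical. Because the parameter is a Stream, the same method accepts a MemoryStream, a FileStream, or the stream returned by ReadAsStreamAsync, and nothing ever needs to "instantiate" Stream itself. It reads through a fixed-size buffer, so memory use does not grow with the size of the data.

```csharp
using System;
using System.IO;

public static class StreamConsumer
{
    // Hypothetical helper: counts the bytes in any readable stream using a
    // small fixed buffer instead of loading the whole content into memory.
    public static long CountBytes(Stream source)
    {
        var buffer = new byte[81920];
        long total = 0;
        int read;

        // Read returns 0 once the end of the stream has been reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }

        return total;
    }
}
```

A caller could pass `new MemoryStream(someBytes)` in a unit test and a network or file stream in production without changing the method.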

In your ResponseReader code, just set the api_zip_data property to the stream provided by the HttpResponseMessage. Note however that passing streams around is tricky because the owner of APIResponse is now responsible for the lifetime of that stream. Therefore APIResponse should implement IDisposable so the caller knows to clean it up when it is done.

Ideally your code shouldn't be passing a stream around. If you intend to unzip the contents and whatnot, it might be better to have your GetData method accept the name of a temp file to store the stream in. Then you can stream the contents directly to a temp file that higher level code can use. This is especially useful when dealing with public-facing code. If this is a private helper type, then passing around a stream is OK provided you handle the lifetime issues.

public class APIResponse : IDisposable
{
    public APIResponse ( Stream data )
    {
       api_zip_data = data;
    }

    //Unclear why you need RawBytes as you shouldn't unpack the stream
    //doing so would double memory consumption
    
    public byte[] RawBytes { get; set; }

    //This is questionable, would probably not expose a setter for this
    public Stream api_zip_data { get; private set; }

    public string ErrorMessage { get; set; }

    public string StackTrace { get; set; }

    public void Dispose ()
    {
        if (api_zip_data != null)
        {
            api_zip_data.Dispose();
            api_zip_data = null;
        }
    }
}

public class ResponseReader
{
    //Pass a client that already has the URL and security stuff configured - should be set up by DI container
    //so you only create the HttpClient once
    public ResponseReader ( HttpClient client )
    {
        _client = client;
    }

    public async Task<APIResponse> GetData ( string resourceUrl, string apiKey )
    {        
        var request = new HttpRequestMessage()
        {
            Method = HttpMethod.Get,
            RequestUri = new Uri(resourceUrl),
        };
        request.Headers.Add("Accept", "application/zip");
        request.Headers.Add("authorization", $"apikey {apiKey}");

        var response = await _client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        var stream = await response.Content.ReadAsStreamAsync();
        return new APIResponse(stream);
    }

    private readonly HttpClient _client;
}
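
The temp-file alternative mentioned above could look roughly like the sketch below. This is only an outline under some assumptions: ZipDownloader, DownloadToFileAsync, ForEachEntryAsync, and the handleEntry callback are hypothetical names, and the callback stands in for whatever uploads a stream to blob storage. It also uses System.IO.Compression (ZipFile and ZipArchive) rather than the ZipInputStream shown earlier in the thread; on .NET Framework that needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

public static class ZipDownloader
{
    // Streams the response body straight to tempFilePath, so the whole
    // file is never held in memory at once.
    public static async Task DownloadToFileAsync(HttpClient client, HttpRequestMessage request, string tempFilePath)
    {
        using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            using (var body = await response.Content.ReadAsStreamAsync())
            using (var file = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write, FileShare.None, 81920, useAsync: true))
            {
                await body.CopyToAsync(file);
            }
        }
    }

    // Opens the saved zip and passes each file entry to the caller as a stream.
    // handleEntry is a hypothetical callback, for example a blob upload.
    public static async Task ForEachEntryAsync(string zipPath, Func<string, Stream, Task> handleEntry)
    {
        using (var archive = ZipFile.OpenRead(zipPath))
        {
            foreach (var entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (entry.Name.Length == 0)
                {
                    continue;
                }

                using (var entryStream = entry.Open())
                {
                    await handleEntry(entry.FullName, entryStream);
                }
            }
        }
    }
}
```

With this shape nobody has to pass a live network stream between layers: the caller picks a temp path, awaits the download, processes the entries, and deletes the file afterwards.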

Another thing to be aware of is that HttpClient is not designed to be created over and over again. You should create it once and then reuse it for the life of your app. Each instance holds its own connections to the remote server, and repeatedly creating and disposing instances can cause you to run out of sockets. Refer to MSDN for more information.
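
If you are not using a DI container, one simple way to get a single instance is a static field. The sketch below assumes that approach; ApiHttp and CreateZipRequest are hypothetical names. Per-call values such as the API key go on the HttpRequestMessage rather than on the shared client, so the client itself never needs to be reconfigured.

```csharp
using System.Net.Http;

public static class ApiHttp
{
    // One HttpClient shared by the whole application.
    public static readonly HttpClient Client = new HttpClient();

    // Builds a GET request carrying the same headers used earlier in this thread.
    public static HttpRequestMessage CreateZipRequest(string resourceUrl, string apiKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, resourceUrl);
        request.Headers.Add("Accept", "application/zip");
        request.Headers.Add("authorization", $"apikey {apiKey}");
        return request;
    }
}
```

The ResponseReader above could then be constructed with `ApiHttp.Client` wherever it is needed.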

Michael Taylor http://www.michaeltaylorp3.net


Wednesday, May 22, 2019 1:21 AM

Thanks, Michael.

I changed the entire logic and was able to download the data from the REST API.

Now the challenge is that it returns a zip file, which I need to unzip and upload to Azure Blob storage.

Although this works fine locally, it doesn't work as expected when deployed to an Azure Function App, and it doesn't report an error either.

Any idea on this?


Wednesday, May 22, 2019 1:57 AM

Not off the top of my head. I think you should post this to the Azure forums to see if there is something special you need to do. In general Azure Functions should just work with .NET code.

Michael Taylor http://www.michaeltaylorp3.net