Plupload request 408 (timeout) issue: how I diagnosed it

First, a word for anyone who does not know Plupload: it is a popular JavaScript library that handles nearly every aspect of file uploading.

Recently I came across a weird issue: users uploaded files to our server with Plupload but got a series of 408 responses, and every one of those requests carried a zero-size chunk. I divided this into two sub-issues:

  1. Why did the server respond with 408?
  2. Why was the chunk size zero?

Then I read the Plupload source code to find the retry logic. The code relevant to error retries is:

    function handleError() {
      if (retries-- > 0) {
        delay(uploadNextChunk, 1000);
      } else {
        file.loaded = offset; // reset all progress
        up.trigger('Error', {
          code : plupload.HTTP_ERROR,
          message : plupload.translate('HTTP Error.'),
          ...
        });
      }
    }

    xhr.onerror = function() {
      handleError();
    };
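To make the attempt count concrete, here is a small synchronous sketch of the same counting logic. It is only an illustration: `simulateRetries` and `attemptFails` are hypothetical names, and the real code schedules each retry through delay(uploadNextChunk, 1000) instead of calling it immediately.

```javascript
// Hypothetical simulation of the retry counter, not Plupload code.
// attemptFails(n) decides whether the n-th attempt (1-based) fails.
function simulateRetries(maxRetries, attemptFails) {
  let retries = maxRetries;
  let attempts = 0;
  let errorTriggered = false;

  function uploadNextChunk() {
    attempts++;
    if (attemptFails(attempts)) {
      handleError();
    }
  }

  function handleError() {
    if (retries-- > 0) {
      uploadNextChunk(); // the quoted code defers this call via delay(..., 1000)
    } else {
      errorTriggered = true; // stands in for up.trigger('Error', ...)
    }
  }

  uploadNextChunk();
  return { attempts, errorTriggered };
}

// Every attempt fails: one initial attempt plus three retries, then the error.
const allFail = simulateRetries(3, () => true);
console.log(allFail);

// The third attempt succeeds, so no error is triggered.
const recovers = simulateRetries(3, (n) => n < 3);
console.log(recovers);
```

With retries set to 3, a chunk that always fails is therefore sent four times in total before the error event fires.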

Simply put, when an error occurs Plupload retries uploadNextChunk() up to retries times and triggers an ‘HTTP_ERROR’ only after all of those retries have failed. The core code of uploadNextChunk() is:

    function uploadNextChunk() {
      var chunkBlob, args = {}, curChunkSize;

      if (chunkSize && features.chunks && blob.size > chunkSize) { // blob will be of type string if it was loaded in memory
        curChunkSize = Math.min(chunkSize, blob.size - offset);
        chunkBlob = blob.slice(offset, offset + curChunkSize);
      } else {
        curChunkSize = blob.size;
        chunkBlob = blob;
      }
      ...
    }

So the logic here is clear: each time, Plupload takes one slice of the Blob and sends it to the server as a single request (with the total chunk count and the chunk number attached), and the server reassembles the slices into the original file. For example, with a chunk size of 4 MB and a 4 GB file, Plupload generates 4 GB / 4 MB = 1024 POST requests to upload the whole file. I found no clues here, and all the edge-case checks looked correct.
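The chunk arithmetic above can be sketched as follows. `planChunks` is a hypothetical helper written for this post, not part of Plupload; it only mirrors the Math.min(chunkSize, blob.size - offset) calculation from the quoted code.

```javascript
// Hypothetical helper: lists the byte ranges that chunked uploading
// would cover for a given file size and chunk size (both in bytes).
function planChunks(fileSize, chunkSize) {
  const ranges = [];
  for (let offset = 0; offset < fileSize; offset += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    const curChunkSize = Math.min(chunkSize, fileSize - offset);
    ranges.push({ chunk: ranges.length, start: offset, end: offset + curChunkSize });
  }
  return ranges;
}

const MB = 1024 * 1024;
const GB = 1024 * MB;

// 4 GB file, 4 MB chunks.
const plan = planChunks(4 * GB, 4 * MB);
console.log(plan.length); // 1024

// 10 MB file, 4 MB chunks: two full chunks and one 2 MB remainder.
const sizesInMB = planChunks(10 * MB, 4 * MB).map((r) => (r.end - r.start) / MB);
console.log(sizesInMB); // [ 4, 4, 2 ]
```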

I made several attempts to reproduce the issue and eventually succeeded. Here is the process:

In my experience, many failing requests like these are caused by a network change. For instance, our code is waiting for an incoming event but cancels the operation before it arrives because of an unexpected timeout, which leaves many objects ‘undefined’ by the time the event finally arrives. So my first attempt was to interrupt the network after one chunk had uploaded successfully. Unfortunately, the interruption just fired xhr.onerror, followed by several retries. The chunk size was normal and the upload terminated after retries retries. I was unable to reproduce the issue this way.

Next I tried to reason from the bottom up. When the error happens, the chunk size is 0, and I noticed this line:

chunkBlob = blob.slice(offset, offset + curChunkSize)
Fig 1. Blob slice function

The slice function returns an ‘empty’ object if start (offset in this case) is at or beyond the size of the blob. So was offset assigned a large value somewhere? I checked every place where offset is assigned a value but could not find anything abnormal either.
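This clamping behavior of slice() is easy to check directly. A minimal sketch, assuming a runtime with a global Blob (current browsers, or Node.js 18 and later); the 10-byte blob and the variable names are made up for illustration:

```javascript
// A 10-byte blob standing in for the file being uploaded.
const blob = new Blob(["0123456789"]);

// Normal case: a slice fully inside the blob.
const inside = blob.slice(0, 4).size;

// The end is clamped to the blob size, so only 2 bytes remain.
const clampedEnd = blob.slice(8, 12).size;

// start equal to the blob size: the result is an empty Blob, not an error.
const atEnd = blob.slice(10, 14).size;

// start far beyond the blob size: still an empty Blob.
const beyondEnd = blob.slice(50, 54).size;

console.log(inside, clampedEnd, atEnd, beyondEnd); // 4 2 0 0
```

Note that slice() never throws here, which is why an out-of-range offset would show up only as a silent zero-size chunk.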

At one point during debugging, I wondered: what would happen if I deleted the file after one chunk had uploaded successfully? It should result in an error, though I did not know which one. After setting breakpoints, I finally reproduced the 408 issue with a zero-size chunk, as shown below:

Fig 2. Reproduce the issue

After tracing back, I found that the blob.slice function was responsible after all: once the user deletes the file, the File object that represents it in JavaScript has zero size, and since blob.slice simply relies on File.slice, the result is a zero-size chunk (and therefore a zero ‘filesize’ in my case). This makes the server respond with 408, and Plupload retries retries times. So, in short, I believe the issue occurs only because of an unusual user action: deleting a file while it is still uploading. Strange!
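The failure mode can be imitated with a simplified version of the branch quoted from uploadNextChunk(). This is a sketch under assumptions: `pickChunk` and `isUploadableChunk` are hypothetical names rather than Plupload APIs, the features.chunks check is omitted, and an empty Blob stands in for the deleted File on the assumption (taken from what I observed) that such a File reports size 0. It needs a runtime with a global Blob.

```javascript
// Simplified copy of the chunk-selection branch (hypothetical helper).
function pickChunk(blob, offset, chunkSize) {
  if (chunkSize && blob.size > chunkSize) {
    const curChunkSize = Math.min(chunkSize, blob.size - offset);
    return blob.slice(offset, offset + curChunkSize);
  }
  // Blob no larger than one chunk: the whole blob is sent as-is.
  return blob;
}

// Hypothetical guard an application could run before sending a chunk.
function isUploadableChunk(chunk) {
  return chunk.size > 0;
}

const chunkSize = 4;

// First chunk of an intact 10-byte file.
const intactFile = new Blob(["0123456789"]);
const firstChunk = pickChunk(intactFile, 0, chunkSize);

// Assumption: after deletion the File reports size 0, like this empty Blob.
const deletedFile = new Blob([]);
const secondChunk = pickChunk(deletedFile, 4, chunkSize);

console.log(firstChunk.size, isUploadableChunk(firstChunk));   // 4 true
console.log(secondChunk.size, isUploadableChunk(secondChunk)); // 0 false
```

Because the shrunken blob is no longer larger than chunkSize, the code takes the else branch and sends the whole (now empty) blob, which matches the zero-size requests seen on the server.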


For anyone who encounters similar issues:

  • Check what the user was actually doing when the error occurred
  • Try to reproduce the issue, then diagnose it step by step

Hope it helps!

Breathtaking interfaces and strong services together make great products
