Upload huge files with Perl’s LWP::UserAgent

Everybody who works with Perl knows LWP::UserAgent, the most widely used library for working with HTTP connections.

The library has methods that cover the most common use cases, such as GET and POST requests.
If you need something more specific, you have to set up an HTTP::Request object and pass it to LWP::UserAgent’s request method. Nothing unexpected so far.

I recently ran into problems sending a big file. It wasn’t really huge, just about 100 MB, but it was big enough to send my small VPS out of memory. The reason was that I needed to pass it as the raw POST payload to the Google Drive API, and to do that I was slurping the whole file into memory. A bad idea.

So I needed a way to pass it by file handle, or to split it into chunks somehow, to avoid the memory problem. Unfortunately the documentation for this procedure is not very clear, so I’m posting it here.

Basically, sending raw POST data is this simple:

use strict;
use LWP::UserAgent;


my $ua = LWP::UserAgent->new;

my $request = HTTP::Request->new(
  POST => 'http://example.com/upload.php'
);

$request->content('my raw content');

$ua->request($request);

As I said, this isn’t a good solution if your data is big.

The LWP::UserAgent documentation has a note on the request method:

You are allowed to use a CODE reference as content in the request object passed in. The content function should return the content when called. The content can be returned in chunks. The content function will be invoked repeatedly until it return an empty string to signal that there is no more content.

An example would have explained it better, but there is none, so it remains a bit obscure. Here we go:

use strict;
use LWP::UserAgent;


my $ua = LWP::UserAgent->new;

my $request = HTTP::Request->new(
  POST => 'http://example.com/upload.php'
);

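my ($fh, $buf);
open $fh, '<', 'my-big-file.txt' or die "Cannot open my-big-file.txt: $!";
binmode $fh;  # raw bytes, no newline translation

$request->content(sub {
  # Read up to 1 MB per call; an empty string signals the end of the content
  return sysread($fh, $buf, 1048576) ? $buf : '';
});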

$ua->request($request);

Things didn’t become much more complicated. I only defined two variables: a file handle and a buffer. Then I passed a sub as the content. If you want, you can define it elsewhere and pass the CODE reference instead, but it does not really matter.
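To illustrate the "define it elsewhere" option, here is a minimal sketch of a small factory that returns the CODE reference. The name make_chunk_reader and its parameters are made up for this example and are not part of LWP; the 1 MB default simply mirrors the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): returns a closure that hands out
# the file in chunks of at most $chunk_size bytes, then '' at end of file.
sub make_chunk_reader {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 1024 * 1024;    # default to 1 MB per chunk

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                    # raw bytes, no newline translation

    return sub {
        my $read = sysread($fh, my $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $read;
        return $read ? $buf : '';   # '' signals "no more content"
    };
}

# Usage with a request object would look like:
#   $request->content(make_chunk_reader('my-big-file.txt'));
```

Because the closure owns its file handle, nothing leaks into the surrounding scope, and the same helper can be reused for several uploads.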

LWP will call the sub for the next piece of the POST payload again and again until it returns an empty string, which signals the end of the content. Our sub sysreads a 1 MB chunk on every call, keeping memory usage low.
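One caveat: when the content is a CODE reference and the request has no Content-Length header, LWP cannot know the body size in advance and sends it with Transfer-Encoding: chunked. Not every server accepts that, so whether yours does is something to check. The sketch below sets the header from the file size on disk; build_upload_request is a hypothetical helper name, and it assumes the file does not change while it is being sent.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper (not part of LWP): builds a streaming POST request
# and declares the body size up front via Content-Length.
sub build_upload_request {
    my ($url, $path) = @_;

    my $size = -s $path;
    die "Cannot stat $path: $!" unless defined $size;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my $request = HTTP::Request->new(POST => $url);
    $request->header('Content-Length' => $size);
    $request->content(sub {
        my $read = sysread($fh, my $buf, 1048576);   # 1 MB per call
        die "Read error on $path: $!" unless defined $read;
        return $read ? $buf : '';                    # '' ends the body
    });

    return $request;
}

# Usage would look like:
#   my $response = LWP::UserAgent->new->request(
#       build_upload_request('http://example.com/upload.php', 'my-big-file.txt')
#   );
```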

That’s all. You can modify the sub as needed, for example to read the file in text mode or to do something else. It’s up to you.
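As one example of "something else", a content sub can be wrapped to report upload progress. This is only a sketch: with_progress and its callback argument are hypothetical names, and $reader stands for any content sub like the one passed to content above.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of LWP): takes any content callback and
# returns a new one that reports the running byte count after each chunk.
sub with_progress {
    my ($reader, $on_progress) = @_;
    my $sent = 0;

    return sub {
        my $chunk = $reader->();
        $chunk = '' unless defined $chunk;       # normalise undef to ''
        if (length $chunk) {
            $sent += length $chunk;
            $on_progress->($sent);               # total bytes handed to LWP so far
        }
        return $chunk;
    };
}

# Usage would look like:
#   $request->content(with_progress($reader, sub {
#       printf STDERR "%d bytes sent\n", $_[0];
#   }));
```

Note that the count reflects what has been handed to LWP, not what the server has acknowledged.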

 
