
httpGrab.pl Documentation


NAME

httpGrab.pl


Purpose

httpGrab.pl uses the LWP library to make HTTP requests.


SCRIPT CATEGORIES

HTTP - suggested


PREREQUISITES

This script depends on both the strict and vars pragmas. The script also uses the LWP::UserAgent and Getopt::Long modules.


COREQUISITES

If the Time::HiRes module is available, it is used to generate higher-resolution timings on the time test criterion and the script timing.

If the MIME::Base64 module is available, it can be used to generate Basic HTTP authentication for a request.
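
Optional dependencies like these are commonly probed at run time. The following is a minimal sketch of that pattern, not the script's actual code; the helper name have_module and the fallback to the built-in time are assumptions made for illustration.

```perl
use strict;
use warnings;

# Return true if the named module can be loaded at run time.
# (have_module is a hypothetical helper, not part of httpGrab.pl.)
sub have_module {
    my ($module) = @_;
    return eval("require $module; 1") ? 1 : 0;
}

my $have_hires  = have_module('Time::HiRes');
my $have_base64 = have_module('MIME::Base64');

# Fall back to the one-second built-in clock when Time::HiRes is absent.
my $now = $have_hires ? \&Time::HiRes::time : sub { time };

printf "Time::HiRes: %s, MIME::Base64: %s\n",
    ($have_hires ? 'yes' : 'no'), ($have_base64 ? 'yes' : 'no');
printf "timestamp: %s\n", $now->();
```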


OSNAMES

any. Tested on MSWin32 and Linux.


Description

httpGrab.pl uses the LWP library to make HTTP requests. Although the output is normally written to STDOUT, httpGrab.pl can also be used in simple profiling and other tests.

httpGrab.pl began as a simple script to understand the LWP::UserAgent module. Although it was originally written as a throw-away script, it was useful in working on dynamic web sites. As time went on, the script was enhanced with various extra features to make it useful for the work I was doing. This accumulation of features has resulted in a script unlike any other of its type.


Usage

Usage: httpGrab.pl [options] url

  or
      httpGrab.pl  [options] -f file

Where options are any of

   -m method  perform a 'method' request instead of a GET
   -a agent-string
              provide a user-agent string
   -c cookie  provide a cookie, can be used multiple times for multiple
              cookies.
   -t content-type
              provide a content-type for POST requests
   -x proxy   provide a proxy server for the request
   -H header=value
              provide header data for the request, can be used multiple
              times, argument is of the form header=value
   -A userid:password
              provide a userid and password string for HTTP Basic authentication
   -b         output the body of the response
   -B         output the body of the response, forced binary write
   -h         output the headers of the response
   -r         output the response line
   -p [n]     profiling, return the time taken in seconds, supports
              an optional number of repetitions
   -n         number of repetitions, when profiling (deprecated)
   -f file    load urls from a file instead of from command line
   -s         simple request, do not follow redirects

The options are separated into four groupings. The first set defines the request. The second set defines what parts of the response are printed. The third set specifies profiling options. The last option specifies a file containing a list of URLs to request.
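
For readers curious how an option set like this maps onto Getopt::Long, here is a hedged sketch. The specification strings, the parse_args helper, and the choice of bundling are assumptions inferred from the usage text above, not taken from the script itself.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Bundling lets -rhb stand for -r -h -b; no_ignore_case keeps -h/-H,
# -a/-A and -b/-B distinct. Both settings are assumptions.
Getopt::Long::Configure(qw(bundling no_ignore_case));

# parse_args is a hypothetical helper: returns the option hash and leftover args.
sub parse_args {
    my @args = @_;
    my %opt = (m => 'GET');              # GET is the documented default method
    GetOptionsFromArray(
        \@args, \%opt,
        'm=s', 'a=s', 'c=s@', 't=s', 'x=s', 'H=s@', 'A=s',   # request
        'b', 'B', 'h', 'r',                                  # response
        'p:i', 'n=i',                                        # profiling
        'f=s', 's',
    ) or die "bad options\n";
    return (\%opt, @args);
}

my ($opt, @urls) = parse_args(
    qw(-m HEAD -c a=1 -c b=2 -rh -p 5 http://www.example.com/)
);

printf "method=%s cookies=%d repeat=%s urls=%s\n",
    $opt->{m}, scalar @{ $opt->{c} || [] }, $opt->{p}, "@urls";
```

The 'p:i' form makes the repetition count optional, and 'c=s@' and 'H=s@' collect repeated options into arrays.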

Request Options

The -m option allows the user to specify the HTTP method to use on this request. The default is 'GET'. If the method specified is 'POST', httpGrab.pl will retrieve the body of the request from STDIN.

The -a option specifies a user agent identification. This is particularly useful for pages that have different behavior for different browsers. The default user-agent string is 'httpGrab/0.92'.

The -c option specifies a cookie to be sent with the request. The -c option may be supplied multiple times to send multiple cookies. The value of the argument for this option is of the form 'name=value'.
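
As an illustration of how repeated -c values could end up on the wire, the sketch below joins them into a single Cookie header. The cookie values and URL are hypothetical, and the script may assemble the header differently.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical values, as they might arrive from repeated -c options.
my @cookies = ('session=abc123', 'lang=en');

my $request = HTTP::Request->new(GET => 'http://www.example.com/');

# Multiple cookies travel in one Cookie header, separated by "; ".
$request->header(Cookie => join('; ', @cookies)) if @cookies;

print $request->as_string;
```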

The -t option specifies the content-type header for a 'POST' request. The default is 'application/x-www-form-urlencoded'. This content type is ignored unless the specified method is 'POST'.
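
The interplay between -m and -t can be sketched as follows. This is an assumption-laden illustration: build_request is a hypothetical helper, the URL and body are made up, and the body is passed in directly rather than read from STDIN so the example runs without input.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a request; only a POST receives a body and a content type.
# (build_request is a hypothetical helper, not part of httpGrab.pl.)
sub build_request {
    my ($method, $url, $content_type, $body) = @_;
    $method = uc $method;
    my $request = HTTP::Request->new($method => $url);
    if ($method eq 'POST') {
        $request->content_type($content_type || 'application/x-www-form-urlencoded');
        $request->content(defined $body ? $body : '');
    }
    return $request;
}

# In the script the body would come from STDIN, for example:
#   my $body = do { local $/; <STDIN> };
my $post = build_request('POST', 'http://www.example.com/form', undef, 'name=value&x=1');
my $get  = build_request('GET',  'http://www.example.com/', 'text/xml');

print $post->as_string;
```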

The -x option specifies a proxy server to use when attempting to reach the specified URLs.
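
On the LWP side, the -a and -x values would most likely be handed to the user agent object. A minimal sketch, assuming a hypothetical proxy address and leaving out the actual network call:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# -a: identify the client; this is the documented default string.
$ua->agent('httpGrab/0.92');

# -x: route http requests through a proxy (hypothetical address).
$ua->proxy('http', 'http://proxy.example.com:8080/');

# Sending is omitted so the sketch runs offline. For reference,
# $ua->request($req) follows redirects and $ua->simple_request($req) does not.
printf "agent=%s proxy=%s\n", $ua->agent, $ua->proxy('http');
```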

The -H option provides a header for the request. This option can be supplied multiple times for multiple headers. The argument for this option is of the form 'header=value'.
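
One plausible way to handle the 'header=value' form is to split at the first '=' only, so that values may themselves contain '='. The helper split_pair and the sample headers below are assumptions for illustration.

```perl
use strict;
use warnings;
use HTTP::Request;

# Split "header=value" at the first "=" only.
# (split_pair is a hypothetical helper, not part of httpGrab.pl.)
sub split_pair {
    my ($arg) = @_;
    my ($name, $value) = split /=/, $arg, 2;
    die "expected header=value, got '$arg'\n"
        unless defined $value && length $name;
    return ($name, $value);
}

my $request = HTTP::Request->new(GET => 'http://www.example.com/');

# Hypothetical -H arguments; the second value contains an "=".
for my $arg ('Accept=text/html', 'X-Token=a=b') {
    my ($name, $value) = split_pair($arg);
    $request->header($name => $value);
}

print $request->as_string;
```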

The -A option provides a userid and password for use with the HTTP Basic authentication scheme.
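
HTTP Basic authentication sends the base64 encoding of 'userid:password' in an Authorization header. The sketch below shows that step with MIME::Base64; the credentials are the well-known example pair from the Basic authentication RFC, and the URL is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request;
use MIME::Base64 qw(encode_base64);

# Example credentials in the userid:password form accepted by -A.
my $credentials = 'Aladdin:open sesame';

my $request = HTTP::Request->new(GET => 'http://www.example.com/private');

# The empty second argument suppresses the newline encode_base64 appends by default.
$request->header(Authorization => 'Basic ' . encode_base64($credentials, ''));

print $request->as_string;
```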

Response Options

By default, httpGrab.pl prints all of the response to STDOUT. This behavior can be modified through the use of one or more of the following options. To duplicate the default behavior, use the options -rhb.

The -b option causes the body of the response to be written to STDOUT.

The -h option causes the headers of the response to be written to STDOUT.

The -r option causes the response line to be written to STDOUT.

The -B option causes the body of the response to be written to STDOUT, just like the -b option, except that the output is written in binary mode. This is particularly useful for retrieving binary data that the server has misidentified, because it avoids the translation that some operating systems apply to text output.
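
The selection logic for the response options might look roughly like this. It is a sketch under stated assumptions: render is a hypothetical helper, the exact output layout of httpGrab.pl is not documented here, and a hand-built HTTP::Response stands in for a real one so the code runs offline.

```perl
use strict;
use warnings;
use HTTP::Response;

# Assemble the selected parts of a response; with no flags set, use all three.
# (render is a hypothetical helper, not part of httpGrab.pl.)
sub render {
    my ($response, %want) = @_;
    %want = (r => 1, h => 1, b => 1) unless grep { $_ } values %want;
    my $out = '';
    $out .= $response->status_line . "\n"        if $want{r};
    $out .= $response->headers->as_string . "\n" if $want{h};
    $out .= $response->content                   if $want{b} || $want{B};
    return $out;
}

my $response = HTTP::Response->new(
    200, 'OK', [ 'Content-Type' => 'text/plain' ], "hello\n",
);

my %want = (h => 1, b => 1);
binmode STDOUT if $want{B};   # raw bytes, no line-ending translation
print render($response, %want);
```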

Profiling Options

The profiling options perform only the most basic timing test. The time is measured from the beginning of the request until the response is completely returned. The profiling in httpGrab.pl does not (currently) support downloading any embedded components of the page, such as images or stylesheets.

The -p option turns on profiling and optionally supplies a number of times to repeat the request for better accuracy.

The deprecated -n option supplies a number of times to repeat the request. That ability is now provided by the -p option.
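
A repeated timing measurement of this kind can be sketched as below. The helper time_repeated is hypothetical, Time::HiRes is loaded unconditionally for brevity, and a short sleep stands in for the HTTP request; whether the script reports a total or a per-request figure is not stated here, so the sketch returns both.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run $action $count times; return total and average elapsed seconds.
# (time_repeated is a hypothetical helper, not part of httpGrab.pl.)
sub time_repeated {
    my ($count, $action) = @_;
    $count = 1 if !$count || $count < 1;
    my $start = Time::HiRes::time();
    $action->() for 1 .. $count;
    my $elapsed = Time::HiRes::time() - $start;
    return ($elapsed, $elapsed / $count);
}

# A 10 ms sleep stands in for sending a request and reading the response.
my $calls = 0;
my ($total, $average) = time_repeated(
    3, sub { $calls++; select(undef, undef, undef, 0.01) },
);

printf "%d requests, %.3f s total, %.3f s each\n", $calls, $total, $average;
```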

URL File

The -f option specifies a file from which to read a list of URLs. These URLs will be requested in order by httpGrab.pl. The only potentially surprising behavior is the interaction between -f and -p. If -p supplies a number, httpGrab.pl makes the request that many times on the first URL, then makes the same number of requests on the second URL, and so on.
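
The ordering described above can be made concrete with a small sketch. The helpers read_urls and request_plan are hypothetical, the one-URL-per-line file layout is an assumption, and an in-memory string replaces the file so the example is self-contained.

```perl
use strict;
use warnings;

# Read URLs from a filehandle, one per line (assumed), skipping blank lines.
sub read_urls {
    my ($fh) = @_;
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @urls, $line if length $line;
    }
    return @urls;
}

# Repeat each URL $repeat times before moving on to the next one.
sub request_plan {
    my ($repeat, @urls) = @_;
    $repeat = 1 if !$repeat || $repeat < 1;
    return map { my $url = $_; ($url) x $repeat } @urls;
}

my $listing = "http://www.example.com/a\n\nhttp://www.example.com/b\n";
open my $fh, '<', \$listing or die "cannot open in-memory file: $!";
my @plan = request_plan(2, read_urls($fh));
close $fh;

# Prints a, a, b, b: all repetitions of one URL finish before the next begins.
print "$_\n" for @plan;
```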


Outstanding issues

In general, httpGrab.pl works fairly well, but there are a few features I would like to add at some point.

  • Ability to do SSL.

  • Ability to retrieve embedded content and stylesheets for timing.

To my knowledge there are no bugs in the current release.