The setenv fiasco
Most UNIX developers are familiar with the concept of environment variables.
Basically, environment variables are a way of passing configuration
information to a process. Unlike command-line arguments, which usually need to
be specified explicitly, environment variables are inherited by child
processes from their parents.
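To make the inheritance concrete, here is a minimal sketch in C. It sets a variable in the current process and then asks a child shell (started with popen) to print it. The variable name DEMO_PARENT and the function name read_var_from_child are made up for this illustration, and the sketch assumes a POSIX system with /bin/sh available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sets DEMO_PARENT (a hypothetical name) in this process, then asks a child
 * shell to print it. The child's output is written into out as a
 * NUL-terminated string. Returns 0 on success, -1 on error. */
int read_var_from_child(const char *value, char *out, size_t outlen)
{
    if (outlen == 0 || setenv("DEMO_PARENT", value, 1) != 0)
        return -1;

    /* popen starts /bin/sh as a child process; the child inherits a copy of
     * our environment, so it can expand $DEMO_PARENT without being told. */
    FILE *child = popen("printf '%s' \"$DEMO_PARENT\"", "r");
    if (child == NULL)
        return -1;

    size_t n = fread(out, 1, outlen - 1, child);
    out[n] = '\0';

    return pclose(child) == 0 ? 0 : -1;
}
```

Nothing was passed to the child on its command line; the value travelled through the inherited environment.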
The "environment" is an unordered set of key-value pairs. The keys are strings and so are the values. If you have a terminal open, you can see what your current environment looks like by typing
env. It might look a little bit like this:
ORBIT_SOCKETDIR=/tmp/orbit-cmccabe
HOSTNAME=highcastle
IMSETTINGS_INTEGRATE_DESKTOP=yes
TERM=xterm
SHELL=/bin/bash
HISTSIZE=1000
GTK_RC_FILES=/etc/gtk/gtkrc:/home/cmccabe/.gtkrc-1.2-gnome2
WINDOWID=44042990
QTDIR=/usr/lib64/qt-3.3
QTINC=/usr/lib64/qt-3.3/include
IMSETTINGS_MODULE=none
JAVA_OPTS=-Xcheck:jni:nonfatal
USER=cmccabe
CSCOPE_EDITOR=/usr/bin/vim
...
Environment variables are a simple and elegant way of passing around
configuration information. It's easy to put a bunch of configuration variables
into a script, and simply run that script before your application runs. You
don't need to write a parser or ask your users to define a new configuration
file syntax. Environment variables are accessible from any programming
language, and available on any of the major platforms (Linux, Windows,
macOS).
You can read an environment variable with getenv. You can remove a single
variable with unsetenv, or wipe the whole environment with clearenv. You can
set an environment variable with setenv or putenv.
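As a quick illustration, a single-threaded round trip through setenv, getenv, and unsetenv might look like the sketch below. The variable name DEMO_GREETING and the function name env_roundtrip are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Sets, reads back, and removes DEMO_GREETING (a hypothetical name).
 * Returns 0 if every step behaved as expected, -1 otherwise. */
int env_roundtrip(void)
{
    /* setenv copies both strings; the final argument (1) means
     * "overwrite the variable if it already exists". */
    if (setenv("DEMO_GREETING", "hello", 1) != 0)
        return -1;

    /* getenv returns a pointer into the environment, or NULL if unset. */
    const char *v = getenv("DEMO_GREETING");
    if (v == NULL || strcmp(v, "hello") != 0)
        return -1;

    /* unsetenv removes just this one variable. */
    if (unsetenv("DEMO_GREETING") != 0)
        return -1;

    return getenv("DEMO_GREETING") == NULL ? 0 : -1;
}
```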
A Day at the Races
None of the POSIX environment functions are thread-safe. Even the humble
getenv, which you would expect to be re-entrant, is actually not required to
be so. If you create a program that calls getenv from multiple threads without
using a mutex to serialize access, you are relying on implementation-specific
behavior, and POSIX makes no guarantees. I'm not aware of any implementation
that will actually have problems with this kind of code, but it's still worth
noting. However, if you are modifying the environment in one thread while
other threads are reading it, you will have problems on Linux.
The solution is simple: just hold a mutex while accessing or modifying the
environment. However, this makes using environment variables a lot more
cumbersome. It also makes it difficult to use environment variables in a
shared library, or in a scripting language embedded in a larger application.
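A minimal sketch of the mutex approach, using pthreads, might look like this. The names env_lock, locked_getenv, and locked_setenv are made up for the example. Note that the reader copies the value out while still holding the lock, because the pointer returned by getenv can become invalid as soon as another thread modifies the environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One process-wide lock guarding every environment access. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copies the value of name into buf while holding the lock.
 * Returns 0 on success, -1 if the variable is unset or buf is too small. */
int locked_getenv(const char *name, char *buf, size_t buflen)
{
    int rc = -1;

    pthread_mutex_lock(&env_lock);
    const char *v = getenv(name);
    if (v != NULL && strlen(v) < buflen) {
        strcpy(buf, v);   /* fits, including the terminating NUL */
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);

    return rc;
}

/* Sets name to value (overwriting any old value) while holding the lock. */
int locked_setenv(const char *name, const char *value)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);

    return rc;
}
```

This only helps if every piece of code in the process goes through the same wrappers. A library that calls getenv or setenv directly bypasses the lock entirely, which is exactly why the approach is awkward for shared libraries and embedded interpreters.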
Modifying the Environment
POSIX doesn't tell you how to allocate space for new environment variables.
The original environment variables that are passed to the process when it
starts are not allocated using malloc. So if you add a new environment
variable, should the string be allocated using malloc or not? Nobody knows.
Similarly, if you remove an environment variable using unsetenv or clearenv,
should you call free on it? Good question, but nobody knows the answer.
If you call free on a variable that was not originally allocated with malloc,
heap corruption will result. On the other hand, if you don't free a variable
that was malloced earlier, that is a memory leak. It's quite a dilemma, and
POSIX is no help at all here.
There are actually three ways to set environment variables on Linux. You could
use putenv, setenv, or modify the global variable environ.
putenv is the simplest way. You give it a pointer to a string of the form
KEY=VALUE, and it adds that pointer to the global environment.
setenv tries to be more clever. It will use malloc to create a new KEY=VALUE
string based on the KEY and VALUE that you pass to it.
Personally, I prefer to use putenv when adding environment variables. With
putenv, you can avoid memory leaks by using statically allocated strings.
setenv does not give you this choice.
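Here is a small sketch of putenv with a statically allocated string. The variable DEMO_MODE and the function names are invented for the example. The second function checks whether getenv hands back a pointer into our own buffer, which is what you would expect on an implementation where putenv stores the pointer rather than a copy. I have left the result of that check unstated, since it depends on the platform.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Static storage: this string lives for the whole life of the process, so
 * there is nothing to free and nothing to leak. It must be a writable array
 * (not a const string literal) because putenv takes a char pointer. */
static char demo_entry[] = "DEMO_MODE=fast";

/* Adds DEMO_MODE (a hypothetical name) to the environment.
 * Returns 0 on success, nonzero on failure. */
int putenv_static(void)
{
    return putenv(demo_entry);
}

/* Returns 1 if getenv points into demo_entry itself, meaning putenv stored
 * our pointer, and 0 if the implementation made a copy instead. */
int putenv_shares_buffer(void)
{
    const char *v = getenv("DEMO_MODE");
    return v == demo_entry + strlen("DEMO_MODE=");
}
```

The flip side of sharing the buffer is that you must never pass putenv a pointer to a stack variable, or to memory you intend to free while the variable is still set.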
setenv does not exist on HP-UX, Solaris, and some other UNIX platforms.
However, putenv is mis-implemented as setenv (that is, it copies the string
instead of storing the pointer you pass) on Mac OS X, FreeBSD, and some
ancient Linux platforms.
Conclusion
So there you have it. The situation is a mess. You can't access environment
variables safely from multiple threads and can't portably call
setenv or
putenv without introducing memory leaks.
Your best bet is probably to avoid modifying the environment at all, if
possible. If that is not possible, try wrapping all accesses to the
environment with a mutex, and modifying environ yourself, to make sure nobody
is mallocing behind your back.
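For what it's worth, taking ownership of environ can be sketched as below. All of the names here (demo_a, demo_b, owned_env, install_owned_environ) are hypothetical, and the sketch makes two loud assumptions: the caller already holds whatever mutex guards the environment, and the program is happy to throw away every inherited variable, PATH and HOME included.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical entries, for illustration only. Everything lives in static
 * storage, so no malloc is involved and nothing ever needs to be freed. */
static char demo_a[] = "DEMO_A=1";
static char demo_b[] = "DEMO_B=2";
static char *owned_env[] = { demo_a, demo_b, NULL };

/* Points the process environment at our own NULL-terminated array and checks
 * that getenv can see it. ASSUMPTION: the caller holds the environment mutex.
 * Returns 0 on success, -1 otherwise. */
int install_owned_environ(void)
{
    environ = owned_env;

    const char *v = getenv("DEMO_B");
    return (v != NULL && strcmp(v, "2") == 0) ? 0 : -1;
}
```

A real program would more likely copy the inherited entries it cares about into its own array first. Even then, a later call to setenv from some library will go back to allocating behind your back, so this only works if you control all of the code in the process.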
Stepping back a little bit, are these memory leaks and race conditions really
such a big deal? Well, even small memory leaks can be annoying if they clutter
up the output of tools like
valgrind, a memory leak diagnosis tool.
Also, once you allow incorrect code in your program, it tends to propagate
itself, as new programmers look to the old code for examples of how to do
things. Do yourself a favor and get it right the first time. As Captain Planet
would no doubt say, cleaning up the environment is everyone's responsibility.