GNU autotools, aka the GNU Build System, is a build system designed to produce a portable source code package that can be compiled just about everywhere.
The intentions are good: when properly used, a configure script is generated that runs everywhere a POSIX compatible shell is available, and a Makefile that can be used everywhere a make program is available.
No further dependencies are required for the user, and the process to build source with ./configure, make and make install is well-established and understood.
From the developer's perspective though, things look a bit different. In order to create the mentioned configure script and Makefile, autotools uses the following 3 main components: autoconf (which generates configure from configure.ac), automake (which generates Makefile.in from Makefile.am), and libtool (a wrapper around the compiler and linker used for building libraries).
To use them, the developer needs perl and GNU m4 installed in addition to the tools themselves, as well as a basic understanding of m4, shell scripting, Makefiles, and the complex interaction between autoconf and automake.
He also needs a lot of time and patience, because each change to the input files requires execution of the slow autoreconf to rebuild the generated sources, and running ./configure and make for testing.
Libtool is a shell script wrapper around the compiler and the linker with >9000 lines, which makes every compiler invocation about 100 times slower. It is notorious for breaking static linking of libraries and cross-compilation, due to replacing e.g. "-lz" with "/usr/lib/libz.so" in the linker command. Apart from being buggy and full of wrong assumptions, it's basically unmaintained (the last release was 6 years ago).
While there's reasonably complete documentation for autoconf and automake available, it is seriously lacking in code examples, and so many supposedly simple tasks become a continuous game of trial and error.
Due to all of the above and more, many developers are overwhelmed and frustrated and rightfully call autotools "autocrap" and switch to other solutions like CMake or meson.
But those replacements are even worse: they trade the complexity of autotools on the developer side against heavy dependencies on the user side.
meson requires a bleeding edge python install, and CMake is a huge C++ clusterfuck consisting of millions of LOC, which takes up >400 MB disk space when built with debug info.
Additionally, meson and cmake invented their own build procedure which is fundamentally different from the well-known configure/make/make install trinity, so the user has to learn how to deal with yet another build system.
Therefore, in my opinion, the best option is not to switch to another build system, but simply to only use the good parts of autotools.
It's kinda hard to figure out what libtool is actually good for, apart from breaking one's build and making everything 100x slower.
The only legitimate usecase I can see is executing dynamically linked programs during the build, without having to fiddle around with LD_LIBRARY_PATH.
That's probably useful to run testcases when doing a native build (as opposed to a cross-compile), but that can also be achieved by simply statically linking to the list of objects that need to be defined in the Makefile anyhow.
Libtool being invoked for every source file is the main reason for GNU make's reputation of being slow.
If GNU make is properly used, one would need to compile thousands of files for a noticeable difference in speed to the oh-so-fast ninja.
Automake is a major pain in the ass.
Makefiles are generated by the configure script by doing a set of variable replacements on Makefile.in, which in turn is generated by automake from Makefile.am on the developer's end.
The only real advantages that automake offers over a handwritten Makefile are that conditionals can be used that work on any POSIX compatible make implementation (since those are still not standardized to this day), and that dependency information on headers is generated automatically. The latter can be implemented manually using the -M options to gcc, like -MMD, as shown below.
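For illustration, here's a minimal GNU make sketch of such automatic header dependency tracking; the -MMD/-MP flags and the generated .d files are standard gcc/clang behaviour, while the object names are made up for the example:
OBJS = main.o foo.o
%.o: %.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -MMD -MP -c -o $@ $<
# pull in the dependency fragments generated by -MMD, if they exist yet
-include $(OBJS:.o=.d)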
The only good part of autotools is the generated portable configure script, and the standard way of using the previously mentioned trinity to build. The configure script is valuable for many reasons:
In addition to the above, autoconf-generated configure scripts have some useful features built in:
On the other hand, generated configure scripts tend to be quite big and, as they are executed serially on a single CPU core, rather slow.
Fortunately this can be fixed by removing the vast majority of checks: just assume C99 support as a given and move on. While you're at it, throw out that check for a 20 year old HP-UX bug, please.
A modern project using autotools should only use autoconf to generate the configure script, and a single top-level Makefile that's handwritten.
Build configuration can be passed to the Makefile using a single file that's included from the Makefile.
The number of configure checks should be reduced to the bare minimum; there's little point in testing e.g. for the existence of stdio.h, which has been standardized since at least C89, especially if the preprocessor macro HAVE_STDIO_H isn't even used later and stdio.h is included unconditionally.
Used this way, the configure script will be quick in execution and small enough to be included in the VCS, which allows the users to check out and build any commit without having to do the autotools dance.
A good guide for writing concise configure scripts is available here.
As for conditionals in make, I'm pretty much in favor of simply assuming GNU make as a given and using its way to do conditionals.
It's in widespread use (default make implementation on any Linux distro) and therefore available as gmake even on BSD installations that usually prefer their own make implementation.
Apart from that, it's lightweight (my statically linked make 3.82 binary is a mere 176 KB) and one of the most portable programs around.
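As a small illustration, the GNU make conditional approach mentioned above could look like this, with HAVE_FOO42 being a made-up variable that the configure script would write into the included config file:
# e.g. set in config.mak by configure
HAVE_FOO42 = yes

ifeq ($(HAVE_FOO42),yes)
OBJS += foo42.o
CPPFLAGS += -DHAVE_FOO42
endif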
The alternative is to target POSIX make and do the conditionals using automake-style text substitutions in the configuration file produced by the configure run.
Our example project uses the following files: configure.ac, Makefile, config.mak.in, main.c, foo1.c and foo42.c with the following contents respectively.
configure.ac:
AC_INIT([my project], 1.0.0, maintainer@foomail.com, myproject)
AC_CONFIG_FILES([config.mak])
AC_PROG_CC
AC_LANG(C)
AC_ARG_WITH(foo,
AS_HELP_STRING([--with-foo=1,42], [return 1 or 42 [1]]),
[foo=$withval],
[foo=1])
AC_SUBST(FOO_SOURCE, $foo)
AC_OUTPUT()
Makefile:
include config.mak
OBJS=main.o $(FOO).o
EXE=foo-app
all: $(EXE)
$(OBJS): config.mak
$(EXE): $(OBJS)
$(CC) -o $@ $(OBJS) $(LDFLAGS)
clean:
rm -f $(EXE) $(OBJS)
install:
install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)
.PHONY: all clean install
config.mak.in:
# whether to build foo1 or foo42
FOO=foo@FOO_SOURCE@
PREFIX=@prefix@
# prefix and exec_prefix are defined so that the default value config.status
# substitutes for @bindir@ (the literal string ${exec_prefix}/bin) expands
# correctly when make evaluates BINDIR
prefix=@prefix@
exec_prefix=@exec_prefix@
BINDIR=@bindir@
CFLAGS=@CFLAGS@
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@
main.c:
#include <stdio.h>
extern int foo();
int main() {
printf("%d\n", foo());
}
foo1.c:
int foo() { return 1; }
foo42.c:
int foo() { return 42; }
You can get these files here.
After having the files in place, run autoreconf -i to generate the configure script. You'll notice that it runs unusually quickly, about 1 second, as opposed to projects using automake where one often has to wait for a full minute.
The configure script provides the usual options like --prefix and --bindir, processes the CC, CFLAGS, etc. variables exported in your shell or passed like
CFLAGS="-g3 -O0" ./configure
just as you'd expect it to, and provides the option --with-foo=[42,1] to let the user select whether he wants the foo42 or foo1 option.
AC_CONFIG_FILES([config.mak])
Here we instruct autoconf that config.mak is to be generated from config.mak.in when it hits AC_OUTPUT() (which causes config.status to be executed).
It will replace all values that we either specified with AC_SUBST(), or the built-in defaults like prefix and bindir (see config.status for the full range), with those specified by the user.
AC_ARG_WITH(...)
This implements our --with-foo multiple choice option. You can read about how it works in the usual autoconf documentation.
Other autoconf macros that you will find handy include AC_CHECK_FUNCS, AC_CHECK_HEADERS, AC_CHECK_LIB and AC_COMPILE_IFELSE to implement the various checks that autoconf offers, as well as AC_ARG_ENABLE to implement the typical --enable/--disable switches.
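To give a rough idea of what these look like in configure.ac, here are a few typical invocations; the function, header and library names are just examples, and the resulting HAVE_* symbols and substitutions behave as documented in the autoconf manual:
AC_CHECK_FUNCS([strlcpy getrandom])
AC_CHECK_HEADERS([sys/epoll.h])
AC_CHECK_LIB([z], [inflate])
AC_ARG_ENABLE(debug,
AS_HELP_STRING([--enable-debug], [build with debug output]),
[debug=$enableval], [debug=no])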
AC_SUBST(FOO_SOURCE, $foo)
This replaces the string @FOO_SOURCE@ in config.mak.in with the value assigned by the user via the AC_ARG_WITH() statement, when config.mak is written.
The rest of the contents in configure.ac are the standard boilerplate for C programs.
include config.mak
This statement in Makefile includes the config.mak generated by configure.
If it is missing, running make will fail, as it should.
config.mak will provide us with all the values in config.mak.in, where each occurrence of @var@ is replaced with the results of the configure process.
OBJS=main.o $(FOO).o
This sets OBJS to either main.o foo1.o or main.o foo42.o, depending on the choice of the user via the --with-foo switch. We didn't even have to use conditionals for it.
$(OBJS): config.mak
We let $(OBJS) depend on config.mak, so they're scheduled for rebuild when the configuration was changed with another ./configure execution, as the user might have changed his CFLAGS or --with-foo setting to something else. For bonus points, you could put all build-relevant settings into e.g. config-build.mak, and directory-related stuff into config-install.mak (unless you hardcode directory names into the binary), and make the dependency on config-build.mak only.
install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)
Two important things about this line:
- In the install target, always use $(DESTDIR) in front of $(BINDIR), $(PREFIX) etc, so the user can install stuff to a staging directory, as is custom.
- The -Dm mode option isn't compatible with BSD's install program. The BSD guys really should get their act together and finally support this handy option; until then you can use the portable script here that emulates it.

The rest of the Makefile contents are pretty standard. You might notice the absence of a specific rule to build .o files from .c; we use the implicit rule of make for this purpose.
FOO=foo@FOO_SOURCE@
config.status will replace the string @FOO_SOURCE@ with either 1 or 42, depending on which --with-foo option was used (1 being the default), shortly before configure terminates and writes config.mak. The values for @CFLAGS@ and the other variables will be replaced with the settings the configure script defaults to, or those set by the user.
... should be self-explanatory.
You can run ./configure && make now and see that it works - foo-app is created, and make DESTDIR=/tmp/foobar install installs foo-app into /tmp/foobar/bin/foo-app.
./configure --with-foo=42 && make should cause the foo-app binary to print 42 instead of 1.
If you want to learn more about the build process and especially how it works in regard to cross-compilation, you can check out my article Mastering and designing C/C++ build systems.
https://sabotage-linux.neocities.org/blog/5/
As the maintainer of sabotage linux, a distro compiled from source with >1000 packages, and as someone involved in the development of musl libc, I've seen a wide variety of odd build systems, or regular build systems used in an odd way, which resulted in lots of issues trying to get other people's packages building.
The vast majority of build system coders and developers using these build systems for their packages do not understand in detail how their toolchains are supposed to be used, and especially cross-compilation is a topic the majority of people know nothing about. The intent of this blog post is to explain the basic mechanisms, in order to change this situation.
But first, let's establish the meaning of some terms.
From here on, the term user will be used to mean the person trying to compile your software package from source. We're not concerned here about people using the compilation result via a binary package.
Now we will first take a quick look at the basic concepts involved in compilation, followed by the typical 3 stages of a build process, which are: Configuration, Compilation, Installation.
So in order to get your program compiled on a variety of different hosts, you typically need to interface with the following components: the compiler (and the linker behind it), the flags controlling them, and the libraries installed on the user's system.
For the C programming language, the convention is that on the user's system there's a C compiler installed in the default search PATH with the name cc.
It can be overridden with the environment variable CC, so if CC is set to clang, the build system should use clang instead of cc.
A sanely designed build system does something along the lines of (in shell syntax):
CC=${CC:-cc}
For C++, the default binary name is c++ and the environment variable is CXX.
Note that the user may choose to set CC or CXX to something that includes multiple items, for example CC=powerpc-gcc -I/tmp/powerpc/include.
Therefore, in a shell script, when you want to use the CC command to compile something, the $CC variable needs to be used unquoted, i.e. $CC and not "$CC", since the latter would force the shell to look for a binary with the spaces inside the filename.
(For the record, the compiler is the program that turns your sourcecode into an object file, e.g. cc foo.c -c -o foo.o)
Fortunately with C and C++, unless you do highly unusual things, you will never have to invoke the linker directly. Instead you can simply use CC or CXX and they will know from the context that a linker is needed, and call the linker themselves.
(For the record, the linker is what takes a couple of .o files and turns them into an executable or a shared library, e.g.: cc foo.o bar.o -o mybinary.elf)
There will be a couple of options you will have to use so the compilation works in a certain way. For example, your code may require the flag -std=c99 if you use C99 features.
Additionally, the user will want or need to use certain flags.
For this purpose, the environment variable CFLAGS is used.
If the user didn't specify any CFLAGS himself, you may decide to set some sane default optimization flags (the default for GNU autoconf packages is -O2 -g -Wall). The CFLAGS used for the compilation should always put the user-set CFLAGS last in the command line, so the user has the ability to override some defaults he doesn't like.
The following logic describes this:
REQUIRED_CFLAGS=-std=c99
CFLAGS_FOR_COMPILE=$(REQUIRED_CFLAGS) $(CFLAGS)
For C++, these flags are called CXXFLAGS, and the logic is precisely the same.
There's also CPPFLAGS, which is used for preprocessor directives such as -DUSE_THIS_FEATURE -DHAVE_OPENGL and include directories for header lookup.
More about headers soon. Again, user-supplied CPPFLAGS need to be respected and used after the CPPFLAGS the build system requires.
Last but not least we have LDFLAGS; these are flags used at link time.
It contains things such as -L linker library search path directives, -lxxx directives that specify which libraries to link against, and other linker options such as -s (which means "strip the resulting binary").
Here, again, the rule is to respect user-provided LDFLAGS and put them after your own in the linker command.
From here on, whenever we talk about cc or CC or CFLAGS, the exact same applies to c++, CXX and CXXFLAGS for C++.
When writing code in C or C++, you necessarily need to use libraries installed on the end user's machine. At the very least, you need to use the C or C++ standard library implementation. The former is known as libc, the latter as libstdc++ or libc++. Optionally some other libraries, such as libpng, may be needed.
In compiled form, these libraries consist of header files, and the library itself, as either a static (.a archive) or dynamic library (.so, .dynlib, .dll).
These headers and libs are stored in a location on your user's machine, which is typically /usr/include for headers and /usr/lib for libraries, but this is none of your concern. It's the job of the user to configure his compiler in such a way that when you e.g. #include <stdio.h> it works (usually the user uses his distro-provided toolchain, which is properly set up).
Cross-compilation means that you compile for a different platform than the one you're using, for example if you want to compile ARM binaries for your raspberry pi from your x86_64 desktop.
It's not really much different than regular compilation: you pass your compiler name as CC, e.g. CC=armv7l-linux-musl-gcc, and set your C and CPP flags such that they point into the lib/ and include/ dirs with your other ARM stuff in it.
For example, if you prepare a rootfs for your raspberry pi in /tmp/piroot, you'd probably set up your compiler-related environment vars as follows:
CC=armv7l-linux-musl-gcc
CPPFLAGS=-isystem /tmp/piroot/include
LDFLAGS=-L/tmp/piroot/lib
In compiler jargon, the armv7l-linux-musl prefix to your toolchain name is the so-called triplet. All components of your toolchain are prefixed with it, for example the ar archiver is called armv7l-linux-musl-ar, and the same applies for as, ld, ranlib, strip, objdump, etc.
In Autoconf-based build systems, you pass the triplet as --host=armv7l-linux-musl to ./configure, whereas Makefile-only based systems usually use a CROSS_COMPILE environment variable, which is set to the triplet plus a trailing dash, e.g. CROSS_COMPILE=armv7l-linux-musl-.
In your own build system, you should follow the GNU autoconf convention though.
What makes cross-compilation a bit tricky is that you can't execute binaries that you created with the crosscompiler toolchain on your build machine.
Some packages require to compile tools that are then executed, for example to generate some headers and similar things.
These tools need to be compiled with a different compiler, targeting your host's OS, arch, etc, which in autoconf jargon is confusingly called the "build" compiler, whereas Makefile-only based build systems call it HOSTCC.
Note that when such programs, to be executed on the host, require other additional object files that are needed for the target build too, these object files need to be compiled twice, once for the host and once for the target. In such a case it is advisable to give the object files different extensions, maybe .ho for object files meant for the host, and .o for those intended for the target.
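A rough Makefile sketch of the idea, assuming a hypothetical header generator gen.c that must run on the build machine while everything else is cross-compiled (HOSTCC, HOST_CFLAGS and the file names are made up for this example):
HOSTCC ?= cc

# objects for the build machine get the .ho extension
%.ho: %.c
	$(HOSTCC) $(HOST_CFLAGS) -c -o $@ $<

# the generator runs on the build machine, so it's linked with HOSTCC
genheaders: gen.ho
	$(HOSTCC) -o $@ gen.ho

# the generated header is then consumed by the cross-compiled objects
foo.h: genheaders
	./genheaders > $@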
This also means your configuration process can't use checks that require to build and then run a binary.
There's a variety of reasons host include/ and lib/ directories can leak into the build process, causing havoc. In the worst case, you only have an include directory leak which has wrong sizes for types in them, which can result in a successfully linked binary that will crash or corrupt memory.
If host library directories leak into the build, you will get link errors when the linker tries to link ARM and x86_64 binaries together.
One of the worst offenders in this regard is libtool, which should be avoided at all costs.
If you design a build system from scratch, keep in mind that your users probably don't want to spend a lot of time learning about your system. They simply want to get the process done as painlessly and quickly as possible (which implies that the build system itself should have as little external dependencies as possible).
Please do respect existing conventions, and try to model your build system's user interface after the well-established GNU autoconf standards, because it's what's been around for 20+ years and what the majority of packages use, so it's very likely that the user of your package is familiar with its usage. Also, unlike more hip build tools of the day, their user interface is the result of a long evolutionary process. Autoconf does have a lot of ugly sides to it, but from a user perspective it is pretty decent and has a streamlined way to configure the build.
Before we can start building, we need to figure out a few things.
If the package has optional functionality, the user needs to be able to specify whether he wants it or not. Some functionality might require additional libraries, etc. This stage in the build process is traditionally done via a script called configure.
Your package may have some non-essential code or feature, that might pull in a big external library, or may be undesirable for some people for other reasons.
Traditionally, this is achieved by passing a flag such as --disable-libxy or --without-feature, or conversely --with-feature or --enable-libxy.
If such a flag is passed, the script can then write for example a configuration header that has some preprocessor directive to disable the code at compile time. Or such a directive is added to the CPPFLAGS used during the build.
These flags should be documented when the configure script is being run with the --help switch.
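A minimal sketch of how a handwritten configure script might handle such a switch; the --disable-libxy option and the DISABLE_LIBXY macro are made-up examples:
use_libxy=yes
for arg in "$@" ; do
	case "$arg" in
	--enable-libxy) use_libxy=yes ;;
	--disable-libxy) use_libxy=no ;;
	esac
done
if [ "$use_libxy" = no ] ; then
	CPPFLAGS="$CPPFLAGS -DDISABLE_LIBXY"
fi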
Sometimes one needs to use functionality that differs from system to system, so we need to figure out in which way the user's system provides it.
The wrong way to go about this is to hardcode assumptions about specific platforms (OS/compiler/C standard library/library combinations) with ifdefs like this:
#if OPENSSL_VERSION_NUMBER >= 0x10100000
/* OpenSSL >= 1.1 added DSA_get0_pqg() */
DSA_get0_pqg(dsa, &p, &q, &g);
#else
...
#endif
This is wrong for several reasons:
Linux distros sometimes backport functionality from newer library versions to the older library version they ship. Conversely a newer library version that supposedly should have the functionality, could have been selectively downgraded with a patch fixing a specific bug, which might require to undo a new feature.
For some libraries, as in this case OpenSSL, API-compatible replacements exist (here libressl or Apple's fork boringssl).
The proper way to figure out whether DSA_get0_pqg() exists is... to actually check whether it exists, by compiling a small testcase using it (more below), and pass a preprocessor flag such as HAVE_DSA_GET0_PQG to the code in question.
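In a handwritten configure script, such a check could look roughly like the following sketch; the header, the -lcrypto library and the macro name are assumptions for this example, and the general testcase-compiling approach is explained in more detail below:
cat > temp.c <<EOF
#include <openssl/dsa.h>
int main() {DSA_get0_pqg(0, 0, 0, 0); return 0;}
EOF
if $CC $CPPFLAGS $CFLAGS temp.c -lcrypto $LDFLAGS -o temp.out 2>/dev/null ; then
	CPPFLAGS="$CPPFLAGS -DHAVE_DSA_GET0_PQG"
fi
rm -f temp.c temp.out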
Even worse than the above hardcoded version number check is when people assume that a certain C library implementation, for example musl, has a certain bug or behaviour, or lacks a certain function, because at the time they tested it that was the case. If a __MUSL__ macro existed, they would just hardcode their assumption into the code, even though the very next version of musl might have fixed the bug or added the function in question, which would then result in compile errors or, even worse, bogus behaviour at runtime.
You should NEVER hardcode any absolute paths for headers or libraries into your build system, nor should you start searching in the user's filesystem for them. This would make it impossible to use your package on systems with a non-standard directory layout, or for people that need to crosscompile it (more on cross-compilation just a little further down).
The majority of third-party libraries install their headers either into a separate sub-directory in the compiler's default include path (for example /usr/include/SDL/*.h), or, if there's only one or two headers, directly into the include dir (for example /usr/include/png.h).
Now when you want to test whether the user's system has the libpng headers installed, you simply create a temporary .c file with the following contents:
#include <png.h>
typedef int foo;
and then use $CC $CPPFLAGS $CFLAGS -c temp.c and check whether the command succeeded. If it did, then png.h is available through either the compiler's default include directory search paths, or via a user-supplied -I incdir statement, which he can provide if his libpng is installed in a non-standard location such as $HOME/include.
Note that this approach is cross-compile safe, because we didn't need to execute any binary.
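Wrapped up as a small shell helper, such a header check might look like this sketch (check_header and the HAVE_PNG_H macro are just example names; $CC is deliberately left unquoted, as discussed above):
check_header() {
	printf '#include <%s>\ntypedef int foo;\n' "$1" > temp.c
	$CC $CPPFLAGS $CFLAGS -c temp.c -o temp.o 2>/dev/null
	res=$?
	rm -f temp.c temp.o
	return $res
}

if check_header png.h ; then
	CPPFLAGS="$CPPFLAGS -DHAVE_PNG_H"
fi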
If you want to use headers of a library such as SDL that installs a number of headers into a subdir, you should reference them in your code via #include <SDL/SDL.h> and not #include <SDL.h>, because the latter will require the addition of -I path include search path directives.
After you've established that the user has libpng's headers installed, you might want to check whether it links correctly and whether it provides a certain function you're using (though testing for this only makes sense if the function is a recent addition).
Again, you check this by writing a temporary .c file, that looks roughly like:
#include <png.h>
int main() {png_set_compression_buffer_size(0, 0);}
the command to test it is: $CC $CPPFLAGS $CFLAGS temp.c -lpng $LDFLAGS.
If the command succeeds, it means that one of libpng.a/.so is available in the compiler's default library search path (or in some -L directive the user added to his LDFLAGS) and that it contains the function png_set_compression_buffer_size.
The latter is established by using a main() function, which forces the linker to fail on missing symbols (also note the omission of -c).
If your aim is only to test whether the libpng library is installed, the test can be written as:
#include <png.h>
int main() {return 0;}
and compiled exactly as the previous. Note that this test actually checks that both the header exists AND the library, so by using this kind of test you don't actually need to test for header and library separately. Again, we merely compiled the testcase and didn't need to execute it.
For simple libraries such as zlib you should always try first whether you can simply link to e.g. -lz.
If that doesn't work, you can fall back to a tool called pkg-config, or one of its clones such as pkgconf, which is widely used.
The path to the tool is user provided via the environment variable PKG_CONFIG.
If not set, the fall-back is to use pkg-config instead.
It can be used like this:
$PKG_CONFIG --cflags gtk+-2.0
This will print a couple of -I include directives that are required to find the headers of gtk+2.
Likewise
$PKG_CONFIG --libs gtk+-2.0
can be used to query the LDFLAGS required for linking gtk+2.
Note that by default, pkg-config looks into $(prefix)/lib/pkgconfig, which is not compatible with crosscompilation.
The environment variable PKG_CONFIG_SYSROOT_DIR can be set to the crosscompile rootfs root directory, e.g. /tmp/piroot, and PKG_CONFIG_LIBDIR to /tmp/piroot/lib/pkgconfig.
Alternatively, if PKG_CONFIG is not set, but a cross-compile triplet was passed to the configuration process and a triplet-prefixed pkg-config exists in the PATH, this is used instead of the host's pkg-config, e.g. armv7l-linux-musl-pkg-config.
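The PKG_CONFIG handling described above could be sketched in shell like this; the $host_triplet variable is assumed to hold the --host triplet, if one was passed:
if [ -z "$PKG_CONFIG" ] ; then
	if [ -n "$host_triplet" ] && command -v "$host_triplet-pkg-config" >/dev/null 2>&1 ; then
		PKG_CONFIG=$host_triplet-pkg-config
	else
		PKG_CONFIG=pkg-config
	fi
fi
GTK_CFLAGS=$($PKG_CONFIG --cflags gtk+-2.0)
GTK_LIBS=$($PKG_CONFIG --libs gtk+-2.0)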
The authors of some packages wrote their own package specific pkg-config replacement, reasoning unknown. For example, on my machine the following proprietary -config programs exist: allegro-config, croco-config, curl-config, freetype-config, gpg-error-config, icu-config, libpng-config, pcap-config, pcre-config, python-config, sdl-config, xml2-config ...
What they all have in common is that they do things differently and they are not cross-compile compatible. Usually, whenever one of them is being used by a build system, cross-compilation breakage follows. Because these tools simply return the include and library directories of the host.
Unfortunately, the authors of some of these programs refuse to write portable pkg-config files instead. OTOH, most of them require no special include dirs, and their --libs invocation simply returns -lfoo.
For those few that don't (the worst offenders are the apr-1-config tools from the Apache Foundation), as a build system author, I suppose, the only correct way to deal with them is to not use them at all, but instead force the user to specify the include and library paths for these libraries with some configuration parameters.
Example: --apr-1-cflags=-I/include/apr-1
In some rare cases, one needs to know e.g. the size of long of the toolchain target at compile time. Since we cannot execute any testbinaries that would run e.g.
printf("%zu\n", sizeof(long));
and then parse their output because we need to stay compatible with cross-compilers, the proper way to do it is by using a "static assertion" trick like here:
/* gives compile error if sizeof(long) is not 8 */
int arr[sizeof(long) == 8 ? 1 : -1];
Compile the testcase with $CC $CPPFLAGS $CFLAGS -c temp.c.
Another way is to run e.g.
$CC $CPPFLAGS -dM -E - </dev/null | grep __SIZEOF_LONG__
This command (without the piped grep) makes GCC and derivatives spit out a list of built-in macros. Only GCC and Clang based toolchains that came out during the last couple of years support this though, so the static assert method should be preferred.
Unfortunately, varying platforms have provided endianness test macros in different headers. Because of that, many build system authors resorted to compiling and running a binary that does some bit tricks to determine the endianness and print a result.
However, since we cannot run a binary, as we want to stay cross-compile compatible, we need to find another way to get the definition. I've actually spent a lot of effort trying dozens of compiler versions and target architectures and came up with a public domain single-header solution, that has portable fallback functions that can do endian conversions even if the detection failed, although at a slight runtime cost.
I would advise its usage, rather than trying to hack together a custom thing.
I've also come across a number of checks that required to run a testcase and therefore prevented crosscompilation from working. Mostly, these are tests for a certain bug or odd behaviour.
However, it is wrong to assume that because the system the test binary currently runs on has a certain bug, the end user's system will have the same bug.
The binary might for example be distributed as a package, and might suddenly start misbehaving if another component that fixes the bug is updated.
Therefore the only safe and correct way to deal with this situation is to write a check that's executed when the binary is used at runtime, which then sets a flag like bug=1; and then have two different codepaths, one for a system with the bug and one for a system without it.
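As a concrete illustration, here's a minimal C sketch of such a runtime check, using an old snprintf truncation bug (some historic implementations return -1 instead of the needed length) as an assumed example:
#include <stdio.h>

/* returns 1 if this snprintf reports -1 on truncation instead of the needed length */
static int snprintf_is_buggy(void) {
	char buf[4];
	return snprintf(buf, sizeof buf, "%d", 123456) != 6;
}

int main(void) {
	if (snprintf_is_buggy()) {
		/* take the workaround codepath */
		puts("buggy snprintf, using fallback");
	} else {
		/* take the codepath that trusts the return value */
		puts("snprintf ok");
	}
	return 0;
}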
In GNU Autoconf, the way to tell it that you're cross-compiling is by setting a --host=triplet parameter with the triplet of the target toolchain, in addition to putting the crosscompiler name into the CC environment variable.
The triplet is then used to prefix all parts of the toolchain, like
RANLIB=$(triplet)-ranlib
STRIP=$(triplet)-strip
etc.
For the build host, there's also a parameter called --build=triplet.
If not set, the configure process will try whether gcc or cc is available, and then use that. If set, all toolchain components targeting the host you're on will be prefixed with this triplet. It can be queried by running $CC -dumpmachine. Usually, it is not necessary to set it.
As mentioned, it's hugely preferable to test for functionality rather than platform.
But if you really think it's necessary to figure out the target OS, do not use uname, which is totally bogus. It simply returns the OS of the compiler user, who might use an Apple computer but cross-compile for NetBSD.
You can instead derive the target OS via $CC -dumpmachine, which returns the toolchain target triplet, or by parsing the output of
$CC $CPPFLAGS -dM -E - </dev/null
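For example, a small shell sketch deriving the OS from the toolchain triplet rather than from uname (the case patterns shown are only an assumed subset):
target=$($CC -dumpmachine)
case "$target" in
*-linux-*) os=linux ;;
*netbsd*) os=netbsd ;;
*darwin*) os=darwin ;;
*) os=unknown ;;
esac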
Knowledge about system paths is required for 2 reasons. One is that during the Installation stage we need to know where files like the compiled program binary need to be installed in. The other is that our program or library might require some external data files. For example, the program might require a database at runtime.
For this reason, a --prefix variable is passed to the configure step.
On most typical linux installations --prefix=/usr would be used for a system install, whereas --prefix=/usr/local is typically used for an alternate installation from source of a package the distribution provides but for some reason is not sufficient for the user.
Sabotage Linux and others use an empty prefix, i.e. --prefix=, which means that for example binaries go straight into /bin and not /usr/bin, etc.
Many hand-written configure scripts get this wrong and treat --prefix= as if the user hadn't passed --prefix at all, and fall back to the default.
The default, btw, is traditionally /usr/local.
So in case your program needs a database, let's say leetpackage.sqlite, you would probably hardcode the following db path into your binary:
#define DB_PATH PREFIX "/share/leetpackage/leetpackage.sqlite"
where PREFIX would be set as part of CPPFLAGS or similar, according to the user's selection.
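One way to wire that up, as a sketch in the config.mak style used earlier (the PREFIX make variable mirrors the example project above):
# pass the configured prefix to the C code as a preprocessor macro
CPPFLAGS += -DPREFIX='"$(PREFIX)"'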
For more fine-grained control, traditional configure scripts also add options like --bindir, --libdir, --includedir, --mandir, --sysconfdir, etc, in addition to --prefix, which, if not set, default to ${prefix}/bin, ${prefix}/lib, ${prefix}/include etc.
More on paths in the Installation chapter.
After the configuration step finished, it should have written the configuration data in some form, either a header, or a Makefile include file, which is then included by the actual Makefile (or equivalent).
This should include any previously mentioned environment variables, so it is possible to log in to a different shell session without any of them set, yet get the same result when running make.
Some users of GNU autotools create the Makefile from a template (usually called Makefile.in) at the end of the configure run, but I personally found this to be really impractical, because when making changes to the Makefile template, configure has to be re-run every single time.
Therefore I recommend writing the settings into a file called config.mak, which is included by the Makefile.
The actual compilation is typically run by executing make, which on most systems defaults to GNU make, which is a lot more powerful than the traditional BSD makes. Its code is small and written in portable C, so it's easy to get it bootstrapped quickly on systems that don't have it yet, unlike competitors such as CMake, which 1) is written in C++ which takes a lot longer to parse than C, 2) consists of > 1 million lines of code and 3) occupies a considerable amount of HDD space once installed.
Anyway, GNU make can even be found pre-installed on the BSDs, where it's called gmake.
Here, the following conventions apply:
- When the build output only shows abbreviated status lines like CC foo.c, a flag V=1, short for verbose, can be passed for debugging purposes, as in make V=1.
- For parallel builds, the -jN flag is used, such as in make -j8 if you want to use 8 parallel processes.

If a Makefile is used for building, the build process should be tested using several parallel processes, because failure to document dependencies of files properly often results in broken parallel builds, even though they seem to work perfectly with -j1.
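A common GNU make pattern implementing the V=1 convention looks roughly like this sketch (the Q variable name is merely a convention, not mandated by anything):
ifeq ($(V),1)
Q =
else
Q = @echo "  CC      $@" ;
endif

%.o: %.c
	$(Q)$(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<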
Do note that you should not strip binaries, ever. If the user wants his binaries stripped, he will pass -s as part of his LDFLAGS.
The Installation is typically done using the make install command.
Additionally there's an important variable that distro maintainers use for packaging: DESTDIR.
If for example, at configure time, --prefix=/usr was set, then
make install DESTDIR=/tmp/foo
should cause stuff to be installed into /tmp/foo/usr, so if your package compiles a binary called myprog, it should end up in /tmp/foo/usr/bin/myprog. A typical install rule would look like this:
bindir ?= $(prefix)/bin
...
install: myprog
install -Dm 755 myprog $(DESTDIR)$(bindir)/myprog
here we use the install program to install the binary myprog to its destination with mode 755 (-m 755) and create all path components along the way (-D).
Unfortunately, the install program shipped with some BSDs and Mac OS X refuses to implement these practical options, therefore this portable replacement implementation can be used instead.
It is a good idea and the common practice to explicitly set the permissions during the install step, because the user doing the installation might unwittingly have some restrictive umask set, which can lead to odd issues later on.
Even if the build system you intend to write does not use Makefiles, you should respect the existing conventions (unlike CMake & co which NIH'd everything) like V=1, -j8, DESTDIR, --prefix, etc.
One of the big advantages of GNU's autotools system is that, from a user's perspective, they require nothing more than a POSIX-compatible shell to execute configure scripts, and GNU make, which as already mentioned is really slim, written in portable C, and widely available while requiring less than one MB of HDD space (my GNU make 3.82 install takes 750KB total including docs).
So in my opinion, the build system of the future, in whatever language it's written in, and how many millions of lines of code it consists of, should do precisely the same: it should at least have the option to generate a configure script and a stand-alone GNU Makefile, which is shipped in release tarballs. That way only the developers of the package need the build toolkit and its dependencies installed on their machine, while the user can use the tools he already has installed, and can interface with the build system in a way he's already familiar with.
19 Apr 2019 19:34 UTC - Added paragraph "Checking for the target OS"