
Smaller C Linker Wiki

alexfru edited this page Nov 11, 2014 · 19 revisions

Smaller C Linker

 

What is Smaller C linker?

What features does the linker support?

How do I compile the linker?

How do I run the linker?

Limitations and implementation details

What is Smaller C linker?

Smaller C linker is a linker specifically designed to statically link x86 ELF object files produced by the Netwide Assembler (AKA NASM) into executable files. The linker can also link standard archive files containing such x86 ELF object files. The linker may be used either as part of Smaller C or as a standalone tool. The output formats are: .COM, .EXE (for DOS), .EXE/PE (for Windows), ELF (for Linux) and flat executables.

What features does the linker support?

The linker can produce so-called "map files" that list the sections, the global public symbols and their addresses in memory and in the output executable file. These may be extremely useful for debugging since the linker does not store any debugging/symbolic information in the produced executables.

You can specify the stack size for the tiny, small and huge memory models in DOS executables.

You can override the default name of the entry point symbol.

You can specify the "origin" for Windows, Linux and flat executables. In Windows/PE and Linux/ELF executables it's the image base address, the address at which the PE/ELF headers will be loaded, which then will be followed by the code and data sections. In flat executables it's the address/offset at which the first byte of the file will be loaded for execution. The first byte in flat executables is part of the first executable instruction. Example: when linking into DOS .COM executables, the implicit origin is naturally set to 0x100 by the linker itself.

How do I compile the linker?

Just compile smlrl.c with your favorite 32-bit (or 64-bit?) compiler for a little-endian platform. No special compilation options should be needed.

The linker should be compiled with a 32-bit-capable compiler because ELF object files may have sections larger than 64KB and they may not fit into memory if the linker is compiled into a 16-bit program. If you compile the linker with Smaller C, you must use either the -seg32 or the -huge option because Smaller C supports 32-bit integer types only in those memory mode(l)s and the linker makes use of 32-bit integers.

Currently, the linker lacks byte order conversion and will not work with little-endian x86 ELF object files on a big-endian platform.

How do I run the linker?

smlrl <option(s)> <input file(s)>

where input files are x86 ELF object files with the file extension ".o" and/or standard archive files (those that you make with the "ar" utility in Linux/UNIX) with the file extension ".a" and containing x86 ELF object files inside.

If there are many input files, you can create a special text file listing all those input files and instead provide that file's name prefixed with the at (@) character. This is especially useful when linking in DOS, where command lines are restricted to some 120+ characters. You can store linking options in this file as well. The options and the file names must be separated by white space (spaces, tabs or new line characters, whichever you choose). Note: currently, spaces in file names aren't supported inside @-files.

Since the linker supports many output formats and does not know which one is expected from it, the output format must always be specified explicitly with the appropriate option (one of: -tiny, -small, -huge, -pe, -win, -elf, -flat16, -flat32).

Options:

  • -o <output executable file> Specifies the name of the executable file. If this option isn't given, the executable file will be named "a.out".
  • -tiny Must be specified when linking into DOS .COM programs using the tiny memory model (CS=DS=ES=SS, all code, data and stack are in the same segment).
  • -small Must be specified when linking into DOS .EXE programs using the small memory model (DS=ES=SS, all data and stack are in the same segment, but the code is in a separate segment, CS≠SS).
  • -huge Must be specified when linking into DOS .EXE programs using the huge memory model. This model is specific to Smaller C and isn't compatible with huge models supported by other DOS compilers.
  • -pe or -win Must be specified when linking into Windows .EXE/PE programs. If the linker finds the symbol __dll_imports, it will create a proper DLL import table. See this example for how you could make that happen.
  • -elf Must be specified when linking into Linux/ELF programs.
  • -flat16 Must be specified when linking into 16-bit flat executables similar to DOS .COM programs. If the entry point is not at the very beginning, the linker will insert a jump instruction to the entry point at the beginning.
  • -flat32 Must be specified when linking into 32-bit flat executables. If the entry point is not at the very beginning, the linker will insert a jump instruction to the entry point at the beginning.
  • -stack <number> Specifies the stack size (in bytes) for the tiny, small and huge DOS memory models. To be used with the options -tiny, -small and -huge. If the option isn't given, the linker assumes 8192 for the tiny and small memory models and 32768 for the huge memory model. The number can be decimal, hex or octal (e.g. 8192, 0x2000 and 020000 would all specify the same value).
  • -map <somefile.map> Creates a map file with the sections and global public symbols and their locations in memory and in the output executable. This option is only useful for debugging.
  • -entry <name> Specifies the name of the entry point symbol. If this option isn't given, the entry point is assumed to be __start. Note the two leading underscore characters. If you want a C function to be the default entry point, it must be named as either _start (if the compiler prepends an underscore character by default or if using Smaller C with the option -leading-underscore) or __start (if the compiler does not prepend any underscore characters or if using Smaller C with the option -no-leading-underscore).
  • -origin <number> Specifies the origin for the executable file as an integer constant, decimal, hex or octal (e.g. 10, 0xA or 012 would all specify the same value). In Windows/PE and Linux/ELF executables it's the image base address, the address in memory at which the PE/ELF headers will be loaded, which then will be followed by the code and data sections. In flat executables it's the address/offset in memory at which the first byte of the file will be loaded for execution. The (E)IP register is expected to have this address/offset at start. The first byte in flat executables is part of the first executable instruction. Example: when linking into DOS .COM executables, the implicit origin is naturally set to 0x100 by the linker itself. This option should typically be used only when making flat executables. In other cases the default origin value chosen by the linker should be sufficient.
  • -verbose Causes printing of a lot of linking-related information to the standard output. You should probably only use this option when you run into some problem and need extra info for troubleshooting. Be prepared for a lot of output.

Limitations and implementation details

  • The linker has only been tested with x86 ELF object files produced by NASM. It may not support object files produced by e.g. GNU as or gcc.
  • The linker supports relocation types R_386_32 (1), R_386_PC32 (2) and their 16-bit extensions R_386_16 (20) and R_386_PC16 (21).
  • The linker does not support SHT_RELA relocation sections.
  • The linker does not support weak symbols (STB_WEAK).
  • The linker does not support dynamic linking.
  • The linker does not produce relocatable Windows/PE or Linux/ELF executables as of now. Fortunately, both Windows and Linux support non-relocatable executables.
  • The linker sorts sections alphabetically within the same type of section (type being a combination of code/data, readable/writable, initialized/uninitialized). The sections appear in this order: ".text" (if exists), other code sections, read-only data sections (e.g. ".rodata"), writable initialized data sections (e.g. ".data"), uninitialized/zero-initialized sections (e.g. ".bss"). If in doubt, generate and examine the map file.
  • The linker merges all code sections into one code section and all data sections into one data section. This may be problematic on Windows and Linux if you want to make some code sections writable or some data sections read-only or executable.
  • The linker currently stores uninitialized/zero-initialized sections (e.g. ".bss") in the executable as a sequence of zero bytes. Also, as of now, Smaller C puts all variables into the ".data" section. So, if you want to make the executable as small as possible, avoid creating large global/static objects and instead allocate memory for large objects via malloc() and the like.
  • The linker aligns and pads sections to the page size (4096 bytes) in memory and in the executable file when linking into Windows/PE or Linux/ELF executables.
  • The linker supports special symbols that mark the start and the end of a section. For example, the ".text" section start and end addresses can be found by taking the addresses of the special symbols __start__text and __stop__text. The end address is actually the address of the byte immediately following the last byte of the section. The dot character in section names is replaced by the underscore character when forming these special symbols.
    There are special symbols marking the start and the end of combined code sections and combined data sections. They are: __start_allcode__, __stop_allcode__, __start_alldata__, __stop_alldata__.
    There also exists a pseudo section for the stack portion of the data segment in the tiny and small 16-bit DOS memory models. Its start address can be found by taking the address of the special symbol __start_stack__. This pseudo section is located at the end of the 64KB data segment. The -stack option affects the address of the __start_stack__ symbol. The symbols __stop_alldata__ and __start_stack__ can be used to create a memory heap between the two locations.