Copying data to a buffer that is not large enough to hold that data results in a buffer overflow. Buffer overflows occur frequently when manipulating strings [Seacord 2013b]. To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size to hold the character data to be copied and the null-termination character. (See STR03-C. Do not inadvertently truncate a string.)
When strings live on the heap, this rule is a specific instance of MEM35-C. Allocate sufficient memory for an object. Because strings are represented as arrays of characters, this rule is related to both ARR30-C. Do not form or use out-of-bounds pointers or array subscripts and ARR38-C. Guarantee that library functions do not form invalid pointers.
Noncompliant Code Example (Off-by-One Error)
This noncompliant code example demonstrates an off-by-one error [Dowd 2006]. The loop copies data from src to dest. However, because the loop does not account for the null-termination character, the terminator may be incorrectly written 1 byte past the end of dest.
    #include <stddef.h>

    void copy(size_t n, char src[n], char dest[n]) {
      size_t i;

      for (i = 0; src[i] && (i < n); ++i) {
        dest[i] = src[i];
      }
      dest[i] = '\0';
    }
Compliant Solution (Off-by-One Error)
In this compliant solution, the loop termination condition is modified to account for the null-termination character that is appended to dest:
    #include <stddef.h>

    void copy(size_t n, char src[n], char dest[n]) {
      size_t i;

      for (i = 0; src[i] && (i < n - 1); ++i) {
        dest[i] = src[i];
      }
      dest[i] = '\0';
    }
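One edge case the loop bound above leaves open is n == 0: because n is a size_t, n - 1 wraps around to SIZE_MAX and the bound no longer protects dest. The following is a minimal sketch of a variant that rejects a zero-sized destination and also reports truncation. The name copy_checked and its return codes are hypothetical and are not part of the original example.

```c
#include <stddef.h>

/*
 * Hypothetical variant of copy(): copies at most n - 1 characters from
 * src to dest and always null-terminates dest when n > 0.
 * Returns 0 if all of src fit, 1 if src was truncated, and -1 if n == 0.
 */
int copy_checked(size_t n, const char *src, char *dest) {
  size_t i;

  if (n == 0) {
    return -1;  /* No room even for the null terminator */
  }
  /* Test the bound first so src[i] is read only while i < n - 1 */
  for (i = 0; (i < n - 1) && src[i]; ++i) {
    dest[i] = src[i];
  }
  dest[i] = '\0';
  /* If src[i] is not the terminator, characters were left uncopied */
  return src[i] ? 1 : 0;
}
```

A caller can treat a return value of 1 as a truncation to be handled in line with STR03-C.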
Noncompliant Code Example (gets())
The gets() function, which was deprecated in the C99 Technical Corrigendum 3 and removed from C11, is inherently unsafe and should never be used because it provides no way to control how much data is read into a buffer from stdin. This noncompliant code example assumes that gets() will not read more than BUFFER_SIZE - 1 characters from stdin. This assumption is invalid, and the operation can cause a buffer overflow.
The gets() function reads characters from stdin into a destination array until end-of-file is encountered or a newline character is read. Any newline character is discarded, and a null character is written immediately after the last character read into the array.
    #include <stdio.h>

    #define BUFFER_SIZE 1024

    void func(void) {
      char buf[BUFFER_SIZE];

      if (gets(buf) == NULL) {
        /* Handle error */
      }
    }
See also MSC24-C. Do not use deprecated or obsolescent functions.
Compliant Solution (fgets())
The fgets() function reads, at most, one less than the specified number of characters from a stream into an array. This solution is compliant because the number of characters copied from stdin to buf cannot exceed the allocated memory:
    #include <stdio.h>
    #include <string.h>

    enum { BUFFERSIZE = 32 };

    void func(void) {
      char buf[BUFFERSIZE];
      int ch;

      if (fgets(buf, sizeof(buf), stdin)) {
        /* fgets() succeeded; scan for newline character */
        char *p = strchr(buf, '\n');
        if (p) {
          *p = '\0';
        } else {
          /* Newline not found; flush stdin to end of line */
          while ((ch = getchar()) != '\n' && ch != EOF)
            ;
          if (ch == EOF && !feof(stdin) && !ferror(stdin)) {
            /* Character resembles EOF; handle error */
          }
        }
      } else {
        /* fgets() failed; handle error */
      }
    }
The fgets() function is not a strict replacement for the gets() function because fgets() retains the newline character (if read) and may also return a partial line. It is possible to use fgets() to safely process input lines too long to store in the destination array, but this is not recommended for performance reasons. Consider using one of the following compliant solutions when replacing gets().
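As a rough illustration of processing an overlong line with fgets(), the sketch below reads a line in fixed-size chunks and reports its total length without ever storing the whole line. The helper name line_length, the FILE * parameter, and the 16-byte chunk size are assumptions made for illustration. The sketch also assumes the input contains no embedded null characters, because it relies on strlen().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns the length of the next line in stream,
 * excluding the newline, or -1 if nothing could be read (end-of-file
 * or read error before any character).
 */
long line_length(FILE *stream) {
  char chunk[16];   /* Deliberately small to force several fgets() calls */
  long total = 0;
  int got_data = 0;

  while (fgets(chunk, sizeof(chunk), stream) != NULL) {
    size_t len = strlen(chunk);
    got_data = 1;
    if (len > 0 && chunk[len - 1] == '\n') {
      /* A trailing newline marks the end of the line */
      return total + (long)(len - 1);
    }
    /* Partial line: accumulate and keep reading */
    total += (long)len;
  }
  /* Final line without a newline, or nothing read at all */
  return got_data ? total : -1;
}
```

Each chunk costs a separate library call and a strlen() scan, which is one way to see the performance concern raised above.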
Compliant Solution (gets_s())
The gets_s() function reads, at most, one less than the number of characters specified from the stream pointed to by stdin into an array.
The C Standard, Annex K [ISO/IEC 9899:2011], states:
No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
If end-of-file is encountered and no characters have been read into the destination array, or if a read error occurs during the operation, then the first character in the destination array is set to the null character and the other elements of the array take unspecified values:
    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdio.h>

    enum { BUFFERSIZE = 32 };

    void func(void) {
      char buf[BUFFERSIZE];

      if (gets_s(buf, sizeof(buf)) == NULL) {
        /* Handle error */
      }
    }
Compliant Solution (getline(), POSIX)
The getline() function is similar to the fgets() function but can dynamically allocate memory for the input buffer. If passed a null pointer, getline() dynamically allocates a buffer of sufficient size to hold the input. If passed a pointer to dynamically allocated storage that is too small to hold the contents of the string, the getline() function resizes the buffer, using realloc(), rather than truncating the input. If successful, the getline() function returns the number of characters read, which can be used to determine if the input has any null characters before the newline. The getline() function works only with dynamically allocated buffers. Allocated memory must be explicitly deallocated by the caller to avoid memory leaks. (See MEM31-C. Free dynamically allocated memory when no longer needed.)
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void func(void) {
      int ch;
      size_t buffer_size = 32;
      char *buffer = malloc(buffer_size);
      ssize_t size;

      if (!buffer) {
        /* Handle error */
        return;
      }

      /* A declaration cannot appear inside an expression, so assign first */
      size = getline(&buffer, &buffer_size, stdin);
      if (size == -1) {
        /* Handle error */
      } else {
        char *p = strchr(buffer, '\n');
        if (p) {
          *p = '\0';
        } else {
          /* Newline not found; flush stdin to end of line */
          while ((ch = getchar()) != '\n' && ch != EOF)
            ;
          if (ch == EOF && !feof(stdin) && !ferror(stdin)) {
            /* Character resembles EOF; handle error */
          }
        }
      }
      free(buffer);
    }
Note that the getline() function uses an in-band error indicator, in violation of ERR02-C. Avoid in-band error indicators.
Noncompliant Code Example (getchar())
Reading one character at a time provides more flexibility in controlling behavior, though with additional performance overhead. This noncompliant code example uses the getchar() function to read one character at a time from stdin instead of reading the entire line at once. The stdin stream is read until end-of-file is encountered or a newline character is read. Any newline character is discarded, and a null character is written immediately after the last character read into the array. Similar to the noncompliant code example that invokes gets(), there are no guarantees that this code will not result in a buffer overflow.
    #include <stdio.h>

    enum { BUFFERSIZE = 32 };

    void func(void) {
      char buf[BUFFERSIZE];
      char *p;
      int ch;

      p = buf;
      while ((ch = getchar()) != '\n' && ch != EOF) {
        *p++ = (char)ch;
      }
      *p++ = 0;
      if (ch == EOF) {
        /* Handle EOF or error */
      }
    }
After the loop ends, if ch == EOF, the loop has read through to the end of the stream without encountering a newline character, or a read error occurred before the loop encountered a newline character. To conform to FIO34-C. Distinguish between characters read from a file and EOF or WEOF, the error-handling code must verify that an end-of-file or error has occurred by calling feof() or ferror().
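The check described above might be sketched as follows. The enumeration end_reason and the helper classify_end are hypothetical names introduced only for illustration; the caller is assumed to pass the last value returned by the character-reading function together with the stream it was read from.

```c
#include <stdio.h>

/* Hypothetical classification of why a read loop stopped */
enum end_reason {
  END_NEWLINE,         /* Loop stopped on '\n' (ch is not EOF) */
  END_OF_FILE,         /* feof() confirms end-of-file */
  END_READ_ERROR,      /* ferror() confirms a read error */
  END_EOF_VALUED_CHAR  /* A character whose value happens to equal EOF */
};

enum end_reason classify_end(FILE *stream, int ch) {
  if (ch != EOF) {
    return END_NEWLINE;
  }
  if (ferror(stream)) {
    return END_READ_ERROR;
  }
  if (feof(stream)) {
    return END_OF_FILE;
  }
  /* Neither indicator is set, so ch was real data, not a sentinel */
  return END_EOF_VALUED_CHAR;
}
```

The last case can arise only on implementations where a character value can compare equal to EOF, which is the situation FIO34-C addresses.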
Compliant Solution (getchar())
In this compliant solution, characters are no longer copied to buf once index == BUFFERSIZE - 1, leaving room to null-terminate the string. The loop continues to read characters until the end of the line, the end of the file, or an error is encountered. When chars_read > index, the input string has been truncated.
    #include <stdio.h>

    enum { BUFFERSIZE = 32 };

    void func(void) {
      char buf[BUFFERSIZE];
      int ch;
      size_t index = 0;
      size_t chars_read = 0;

      while ((ch = getchar()) != '\n' && ch != EOF) {
        if (index < sizeof(buf) - 1) {
          buf[index++] = (char)ch;
        }
        chars_read++;
      }
      buf[index] = '\0';  /* Terminate string */
      if (ch == EOF) {
        /* Handle EOF or error */
      }
      if (chars_read > index) {
        /* Handle truncation */
      }
    }
Noncompliant Code Example (fscanf())
In this noncompliant example, the call to fscanf() can result in a write outside the character array buf:
    #include <stdio.h>

    enum { BUF_LENGTH = 1024 };

    void get_data(void) {
      char buf[BUF_LENGTH];

      if (1 != fscanf(stdin, "%s", buf)) {
        /* Handle error */
      }
      /* Rest of function */
    }
Compliant Solution (fscanf())
In this compliant solution, the call to fscanf() is constrained not to overflow buf:
    #include <stdio.h>

    enum { BUF_LENGTH = 1024 };

    void get_data(void) {
      char buf[BUF_LENGTH];

      if (1 != fscanf(stdin, "%1023s", buf)) {
        /* Handle error */
      }
      /* Rest of function */
    }
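A hard-coded field width such as 1023 can drift out of sync with the array size when one of them is edited. One possible way to tie them together, sketched below, is to build the format string from a macro by stringification. The names BUF_CHARS, STRINGIFY, STRINGIFY_, and read_word are hypothetical, and the size is expressed as a macro rather than an enumeration constant because the preprocessor cannot stringify the value of an enumerator.

```c
#include <stdio.h>

#define BUF_CHARS 1023             /* Max characters stored, excluding '\0' */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x) /* Expands x before stringifying it */

/*
 * Hypothetical helper: reads one whitespace-delimited word of at most
 * BUF_CHARS characters. Returns 0 on success and -1 otherwise.
 */
int read_word(FILE *stream, char buf[BUF_CHARS + 1]) {
  /* The adjacent literals concatenate to the format string "%1023s" */
  if (fscanf(stream, "%" STRINGIFY(BUF_CHARS) "s", buf) != 1) {
    return -1;
  }
  return 0;
}
```

The destination array is declared with BUF_CHARS + 1 elements so that the null terminator written by fscanf() always fits.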
Noncompliant Code Example (argv)
In a hosted environment, arguments read from the command line are stored in process memory. The function main(), called at program startup, is typically declared as follows when the program accepts command-line arguments:
    int main(int argc, char *argv[]) {
      /* ... */
    }
Command-line arguments are passed to main() as pointers to strings in the array members argv[0] through argv[argc - 1]. If the value of argc is greater than 0, the string pointed to by argv[0] is, by convention, the program name. If the value of argc is greater than 1, the strings referenced by argv[1] through argv[argc - 1] are the program arguments.
Vulnerabilities can occur when inadequate space is allocated to copy a command-line argument or other program input. In this noncompliant code example, an attacker can manipulate the contents of argv[0] to cause a buffer overflow:
    #include <string.h>

    int main(int argc, char *argv[]) {
      /* Ensure argv[0] is not null */
      const char *const name = (argc && argv[0]) ? argv[0] : "";
      char prog_name[128];

      strcpy(prog_name, name);
      return 0;
    }
Compliant Solution (argv)
The strlen() function can be used to determine the length of the strings referenced by argv[0] through argv[argc - 1] so that adequate memory can be dynamically allocated.
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
      /* Ensure argv[0] is not null */
      const char *const name = (argc && argv[0]) ? argv[0] : "";
      char *prog_name = (char *)malloc(strlen(name) + 1);

      if (prog_name != NULL) {
        strcpy(prog_name, name);
      } else {
        /* Handle error */
      }
      free(prog_name);
      return 0;
    }
Remember to add a byte to the destination string size to accommodate the null-termination character.
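The measure-then-allocate pattern can be wrapped in a small helper so that the extra byte is added in exactly one place. The following is a minimal sketch; the name dup_string is hypothetical, and the caller is assumed to pass a valid null-terminated string and to free the result.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap-allocated copy of s, or NULL if
 * allocation fails. The caller is responsible for calling free().
 */
char *dup_string(const char *s) {
  size_t size = strlen(s) + 1;  /* +1 for the null terminator */
  char *copy = malloc(size);

  if (copy != NULL) {
    /* size already includes the terminator, so it is copied too */
    memcpy(copy, s, size);
  }
  return copy;
}
```

Because the size is computed once and reused for both the allocation and the copy, the two cannot disagree.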
Compliant Solution (argv)
The strcpy_s() function provides additional safeguards, including accepting the size of the destination buffer as an additional argument. (See STR07-C. Use the bounds-checking interfaces for string manipulation.)
    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
      /* Ensure argv[0] is not null */
      const char *const name = (argc && argv[0]) ? argv[0] : "";
      char *prog_name;
      size_t prog_size;

      prog_size = strlen(name) + 1;
      prog_name = (char *)malloc(prog_size);

      if (prog_name != NULL) {
        if (strcpy_s(prog_name, prog_size, name)) {
          /* Handle error */
        }
      } else {
        /* Handle error */
      }
      /* ... */
      free(prog_name);
      return 0;
    }
The strcpy_s() function can be used to copy data to or from dynamically allocated memory or a statically allocated array. If insufficient space is available, strcpy_s() returns an error.
Compliant Solution (argv)
If an argument will not be modified or concatenated, there is no reason to make a copy of the string. Not copying a string is the best way to prevent a buffer overflow and is also the most efficient solution. Care must be taken to avoid assuming that argv[0] is non-null.
    int main(int argc, char *argv[]) {
      /* Ensure argv[0] is not null */
      const char *const prog_name = (argc && argv[0]) ? argv[0] : "";
      /* ... */
      return 0;
    }
Noncompliant Code Example (getenv())
According to the C Standard, 7.22.4.6 [ISO/IEC 9899:2011]:
The getenv function searches an environment list, provided by the host environment, for a string that matches the string pointed to by name. The set of environment names and the method for altering the environment list are implementation-defined.
Environment variables can be arbitrarily large, and copying them into fixed-length arrays without first determining the size and allocating adequate storage can result in a buffer overflow.
    #include <stdlib.h>
    #include <string.h>

    void func(void) {
      char buff[256];
      char *editor = getenv("EDITOR");

      if (editor == NULL) {
        /* EDITOR environment variable not set */
      } else {
        strcpy(buff, editor);
      }
    }
Compliant Solution (getenv())
Environment variables are loaded into process memory when the program is loaded. As a result, the length of these strings can be determined by calling the strlen() function, and the resulting length can be used to allocate adequate dynamic memory:
    #include <stdlib.h>
    #include <string.h>

    void func(void) {
      char *buff;
      char *editor = getenv("EDITOR");

      if (editor == NULL) {
        /* EDITOR environment variable not set */
      } else {
        size_t len = strlen(editor) + 1;
        buff = (char *)malloc(len);
        if (buff == NULL) {
          /* Handle error */
        } else {
          /* Copy only when the allocation succeeded */
          memcpy(buff, editor, len);
          free(buff);
        }
      }
    }
Noncompliant Code Example (sprintf())
In this noncompliant code example, name refers to an external string; it could have originated from user input, the file system, or the network. The program constructs a file name from the string in preparation for opening the file.
    #include <stdio.h>

    void func(const char *name) {
      char filename[128];

      sprintf(filename, "%s.txt", name);
    }
Because the sprintf() function makes no guarantees regarding the length of the generated string, a sufficiently long string in name could generate a buffer overflow.
Compliant Solution (sprintf())
The buffer overflow in the preceding noncompliant example can be prevented by adding a precision to the %s conversion specification. If the precision is specified, no more than that many bytes are written. The precision 123 in this compliant solution ensures that filename can contain the first 123 characters of name, the .txt extension, and the null terminator.
    #include <stdio.h>

    void func(const char *name) {
      char filename[128];

      sprintf(filename, "%.123s.txt", name);
    }
Compliant Solution (snprintf())
A more general solution is to use the snprintf() function:
    #include <stdio.h>

    void func(const char *name) {
      char filename[128];

      snprintf(filename, sizeof(filename), "%s.txt", name);
    }
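The snprintf() function prevents the overflow but truncates silently, so a caller that needs the complete file name may want to inspect the return value. The following is a minimal sketch of such a check; the function name make_filename and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds "<name>.txt" in dst.
 * Returns 0 on success, or -1 if an encoding error occurred or the
 * result did not fit in dst_size bytes (dst is then truncated).
 */
int make_filename(char *dst, size_t dst_size, const char *name) {
  /* snprintf() returns the length the full string would have had */
  int n = snprintf(dst, dst_size, "%s.txt", name);

  if (n < 0 || (size_t)n >= dst_size) {
    return -1;
  }
  return 0;
}
```

Rejecting a truncated name, rather than opening whatever fits, is in keeping with STR03-C.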