
C Language Constructs and Elements

This section is an introduction to the basics of C. By the end of this section, you will be able to understand and run a simple C program. The C language comprises comments, identifiers, constants, statements and expressions, operators and keywords. These are the basic elements of C and in this section we introduce each of them and provide some examples of their use. We also introduce the pre-processor and show how it can be used in the construction of large packages.

Comments

A comment is a sequence of characters that explains a piece of code. Programmers use comments to say in plain English what the C commands in a program do. A comment begins with /* and ends with */. (C99 and later standards also accept comments that begin with // and run to the end of the line.)

Comments are ignored by the compiler: they exist solely for the benefit of humans reading the code. Every source file you write should start with a comment explaining what it does. It is often helpful to include your name and the date as well. There should be plenty of comments throughout the program to explain what is happening.

The examples below are taken from the source code of the minefield application.

A Simple Comment

/* Find out who the user is */

A Multi-line Comment

/* The stamp routine returns true if the indicated location has a mine
   or false if not. If a square with a border mine count of zero is stamped,
   surrounding squares are recursively exposed */

An Illegal Comment

/* This comment starts off fine,
       /* Nested comments aren't allowed */
   The comment ended on the line above so this is a syntax error */

Identifiers

Identifiers are user-supplied names for variables, functions and labels.

An identifier is a sequence of one or more letters, digits or underscores '_' that does not begin with a digit. A variable's name stands for the value stored in that variable. Similarly, a function's identifier is used to call the function. Some examples are given below.

Valid Identifiers

x /* A variable named x */
voltage /* An informative variable name */
Current /* Upper- and lower-case letters are allowed in identifiers */
sqrt /* The name of a mathematical function */

There are some important restrictions upon the form an identifier can take:

Invalid Identifiers

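90_entry /* Identifiers may not start with a digit */
-letter /* Nor may they start with, or contain, an operator such as - */
int /* A reserved word can't be used as an identifier */

Identifiers that begin with an underscore followed by an upper-case letter or a second underscore are legal, but such names are reserved for the compiler and its libraries, so avoid them in your own code.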

Constants

A constant is a fixed-value integer, real number, character or string. It is an error to attempt to assign a new value to a constant once it has been declared.

There are four types of constant in C:

int
whole-number values;
float and double
real numbers of respectively lesser and greater precision;
char
a single ASCII character enclosed in single quotes;
char [] (a string)
an array of ASCII characters between double quotes, to which the null character '\0' is appended (we will deal with arrays in due course).

Example Constants

const int x = 42;
const double e = 2.7182818285;
const char c = 'a';
const char s[] = "Hello, world!";

Note that 17 is an integer constant whereas 17.0 is a floating-point one. This can be confusing: 3/2 evaluates to 1 in C, because dividing an integer by an integer produces an integer result and the fractional part is discarded. 3.0/2.0 evaluates to 1.5. It is always best to write floating-point values that happen to be whole numbers with the trailing ".0" to avoid confusion.

The Character Set in C Programs

The alphanumeric part of the C character set comprises the upper- and lower-case letters A-Z and a-z and the digits 0-9. Alphanumeric characters, together with the underscore, are used to form constants, identifiers and keywords. C is case-sensitive, so the identifier number is not the same as Number. The remaining characters are grouped as white space (blank spaces, new lines and tab characters); punctuation (parentheses ( ), braces { }, commas ',', semicolons ';' etc.); and special characters (e.g. +, -, *, /, !, & and #), some of which combine to form multi-character operators such as ->.

Special characters are mostly operators, and # is used by the pre-processor.

Control sequences such as C-c (Control-C) are used to make things happen at run time; on Unix, for example, C-c interrupts a running program. The sequence names are largely historical. Strictly, the association of a control character (like C-z) with an action (like suspend) is a function of the operating system rather than of the C programming language, but the Unix control sequences are now widely adopted.

C uses escape sequences to represent non-printing characters. The backslash character (\) is used as the escape. For example, '\a' is the "alert" or "bell" character, C-g or BEL in ASCII. Escape sequences can be used in strings and character constants.

Sequence  Name                      ASCII  Hex
\a        Bell (alert)              BEL    0x07
\b        Backspace                 BS     0x08
\f        Form feed                 FF     0x0C
\n        New line                  LF     0x0A
\r        Carriage return           CR     0x0D
\t        Horizontal tab            HT     0x09
\v        Vertical tab              VT     0x0B
\0        Null character            NUL    0x00
\\        Backslash                        0x5C
\'        Single quotation mark            0x27
\"        Double quotation mark            0x22
\ddd      Character given in octal (d = octal digit)
\xddd     Character given in hexadecimal (d = hexadecimal digit)
The sequences \ddd and \xddd allow any character in the ASCII character set to be given by its octal or hexadecimal code. Escape sequences allow you to send non-graphic control characters to a display device. For example, the ESC character, ASCII 27 (hex \x1B or octal \033), is often used as the first character of a control command for a terminal or printer. The effect of some control characters is device-specific: the vertical-tab \v and form-feed \f escape sequences, for instance, might not affect screen output, but they do perform the appropriate printer operations.

Statements and Expressions

An expression consists of an operator and its operands, and it can occur wherever a value is allowed. In its simplest form an expression may be just a constant or a variable. Expressions can be combined with operators to create new expressions. As the examples show, an expression can contain arithmetic or bitwise operators, which yield a numerical result, or relational and logical (boolean) operators, which compare values and yield a logical result: in C, 1 for true and 0 for false.

Example Expressions

/* Let us assume the following values:
   i = 1
   n = 2

   i + 1 --- value is 2
   i > n --- value is 0 (false) */

An expression becomes a statement when it is followed by a semicolon, which terminates every simple statement in C. The examples below use expressions like those above in statements. (When a program fails to compile, the cause is often a simple mistake such as a missing semicolon.)

Example Statements

i = i+1;    /* Increment i by 1 */
i++;        /* Alternative notation for incrementing i */
if( i < n ) /* Logical expression */
    puts(" i less than n");

A statement which explicitly modifies the value of a variable (e.g. i = 3) is called an assignment. C provides a very useful shorthand, the compound assignment operators, which avoid writing the variable name twice:

Example Assignment Operators

count += 5;    /* short for count = count+5 */
dbl *= 2;      /* short for dbl = dbl * 2 */

Finally, statements can be grouped together into a block or compound statement by {enclosing them in braces}. At the moment, it won't be obvious why one should want to do this, but it is essential to allow such grouping when using conditional or loop constructs, which will be covered later.

Example Compound Statement

{ /* The following two assignments form a single compound statement */
    i = 1;
    n = 2*i;
}


In fact, an assignment in C is itself an expression, and its value is the value assigned. This has some very useful consequences: for example, if x and y are both integers, one can write:
x = y = 5;
The assignment y = 5 is performed first, because the assignment operator associates right to left. The result of that assignment has the value 5, which is then assigned to x. You will see the following C shorthand very often when opening disk files or reading data from the console:
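FILE *fp;                                  /* needs #include <stdio.h> */
if ((fp = fopen("myfile", "r")) != NULL) { /* "r" opens the file for reading */
    /* ... use the file, then fclose(fp) ... */
}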
This calls fopen(), the standard I/O library function that opens a disk file, and assigns the pointer it returns to fp. The value of the whole assignment is that same pointer, which is compared with NULL: NULL indicates failure, and if the file was opened successfully the program proceeds.
int c;
while ((c = getchar()) != EOF) {
    /* Process characters read one at a time from standard input
       until the end of file is reached */
    ...
}
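c holds the value returned by getchar(), which reads one character from standard input (usually the keyboard). As long as that value is not EOF, the end-of-file marker, the character is processed. Note that c is declared as an int rather than a char: getchar() returns an int so that EOF can be distinguished from every valid character.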

Operators

Operators are special characters or character combinations that specify how values are to be transformed and assigned. Each operator has a precedence, which determines the order in which operators are applied when an expression contains several of them: in 2 + 3 * 4, for example, the multiplication is performed first because * has a higher precedence than +. The full table is included below for completeness, but it is not usually necessary to remember it. The golden rule is to make the code clear, so use plenty of brackets, even where they are not strictly needed, if they increase readability. The code produced by the compiler won't be any larger or slower because of extra brackets in the source.

Precedence  Operator        Name                              Notes
1           ( )             Function call or parentheses
            [ ]             Array subscript
            .               Member selection (structure)
            ->              Member selection (pointer)
2           +               Unary plus                        RIGHT to LEFT
            -               Unary negation
            ~               Bitwise one's complement (NOT)
            !               Logical NOT
            *               Indirection
            &               Address of
            ++              Increment
            --              Decrement
            sizeof          Size in bytes
            (type)          Type cast
3           *               Multiplication
            /               Division
            %               Remainder
4           +               Addition
            -               Subtraction
5           <<              Bitwise left shift
            >>              Bitwise right shift
6           <               Less than
            >               Greater than
            <=              Less than or equal to
            >=              Greater than or equal to
7           ==              Equality
            !=              Inequality
8           &               Bitwise AND
9           ^               Bitwise exclusive OR
10          |               Bitwise inclusive OR
11          &&              Logical AND
12          ||              Logical OR
13          exp1?exp2:exp3  Conditional                       RIGHT to LEFT
14          =               Simple assignment                 RIGHT to LEFT
            +=              Addition assignment
            -=              Subtraction assignment
            *=              Multiplication assignment
            /=              Division assignment
            %=              Remainder assignment
            >>=             Right shift assignment
            <<=             Left shift assignment
            &=              Bitwise AND assignment
            |=              Bitwise OR assignment
            ^=              Bitwise exclusive OR assignment
15          ,               Sequential evaluation

Level 1 is the highest precedence. Operators in the same group have equal precedence; groups not marked RIGHT to LEFT associate LEFT to RIGHT.

Keywords

Keywords are reserved identifiers known to the compiler. Some of them, such as int and float, have been introduced already. Others are described in future topics.

Because each keyword already has a fixed meaning to the compiler, keywords cannot be used as user-defined identifiers.

The following are the 32 keywords of ANSI C (C89/C90); later revisions of the standard add a few more, such as inline and restrict:

auto      double    int       struct
break     else      long      switch
case      enum      register  typedef
char      extern    return    union
const     float     short     unsigned
continue  for       signed    void
default   goto      sizeof    while
do        if        static    volatile

The C Preprocessor

The pre-processor is the first stage of the compiler. Pre-processor directives tell the pre-processor how to change the source code before the compiler proper does its job. They can be used, for example, to define constants or to include the prototypes of standard library functions.

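Pre-processor directives all begin with a # character. It is important to appreciate that they are handled by the pre-processor rather than by the compiler proper, and different rules apply to their use. The most important is that, unlike C statements, which can be laid out in any way that promotes readability, each directive occupies a line of its own: the # must be the first non-blank character on the line, although the # and the directive name may be separated by white space. (Some very old compilers insist that the # is in the first column.) A long directive can be continued onto the next line by ending the line with a backslash.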

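The #define directive defines a macro: from that point on, the pre-processor replaces every occurrence of the macro name in the source code with the given replacement text (occurrences inside string literals are left alone).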

#define PI 3.14159
...
/* Calculate the area and store it */
area = PI * r * r;
...

#define macros can be made to take arguments, just like a function, but it is important to bear in mind that all that is taking place is text substitution. It is crucial to make sure that expanding the macro has no unexpected effects: the macro might be defined in a header file, far from the code that uses it, and a careless definition can introduce subtle programming errors.

Dangerous Macros

#define D1(x) x - 2
#define S1(x) ((x) - 2)
#define D2(x) x++
The macro D1 is dangerous because it does nothing to protect the intended order of evaluation from operator precedence. 3*D1(1) expands to 3*1 - 2, which evaluates to 1, whereas the programmer almost certainly intended 3*(1-2), which is -3: the behaviour of the safe macro S1. Bracketing the argument (x) as well as the whole body also protects S1 against arguments that are themselves expressions containing low-precedence operators, such as S1(a ? b : c). D2 is dangerous because it may not be obvious that the variable passed to it as an argument will be modified. It will also fail if expanded with a constant, e.g. D2(4), because the resulting 4++ is invalid C.

The following example includes a file called stdio.h which contains numerous definitions and declarations used by the standard input/output library functions.

#include <stdio.h> /* Standard I/O Functions
                      for Screen, Keyboard, Disk etc */
The file name can be enclosed either in quotation marks "" or in <angle brackets>, and the choice determines where the compiler searches for the file. The exact search order is implementation-defined, but typically angle brackets direct the compiler to search first the directories specified on the compiler command line (for example with GCC's -I option) and then the "standard places" that hold the header files supplied with the compiler and system libraries; some compilers, such as Microsoft's, also search the directories listed in the INCLUDE environment variable. The standard places vary from system to system, but on POSIX-compliant systems /usr/include is usually among them.

Quotation marks direct the compiler to begin the search in the directory containing the source file (often the current directory), then to proceed to any directory specified on the command line and finally, if the file is still not found, to search the standard directories.

How to #include Your Own Definitions

#include "myfile.h" /* Include some definitions of your own */
This form directs the preprocessor to search for myfile.h first in the current directory (more precisely, the directory containing the source file), then in any command-line directories and finally in the standard directories. When the file is found, the #include line is replaced by the contents of myfile.h.

Putting shared definitions in one file and including it in several source files simplifies maintenance: a definition need only be changed in one place for the change to take effect in every file that includes it, the next time those files are compiled. Files of this nature are termed header files and by convention have the filename extension .h.


When writing C programs, it is usually wise to break the source code up among many different .c files containing related functions. To inform the compiler of the contents of the other files when compiling each one, function prototypes for all of the functions in a .c file are made available in an associated .h file, which can be included where appropriate.

It is good practice to include the associated .h file in its own .c file. Otherwise, very subtle bugs can be introduced if the programmer changes a function in the .c file and inadvertently forgets to modify the .h file: other .c files are then misinformed about the contents of the compiled object file. When the .h file is included in the .c file, the compiler checks that the prototypes and the definitions agree, and reports an error if there is a mismatch.


Sometimes a .h file must be included no more than once, because it contains definitions (of a structure type, for example) that would cause a compiler error if they were seen twice. Unfortunately, it is unreasonable to expect the programmer to keep track of exactly what is included where, and in any case, header files from different sub-packages, which could otherwise be compiled separately, may themselves include other header files.

A solution is to use the conditional preprocessor directives #ifndef, #ifdef and so on. Consider the following example:

Ensuring Inclusions are Singular

/* File: myheader.h */
#ifndef MYHEADER_H
#define MYHEADER_H
/* The main body of the header file goes here */
#endif
If the file is being included for the first time, the pre-processor symbol MYHEADER_H is undefined, so the pre-processor defines it and then reads the rest of the file. If the file is included a second time, MYHEADER_H is already defined, so the body of the file is skipped.

Note that the choice of pre-processor symbol, MYHEADER_H, is arbitrary, but each header needs its own symbol, and it helps to derive the symbol from the file name in the same way across all of the .h files in the package. Avoid symbols that begin with an underscore followed by an upper-case letter or a second underscore, such as __MYHEADER_H__: the C standard reserves such names for the compiler and its libraries, and the reservation applies to pre-processor macro names as well as to ordinary identifiers.


At last, a C program!

Step through the following noticing the comment associated with each line. Try entering the code, compiling it and running it as explained in the tools section. Remember not to enter the line numbers!

Summary:

C programs consist of various different constructs, including comments, identifiers, constants, statements and expressions, operators and keywords.

Before the compiler translates C to machine code, the source is passed through a pre-processor which permits macro definitions and the inclusion of other files.


Copyright and Acknowledgements

The tools upon which this course relies are Copyright the Free Software Foundation and are made available under the GPL (GNU General Public License).

The content of this course was derived from that generated by many ex-colleagues at the University of Leeds, Department of Electronics and Electrical Engineering. Much of the content has been reworked, and substantially augmented, by Dr N J Bailey, Centre for Music Technology, The University of Glasgow. This manifestation is Copyright N J Bailey; some of the content is Copyright The University of Leeds.

Diagrams on this resource are drawn in XFig and are rendered by the browser using The University of Hamburg's Simple FIG viewer applet which is Copyright (C) 1996-2002 F.N.Hendrich, hendrich@informatik.uni-hamburg.de.

The source code, programming examples and exercises are all specific to this course, and are Copyright, Dr N J Bailey.

The applet for viewing and demonstrating C programs is Copyright Dr N J Bailey, and is to be found documented and with its source code on the Centre for Music Technology website under Software