This is the second post in the ‘Fundamentals of C Programming’ category, after ‘Introduction to C Programming’. This post covers the following topics:
- Character Set in C
- Tokens in C
- Keywords in C
- Identifiers in C
Character Set in C
C has a set of characters that serve as the building blocks of basic program elements. The C character set contains the 52 uppercase and lowercase letters of the Latin (English) alphabet, i.e. A-Z and a-z, the ten decimal digits 0-9, and a number of special characters such as brackets, operators, the semicolon and the comma. For ease of reading, the elements of the C character set are listed below:
- Uppercase letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
- Lowercase letters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z
- Decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- Arithmetic operators: +, -, *, /, % (Modulo Division)
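- Special characters (symbols):
{ } ( ) [ ] < > ; : , . ? $ ! = ' " & | ^ ~ ` # \ blank - _ / * @ %
Of these, $, @ and ` are not part of the basic character set required by the C standard; they normally appear only inside character constants, string literals and comments.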
Tokens in C
A token is the smallest element of a program that is meaningful to the compiler. There are six classes of tokens: keywords, identifiers, constants, string literals, operators and other separators. Blanks, horizontal and vertical tabs, newlines, form feeds and comments are collectively known as whitespace. The compiler ignores whitespace except where it is needed to separate adjacent identifiers, keywords and constants that would otherwise run together. Whitespace also makes the code easier for the programmer to read. The types of tokens in C can therefore be listed as:
- Identifiers
- Constants
- Keywords
- Operators and punctuators
- Strings
- Special symbols
Keywords in C
Keywords are reserved words that have standard, predefined meanings in a programming language. ANSI C defines 32 standard keywords. A keyword cannot be used for any purpose other than its predefined one. For example, a keyword can never be used as a variable name or any other identifier. The following keywords are present in C programming:
auto | double | int | struct |
break | else | long | switch |
case | enum | register | typedef |
char | extern | return | union |
const | float | short | unsigned |
continue | for | signed | void |
default | goto | sizeof | volatile |
do | if | static | while |
Identifiers in C
The names given to program elements such as variables, functions, labels and other user-defined items are known as identifiers. In short, they identify the user-defined items. They can consist of letters and digits in any order, but the first character must always be a letter. Both uppercase and lowercase letters can be used, although lowercase letters are used in most cases. Uppercase and lowercase letters are not interchangeable, because an uppercase letter is not equivalent to its corresponding lowercase letter. The underscore (_) can also be included and is treated as a letter, so an identifier can begin with an underscore too. In practice, however, an underscore is generally used in the middle of an identifier. The rules for naming identifiers in C are given below:
- An identifier can be a combination of letters, digits and underscore (_) in any order.
- The first character of an identifier must be a letter or an underscore.
- Identifiers must contain only letters, digits or underscore.
- In an identifier, the upper and lower-case letters are treated as different since C is a case-sensitive language. For example, the identifier setting is not equivalent to SETTING or Setting.
- A keyword (reserved word) cannot be used as an identifier.
- An identifier must not contain whitespace.
- There is no restriction on the length of an identifier. However, only a limited number of leading characters are guaranteed to be significant; ANSI C requires at least the first 31 characters of an internal identifier to be significant. For example, if a C compiler recognized only the first three characters, the identifiers pay and payment would be treated as the same.
The following table shows examples of valid and invalid identifiers:
Correct (valid) | Incorrect (invalid) |
x | "x" |
roll_no | roll no |
y12 | 12y |
nepal | nepal's |
item_1 | item-1 |
_temp | -temp |
order_no | order-no |