Reserved word

From Wikipedia, the free encyclopedia

In a programming language, a reserved word (sometimes known as a reserved identifier) is a word that cannot be used by a programmer as an identifier, such as the name of a variable, function, or label: it is "reserved from use". In most languages, an identifier starts with a letter, which is followed by any sequence of letters and digits (some languages also treat the underscore '_' as a letter).
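As a rough illustration, the identifier rule and the reservation check can be combined in a few lines. The sketch below uses Python only as a convenient notation. The pattern treats '_' as a letter (an assumption matching the "some languages" case), and is_usable_identifier is a hypothetical helper name; keyword.iskeyword is the standard-library test for Python's own reserved words.

```python
import keyword
import re

# Simplified identifier rule: a letter, then any letters or digits.
# Treating '_' as a letter is an assumption; languages differ on this.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_usable_identifier(word: str) -> bool:
    """True if word fits the pattern and is not reserved in Python."""
    fits_pattern = IDENTIFIER.fullmatch(word) is not None
    return fits_pattern and not keyword.iskeyword(word)
```

Under these assumptions, a name such as total1 is accepted, 1total fails the pattern, and while fits the pattern but is rejected because it is reserved.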

In imperative programming languages and in many object-oriented programming languages, statements other than assignments and subroutine calls are usually introduced by a keyword such as if, while, do, or for. Many languages treat keywords as reserved words, including Ada, C, C++, COBOL, Java, and Pascal. The number of reserved words varies widely from one language to another: C has about 30 while COBOL has about 400.

A few languages have no reserved words at all. Fortran and PL/I identify keywords by context, while ALGOL 60 and ALGOL 68 generally use stropping to distinguish keywords from programmer-defined identifiers.

Most programming languages have a standard library (or libraries) that provides, for example, mathematical functions such as sin and cos. The names provided by a library are generally not reserved, and can be redefined by a programmer if the library functionality is not required.
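The difference between a library name and a reserved word can be probed directly. The following sketch uses Python as an example language; can_assign is a hypothetical helper that asks the Python compiler whether a name may appear as an assignment target, without running the assignment.

```python
def can_assign(name: str) -> bool:
    """Return True if the statement 'name = 1' compiles as Python source."""
    try:
        # compile() only parses and compiles; nothing is executed or rebound.
        compile(f"{name} = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False
```

Library names such as sin or the built-in len pass this check, because they are ordinary identifiers that a program may redefine, whereas while and True are rejected by the compiler.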

Distinction

When using an integrated development environment (IDE) to develop a program, the IDE will generally highlight reserved words by displaying them in a different colour. In some IDEs, comments are also highlighted (in yet another colour). This makes it easy for a programmer to notice an unintended use of a reserved word or a comment that has not been terminated correctly.

There may be reserved words which are not keywords. For example, in Java, true and false are reserved words used as Boolean (logical) literals. As another example, in Pascal, div and mod are reserved words used as operators (integer division and remainder).

There may also be reserved words which have no defined meaning. For example, in Java, goto and const are listed as reserved words, but are not otherwise mentioned in the Java syntax rules.

A keyword such as if or while is used during syntax analysis to determine what sort of statement is being considered. Such analysis is much simpler if keywords are either reserved or stropped. Consider the complexity of using contextual analysis in Fortran 77 to distinguish:

  • IF (B) l1,l2  ! two-way branch, where B is a boolean/logical expression
  • IF (N) l1,l2,l3  ! three-way branch, where N is a numeric expression
  • IF (B) THEN  ! start conditional block
  • IF (B) THEN = 3.1 ! conditional assignment to variable THEN
  • IF (B) X = 10  ! single conditional statement
  • IF (B) GOTO l4  ! conditional jump
  • IF (N) = 2  ! assignment to a subscripted variable named IF

PL/I can also allow some apparently confusing constructions:

  • IF IF = THEN THEN ... (the second IF and the first THEN are variables).
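The simplification that reservation buys can be sketched with a toy lexer, written here in Python. The RESERVED set and the classify function are hypothetical and belong to no real language. The point is only that, when keywords are reserved, each token can be labelled from its text alone, with no knowledge of the surrounding statement.

```python
import re

# Hypothetical reserved set for a toy language; not taken from any real one.
RESERVED = {"if", "then", "else", "while", "do"}

# A word, a run of digits, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def classify(source: str) -> list[tuple[str, str]]:
    """Label each token using only its own text, with no parser context."""
    result = []
    for text in TOKEN.findall(source):
        if text.lower() in RESERVED:
            kind = "keyword"
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"
        elif text.isdigit():
            kind = "number"
        else:
            kind = "symbol"
        result.append((kind, text))
    return result
```

Applied to the PL/I fragment IF IF = THEN THEN, this scheme labels every IF and THEN as a keyword, which is wrong for PL/I. A language without reserved words therefore has to defer the decision to the parser, which knows where in the statement each word appears.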

Disadvantages

Defining reserved words in a language raises problems. The language may be harder for new users to learn, because they must memorize a long list of words that cannot be used as identifiers. It may be difficult to extend the language, because adding reserved words for new features might invalidate existing programs or, conversely, "overloading" existing reserved words with new meanings can be confusing. Porting programs can be problematic because a word not reserved by one system or compiler might be reserved by another.

Because reserved words cannot be used as identifiers, users may choose deliberate misspellings of reserved words as identifiers instead, such as clazz for Java variables of type Class.[1]
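Other languages have comparable workarounds. In Python, for instance, the PEP 8 style guide suggests appending a single underscore (class_) instead of misspelling the word. A minimal sketch of that convention follows; safe_name is a hypothetical helper, not a library function.

```python
import keyword


def safe_name(word: str) -> str:
    """Append '_' when word is reserved in Python; otherwise return it as is."""
    return word + "_" if keyword.iskeyword(word) else word
```

With this helper, class becomes class_, while a name such as clazz is left unchanged because it is not reserved.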

Further Reservation

Beyond reserving specific lists of words, some languages reserve entire ranges of words, for use as private spaces for future language versions, different dialects, compiler vendor-specific extensions, or for internal use by a compiler, notably in name mangling.

This is most often done by using a prefix, often one or more underscores. C and C++ are notable in this respect: C99 reserves identifiers that start with two underscores or an underscore followed by an uppercase letter, and further reserves identifiers that start with a single underscore (in the ordinary and tag spaces) for use in file scope;[2] C++03 further reserves identifiers that contain a double underscore anywhere.[3] This allows an implementation to use a double underscore as a separator (to connect user identifiers), for instance.
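These prefix rules are simple enough to express as predicates. The sketch below is written in Python purely for illustration; the function names are hypothetical, and the checks cover only the rules summarized above, not the full lists of reserved library names in either standard.

```python
import re


def always_reserved_c99(name: str) -> bool:
    """C99: starts with '__' or with '_' followed by an uppercase letter."""
    return re.match(r"_[_A-Z]", name) is not None


def reserved_at_file_scope_c99(name: str) -> bool:
    """C99: any leading underscore is reserved at file scope."""
    return name.startswith("_")


def reserved_cpp03(name: str) -> bool:
    """C++03: '__' anywhere, or '_' followed by an uppercase letter."""
    return "__" in name or re.match(r"_[A-Z]", name) is not None
```

For example, my__value is acceptable under the C99 rules modelled here but reserved in C++03, while _Thread_local falls in the reserved range of both.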

The frequent use of double underscores in internal identifiers in Python gave rise to the abbreviation dunder; this was coined by Mark Jackson[4] and independently by Tim Hochberg,[5] within minutes of each other, both in reply to the same question in 2002.[6][7]
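For readers unfamiliar with the convention, a short Python sketch shows what such names look like in practice. The Money class and the is_dunder helper are hypothetical; __init__ and __add__ are standard special method names that the interpreter looks up when an object is created and when + is applied.

```python
def is_dunder(name: str) -> bool:
    """True for names of the form __name__ (leading and trailing '__')."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


class Money:
    def __init__(self, cents: int) -> None:
        # Called by the interpreter when Money(...) is evaluated.
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        # Called by the interpreter for the expression a + b.
        return Money(self.cents + other.cents)


total = Money(150) + Money(250)
```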

Specification

The lists of reserved words and keywords in a language are defined when the language is developed, and both form part of the language's formal specification. Generally one wishes to minimize the number of reserved words, to avoid restricting valid identifier names. Further, introducing a new reserved word breaks existing programs that use that word as an identifier (the change is not backwards compatible), so this is avoided. To provide forward compatibility, words are sometimes reserved without having a current use (a reserved word that is not a keyword), as this allows the word to be used in future without breaking existing programs. Alternatively, new language features can be implemented as predefined identifiers, which can be overridden, thus not breaking existing programs.

Reasons for keeping this flexibility include allowing compiler vendors to extend the specification with non-standard features, allowing different standard dialects to extend the language, and allowing future versions of the language to include additional features. For example, a procedural language may anticipate adding object-oriented capabilities in a future version or some dialect, at which point one might add keywords like class or object. To accommodate this possibility, the current specification may make these reserved words, even if they are not currently used.

A notable example is in Java, where const and goto are reserved words: they have no meaning in Java, but they also cannot be used as identifiers. By reserving the terms, they can be implemented in future versions of Java, if desired, without breaking older Java source code. For example, there was a proposal in 1999 to add C++-like const to the language, which was possible using the const word, since it was reserved but currently unused. The proposal was rejected, notably because even though adding the feature would not break any existing programs, using it in the standard library (notably in collections) would break compatibility.[8] JavaScript also contains a number of reserved words without special functionality; the exact list varies by version and mode.[9]

Languages differ significantly in how frequently they introduce new reserved words or keywords and how they name them, with some languages being very conservative and introducing new keywords rarely or never, to avoid breaking existing programs, while other languages introduce new keywords more freely, requiring existing programs to change existing identifiers that conflict. A case study is given by new keywords in C11 compared with C++11, both from 2011 – recall that in C and C++, identifiers that begin with an underscore followed by an uppercase letter are reserved:[10]

The C committee prefers not to create new keywords in the user name space, as it is generally expected that each revision of C will avoid breaking older C programs. By comparison, the C++ committee (WG21) prefers to make new keywords as normal‐looking as the old keywords. For example, C++11 defines a new thread_local keyword to designate static storage local to one thread. C11 defines the new keyword as _Thread_local. In the new C11 header <threads.h>, there is a macro definition to provide the normal‐looking name:[11]

#define thread_local _Thread_local

That is, C11 introduced the keyword _Thread_local within an existing set of reserved words (those with a certain prefix), and then used a separate facility (macro processing) to allow its use as if it were a new keyword without any prefixing, while C++11 introduced the keyword thread_local despite this not being an existing reserved word, breaking any programs that used it as an identifier, but without requiring macro processing.
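The compatibility cost of each approach can be estimated by scanning existing source for words that a new standard would reserve. The sketch below, in Python, is deliberately crude: conflicting_identifiers is a hypothetical helper that looks at every word in the text and does not skip comments or string literals, and old_program is an invented fragment.

```python
import re

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def conflicting_identifiers(source: str, new_keywords: set[str]) -> list[str]:
    """Return the words in source that collide with newly reserved words."""
    return sorted(set(WORD.findall(source)) & new_keywords)


# Invented pre-2011 fragment that uses thread_local as a variable name.
old_program = "static int thread_local = 0; int get(void) { return thread_local; }"
```

Checked against the C++11 spelling thread_local, the fragment reports a conflict. Checked against the C11 spelling _Thread_local, it reports none, because a conforming program could not have used that name in the first place.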

Reserved words and language independence

Microsoft's .NET Common Language Infrastructure (CLI) specification allows code written in 40+ different programming languages to be combined into a final product. Because of this, identifier/reserved word collisions can occur when code implemented in one language tries to execute code written in another language. For example, a Visual Basic (.NET) library may contain a class definition such as:

' Class Definition of This in Visual Basic.NET:

Public Class this
        ' This class does something...
End Class

If this class is compiled and distributed as part of a toolbox, a C# programmer wishing to define a variable of type "this" would encounter a problem: 'this' is a reserved word in C#. Thus, the following will not compile in C#:

// Using This Class in C#:

this x = new this();  // Won't compile!

A similar issue arises when accessing members, overriding virtual methods, and identifying namespaces.

This is resolved by stropping. In C#, the specification allows placing the at-sign (@) before the identifier, which forces the compiler to treat it as an identifier rather than a reserved word:

// Using This Class in C#:

@this x = new @this();  // Will compile!

For consistency, this use is also permitted in non-public settings such as local variables, parameter names, and private members.
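The same kind of collision appears in languages that have no stropping syntax. Python, for example, cannot write a member access whose name is one of its reserved words, but it can reach the member by passing the name as a string. The sketch below assumes a hypothetical record whose field names were chosen by another system.

```python
from types import SimpleNamespace

# Hypothetical record whose field names come from another system.
record = SimpleNamespace(**{"class": "economy", "seat": "12A"})

# Writing record.class would be a syntax error, because class is reserved
# in Python, so the member is looked up by its name given as a string.
travel_class = getattr(record, "class")
```

This is a workaround rather than stropping: the reserved word never appears as an identifier in the source text, only inside a string literal.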

References

  1. ^ Zammetti, Frank (2007). Practical JavaScript, DOM Scripting and Ajax Projects. Apress. ISBN 9781430201977.
  2. ^ C99 specification, 7.1.3 Reserved identifiers
  3. ^ C++03 specification, 17.4.3.2.1 Global names [lib.global.names]
  4. ^ Jackson, Mark (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  5. ^ Hochberg, Tim (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  6. ^ "DunderAlias - Python Wiki". wiki.python.org.
  7. ^ Notz, Pat (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  8. ^ "Bug ID: JDK-4211070 Java should support const parameters (like C++) for code maintainence [sic]". Bugs.sun.com. Retrieved November 4, 2014.
  9. ^ "Lexical grammar - JavaScript | MDN". developer.mozilla.org. November 8, 2023.
  10. ^ C99 specification, 7.1.3 Reserved identifiers: "All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use."
  11. ^ C11: The New C Standard, Thomas Plum, "A Note on Keywords"