An introduction to Scheme for C programmers


You must unlearn what you have learned.

    -- Yoda

Purpose

This page is an introduction to the Scheme programming language for people who already know the C programming language, and specifically for students who are currently taking the CS 1 class at Caltech (Introduction to Computation). The reason for this page is that CS 1 uses Scheme, whereas many students taking the course (most of whom are freshmen) have only programmed in C (and/or related languages like C++ or Java) before. This is an issue because Scheme and C each encourage a very different style of programming (and of thinking), and there are a large number of differences between the two languages, ranging from trivial syntactic differences to very deep semantic differences. In fact, it is probably safe to say that knowledge of C will not only not help you when learning Scheme, it may actually make the process more difficult. Therefore, this page summarizes the major differences between the two languages to make the transition easier. We will assume that you, the reader, are reading this because you are taking CS 1 at Caltech, and we will also assume that you have a reasonably thorough knowledge of C; if not, you have nothing to worry about ;-)

Before we begin, it's important that you realize that we are not just teaching you a new language and a new way of thinking to torture you. Programming languages change and evolve at an astonishing rate; many of the most popular languages around now didn't even exist a few years ago. Programming paradigms change more slowly, but there are already many of these (imperative, functional, object-oriented, logic programming, constraint-based programming, stream-based programming) with more on the way (quantum computation, DNA computation, etc.). If all we taught you was how to program in one language, we would be doing you a disservice. What is needed is for us to train you to think in a different and more flexible way so that learning new languages and new paradigms isn't difficult any more. That's what this course is about. It will probably be painful at first, but if you stick with it, it will be rewarding.

One thing we will not do on this page is indulge in pointless discussion over whether Scheme or C is a "better" language; both languages have their place. Scheme does have major advantages as a language for teaching the fundamental ideas of programming, which is why we use it in CS 1. A lot of the programs that you have to write for CS 1 will either not be (directly) writable in C or else would be much more difficult to write in C.

Another thing we won't do on this page is try to explain the underlying semantics of Scheme; we'll do that in class. Instead, this is a practical summary of the differences between C and Scheme. You should always be aware, though, that when we say "feature X in C is equivalent to feature Y in Scheme", this doesn't mean that they are semantically equivalent; it just means that they are used to achieve the same end.

Finally, if you don't understand something on this page, particularly regarding Scheme, don't panic; we've tried to be fairly comprehensive here and many of the aspects of Scheme we discuss below aren't covered in class until after the midterm. As always, you should ask your TA, your recitation instructor, or the course instructor if you have any questions.


Menu

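The differences between Scheme and C can be divided into these categories: syntax, data types, type systems, built-in functions, language features that look different in the two languages, language features that exist in only one of them, things you have to worry about in one language but not the other, programming style, programming environment, and standards and dialects. We will discuss each of these individually.

Most of these comments also apply to C++ and Java as compared to Scheme. The main exception is that Java, like Scheme but unlike C and C++, doesn't have pointers and does have garbage collection. Java and C++ also have a number of other features (exception handling, support for object-oriented programming) that are beyond the scope of this discussion.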


Syntax

Syntactic sugar causes cancer of the semicolon.

    -- Alan Perlis

Atoms and lists

Scheme has an incredibly minimalistic syntax compared to C. There are basically only two fundamental syntactic forms: individual data objects (often called "atoms") and lists. Atoms consist mainly of numbers and identifiers. Lists are lists of atoms (or other lists) surrounded by parentheses. Some examples:

atoms    lists
1        (x y z)
2.3      (1 2 3 4 5)
x        ((1 2) (3 4) 5)

Usually the only other syntactic form you will see in Scheme is the comment, which starts with a semicolon ";" and goes to the end of the line:

C                                Scheme
/* This is a comment in C. */    ; This is a comment in Scheme.

The semicolon has no other function in Scheme. In particular, it does not serve as an end-of-statement marker. Standard (R5RS) Scheme has no multi-line comment operator either; each comment line must have its own semicolon (some implementations provide block comments as an extension). It is also quite typical in Scheme to have a comment that starts with more than one semicolon; this is just to make the comment look nicer and has no other meaning:

Scheme
; This is a comment in Scheme.
;; This is another comment in Scheme.

Identifiers

In C, identifiers (function and variable names) can only be made from the letters 'A' to 'Z', 'a' to 'z', the digits '0' to '9' and the underscore '_'. Scheme is much more liberal about identifiers; in addition to the C characters, you can also use these characters in identifiers:
! $ % & * + - . / : < = > ? @ ^ ~ 
The most common of these characters to be used in identifiers are "?", "!", and "-". "?" is typically used in the names of functions that return a boolean (true/false) value (e.g. "even?"), "!" is typically used in the names of functions or special forms that change the values of variables (e.g. "set!") and hyphens are typically used in function names where C would use underscores (e.g. "calculate-result"). These are just conventions and are not enforced by the language. Also, note that the usual arithmetic operators (+, -, *, /) are just ordinary identifiers in Scheme. We'll talk more about them below.

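Also, in standard (R5RS) Scheme, identifiers are not case-sensitive, so "foo" and "FOO" mean the same thing when used as e.g. a variable or function name. Some implementations and later revisions of the standard do distinguish case, so it is safest to pick one spelling and stick to it. In C, identifiers are always case-sensitive, so "foo" and "FOO" would represent two completely different names.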
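The naming conventions described above can be sketched as follows. All of the names defined here (calculate-result, positive-even?, counter, increment-counter!, plus) are our own illustrations, not built-ins; only even? and set! come from Scheme itself.

```scheme
;; Hyphens are used where C would use underscores.
(define (calculate-result x)
  (* x 2))

;; A trailing "?" marks a function that returns a boolean.
(define (positive-even? n)
  (and (> n 0) (even? n)))

;; A trailing "!" marks a function that changes the value of a variable.
(define counter 0)
(define (increment-counter!)
  (set! counter (+ counter 1)))

;; Arithmetic operators are ordinary identifiers, so they can be
;; given another name like any other function.
(define plus +)
(define three (plus 1 2))

(increment-counter!)   ; counter is now 1
```

None of these conventions is checked by the language; a function named with "?" could return a number and Scheme would not complain.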

Prefix, not infix

Probably the most disorienting aspect of Scheme syntax for C programmers is that mathematical operations are expressed in prefix syntax rather than in infix syntax. "Infix" means that the operator (e.g. +, -, *, /) comes between the operands (e.g. numbers or variables) whereas "prefix" means that the operator comes before the operands. Here are some examples:

C              Scheme
a + b          (+ a b)
(a - b) * 2    (* (- a b) 2)

This is quite unpleasant for most people at first, but it does have several advantages:

Even if you absolutely hate prefix syntax at first, you will probably find that after a couple of weeks it won't bother you any more. Eventually you may even prefer it; it's simpler and less ambiguous.
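Two properties of prefix notation that are easy to demonstrate are that operators accept any number of arguments and that the grouping is always explicit, so there are no precedence rules to remember. A minimal sketch (the variable names are our own):

```scheme
;; Arithmetic operators take any number of arguments.
(define total (+ 1 2 3 4))      ; C: 1 + 2 + 3 + 4
(define product (* 2 3 4))      ; C: 2 * 3 * 4

;; Nesting replaces operator precedence: the grouping is always visible.
(define a (+ 1 (* 2 3)))        ; C: 1 + 2 * 3
(define b (* (+ 1 2) 3))        ; C: (1 + 2) * 3

;; Comparison operators are prefix too, and can compare several values at once.
(define ordered? (< 1 2 3))     ; C: 1 < 2 && 2 < 3
```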

Functions and special forms

By default, a parenthesized expression in Scheme is a function call. The rule for function calls is to evaluate all the arguments and then apply the function to the arguments. C uses the same rule (although with different syntax). There are a small number of "special forms" in Scheme that look like function calls but have different evaluation rules (for instance, an "if" expression only evaluates one of two subexpressions). Many of these special forms correspond to similar expressions in C; see below for an item-by-item description.
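The difference between a function call and a special form can be seen in a small sketch. The functions square and safe-divide are hypothetical names chosen for this example:

```scheme
;; A function call evaluates every argument before applying the function.
(define (square x)
  (* x x))
(define nine (square (+ 1 2)))   ; (+ 1 2) is evaluated first, giving 3

;; "if" is a special form: only one of the two branches is evaluated,
;; so the division is never attempted when d is 0.
(define (safe-divide n d)
  (if (= d 0)
      0
      (/ n d)))

(define q1 (safe-divide 10 2))   ; 5
(define q2 (safe-divide 10 0))   ; 0, and no division by zero occurs
```

If "if" were an ordinary function, both branches would be evaluated before the call, and the second example would attempt to divide by zero.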

No special syntax

One confusing aspect of Scheme syntax is the fact that there is no special syntax for many of the language features that require special syntax in C. For instance, there is no special syntax for: Instead, these language features use special identifiers inside lists. For instance, a conditional expression is a list whose first element is the word "if". See below for more on this.
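As a rough sketch of what "special identifiers inside lists" looks like in practice, here are a few constructs that need dedicated syntax in C but are plain parenthesized lists in Scheme. The names limit, clamp and sum-of-squares are made up for the example:

```scheme
;; Variable definition: a list starting with "define".
(define limit 10)

;; Function definition: also a list starting with "define".
(define (clamp x)
  ;; Conditional: a list starting with "if".
  (if (> x limit)
      limit
      x))

;; Local variables: a list starting with "let".
(define (sum-of-squares a b)
  (let ((a2 (* a a))
        (b2 (* b b)))
    (+ a2 b2)))

(define clamped (clamp 42))          ; 10
(define unchanged (clamp 7))         ; 7
(define s (sum-of-squares 3 4))      ; 25
```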


Data types

Numbers

Scheme has fewer numerical data types than C does. C has a variety of integer data types (short, int, long, unsigned short, etc.) whereas Scheme has only one. The Scheme integer type is quite powerful, though, in that it can hold arbitrarily large numbers (there is no integer overflow). Similarly, C has two floating-point data types (float and double) whereas Scheme only has one. Most Schemes (including DrScheme) also have built-in support for rational numbers and complex numbers, but we won't be using them in CS 1.
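To make the "no integer overflow" point concrete, here is a minimal sketch, assuming an implementation with arbitrarily large integers (as described above for DrScheme). The function factorial is defined here and is not a built-in:

```scheme
;; Plain recursive factorial; no special "big integer" type is needed.
(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))

;; 25! does not fit in a 64-bit integer, but Scheme computes it exactly.
(define big (factorial 25))     ; 15511210043330985984000000

;; Floating-point numbers use the same operators.
(define half-again (* 1.5 2))   ; 3.0
```

The equivalent C function using int or long would silently produce a wrong answer for an argument this large.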

Booleans

Unlike C, Scheme has a real boolean type. "True" is designated by "#t" and "false" by "#f". DrScheme also lets you use "true" and "false" in place of "#t" and "#f" on many of the language levels, but don't do it: it isn't portable to other versions of Scheme.

Like C, Scheme is not too picky about true values; any value that is not a false value is considered to be true if it's used as a boolean. Unlike C, though, the only false value in Scheme is "#f"; the number 0 counts as true.
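A small sketch of how values behave when used as booleans in a conditional. The helper truthy? is our own name, not a built-in:

```scheme
;; Convert any value to #t or #f according to how "if" treats it.
(define (truthy? v)
  (if v #t #f))

(define t1 (truthy? #f))    ; #f
(define t2 (truthy? #t))    ; #t
(define t3 (truthy? 0))     ; #t  (in C, 0 would count as false)
(define t4 (truthy? ""))    ; #t
(define t5 (truthy? 42))    ; #t
```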

Characters

Both C and Scheme have a character data type:

C       Scheme
'a'     #\a
'b'     #\b
'c'     #\c
'\n'    #\newline
' '     #\space

As you can see, some characters have mnemonic names e.g. #\newline.

Strings

In C, strings are just arrays of characters (chars). In Scheme, strings are a basic data type. The syntax is the same, except that Scheme doesn't necessarily support the C escape sequences like "\n". Many Schemes (including DrScheme) do support this syntax, though.
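Because strings are a basic data type, the common operations are ordinary built-in functions rather than manipulations of character arrays. A minimal sketch using standard R5RS string functions (the variable names are our own):

```scheme
(define greeting (string-append "Hello, " "world"))   ; C: strcpy + strcat
(define len (string-length greeting))                 ; C: strlen, gives 12
(define first-char (string-ref greeting 0))           ; C: greeting[0], gives #\H
(define same? (string=? greeting "Hello, world"))     ; C: strcmp(...) == 0
```

Note that string-append returns a new string, so there is no destination buffer to size or overflow.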

Symbols

Scheme also has a data type for symbols. See below for more on this.
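As a brief sketch, a symbol is written with a leading quote and is typically compared with eq?. One rough way for a C programmer to think of symbols is as named constants, a little like enum values, although the analogy is ours and is not exact. The names color, is-red? and so on are illustrative only:

```scheme
(define color 'red)                     ; a symbol, not a string

(define is-red? (eq? color 'red))       ; #t
(define is-blue? (eq? color 'blue))     ; #f
(define is-symbol? (symbol? color))     ; #t
(define is-string? (string? color))     ; #f
```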

Lists

Lists are a built-in data type in Scheme but not in C. These lists are singly-linked lists, which means that from a given element you can only go in one direction along the list. Other kinds of lists (e.g. doubly-linked lists) are not built in to either C or Scheme, but can easily be defined in either language. You will learn all about how to create and manipulate lists in CS 1 or in any Scheme textbook, so we won't go over that here.

Arrays

Both C and Scheme have arrays, but in Scheme, arrays are called "vectors" and can contain objects of any type, whereas in C they can usually only contain objects of a single type. See below for more about arrays.
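A minimal sketch of vector usage with the standard R5RS vector functions (the variable names are our own):

```scheme
;; A vector can hold values of different types.
(define v (vector 1 "two" #\3))

(define n (vector-length v))       ; 3
(define elem1 (vector-ref v 1))    ; "two"  (indices start at 0, as in C)

(vector-set! v 0 100)              ; C: v[0] = 100;
(define elem0 (vector-ref v 0))    ; 100
```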


Type systems

C is a statically typed language. That means that all variables have to have a pre-declared type, and it's an error to assign a value of one type to a variable that has been declared to have a different type (or at least, it generates a compiler warning). In addition, all variables must be declared before use. In contrast, Scheme is a dynamically typed language. In Scheme, values (data objects) have types, not variables. A variable can store values of any type, and can store a value of one type at one time and another type at another time. Some people say that Scheme is "weakly typed" because of this, but that's not true; it's simply that type information is used differently. Variables in Scheme do need to be declared before use, but their type doesn't have to be specified. Example:

C                                      Scheme
int x = 0;                             (define x 0) ; no type specified
x = 1;                                 (set! x 1)
x = "foo"; /* compiler complains */    (set! x "foo") ; OK

Note that many type errors in C don't result in errors but instead in compiler warnings (as in the above example). Thus, C's type checking is quite loose. We might say that C is only weakly strongly-typed ;-)
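The idea that values, not variables, carry types can be sketched with the standard type predicates number?, string? and boolean?. The function describe and the variable names are hypothetical, chosen for this example:

```scheme
(define x 0)
(define was-number? (number? x))      ; #t

(set! x "foo")                        ; same variable, new type of value
(define now-string? (string? x))      ; #t
(define still-number? (number? x))    ; #f

;; A function can inspect the type of whatever value it is given.
(define (describe v)
  (cond ((number? v) "a number")
        ((string? v) "a string")
        ((boolean? v) "a boolean")
        (else "something else")))

(define d1 (describe 42))             ; "a number"
(define d2 (describe x))              ; "a string"
```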


Built-in functions

Scheme has a fairly large number of built-in functions that can be used directly in programs (i.e. without the equivalent of a "#include" statement in C). Examples include basic math functions like sqrt (square root), sin (sine), cos (cosine) etc. These functions are NOT KEYWORDS; they are just ordinary functions. See the documentation in the links below for more information on these functions. Example:

C                    Scheme
#include <math.h>
sqrt(10.0);          (sqrt 10.0)


Language features that look different in Scheme when compared to C

Many language features are present in both C and Scheme but look quite different. We will show examples of each of these features without much discussion. For more details on the Scheme code see the links at the bottom of this page.
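As a rough illustration (our own selection of features, with approximate C equivalents in the comments), here is how a multi-way conditional and a counting loop might look in Scheme. The functions sign and sum-to are made up for the example:

```scheme
;; C:  int sign(int x) {
;;         if (x < 0) return -1;
;;         else if (x == 0) return 0;
;;         else return 1;
;;     }
(define (sign x)
  (cond ((< x 0) -1)
        ((= x 0) 0)
        (else 1)))

;; C:  int sum_to(int n) {
;;         int s = 0;
;;         for (int i = 1; i <= n; i++) s += i;
;;         return s;
;;     }
(define (sum-to n)
  ;; The loop is an inner helper function that calls itself.
  (define (loop i acc)
    (if (> i n)
        acc
        (loop (+ i 1) (+ acc i))))
  (loop 1 0))

(define s1 (sign -5))      ; -1
(define s2 (sum-to 10))    ; 55
```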


Language features that exist in Scheme but not in C

Scheme includes several language features that are not found in C:
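One feature of this kind, which we pick here purely as an illustration, is the ability to create functions at run time with lambda and to pass them around like any other value. The names make-adder, add5, fifteen and squares are our own:

```scheme
;; A function that builds and returns a new function.
;; The returned function remembers the value of n.
(define (make-adder n)
  (lambda (x) (+ x n)))

(define add5 (make-adder 5))
(define fifteen (add5 10))          ; 15

;; Functions can be passed as arguments; map is built in.
(define squares
  (map (lambda (x) (* x x)) (list 1 2 3 4)))   ; (1 4 9 16)
```

C has function pointers, but a C function cannot capture a local value such as n in the way the lambda above does.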


Language features that exist in C but not in Scheme

The following constructs in C have no direct counterparts in Scheme:


Things you don't have to worry about in Scheme that you do have to worry about in C

Generally speaking, Scheme is a much safer language to program in than C. Large numbers of bugs that typically occur in C programs either can't happen at all in Scheme, or else they can't happen without giving rise to an immediate error which the Scheme environment reports to the programmer. Examples include:


Things you don't have to worry about in C that you do have to worry about in Scheme

There really is only one thing you have to worry about in Scheme that you don't have to worry about in C: type checking. In C, the compiler checks that all operations are performed on values of the correct type and complains if they're not (more or less; it's possible to disable this using type casts). In Scheme, type checks are done at run time, so that you can write a function with incorrect type usages, and no error will be reported until you run that function. Example:

C
void bad()
{
    char s[] = "foo";
    int result;

    result = 1 + s;  /* generates a compiler warning */
}

Scheme
;; This is perfectly legal Scheme:
(define (bad)
  (+ 1 "foo"))

;; We get an error when we call the function:
(bad)

Note, however, that in C even flagrant type errors like trying to add a string to an integer typically only result in compiler warnings, not outright errors. The type checking in C is actually quite loose (the same code would give a compiler error in C++), so the type checking in C doesn't help as much as you would think.


Programming style

You have taken your first step into a larger world.

    -- Obi-Wan Kenobi

Scheme encourages a very different style of programming than C does, and this is a major obstacle for beginning Scheme programmers who are accomplished C programmers. Suffice it to say that if you think that all programming languages are basically the same, you are in for a very unpleasant shock. The style of programming that Scheme encourages is usually known as the functional programming style. This has a number of features:

In contrast, the style of programming that C encourages (forces?) is called the imperative programming style, and involves a lot of mutable state (variables, arrays, structs, etc.). Experienced C programmers who have not programmed in a functional style usually can't believe that it's even possible to program in any other way than the imperative style. As you will see in the course, it is indeed possible, but it takes some getting used to. The usual reaction is then something like "OK, I see how to do this, but why would you want to program that way? Isn't it making simple things needlessly complicated?" It turns out that programming in a functional style is a powerful way to write programs that are correct by construction. Programs with a lot of assignment statements and explicit iteration statements are much more vulnerable to various kinds of bugs (e.g. array indices that are off by one) than programs written in a functional style. However, this style of programming is bound to seem unnatural at first. Bear with it; there is a point to it.
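To give a flavor of the functional style, here is a sketch that sums the squares of a list without any assignment or loop counter. The function names are hypothetical; car, cdr, null?, map and apply are standard:

```scheme
(define (square x)
  (* x x))

;; Recursive version: the sum for an empty list is 0; otherwise it is the
;; square of the first element plus the sum for the rest of the list.
(define (sum-of-squares lst)
  (if (null? lst)
      0
      (+ (square (car lst))
         (sum-of-squares (cdr lst)))))

;; Same computation, built from existing pieces:
;; square every element, then add the results together.
(define (sum-of-squares-2 lst)
  (apply + (map square lst)))

(define r1 (sum-of-squares (list 1 2 3)))      ; 14
(define r2 (sum-of-squares-2 (list 1 2 3)))    ; 14
```

There is no index variable here, so there is no opportunity for an off-by-one error in it.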

Having said that, it's important to realize that it is possible to write code in Scheme in a completely imperative style as well. One of the big advantages of Scheme is that you can use multiple different programming styles (whichever one suits the problem best). In contrast, it is very difficult to program in C in anything other than a purely imperative style.
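For comparison, here is a sketch of the same kind of computation written in a deliberately imperative style, using a vector, an explicit index, and assignment. The function name is our own; do, set! and the vector functions are standard R5RS:

```scheme
;; C:  int total = 0;
;;     for (int i = 0; i < len; i++) total += vec[i] * vec[i];
;;     return total;
(define (sum-of-squares-imperative vec)
  (let ((total 0))
    (do ((i 0 (+ i 1)))                       ; loop variable, initial value, step
        ((= i (vector-length vec)) total)     ; stop condition and return value
      (set! total
            (+ total (* (vector-ref vec i) (vector-ref vec i)))))))

(define r (sum-of-squares-imperative (vector 1 2 3)))   ; 14
```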


Programming environment

Programs written in C have to go through a somewhat elaborate compiling and linking cycle before they can generate a program that can be run. In contrast, most Scheme implementations provide an interactive programming environment where code can be typed in and executed immediately (DrScheme is an excellent example of this). This is very convenient when developing code because you get feedback much more quickly than you do when programming in C. Some Scheme implementations also include compilers that can produce stand-alone executables. If we have time, we may use one of these compilers at some point in the course.


Standards and dialects

C is pretty much a fully standardized language. There are minor differences between pre-ANSI C and ANSI C, and most compilers support a few extra features that go beyond the original ANSI standard (e.g. inline functions and "//" comments, both of which were later standardized in C99; the GNU C compiler gcc supports a large number of other extensions as well), but basically, C is the same language everywhere. The situation is different with Scheme. There is a fundamental core of functionality that a Scheme implementation has to support in order for it to call itself Scheme; this is the Scheme standard. The standard we use here is called "R5RS"; it's the fifth revision of the Scheme standard (later revisions have been published since). However, there are a lot of fundamental features that are not covered by the standard. They include things like Scheme's equivalent to C's structs, module systems, exception handling systems, and object-oriented extensions to Scheme. Because Scheme is so flexible, many such extensions exist. Unfortunately, almost all of them are incompatible with one another. This is not a problem for our purposes because we will only be needing the core R5RS Scheme for the course (and not even all of that). However, you should be aware that each particular implementation of Scheme is a dialect with a common core.


Links