Computation, Computers, and Programs
CS20a, Fall 2002

An introduction to fundamental
concepts in computer science

CS20a OCaml Style Guide

Derived from Cornell CS312 OCaml style guide

You have spent many years in secondary school learning English style and usage. Programming languages are no different. Every programming language has its own idioms and idiosyncrasies, and forcing one language's style upon another is like trying to speak French using the rules of English grammar. Of course there are some elements that are shared between all languages, which you will discover in the course of learning OCaml and comparing it with languages you already know. But there are some fundamental differences that make programming in OCaml quite different from programming in C or Java.

One of our main goals in this class is for you to develop an appreciation for concise, precise, and elegant code. This is in part reflected in your programming style. As you will soon realize, this class takes style very seriously. Listed below are some stylistic rules for OCaml which we would like you to follow. We will initially be quite strict and will deduct points for stylistic infractions, even if your program is functionally correct.

We have provided some example code that illustrates good coding style.

File Submission

80 Column Limit. No line of code may have more than 80 columns. Using more than 80 columns can cause your code to wrap around to the next line, which is devastating to readability.

No Tab Characters. Do not use the tab character (0x09). Instead, use spaces to control indenting. This is because the width of a tab is not uniform across all computers, and what looks good on your machine may look terrible on mine, especially if you have mixed spaces and tabs.

Begin With a Comment. All submitted files must begin with a comment. In other words, the first two characters of the file must be (*.

Code Must Compile. Any code you submit must compile under OCaml without errors or warnings. If it does not compile, we will not grade it. That means you will not receive any points for that problem. There is no excuse for it to not compile. Never submit anything that you have changed, no matter how small the change, without checking that it still compiles. You should treat compiler warnings as errors; we will be compiling with the -warn-error option.

Comments

COMMENT YOUR CODE!

You have been warned...

Avoid Useless Comments. Avoid comments that merely repeat the code they reference or state the obvious. Comments should state the invariants, the non-obvious, or any references that have more information about the code.

Avoid Over-commenting. Very many or very long comments in the code body are more distracting than helpful. Long comments may appear at the top of a file if you wish to explain the overall design of the code or refer to any sources that have more information about the algorithms or data structures. All other comments in the file should be as short as possible. A good place for a comment is just before a function declaration. Judicious choice of variable names can help minimize the need for comments.

If in doubt about whether or not a comment is useless, go ahead and put it in. Slight over-commenting is far better than under-commenting.

Comments Go Above the Code They Reference, as in the following example:

(* Sums a list of integers. *)
let sum = List.fold_left (+) 0

Line Breaks. Empty lines should only be included between value declarations within a struct block, especially between function declarations. It is not necessary to put empty lines between other declarations unless you are separating the different types of declarations (such as structures, types, exceptions and values). Unless function declarations within a let block are long, there should be no empty lines within a let block. There should never be an empty line within an expression.

Multi-line Commenting. When comments are printed on paper, the reader lacks the advantage of color highlighting performed by an editor such as Emacs. Multiline comments can be distinguished from code by preceding each line of the comment with a * similar to the following:

(*
 * This is one of those rare but long comments
 * that need to span multiple lines because
 * the code is unusually complex and requires
 * extra explanation.
 *)
let complicatedFunction () = ...

Naming and Declarations

Naming Conventions. The best way to tell at a glance something about the type of a variable is to use the standard OCaml naming conventions. The following are the rules that are followed by the OCaml libraries:

Token | OCaml Naming Convention | Example
Variables | Symbolic or initial lower case. Use embedded caps or underscores for multiword names. | getItem, get_item
Constructors | Initial upper case. Use embedded caps for multiword names. Historic exceptions are true and false. Rarely are symbolic names like :: used. | Node, EmptyQueue
Types | All lower case. Use underscores for multiword names. | priority_queue
Signatures | All upper case or initial upper case. | PRIORITY_QUEUE
Structures | Initial upper case. Use embedded caps for multiword names. | PriorityQueue
Functors | Same as structure convention, except Fn completes the name. | PriorityQueueFn

Some of these conventions are not enforced by the compiler. OCaml does require constructor and structure names to begin with an upper-case letter, and variable and type names to begin with a lower-case letter, which guards against the danger of a misspelled constructor silently turning into a variable.

Use Meaningful Names. Another way of conveying information is to use meaningful variable names that reflect their intended use. Choose words or combinations of words describing the value. Variable names may be one letter in short let blocks. Functions used in a fold, filter, or map are often bound to the name f. Here is an example for short variable names:

let d = Unix.localtime (Unix.time ()) in
let m = d.Unix.tm_min in
let s = d.Unix.tm_sec in
let f n = (n mod 3) = 0 in
   List.filter f [m; s]

Type Annotations. Top-level functions and values should always be declared with types.
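
As a minimal sketch of what this rule asks for, the declarations below annotate both a top-level value and a top-level function. The names max_retries and average are hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical examples: these names are not from the course code. *)

(* Maximum number of retries for an operation. *)
let max_retries : int = 3

(* Computes the average of a non-empty list of floats. *)
let average (xs : float list) : float =
   List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)
```

With the annotations in place, a reader can see the argument and result types without running the type checker in their head.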

Avoid Global Mutable Variables. Mutable values should be local to closures and almost never declared as a structure's value. Global mutable values cause many problems. First, it is difficult to ensure that the mutable value is in the proper state, since it might have been modified outside the function or by a previous execution of the algorithm. This is especially problematic with concurrent threads. Second, and more importantly, having global mutable values makes it more likely that your code is nonreentrant. Without proper knowledge of the ramifications, declaring global mutable values can extend beyond bad style to incorrect code.
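
One way to keep mutable state local to a closure is sketched below. The name make_counter is hypothetical; the point is only that each counter owns its own reference cell, so no other code can put it in an unexpected state.

```ocaml
(* Hypothetical example: make_counter is our own name. *)

(* Returns a function that yields 0, 1, 2, ... on successive calls.
 * The mutable state is local to the closure. *)
let make_counter () : unit -> int =
   let count = ref 0 in
   fun () ->
      let n = !count in
      count := n + 1;
      n
```

Two counters created by separate calls to make_counter do not interfere with each other, which would not be true if count were declared at the top level.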

When to Rename Variables. You should rarely need to rename values; in fact, this is a sure way to obfuscate code. Renaming a value should be backed up with a very good reason. One instance where renaming is common and encouraged is when aliasing structures. In these cases, other modules used by functions within the current structure are aliased to one- or two-letter names at the top of the struct block. This serves two purposes: it shortens the name of the module and it documents the modules you use. Here is an example:

struct
   module H = Hashtbl
   module A = Array
   ...
end

Order of Declarations in a Structure. When declaring elements in a structure, you should first alias the modules you intend to use, followed by the types, followed by exceptions, and lastly list all the value declarations for the structure. Here is an example:

struct
   module L = List
   type foo = unit
   exception InternalError
   let first list = L.nth list 0
end

Every declaration within the structure should be indented the same amount.

Indenting

Indent by three or four spaces. Be consistent.

Long expressions can be broken up and the parts aligned, as in the second example. Either is acceptable.

let x = "Long line..." ^
   "Another long line."

let x = "Long line..." ^
        "Another long line."

Match expressions should be indented as follows:

match expr with
   pat1 ->
      ...
 | pat2 ->
      ...

If each case is short, the following is also acceptable:

match expr with
   pat1 -> ...
 | pat2 -> ...

If expressions should be indented according to one of the following schemes:

if exp1 then exp2
else if exp3 then exp4
else if exp5 then exp6
else exp7

if exp1 then
   exp2
else
   exp3

if exp1 then exp2 else exp3

Comments should be indented to the level of the line of code that follows the comment.

Parentheses

Over Parenthesizing. Parentheses have many semantic purposes in ML, including constructing tuples, grouping sequences of side-effect expressions, forcing a non-default parse of an expression, and grouping structures for functor arguments. Their usage is very different from C or Java. Avoid using unnecessary parentheses when their presence makes your code harder to understand.

Match expressions. Wrap nested match expressions with parentheses. This avoids a common error involving nested match expressions. If the match expression is already wrapped by a begin...end block, you can drop the parentheses.
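
The common error is that the cases following a nested match are parsed as belonging to the inner match. Here is a small sketch, using a hypothetical function describe, in which the parentheses keep the None case attached to the outer match:

```ocaml
(* Hypothetical example: describe is our own name. *)

(* Describes the parity of an optional integer. The parentheses around
 * the inner match stop it from capturing the None case below. *)
let describe (x : int option) : string =
   match x with
      Some n ->
         (match n mod 2 with
             0 -> "even"
           | _ -> "odd")
    | None -> "nothing"
```

Without the parentheses, the None pattern would be read as a third case of the inner match on an int, and the code would be rejected with a type error.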

Nested lets. Blocks of code that have a sequence of let expressions should not be indented.

Bad

let x = 1 in
   let y = 2 in
      x + y

Good

let x = 1 in
let y = 2 in
   x + y

Good

let x = 1 in
let y = 2 in
x + y

            

Pattern Matching

No Incomplete Pattern Matches. Incomplete pattern matches are flagged with compiler warnings, which are tantamount to errors for grading purposes. Thus, if your program exhibits this behavior, the problem will get no points.
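
As an illustration, here is a sketch of a match that covers every constructor of a type. The type shape and the function area are hypothetical and exist only for this example.

```ocaml
(* Hypothetical type and function, for illustration only. *)
type shape =
   Circle of float
 | Square of float
 | Rectangle of float * float

(* Computes the area of a shape. Every constructor has a case, so
 * the compiler reports no non-exhaustive match warning. *)
let area (s : shape) : float =
   match s with
      Circle r -> 3.14159 *. r *. r
    | Square side -> side *. side
    | Rectangle (w, h) -> w *. h
```

Deleting any one of the three cases would make the match incomplete and produce the warning described above.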

Pattern Match in the Function Arguments When Possible. Tuples, records and datatypes can be deconstructed using pattern matching. If you simply deconstruct the function argument before you do anything useful, it is better to pattern match in the function argument. Consider these examples:

Bad

let f arg1 arg2 =
   let x = fst arg1 in
   let y = snd arg1 in
   let z = fst arg2 in
   ...

Good

let f (x, y) (z, _) = ...

Bad

let f arg1 =
   let x = arg1.foo in
   let y = arg1.bar in
   let baz = arg1.baz in
   ...

Good

let f { foo = x; bar = y; baz = baz } = ...

Function Arguments Should Not Use Values for Patterns. You should only deconstruct values with variable names and/or wildcards in function arguments. If you want to pattern match against a specific value, use a match expression or an if expression. We include this rule because there are too many errors that can occur when you don't do this exactly right. Thus of the following two examples, you should use the latter:

let rec fact = function
   0 -> 1
 | n -> n * fact (n - 1)

let rec fact n =
   if n = 0 then
      1
   else
      n * fact (n - 1)

Avoid Unnecessary Projections. Prefer pattern matching to projections with function arguments or value declarations. Using projections is okay as long as it is infrequent and the meaning is clearly understood from the context. The above rule shows how to pattern-match in the function arguments. Here is an example for pattern matching with value declarations.

Bad

let v = someFunction () in
let x = fst v in
let y = snd v in
   x + y

Good

let x, y = someFunction () in
   x + y

Combine nested match Expressions. Rather than nest match expressions, you can combine them by pattern matching against a tuple, provided the tests in the case expressions are independent. Here is an example:

Bad

(* tm_mon counts months from 0, so 6 is July and 9 is October. *)
let d = Unix.localtime (Unix.time ()) in
  match d.Unix.tm_mon with
    0 -> (match d.Unix.tm_mday with
             1 -> printf "Happy New Year"
           | _ -> ())
  | 6 -> (match d.Unix.tm_mday with
             4 -> printf "Happy Independence Day"
           | _ -> ())
  | 9 -> (match d.Unix.tm_mday with
             10 -> printf "Happy Metric Day"
           | _ -> ())
  | _ -> ()

Good

match Unix.localtime (Unix.time ()) with
   { Unix.tm_mon = 0; Unix.tm_mday = 1 } ->
      printf "Happy New Year"
 | { Unix.tm_mon = 6; Unix.tm_mday = 4 } ->
      printf "Happy Independence Day"
 | { Unix.tm_mon = 9; Unix.tm_mday = 10 } ->
      printf "Happy Metric Day"
 | _ ->
      ()

Avoid the Use of List.hd and List.tl. The functions hd and tl deconstruct lists, but they raise an exception when applied to the empty list. You should avoid these functions altogether. It is usually easy to achieve the same effect with pattern matching. If you cannot manage to avoid them, you should handle any exceptions that they might raise.
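
A brief sketch of the pattern-matching alternative follows. The function second is hypothetical; it returns an option so that short lists are handled explicitly rather than by an exception.

```ocaml
(* Hypothetical example: second is our own name. *)

(* Returns the second element of a list, if there is one. Pattern
 * matching covers the short-list cases that List.hd (List.tl l)
 * would turn into exceptions. *)
let second (l : 'a list) : 'a option =
   match l with
      _ :: x :: _ -> Some x
    | _ -> None
```

The caller is then forced by the type to decide what to do when there is no second element.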

Factoring

Avoid breaking expressions over multiple lines. If a tuple consists of more than two or three elements, you should consider using a record instead of a tuple. Records have the advantage of placing each name on a separate line and still looking good. Constructing a tuple over multiple lines makes for ugly code. Other expressions that take up multiple lines should be done with a lot of thought. The best way to transform code that constructs expressions over multiple lines to something that has good style is to factor the code using a let expression. Consider the following:

Best
let rec euclid (m : int) (n : int) : (int * int * int) =
   if n = 0 then
      (1, 0, m)
   else
      let q = m / n in
      let r = m mod n in
      let u, v, g = euclid n r in
         (v, u - q * v, g)
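
To see why the factored form is easy to reason about, note that the triple (u, v, g) returned by euclid m n satisfies u * m + v * n = g. The sketch below checks this property; check_euclid is a hypothetical helper, and euclid is repeated only so that the example runs on its own.

```ocaml
(* euclid is repeated from above so that this sketch runs on its own. *)
let rec euclid (m : int) (n : int) : (int * int * int) =
   if n = 0 then
      (1, 0, m)
   else
      let q = m / n in
      let r = m mod n in
      let u, v, g = euclid n r in
         (v, u - q * v, g)

(* Checks that the triple returned for m and n satisfies
 * u * m + v * n = g. check_euclid is a hypothetical helper. *)
let check_euclid (m : int) (n : int) : bool =
   let u, v, g = euclid m n in
      u * m + v * n = g
```

Each let binding names one intermediate quantity, so the final tuple fits on a single line.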

Do not factor unnecessarily.

Bad

let x = input_line stdin in
   match x with
      ...

Good

match input_line stdin with
    ...

Bad (provided y is not a large expression)

let x = y * y in x + z

Good

y * y + z

Verbosity

Don't Rewrite Library Functions. The OCaml library has a great number of functions and data structures -- use them! Students often recode List.filter, List.map, and similar functions. A more subtle case is the fold functions: a function that recursively walks down a list can usually be written with List.fold_left or List.fold_right instead. Other data structures often have a folding function; use them whenever they are available.
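
The following sketch contrasts a hand-written recursion with library-based versions. All of the function names are hypothetical; List.length, List.filter, and List.fold_left are the standard library functions.

```ocaml
(* Hypothetical examples: the function names are ours. *)

(* Hand-written recursion: this is the pattern to avoid. *)
let rec count_evens_rec (l : int list) : int =
   match l with
      [] -> 0
    | x :: rest ->
         (if x mod 2 = 0 then 1 else 0) + count_evens_rec rest

(* The same computation using library functions. *)
let count_evens (l : int list) : int =
   List.length (List.filter (fun x -> x mod 2 = 0) l)

(* Longest string length in a list, written as a fold. *)
let max_length (l : string list) : int =
   List.fold_left (fun acc s -> max acc (String.length s)) 0 l
```

The library versions are shorter and leave the list traversal itself to well-tested code.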

Misusing if Expressions. Remember that the type of the condition in an if expression is bool. In general, the type of an if expression is 'a, but in the case that the type is bool, you should not be using if at all. Consider the following:


Bad | Good
if e then true else false | e
if e then false else true | not e
if beta then beta else false | beta
if not e then x else y | if e then y else x
if x then true else y | x || y
if x then y else false | x && y
if x then false else y | not x && y
if x then y else true | not x || y

Misusing match Expressions. The match expression is misused in two common situations. First, match should never be used in place of an if expression (that's why if exists). Note the following:

match e with
   true ->  x
 | false -> y

if e then x else y

The latter is much better. Another situation where if expressions are preferred over match expressions is as follows:

match e with
   c -> x   (* c is a constant value *)
 | _ -> y

if e = c then x else y

The latter is definitely better. The other misuse is using match when pattern matching with a let declaration is enough. Consider the following:

let x = match expr with (y,z) -> y

let (x,_) = expr

The latter is better.

Other Common Misuses. Here are some other common mistakes to watch out for:

Bad | Good
l :: [] | [l]
length + 0 | length
length * 1 | length
big exp * same big exp | let x = big exp in x * x
if x then f a b c1 else f a b c2 | f a b (if x then c1 else c2)

Don't Rewrap Functions. When passing a function as an argument to another function, don't rewrap the function unnecessarily. Here's an example:

List.map (fun x -> sqrt x) [1.0; 4.0; 9.0; 16.0]

List.map sqrt [1.0; 4.0; 9.0; 16.0]

The latter is better. Another case for rewrapping a function is often associated with infix binary operators. To prevent rewrapping the binary operator, use the parenthesized form as in the following example:

List.map2 (fun x y -> x + y)

List.map2 (+)

The latter is better.

Avoid Computing Values Twice. If you compute a value twice, you're wasting CPU time and making your program ugly. The best way to avoid computing values twice is to create a let expression and bind the computed value to a variable name. This has the added benefit of letting you document the purpose of the value with a name.
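
Here is a small sketch of the difference. Both functions are hypothetical; they return the distance of the point (x, y) from the origin, capped at limit.

```ocaml
(* Hypothetical example: the function names are ours. *)

(* Computes the distance twice: once for the test, once for the result. *)
let clamp_distance_bad (x : float) (y : float) (limit : float) : float =
   if sqrt (x *. x +. y *. y) > limit then limit
   else sqrt (x *. x +. y *. y)

(* Binds the distance to a name and reuses it. *)
let clamp_distance (x : float) (y : float) (limit : float) : float =
   let dist = sqrt (x *. x +. y *. y) in
      if dist > limit then limit else dist
```

The second version does the square root once, and the name dist tells the reader what the expression means.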


Generated on Saturday, Dec 14, 2002

Copyright (c) 2002 Caltech CS20 Course Administration.
Computer Science Dept., California Institute of Technology