The Go Type System
Recently I've become very interested in
the Golang
programming language. Golang, or Google Go as it's often called, is a new
programming language designed by some fairly well-known systems researchers,
including Rob Pike of Plan 9 fame, and Ken Thompson, one of the creators of C.
Go has a lot "go"ing for it. Google has begun to use it for big internal
projects. There is a lot of interest in Go in the web development and
scripting communities. It also has a fairly mature implementation, something
that's easy to overlook when considering new programming languages, but vitally
important in practice.
So what's the deal with Go? It's a strongly, statically typed imperative
language with good concurrency support. But don't we already have a few of
those? C++, Java, and .NET come to mind. Well, Go is fundamentally different
than those languages. There are a lot of philosophical differences in the way
Go handles errors, deals with platform-specific code, makes efficiency
tradeoffs, and deals with concurrency. I could easily write an essay on each
one of those philosophical excursions, and why Go's choice is the right one.
However, in this post, I just want to talk about the Go type system and why
it's better than the Java or C++ type system.
More Java, sir?
In Java, you have classes which contain data and instance methods.
Code reuse is mostly done through inheritance.
Let's consider
InputStream.
This is one of the most fundamental Java classes. It's been around since Java
1.0, and is the main method Java developers use to read files.
InputStream
is a class which contains some code, but it also has some "abstract" methods
which are meant to be overridden.
Let's say we want to some integers from a file. Well, we can't use InputStream
directly, but we can use FileInputStream, a class which inherits directly from InputStream.
Unfortunately, FileInputStream doesn't do any buffering. So if we want to do a
lot of small reads, it may not be very efficient, because each of those reads
will turn into a syscall.
Well, that's no problem; we can use BufferedInputStream. BufferedInputStream
extends FilterInputStream, which wraps one input stream in another input stream.
In the case of BufferedInputStream, the outer stream does buffering.
However, we still do have a problem. BufferedInputStream doesn't let us read
integers; it only lets us read bytes and arrays of bytes. In order to get
something that reads integers, we need another wrapper: DataInputStream.
So far our program looks like this:
FileInputStream fos = new FileInputStream("myfile");
BufferedInputStream bos = new BufferedInputStream(fos, 16384);
DataInputStream dos = new DataInputStream(bos);
int foo = dos.readInt();
Well, it might be a little verbose, but at least it's safe, right? We know
that whatever operation we call on our InputStream will be well-supported.
Well... not quite. One operation defined by InputStream, mark, is only
supported by some InputStreams, and not others. It so happens that
FileInputStream doesn't support it (and will misbehave at runtime if you try to
use it). BufferedInputStream does. Our DataInputStream does, but only because
it wraps our BufferedInputStream-- DataInputStream itself does not support
mark.
Incidentally, this also highlights another problem with this code-- the extreme
lack of locality. In order to know what will happen when we make any given
call, we have to start with the most derived class, and follow the chain of
virtual method invocations, being careful to note where the subclass overrides
the superclass and where it doesn't. For example, InputStream#read(byte b[],
int off, int len) contains a very naive implementation of bulk reads that just
reads a byte at a time in a loop. However, you don't have to worry about that
here because it's overridden by FileInputStream. This is just a very simple
example-- things can get hairy in the real world where you have 4 or 5 wrapper
classes, and each stream is fairly far from the original base class on the
inheritance tree.
So anyway, this mark business is quite an omission. Wasn't Java supposed to
protect us from runtime errors due to type signatures? How can we say that
FileInputStream "is-an" InputStream when FileInputStream doesn't support all
the methods of InputStream? If you're a Java true believer, it's a little bit
like someone just farted in church. However, you'll find things like this all
throughout the Java standard library-- little cases where the hierarchical
inheritance-driven model of the world just did not conform to reality, and
something had to give.
In this particular case, in order to put the information about markability into
the type system, we would have had to double the number of classes.
Basically, we would have had to create MarkableInputStream,
UnmarkableInputStream, followed by MarkableBufferedInputStream,
UnmarkableBufferedInputStream, MarkableFilterInputStream, and
UnmarkableFilterInputStream, and so forth.
The basic problem is that there are a lot of traits an input stream could have:
seekable versus non-seekable, buffered versus non-buffered, writable versus
non-writable, position tracking versus position-oblivious, and so forth. In
general, these traits are independent of one another, and we need 2^n classes
to represent all the possible combinations of input stream. This kind of
combinatorial explosion of pointless wrapper classes is the fundamental,
algorithmic reason why Java is verbose. Even if its keywords were all
abbreviated, even if Java supported a C++0x-style auto keyword, Java would
still feel like Java.
This may seem bad, but it gets worse. What if you want to both read
and
write from a file? Unfortunately, InputStream only supports reading. If you
want to write to the same file, you'll have to either deal with OutputStream,
RandomAccessFile, or FileChannel. None of these classes share a common base
class with InputStream, so you essentially have to rewrite your code in order
to use them. It gets very awkward. It's not unknown to see people passing
around both an InputStream and a FileChannel that both represent the same
file!
A generic solution
Java 5.0 got generics. The easiest way to think of generics is as a way of
automatically generating the 2^n different implementations we talked about
earlier. In C++, generics are actually implemented this way-- by generating
extra copies of the templated code. Java only creates one copy of the code,
though. The Java way is much more cache-friendly-- one reason why C++'s
performance is actually worse than Java's sometimes.
With generics, InputStream could be rewritten as InputStream<IOManipulator>.
Then we could insert different IOManipulator classes which had different
capabilities-- one might enable seeking and buffering, while another might not.
We might have a Java interface named IOManipulator, implemented by
BufferedIOManipulator, RandomAccessBufferedIOManipulator, and
RandomAccessIOManipulator. However, we still find ourselves obsessing over the
inheritance relationships between the IOManipulators-- should
RandomAccessBufferedIOManipulator inherit from RandomAccessIOManipulator?
Probably.
Overall, though, generics make Java much more bearable by providing another
"degree of freedom" in the type system. They allow you to avoid deep
inheritance trees for just a little bit longer. I think this is the reason why
most Java programmers can't imagine ever using a language without generics.
However, generics have a steep learning curve for novice programmers, and the
error messages you get with generics are extremely poor. Generics also don't
fully solve the problems that we have with verbosity, overgrown hierarchies,
and in many cases general awkwardness.
The C++ standard library makes extensive use of templating.
For example, std::string is really:
namespace std {
template<class charT, class traits = char_traits<charT>,
class Allocator = allocator<charT> >
class basic_string;
}
That's right-- triply templated on the character type, the character "traits"
class, and the memory allocator. This often leads to "word salad" compiler
error messages.
Go's solution
After many years of real-world software engineering, the truth has become
clear:
inheritance
is an antipattern. However, we still need some way of creating
relationships between types. What are we going to do?
Go solves this problem by deriving which types correspond to an interface
automatically. So, for example, any type which has a function
Read(p
[]byte) (n int, err error) defined for it conforms to the
io.Reader interface Similarly,
there is an
io.ReaderAt
interface for seekable streams.
bufio is Go's version of
BufferedInputStream.
Notice that the problems we had before melt away. It's easy to have a type
which conforms to both io.Reader and io.Writer. No big deal-- just implement
the required functions. There's no need to think about inheritance
hierarchies, type traits, or any of that. The code is also much less verbose
because we avoid the "implements," "extends," and so forth. Some people call
this feature "static duck typing."
So far, so good. But what about if we want to extend the functionality
provided for another type? For example, perhaps we want to create a buffered
Reader that can also return a string describing which file is being read. In
C++ or Java, if we wanted to extend the functionality through composition, we'd
have to write
"forwarding
methods" for all the methods of the enclosed type. In Golang, we just use
an
anonymous
field to export all of the functions of the anonymous type to the enclosing
type. Later, we can override any of these auto-exported fields in the future
if we want to, without breaking the API.
Go's type system has one more subtle features that might surprise you if you
grew up with C++ or Java, like I did. You can define methods on types other
than structures. For example, I could define a method on the type []int (array
of integers).
Conclusion
So there you have it. By making static typing feel more like dynamic typing,
Golang eliminates the old tradeoff between the higher productivity of languages
like Python, and the increased safety of languages like Java. We can have
both! We also avoid the quagmire of deep inheritance hierarchies,
incomprehensible error messages, and unsafe compromises like
InputStream#mark.
Go also provides tools to implement code reuse by composition, allowing us to
favor
composition over inheritance, as Effective C++ recommends.
Will Go get generics in the future? I don't really know. The designers have
left the question open. As we've seen, Go doesn't need generics as much as C++
and Java needed them. However, generics would still be useful for avoiding
typecasts when implementing generic data structures. Basically when you're
pulling an instance of a type out of a generic data structure using a get()
method or similar, it's nice not to have to cast it back to the type that you
know that it is. I think people make a much bigger deal out of this than it
really is.
One thing to note is that if generics were ever added to Go, they would
not have to be added using type erasure. This was only done in Java to
retain backwards compatibility between new Java code and old Java Virtual
Machines (JVMs). Since Golang has no JVM, but compiles down to machine code,
this would not be necessary.
The design of Go has been influenced by decades of experience in building
real-world systems. I think its type system is the best thing going out there
for imperative languages, and a good candidate for replacing Python and Java
with something much safer and more productive.
Other stuff
There are some other problems with Java's type system that I didn't really go
into here. If you haven't read
The
Kingdom of Nouns, then you should check that out. Likewise, Yossi's
C++ FQA ("frequently questioned answers")
is a hilarious takedown of the over-inflated claims of C++ partisans. You will
also learn quite a bit about the language by reading it-- Yossi knows his
stuff.