How To Design A Good API and Why it Matters

>>First, I would like to thank you and welcome
you all to the latest in our series of talks on advanced topics in programming languages.
The purpose of this series of talks is to expose all of the great domain knowledge in
programming languages that we have at Google. So that–what that means is that everybody
in this audience, all of you geniuses who work for Google and myself discounted, should
give a talk. So please, come to me, my name is Jeremy Manson, email me, IM me, whatever, and
tell me what talk you are going to give and I will set it up and give you a talk. You
don’t have to be an author like Josh or some of the people we have coming up in order to
do so. Josh–and we’re running late. Josh didn’t want me to build him up too much and
give him a swell head. So, I want to introduce him. I’ll just say that in a company full
of geniuses, it’s his star that shines probably the most brightly. And I just want to…
>>BLOCH: I’ll get you for that.
>>I just want to introduce this man whose
boot heel I am not worthy to lick, Joshua Bloch ladies and gentlemen.
>>BLOCH: Normally at this point, I thank my introducer for the introduction but today
I think I’ll dispense with that formality. So, I should–I should also say by the way
that this is unfortunately rather a long talk. I try to have short talks with one or two
key ideas but this subject matter just doesn’t lend itself to that. So, please hold your
questions till the end and in fact, we’ll probably use up the hour without questions
but then I’ll hang around for as long as you want and answer all the questions that you
have. So, why is API design important? Well, APIs can be among a company’s greatest assets.
A good API is something that people invest heavily in. They do this in obvious ways and
in less obvious ways. The obvious ways are that they buy products built around the APIs. They
write to the API. But the less obvious way is they learn it. They spend hours and hours–actually,
they spend months learning the APIs. And once they’ve done that, you know, they don’t
want to learn a new one because they have to unlearn everything they know and replace
it with something else. And furthermore, the API is just wired throughout the infrastructure
at a company. So, a successful API can make a company, can give it a franchise that lasts.
So–I guess it’s about twenty-five years now. And similarly, a bad API can be among a company’s
greatest liabilities. And there are a couple of reasons for this. First of all, a bad API
can cause an unending stream of, you know, support phone calls because people cannot
make the thing do what it ought to do and it can inhibit a company’s ability to move
forward because once you have a bad API you cannot change it at will. You’re pretty much
stuck with it forever. You have one chance to get it right. So, that’s pretty scary and
with that in mind, you want to learn how to make APIs that will stand the test of time.
So, now you know why API design is important but why is it important to you? Not all of
you may think of yourselves as API designers. Well, it turns out that all of you are API
designers. Anyone who programs a computer is an API designer. And the reason is that
good programming is inherently modular and these inter-modular boundaries are themselves
APIs. Furthermore, good APIs tend to get reused. If you’ve written a module and it’s good at
doing something, you know, one of these days one of your co-workers is going to need to
do the same thing. And gosh, you know you’ve already got this module that does it but once
he’s using that API, you are no longer free to change it at will because if you change
it you’ll break his program. And if he has ten friends and they all start using it then
you’re really hosed. Finally, thinking in terms of API design tends to improve the quality
of the programs that you write. It tends to sort of keep you from just hacking things
together. It tends to make you want to write nice units, you know, that are–that are composable,
that are reusable and that are sensible. Now, one other question we’ve got to get out of
the way at the beginning is, why am I talking about APIs at what is billed as a language
design series? And the glib answer of course is that Jeremy asked me to do it. And Jeremy
used to be a friend, so, of course, I said, “Yeah, of course I’ll do it.” But in fact
the real answer is that API design and language design are very, very similar. The only real
difference is that API design is constrained by the syntactic–sorry, the syntax of the
language for which you’re writing the API. Whereas, when you’re designing a language
you have the flexibility to do anything you like with the syntax. But in fact, whether
you’re designing a language or an API, you are creating a tool for programmers to express
their intent to the machine and to other programmers, who read the program, maintain it, modify
it and so forth. Finally, these days you don’t really think in terms of a language or a library
alone. A language and a library together comprise a platform and when you’re designing a language,
you design the core libraries hand in hand with the language. So, really the skill set
for designing good APIs and for designing languages is pretty much the same. So, what
are the characteristics of a good API? First of all, it’s easy to learn and it’s easy to
use, even without documentation. So, a good API should be easy to memorize. It should
just plain make sense. And the flip side of that is, not only should it be easy to use
a good API but it should be hard to misuse a good API. It should be hard or impossible
to use–misuse a good API. That is basically, a good API should simply force you to do the
right thing. It should be easy to read and to maintain code written to that API. The
API should be sufficiently powerful to do what it has to do. Note that I didn’t say
the API should be powerful. It is not the case that the more powerful an API is, the
better it is. It should basically be just powerful enough to do what it needs to do.
But it should be easy to evolve the API over time because there will be new needs later
on. So, what you want to do is write an API that meets its requirements and that
can evolve to meet future requirements. And finally, the API has to be appropriate to
the audience. What is a good API for let’s say a Wall Street analyst is probably not
a good API for a physicist because they have different terminology. They think differently.
So the API has to be aimed at its audience. So now, we know what the characteristics are.
How do we achieve them? And that’s what the rest of this talk is about. The talk is
divided into five sections. The first one is on the process of the API design. I’m not
a big process weenie but I found over the years that there are certain things that all
good API designs have in common in terms of the process used to create them. So, I’ll
try to go over that. Then, the general principles of API design, then those principles as they
apply to classes, as they apply to methods, and as they apply to exceptions. And finally,
if I have time I’ll show a couple of refactorings where we improve API designs. So, what is
the process of API design like? Well, the first thing you got to do is you got to gather
the requirements but do it with a healthy dose of speck–of skepticism. Because often
when you ask people, you know, “Well, what does this API have to do for you?” What you’ll
get won’t be a real set of requirements, it’ll be a set of proposed solutions and a better
solution may exist. So, you know, if someone tells you, let’s say, “We need to precisely
control the garbage collection intervals and the maximum time that each garbage collection
can take.” You know, that’s not really a requirement. I mean, the requirement is, you know, we need
to be able to run a server smoothly while any garbage collection takes place. How you
choose to achieve that is up to you. Your job is to extract the requirements from the
stakeholders in your API and often it’s a real give-and-take process. Once you’ve got the
requirements, they should take the form of use cases, and by use cases, I simply mean,
the problems that your API should be able to solve. And these are extremely important
because they provide the benchmark against which you can measure any proposed solution.
One thing you should keep in mind is that it can be easier and more rewarding to build
a more general solution than what you’ve been asked to do. This doesn’t mean you should
just say, “Oh, I’m going to build a framework,” every time someone asks you to do something.
But sometimes the more specific thing is more difficult to build as well as being less powerful.
So, always keep an open mind when you’re–when you’re looking at those initial requirements.
Oh, let’s see, I guess I have another example here of what they say and what they mean.
When they say, “We need new data structures and RPCs with the Version 2 attributes,” this
actually happened to me at a company called Transarc, you know, when we were kind of upgrading.
They said, you know, “Make a whole new set of data structures and a whole new set of
APIs.” What they really meant was, “We need a new data format that will accommodate all
further evolution in the internal data structures.” You know, because you don’t want to
have to make a whole new set of data structures and a whole new set of on the wire and on
disk interfaces every time you decide to add a few attributes to your server. So in fact,
I made the system much more dynamic and we never had to do that again. So, you should
start with a really short spec, one page is ideal. At this stage in an API design, agility
definitely trumps completeness. The worst thing you can do is to send six smart guys
off into a room and have them sit there with the door closed for six months and come out
with a 240-page specification document. And believe me this has been done many, many times. It’s
an awful idea because at that point, first of all, their ego is invested in what they’ve
just done. They’re going to build it even if it’s a piece of crap. And second of all,
you know, how do you know if it’s any good? It’s like this big, long, hairy spec. It’s
no longer agile. If they made a fundamental mistake, then you’ve got to change all 240
pages of it. But it may fail to satisfy some sort of key requirement that they didn’t really
understand before they started. So, what you want to do in the beginning is have the entire
API spec on one page. In this way, you can bounce that spec off as many stakeholders
as possible. Listen to what they have to say and take it seriously. If they say, “No, I’m
sorry, this won’t do for me because I cannot write such and such a program,” think about
it. You know, you may say, “Well, you shouldn’t be writing that kind of program. It’s a really
bad idea.” But more likely, you may say, “Oh gee, I didn’t think of that. This really is
important. What if we structure it this way?” The whole thing is only a page long. You can
do major refactorings in ten minutes. If you keep the spec short, it’s easy to modify.
And you flesh that spec out only as you gain confidence that you’re on the right track.
And this necessarily involves coding. In particular, it involves coding to the API that you are
defining. It doesn’t involve implementing the API. It involves pretending it’s already
been implemented. So, what does it look like when you do it right? Well, here’s an example
that I was writing at about the time that I was putting this talk together for OOPSLA
some number of months ago. Someone wanted the ability to retry a computation in the
face of failure. And I said, “Oh well, you know, we have this executor framework, otherwise
known as the executor framework.” And really, all we want here is a retry policy that tells
you how you might choose to retry the thing in the face of failure. So, you know, here’s
a little interface and it’s got a couple of methods. One tells you, if a given failure
is recoverable. You pass in the exception and it just gives you true or false. We should
try to recover from this one or we shouldn’t. And the second one computes the next delay
in terms of, the initial start time and the number of previous retries and by passing
in all these data, the actual retry policy can be stateless. So, you can have singletons.
You can have a retry policy called exponential backoff. But you’re not going to have to store
any data in each exponential back-off instance. It really is just a retry policy and it’s
called exponential backoff. And that’s kind of all there is to that one. And this isn’t
really very complete. This is not, you know, a spec of a quality that I could use for
Javadoc. It’s not a spec that someone could use to implement to but it’s a spec that’s good
enough for someone to look at it and say, “Yes. It does do what I need or it doesn’t.”
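
A minimal sketch of what such a short spec might look like in Java, offered as an illustration rather than a copy of the slide. The interface follows the two methods described above; the exact names and signatures (`RetryPolicy`, `nextDelayNanos`, `ExponentialBackoff`), the use of nanoseconds, and the rule that only `IOException` is recoverable are all assumptions made for this example.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and signatures are assumptions, not the slide's contents.
interface RetryPolicy {
    /** Returns true if the computation should be retried after this failure. */
    boolean isFailureRecoverable(Exception failure);

    /**
     * Returns the delay in nanoseconds before the next attempt, or a negative
     * value to give up. The caller passes in the start time and the retry count,
     * so an implementation needs no per-computation state.
     */
    long nextDelayNanos(long startTimeNanos, int previousRetries);
}

// One possible policy. It holds only immutable configuration, so one instance
// can be shared by any number of computations.
final class ExponentialBackoff implements RetryPolicy {
    private final long initialDelayNanos;
    private final long timeoutNanos;

    ExponentialBackoff(long initialDelay, long timeout, TimeUnit unit) {
        this.initialDelayNanos = unit.toNanos(initialDelay);
        this.timeoutNanos = unit.toNanos(timeout);
    }

    @Override
    public boolean isFailureRecoverable(Exception failure) {
        // Assumption for this sketch: only I/O failures are worth retrying.
        return failure instanceof IOException;
    }

    @Override
    public long nextDelayNanos(long startTimeNanos, int previousRetries) {
        if (System.nanoTime() - startTimeNanos >= timeoutNanos) {
            return -1; // overall timeout exceeded: give up
        }
        // Double the delay on each retry; the shift is capped so that modest
        // initial delays cannot overflow a long.
        return initialDelayNanos << Math.min(previousRetries, 20);
    }
}

final class RetryPolicyDemo {
    public static void main(String[] args) {
        RetryPolicy policy = new ExponentialBackoff(100, 60_000, TimeUnit.MILLISECONDS);
        long start = System.nanoTime();
        for (int retries = 0; retries < 4; retries++) {
            long delay = policy.nextDelayNanos(start, retries);
            System.out.println("retry " + retries + " -> wait "
                    + TimeUnit.NANOSECONDS.toMillis(delay) + " ms");
        }
    }
}
```

Writing even this much client code against the interface is enough to ask whether the parameters are the right ones, before any real implementation exists.
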
The rest of it is on another slide here. This is a set of static utilities that lets you
actually use retry policies. So, what can you do? Well, the first thing you can do is you
can pass in an executor service and retry policy and get a retrying executor service.
It implements the same interface, which is executor service. So, if you already know
how to use an executor service, you know how to use the retrying executor service and that’s
great by the way. That’s a good way to keep the sort of the conceptual weight of an API
small. Use interfaces that have already been designed–defined, in this case, executor
service. And what else do we have? We have another kind of a retrying executor service.
What is–what is the difference between the first two? I haven’t looked at this for
a while so, I apologize for that. Anyway, it doesn’t–it doesn’t matter. And then, we
have a couple of wrappers, one of which takes a callable and returns a retrying callable
and one of which takes a runnable and returns a retrying runnable. And then, we have a couple examples
of the retry policies themselves. These are static factories, we have, if you want an
exponential backoff, you get one. And these are the parameters that describe your exponential
backoff. What’s the initial delay? In what unit, you know, 10 seconds, a hundred milliseconds,
whatever and then, the timeout, which I’m not sure what that means. So, by the way,
this is actually interesting. This tells me this wasn’t quite good enough, you know. What
I wanted to show you is that something that’s really simple, that fits on a slide or two,
is enough to communicate your intent and enough to sort of, figure out whether it’s good enough
and you know, try it out. And I think the answer here is it’s almost good enough but
not quite good enough. I guess, the timeout probably, is the overall timeout. Like after
how long of trying and retrying do you finally give up. But I think it should have said that
somewhere on this. But anyway, you get the idea. The idea is that this is a very small
description of an API. But it’s big enough to find out if it’s good enough to do what
needs to be done. And if it’s not, it’s easy to modify it. You should write to the API
early and often. Of course, you should start before you’ve implemented the API because
this saves you from having to throw away an implementation of a bad API. You know if you–if
you first specify it, then implement it, then try the implementation and decide that the
API was garbage, well, you’ve wasted lots of time implementing it. And as I said, you
should start before you’ve even specified it properly because that saves you from having
to throw away detailed specifications for broken APIs. You should continue writing to
the API as you flesh it out and this is important. Some people, sort of stop writing to the API
about halfway through the process and just go on this death march to implementation.
The problem with that is you get nasty surprises about a week before you ship when you try
writing to it again and you find that, you know, “Oh, my gosh. It actually doesn’t solve
this important case that we thought it solved.” And some people worry. They worry that, you
know, that all this coding to the API is a waste of time when they should be implementing
it. But nothing could be further from the truth. Those initial pieces of code that
you write to any API are among the most important pieces of code that you’ll ever write to it.
The code lives on in the examples that you publish for how to use that API. And those
examples tend to get emulated heavily. If you get them right, you’ve seeded the market
with good uses of your API. If you get them wrong, conversely, you know, you’ve ensured
that there will be broken programs floating around the web for the next ten years. And
I used to, you know believe this with all my heart and soul. But now, I actually can
point to a proof of it. It turns out in the last OOPSLA, there was a paper published called
Design Fragments by Fairbanks, Garlan and Scherlis from CMU. And they actually traced
mistakes in the original applets that were shipped out with, you know, the first release
of Java into broken concurrent programs, thousands of them that still exist on the web today.
So, you know, the way I put this is, example programs should be exemplary. There’s a reason
they call them example programs. And the programs that–the first programs that you write to
an API, as you are fleshing it out, invariably become the example programs. You know, so,
you know, my rule of thumb is you should spend ten times as much time on every line of example
code as you do production code. That may sound backwards to you but I really believe it.
Writing to an SPI, that is a Service Provider Interface is even more important than writing
to any other kind of API. You all probably know what SPIs are or maybe you don’t. If you know
what an SPI is, raise your hand. Okay. So, an SPI is a special kind of an API which you use
to provide a new means of doing something. It’s not the API which the programmers write
to; rather, it’s the plug-in API that, let’s say, lets RSADSI Incorporated provide their
encryption methods and Sun Microsystems provide their encryption methods. So, the user of
this encryption API can use a higher level API, which then dispatches to these encryption
methods. And the thing about SPIs is that you’re supposed to be able to hide very, very
different implementations underneath them. And if you, you know, write an SPI and you
only have one provider, it turns out that as a practical matter, you will probably never
be able to support another. Once you try to do the second, you’ll find that there’s something
about that SPI that ties it forever to the only implementation you actually thought about.
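
A small sketch of the API/SPI split, assuming a deliberately toy "encryption" interface. Every name here (`CipherProvider`, `Ciphers`, `XorProvider`, `ShiftProvider`) is hypothetical, and neither provider is real cryptography; the point is only that application code calls the higher-level API while providers plug in underneath it. Writing the second provider is what shows whether the SPI quietly assumes the first one.

```java
import java.util.HashMap;
import java.util.Map;

// SPI: the interface that providers implement. All names are hypothetical.
interface CipherProvider {
    String name();
    byte[] encrypt(byte[] plaintext, byte key);
    byte[] decrypt(byte[] ciphertext, byte key);
}

// Toy provider 1 (not real cryptography): XOR every byte with the key.
final class XorProvider implements CipherProvider {
    public String name() { return "xor"; }

    public byte[] encrypt(byte[] in, byte key) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ key);
        }
        return out;
    }

    // XOR is its own inverse.
    public byte[] decrypt(byte[] in, byte key) { return encrypt(in, key); }
}

// Toy provider 2 (not real cryptography): add the key to every byte.
final class ShiftProvider implements CipherProvider {
    public String name() { return "shift"; }
    public byte[] encrypt(byte[] in, byte key) { return shift(in, key); }
    public byte[] decrypt(byte[] in, byte key) { return shift(in, -key); }

    private static byte[] shift(byte[] in, int delta) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] + delta); // wraps around modulo 256
        }
        return out;
    }
}

// API: what application code calls. It dispatches to a registered provider.
final class Ciphers {
    private static final Map<String, CipherProvider> PROVIDERS = new HashMap<>();

    private Ciphers() {}

    static void register(CipherProvider provider) {
        PROVIDERS.put(provider.name(), provider);
    }

    static byte[] encrypt(String providerName, byte[] plaintext, byte key) {
        return lookup(providerName).encrypt(plaintext, key);
    }

    static byte[] decrypt(String providerName, byte[] ciphertext, byte key) {
        return lookup(providerName).decrypt(ciphertext, key);
    }

    private static CipherProvider lookup(String name) {
        CipherProvider provider = PROVIDERS.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("No such provider: " + name);
        }
        return provider;
    }
}
```

The registry here is a plain map for brevity; a real platform would more likely discover providers through a mechanism such as `java.util.ServiceLoader`.
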
If on the other hand, you do two implementations rather than one, then, you’ll probably be
good enough. You’ll probably be able to support subsequent ones with some difficulty. But
if you do three, it will probably work fine for any number. If it works for three, the
fourth won’t be all that different from each of the first three. I found this out myself
and then, I saw it in a book. Will Tracz discovered this and called it The Rule of Threes in a
book, which was subtitled Confessions of a Used Program Salesman because it was a book
about software reuse. And by the way, here we explain the subtle color coding that is used
throughout this talk. Whenever you see something green, it means this is good, do it this way.
And when you see something in red, it means this is bad, don’t do it that way. You should
maintain realistic expectations throughout the process of API design. It turns out that
most API designs are over constrained. People want them to do more than they can possibly
do. So, you are going to have to make compromises. You cannot please everyone. If you try to
please everyone, you come up with a pig. You come up with big, nasty APIs that no one will
ever be able to use properly. So, what you should do, and this may sound strange, is
you should aim to displease everyone equally. The idea is that, you know if one of your
important stakeholders is very displeased and the others are really pleased, that’s
probably a problem because your API isn’t doing something it has to do. If everyone
is like less than a 100% happy but they’re all happy enough then, you’ve probably done
the right thing. But do not misinterpret this as saying I favor design by committee and
you should take everyone’s ideas and mush them altogether. You do need one sort of strong
design lead that can ensure that the API that you design is cohesive and pretty and clearly
the work of, you know, one single mind or at least a single minded body. And that’s
always a little bit of a trade-off, being able to satisfy the needs of many customers
and yet produce something that is, you know, beautiful and cohesive. Expect to make mistakes.
API design is hard. Now luckily, a few years of real world use will always flush out the
mistakes. Unfortunately, by that time it’s like too late to do anything about them. Although,
you can write nice talks and tell people about the mistakes so they don’t make them again
and it’s kind of what I’m doing here today. So, given that you’re going to make mistakes
and you’re going to be stuck with the original API, write the API so that at least you can
sort of add to it and produce something that will help you get around the shortcomings
of your original designs. The recent example of that in my life is in the collections API
which I did around 1997, 1998. There were some real flaws in the sorted set and sorted
map implementations. In particular, you know, they’re a little bit asymmetric. It’s
much easier to go forward than backward and there are a couple of other things. I knew about
these flaws at that time but I didn’t know how to fix them. However, we were able to
extend that API. So, if you look at the most recent release of Java 6. You’ll see something
called NavigableSet and NavigableMap, which extend SortedSet and SortedMap and
provide, you know, additional methods that basically fix those difficulties. You know
and it’s not a perfect solution because there are some things that implement the old–the
older interfaces and they’re not fixed yet. But at least all the standard collection implementations
from Sun now are fixed. So, what are the general principles of good API design? First of all,
an API should do one thing and do it well. And I should say that almost all of what I’m
going to say for the next five minutes may just sound like motherhood and apple pie.
But, there’s more to it than that. I’m going to try to give you actionable advice. I’m
going to try to take the sort of standard old saws like, “An API should do one thing
and do it well,” and see what it really means and tell you how to achieve it. So, you know,
in this case, the functionality should be easy to explain. If it’s not easy to explain,
then it’s not doing one thing and doing it well. It’s a mess. If you can’t come up with
a good name for it, then it’s a mess. The names are the API talking back to you, so
listen to them. When you try to name those methods and those classes, you know, if you
come up with a really complicated name like DynAnyFactoryOperations or _BindingIteratorImplBase,
which, you know, actually violates the naming conventions of the platen–platform, sorry, or ENCODING_CDR_ENCAPS,
you know, clearly you’ve got problems. Any API that contains this sort of stuff is a
mess. Oh, what about this, OMGVMCID? I know OMG is, “Oh, my god,” but I can’t–I can’t
figure out the rest of it. And by the way, lest you think I’m just making this stuff
up, all of these come from an actual Java platform API. And I won’t tell you what it is
except to say that it’s CORBA. Good API names are like Font. Yeah, I know what a font
is. Sure, you know, it’s like, it’s italic or bold or whatever. You know, a set, I know
what a set is. A private key, a lock, a ThreadFactory, these things are, you know, they instantly
communicate what they are. And the methods, you know the classes all of them should be
like that. Looking at them, you know, it should be clear what they are. And good names drive
good designs. You know, once you have something that’s called a set. You know what the operations
are. You insert things into sets, you remove them, you test for containment. So, good names
drive good designs and bad names are an indication of bad designs. So, listen to those names
speaking to you. And if you just can’t get it to work out right, then you’re probably
not trying to build something reasonable. So, always remain amenable to splitting a
module up if you’re trying to cram too much into a module, or to putting multiple modules
together, if you’re trying to expose sort of internal details that ought to be hidden.
Maybe you should just make a bigger module that hides some of those details. An API should
be as small as possible but no smaller. This principle is usually attributed to Einstein.
Although I looked really, really hard and I don’t believe he ever said it. I believe
that it’s, you know–he probably believed it but he didn’t say it. At any rate the API
should satisfy its requirements and if you only remember one thing from the talk today,
please remember this. When in doubt, leave it out. That applies to everything. It applies
to functionality, to classes, to methods, to parameters within a method, anything. If
you have any doubts about whether to include something, leave it out. You’ll probably be
able to add it later, but you most certainly will not be able to take it out once you put
it into an API. Once you put it in people will be using it, if you take it out they
will scream bloody murder. So, if you ever have any doubts about whether to include something,
leave it out. And if that’s all you take away with you from today’s talk, then, I think
it’s been worth your hour. Of course, that’s just my judgment. Anyway, the–when you’re
thinking about the size of an API, the conceptual weight is more important than the bulk. By
bulk, I mean the number of methods, the number of classes, the number of parameters. What’s really
important is the number of concepts. When I’m learning this API, how many different
things do I have to learn about? And there are a number of ways to decrease the conceptual
weight of an API. The most important one is reusing interfaces. So, for example, if you
look at the collections framework, there are many implementations of the Set interface,
you know, whether it’s TreeSet. The original ones were, you know, HashSet and TreeSet and
then we added LinkedHashSet and more recently a whole slew of concurrent set implementations.
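
To make the point concrete, here is a short sketch in which one method is written once against `java.util.Set` and then handed four different implementations from the JDK. The helper name `hasDuplicates` and the class `SetReuseDemo` are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

final class SetReuseDemo {
    /**
     * Written once against the Set interface (hypothetical helper).
     * It relies only on Set.add returning false for an element already present.
     */
    static boolean hasDuplicates(List<String> words, Set<String> seen) {
        for (String word : words) {
            if (!seen.add(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        // Four implementations, one concept to learn.
        System.out.println(hasDuplicates(words, new HashSet<String>()));
        System.out.println(hasDuplicates(words, new TreeSet<String>()));
        System.out.println(hasDuplicates(words, new LinkedHashSet<String>()));
        System.out.println(hasDuplicates(words, new ConcurrentSkipListSet<String>()));
    }
}
```
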
You don’t have to learn any new APIs. You learn the Set API and we can add new functionality,
you know, we can add richness without making you learn any new API. So, that’s one of the great
ways to increase the power to weight ratio. That’s the important thing. You want to be
able to do a lot without learning a lot. The implementation shouldn’t impact the API. Once
again this is motherhood but what does it really mean? It means don’t put any implementation
details into the API. They confuse the users and they inhibit their freedom to decide.
They inhibit the implementer’s freedom to change the implementation later. So, you know, for example,
let us say you have some API that’s about phone numbers but it throws SQL exceptions
and now you want to re-implement it on top of some proprietary data store rather than
a SQL data store. But your clients are already trying to catch the SQL exceptions.
You know, what do you do? Well, you can emulate those old SQL exceptions but that’s crazy.
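
One way to keep this detail out of the API is to translate the low-level exception into one that speaks the API's own vocabulary. The sketch below assumes a hypothetical `PhoneDirectory` interface with a `PhoneDirectoryException`; the SQL-backed class fakes its database call so the example stays self-contained.

```java
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Exception at the API's own level of abstraction (hypothetical name).
class PhoneDirectoryException extends Exception {
    PhoneDirectoryException(String message, Throwable cause) {
        super(message, cause);
    }
}

// The API says nothing about SQL, so the storage can change freely.
interface PhoneDirectory {
    String lookup(String name) throws PhoneDirectoryException;
}

final class SqlBackedDirectory implements PhoneDirectory {
    @Override
    public String lookup(String name) throws PhoneDirectoryException {
        try {
            return queryDatabase(name);
        } catch (SQLException e) {
            // Translate: clients see a directory failure; the cause is kept for debugging.
            throw new PhoneDirectoryException("Lookup failed for " + name, e);
        }
    }

    // Stand-in for real JDBC code; it always fails so the translation is visible.
    private String queryDatabase(String name) throws SQLException {
        throw new SQLException("connection refused");
    }
}

// A later re-implementation on a different store needs no changes in clients.
final class InMemoryDirectory implements PhoneDirectory {
    private final Map<String, String> numbers = new HashMap<>();

    void add(String name, String number) {
        numbers.put(name, number);
    }

    @Override
    public String lookup(String name) throws PhoneDirectoryException {
        String number = numbers.get(name);
        if (number == null) {
            throw new PhoneDirectoryException("No entry for " + name, null);
        }
        return number;
    }
}
```

Client code that catches `PhoneDirectoryException` compiles and behaves the same against either implementation.
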
So that means that you should make sure that the exceptions that you throw are kind of
at the same layer of abstraction as the rest of the API is. That’s just one example where
implementation details kind of leak into APIs. The important thing is you have to be aware
of what actually is an implementation detail. You don’t want to over specify your methods.
You don’t want the specification of a method to involve something that is an implementation
detail and that you would like to be able to change later on. So, here’s an example
where we did that. Don’t specify your hash functions. You might think that, you know,
exactly what value is returned by the hashCode method is a proper part of a spec but
it isn’t. It’s an implementation detail. The spec should simply say it returns an integer
and, you know, with high probability the integer will differ for two different objects and
furthermore it should be cheap to calculate the thing. But exactly what number is returned
you should have the flexibility to change that from release to release as you learn
about flaws in your old hash functions and as the technology improves and hash functions
improve. Of course, you cannot do that if you’re writing a persistent data store. If
those hash functions are going to be used to store data on disk then they can’t change.
But that’s a very special kind of hash function. Those ones must be specified but the great
majority of hash functions out there shouldn’t be. And we got this very wrong in initial
releases of Java and unfortunately that tradition has stuck to the point where almost all of
these hash functions are specified. But they really shouldn’t be. Finally, you shouldn’t
let the implementation details kind of just leak into the API. The example I gave you
before with an exception is an example where, you know, you didn’t really think hard about
them and say, “Oh yeah, you know, we should maybe throw a SQL exception here.” You
probably just wrote it and you realized, “Gee, I’m calling something that throws a SQL
exception, so I have to propagate it out.” That’s a case where an implementation detail
is just sort of leaking. Another example, a really notorious example in Java is, if
you simply say, “implements Serializable.” Once you’ve done that, your entire implementation
has just sort of leaked out as part of your API because the serial form consists of all the
fields that comprise your object, even your private fields, so all of a sudden
the private fields are part of the public API and that’s really, really bad. And the
way around that by the way is to design your serial forms carefully. Don’t just say “implements
Serializable.” You should minimize the accessibility of everything. That means you should make
your classes, your members, your fields, all, as private as possible. One specific case
is that public classes should have no public fields with the exception of their constants,
which aren’t really fields. This maximizes what they call information hiding. Parnas
is the guy who came up with that term. And it minimizes–minimizes the coupling between
APIs. You know, if things are kind of hidden behind inter-modular boundaries, they can
be changed freely. And this allows modules to be understood, to be used, to be built,
to be optimized, debugged, tested, and what have you individually and in parallel. So
you can have multiple teams, you know, dealing with multiple APIs concurrently. If on the
other hand the APIs sort of expose everything and each, you know, module is sort of messing
around with other modules, then there is very little that you can do to any module without
affecting a whole slew of modules around it. Names matter a lot. There are some people
who think that names don’t matter and, you know, when you sit down and you say, “Well, this
isn’t named right.” They say, “Don’t waste your time. Let’s just move on. It’s good enough.”
Names in an API that are going to be used by anyone else and that includes yourself
in a few months, matter an awful lot. The idea is that every API is kind of a little
language and people who are going to use your API have to learn that language and then speak
in that language and that means names should be largely self-explanatory. You should avoid
cryptic abbreviations. So the original Unix names, I think, failed this one miserably.
You should be consistent. It’s very important that the same word means the same thing when
it’s used repeatedly in your API, and that you don’t have multiple words meaning the
same thing. So let us say you have a remove and a delete in the same API. That’s almost
always wrong. You know, what’s the difference between remove and delete? Well, I don’t know.
When I listen to those two things, they seem to mean the same thing. If they do mean the
same thing, then call them both the same thing. If they don’t, then make the names different
enough to tell you how they differ. If they were called, let’s say delete and expunge,
I would know that expunge was a more permanent kind of removal or something like that. Not
only should you strive for consistency but you should strive for symmetry. So, if your
API has, let’s say, two verbs, add and remove, and two nouns, entry and key, I’d like to
see, you know, addEntry, addKey, removeEntry and removeKey. If one of them is missing,
there should be a very good reason for it. I’m not saying that all APIs should be symmetric
but the great bulk of them should. And if you get it right, the code will read like
prose. That’s the prize. So, you know, in this case the code reads, “If the car’s speed
is more than twice the speed limit, then the speaker should generate an alert that says ‘Watch out
for cops.’” That’s pretty much English. It reads like prose and that’s an indication
that the API is reasonably decent. Documentation matters as well. And Parnas, the aforementioned
Parnas said it much better than I could, so I’m simply going to read what he had to say.
He said, “Reuse is something that is far easier to say than to do. Doing it requires both
good design and very good documentation. Even when we see good design, which is still infrequently,
we won’t see the components reused without good documentation.” He said that in 1994.
And I don’t know about you but, you know, when I–when I read that I get religion. And
the only thing I can do then is to document religiously. Document every single class,
every interface, every method, every constructor, every parameter and every exception in my
public API. Of course all of you do that, right? And when you go out on the web, whenever
you look at Javadoc, it’s always the case that every public or protected method has a Javadoc
comment, right? No. You know, it’s really terrible because if you don’t have a comment
telling you what the specification is, what is the specification? Who knows? You have
two choices. Either you guess, in which case your program probably doesn’t work, or you
read the code, in which case the implementation becomes the specification and it’s overspecified
and you no longer have the freedom to change that implementation at all. So document everything
and there are–you know, what does this mean for classes? Just tell me what an instance
of that class represents. For methods, tell me the contract between the method and its
client. That is, what must be true before I call it, what will be true after it returns,
and any side effects. Those are particularly important. If you have side effects and you
don’t document them, people will shoot themselves in the foot. I’ll give you an example of this
later. For parameters, people often forget. Don’t just say, “Hmm, the size of the block.”
It’s the size of the block in bytes or in megabytes or whatever. You’ve got to tell
me what the units are, and the form, especially if it’s a string. I’ve got to know: is it
XML? You know, what form is this string in? And finally, the ownership. If I’m passing
an object into an API, do I still own the object? Am I free to modify it after I pass it in,
or have I transferred ownership from myself to the object to which I passed that other
object? If the thing that you’re defining is mutable, that is, can be modified, then
you must document the state space very carefully. If you have a badly documented state
space, then you have no hope of being able to use that API properly. Because people won’t
know when it’s legal to call what and what will happen after the call is made. I may
discuss this elsewhere, but examples of how to do this very badly are Date and Calendar,
in particular the Calendar API in Java. The state space is almost undocumented and it
has caused numerous bugs over time. I’m happy to say, by the way, that just days ago, Sun
decided that we would be pursuing a JSR to improve the Date and Calendar APIs, based
in part on Joda Time and who knows what else. But we may finally be,
you know, free of that mess. You should consider performance consequences of API design decisions.
And this is funny because this tends to, you know, contradict the advice you’ve all heard
that, you know, premature optimization is evil. In fact, I have an old essay about that
in Effective Java. However, that doesn’t mean that you can just ignore performance. Jon
Bentley was big on this fact. And in particular, it turns out that bad API decisions can limit
performance. Examples of things that can limit it are making a type mutable when it should
be immutable or vice versa, providing a constructor instead of a static factory, using an implementation
type in an API instead of an interface, which means that people will always have to use
that particular implementation even if a better one comes along later. But the converse of
this is, never warp an API to gain performance. Every once in a while you have something that’s
sort of temporarily broken. You know, this thing is slow and in order to avoid this slow
thing, you break your API. The thing that used to get slow becomes fast but your API
is still broken, you know. So, design your APIs for the long haul. Luckily, good API
design usually coincides with good performance. Here’s an example of an API design decision
that led to bad performance. In the original AWT, there was something called a Dimension.
If you had a component and you asked for its size, you got back a Dimension object, which
contained two coordinates. It’s simply a couple of longs that were wrapped. The problem is
that the Dimension object was mutable. Those longs weren’t really wrapped. You know, they
were publicly visible, mutable fields. And what that meant was every time
you called getSize, you had to allocate a new Dimension object because otherwise I might,
you know, get the Dimension object, give it to you, you might ask for it and then you
might modify it, modifying his copy as well. And that would be really bad. Now these two,
you know, independent threads or computations will be tied together in nasty ways. Is that
bad? You know, is it really expensive to allocate a little object containing two longs? No,
it’s dirt cheap but unfortunately this thing gets called millions, literally millions of
times in a GUI app. So, you know, all of a sudden, you’re basically allocating megabytes of garbage
and that really does cost you. It’s garbage collector pressure that you just don’t need.
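
To make the cost concrete, here is a minimal Java sketch of the pattern just described. Size and Widget are hypothetical stand-ins, not the real AWT classes, and the int fields are an assumption. Because Size exposes mutable public fields, getSize() has to hand out a fresh copy on every call, while the primitive accessors allocate nothing.

```java
// Hypothetical stand-in for a mutable value type with public fields.
final class Size {
    public int width;
    public int height;

    Size(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

// Hypothetical component that must protect its internal state.
final class Widget {
    private final Size size = new Size(640, 480);

    // Because Size is mutable, every call allocates a defensive copy;
    // returning the internal object would let any caller resize the widget.
    Size getSize() {
        return new Size(size.width, size.height);
    }

    // Primitive accessors return plain values, so nothing is allocated.
    int getWidth()  { return size.width; }
    int getHeight() { return size.height; }
}

class SizeDemo {
    public static void main(String[] args) {
        Widget w = new Widget();
        Size copy = w.getSize();
        copy.width = 1;                    // mutates only the caller's copy
        System.out.println(w.getWidth());  // the widget itself is unchanged
    }
}
```

In a loop that runs millions of times, every getSize() call above produces one short-lived object, which is the garbage collector pressure the talk refers to.
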
And it was fixed in 1.2 by adding methods that return each dimension individually as
a primitive type which is in fact immutable. But, you know, unfortunately, old code that
uses the 1.1 APIs is still slow and will always be slow. The APIs that you write have to coexist
peacefully with the platform. So, that means that you have to do what’s customary for that
platform. You have to obey its naming conventions. You have to avoid anything that’s just verboten
in that platform. You know, whatever it is, if you’re in Java, you know, if you are in
C++, everybody knows that there are certain things that you just shouldn’t do or shouldn’t
use. So, learn what those things are and then avoid them. And there are–there are generally
books that tell you the traps and pitfalls for every platform. I could recommend one
for Java but, you know. Anyway, the thing you’ve got to do is mimic the patterns and
the core APIs in the platform, because everyone who uses a platform knows its core APIs. So,
if your API feels just like one of the core APIs, then everyone, because they already
know how to use the core APIs, will already know how to use your API. It’s as simple as that.
And the real trap here is you should never simply transliterate APIs. That is the worst
way to design an API. And what do I mean by that? Suppose you have a C++ API, and you
want a Java version of the same facility. What you should not do is take every C++ class
and make a complementary Java class that contains all the same methods as the C++ class because
what was reasonable in C++ will almost certainly be unreasonable in Java and vice versa. So,
you have to basically take a step back. You have to say, “What is this class doing and
how would I do this in Java?” That’s the right way to do it. Transliterated APIs are almost
always broken. Okay. On to class design, I have 15 minutes to finish the whole–rest
of the talk. So that gives me five minutes each for the next three sections and zero
minutes for the last section. So, first of all minimize mutability. Classes should be
immutable unless there’s a very good reason to make them mutable. And the advantages are
that the classes you get are simple, they are thread-safe, and their instances are reusable.
You never have to generate a new one. The only disadvantage is that you need a separate
object for each value. So, if you have a huge object, let’s say a BigInteger that’s a million digits
long and you want to throw it away but you just want to change the last bit, if it were
mutable, you could do that in place, you know, at virtually no cost. But because they’re
immutable, you have to basically copy a megabit of data and then throw away the old megabit,
which is a little bit unpleasant. But if you do have to make things mutable, and you often
do, you should still make them as immutable as possible. You should give them a nice small,
well-defined state space. You should make it clear when it’s
legal to call which method. So, bad examples, as I mentioned before, are Date and Calendar.
Calendar has this, you know, like the roll method and when you, you know, you put a date
into it, who knows what state is behind it? When it’s legal to call what? What state the
calendar is in after you’ve used it? And, you know, did you know that when you’re using
a date formatter, if one thread uses a date formatter and another thread tries to use it at
the same time, both threads are hosed. You know, it kind of feels like an immutable object,
except that it has state inside it. You can’t read that state but it’s there. You know,
so, these are–these are things that are more mutable than they should be. A good example
is TimerTask. It’s not immutable, but it minimizes mutability. In particular, a TimerTask is
inherently mutable because it represents an actual or current computation and that mutates.
But, you know, a TimerTask, what do you do? You create it, you schedule it, it runs as many
times as it has to run, and then it’s dead. It’s gone. End of story. Now, there were people,
who asked us, when we were designing the timer API, “We want to reuse them, you know. It’s
like expensive to make a TimerTask.” And the answer is, “No, let it die in peace.” If you
need another one, make another one but by eliminating that sort of loop from the state
space, you make the API much simpler and much less bug-prone. You should subclass only when
it makes sense to do so. This is the Liskov Substitution Principle. And it’s actually very simple.
You just have to ask yourself a question. When you have two classes in a public API,
and you’re thinking of making one a subclass of another, like Foo a subclass of Bar,
ask yourself, “Is every Foo a Bar?” If you can answer that with a straight face, then
make it a subclass. If not, don’t. So, a bad example is, in Java, Properties extends
Hashtable. Is every Properties object a Hashtable? Heavens, no. A Properties object is a special
thing that maps certain strings to certain other strings. So, every Properties object
has a Hashtable, perhaps. Typically, it’s implemented atop a Hashtable, but you can’t
answer that question with a straight face. Or what about Stack extends
Vector? Is every stack a vector? No. You push and pop on stacks and that’s pretty much
all you do to them. You might also have a peek method and a size method. But, you know, a
vector allows random access, accessing by index; a stack doesn’t. So, it was really
wrong to have Stack extends Vector. And the really bad thing about it was they took away a
great piece of real estate. We can’t use the name Stack for a class that actually does
implement a stack anymore because the name has been taken. A good one is Set extends Collection.
Is every set a collection? Yes, it really is. Set is just a special kind of a collection.
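
Before moving on, the Stack extends Vector problem mentioned above can be seen directly with the real java.util.Stack. The class and method names in this small sketch are illustrative only. It pushes two items and then uses a method inherited from Vector to slip a third one underneath them, which no pure stack would allow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

class StackIsNotAVector {
    // Pushes two items, inserts a third at index 0 through the inherited
    // Vector API, and returns the order in which the items pop.
    static List<String> popOrder() {
        Stack<String> stack = new Stack<>();
        stack.push("first");
        stack.push("second");
        stack.add(0, "intruder");   // legal only because Stack extends Vector

        List<String> popped = new ArrayList<>();
        while (!stack.isEmpty()) {
            popped.add(stack.pop());
        }
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(popOrder());
    }
}
```

The intruder comes out last even though it was added last, so the last-in, first-out contract that callers expect from a stack no longer holds.
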
It’s a collection that does not allow duplicate elements. This is a fairly controversial one.
But I believe that you should design and document a class for inheritance or else prohibit it
outright, that is, make the class final or have no publicly accessible constructor. The reason
for that is that inheritance violates encapsulation in a subtle way. And
that is to say that subclassing violates encapsulation in a way that mere method invocation does
not. This is sometimes called the fragile base class problem, where basically if you
have one class that’s implemented atop a second class, and you override a method in the first
class, it may modify the behavior of other methods in the subclass because the original
implementations of those methods dispatched to the method that you just overrode. And
then if in a future implementation of the superclass, they re-implement all of these
methods, so that this method no longer is implemented in terms of this one but both
of them are implemented in terms of some third method, then, you know, this new version of
the superclass will break the subclass. And the way to avoid that is either come clean
on exactly how every method uses every other method. That is document the self-use patterns
of a class. If you’ve done that, then you documented it and designed it for inheritance.
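
The self-use problem described above can be sketched with a small subclass of the real java.util.HashSet. CountingHashSet is a hypothetical class written for illustration. It tries to count insertions, but because HashSet's inherited addAll calls add for each element, every element added through addAll is counted twice.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical subclass that tries to count every insertion attempt.
class CountingHashSet<E> extends HashSet<E> {
    private int addCount = 0;

    @Override
    public boolean add(E e) {
        addCount++;
        return super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        addCount += c.size();
        // The inherited addAll calls add() once per element, so each
        // element is counted a second time by the override above.
        return super.addAll(c);
    }

    int getAddCount() {
        return addCount;
    }
}

class SelfUseDemo {
    public static void main(String[] args) {
        CountingHashSet<String> set = new CountingHashSet<>();
        set.addAll(Arrays.asList("a", "b", "c"));
        System.out.println(set.getAddCount()); // prints 6, not 3
    }
}
```

The subclass is only correct or incorrect relative to an implementation detail of the superclass, which is why the self-use pattern has to be documented before subclassing is safe.
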
So, it’s okay to make it non-final. Otherwise, it should just be final. Now, if you look at
the Java API, we got this wrong in many places. So, most of the concrete classes in the J2SE
libraries, in particular collection classes like HashMap and HashSet, are all non-final,
but they don’t exactly define their self-use patterns. So, they’re a bit fragile. AbstractSet
and AbstractMap and the other AbstractXxx classes are good. They really are designed
and documented for inheritance. Okay, onto methods. So, this is–if you remember only
two things from the talk today, this is the second thing. By the way what was the first
thing I told you to remember?>>[INDISTINCT]
>>BLOCH: Excellent. When in doubt, leave it out. The second one is, don’t make the
client do anything the module could do. Those two things are like the fundamental rules
of API design. So, the worst thing you can do is write an API that just requires the
client to call in and out, and in and out just doing repeated calls, passing junk from
the first call to the second call. This causes boilerplate code, which, as you can see, is
red. It’s really, really bad. Why is boilerplate code bad? Because it’s an opportunity for
bugs. You make boilerplate code by doing cut-and-paste and then modify. But if you don’t do all the
modification that you should, it may still compile but it won’t do the right thing. It
is ugly, it is annoying, and it’s error-prone. And here is a real live example from the W3C’s
DOM API. Suppose you have in hand an XML document and you want to print it to an output stream.
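
For reference while reading the walkthrough that follows, here is a sketch of that task using the standard JAXP classes. The names DomWriter and writeDoc are illustrative, and the null check on the doctype is an addition so that the sketch also runs on documents that have no DOCTYPE.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class DomWriter {
    // Writes an XML document to the given stream via the transformer API.
    static void writeDoc(Document doc, OutputStream out) throws IOException {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Carry the DOCTYPE system ID over, when the document has one.
            if (doc.getDoctype() != null && doc.getDoctype().getSystemId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                                    doc.getDoctype().getSystemId());
            }
            // The document and the stream both need adapter objects.
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            // Checked exception declared by the API; treated here as unexpected.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("greeting"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDoc(doc, out);
        System.out.println(out.toString("UTF-8"));
    }
}
```

Everything inside writeDoc is work the client has to repeat at every call site unless the library offers a single call that does it.
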
That’s a very reasonable thing to do. It should take one call, you know, print, it takes an
output stream and you’re done. But it doesn’t. Here is what you actually have to do. First
you have to import, in addition to w3c.dom, that’s fine, and [INDISTINCT], you also have to
import XML transform, XML transform.dom and [INDISTINCT]. Why? I don’t know but
I do know you have to do it. And then here’s how the call should’ve looked. It should’ve
been called writeDoc and, you know, the document perhaps should have been the receiver and
you pass in an output stream and it would throw an IOException. But here’s what you
actually have to do. First, you get a transformer and what is a transformer? I don’t know, but
I know you need one. How do you get one? Well, you call the newInstance method on the transformer
factory and then that gives you a transformer factory and once you get the factory you ask
for a new transformer. So, as you can see, they read Design Patterns. It’s great. It’s
like filled with patterns. So now, you’ve got your transformer out of your transformer
factory and then you have to set an output property on that transformer. You have to
set its doc type system to be the doc type from the doc. I don’t know what any of this
means but I know you have to do all of it. And then, you have to get the system ID
and you have to–that’s part of this output property that you’re setting. And then when
you’re all done, you can actually do the output. The way you do that is you call transform
on the transformer. And you don’t just pass in the doc as the source because that doesn’t implement
the right thing. You use the adapter pattern here to take your doc and you turn it into
a DOMSource and then also you can’t quite just pass in your output stream. You wrap
it in a StreamResult and now you get your output, right? Except for one tiny problem,
which is it can throw a TransformerException. When can it throw it? Well, never actually
but the API says it can and it’s a checked exception, so we have to catch it if it gets
thrown and then we throw an AssertionError there because it can never happen. So, you know,
you’ve got like, whatever it is, six lines of just unreadable garbage code to give you something
very simple. If they had started with a use case, “People might want to print their XML
documents,” you get the idea. Another general rule when you’re designing a method
is, don’t violate the Principle of Least Astonishment. The user of an API should never be surprised
by its behavior. It’s worth extra implementation effort. It’s even sometimes worth reduced
performance not to surprise the users of your API because if you surprise them, what will
happen? They’ll simply do the wrong thing. They’ll think it does something, it’ll actually
do something else and their program will be broken. Here’s a real example from the thread
API in Java. So, we have this method called interrupted. You’ve got a thread and you want
to check if it’s interrupted. You call Thread.interrupted. And what does it do? Well, it tests whether
the current thread has been interrupted. Oh, and by the way, it clears the interrupted
status of the current thread. That’s just like a little side effect. It clears the interrupted
status of the current thread. Looking at the name, you know, Thread.interrupted, there’d
be no way to guess that it does this. But it does this and many people have, you know,
spent hours chasing bugs because of it. You know what is the primary thing that this call
does? It clears the interrupted status. It’s not an unreasonable call but it should’ve
been named clear interrupt status and by the way, it could’ve returned the old interrupt
status as a favor to you. But they named it based on the second most important thing
it did, instead of the first most important thing it did. And in doing so, they violated
this Principle of Least Astonishment. You should fail fast. Whenever there’s an error,
you should tell the user of your API as soon as possible after the error has happened.
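
As a minimal illustration of failing fast, the sketch below uses a hypothetical StringTable class (not a JDK type). Its put method takes String parameters, so the compiler rejects other types, and it checks for null at the call site instead of letting a bad entry surface later when the table is read or saved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical string-to-string table that rejects bad input at the call site.
class StringTable {
    private final Map<String, String> entries = new HashMap<>();

    // The String parameter types let the compiler reject other types;
    // the null checks catch the remaining misuse the moment it happens.
    void put(String key, String value) {
        Objects.requireNonNull(key, "key must not be null");
        Objects.requireNonNull(value, "value must not be null");
        entries.put(key, value);
    }

    String get(String key) {
        return entries.get(key);
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        StringTable table = new StringTable();
        table.put("user", "josh");
        try {
            table.put("home", null);   // fails here, not later at save time
        } catch (NullPointerException e) {
            System.out.println("rejected immediately: " + e.getMessage());
        }
    }
}
```

The stack trace of the rejected call points at the line that passed the bad value, which is exactly the information that is lost when the failure is deferred.
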
Ideally, you should tell him at compile time, you know, because this way it happens in the
lab or in the, you know, here where the program is being written, instead of out in the field
where the program is being run. And that means that I believe that static typing is a very
good thing. It moves errors from runtime to compile time. I understand that this is another
highly controversial topic. But I do believe it. You know, I’ve seen it happen. You know,
for example, when we increased the static typing in Java by adding generics, I found bugs in
preexisting code, you know, because it forced me to be more specific about the types
that were expected and it told me where things were wrong. If you are only going to
be able to find out an error at runtime, you want to find out the first time you do something
wrong. Like, if you pass some garbage into something else, you should find out as soon
as you pass the garbage in, not ten minutes later. Here’s an example of how not to do
it. In the aforementioned Properties class that extends Hashtable, if you look at the
spec, it says a Properties instance maps strings to strings. But if you look at this put call,
it takes an Object key and an Object value. Any object, maybe a string, maybe something
else, but it tells you right away if you pass in something that’s not a string, right? If
only it were so. It doesn’t, in fact, tell you right away. It lets you put in any garbage
you want and then, ten minutes later, when you call the save call to an
output stream, which basically takes this Properties object and translates it into some garbage that
isn’t quite XML, then and only then does it blow up with a ClassCastException because
you put something wrong into it ten minutes before. But by that time, debugging becomes
almost impossible. You don’t know where the call was that put the garbage into your properties
object. You should provide programmatic access to all data that is also available in string
form. This is really important. Whenever you have a method that returns something as a string,
you should also have a method that returns the same stuff in programmatic form. If you
don’t do that, then clients will have to parse the string. Not only is that a pain in the
butt but it turns the string into a de facto part of the API. You can never add information
to that string because there’s code out there that’s parsing that string. And if you change
the format of the string, you break that code. So, what you should do is along with the API
that gives you the printable string, you should have other APIs that give you access to–excuse
me, to the actual information and this way you can add more information to the string
later. And in fact, the spec should say that you are not specifying the format of the string
and that anyone who writes code to parse the string is taking their lives into their hands.
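
One place where the JDK offers both forms is Throwable, and the sketch below reads stack information through the structured getStackTrace() accessor instead of parsing printed text. The class and helper names here are illustrative.

```java
class StackTraceAccess {
    // Returns the name of the method in which the throwable was created,
    // read from structured data instead of parsed out of printed text.
    static String originMethod(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        return frames.length > 0 ? frames[0].getMethodName() : "<unknown>";
    }

    // Helper that creates (but does not throw) an exception.
    static Throwable fail() {
        return new IllegalStateException("boom");
    }

    public static void main(String[] args) {
        Throwable t = fail();
        System.out.println(originMethod(t));
        for (StackTraceElement frame : t.getStackTrace()) {
            // Each field is available directly; no string parsing needed.
            System.out.println(frame.getClassName() + "." + frame.getMethodName()
                    + " line " + frame.getLineNumber());
        }
    }
}
```

Because callers get the class name, method name and line number as separate values, the printed format is free to change without breaking them.
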
So, you know, a bad example is, initially the only way to get the stack trace in Java
was to call this getStack–sorry, printStackTrace API and people actually did go
parse those things. In 1.4, we finally added the getStackTrace API that gives you all the
same information, StackTraceElements consisting of the filename, a line number, class name
and so forth. But this was a case of sort of the horse had already left the barn. You
should overload with care. Method overloading can be a good thing but it tends to be overused.
You should avoid ambiguous overloadings. That is, multiple overloadings that can do different
things when passed the same values. And a bad example, which I am guilty of here myself,
is TreeSet has two constructors: one that takes a collection, one that takes a sorted
set. The first one ignores the order of the thing that was passed in. The second one says,
“Gee, I’m making a tree set out of another sorted set, I might as well order it in the
same way.” Well, here’s the problem. If you have a sorted set that’s cast to a collection,
then you’re calling this constructor and you get one result. Whereas, if you don’t cast
it, if you just pass it in, you get another result, so I really should not have done it that
way. I should’ve done a dynamic test. If the thing was an instance of SortedSet, then
I should’ve preserved the order. So, you know, the basic rule here is just because you can
doesn’t mean you should. Often, it’s better to simply give something another name rather
than overloading. Overloading can be a real trap. Use appropriate parameter and
return types. That means you should favor interfaces over specific classes. You should use
the most specific possible input parameter type. If you accept, let’s say, a collection
but you’ll blow up unless somebody passes you a set, that’s broken. You’ve just taken
something that could’ve been caught at compile time and instead you’re catching it at runtime.
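
Going back to the TreeSet constructors discussed a moment ago, the following sketch shows the two overloads giving different results for the same object. It uses the real java.util.TreeSet; the demo class and helper names are illustrative.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class TreeSetOverloads {
    // Builds a sorted set that orders its elements in reverse.
    static SortedSet<String> reversed() {
        SortedSet<String> s = new TreeSet<String>(Comparator.<String>reverseOrder());
        s.addAll(Arrays.asList("a", "b", "c"));
        return s;
    }

    public static void main(String[] args) {
        SortedSet<String> source = reversed();

        // Static type SortedSet selects TreeSet(SortedSet): ordering is kept.
        TreeSet<String> kept = new TreeSet<String>(source);

        // Same object, static type Collection selects TreeSet(Collection):
        // the comparator is dropped and natural ordering is used.
        TreeSet<String> dropped = new TreeSet<String>((Collection<String>) source);

        System.out.println(kept.first());     // c
        System.out.println(dropped.first());  // a
    }
}
```

The overload is chosen at compile time from the static type of the argument, so a cast that looks harmless silently changes the behavior.
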
Here’s a really, sort of, another trap: don’t use a string if a better type exists. In these
days of, you know, XML and web services, people always start off with strings. Strings come
in over the web. Just because it started as a string doesn’t mean it should stay a string.
You should turn it into something more reasonable and leave it there. A really bad example of
this that I saw in a program years ago, was a program that passed around a string for
the whole duration of its execution, that was either yes or no. We have a good data
type for that. It’s called boolean. You should never use floating point types–at Google,
you guys already know all this–but never use the floating point types, float or double,
for monetary values. They’re not good enough to represent money. You cannot do exact computations
in base 10 using binary floating point numbers because of the fact that 1/10th is
not representable as a binary fraction. So, don’t do it. If you have, you know, an amount
of money, use BigDecimal. Perhaps use long, BigInteger or what have you, but do not use
float or double. And when you are faced with the choice of using float or double, you should
almost always use double rather than float because, you know, typically double will run just as
fast and, you know, you lose real and important precision by going down to float. Let’s see,
I’m going to have to just run through the rest of this. This one is really important.
Use consistent parameter ordering across methods. So, you know, here is an example of what not
to do. A real example from Unix. We have two methods to copy data. One is called
strncpy. One is called bcopy. The first one takes a destination, a source and a size.
The second takes a source, a destination and a size. So, what happens, you know, if somebody
assumes one ordering when they call the other method? They clobber their source data with
whatever garbage was in their destination array. And how long does that take to find
that bug? Probably a really, really long time. This is particularly important when the types
of the two parameters are identical because if you switch them around, you will not know
at compile time. A good example here is in java.util.Collections, the first value is
always the collection being mutated or manipulated. Similarly, in java.util.concurrent, when you have
an amount of time, you always specify it as a delay followed by a time unit, never the other
way around. And even if it were the other way around, the compiler would tell you
because these are strong types that are incompatible. A long and a TimeUnit are two different
things. You should avoid long parameter lists. Ideally, you should limit them to three. It’s
really easy to remember three things. Especially, you should avoid long lists of identically
typed parameters because of the problems that I’ve told you before. If you get the order
wrong, you’re hosed. So here’s an example of what not to do. This is from the Win32
API, to create a window. You know, if you look in the middle of these, whatever, 15 parameters,
you see int x, int y, int nWidth, int nHeight. So, here’s, you know, a whole string of ints
and by the way some of these other things are also ints, just int by another name. So,
you know, without support from an IDE it’s pretty much impossible to use this API. Luckily,
there are a number of great techniques for shortening parameter lists. One thing you can do is break
up a method into multiple methods, or you can create helper classes to hold the parameters.
A specific example of the helper class is the builder pattern, where if you’ve got a constructor
or a static factory that naturally would take 10 parameters, most of which you don’t have
to specify, most of which have good defaults. Instead, make a builder and then just plug-in
the ones you actually care about and then call a build method. And that code will be
much easier to write and to read. You should avoid return values that demand exceptional
processing. In particular, you should never return null instead of a zero-length array
or collection. Here is an example of something we got wrong. In the BufferedImageOp class,
we have a method called getRenderingHints. And either it returns a RenderingHints collection
or it returns null. And what’s the consequence of this? Almost all code that calls this thing
is wrong because it rarely returns null. So, people forget to code for that special
case. If they do get a null, what happens? NullPointerException, and it’s completely unnecessary.
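
A minimal sketch of the alternative, using a hypothetical ImageOp class rather than the real AWT type: the accessor returns an empty collection when nothing is configured, so callers need no special case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical image operation with optional rendering hints.
class ImageOp {
    private final List<String> hints = new ArrayList<>();

    void addHint(String hint) {
        hints.add(hint);
    }

    // Never returns null: with no hints configured the result is empty.
    List<String> getHints() {
        return Collections.unmodifiableList(hints);
    }
}

class EmptyNotNullDemo {
    public static void main(String[] args) {
        ImageOp op = new ImageOp();
        // No null check needed; the loop body simply runs zero times.
        for (String hint : op.getHints()) {
            System.out.println(hint);
        }
        System.out.println(op.getHints().size());
    }
}
```

The common path and the rare path now run the same code, so there is no untested branch waiting to fail in the field.
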
If they just returned a zero-length collection, then they wouldn’t have gotten into any trouble. So,
I think I’m actually going to skip the rest of the talk because I’m out of time. In fact,
I’ve already used up five minutes more time than I have. But if you have any other collection–try
again. If you have any other questions, I’ll be around for a while afterwards and you can
ask me anything you’d like. So, thanks for coming. Oh, one other thing, which is I have
a–I have a give-away for you, which is–when I gave this at OOPSLA in the proceedings,
they gave me two pages to put in a–what do you call that? Like an extended abstract.
And instead of doing that, I tried to do API design by bumper sticker. I tried to basically
take this entire talk and break it down into 50 little maxims, like, when in doubt, leave
it out or, you know, all programmers are API designers, each with a sort of a sentence
describing, you know, in a bit more detail. So here, pass this out amongst yourselves,
as best you can. And I’ll put it up on, you know–and tell the JJB in case there are more
of you than there are pieces of paper. So thanks again for coming.

71 thoughts on “How To Design A Good API and Why it Matters”

  1. Great talk. I'm so glad you mentioned the importance of naming. In fact it shows up twice.

    Just the other day I was asking programming friends for help with naming. I wanted to check for a win on the diagonals of a tic-tac-toe board. And I wanted a name better than "IsTopLeftBottomRightDiagonalAWin." 😀

    Often when I ask for naming help I'm met with "Just use that name and move on." After hearing that enough I began questioning if I was wasting my time. It's good to hear that naming IS coding.

  2. Hmmmm.
    About your problem, making a name for a function that checks for win on a tic-tac-toe board.

    "IsTopLeftBottomRightDiagonalAWin."

    Huh? I don't think so.

    Why would you make a separate function checking each winning combination?

    Wouldn't you just make one function that checks all combinations?

    Anyway, the problem is in your design.
    chris69666 made a good point.

  3. I agree that naming is ultra important! but sometimes naming things stalls writing code.
    I find naming some functions super hard, so i rank accuracy with more words over being concise.
    If you can refactor, then you can come back and rename things to make sense later with more time.

  4. I just love these enthusiastic speakers, with loads of experience, and a good sense of humour … ! Excellent.

  5. I'm going to say this, from 22:00 – 26:00 is without a doubt, the best advice I've ever received regarding API design. It goes something like this: "When in doubt, leave it out"… and yes that applies to you 🙂

  6. Awesome speaker, so dedicated. I'd love to have the opportunity to chat with someone like this guy some day. Kudos to him.

  7. @jjmontesl Josh is even better in person. His sense of programming aesthetics is strong and they inform his entire being. Even his daily speech is type-safe and semantically precise; he often corrects himself to achieve these goals. He hired me into my first programming job at Transarc in 1990. Working with him was like going to graduate school for CS. To top it all off, Josh is friendly, interesting, funny and of excellent character. It is good to hear that he still quotes Jon Bentley.

  8. @hecatombe YouTube won't let me post even a mildly obfuscated URL. The slides are up on examville under the title of the talk.

  9. I searched for the talk title, as a whole string, and got a link at which seems to have all the slides. An hour is a bit long to sit through at work, even when the speaker knows his onions like this guy.

  10. Pretty good presentation.

    Too bad someone this good actually believes that the compiler should be used to debug code, including type issues. It really brings his credibility into question.

    It is impossible to use a compiler, especially the Java compiler to root out all type-errors. He completely ignores the role of unit-tests to cover everything that the compiler misses.

    If the compiler is part of your testing tools, or worse, your only testing tool, it is time to find a new profession.

  11. An amusing part is when he rightly talks about avoiding boilerplate and abusing patterns, yet at the end he does exactly that by talking about adding Builders and Factories, which are nothing more than unnecessary boilerplate. Java is one of the few languages that needs this kind of boilerplate to be usable.

    Mr. Bloch is intelligent and highly capable; it isn't my intention to say otherwise. It is proof that Java is harmful to one's thinking. He has been twisted and lives in such a rigid box, sad.

  12. @hecatombe Do a Google search for "How To Design A Good API and Why it Matters"

    The first hit for me is a link to the .pdf

    I also see your comment is 3 years old …

  13. Transcription errors:
    00:23:17 club –> CORBA
    00:28:40 sequel exceptions –> SQL exceptions
    00:35:25 [INDISTINCT] –> Joda Time
    00:58:45 no pointer exception –> NullPointerException

  14. You are wrong. The compiler CAN and should help find type errors. In fact, it's one of its most important roles and languages like ML or Haskell enforce strong typing. Things like casting are not allowed. Now Java is not a strongly typed statically typed language, so the ability of the compiler to catch errors is limited. Still, generics have improved things in that regard.
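
    A small sketch of the kind of error that generics let the Java compiler catch. The class `TypedListDemo` and the method `totalLength` are hypothetical names invented for this illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class TypedListDemo {
        /** Sums the lengths of the given words. */
        static int totalLength(List<String> words) {
            int total = 0;
            for (String word : words) {
                total += word.length();   // no cast needed: the element type is known statically
            }
            return total;
        }

        public static void main(String[] args) {
            List<String> words = new ArrayList<>();
            words.add("fail");
            words.add("fast");
            // words.add(42);   // rejected at compile time: 42 is not a String
            System.out.println(totalLength(words));
        }
    }
    ```

    With a raw `List`, the commented-out line would compile and the mistake would only surface later as a `ClassCastException` at the point where an element is cast back to `String`.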

  15. Sure, ML, Haskell (and others like Scala) are much better at handling types, but this is Java we are talking about…

    Anyone who says Java's type system is good (for any rational definition of good) is deluding themselves.

    I have seen it way too often, "programmers" using the compiler to test their code.

    BTW, dynamically typed languages often disallow casting as well.

  17. DO NOT make all your Java members private! That is idiotic. In fact, private members are the biggest mistake of all time!!! It totally destroys code reusability. Instead of private, make them protected. This way subclasses can still access them, but they aren't public API either.

  18. I'm not so sure about that. I agree that protected is often underused, but I wouldn't say that private members are inherently bad. They're good to use, for example, when the public methods of a superclass are engineered to safely manipulate the object's internal state to maintain some invariant. You don't want that internal state to just be protected, because then subclasses can circumvent the carefully-designed public methods and mess it up by accessing it directly without any checks.
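
    A minimal Java sketch of this argument, under assumed names: `Counter` and `LoggingCounter` are hypothetical classes, and "the count is never negative" is an invented invariant chosen only to make the point concrete.

    ```java
    class Counter {
        // Invariant: count is never negative. The field is private, so only the
        // methods of this class can change it.
        private int count;

        public void increment() {
            count = Math.addExact(count, 1);   // throws ArithmeticException on overflow
        }

        public void decrement() {
            if (count == 0) {
                throw new IllegalStateException("count is already zero");
            }
            count--;
        }

        public int get() {
            return count;
        }
    }

    // A subclass can still add behavior, but only through the checked public methods.
    class LoggingCounter extends Counter {
        private int decrements;

        @Override
        public void decrement() {
            super.decrement();   // the invariant check still runs
            decrements++;        // reached only if the decrement succeeded
        }

        public int decrements() {
            return decrements;
        }
    }
    ```

    If `count` were protected instead, `LoggingCounter` could assign `count = -5` directly and the superclass could no longer guarantee its own invariant.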

  19. It is not bad at all! That's a widespread and terrible misconception. Being able to modify code is the very crux of code reusability. Being able to subclass and modify makes it possible to create functional variations. Private members destroy that possibility. The question of broken dependency is a separate issue, and should be addressed via versioning, not private members.

  20. Excellent concepts. Was looking for a way to improve my coding skills and this nailed it. 

    These are not simple techniques that fade away in a couple of months or years. These are concepts that prevail through time (Watched it @ 2014). 


    I've always named variables, functions, classes, methods, and everything else with such squeaky words. 

  21. Given how much money Google has, why didn't they produce a better quality video?  A good talk is hidden behind all the blurriness.

  22. Is there a higher quality version of this out there? This is only 240p and I suspect even in 2007 the source footage was higher than that. I would like to show this at our weekly developers lunch at my company.

  23. @GoogleTechTalks audio appears desynced, can we get this corrected? otherwise this is one of the BEST talks ever given, and it transcends languages, platforms, tools, etc. THANK YOU FOR THE UPLOAD

  24. At 43:40, an example would have been so much better. The speaker using his hands to explain such an important point didn't really communicate it very well!

  25. Here is a brief summary:

    Getting the requirements for your API:
    -extract "true requirements" from your customers (often, customers give you solutions as requirements, when they should be giving you a problem)
    -extract requirements in the form of use cases (i.e. "as a shopper, I want to be able to add items to my cart")
    -create a small specification (about one page) and create it fast. Then present it to your customers for feedback, modify, and repeat. In other words, create the specification in an "iterative" manner (with constant feedback from your customer)

    Writing the API:
    -make your classes/functions small and cohesive. If it's hard to name, it's doing too much.
    -name, name, name. Naming is everything. Better names make your API easier to learn/use.
    -use the conventions/idioms/patterns that are common in whatever language/platform your API is meant to be used on. This reduces the amount your clients have to learn in order to use your API.
    -document every single class/function. For a class, what does it represent? For a function, what does it do, what are its pre/post conditions and does it have any side effects?
    -keep everything as private as possible (private classes/member functions/fields)
    -design your API with change in mind. You will not get it perfect the first time. Try to keep that in mind when designing your API and you're more likely to design it in such a way that it can evolve/adapt later when needed.
    -"when in doubt, leave it out": if you're not sure about adding something to your API, don't. You can always add later when you are SURE. If you add it now, when you're not sure, you may get it wrong; now your clients are using that wrong API and you can't change it. Remember, you can always add to your API, but you can't remove without breaking client code.
    -keep performance in mind, but don't sacrifice readability for it.
    -keep classes as immutable as possible. Even if you can't make a class completely immutable, make it as immutable as you can.
    -only inherit when an is-a relationship exists.
    -if a class is meant to be inherited from, document it.
    -if a class is not meant to be inherited from, prohibit it (some languages let you enforce this; in others, you can just write something like "Do not inherit" in the documentation) (the reason for this is that inheritance violates encapsulation/information hiding)
    -don't make your client do anything you can do for them (this reduces the amount of boilerplate code your client will have)
    -don't violate principle of least astonishment (clients on your platform are used to certain behavior in certain situations, go along with this!)
    -fail fast (i.e. if an incorrect input is sent to your function, stop and display error immediately (best is at compile time, but sometimes you have to do it at runtime))
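
    A few of the points above (immutability, prohibiting inheritance, failing fast) can be sketched together in Java. The `Money` class, its fields, and its validation rules are hypothetical and exist only for illustration; they are not from the talk.

    ```java
    import java.util.Objects;

    // final: the class is not designed for inheritance, so subclassing is prohibited.
    final class Money {
        // private final fields: the state cannot change after construction.
        private final long cents;
        private final String currency;

        Money(long cents, String currency) {
            // Fail fast: reject bad input at construction instead of misbehaving later.
            if (cents < 0) {
                throw new IllegalArgumentException("cents must be non-negative: " + cents);
            }
            this.currency = Objects.requireNonNull(currency, "currency");
            this.cents = cents;
        }

        long cents() {
            return cents;
        }

        String currency() {
            return currency;
        }

        /** Returns a new Money holding the sum; neither operand is modified. */
        Money plus(Money other) {
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException("currency mismatch");
            }
            return new Money(Math.addExact(cents, other.cents), currency);
        }
    }
    ```

    Because `plus` returns a new object, a caller holding `new Money(150, "USD")` still sees 150 cents after the addition, and an invalid value such as `new Money(-1, "USD")` is rejected immediately rather than propagating.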

    Here is a summary of the above summary 🙂

    -extract true requirements into a small (~1 page) specification iteratively
    -name/document each and every single one of your classes/functions properly (which naturally tends to make them small/cohesive)
    -don't violate principle of least astonishment (use style/convention/idioms/patterns/behavior present in your clients platform)
    -don't violate encapsulation/information hiding (don't do inheritance unless it's truly an is-a relationship, keep classes/functions as private as possible, etc…)
    -keep performance in mind, but don't sacrifice readability for it
    -keep classes as immutable as possible
    -fail fast
    -you can always add, but you can't remove (w/o breaking client code)

    Hope that was useful to someone 🙂

    Excellent presenter/presentation. I learned/reinforced a ton. Was also very fun to watch. Thank you very much for sharing this talk.

  26. Slides can be found at although you might need to add the PDF extension to the downloaded file

  27. Big pig APIs… oh my god, Win32… every damn method has 10 to 20 parameters taking security descriptors and handles to fuck knows what else… sometimes null is fine, other times null is BAD. Ugggh, I hate it so much

  28. This is a mixed bag of good and bad ideas, couldn't keep going after "When in doubt leave it out". It is so not as simple as what he is pitching it to be.

  29. It's really amazing how 12 years later, this talk is still relevant as if it came out yesterday!
