On ES 6 Modules

Note: This was written quite some time ago. The current ES6 Module specification has changed a lot since then, and many of my complaints have been addressed. Treat this essay as a piece of history.

A few things have rubbed me the wrong way about the current Modules and Module Loader specification. I regret that I have not been very clear about what exactly my objections are, and worse still, I have not been very clear about what I think a better direction would be.

Yes, this will be sent to ES-Discuss as a proper discussion proposal. This blog post is phase three of getting my ideas in order. (The first being “get annoyed that current proposals don’t fix the problem”, and the second being “rant about it with friends and colleagues on twitter and over tacos.”)

First of all, I want to put to rest any ideas that I’m a die-hard JavaScript language conservative ideologue who will oppose any change whatsoever. I am well aware that what we have is lacking. I would love to see changes that make it easier to write JavaScript module systems, and debug programs that use them.

I believe that there is a place for new syntax, especially in cases where it allows for optimizations in the implementation, ease of reading, or run-time behavior that must occur before the program is executed (that is, static analysis stuff.) Parsing a plain-old JavaScript AST is certainly possible, but it’s a bit unfortunate.

We’ve spent a few years now doing modules in JavaScript. The claim that “JavaScript needs modules” is thus somewhat misguided: JavaScript needs better modules. Leaving aside Node.js for the moment, modular systems in JavaScript generally:

1. Impose boilerplate restrictions on the programmer. This is ugly and error-prone, and there is no easy way to catch many of these errors early.
2. Are not interoperable with code that uses a different module system (or none at all).
3. Either require that all modules be present in the page at the start, or delay the execution of the program unacceptably. (No one does sync XHR. I’m talking about r.js/AMD and the YUI3 seed file here.)
4. Leak internal implementation details in unfortunate ways, so that users are sometimes surprised when behavior violates intuitions.
5. Do not isolate global leakage, making a missing var a felony, when it shouldn’t even be a misdemeanor. (At best, they wrap in a function.)
6. Obscure line and column numbers to varying degrees. (Sometimes just the first line’s column; sometimes the stack traces are completely meaningless.)

All of these problems are issues with Node.js as well! We paper over #3 by using a package manager and requiring that the modules in your program are available on the disk at the start, but it’s still in my opinion unacceptable. We have the advantage of “startup time” and “run time” separation, but really, <script> tags are a web browser’s “startup time”, and the rest of the time is its “run time”. Build processes allow one to trade run-time delay (and complexity) for up-front download size (and a simpler synchronous require()), but generally only by making the other problems worse.

TC-39 has one chance to specify a Module system that can address each of these issues, or allow host platforms to address them effectively. Problems introduced here will be with us forever. A half-way fix will be prohibitively expensive to fix once it’s in use, so we’ll be stuck with mistakes for some time (as we in Node.js are stuck with the mistakes in our system.)

The Good in the Current Spec

Though I think it has deep problems, there are very good parts in the current spec:

It clearly is attempting to reach a module system that addresses the needs of Node.js (and whatever on-device JavaScript platforms succeed it), as well as those of browser-JavaScript authors and platforms like RequireJS and Browserify.
The issues around globals and scope are pretty solid. I don’t have much to complain about there. Any changes to global behavior come along with a pile of edge cases, but they’re pretty thoroughly evaluated and addressed.
The goals of both the Module and Loader proposals had me pretty much cheering. It seems like TC-39 is actually interested in solving a problem, and that gives me hope.

More than anything, reading the spec makes it clear that the problem is fairly well understood. However, the presented solution seems to be headed in the wrong direction.

Problems with the Current Spec

I’m not going to go through the issues that I have with the current spec one by one. It’s tedious and not the conversation we should be having. I’ll detail my alternative proposal below.

However, there are a few points I’d like to highlight, because they are issues that probably ought to be informed by the experience that I and other module system authors and users can provide.

It seems to be based on the assumption that nesting module systems is a thing that people want. Historically, in Node, we’ve made several API decisions based on the explicitly stated requests to make the module system more extensible and flexible. In practice, none of the supposed innovation panned out, and every one of those decisions was a huge mistake that increased flexibility with no tangible benefit.

People don’t want to write module systems. People want to stop writing module systems. Once there’s a module system in place, it should be The module system, period.

It bears repeating: no one wants to write a module system. A few of us take it on out of regrettable necessity. Anyone who actually enjoys writing module systems is too insane to be trusted. The only rational position is to do the simplest necessary thing, and as quickly as possible get to the business of building real programs with it. Optimize for that.
It puts too many things in JavaScript (as either API or syntax) which belong in the host (browser/Node.js). As I said, people don’t actually want to write module systems in their JavaScript programs. They want to stop having to think about it. Node’s module system has been successful (as have RequireJS and Browserify) precisely because it requires a minimum amount of thought on the part of the user about the module system. (It’s still way too much.)

Adding features that add complexity with the goal of making it easier to have lots of module systems in JavaScript is a mistake. Typically we can enable extension more effectively by reducing the scope of the specification, rather than by increasing it.
It borrows syntax from Python that many Python users do not even recommend using. The import * from mod syntax is dated and highly contentious in the Python community (as is import com.foo.* in Java), because it is a recipe for name collisions. Learning from real implementations is winful; but we should be avoiding their mistakes, not copying them.

Furthermore, let already gives us destructuring assignment. If a module exports a bunch of items, and we want several of them, then do var {x,y,z} = import 'foo' or some such. This import <x> from <module> as <blerg> is 100% unnecessary, adds nothing, and solves no problems. It does not pay its utility bill.
It favors the “object bag of exported members” approach, rather than the “single user-defined export” approach. Node.js uses an exports object because the CommonJS approach seemed like a good idea at the time, and it works around the fact that we have no good way to handle transitive dependencies except via unfinished objects.

However, it is widely acknowledged in the node community that using the module.exports = xyz style generally results in better programs. Changes at the language level can likely address the transitive loading issue more powerfully, and so should encourage the known best practices.
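The last two points can be illustrated in today's JavaScript. This is only a sketch under stated assumptions: the import operator discussed in this post does not exist, so a hypothetical fakeImport function and a hand-built registry object stand in for it, and the module names are made up.

```javascript
// Hypothetical stand-in for a module registry; not part of any spec.
const registry = {
  // "Object bag" style: several named members on one exported object.
  'foo': { x: 1, y: 2, z: 3 },
  // "Single export" style: the export *is* the thing the module provides.
  'greet': function greet(name) { return 'hello, ' + name; }
};

// Hypothetical replacement for the proposed `import 'path'` operator.
function fakeImport(path) {
  if (!Object.prototype.hasOwnProperty.call(registry, path)) {
    throw new Error('module not defined: ' + path);
  }
  return registry[path];
}

// Destructuring picks out the members we want from a single returned value,
// so no dedicated "import x from y as z" form is needed.
const { x, y, z } = fakeImport('foo');

// A single exported function is used directly, with no wrapper object.
const greet = fakeImport('greet');
const message = greet('world');
```

Either way the importer receives exactly one value; whether that value is a bag of members or a single function is up to the module author.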

A Simpler Proposal

Clearly, the problems with the current state of JavaScript modules cannot be solved with zero changes to the language. Some cannot be solved without adding syntax. However, every change carries with it a cost. Therefore, it seems like the ideal approach is to try to find the minimum possible change that will address the issues, and we ought to be ruthless about which bits of functionality don’t make the cut to be worth the risk. If we can get away with a much smaller fix by refusing to address part of the problem which is inessential, then that is the right course of action.

I don’t know if this is minimal enough, but I’d like to propose the following, which I think picks some of the most essential aspects of the Loader and Module proposals. It’s very rough, and there are a lot of unanswered questions. But in general, this is what I would like to see from a Loader specification.

It needs a lot of polish and edge-case exploration. I’m not pitching it to get it accepted, I’m sharing it to hopefully help pull the conversation in another direction, and help make it clear what a better proposal might look like.

(I’ve numbered them simply so that I can refer to bits later, not so much because they’re a list of like things in order. I’m a spec n00b.)

1. A Loader built-in object, with a few methods that must be specified before modules can be used. (And will typically be specified by the host object.)
2. Within a module, the import <pathString> syntax that can be easily detected statically in program text before evaluation, and returns a module’s exported object. var foo = import 'path/to/foo.js'. Import returns a single value, always. The path must be a string literal. The import keyword is an operator, not a function, and thus cannot be assigned etc.
3. Loader.define(<path>, <program text>) defines a module at the specified <path>, with the <program text> contents. That <program text> is statically analyzed for any import statements.
4. Whenever an import <path> is encountered in <program text>, Loader.resolve(requestPath, callerPath, callback) is called. This method should return a fully qualified path. If this method returns boolean true, then it will not be considered resolved until the callback is called. (The argument to the callback is the string path.) If it does not return true, and does not return a string path, then this is an error, and throws.
5. Once a module is resolved to a full path string, then Loader.load(fullPath, callback) is called. callback should not be called until Loader.define(fullPath, contents) is called. This should be called at most once for any given fullPath. (Is the callback even necessary? Why not just wait for Loader.define and throw any errors encountered?)
6. The Loader.main(fullPath) method executes the module referenced by fullPath (which must have already been defined), as well as evaluating each of the modules that it imports.
7. Within a module, the export <expression> statement marks the result of <expression> as the exported value from the module. There can be at most one export statement in a module, and the exported expression is the module’s export. To export more than one thing, export an object with more than one thing on it.

   Modules export a single value. Exporting a second time throws.

   Maybe this is not a valid cause for syntax addition. I’m not sure. There are hairy problems around cyclic dependencies, so it’s worth at least having the option to address them with static magic that has not yet been fully imagined.
8. The global object within a module context is equivalent to Object.create(<global>) from the main global context. (The important thing is that leaks aren’t leaky outside the module, but for example, x instanceof Error still works, because it uses the same Error function.)
9. If a module does not contain an export statement, then its global object is its export. This is to provide support for legacy modules that create a global object (such as jQuery) rather than using an export statement. (Too magical? Probably. Also, having exports inherit from the global is weird. Is there a simpler way to make existing libs play nicely with this approach?)
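The prototype-chained global described above can be approximated with an ordinary object. This is a rough sketch, not how an engine would actually implement module scope: moduleGlobal and leakedCounter are made-up names, and globalThis stands in for the main global context.

```javascript
// A module-local "global" that inherits from the real one.
// (Illustration only; no engine exposes module scope this way.)
const moduleGlobal = Object.create(globalThis);

// A "leaked" variable lands on the module's own object...
moduleGlobal.leakedCounter = 42;

// ...and stays invisible from the real global object.
const leakedOutside = 'leakedCounter' in globalThis;

// Built-ins are inherited, so they are the very same functions.
const sameError = moduleGlobal.Error === globalThis.Error;
const err = new moduleGlobal.Error('boom');
const stillAnError = err instanceof Error;
```

Writes go to the derived object while reads fall through to the shared built-ins, which is why a missing var stays contained but instanceof checks keep working across modules.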

The default values of Loader.load, Loader.define, and Loader.resolve would typically be set by the host environment. However, for reasons of simplicity, they must be set in normal program text (ie, not in a module), and modules should not have the ability to alter them.
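To make the moving parts concrete, here is a minimal synchronous sketch of a host wiring up such a Loader. Nearly everything in it is an assumption made for illustration: the proposed import and export syntax cannot be parsed today, so module text calls hypothetical __import('path') and __export(value) functions instead; createLoader and the files map (standing in for the network or the disk) are invented names; callbacks, cyclic imports, and the "no export means the global is the export" rule are left out.

```javascript
// Minimal, synchronous sketch of the proposed Loader (illustration only).
function createLoader(files) {
  const sources = new Map();   // fullPath -> program text
  const exportsOf = new Map(); // fullPath -> exported value
  const running = new Set();   // modules currently executing

  const Loader = {
    // Host hook: this toy host treats every request as already absolute.
    resolve(request, from) {
      return request;
    },
    // Host hook: fetch the text for fullPath and hand it to define().
    load(fullPath) {
      if (!Object.prototype.hasOwnProperty.call(files, fullPath)) {
        throw new Error('cannot load ' + fullPath);
      }
      Loader.define(fullPath, files[fullPath]);
    },
    // Register a module and statically scan its text for imports,
    // loading each dependency before any module code runs.
    define(path, text) {
      if (sources.has(path)) return;
      sources.set(path, text);
      const pattern = /__import\('([^']+)'\)/g;
      let match;
      while ((match = pattern.exec(text)) !== null) {
        const dep = Loader.resolve(match[1], path);
        if (!sources.has(dep)) Loader.load(dep);
      }
    },
    // Execute an already-defined module, evaluating its imports on demand.
    main(fullPath) {
      if (!sources.has(fullPath)) throw new Error('not defined: ' + fullPath);
      if (exportsOf.has(fullPath)) return exportsOf.get(fullPath);
      if (running.has(fullPath)) throw new Error('cyclic import: ' + fullPath);
      running.add(fullPath);
      let exported;
      let hasExport = false;
      const doImport = (request) => Loader.main(Loader.resolve(request, fullPath));
      const doExport = (value) => {
        if (hasExport) throw new Error('second export in ' + fullPath);
        hasExport = true;
        exported = value;
      };
      try {
        new Function('__import', '__export', sources.get(fullPath))(doImport, doExport);
      } finally {
        running.delete(fullPath);
      }
      exportsOf.set(fullPath, exported);
      return exported;
    }
  };
  return Loader;
}

// Usage: two made-up modules, one importing the other.
const loader = createLoader({
  '/app/math.js': "__export({ double: function (n) { return n * 2; } });",
  '/app/main.js': "var math = __import('/app/math.js'); __export(math.double(21));"
});
loader.load('/app/main.js');
const result = loader.main('/app/main.js');
```

The point of the sketch is the ordering: define() discovers dependencies from the text alone, so everything can be fetched before main() runs a single line of module code.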

In web browsers, modules could be defined straight away by using a new attribute on the script tag: <script module src='http://src.com/foo.js'></script> would be equivalent to doing Loader.define('http://src.com/foo.js', '<contents of foo.js>').

Bundler programs could trivially translate files into modules using Loader.define (rather than wrapping in an IIFE), or JavaScript files could be loaded as-is, without requiring that existing libraries begin using any module { ... } syntax.

In Web Browsers

Web Browsers could implement the Loader object thusly:

Loader.resolve(request, from) Uses standard URL-resolution rules.
Loader.define could be sweetened by a <script module> tag.
Loader.main could be set via a <script module main> tag.
Loader.load could fetch the URL, and evaluate the contents, as if it had been added to the document with a <script module src=...> tag.
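The "standard URL-resolution rules" part is already available to scripts through the URL constructor, so a browser-flavored resolve step might look roughly like this. resolveUrl is a made-up name and the URLs are placeholders; only new URL(relative, base) is a real API here.

```javascript
// Sketch of a browser-style Loader.resolve built on the standard URL API.
function resolveUrl(request, from) {
  // `from` is the URL of the importing module; `request` may be
  // relative to it or already absolute.
  return new URL(request, from).href;
}

const sibling = resolveUrl('./bar.js', 'http://src.com/lib/foo.js');
const parent = resolveUrl('../util.js', 'http://src.com/lib/foo.js');
const absolute = resolveUrl('http://cdn.example/x.js', 'http://src.com/lib/foo.js');
```

Here sibling resolves next to foo.js, parent resolves one directory up, and an absolute request ignores the importer entirely.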

For additional extensibility, these methods could be overridden by, for example, browserify or RequireJS.

For security, the Loader object could be frozen with Object.freeze to prevent additional changes.
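A small sketch of what freezing buys, using a plain object as a stand-in for the Loader (the hostLoader object, its resolve hook, and the evil.example URL are all hypothetical):

```javascript
// Stand-in Loader object with one host-provided hook (hypothetical).
const hostLoader = {
  resolve: function (request, from) { return request; }
};
const originalResolve = hostLoader.resolve;

// Lock the object down once the host has finished configuring it.
Object.freeze(hostLoader);

// Later code tries to replace the hook.
function tryOverride(loader) {
  'use strict'; // in strict mode, writing to a frozen object throws
  try {
    loader.resolve = function () { return 'http://evil.example/x.js'; };
    return null;
  } catch (err) {
    return err;
  }
}

const overrideError = tryOverride(hostLoader);
const unchanged = hostLoader.resolve === originalResolve;
```

In sloppy mode the same assignment fails silently instead of throwing, but either way the host's hook stays in place.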

I’m in no way attached to the specifics of the tag spelling. My point is that we in the JS community should specify the loader semantics, and then let host objects take advantage of them in application-specific ways.

In Require.JS

RequireJS and other AMD platforms would be made mostly obsolete by this specification, since the principle of AMD would just be “how it works” in web browsers by default, but without the unfortunate boilerplate. The resource loading mechanism could also kick off much sooner, since import statements can be detected long before the script is actually run.

In Browserify

Most browserify modules would Just Work if they replaced require(..) with import ... However, it would probably be necessary to extend the Loader methods to provide shims for Node.js built-ins (ie, path, fs, url, assert, net, http, etc.) as well as pre-define node_modules dependencies into the browserify bundle.

However, Browserify’s static analysis build step could be made much more effective by using a designated import operator rather than relying on knowledge of a require function.

In Node.js

Loader.resolve(request, from) would do the current node_modules and NODE_PATH dance.
Loader.define would replace the existing module wrapper stuff.
Loader.main would be called on the module specified as a command line argument.
Loader.load would be very straightforward FS operations.
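For readers who have not looked closely at the node_modules dance, its core is a walk up the directory tree. The following is a simplified sketch for POSIX-style paths only: nodeModulesDirs and candidatePaths are invented names, and NODE_PATH, file extensions, and package.json lookups are deliberately left out.

```javascript
// Candidate node_modules directories for a caller, nearest first.
function nodeModulesDirs(callerPath) {
  const parts = callerPath.split('/').filter(Boolean); // drop empty segments
  parts.pop(); // drop the file name, leaving the caller's directory
  const dirs = [];
  for (let i = parts.length; i >= 0; i--) {
    // Do not look for node_modules inside a directory named node_modules.
    if (parts[i - 1] === 'node_modules') continue;
    dirs.push('/' + parts.slice(0, i).concat('node_modules').join('/'));
  }
  return dirs;
}

// Places where a bare request such as 'foo' would be looked up.
function candidatePaths(request, callerPath) {
  return nodeModulesDirs(callerPath).map(function (dir) {
    return dir + '/' + request;
  });
}

const dirs = nodeModulesDirs('/home/me/app/lib/index.js');
const candidates = candidatePaths('foo', '/home/me/app/lib/index.js');
```

A Loader.resolve for Node would try each candidate in order and return the first one that exists on disk.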

This would also set the stage for making node-core itself more modular, and we could even explore new approaches like detecting module dependencies from code, rather than requiring the use of a package.json file, which is very exciting.

What’s Missing from this Proposal

There is no module syntax in this “module” proposal. That is because it is unnecessary, and its omission is intentional.

A lot of work has also been done on the Harmony Module Loader proposal to flesh out some details of the Loader object. Most of this is good stuff. However, by removing the Module syntax portion of the proposal, a lot of those things can be streamlined.

It’s also worth mentioning that this approach makes sourcemaps unnecessary for useful stack traces, even in bundled or concatenated code, as the Loader.define() call would function as a sourcemap.

While the experience in the wild has shown us that the “export one thing” approach is definitely sound, I’m not sure exactly how to handle the transitive dependency issue in a way that doesn’t involve unfinished objects, or cause breakage in cases like this:

// x.js
var y = import './y.js'

// y.js
setTimeout(function() {
  export { fooled: 'you' }
}, 100)

Even more insidious is something like this:

// x.js
var y = import './y.js'
export { real: 'x' }

// y.js
var x = import './x.js'
assert.deepEqual(x, {real: 'x'}) // nope!!

Because x sets its export after being loaded by y, the assignment does nothing. Currently, in Node.js (and most other systems as well) this is not handled, or not handled very well at least.
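The same failure can be reproduced today with a toy CommonJS-style loader. This is a sketch for illustration, not Node's actual implementation: toyRequire, definitions, and cache are made-up names, and module bodies are plain functions rather than files.

```javascript
// Toy CommonJS-style loader that hands out unfinished export objects.
const definitions = {
  './x.js': function (req, mod) {
    const y = req('./y.js');
    mod.exports = { real: 'x' }; // replaces the object y already holds
  },
  './y.js': function (req, mod) {
    const x = req('./x.js');     // x.js is still running: unfinished export
    mod.exports = { sawX: x };
  }
};

const cache = {};
function toyRequire(path) {
  if (cache[path]) return cache[path].exports; // possibly unfinished
  const mod = cache[path] = { exports: {} };
  definitions[path](toyRequire, mod);
  return mod.exports;
}

const xExport = toyRequire('./x.js');
const yExport = toyRequire('./y.js');
const whatYSaw = yExport.sawX;
```

After both modules have run, xExport is { real: 'x' }, but whatYSaw is still the empty placeholder object that existed while x.js was halfway through executing.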

Is there some way that it could somehow pass an object to the y module that would get swapped out behind the scenes when x.js sets its export? Is that too magical? I’m not sure.
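One way to picture the "swapped out behind the scenes" idea is a forwarding placeholder built on Proxy. This is purely a thought experiment with invented names (createPlaceholder, finish); it only forwards property reads and in checks, so it would not cover exported functions or primitives, which is part of why the question is hard.

```javascript
// Hypothetical: importers receive a placeholder that forwards to whatever
// the module's export currently is, and can be retargeted later.
function createPlaceholder() {
  let target = {}; // the unfinished export, until a real one is set
  const placeholder = new Proxy({}, {
    get(unused, key) { return target[key]; },
    has(unused, key) { return key in target; }
  });
  return {
    placeholder: placeholder,
    finish: function (realExport) { target = realExport; }
  };
}

// y.js grabs x's export while x.js is still running...
const x = createPlaceholder();
const seenByY = x.placeholder;
const before = seenByY.real;

// ...and x.js sets its real export afterwards.
x.finish({ real: 'x' });
const after = seenByY.real;
```

The importer's reference never changes, yet it observes the finished export once it exists, at the price of every module export being wrapped in something that is not quite the object the author wrote.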

Next…

My hope is that this post will help spark a more interesting conversation than the current tendency towards “YAY/BOO” that is so common on the internet. This isn’t politics. We’re not voting for parties. The goal is to figure out the best API, which is a complex thing. The solution space is wide, and it is naive to reduce it to a boolean prematurely.

I would like to try out some implementations of this in Node.js as soon as possible. Also, I’d love to hear feedback about which parts of this you think are unnecessary or impossible.

Let’s not forget that we all want these problems solved. No reasonable person thinks that JavaScript programs are optimally modular today. Most people who enjoy Node’s module system only like it because they’ve never taken a close look at it. As one of its maintainers and chief architects, I feel qualified to say that it’s pretty terrible. (Though, in my opinion, it is the best I’ve used, and the only one that is optimized for maximum utility and an absolute intolerance for boilerplate. It’s just that the language is lacking, but that’s what this is all about.)

Not every change is an improvement, but every improvement is a change. My friends in the “no new syntax” crowd would do well to remember that.

That being said, since JavaScript cannot be easily changed, and can only be changed in one direction, we must be very careful to make sure that every change is an improvement. It’s more important to proceed carefully than to proceed quickly.

Future generations will thank us for our care, and curse us for our haste.

2012-06-26
