# Module loading

#### Modules

> If V8 is the engine of Node.js, npm is its soul!

npm is the world's largest module repository. Let's take a look at some data:

* Approximately 210,000 modules
* Billions of module downloads per day
* Billions of module downloads per week

This led to the creation of a company, npm, Inc., which manages npm packages and operates `npmjs.com`.

#### Module Loading Preparation Operations

Strictly speaking, there are several types of modules in Node:

* Builtin modules: Modules implemented in C++ inside Node, such as tcp\_wrap and contextify.
* Constants modules: Modules that define constants inside Node and export definitions for signals, the openssl library, file access permissions, and so on. Examples include O\_RDONLY and O\_CREAT for file access, or SIGHUP and SIGINT for signals.
* Native modules: Modules implemented in JavaScript inside Node, such as http, https and fs. Some native modules rely on builtin modules behind the scenes. For example, the buffer native module still needs node\_buffer.cc from the builtin layer to allocate and manage large blocks of memory outside V8's memory size limits.
* Third-party modules: All modules that are not built into Node, such as express.

#### Builtin Module and Native Module Generation Process

![](/files/rfj3Y6r9EBKYMi6mYiEm)

The generation process for native JS modules is relatively complex. Downloading the Node source code and compiling it generates a file named `node_natives.h` in the `out/Release/obj/gen` directory.

This file is generated by js2c.py, which converts every character of all JavaScript files under the lib directory, along with node.js under the src directory, into its ASCII code and stores the codes in one array per file.

```c++
namespace node {

const char node_native[] = {47, 47, 32, 67, 112 …};

const char console_native[] = {47, 47, 32, 67, 112 …};

const char buffer_native[] = {47, 47, 32, 67, 112 …};

…

}

// One entry per embedded JavaScript file: its name, its character
// array, and the array length.
struct _native {
  const char* name;
  const char* source;
  size_t source_len;
};

static const struct _native natives[] = {
  { "node", node_native, sizeof(node_native) - 1 },
  { "dgram", dgram_native, sizeof(dgram_native) - 1 },
  { "console", console_native, sizeof(console_native) - 1 },
  { "buffer", buffer_native, sizeof(buffer_native) - 1 },
  …
};
```
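As a rough illustration of the per-character conversion described above, the following JavaScript sketch turns a source string into an array of character codes and back. The helper names `toCharCodes` and `fromCharCodes` are hypothetical and are not part of Node; the real tool is a Python script that emits C++ arrays.

```js
// Hypothetical helpers that mimic, in JavaScript, the conversion of
// source text into character codes. They are not part of Node or js2c.py.
function toCharCodes(source) {
  return Array.from(source, (ch) => ch.charCodeAt(0));
}

function fromCharCodes(codes) {
  return String.fromCharCode(...codes);
}

// The first five characters of a file that starts with "// Cp".
const codes = toCharCodes('// Cp');
console.log(codes);                // [ 47, 47, 32, 67, 112 ]
console.log(fromCharCodes(codes)); // prints the original five characters
```

The codes match the first five values shown in the arrays above, which suggests that the embedded files begin with a `//` comment.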

The generation process for builtin C++ modules is relatively simple. The entry point of each builtin C++ module is expanded into a function by the macro NODE\_MODULE\_CONTEXT\_AWARE\_BUILTIN. For example, the tcp\_wrap module is expanded to `static void _register_tcp_wrap(void) __attribute__((constructor))`. Those familiar with GCC know that functions marked with `__attribute__((constructor))` execute before main(), which means that the builtin C++ modules are added to the modlist\_builtin linked list before Node's main() function runs. modlist\_builtin is a pointer of type struct node\_module, and get\_builtin\_module() traverses it to find the required module.
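The registration and lookup described above can be modeled with a singly linked list. This is a simplified JavaScript sketch rather than Node's C++ code: the function names `nodeModuleRegister` and `getBuiltinModule` are illustrative, and inserting at the head of the list is an assumption made for the example.

```js
// Simplified model of the builtin module list. Field names mirror the
// C++ struct for readability; this is illustrative, not Node code.
let modlistBuiltin = null; // head of the singly linked list

function nodeModuleRegister(mod) {
  // Assumption: new modules are pushed onto the front of the list.
  mod.nm_link = modlistBuiltin;
  modlistBuiltin = mod;
}

function getBuiltinModule(name) {
  // Walk the list until a module with a matching name is found.
  let mod = modlistBuiltin;
  while (mod !== null && mod.nm_modname !== name) {
    mod = mod.nm_link;
  }
  return mod; // null when no module has that name
}

// These calls stand in for the constructor functions that run before main().
nodeModuleRegister({ nm_modname: 'tcp_wrap', nm_link: null });
nodeModuleRegister({ nm_modname: 'contextify', nm_link: null });

console.log(getBuiltinModule('tcp_wrap').nm_modname); // tcp_wrap
console.log(getBuiltinModule('missing'));             // null
```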

All modules provided by Node, whether native JS or builtin C++, are ultimately embedded at compile time in the ELF executable named `node`.

However, they are extracted in different ways. JS modules are retrieved with process.binding("natives"), while C++ modules are retrieved directly with get\_builtin\_module(). This part will be discussed in section 1.2.

#### Module Binding

node.cc provides a Binding() function. When our application or one of Node's built-in modules calls require() to reference another module, Binding() is what does the work behind the scenes. We will discuss later how this function supports require(); here, we analyze the function itself.

```c++
static void Binding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  Local<String> module = args[0]->ToString(env->isolate());
  node::Utf8Value module_v(env->isolate(), module);

  Local<Object> cache = env->binding_cache_object();
  Local<Object> exports;

  if (cache->Has(module)) {
    exports = cache->Get(module)->ToObject(env->isolate());
    args.GetReturnValue().Set(exports);
    return;
  }

  // Append a string to process.moduleLoadList
  char buf[1024];
  snprintf(buf, sizeof(buf), "Binding %s", *module_v);

  Local<Array> modules = env->module_load_list_array();
  uint32_t l = modules->Length();
  modules->Set(l, OneByteString(env->isolate(), buf));

  node_module* mod = get_builtin_module(*module_v);
  if (mod != nullptr) {
    exports = Object::New(env->isolate());
    // Internal bindings don't have a "module" object, only exports.
    CHECK_EQ(mod->nm_register_func, nullptr);
    CHECK_NE(mod->nm_context_register_func, nullptr);
    Local<Value> unused = Undefined(env->isolate());
    // **for builtin module**
    mod->nm_context_register_func(exports, unused,
      env->context(), mod->nm_priv);
    cache->Set(module, exports);
  } else if (!strcmp(*module_v, "constants")) {
    exports = Object::New(env->isolate());
    // for constants
    DefineConstants(exports);
    cache->Set(module, exports);
  } else if (!strcmp(*module_v, "natives")) {
    exports = Object::New(env->isolate());
    // for native module
    DefineJavaScript(env, exports);
    cache->Set(module, exports);
  } else {
    char errmsg[1024];
    snprintf(errmsg,
             sizeof(errmsg),
             "No such module: %s",
             *module_v);
    return env->ThrowError(errmsg);
  }

  args.GetReturnValue().Set(exports);
}
```

Binding() resolves a module in the following order:

1. Builtin modules have the highest priority. Any module to be bound is first looked up in the modlist\_builtin list by traversing the list until a module with the same name is found. Once it is found, the module's registration function is executed and fills in an important piece of data, exports. For builtin modules, the exports object contains the interface names exposed by the builtin C++ module and their corresponding code. For the tcp\_wrap module, for example, the contents of exports can be pictured as: {"TCP": "/function code of TCPWrap entrance/", "TCPConnectWrap": "/function code of TCPConnectWrap entrance/" }.
2. The constants module has the second highest priority. Node's constants are exported through it, in the following format: {"SIGHUP":1, "SIGKILL":9, "SSL\_OP\_ALL": 0x80000BFFL}
3. Native modules come last. Except for the node\_native array in Figure 3, all of the arrays are exported to exports in the following format: {"\_debugger": \_debugger\_native, "module": module\_native, "config": config\_native} Here \_debugger\_native, module\_native, and so on are array names, that is, memory addresses.

Comparing the exports structures produced by these three types of modules shows that the attribute values mean completely different things in each case. For builtin modules, the value of the TCP attribute is a function entry point. For the constants module, the value of SIGHUP is a number. For native modules, the value of \_debugger is a memory address (more precisely, an address in the .rodata segment).
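The lookup order can be summarized with a small JavaScript model. This is a sketch, not the real implementation: the `builtins`, `constants` and `natives` tables hold made-up sample data, and only the order of the checks and the caching follow the C++ function shown earlier.

```js
// Illustrative model of the lookup order in Binding(). The tables below
// contain sample data invented for this example.
const bindingCache = new Map();
const builtins = {
  tcp_wrap: (exports) => { exports.TCP = function TCP() {}; },
};
const constants = { SIGHUP: 1, SIGKILL: 9 };
const natives = { module: '/* source text of lib/module.js */' };

function binding(name) {
  if (bindingCache.has(name)) return bindingCache.get(name);

  const exports = {};
  if (Object.prototype.hasOwnProperty.call(builtins, name)) {
    builtins[name](exports);            // 1. builtin: run its register function
  } else if (name === 'constants') {
    Object.assign(exports, constants);  // 2. constants: name -> number
  } else if (name === 'natives') {
    Object.assign(exports, natives);    // 3. natives: name -> embedded source
  } else {
    throw new Error('No such module: ' + name);
  }
  bindingCache.set(name, exports);
  return exports;
}

console.log(typeof binding('tcp_wrap').TCP);   // function
console.log(binding('constants').SIGHUP);      // 1
console.log(typeof binding('natives').module); // string
```

Calling `binding()` twice with the same name returns the same object, which mirrors the role of `binding_cache_object()` in the C++ code.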

#### Module Loading

Let's start with `var http = require('http');`.

How does `require` work, why can we use it out of nowhere, and what does it actually do?

The following code is from [lib/module.js](https://github.com/nodejs/node/blob/v4.4.0/lib/module.js):

```js
// Loads a module at the given file path. Returns that module's
// `exports` property.
Module.prototype.require = function(path) {
  assert(path, 'missing path');
  assert(typeof path === 'string', 'path must be a string');
  return Module._load(path, this);
};
```

First, the assert module is used to check that the `path` variable is present and is a string.

```js
// Check the cache for the requested file.
// 1. If a module already exists in the cache: return its exports object.
// 2. If the module is native: call `NativeModule.require()` with the
//    filename and return the result.
// 3. Otherwise, create a new module for the file and save it to the cache.
//    Then have it load  the file contents before returning its exports
//    object.
Module._load = function(request, parent, isMain) {
  if (parent) {
    debug('Module._load REQUEST %s parent: %s', request, parent.id);
  }

  var filename = Module._resolveFilename(request, parent);

  var cachedModule = Module._cache[filename];
  if (cachedModule) {
    return cachedModule.exports;
  }

  if (NativeModule.nonInternalExists(filename)) {
    debug('load native module %s', request);
    return NativeModule.require(filename);
  }

  var module = new Module(filename, parent);

  if (isMain) {
    process.mainModule = module;
    module.id = '.';
  }

  Module._cache[filename] = module;

  var hadException = true;

  try {
    module.load(filename);
    hadException = false;
  } finally {
    if (hadException) {
      delete Module._cache[filename];
    }
  }

  return module.exports;
};
```

`Module._load` first checks the cache for the requested file, then proceeds as follows:

1. If a module already exists in the cache: return its exports object.
2. If the module is native: call `NativeModule.require()` with the filename and return the result.
3. Otherwise, create a new module for the file and save it to the cache. Then have it load the file contents before returning its exports object.
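The caching behavior in the steps above can be observed from user code. The following sketch writes a throwaway module to a temporary directory (the directory prefix and file name are arbitrary choices for the example) and requires it twice.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'load-demo-'));
const file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'module.exports = { n: 0 };');

const first = require(file);
first.n += 1;
const second = require(file); // served from the cache, the file is not re-run

console.log(first === second);                       // true
console.log(second.n);                               // 1
console.log(require.resolve(file) in require.cache); // true
```

Because the second call returns the cached exports object, the mutation made through `first` is visible through `second`.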

Let's dig one level deeper and look at `NativeModule.require`.

```js
  NativeModule.require = function(id) {
    if (id == 'native_module') {
      return NativeModule;
    }

    var cached = NativeModule.getCached(id);
    if (cached) {
      return cached.exports;
    }

    if (!NativeModule.exists(id)) {
      throw new Error('No such native module ' + id);
    }

    process.moduleLoadList.push('NativeModule ' + id);

    var nativeModule = new NativeModule(id);

    nativeModule.cache();
    nativeModule.compile();

    return nativeModule.exports;
  };
```

As we can see, caching is a strategy that runs throughout the implementation of Node.

* If the module is already in the cache, its exports object is returned directly.
* If not, it is added to the `moduleLoadList` array, and a new NativeModule object is created.

The following line is the most crucial:

```js
nativeModule.compile();
```

The implementation details are in `src/node.js`:

```js
NativeModule.getSource = function(id) {
  return NativeModule._source[id];
};

NativeModule.wrap = function(script) {
  return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
};

NativeModule.wrapper = [
  '(function (exports, require, module, __filename, __dirname) {',
  '\n});'
];

NativeModule.prototype.compile = function() {
  var source = NativeModule.getSource(this.id);
  source = NativeModule.wrap(source);

  var fn = runInThisContext(source, {
    filename: this.filename,
    lineOffset: 0
  });
  fn(this.exports, NativeModule.require, this, this.filename);

  this.loaded = true;
};
```

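The `wrap` function surrounds the source of a module such as http.js with a function expression. `runInThisContext` then compiles the wrapped source and returns the resulting function `fn`, which is called with `this.exports`, `NativeModule.require`, the module object itself, and `this.filename`, in that order.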
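The wrap-and-compile step can be reproduced with the public `vm` module. In this sketch the module source and the file name `demo.js` are stand-ins invented for the example; a real native module's source would come from `NativeModule._source`.

```js
const vm = require('vm');

// Same two-part wrapper shape as NativeModule.wrapper.
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});'
];

// Stand-in module source for the example.
const source = 'exports.greet = function(name) { return "hello " + name; };';

// Compiling the wrapped text evaluates the function expression and
// returns the resulting function object.
const fn = vm.runInThisContext(wrapper[0] + source + wrapper[1], {
  filename: 'demo.js'
});

// Call the function the same way compile() does: exports, require,
// the module object, and the file name.
const mod = { exports: {} };
fn(mod.exports, require, mod, 'demo.js');

console.log(typeof fn);                 // function
console.log(mod.exports.greet('node')); // hello node
```

The wrapper is what makes `exports`, `require` and `module` look like globals inside a module: they are simply parameters of the enclosing function.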

#### process

Let's take a look at how the `process` variable is passed from the underlying C++ to JavaScript. When Node.js starts, the program sets up the `process` object: `Handle<Object> process = SetupProcessObject(argc, argv);` It then passes `process` as an argument to the function returned by the main JavaScript program in `src/node.js`, which is how `process` becomes available in JavaScript.

```c++
// node.cc

// Get the converted src/node.js source code through MainSource() and execute it
Local<Value> f_value = ExecuteString(MainSource(), IMMUTABLE_STRING("node.js"));

// The result of executing src/node.js is a function, as can be seen from
// the node.js source code:
//
// node.js
// (function(process) {
//     global = this;
//     …
// })
Local<Function> f = Local<Function>::Cast(f_value);

// Create a function execution environment, call the function, and pass in process
Local<Object> global = v8::Context::GetCurrent()->Global();

Local<Value> args[1] = {
  Local<Value>::New(process)
};

f->Call(global, 1, args);
```
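The same hand-off can be imitated in JavaScript with the `vm` module. This is only an analogy to the C++ code above: `mainSource` is a one-line stand-in for the real contents of `src/node.js`.

```js
const vm = require('vm');

// Stand-in for the text of src/node.js: a script whose value is a function.
const mainSource = '(function(process) { return process.argv.length; })';

// Evaluating the script yields the function itself ...
const f = vm.runInThisContext(mainSource, { filename: 'node.js' });

// ... which the host then calls with the global object as `this` and the
// process object as its only argument.
const argCount = f.call(globalThis, process);

console.log(typeof f);                         // function
console.log(argCount === process.argv.length); // true
```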

#### vm

What is `runInThisContext`?

`runInThisContext` is a function provided by the `contextify` builtin module. It compiles a string of JavaScript code and runs it in the current context, returning the value of the last statement. It is similar to `eval`, except that the code runs in the global scope and cannot see the local variables of the caller. It is not a security mechanism for running untrusted code.

```js
  var ContextifyScript = process.binding('contextify').ContextifyScript;
  function runInThisContext(code, options) {
    var script = new ContextifyScript(code, options);
    return script.runInThisContext();
  }
```
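The difference from `eval` is easiest to see with a local variable. The sketch below uses the public `vm.runInThisContext`, and the variable name `secret` is arbitrary.

```js
const vm = require('vm');

function compare() {
  const secret = 'local';
  // Direct eval sees the local variable of the enclosing function.
  const viaEval = eval('typeof secret');
  // runInThisContext runs in the global scope, so the local is not visible.
  const viaVm = vm.runInThisContext('typeof secret');
  return { viaEval, viaVm };
}

console.log(compare()); // { viaEval: 'string', viaVm: 'undefined' }
```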

* In the Binding function of node.cc, the module is registered using the following call: `mod->nm_context_register_func(exports, unused, env->context(), mod->nm_priv);`

Let's take a look at the definition of the `mod` data structure in `node.h`:

```c++
struct node_module {
  int nm_version;
  unsigned int nm_flags;
  void* nm_dso_handle;
  const char* nm_filename;
  node::addon_register_func nm_register_func;
  node::addon_context_register_func nm_context_register_func;
  const char* nm_modname;
  void* nm_priv;
  struct node_module* nm_link;
};
```

node.h also contains the following macro definitions:

```c++
#define NODE_MODULE_CONTEXT_AWARE_X(modname, regfunc, priv, flags)    \
  extern "C" {                                                        \
    static node::node_module _module =                                \
    {                                                                 \
      NODE_MODULE_VERSION,                                            \
      flags,                                                          \
      NULL,                                                           \
      __FILE__,                                                       \
      NULL,                                                           \
      (node::addon_context_register_func) (regfunc),                  \
      NODE_STRINGIFY(modname),                                        \
      priv,                                                           \
      NULL                                                            \
    };                                                                \
    NODE_C_CTOR(_register_ ## modname) {                              \
      node_module_register(&_module);                                 \
    }                                                                 \
  }
  
#define NODE_MODULE_CONTEXT_AWARE_BUILTIN(modname, regfunc)           \
  NODE_MODULE_CONTEXT_AWARE_X(modname, regfunc, NULL, NM_F_BUILTIN)   \
```

* node\_contextify.cc contains the macro call below, which ties everything together. Combined with the previous points, it binds the nm\_context\_register\_func field of node\_module to node::InitContextify.

```c++
NODE_MODULE_CONTEXT_AWARE_BUILTIN(contextify, node::InitContextify);
```

Tracing the calls from the top: `node_module_register(&_module);` registers the module, `process.binding('contextify')` finds it and calls `mod->nm_context_register_func(exports, unused, env->context(), mod->nm_priv);`, and that call is `node::InitContextify()`.

Inside it, `env->SetProtoMethod(script_tmpl, "runInThisContext", RunInThisContext);` binds the JavaScript method `runInThisContext` to the C++ function `RunInThisContext`.


At this point the native module has been loaded, and `this.loaded` is set to `true`.

**Summary**

Node.js solves the problem of infinite circular references through caching, which is an important means of system optimization. By trading space for time, loading modules becomes very efficient.

In real-world development, looking at the heap after startup shows that Node caches a large number of modules, including third-party modules, some of which may be loaded and used only once. I think a module unloading mechanism \[1] is needed to reduce V8 heap usage and make subsequent garbage collection more efficient.

**Reference**

\[1] <https://github.com/nodejs/node/issues/5895>

