Production best practices: performance and reliability

Overview

This article discusses performance and reliability best practices for Express applications deployed to production.

This topic clearly falls into the “devops” world, spanning both traditional development and operations. Accordingly, the information is divided into two parts:

Things to do in your code

Here are some things you can do in your code to improve your application’s performance:

Use gzip compression

Gzip compressing can greatly decrease the size of the response body and hence increase the speed of a web app. Use the compression middleware for gzip compression in your Express app. For example:

var compression = require('compression')
var express = require('express')
var app = express()
app.use(compression())

For a high-traffic website in production, the best way to put compression in place is to implement it at a reverse proxy level (see Use a reverse proxy). In that case, you do not need to use compression middleware. For details on enabling gzip compression in Nginx, see Module ngx_http_gzip_module in the Nginx documentation.
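As an illustrative sketch of what that Nginx configuration might look like (the directive names come from ngx_http_gzip_module; the types and sizes are assumptions to tune for your content):

```nginx
# Illustrative http-level gzip settings.
gzip on;
gzip_types text/plain text/css application/json application/javascript;
gzip_min_length 1024;   # skip tiny responses where gzip overhead outweighs savings
gzip_proxied any;       # also compress responses to proxied requests
```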

Don’t use synchronous functions

Synchronous functions and methods tie up the executing process until they return. A single call to a synchronous function might return in a few microseconds or milliseconds, however in high-traffic websites, these calls add up and reduce the performance of the app. Avoid their use in production.

Although Node and many modules provide synchronous and asynchronous versions of their functions, always use the asynchronous version in production. The only time when a synchronous function can be justified is upon initial startup.

If you are using Node.js 4.0+ or io.js 2.1.0+, you can use the --trace-sync-io command-line flag to print a warning and a stack trace whenever your application uses a synchronous API. Of course, you wouldn’t want to use this in production, but rather to ensure that your code is ready for production. See the node command-line options documentation for more information.

Do logging correctly

In general, there are two reasons for logging from your app: for debugging and for logging app activity (essentially, everything else). Using console.log() or console.error() to print log messages to the terminal is common practice in development. But these functions are synchronous when the destination is a terminal or a file, so they are not suitable for production, unless you pipe the output to another program.

For debugging

If you’re logging for purposes of debugging, then instead of using console.log(), use a special debugging module like debug. This module enables you to use the DEBUG environment variable to control what debug messages are sent to console.error(), if any. To keep your app purely asynchronous, you’d still want to pipe console.error() to another program. But then, you’re not really going to debug in production, are you?
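The core idea behind debug is simple: a logger is created per namespace, and it emits to console.error() only when the DEBUG environment variable enables that namespace. A minimal core-only sketch of the pattern (the real module adds wildcards, colors, and timing):

```javascript
// Minimal namespace-gated logger, modeled on the debug module's behavior.
function createDebug (namespace) {
  // Enabled only if DEBUG lists this namespace, e.g. DEBUG=myapp:server
  const enabled = (process.env.DEBUG || '').split(',').includes(namespace)
  return (...args) => {
    if (enabled) console.error(namespace, ...args)
  }
}

const debug = createDebug('myapp:server')
debug('listening on port %d', 3000) // printed only when DEBUG includes myapp:server
```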

For app activity

If you’re logging app activity (for example, tracking traffic or API calls), instead of using console.log(), use a logging library like Winston or Bunyan. For a detailed comparison of these two libraries, see the StrongLoop blog post Comparing Winston and Bunyan Node.js Logging.

Handle exceptions properly

Node apps crash when they encounter an uncaught exception. Not handling exceptions and taking appropriate actions will make your Express app crash and go offline. If you follow the advice in Ensure your app automatically restarts below, then your app will recover from a crash. Fortunately, Express apps typically have a short startup time. Nevertheless, you want to avoid crashing in the first place, and to do that, you need to handle exceptions properly.

To ensure you handle all exceptions, use the following techniques:

Before diving into these topics, you should have a basic understanding of Node/Express error handling: using error-first callbacks, and propagating errors in middleware. Node uses an “error-first callback” convention for returning errors from asynchronous functions, where the first parameter to the callback function is the error object, followed by result data in succeeding parameters. To indicate no error, pass null as the first parameter. The callback function must correspondingly follow the error-first callback convention to meaningfully handle the error. And in Express, the best practice is to use the next() function to propagate errors through the middleware chain.

For more on the fundamentals of error handling, see:

What not to do

One thing you should not do is to listen for the uncaughtException event, emitted when an exception bubbles all the way back to the event loop. Adding an event listener for uncaughtException will change the default behavior of the process that is encountering an exception; the process will continue to run despite the exception. This might sound like a good way of preventing your app from crashing, but continuing to run the app after an uncaught exception is a dangerous practice and is not recommended, because the state of the process becomes unreliable and unpredictable.

Additionally, using uncaughtException is officially recognized as crude. So listening for uncaughtException is just a bad idea. This is why we recommend things like multiple processes and supervisors: crashing and restarting is often the most reliable way to recover from an error.

We also don’t recommend using domains. It generally doesn’t solve the problem and is a deprecated module.

Use try-catch

Try-catch is a JavaScript language construct that you can use to catch exceptions in synchronous code. Use try-catch, for example, to handle JSON parsing errors as shown below.

Use a tool such as JSHint or JSLint to help you find implicit exceptions like reference errors on undefined variables.

Here is an example of using try-catch to handle a potential process-crashing exception. This middleware function accepts a query field parameter named “params” that is a JSON object.

app.get('/search', (req, res) => {
  // Simulating async operation
  setImmediate(() => {
    var jsonStr = req.query.params
    try {
      var jsonObj = JSON.parse(jsonStr)
      res.send('Success')
    } catch (e) {
      res.status(400).send('Invalid JSON string')
    }
  })
})

However, try-catch works only for synchronous code. Because the Node platform is primarily asynchronous (particularly in a production environment), try-catch won’t catch a lot of exceptions.

Use promises

Promises will handle any exceptions (both explicit and implicit) in asynchronous code blocks that use then(). Just add .catch(next) to the end of promise chains. For example:

app.get('/', (req, res, next) => {
  // do some sync stuff
  queryDb()
    .then((data) => makeCsv(data)) // handle data
    .then((csv) => { /* handle csv */ })
    .catch(next)
})

app.use((err, req, res, next) => {
  // handle error
})

Now all errors, asynchronous and synchronous, get propagated to the error middleware.

However, there are two caveats:

  1. All your asynchronous code must return promises (except emitters). If a particular library does not return promises, convert the base object by using a helper function like Bluebird.promisifyAll().
  2. Event emitters (like streams) can still cause uncaught exceptions. So make sure you are handling the error event properly; for example:
const wrap = fn => (...args) => fn(...args).catch(args[2])

app.get('/', wrap(async (req, res, next) => {
  const company = await getCompanyById(req.query.id)
  const stream = getLogoStreamById(company.id)
  stream.on('error', next).pipe(res)
}))

The wrap() function is a wrapper that catches rejected promises and calls next() with the error as the first argument. For details, see Asynchronous Error Handling in Express with Promises, Generators and ES7.

For more information about error-handling by using promises, see Promises in Node.js with Q – An Alternative to Callbacks.

Things to do in your environment / setup

Here are some things you can do in your system environment to improve your app’s performance:

Set NODE_ENV to “production”

The NODE_ENV environment variable specifies the environment in which an application is running (usually, development or production). One of the simplest things you can do to improve performance is to set NODE_ENV to “production.”

Setting NODE_ENV to “production” makes Express:

Tests indicate that just doing this can improve app performance by a factor of three!

If you need to write environment-specific code, you can check the value of NODE_ENV with process.env.NODE_ENV. Be aware that checking the value of any environment variable incurs a performance penalty, and so should be done sparingly.
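To avoid repeating that lookup, read the variable once at startup; a small sketch (the logDebug helper is hypothetical):

```javascript
// Read NODE_ENV once at startup rather than checking it on every request.
const isProduction = process.env.NODE_ENV === 'production'

function logDebug (message) {
  if (!isProduction) console.log(message) // development-only logging
}

logDebug('this line appears only outside production')
```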

In development, you typically set environment variables in your interactive shell, for example by using export or your .bash_profile file. But in general you shouldn’t do that on a production server; instead, use your OS’s init system (systemd or Upstart). The next section provides more details about using your init system in general, but setting NODE_ENV is so important for performance (and easy to do) that it’s highlighted here.

With Upstart, use the env keyword in your job file. For example:

# /etc/init/env.conf
env NODE_ENV=production

For more information, see the Upstart Intro, Cookbook and Best Practices.

With systemd, use the Environment directive in your unit file. For example:

# /etc/systemd/system/myservice.service
Environment=NODE_ENV=production

For more information, see Using Environment Variables In systemd Units.

Ensure your app automatically restarts

In production, you don’t want your application to be offline, ever. This means you need to make sure it restarts both if the app crashes and if the server itself crashes. Although you hope that neither of those events occurs, realistically you must account for both eventualities by:

Node applications crash if they encounter an uncaught exception. The foremost thing you need to do is to ensure your app is well-tested and handles all exceptions (see Handle exceptions properly for details). But as a fail-safe, put a mechanism in place to ensure that if and when your app crashes, it will automatically restart.

Use a process manager

In development, you started your app simply from the command line with node server.js or something similar. But doing this in production is a recipe for disaster. If the app crashes, it will be offline until you restart it. To ensure your app restarts if it crashes, use a process manager. A process manager is a “container” for applications that facilitates deployment, provides high availability, and enables you to manage the application at runtime.

In addition to restarting your app when it crashes, a process manager can enable you to:

The most popular process managers for Node are as follows:

For a feature-by-feature comparison of the three process managers, see http://strong-pm.io/compare/. For a more detailed introduction to all three, see Process managers for Express apps.

Using any of these process managers will suffice to keep your application up, even if it does crash from time to time.

However, StrongLoop PM has lots of features that specifically target production deployment. You can use it and the related StrongLoop tools to:

As explained below, when you install StrongLoop PM as an operating system service using your init system, it will automatically restart when the system restarts. Thus, it will keep your application processes and clusters alive forever.

Use an init system

The next layer of reliability is to ensure that your app restarts when the server restarts. Systems can still go down for a variety of reasons. To ensure that your app restarts if the server crashes, use the init system built into your OS. The two main init systems in use today are systemdand Upstart.

There are two ways to use init systems with your Express app:

Systemd

Systemd is a Linux system and service manager. Most major Linux distributions have adopted systemd as their default init system.

A systemd service configuration file is called a unit file, with a filename ending in .service. Here’s an example unit file to manage a Node app directly. Replace the values enclosed in <angle brackets> for your system and app:

[Unit]
Description=<Awesome Express App>

[Service]
Type=simple
ExecStart=/usr/local/bin/node </projects/myapp/index.js>
WorkingDirectory=</projects/myapp>

User=nobody
Group=nogroup

# Environment variables:
Environment=NODE_ENV=production

# Allow many incoming connections
LimitNOFILE=infinity

# Allow core dumps for debugging
LimitCORE=infinity

StandardInput=null
StandardOutput=syslog
StandardError=syslog
Restart=always

[Install]
WantedBy=multi-user.target

For more information on systemd, see the systemd reference (man page).

StrongLoop PM as a systemd service

You can easily install StrongLoop Process Manager as a systemd service. After you do, when the server restarts, it will automatically restart StrongLoop PM, which will then restart all the apps it is managing.

To install StrongLoop PM as a systemd service:

$ sudo sl-pm-install --systemd

Then start the service with:

$ sudo /usr/bin/systemctl start strong-pm

For more information, see Setting up a production host (StrongLoop documentation).

Upstart

Upstart is a system tool available on many Linux distributions for starting tasks and services during system startup, stopping them during shutdown, and supervising them. You can configure your Express app or process manager as a service and then Upstart will automatically restart it when it crashes.

An Upstart service is defined in a job configuration file (also called a “job”) with filename ending in .conf. The following example shows how to create a job called “myapp” for an app named “myapp” with the main file located at /projects/myapp/index.js.

Create a file named myapp.conf at /etc/init/ with the following content (replace the bold text with values for your system and app):

# When to start the process
start on runlevel [2345]

# When to stop the process
stop on runlevel [016]

# Increase file descriptor limit to be able to handle more requests
limit nofile 50000 50000

# Use production mode
env NODE_ENV=production

# Run as www-data
setuid www-data
setgid www-data

# Run from inside the app dir
chdir /projects/myapp

# The process to start
exec /usr/local/bin/node /projects/myapp/index.js

# Restart the process if it is down
respawn

# Limit restart attempt to 10 times within 10 seconds
respawn limit 10 10

NOTE: This script requires Upstart 1.4 or newer, supported on Ubuntu 12.04-14.10.

Since the job is configured to run when the system starts, your app will be started along with the operating system, and automatically restarted if the app crashes or the system goes down.

Apart from automatically restarting the app, Upstart enables you to use these commands:

For more information on Upstart, see Upstart Intro, Cookbook and Best Practises.

StrongLoop PM as an Upstart service

You can easily install StrongLoop Process Manager as an Upstart service. After you do, when the server restarts, it will automatically restart StrongLoop PM, which will then restart all the apps it is managing.

To install StrongLoop PM as an Upstart 1.4 service:

$ sudo sl-pm-install

Then run the service with:

$ sudo /sbin/initctl start strong-pm

NOTE: On systems that don’t support Upstart 1.4, the commands are slightly different. See Setting up a production host (StrongLoop documentation)for more information.

Run your app in a cluster

In a multi-core system, you can increase the performance of a Node app by many times by launching a cluster of processes. A cluster runs multiple instances of the app, ideally one instance on each CPU core, thereby distributing the load and tasks among the instances.

Balancing between application instances using the cluster API

IMPORTANT: Since the app instances run as separate processes, they do not share the same memory space. That is, objects are local to each instance of the app. Therefore, you cannot maintain state in the application code. However, you can use an in-memory datastore like Redis to store session-related data and state. This caveat applies to essentially all forms of horizontal scaling, whether clustering with multiple processes or multiple physical servers.

In clustered apps, worker processes can crash individually without affecting the rest of the processes. Apart from performance advantages, failure isolation is another reason to run a cluster of app processes. Whenever a worker process crashes, always make sure to log the event and spawn a new process using cluster.fork().

Using Node’s cluster module

Clustering is made possible with Node’s cluster module. This enables a master process to spawn worker processes and distribute incoming connections among the workers. However, rather than using this module directly, it’s far better to use one of the many tools out there that does it for you automatically; for example node-pm or cluster-service.

Using StrongLoop PM

If you deploy your application to StrongLoop Process Manager (PM), then you can take advantage of clustering without modifying your application code.

When StrongLoop Process Manager (PM) runs an application, it automatically runs it in a cluster with a number of workers equal to the number of CPU cores on the system. You can manually change the number of worker processes in the cluster using the slc command line tool without stopping the app.

For example, assuming you’ve deployed your app to prod.foo.com and StrongLoop PM is listening on port 8701 (the default), then to set the cluster size to eight using slc:

$ slc ctl -C http://prod.foo.com:8701 set-size my-app 8

For more information on clustering with StrongLoop PM, see Clustering in the StrongLoop documentation.

Using PM2

If you deploy your application with PM2, then you can take advantage of clustering without modifying your application code. You should ensure your application is stateless first, meaning no local data is stored in the process (such as sessions, websocket connections and the like).

When running an application with PM2, you can enable cluster mode to run it in a cluster with a number of instances of your choosing, such as matching the number of available CPUs on the machine. You can manually change the number of processes in the cluster using the pm2 command line tool without stopping the app.

To enable cluster mode, start your application like so:

# Start 4 worker processes
$ pm2 start npm --name my-app -i 4 -- start
# Auto-detect number of available CPUs and start that many worker processes
$ pm2 start npm --name my-app -i max -- start

This can also be configured within a PM2 process file (ecosystem.config.js or similar) by setting exec_mode to cluster and instances to the number of workers to start.

Once running, the application can be scaled like so:

# Add 3 more workers
$ pm2 scale my-app +3
# Scale to a specific number of workers
$ pm2 scale my-app 2

For more information on clustering with PM2, see Cluster Mode in the PM2 documentation.

Cache request results

Another strategy to improve the performance in production is to cache the result of requests, so that your app does not repeat the operation to serve the same request repeatedly.

Use a caching server like Varnish or Nginx (see also Nginx Caching) to greatly improve the speed and performance of your app.

Use a load balancer

No matter how optimized an app is, a single instance can handle only a limited amount of load and traffic. One way to scale an app is to run multiple instances of it and distribute the traffic via a load balancer. Setting up a load balancer can improve your app’s performance and speed, and enable it to scale more than is possible with a single instance.

A load balancer is usually a reverse proxy that orchestrates traffic to and from multiple application instances and servers. You can easily set up a load balancer for your app by using Nginx or HAProxy.
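As an illustrative Nginx sketch (the upstream name, addresses, and ports are hypothetical; the directives are standard ngx_http_upstream_module and ngx_http_proxy_module ones):

```nginx
# Round-robin load balancing across two app instances.
upstream myapp {
  server 127.0.0.1:3001;
  server 127.0.0.1:3002;
}

server {
  listen 80;
  location / {
    proxy_pass http://myapp;
  }
}
```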

With load balancing, you might have to ensure that requests that are associated with a particular session ID connect to the process that originated them. This is known as session affinity, or sticky sessions, and may be addressed by the suggestion above to use a data store such as Redis for session data (depending on your application). For a discussion, see Using multiple nodes.

Use a reverse proxy

A reverse proxy sits in front of a web app and performs supporting operations on the requests, apart from directing requests to the app. It can handle error pages, compression, caching, serving files, and load balancing among other things.

Handing over tasks that do not require knowledge of application state to a reverse proxy frees up Express to perform specialized application tasks. For this reason, it is recommended to run Express behind a reverse proxy like Nginx or HAProxy in production.