Version: 6.1.0

Serverless

NodeSource provides serverless support that collects data during each request in a serverless environment. This data helps you debug latency and other issues in your serverless functions.

AWS Setup

To use AWS resources, configure the AWS SDK credentials in your environment by following the AWS documentation. Once that is done, the AWS CLI should be able to interact with your AWS account.

Make sure all of the AWS credential environment variables below are configured on your machine.

Env
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
AWS_ACCESS_KEY_ID
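
A quick way to confirm that the three variables above are present before running any nsolid-serverless command is a small shell check. This is only a sketch: the function name check_aws_env is hypothetical, and it tests for presence only, not for whether the credentials are valid.

```sh
# Hypothetical helper: reports which of the required AWS variables are unset or empty.
check_aws_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Indirect lookup of the variable called $name; names are fixed, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      # Print only the variable name, never its value.
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_aws_env returns a zero exit status when all three variables are set and a non-zero status otherwise, so it can guard a setup script with `check_aws_env || exit 1`.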

Configuration

Using nsolid-serverless, you can create the necessary infrastructure resources and configure your serverless application to send telemetry data to the N|Solid Console.

Install

> npm i -g @nodesource/nsolid-serverless

Creating basic infra to collect data

The first step is to create an SQS queue to which the telemetry data will be sent.

> nsolid-serverless infra --install

Collecting telemetry data

Next, set up the environment variables and layers that your Lambda functions need in order to send data to the SQS queue created in the previous step.

> nsolid-serverless functions --install

Help

> nsolid-serverless --help

Adding Integration

After all the functions are set up, configure the N|Solid Console. Navigate to Settings > Integration.

On the left side, there is a section called Integrations:

Then scroll down and click on the New AWS SQS Integration button to add the integration. The following form will be displayed:

Fill in the form with the following information:

Queue Name: The name of the integration.
Queue URL: https://sqs.{region}.amazonaws.com/{account-id}/nsolid-serverless-metrics
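
The Queue URL pattern above can be filled in with a short shell function. This is a sketch under assumptions: the function name nsolid_queue_url is hypothetical, and it assumes the queue keeps the name nsolid-serverless-metrics shown in the pattern.

```sh
# Hypothetical helper: builds the Queue URL for the integration form.
nsolid_queue_url() {
  # $1: AWS region (e.g. us-east-1), $2: AWS account ID
  printf 'https://sqs.%s.amazonaws.com/%s/nsolid-serverless-metrics\n' "$1" "$2"
}

# Example usage. The account ID can be looked up with the AWS CLI:
#   aws sts get-caller-identity --query Account --output text
# nsolid_queue_url "$AWS_DEFAULT_REGION" 123456789012
```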

Click on the Save Integration button to save the integration.

Monitoring Serverless via N|S Console UI

In the N|Solid Console, go to the applications dashboard and click on the Functions tab on the left side.

The dashboard for the functions connected will be displayed as below:

or as below if you have multiple functions:

FUNCTION DETAIL VIEW

Select a function from the list by clicking its name to see its details. The detail view will be displayed as below:

The detail view will display the following information:

ACCOUNT ID: The account ID of the function.
RUNTIME VERSION: The version of the Node.js runtime used by the function.
FUNCTION VERSION: The version of the function.
NSOLID LAYER VERSION: The version of the N|Solid layer inside your AWS Lambda function.
REGIONS: The AWS region of the function.
ARCHITECTURE: The architecture of the function.
ESTIMATED COST: The estimated cost of the function.

Metrics

The Metrics tab displays the metrics of the function. It contains 9 boxes, and each box can show a different metric: click the metric's name in a box to choose which one it shows.

The metrics will display the following information:

Telemetry(realtime) API Metrics

This is the list of metrics generated from the Telemetry API. These metrics are real-time and associated with specific functions. Their corresponding aggregations are performed in the Telemetry Aggregator.

Duration: The total time taken for the function to execute from start to finish, measured in milliseconds.
initDuration: The time taken for the function to initialize, including any setup or warm-up processes, measured in milliseconds.
billedDuration: The time for which AWS bills the function's execution, measured in milliseconds. This may differ from actual execution time due to rounding or billing increments.
Invocations: The total number of times the function is called or triggered.
Max Memory Used: The peak memory usage by the function during its execution.
Memory Size: The allocated memory size for the function.
Errors: The total number of errors encountered during the function's execution.
Error Rate: The ratio of errors to the total number of invocations within the measured time interval.
Response Duration: The time taken for the function to produce a response, measured in milliseconds.
Response Latency: The time taken from the invocation of the function until the response is received, measured in milliseconds.
Produced Bytes: The total amount of data produced by the function, measured in bytes.
Timeouts: The number of times the function execution exceeded the allowed time limit and was terminated.
Estimated Cost: The estimated cost of executing the function based on its usage and AWS pricing.
OOM (Out of Memory): The number of times the function execution failed due to insufficient memory.
Counters Uptime: The total time the function has been running since it was last started.
Counters User: The CPU time spent in user mode during the function execution.
Counters System: The CPU time spent in system mode during the function execution.
Counters Duration: The total duration of the function execution in milliseconds.
Counters Idle: The percentage of time the function was idle, waiting for tasks or resources.
Counters Minor Page Faults: The number of minor page faults (page reclaims) triggered by the function.
Counters Major Page Faults: The number of major page faults (requiring disk access) triggered by the function.
Counters Swapped Out: The number of times the function's data was swapped out of memory to disk.
Counters FS Read: The amount of data read from the file system by the function.
Counters FS Write: The amount of data written to the file system by the function.
Counters IPC Sent: The number of inter-process communication messages sent by the function.
Counters IPC Received: The number of inter-process communication messages received by the function.
Counters Signals Count: The number of signals received by the function.
Counters Voluntary Context Switches: The number of times the function voluntarily yielded the CPU.
Counters Involuntary Context Switches: The number of times the function was forcibly switched out by the CPU scheduler.
Billed Duration: The total billed duration of the function execution, measured in milliseconds.
Post Runtime Duration: The duration of the post-execution phase of the function, measured in milliseconds.
Init Counters Bootstrap Complete: The time at which the function's bootstrap process completed.
Init Counters Environment: The environment in which the function is running (e.g., dev, staging, prod).
Init Counters Loop Start: The time at which the function's main execution loop started.
Init Counters Node Start: The time at which the function's node process started.
Init Counters Start Time: The exact time the function started executing.
Init Counters V8 Start: The start time of the V8 engine for the function.
Post Runtime Execution Duration: The duration of the post-execution phase, specifically the execution time, measured in milliseconds.
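
As an illustration of the Error Rate definition in the list above (errors divided by invocations over the same interval), the arithmetic can be sketched in shell with awk. The function name error_rate and the four-decimal output format are assumptions made for this example, not the Console's actual implementation.

```sh
# Hypothetical helper: error rate as a ratio between 0 and 1.
error_rate() {
  # $1: error count, $2: invocation count (both for the same time interval)
  awk -v errors="$1" -v invocations="$2" 'BEGIN {
    # With no invocations, report zero rather than divide by zero.
    if (invocations == 0) { print "0.0000"; exit }
    printf "%.4f\n", errors / invocations
  }'
}

# Example: 5 errors across 200 invocations
# error_rate 5 200   # prints 0.0250
```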

CloudWatch Metrics

These metrics are extracted directly from the CloudWatch API by the Lambda Metrics Forwarder and, unlike the Telemetry API metrics, are not real-time. The list is:

Invocations: The total number of times the function is invoked.
Errors: The total number of errors encountered during the function's execution.
Dead Letter Errors: The number of errors when the function attempts to send a message to a dead letter queue.
Destination Delivery Failures: The number of times the function failed to deliver messages to a destination.
Throttles: The number of times the function execution was throttled due to exceeding concurrency limits.
Provisioned Concurrency Invocations: The number of times the function was invoked with provisioned concurrency.
Provisioned Concurrency Spillover Invocations: The number of times invocations spilled over beyond the provisioned concurrency limit.
Maximum Concurrent Executions: The peak number of concurrent executions of the function.
Maximum Provisioned Concurrent Executions: The peak number of concurrent executions using provisioned concurrency.
Maximum Unreserved Concurrent Executions: The peak number of concurrent executions without reserved concurrency.
Maximum Duration: The longest execution time of the function, measured in milliseconds.
Minimum Duration: The shortest execution time of the function, measured in milliseconds.
Duration P50: The median (50th percentile) execution time of the function, measured in milliseconds.
Duration P90: The 90th percentile execution time of the function, measured in milliseconds.
Maximum Post Runtime Extensions Duration: The longest duration of the post-execution phase, measured in milliseconds.
Minimum Post Runtime Extensions Duration: The shortest duration of the post-execution phase, measured in milliseconds.
Post Runtime Extensions Duration P50: The median (50th percentile) duration of the post-execution phase, measured in milliseconds.
Post Runtime Extensions Duration P90: The 90th percentile duration of the post-execution phase, measured in milliseconds.
Average Iterator Age: The average age of the iterator, indicating the time taken to process data streams.
Maximum Iterator Age: The longest time taken to process a data stream item.
Provisioned Concurrency Utilization: The percentage of provisioned concurrency utilized by the function.
Duration: The total duration of the function execution, measured in milliseconds.
Post Runtime Extensions Duration: The duration of the post-execution phase, measured in milliseconds.
Iterator Age: The age of the iterator, indicating the time taken to process data streams.
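
To make the P50 and P90 entries above concrete, the sketch below computes a percentile over a list of durations using the nearest-rank method. The method is an assumption chosen for simplicity; CloudWatch may compute its percentile statistics differently, and the function name percentile is hypothetical.

```sh
# Hypothetical helper: nearest-rank percentile of a list of numbers.
percentile() {
  # $1: integer percentile (e.g. 50 or 90); remaining arguments: the samples
  p="$1"
  shift
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Example with ten durations in milliseconds:
# percentile 50 10 20 30 40 50 60 70 80 90 100   # prints 50
# percentile 90 10 20 30 40 50 60 70 80 90 100   # prints 90
```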

Loading OpenTelemetry Instrumentation Modules

In your Lambda function's environment variable configuration, the NSOLID_INSTRUMENTATION environment variable specifies which OpenTelemetry instrumentation modules are loaded in your application. To enable instrumentation for specific modules, follow these steps:

For HTTP requests using the http module, set the NSOLID_INSTRUMENTATION environment variable to http.
If you're also performing PostgreSQL queries using the pg module, include it in the NSOLID_INSTRUMENTATION environment variable like this: http,pg.
Make sure to list all the relevant instrumentation modules required for your application. This will enable tracing and monitoring for the specified modules, providing valuable insights into their performance and behavior.
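
The comma-separated value described above (for example http,pg) can be assembled from individual module names with a small shell function. This is a sketch only: the function name instrumentation_value is hypothetical, and dropping duplicate names is a convenience added for this example rather than a documented requirement.

```sh
# Hypothetical helper: joins module names into a NSOLID_INSTRUMENTATION value.
instrumentation_value() {
  value=""
  for module in "$@"; do
    case ",$value," in
      *",$module,"*) ;;                         # already listed, skip the duplicate
      *) value="${value:+$value,}$module" ;;    # append with a comma separator
    esac
  done
  printf '%s\n' "$value"
}

# Example:
# instrumentation_value http pg   # prints http,pg
```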

Tracing

To enable tracing with N|Solid, set the Lambda environment variable NSOLID_TRACING_ENABLED=1.

With this feature enabled, you can use the N|Solid Console to troubleshoot HTTP, DNS, and other network request problems.

The view will be shown as below:

Tracing consists of the three key components below:

Timeline Graph: a timeline graph of Tracing data showing the density of the number of tracing spans.

Filter: a filter input area to filter the results by attributes of the span.

Spans (Results): a span is the building block of a trace and is a named, timed operation that represents a piece of the workflow in the distributed system. Multiple spans are pieced together to create a trace.

Note: The default behavior only generates traces related to the lambda invocation. If you require tracing for additional operations, set the NSOLID_INSTRUMENTATION environment variable.

Available modules for tracing are as follows:

aws: AwsInstrumentation
dns: DnsInstrumentation
graphql: GraphQLInstrumentation
grpc: GrpcInstrumentation
http: HttpInstrumentation
ioredis: IORedisInstrumentation
mongodb: MongoDBInstrumentation
mysql: MySQLInstrumentation
net: NetInstrumentation
pg: PgInstrumentation
redis: RedisInstrumentation

Please ensure you have the necessary modules enabled to trace all the operations you require.
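
Before setting NSOLID_INSTRUMENTATION, a value can be checked against the module keys listed above. The sketch below hard-codes that list as it appears on this page; the function name validate_instrumentation is hypothetical, and whether N|Solid itself rejects or ignores unknown names is not stated here.

```sh
# Module keys copied from the list on this page.
NSOLID_MODULES="aws dns graphql grpc http ioredis mongodb mysql net pg redis"

# Hypothetical helper: checks a comma-separated NSOLID_INSTRUMENTATION value.
validate_instrumentation() {
  # $1: value intended for NSOLID_INSTRUMENTATION, e.g. "http,pg"
  status=0
  old_ifs="$IFS"
  IFS=','                      # split the argument on commas
  for module in $1; do
    case " $NSOLID_MODULES " in
      *" $module "*) ;;        # known module key
      *) echo "unknown module: $module" >&2; status=1 ;;
    esac
  done
  IFS="$old_ifs"
  return "$status"
}
```

For example, `validate_instrumentation "http,pg"` succeeds, while `validate_instrumentation "http,postgres"` reports the unknown name on stderr and returns a non-zero status.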

Timeline Graph

A timeline graph displays the density of the number of tracing spans. The color of each slot on the timeline graph has the following meaning:

Color	Description
green	everything is ok
yellow	maybe you should look at this
red	definitely you should look at this

Assume that a simple request was made to the “console” service to monitor traces:

As a result, the Console displays the whole “span” information.

Span

A span is the building block of a trace and is a named, timed operation that represents a piece of the workflow in the distributed system. Multiple spans are pieced together to create a trace.

Traces are often viewed as a tree of spans that reflects the time that each span started and completed. It also shows you the relationship between spans. A trace starts with a root span where the request starts. This root span can have one or more child spans, and each one of those child spans can have child spans.
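
The parent and child relationship described above can be sketched by rebuilding a tree from flat span records. The input format here (one line per span with span ID, parent span ID, and span name, with "-" marking the root) is invented for this example and is not the Console's export format. The function name print_span_tree is hypothetical as well.

```sh
# Hypothetical helper: prints spans as an indented tree.
print_span_tree() {
  # Reads lines of "span_id parent_id span_name" on stdin; "-" marks a root span.
  awk '
    function walk(id, depth,    i, n, kids, pad) {
      pad = ""
      for (i = 0; i < depth; i++) pad = pad "  "
      print pad name[id]
      n = split(children[id], kids, " ")
      for (i = 1; i <= n; i++) walk(kids[i], depth + 1)
    }
    {
      name[$1] = $3
      if ($2 == "-") roots = roots " " $1
      else children[$2] = children[$2] " " $1
    }
    END {
      count = split(roots, root_ids, " ")
      for (r = 1; r <= count; r++) walk(root_ids[r], 0)
    }
  '
}

# Example: a root span with two children, one of which has its own child.
# printf "%s\n" "a - invoke" "b a http.request" "c b dns.lookup" "d a pg.query" | print_span_tree
```

Each level of indentation in the output corresponds to one step down from the root span.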

Inspecting Span Details

To inspect the details of a span, click on the title Service & Operation:

Below are the attributes of the span:

Attribute	Description
id	the id of the application
app	the name of application
hostname	the name of the host machine
tags	the tags of the application
span_attributes_http_method	the http method of span attributes
duration	the duration of the span
span_attributes_http_status_code	the http status code of the span attributes
span_attributes_http_status_text	the http status text of the span attributes
span_attributes_http_url	the http url of the span attributes
span_end	the end time of the span
span_id	the id of the span
span_name	the name of the span
span_parentId	the parent ID of the span
span_start	the start time of the span
span_status_code	the status code of the span
span_threadId	the thread ID of the span
span_traceId	the trace ID of the span
span_type	the type of the span
resourceSpans	an array of resource spans
attributes	an array of attributes
key	the key of the attribute
value	the value of the attribute
stringValue	the string value of the attribute
telemetry.sdk.language	the language of the telemetry SDK
telemetry.sdk.name	the name of the telemetry SDK
telemetry.sdk.version	the version of the telemetry SDK
cloud.provider	the cloud provider
cloud.platform	the cloud platform
cloud.region	the cloud region
faas.name	the name of the function as a service
faas.version	the version of the function as a service
process.pid	the process ID
process.executable.name	the name of the executable
process.command	the command
process.command_line	the command line
process.runtime.version	the version of the runtime
process.runtime.name	the name of the runtime
process.runtime.description	the description of the runtime
droppedAttributesCount	the number of dropped attributes
MessageAttributes	the message attributes
eventType	the event type
StringValue	the string value
DataType	the data type
instanceId	the instance ID
functionArn	the Amazon Resource Name (ARN) of the function

SBOM

The SBOM button displays the SBOM (Software Bill of Materials) of the function in JSON or PDF format. The SBOM contains the following information:

Name: The name of the package.
Version: The version of the package.
Risk: The level of risk associated with using this package.
License: The license under which the package is distributed.
Author: The author of the package.
Number of CVEs / CWEs: The number of known Common Vulnerabilities and Exposures (CVEs) or Common Weakness Enumerations (CWEs) associated with the package.
Path: The path to the package within the application.

Changing Time Range

To change the time range, click the calendar icon above the graphs:

This shows a calendar from which you can select the time range. The timeline graph range is updated every minute, and the date range can be changed in one-minute steps.

In summary, with NodeSource's Serverless monitoring, users can gain more insight into the performance of their serverless functions and quickly identify and debug any issues.

Uninstall

To uninstall the resources created by nsolid-serverless, such as the SQS queue, the Lambda layer, and the Lambda environment variables, use the command below:

> nsolid-serverless infra --uninstall
