Cloud Error Reporting

Error Reporting Overview

Cloud Error Reporting (documentation) automatically groups errors based on their stack traces and message patterns, and shows how often each error group occurs.
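The idea of grouping errors and counting how often each group occurs can be illustrated with a few standard shell tools. This is only a minimal sketch of the concept, not the way Error Reporting groups errors internally: the function name group_errors and the "<timestamp> <message>" input format are hypothetical, and real grouping works on stack traces rather than on exact message matches.

```shell
# Illustration only: group identical error messages and count each group.
# Input (hypothetical format): one error per line, "<timestamp> <message>".
# Output: "<count> <message>", most frequent group first.
group_errors() {
  cut -d' ' -f2- |     # drop the timestamp so identical messages compare equal
    sort |             # bring identical messages together
    uniq -c |          # collapse each group into "<count> <message>"
    sort -rn |         # most frequent group first
    sed 's/^ *//'      # strip the padding that uniq -c adds
}
```

For example, piping three lines into group_errors, two of which carry the same message, yields one group with a count of 2 and one with a count of 1.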

When they open an error group report, operators can see the exact line in the application code where the error occurred, and can reason about the cause by navigating to that line of source code in Google Cloud Source Repositories.

Using Error Reporting

You can access Error Reporting by selecting Error Reporting from the GCP navigation menu:

image

Note: Error Reporting can also let you know when new errors are received; see “Notifications for Error Reporting” for details.

To get started, select any open error by clicking on the error in the Error field:

image

The Error Details screen shows a timeline of when the error occurred and provides the stack trace that was captured with the error. Scroll down to see samples of the error:

image

Click View Logs for one of the samples to see the log messages that match this particular error:

image

You can expand any of the messages that match the filter to see the full stack trace:

image
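Recent error-level log entries can also be listed from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed, authenticated, and has a default project set. The function name recent_error_logs is hypothetical, and the severity>=ERROR filter is a generic assumption, not the exact filter that View Logs applies for a given error group.

```shell
# Sketch: list error-level log entries from the last hour as JSON.
# Assumes gcloud is installed, authenticated, and has a default project.
# The filter below is a generic assumption, not the filter used by View Logs.
recent_error_logs() {
  limit="${1:-10}"   # maximum number of entries to return (default 10)
  gcloud logging read 'severity>=ERROR' \
    --freshness=1h \
    --limit="$limit" \
    --format=json
}
```

Calling recent_error_logs 5 would request at most five entries; narrowing the filter to a specific resource or message is left to the reader.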

Manufacturing Errors

There are several ways to experiment with the Error Reporting tool and manufacture errors that will be reported and displayed in the UI. For this demonstration, we will use Cloud Operations Sandbox’s Load Generator and SRE Recipes features to simulate errors in the system.

To simulate requests using the load generator, we can use either its UI or the sandboxctl command line tool:

$ sandboxctl loadgen step
Redeploying Loadgenerator...
Loadgenerator deployed using step pattern
Loadgenerator web UI: http://<ExampleIP>

Then, to break the service, we will use SRE Recipes (recipe2):

$ sandboxctl sre-recipes break recipe2
Breaking service operations...
...done

In this case, the Error Reporting UI shows a newly reported error, Unhealthy pod, failed probe:

image

You can open it to see additional information. In the example below, the error recurred several times in the last hour.

image

You can also click View Logs to see detailed log information in Cloud Operations Logging.

image

Note: When you are done, don’t forget to restore the service using sandboxctl sre-recipes restore.
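To make the restore step harder to forget, the break and restore commands can be wrapped in a small helper. This is a minimal sketch, assuming sandboxctl is on your PATH; the function name run_recipe_experiment is hypothetical. The restore command is used exactly as written in this guide, so check whether your sandboxctl version expects additional arguments.

```shell
# Sketch: break the service with a recipe, pause for inspection, then restore.
# Assumes sandboxctl is on PATH. Only subcommands shown in this guide are used.
run_recipe_experiment() {
  recipe_name="$1"   # for example: recipe2

  # Stop early if the break step fails, so we do not wait on a healthy service.
  sandboxctl sre-recipes break "$recipe_name" || return 1

  echo "Service is broken. Inspect Error Reporting, then press Enter to restore."
  read -r _unused    # wait for the operator to press Enter

  # Restore command as written in this guide; your version may expect arguments.
  sandboxctl sre-recipes restore
}
```

Running run_recipe_experiment recipe2 breaks the service, waits while you look at the new error group in the Error Reporting UI, and restores the service once you press Enter.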

Another way to break the service is to use the load generator to overload the service with too many requests. In the Load Generator UI (at the address provided above, or found using sandboxctl describe), we will start a test with 500 users.

Note: Currently, only load tests with fewer than 100 users succeed.

image

In the Error Reporting UI, you will see the previous error, Unhealthy pod, failed probe, along with an additional error, Container Downtime:

image

image

Last modified May 27, 2021: Updated review syntax changes. (6cae6d5d)