Cloud Error Reporting
Error Reporting Overview
Cloud Error Reporting automatically groups errors based on patterns in their stack traces and messages, and shows how frequently each error group occurs.
On opening an error group report, operators can see the exact line in the application code where the error occurred and reason about the cause by navigating to that line of the source code in Google Cloud Source Repositories.
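To make the idea of grouping by stack trace more concrete, here is a minimal sketch in Python. It is not Error Reporting's actual grouping algorithm, which this guide does not describe; the `fingerprint` and `group_counts` helpers and the sample tracebacks are hypothetical and only illustrate how errors that share an exception type and call frames can land in one group even when their messages differ.

```python
import hashlib
from collections import Counter


def fingerprint(stack_trace: str) -> str:
    """Build a short grouping key from the exception type and frame lines.

    Illustrative heuristic only (an assumption, not the service's real rules).
    """
    lines = [line.strip() for line in stack_trace.strip().splitlines()]
    # Python frame lines look like: File "app.py", line 10, in handler
    frames = [line for line in lines if line.startswith("File ")]
    # The last traceback line is "ExceptionType: message"; keep only the type
    # so that different messages do not split the group.
    exc_type = lines[-1].split(":", 1)[0] if lines else ""
    key = "\n".join([exc_type, *frames])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def group_counts(traces):
    """Count how many stack traces fall into each fingerprint group."""
    return Counter(fingerprint(trace) for trace in traces)


# Hypothetical sample data: A and B differ only in the error message.
TRACE_A = """Traceback (most recent call last):
  File "cart.py", line 42, in add_item
    price = catalog[item_id]
KeyError: 'sku-1'
"""
TRACE_B = TRACE_A.replace("sku-1", "sku-2")
TRACE_C = """Traceback (most recent call last):
  File "checkout.py", line 7, in charge
    total = amount / count
ZeroDivisionError: division by zero
"""

counts = group_counts([TRACE_A, TRACE_B, TRACE_C])
for group, occurrences in counts.most_common():
    print(group, occurrences)
```

In this sketch the two `KeyError` traces share one group and the `ZeroDivisionError` trace forms another, which mirrors the per-group frequency shown in the Error Reporting list.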
Using Error Reporting
You can access Error Reporting by selecting Error Reporting from the GCP navigation menu:
Note: Error Reporting can also let you know when new errors are received; see “Notifications for Error Reporting” for details.
To get started, select any open error by clicking on the error in the Error field:
The Error Details screen shows on a timeline when the error occurred and provides the stack trace that was captured with the error. Scroll down to see samples of the error:
Click View Logs for one of the samples to see the log messages that match this particular error:
You can expand any of the messages that match the filter to see the full stack trace:
Manufacturing Errors
There are several ways to experiment with the Error Reporting tool and manufacture errors that will be reported and displayed in the UI. For the purpose of this demonstration, we will use Cloud Operations Sandbox’s Load Generator and SRE Recipes features to simulate errors in the system.
To simulate requests using the load generator, we can use the UI or the sandboxctl command line tool.
$ sandboxctl loadgen step
Redeploying Loadgenerator...
Loadgenerator deployed using step pattern
Loadgenerator web UI: http://<ExampleIP>
Then, to break the service, we will use SRE Recipes (recipe2):
$ sandboxctl sre-recipes break recipe2
Breaking service operations...
...done
In this case, the Error Reporting UI shows a newly reported error, “Unhealthy pod, failed probe”.
You can open it to see additional information. In the example below, this error recurs several times within the last hour.
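As a rough illustration of what such a timeline summarizes, the sketch below buckets error timestamps from the last hour into 15-minute bins. The `timeline` helper, the bin size, and the sample timestamps are all hypothetical; Error Reporting computes and renders this view for you, so this is only a way to reason about the chart.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def timeline(timestamps, now, window=timedelta(hours=1),
             bucket=timedelta(minutes=15)):
    """Count events per bucket for start <= t < now, keyed by bucket start."""
    start = now - window
    counts = Counter()
    for t in timestamps:
        if start <= t < now:
            # Dividing two timedeltas with // yields the integer bucket index.
            index = (t - start) // bucket
            counts[start + index * bucket] += 1
    return dict(sorted(counts.items()))


# Hypothetical occurrences: 5, 20, 45, 90 and 180 minutes before "now".
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [now - timedelta(minutes=m) for m in (5, 20, 45, 90, 180)]

buckets = timeline(events, now)
for bucket_start, occurrences in buckets.items():
    print(bucket_start.strftime("%H:%M"), occurrences)
```

Only the three occurrences that fall inside the one-hour window are counted; the older two are ignored.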
You can also click View Logs to view detailed log information in Cloud Operations Logging.
Note: When you are done, don’t forget to restore the service using sandboxctl sre-recipes restore.
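Because it is easy to forget the restore step, one option is to wrap the break and restore commands so that the restore always runs. The following is a minimal sketch, not part of the sandbox tooling: the `with_broken_service` and `run_cli` helpers are hypothetical, and the two command lines are copied as written on this page, with nothing else assumed about the sandboxctl CLI.

```python
import subprocess
from typing import Callable, Sequence

# Command lines copied from this page; no other CLI behavior is assumed.
BREAK_CMD = ("sandboxctl", "sre-recipes", "break", "recipe2")
RESTORE_CMD = ("sandboxctl", "sre-recipes", "restore")


def run_cli(command: Sequence[str]) -> None:
    """Run one CLI command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(command), check=True)


def with_broken_service(
    observe: Callable[[], None],
    run: Callable[[Sequence[str]], None] = run_cli,
) -> None:
    """Break the service, call observe(), then restore the service.

    The restore runs even if observe() raises. If the break command itself
    fails, nothing was broken, so no restore is attempted.
    """
    run(BREAK_CMD)
    try:
        observe()
    finally:
        run(RESTORE_CMD)
```

For example, calling `with_broken_service(lambda: input("Press Enter when done inspecting Error Reporting "))` would keep the service broken until you press Enter and then restore it. The `run` parameter exists so the sequence can be exercised without a sandbox by passing a stand-in function.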
Another way to break the service is to use the load generator to overload the service with too many requests.
In the Load Generator UI (at the address provided above, or obtained using sandboxctl describe), we will start a test with 500 users.
Note: Currently, only load tests with fewer than 100 users succeed.
In the Error Reporting UI, you will see the previous error, “Unhealthy pod, failed probe”, as well as an additional error, “Container Downtime”: