Problem Reporting Architecture Proposal
Introduction
Here is a strawman proposal for how to design the problem reporting system.
Conceptual Components
- System Logger
- Problem Detector
- Report Generator
- User Problem Notifier
- Reporting Mechanism
- Problem Reporting and Review
- Report Collection Server
System Logger
For the purposes of this design, the System Logger collects and stores structured data about anomalous behavior of the system, including crash dumps and SELinux/audit logs. It provides stable references to items within this data.
Problem Detector
The problem detector watches for specific types of logged information collected by the System Logger. If the problem is one of the types of trouble defined in Problem Reporting (namely Crash, Misbehavior, Misconfiguration, or Failure) then the Problem Detector will ask the Report Generator to create a new Pending Problem Report. The Detector should make a distinction between System Trouble and User Application trouble.
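As a rough illustration of the detection step, the following Python sketch maps a logged entry to one of the four trouble types and separates System Trouble from User Application trouble. The `LogEntry` fields, the tag-to-type mapping, and the UID threshold are all assumptions made for the example; none of them are defined by this proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trouble(Enum):
    CRASH = "Crash"
    MISBEHAVIOR = "Misbehavior"
    MISCONFIGURATION = "Misconfiguration"
    FAILURE = "Failure"


class Scope(Enum):
    SYSTEM = "System Trouble"
    USER_APPLICATION = "User Application Trouble"


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical fields; the real System Logger schema is not defined yet.
    ref: str   # stable reference into the logged data
    kind: str  # tag assigned by the logger, e.g. "crash"
    uid: int   # user ID of the process the entry is about


@dataclass(frozen=True)
class Detection:
    trouble: Trouble
    scope: Scope
    log_ref: str


# Assumed mapping from logger tags to the trouble types named above.
KIND_TO_TROUBLE = {
    "crash": Trouble.CRASH,
    "misbehavior": Trouble.MISBEHAVIOR,
    "misconfiguration": Trouble.MISCONFIGURATION,
    "failure": Trouble.FAILURE,
}

# Assumption: UIDs below this value belong to system services.
SYSTEM_UID_LIMIT = 1000


def detect(entry: LogEntry) -> Optional[Detection]:
    """Return a Detection for reportable entries, or None for everything else."""
    trouble = KIND_TO_TROUBLE.get(entry.kind)
    if trouble is None:
        return None
    scope = Scope.SYSTEM if entry.uid < SYSTEM_UID_LIMIT else Scope.USER_APPLICATION
    return Detection(trouble, scope, entry.ref)
```

A non-None result is what the Problem Detector would hand to the Report Generator; the `log_ref` field carries the stable reference so the generator can find the original data.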
Report Generator
The Report Generator may gather supplementary details for a specific trouble condition and create a Pending Problem Report. This report should be stored in a non-volatile location as quickly as possible in order to capture the conditions close to the event.
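One way to get the report onto non-volatile storage quickly and safely is to write it to a temporary file, force it to disk, and rename it into place. The sketch below assumes a JSON report format and a spool directory chosen by the caller; the field names and the function `store_pending_report` are hypothetical.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def store_pending_report(spool_dir, trouble, log_ref, details=None):
    """Write a Pending Problem Report to disk atomically and return its path."""
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)

    report = {
        "id": uuid.uuid4().hex,
        "state": "pending",
        "trouble": trouble,
        "log_ref": log_ref,          # stable reference from the System Logger
        "created": time.time(),
        "details": details or {},    # supplementary data, if any was gathered
    }
    final_path = spool / f"{report['id']}.json"

    # Write to a temporary file in the same directory so the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=spool, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(report, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the data has reached the disk
        os.replace(tmp_name, final_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final_path
```

With this approach a reader of the spool directory never sees a half-written report: a file ending in `.json` is always complete.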
User Problem Notifier
In Normal, Developer, or Managed modes the User Problem Notifier contacts the active user in order to:
- Apologize for disruption
- Attempt to restore the previous working state
- (except for Managed mode) Request that the user consent to reporting the problem
In Unattended mode the user is not notified.
The user should be notified as close to the time of the Problem as possible. In the case of trouble during boot or login, a catastrophic failure during the user's session, or any situation where a Pending Problem cannot be displayed, the User Problem Notifier should present the notification at the next login.
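The per-mode rules above are small enough to write down as a decision function. This is only a sketch of the logic as described here; the names `Mode`, `NotificationPlan`, and `plan_notification` are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEVELOPER = "developer"
    MANAGED = "managed"
    UNATTENDED = "unattended"


@dataclass(frozen=True)
class NotificationPlan:
    notify: bool                # show anything to the user at all
    ask_consent: bool           # ask the user to consent to reporting
    defer_to_next_login: bool   # hold the notification until the next login


def plan_notification(mode: Mode, can_display_now: bool) -> NotificationPlan:
    """Decide how the User Problem Notifier should handle one Pending Problem."""
    if mode is Mode.UNATTENDED:
        # In Unattended mode the user is not notified.
        return NotificationPlan(notify=False, ask_consent=False,
                                defer_to_next_login=False)
    return NotificationPlan(
        notify=True,
        # Consent is requested in every notifying mode except Managed.
        ask_consent=mode is not Mode.MANAGED,
        # Boot, login, or catastrophic failures cannot be shown immediately.
        defer_to_next_login=not can_display_now,
    )
```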
Reporting Mechanism
The Reporting Mechanism is primarily responsible for delivering the Pending Report to the Collection Server. It must be able to operate without additional information in Managed or Unattended modes.
It may, however, be configured to deliver reports to alternate locations including:
- scp
- ftp
- NFS-mounted (shared) storage
Problem Reporting and Review
In Normal or Developer modes the user may be asked to kindly submit the problem report in order to improve their system. In the user's mind the tool for this is the same as the application that can be used to review past Reports whether they were submitted or not. This tool is primarily an application with a Submit Report workflow.
The user experience goals include:
- Helping the user to understand what happened
- Apologizing to the user
- Helping the user get back to what they were doing
- Making the user feel that they are in good hands
- Appealing to the user's selfish desire to make their system better
- Not wasting any more of the user's time than is necessary (they are here because we interrupted them)
- Acting in a way that allows them to trust us with their information
- Explaining the privacy implications of the report submission
Report Collection Server
The Collection Server is where the Reporting Mechanism sends the report data. This server should:
- Allow anonymous crash report submissions
- Scrub sensitive user data from reports, removing:
  - User's real name
  - Usernames
  - Email addresses
  - Social security numbers
  - Phone numbers
  - IP addresses
  - Document titles
  - User filenames (especially in $HOME)
  - URLs
- Support filing reports in Bugzilla for developer review
- Avoid duplicate report filing
- Support linking crash reports to Bugzilla to allow Developer Mode direct access to bug reports
- Perform coredump analysis and backtrace generation
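To give a feel for the scrubbing requirement, here is a minimal Python sketch that replaces some of the listed items with placeholders. The regular expressions are deliberately simple assumptions and will miss many real cases; phone numbers and document titles are not handled at all, and real names or usernames have to be supplied by the caller because they cannot be recognized by pattern.

```python
import re

# Order matters: URLs are replaced first because they can contain
# email addresses and IP addresses.
_PATTERNS = [
    ("URL", re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s\"'<>]+", re.IGNORECASE)),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),   # IPv4 only
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),         # US format only
    ("HOME_PATH", re.compile(r"/home/[^/\s]+(?:/[^\s]*)?")),
]


def scrub(text, known_names=()):
    """Replace sensitive substrings in a report with <LABEL> placeholders.

    known_names is an optional list of real names or usernames to remove.
    """
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    # Names go last so they do not break up emails or paths before
    # those patterns have had a chance to match.
    for name in known_names:
        if name:
            text = re.sub(re.escape(name), "<NAME>", text, flags=re.IGNORECASE)
    return text
```

For example, `scrub("alice@example.com from 10.0.0.5 opened /home/alice/taxes.odt")` returns `"<EMAIL> from <IP> opened <HOME_PATH>"`. A production scrubber would need a much more careful pattern set and probably a review step, since anything the patterns miss ends up on the server.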
Implementation Details
- System Logger: systemd (git.freedesktop.org)
- Problem Detector: problemd (git.freedesktop.org)
- Report Generator: systemd/problemd? (git.freedesktop.org)
- User Problem Notifier: gnome-settings-daemon (git.gnome.org)
- Reporting Mechanism: ?
- Problem Reporting and Review: Oops! application (git.gnome.org)
- Report Collection Server: ?
System Logger
Need details of logging and cursors here.
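As a starting point for that discussion, and assuming the logger in question is the systemd journal, a consumer could use the journal's cursors to resume reading where it left off. The sketch below shells out to `journalctl` with its `--output=json`, `--priority`, and `--after-cursor` options and reads the `__CURSOR` field that the JSON output carries for each entry. The function names and the choice of the `err` priority level are assumptions for illustration only.

```python
import json
import subprocess


def journal_command(after_cursor=None, max_priority="err"):
    """Build a journalctl command that lists entries newer than a cursor."""
    cmd = ["journalctl", "--output=json", "--no-pager",
           f"--priority={max_priority}"]
    if after_cursor:
        cmd.append(f"--after-cursor={after_cursor}")
    return cmd


def parse_entries(lines):
    """Turn journalctl JSON lines into (cursor, message) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        entries.append((record["__CURSOR"], record.get("MESSAGE", "")))
    return entries


def read_new_entries(after_cursor=None):
    """Return new entries and the cursor to pass in on the next call."""
    result = subprocess.run(journal_command(after_cursor),
                            capture_output=True, text=True, check=True)
    entries = parse_entries(result.stdout.splitlines())
    last_cursor = entries[-1][0] if entries else after_cursor
    return entries, last_cursor
```

The cursor returned by `read_new_entries` would need to be persisted between runs, and could also serve as the stable reference that a Pending Problem Report keeps to the underlying log data.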
Problem Detector
Need details of how we'll identify problem conditions.
Report Generator
What data do we need to include? Binary core? Logs? Metadata? Do we identify application vs system here or at detection time? How do we indicate whether the user has been notified about something yet?
User Problem Notifier
Need details of how gnome-settings-daemon will watch for events to notify about.
Reporting Mechanism
How do we send to the server? Do we need to use certificates to avoid man-in-the-middle attacks?
Problem Reporting and Review
How do we handle third-party applications such as Firefox? Do they just opt out of the process, so that we only offer to restart the app?
Report Collection Server
We have to choose one, adapt one, or write one.
Discussion