An Overview of Incident Investigation and Reporting
This three-hour online course provides a basic understanding of incident investigation.
This course includes a multiple choice quiz at the end.
At the conclusion of this course, the student will:
About the Author
Robert B. Coulter, PE, is a provider of safety, process engineering, and environmental consulting services, including in-house training on safety and environmental topics. For more information, visit his website at www.rbcoulter.com.
This course provides the user with an overview of incident investigation. The motivation for this course is to outline a procedure of incident investigation that is objective and satisfies most regulatory requirements. Many incident investigations and reports are done with prejudice. Conclusions and recommendations are often predetermined to suit the goals of the investigators. This could occur intentionally or unintentionally. In this course an effort is made to present a means to minimize the bias that can enter an investigation. Also, an incident investigation procedure/format that is in general compliance with OSHA/EPA regulations is difficult to find. One can usually obtain from OSHA or EPA an outline for an investigation procedure. These outlines usually have little practical value because they either address a single regulation or simply outline the procedure that OSHA or EPA agents should use for their own investigations. Most facilities must comply with multiple investigation/reporting requirements as listed in various OSHA and EPA regulations. Here is a sample listing of regulations requiring incident reports and/or investigations - Sample Listing of Regulations Requiring Incident Reports. This course outlines a general procedure designed to comply with most OSHA and EPA regulations. It is not possible to ensure compliance with ALL regulations; however, by following the guidelines in this course one should be able to minimize repeat investigations and reports.
Incidents and/or accidents can range from very minor (like a "near-miss") to catastrophic. Although the elements in this course can be applied to almost all types of incidents, we will focus on those types of incidents that cause or have the potential to cause major injury, damage, or impact on the public. We will also be more focused on those types of incidents associated with processes at fixed facilities.
Often a particular regulation will require that a report be made or submitted as the result of an incident without mentioning an investigation; however, in order to make a report one must carry out some form of investigation to gather facts. Because of this we will assume that any request or need to make an incident report implies the need for an investigation.
The following are definitions of important words/phrases used in this course:
It is always best to be prepared before an incident happens by having a written procedure that outlines the method of incident investigation and reporting (IIR). This plan should address how a facility will investigate and report all incidents ranging from near-misses to catastrophic events. One way of doing this is to list and define the types of incidents that could occur and then address the IIR procedure for each. A typical listing of the types of incidents that could occur at a facility is as follows:
Note above how incidents are categorized. Divisions are usually chosen based on regulatory thresholds. For example, a small release of a chemical below its RQ (reportable quantity) will not require the same reporting burden as a release above the RQ. The IIR plan may specify a relatively simple procedure for investigating and reporting a small release that only involves the immediate onsite personnel. A large release above the RQ will most likely need a more detailed investigation, involving additional personnel, to support the reporting requirement. Similar reasoning can be applied to the delineation of non-recordable versus recordable accidents.
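The threshold-based categorization described above can be illustrated with a short sketch. This is only an illustration: the chemical names and RQ values below are hypothetical placeholders, and the sketch assumes that a release equal to the RQ is treated the same as one above it. The actual RQ for each chemical must be taken from the applicable regulations.

```python
from dataclasses import dataclass

# Hypothetical reportable quantities in pounds. These are placeholders,
# not values taken from any regulation.
EXAMPLE_RQ_LB = {"chemical_a": 100.0, "chemical_b": 10.0}


@dataclass
class Release:
    chemical: str
    quantity_lb: float


def classify_release(release, rq_table=EXAMPLE_RQ_LB):
    """Return the incident category used to select the IIR procedure."""
    rq = rq_table[release.chemical]
    # Assumption: a release equal to the RQ is handled like one above it.
    if release.quantity_lb >= rq:
        return "release at or above RQ"
    return "release below RQ"


if __name__ == "__main__":
    print(classify_release(Release("chemical_a", 25.0)))
    print(classify_release(Release("chemical_b", 25.0)))
```

In an IIR plan, each category returned by a function like this would map to its own investigation form and reporting list.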
The IIR plan should address the manner in which each type of incident is investigated and reported. Many of these points will be discussed in more detail below. The essential elements are WHO investigates and reports the incidents and HOW they do so. It is recommended that sample investigation forms be attached to the IIR plan so that the task of doing an investigation is clearly outlined. A sample form that one could use for a chemical release is shown here. Other forms, for other types of incidents, would be similar in structure but may differ based on the varying reporting requirements.
For each type of incident listed in an IIR plan there should be a full description of all verbal and written reporting requirements. This should include internal (within the company) and external (outside the company) reports. For example, the phone numbers of relevant government agencies that require reports (verbal and written) should be listed. Typically this would include the national, state, and local emergency response agencies (NRC, SERC, & LEPC). A facility's environmental permits (e.g. air and water) may impose additional notification requirements, which should also be listed in the IIR plan. The titles (or names) of the employees who should make the reports should be listed.
An IIR plan should include a description of how recommendations from incident reports will be resolved. It is highly recommended that a "recommendation resolution system" (or RRS) be implemented to track the completion of all recommendations. For a description of this system, see below.
Initial Investigation (Phase I):
Investigation should begin almost immediately after an incident occurs. Phase I is the portion of the investigation that runs from within a few minutes after the incident begins until approximately one hour later or until the end of the incident. The main goal of the initial investigation (Phase I) is to assist emergency response efforts by quickly determining key facts about the incident and reporting to the appropriate agencies. In most cases (e.g. release above an RQ), several initial verbal reports are required to be made to certain emergency government agencies (within 15 to 60 minutes depending on the jurisdiction). (Serious accidents may require a verbal report to OSHA within a few hours.)
The initial information collected in Phase I should focus on gathering the following information (listed in order of priority):
Generally, the first two items are needed to make reports to emergency management agencies. In most jurisdictions, these are the LEPC (Local Emergency Planning Committee), SERC (State Emergency Response Commission), and the NRC (National Response Center). The second two items are needed for reports to OSHA or state safety agencies. The phone numbers of these agencies should be listed in your written IIR plan (see above).
At times it may not be possible to obtain the above information. If there is a possibility that the public could be harmed by an incident, one should notify the emergency management agencies (or other agencies applicable to the jurisdiction) and report any information that is available. It is important to stress that these initial verbal reports should be made as soon as possible. It is preferred to make these reports soon with minimal information, and then follow up later when more information is available.
Remember to retain all information gathered during the initial phase of an incident. This should also include the times that the agencies were notified and the report numbers (provided by the agencies).
Because Phase I must be executed promptly, it is recommended that the IIR plan appoint someone such as the shift leader or team leader to carry it out.
Initial Investigation (Phase II):
As soon as the scene of an incident is safe to approach and all emergency operations are completed, Phase II of the initial investigation should be done. The purpose of this phase of the investigation is to obtain "evaporative" information that may not be available later. The clues to the causes of many incidents can "disappear". For example, residual frosting on process equipment can indicate the location of a present or recent gaseous leak. A prompt investigation at this point may resolve the cause(s) of an incident, significantly reducing the resources needed later in the investigation and lowering the chance that the incident recurs.
The phase II initial investigation team should consist of multiple onsite personnel and be pre-specified in the IIR plan. Those involved in the incident (including contract employees) should be members of this team.
The scene should be isolated to prevent unnecessary persons from entering the area. This could be done with yellow ribbon. Also, precautions should be taken consistent with the facility's Hot Work and other safety procedures, before entering the scene. The scene of an accident can have unexpected hazards even after it is considered safe. In many cases it will be a good idea to have a flammable gas meter to ensure that the atmosphere at the scene is not an explosion/fire hazard. Other types of detectors (e.g. toxic gas detectors) may be needed as the circumstances dictate. Personal protective equipment (PPE) should be used in accordance with the facility's safety rules.
If it can be safely done, photographs of the incident scene should be taken. Focus on those items that are likely to be gone or changed when the process is restarted. Make notes of any other pertinent observations. In some cases, it may be a good idea to collect samples for immediate or later testing. An infrared temperature detector may also be a useful tool for determining residual temperatures (if applicable to the incident).
Often a central process control computer has collected data during an incident. Sometimes this data could be lost within a short period depending on how the cache/archiving program is set up. If this is the case, print this data or save it to a file before the cache expires.
During this phase of the investigation, the first interviews of witnesses should be done. Memories can fade quickly, so it is important not to delay this too long. Focus on collecting facts and observations concerning the incident. Avoid questions that tend to assign blame. Some witnesses may be more comfortable writing their observations themselves.
Once all the information has been gathered from the initial investigation, it should be forwarded to a central location or person in accordance with the written IIR plan. For a minor incident, the IIR plan may specify that the investigation conclude at this point. Usually this is done by the completion and distribution of a specified form (per the IIR plan).
Main Investigation and Cause Analysis:
The main investigation should typically begin on the next working day after an incident. The goal of the main investigation is to establish the information needed to make the first written report within the next 5 to 10 days (typical deadline set by most regulations). This information should be produced by a documented (per IIR plan), predetermined (prior to incident) objective method. The information generated is generally a description of the incident, the cause(s), and recommendations to prevent recurrence. Other information may be required as specified by the regulations applicable to the facility and/or process. (See Table I for a sample listing of various requirements that may be applicable.)
Composition of Main Investigation Committee
The main investigation committee is generally composed of local personnel. Whereas the post-incident investigation team was composed of the personnel onsite when the incident occurred, the main investigation committee should be composed of different members (if possible) to avoid bias. For example, it would be preferable to select one or more operators who are familiar with the process involved in the incident but who were not on the shift during which the incident occurred. Avoid selecting persons who may not be able to investigate the incident objectively. Exclude the designers of the process, those involved in marketing or selling the products from the process, and all supervision, including management, of the operators of the process. Where it is not possible to find people who will evaluate the incident objectively, serious consideration must be given to utilizing outside personnel. Again, the method used to create this committee, and the protocols and techniques this committee employs, should be specified in the IIR plan. Avoid creating an "ad hoc" investigation committee that is outside the rules of the IIR plan. Any committee created in such a fashion is likely to be biased in favor of the person(s) creating the committee.
The committee should appoint a leader and a scribe (or secretary). The leader will be responsible for assigning tasks to complete the goals of the investigation. This person should be familiar with the written IIR plan. The scribe will be responsible for documenting the committee's work, which will be used later as the basis of the written investigation report.
Although the predetermined investigation method could state differently, it is usually best to establish a timeline of the incident first. The timeline is needed to establish cause and effect which is critical to many analysis techniques (see below). The timeline is basically a list of the events making up the incident listed in chronological order. The precise times and dates of these events should be indicated if known. Information gathered from the post-incident investigation will be used to establish the timeline. The committee will probably need to interview witnesses of the event to fill in any gaps in the timeline. It is important not to speculate about the events in the timeline. Do not list items that are in dispute or disagreement. The timeline will serve as a "gauge" for the rest of the investigation and needs to be accepted by consensus.
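As a minimal sketch of the timeline rules above (chronological order, no disputed items), the following assumes that every event has a known time and that the committee has already marked which events are in dispute. The `Event` structure and the sample events are hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    when: datetime          # time of the event (assumed known in this sketch)
    description: str
    disputed: bool = False  # True if the committee has not reached consensus


def build_timeline(events):
    """Return only the agreed events, listed in chronological order."""
    agreed = [e for e in events if not e.disputed]
    return sorted(agreed, key=lambda e: e.when)


if __name__ == "__main__":
    # Hypothetical events, for illustration only.
    events = [
        Event(datetime(2024, 1, 1, 10, 5), "Release detected by operator"),
        Event(datetime(2024, 1, 1, 10, 0), "Unloading started"),
        Event(datetime(2024, 1, 1, 10, 2), "Hose seen moving", disputed=True),
    ]
    for e in build_timeline(events):
        print(e.when.isoformat(), e.description)
```

Disputed items are left out of the timeline here, but the committee would still keep them in its working notes so they can be resolved through further interviews.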
Cause Analysis Techniques
At this point the committee will most likely, depending on the IIR plan, start the cause analysis procedure. The purpose of this phase is to identify the causes of the incident. Note that the word "causes" is plural. This is because it usually takes multiple events or conditions to create an incident. Some of the more popular cause analysis methods are:
Change Analysis
This method is based on the theory that causes can be directly correlated to "changes" or "deviations". Noted more for quality control investigations, it is probably the best and quickest approach when the cause(s) are unknown. A search of all possible changes that may be related to the incident is done. Afterwards, these changes are further analyzed to determine which ones are relevant to the incident. For example, a suggested "change" may be determined to be irrelevant because it had occurred many times in the past without incident. Others may be discarded because they occurred at a different place than the incident, etc. At some point a "manageable few" changes are left, which become the prime suspects for the cause(s) of the incident.
The disadvantage is that some normal operating history or "comparison data" from the process or one similar to it is needed as a benchmark. This may not be possible in a newly designed unique process.
More detail on change analysis can be found here: - Change Analysis Link
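The screening step of change analysis can be sketched as a simple filter. The two screening criteria below are the ones mentioned in the text (the change occurred before without incident, or it occurred at a different place); the `Change` structure and its field names are assumptions made for this illustration, and a real analysis may use additional criteria.

```python
from dataclasses import dataclass


@dataclass
class Change:
    description: str
    occurred_before_without_incident: bool
    same_location_as_incident: bool


def shortlist_changes(changes):
    """Keep the 'manageable few' changes that remain prime suspects."""
    return [
        c for c in changes
        if not c.occurred_before_without_incident
        and c.same_location_as_incident
    ]


if __name__ == "__main__":
    # Hypothetical candidate changes, for illustration only.
    candidates = [
        Change("New gasket supplier", False, True),
        Change("Night shift unloading", True, True),
        Change("Pump replaced in another unit", False, False),
    ]
    for c in shortlist_changes(candidates):
        print(c.description)
```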
Job Hazard Analysis (JHA)
This technique is more associated with analyzing workers' tasks in the preparation of operating procedures. In this sense it is more of a prevention method. However, this method can be used for incident investigations by simply doing a JHA on the process involved in the incident. The basic technique is to break the job down into its separate tasks and analyze each task for hazards. Because of its emphasis on the operator, this technique works best when the operator is directly involved in the incident. It may not be suitable for automated complex processes or where the operator is remotely involved.
More information on Job Hazard Analysis can be found here: - Job Hazard Analysis Link
Process Hazards Analysis (PHA)
This technique is similar to the JHA but is more appropriate for process incidents. Also, like the JHA, it is a prevention technique that can be applied to incident investigation. Many facilities that must comply with OSHA's Process Safety Management (PSM) Standard have been doing PHAs regularly since 1992 (when OSHA issued the standard). EPA's Risk Management Plan regulation also requires PHAs for applicable facilities. Because of this, the PHA methods are better known than the other techniques.
Using a PHA method as a tool for incident investigation will work best if a PHA has already been done for the process involved in the incident. The committee/team can simply review this PHA for possible deficiencies. There is also the benefit that any needed revisions to the facility's PSM and/or RMP compliance plans will be more readily identified.
Note that "PHA" does not refer to a specific technique but is actually an umbrella term for various methods. In the PSM and RMP regulations, several PHA methods are listed. The methods most suitable for incident investigation are listed here:
What-if/Checklist
This method involves asking a series of "What-if" questions concerning the process. These questions can be generated by the committee/team or come from a standard list or "Checklist". The questions are meant to focus the committee/team on analyzing a particular aspect of the process. If the question leads to a possible hazard then a search is done for existing "controls" that eliminate or minimize the hazard. After listing the controls, a decision is made to determine if the existing controls are adequate. If the existing controls are not adequate, then a recommendation is generated.
More information can be found on What-if/Checklist by clicking this link (also describes other PHA methods). - What-if/Checklist Analysis Link
HAZOPS (Hazard and Operability Study)
This method is similar to "What-if/Checklist" but instead of using what-if questions, certain "key words" are used to focus the team's attention. Although the number of key words is relatively small, they can be "matrixed" in many ways to cover a wide variety of possible hazards. For example, the key word "low" can be combined with the key words "flow", "level", "pressure", "temperature", etc. The listing of hazards, controls, and recommendations is then similar to "What-if/Checklist".
More information can be found on HAZOPS by clicking this link. - HAZOPS Link
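The "matrixing" of key words can be sketched with a Cartesian product. The key word "low" and the four process parameters come from the text above; "high" and "no" are added here as assumed examples of other key words and are not taken from this course.

```python
from itertools import product

# "low" is from the course text; "high" and "no" are assumed examples.
KEY_WORDS = ["low", "high", "no"]
PARAMETERS = ["flow", "level", "pressure", "temperature"]


def deviations(key_words, parameters):
    """Combine every key word with every parameter into a deviation."""
    return [f"{k} {p}" for k, p in product(key_words, parameters)]


if __name__ == "__main__":
    for d in deviations(KEY_WORDS, PARAMETERS):
        print(d)
```

Three key words and four parameters already give twelve deviations for the team to consider, which shows how a small word list can cover a wide range of hazards. Some combinations may not be meaningful for a given process and can be skipped by the team.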
FMEA (Failure Mode and Effects Analysis)
This method analyzes the failure states of individual components in the process from bottom-up. If the failure of a component causes a hazard in the system or process then it is noted. The analysis of controls and recommendations is then similar to the above PHA methods.
More information can be found on FMEA by clicking this link. - FMEA Link
Fault Tree Analysis
Fault Tree Analysis (from top-down) is basically the opposite of FMEA (from bottom-up). A hazard event is chosen and then a logic diagram (or tree) is constructed indicating how the event can occur. The "branches" on the tree represent previous events or conditions required to produce the hazard event. The branches can be combined by an OR operator (meaning that either event needs to occur) or an AND operator (meaning that both events need to occur). Controls on the hazard are indicated by branches that represent their failure. The analysis then proceeds as in the other methods above.
More information can be found on Fault Tree Analysis by clicking this link. - Fault Tree Analysis Link
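The AND/OR logic of a fault tree can be sketched as a small recursive evaluation. The tree below is a hypothetical example (a tank overfill, with the alarm control represented by a branch for its failure); the event names and the tuple-based representation are assumptions for illustration, not a standard fault tree notation.

```python
def evaluate(node, basic_events):
    """Return True if the event represented by `node` occurs.

    A node is either a basic event name (str) or a tuple of
    (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return basic_events[node]
    gate, children = node
    results = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        return all(results)   # every branch must occur
    if gate == "OR":
        return any(results)   # at least one branch must occur
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: overfill requires a level problem AND a failed alarm.
TANK_OVERFILL = (
    "AND",
    [
        ("OR", ["operator misses high level", "level gauge fails"]),
        "overfill alarm fails",
    ],
)

if __name__ == "__main__":
    scenario = {
        "operator misses high level": True,
        "level gauge fails": False,
        "overfill alarm fails": True,
    }
    print(evaluate(TANK_OVERFILL, scenario))
```

During an investigation, the committee would set each basic event according to the agreed timeline to check which branches are consistent with what actually happened.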
This is a broad category of cause mapping/diagramming techniques similar to Fault Tree Analysis. The difference is that these techniques are usually associated with quality control and solving other business problems. All these methods attempt to diagram the cause and effect relationship. They can be applied to incident investigation and are good methods when the cause(s) are unknown. They have names like "cause mapping", "fishbone analysis", or "root cause analysis".
Information can be found on cause diagrams by clicking the following link. - Cause Mapping by ThinkReliability.com
The analysis technique chosen should be predetermined by the IIR plan or by the investigation committee prior to starting the analysis. Since the first written report needs to be completed in five to ten days, there is not a lot of time to decide on the technique. Also, there will probably not be enough time to apply multiple techniques. It is much better to apply one technique well than to apply multiple techniques poorly. Also, do not introduce new techniques AFTER the incident because this can bias the outcome of the investigation. Ensure that all analysis techniques that can be used are pre-listed in the IIR plan.
Regardless of the analysis technique used, the committee should focus on finding and listing ALL possible causes of the incident. Intentionally ignoring a possible cause is a serious error, which can bias the report greatly, and could prevent a key recommendation from being implemented. Later, the committee can decide which causes are relevant by Pareto Analysis or other methods.
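As a rough sketch of how a Pareto-style ranking might be applied to the full cause list, the function below sorts causes by a score and keeps the top causes that together account for a chosen share of the total. The scores (for example, committee votes), the cause names, and the 80 percent cutoff are all assumptions for illustration; the course does not prescribe a scoring method.

```python
def pareto_select(scores, cutoff_percent=80):
    """Return the highest-scoring causes that together reach the cutoff.

    `scores` maps each cause to a non-negative integer score.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0
    for cause, score in ranked:
        selected.append(cause)
        cumulative += score
        if cumulative * 100 >= cutoff_percent * total:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example_scores = {
        "gasket material": 6,
        "unloading procedure": 3,
        "operator training": 1,
    }
    print(pareto_select(example_scores))
```

The ranking only sets priorities. Causes that fall below the cutoff remain on the committee's list so that no possible cause is ignored.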
When analyzing causes and effects, it is important to place priority on those effects that can cause injury to people and the environment. Often, analysis committees will tend to focus on the causes and effects leading up to the incident and tend to ignore the causes and effects leading to injury and environmental damage. Any lingering and/or residual effects of the incident must also be determined. Often incident reports will be required to be sent to environmental agencies. These reports will need to contain an assessment of environmental impact and any recommended corrective actions.
It is possible that the committee could disagree on the listing of causes and even disagree on an entire cause diagram. If this should happen, it is recommended to list the alternative cause diagrams. Often, when the recommendations are decided, it is possible to implement corrective actions to prevent these alternative cause scenarios with minimal additional resources.
Investigation Conclusions, Recommendations, and Final Report:
At this point the committee should have agreed upon the cause(s) of the incident or at least a set of possible cause scenarios. These cause scenario(s) should lead the committee to the conclusions. The conclusions are concise statements of the cause(s) or possible cause(s) of the incident and their effects. The committee may decide to favor one or more conclusions in the final report. The reasons for favoring a set of conclusions should be given. The alternative conclusions should not be deleted from the final report and should be addressed by the recommendations.
In rare cases the committee may not have enough information to reach a meaningful conclusion about the incident within ten days. For example, a key forensic test to establish cause may take longer than ten days to complete. If this should happen, an interim report should be issued, before the deadline, stating that the cause(s) have not been determined yet. The committee should continue its work until the cause(s) are known and issue an updated report at that time.
Recommendations are suggested corrective actions to prevent a future occurrence of a similar incident. They should directly correspond to the cause(s) of the incident. Specifically, the recommendations should attempt to remedy and/or mitigate the causes or effects of the incident as listed in the Conclusions (above). Recommendations that are vague, or that have only a peripheral relationship to the conclusions, should be avoided. For example, listing "increased training" or "review safety procedures" as the sole recommendation is generally ineffective.
Once recommendations are generated they should be tracked and monitored to ensure completion. This monitoring can be difficult because many recommendations are not implemented due to the failure of management systems. The reasons for these failures vary, but they are often caused by management structure changes and/or work assignment changes. For example, a recommendation may have assigned an engineer to redesign an overfill alarm on a storage tank. The engineer may be transferred to a different department before the task is complete. At this point the task may "fall through the cracks" and not get completed.
To avoid the failure of recommendation resolution, a facility of moderate to large size (more than 30 employees) should have a recommendation resolution system (RRS) in place. The basic concept of an RRS is to centralize recommendations from all sources in a single document or file. A person with significant authority to allocate resources at a facility should be in charge of the RRS. A good choice is the facility manager. Usually each recommendation will be given a chronological tracking number as specified by the RRS. This system can include recommendations generated from other sources such as audits, process hazards analyses, and employee suggestions. With an RRS in place, the incident investigation committee is relieved of the burden of ensuring the resolution of recommendations. The finished written incident investigation report should list the RRS tracking number of each recommendation.
If an RRS is not available at the facility, then each recommendation should be assigned a target completion date and an individual should be appointed to ensure it is resolved. The committee should review outstanding recommendations, periodically, to ensure their resolution.
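The tracking ideas above (a central list, chronological tracking numbers, a responsible individual, a target completion date, and periodic review) can be sketched as a small register. The class names, fields, and sample entries are hypothetical; an actual RRS would follow the facility's own specifications.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Recommendation:
    tracking_number: int
    text: str
    source: str             # e.g. incident investigation, audit, PHA
    owner: str              # individual responsible for resolution
    target_date: date
    completed_on: Optional[date] = None


class RecommendationRegister:
    """Central list of recommendations from all sources."""

    def __init__(self):
        self._items = []

    def add(self, text, source, owner, target_date):
        # Tracking numbers are assigned in chronological order of entry.
        rec = Recommendation(len(self._items) + 1, text, source, owner, target_date)
        self._items.append(rec)
        return rec

    def overdue(self, today):
        """Open recommendations whose target date has passed."""
        return [
            r for r in self._items
            if r.completed_on is None and r.target_date < today
        ]


if __name__ == "__main__":
    register = RecommendationRegister()
    register.add("Redesign overfill alarm on storage tank",
                 "incident investigation", "process engineer", date(2024, 1, 31))
    for r in register.overdue(date(2024, 3, 1)):
        print(r.tracking_number, r.text, r.owner)
```

A periodic review then amounts to calling `overdue` with the current date and following up with each listed owner, or reassigning the item if that person has changed roles.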
Incident Investigation Written Report and Distribution
Someone on the committee should be appointed to write the draft investigation report. For small reports (about 1-2 pages), the outline of the report can be as follows:
For longer and/or more complex reports, an outline as follows may be more suitable:
The "Summary" is a brief description of the incident including the key causes and recommendations. "Analysis" is a description of the method used to determine the causes and effects pertaining to the particular incident. The other sections are self-explanatory.
Once a draft report has been completed, it is recommended that ALL committee members approve it. Once the final report has been written, all committee members should sign it.
Final reports should be distributed according to the facility IIR plan. Each facility should review its environmental permits and other applicable regulations to determine the agencies that also require a copy of the final report.
Examples of Practical Application:
The following are practice examples:
There is a failure to note the times that emergency response agencies were called. This may indicate that the reports were not made. A failure to report to emergency response agencies can limit their ability to respond to the incident. It may also result in large fine(s) to the facility.
There is another problem with the report (look toward the bottom). What is it?
There is only one signature at the bottom. This may suggest that an investigation committee was not formed for the initial incident investigation. This would not comply with the IIR plan and may be out of compliance with the law if the incident occurred in a PSM process area.
After the initial incident investigation was complete in Example #1, it was decided that the shift leader at the time of the incident should lead the main investigation. Is this a good idea? Why or why not?
This is not a good idea and should be avoided, if possible. Remember that this individual already led the first investigation. This individual may have already decided on the cause(s) and recommendation(s) for the incident and may prejudice the main investigation.
Incident Investigation training need only be given to facility safety and environmental coordinators. Is this true?
Usually not. Investigations must start as soon as possible once the incident begins. Agencies will often require information about an incident within a few minutes. The facility's safety and/or environmental coordinators may not be available around-the-clock. At a minimum, it is recommended to train shift leaders (or equivalent positions that are always present at a facility) on incident investigation.
During the main investigation cause analysis, a member suggests using a new analysis method proposed by the corporate office. This analysis method is not listed in the IIR plan for the facility. Is this a good idea?
No. This is like changing the rules in the middle of a ball game. Changes in investigation procedures during an investigation can introduce bias. At the least, they give the appearance of trying to change the outcome of an investigation. Any changes to investigation procedures should be made after an investigation has been concluded and should be documented in the written IIR plan.
During the main investigation committee session, the facility leader announces that she must approve the draft of the final investigation report before it is issued. The IIR plan does not require her signature and she cannot participate in the investigation (per the IIR plan). What should the committee do?
This is clearly a violation of the investigation protocol. The committee members should not sign any report altered by an outside party. This is particularly true of any licensed members of the committee who may be held to a code of ethics imposed by the state.
A committee decides to issue two reports, a short version to be sent to the environmental agencies and a more detailed version for internal distribution. Is this a good idea?
No. Not only does it give the appearance of hiding information, it is usually difficult to ensure that two reports are consistent with each other.
During the main investigation, the committee reached the conclusion that a gasket failed in the unloading hose. The committee's sole recommendation was to improve maintenance training. Is this an effective recommendation?
With so little detail concerning the training and a lack of other related recommendations (e.g. consider a new gasket material), this appears to be a weak recommendation.
A committee issued a recommendation to test a new type of gasket in order to determine if the existing gasket type should be replaced. The committee had determined that the existing type of gasket was a likely cause of a chemical release incident. The task was assigned to an engineer and targeted for completion within a month. After three months, the safety committee leader asked the engineer if the evaluation was complete. The engineer responded that his priorities were changed and that he had not worked on it yet. What is the probable problem here and what can be done about it?
This is a typical symptom of recommendation resolution failure. A recommendation resolution system (RRS) should be considered. Although an RRS is not perfect, at least someone who is in control of needed resources (such as the engineer's time) will have responsibility for resolving recommendations.
Incident investigation and reporting is important to prevent future incidents and to comply with multiple EPA and OSHA regulations. To be effective, an incident investigation and reporting program must be well documented. All participants in the program should be trained. A system should be in place to resolve recommendations from incident investigations.
The following are sample forms and procedures that the student should review:
Sample Incident Investigation and Reporting Procedure
Sample Accident Investigation Form (blank)
Sample Incident Investigation Form (blank)
Sample Incident Investigation Form (completed)
Sample Final Incident Investigation Report