Look around you. There are so many opportunities to apply common sense reliability principles and gain many untapped benefits. Watch this 8 minute video showing a simple Maintenance Task Analysis (MTA) example then think about the possibilities in your plant. Interestingly, this example was ‘stumbled’ across during an RCM2 / MTA facilitator training session. You will see the value of getting Maintenance and Operations together to apply common sense strategies, in this case saving the company $11,000 annually. It’s a great learning tool, too. Just imagine what you could achieve coupling RCM on high risk assets along with MTA, starting with the ‘lowest hanging fruit’ in your plant.
Denis Marshment of Asset Dynamics Asia discusses how instrumentation should be treated in RCM analyses…
Process industries are increasingly reliant on Instrumentation. Instrumentation used in process control maintains product quality, reduces operating costs and helps to maximize production output by delivering instantaneous and highly accurate readings for adjustment of process parameters. Instrumentation also plays a vital role in ensuring that plants remain safe and meet environment regulations.
When instrumentation fails, the results can be disastrous. The BP Texas City Refinery explosion, which killed 15 workers, injured more than 170 others and cost BP billions in damages, was in large part caused by a failure of the instrument control and protection systems.
So is RCM the best place to analyze instrumentation assets to improve safety and reliability?
The uncertainty over this question has led some to remove the instrumentation assets from RCM analysis entirely and develop new methods such as IPF (Instrument Protected Functions). This may seem like a neat & tidy solution to the problem but it can also result in an RCM analysis detached from the process.
Our experience has been that instrumentation is as much a part of RCM analysis as rotating equipment and should most definitely be included in the analysis. The question remains then, how should instrumentation be treated in RCM2?
Let’s review the elements of the RCM2 process when applied to instruments… http://www.thealadonnetwork.com/PDFs/Instrumentation_RCM2_Denis_Marshment_ADA.pdf
Denis is a Director of Asset Dynamics Asia with over 14 years’ experience in mechanical engineering and management consulting and is a licensed practitioner of the Aladon RCM2 network. Prior to joining Asset Dynamics Asia, Denis was a principal consultant with Price Waterhouse Coopers where he led the Physical Asset Management Group in Asia. Denis has extensive experience in implementing reliability improvement initiatives in the oil & gas industry and has helped large multinationals in Asia and the Middle East make the transition from reactive to proactive maintenance through the application of leading technologies and methods.
To identify when maintenance is required, we need to define failure. The traditional view was that as equipment gets older, it is more likely to fail. The old definition of failure is when the equipment breaks down and is no longer operational.
However, studies have shown that the majority of failures are not age related.
In the new definition of failure, all equipment entering service immediately starts to wear, whether installed as new or brought back to new through repair. Equipment will eventually reach a point where it fails to meet the operating requirement. This failure point is not necessarily predictable – it could happen early on or after years of use.
If the equipment has no capability at all, it is in a totally failed state or breakdown state. If there is some capability, but the equipment is not meeting the desired level of performance, it is said to be in a functionally failed or partially failed state.
By conducting inspections of equipment condition on a regular basis, you can track early signs or indicators of a partial or functional failure long before it breaks down. By finding indicators of failure, maintenance can be targeted more accurately. When you look for indicators of failure, this is called conducting a condition inspection.
Let’s use an example. We have a pump that is required to supply between 100 and 130 gallons of water to the process. If it supplies any less than 100 gallons, the process will not operate properly. In the past, we defined failure as the point when the pump broke and no longer pumped any water at all. But most failures do not occur instantly. To track potential failures, we use indicators (such as tolerances, gauge readings or other visual or physical signs that equipment condition is deteriorating). Since the failure point is not necessarily related to age, indicators must be monitored on a regular basis. Let’s use a gauge reading as our indicator. The gauge shows that the pump is only pumping 105 gallons. Since this is the low end of what it is required to do, it is considered a potential failure, or point P on the curve. If the deterioration is not corrected, it will continue until the pump is pumping less than 100 gallons of water. The pump is still working, but not at the desired performance level – it has a functional failure. This is today’s definition of failure: the point where the asset fails to perform its intended function.
The amount of time that elapses between the detection of a potential failure and its deterioration to functional failure is known as the PF interval. If you properly define inspection tasks, you are able to detect failure long before it occurs and perform the corrective maintenance work when it will least impact operations.
Remember that if the potential failure (P) is not detected, the equipment will continue to deteriorate until it reaches functional failure (F). Once enough condition inspection data has been collected, you can calculate the PF interval and plan maintenance activities.
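The pump example can be sketched in a few lines of Python. This is only an illustration: the 100-gallon minimum comes from the example, but the 110-gallon warning threshold and the names `PumpState` and `classify_reading` are assumptions made here, since the text only says that a reading of 105 is at "the low end".

```python
from enum import Enum


class PumpState(Enum):
    NORMAL = "normal"
    POTENTIAL_FAILURE = "potential failure (P)"
    FUNCTIONAL_FAILURE = "functional failure (F)"
    TOTAL_FAILURE = "total failure (breakdown)"


REQUIRED_MIN_GALLONS = 100.0  # from the example: below this the process suffers
WARNING_GALLONS = 110.0       # assumed threshold for "the low end", not from the text


def classify_reading(gallons: float) -> PumpState:
    """Map a gauge reading onto the failure states described above."""
    if gallons <= 0:
        return PumpState.TOTAL_FAILURE       # no capability at all
    if gallons < REQUIRED_MIN_GALLONS:
        return PumpState.FUNCTIONAL_FAILURE  # running, but below requirement
    if gallons < WARNING_GALLONS:
        return PumpState.POTENTIAL_FAILURE   # still meets requirement, deteriorating
    return PumpState.NORMAL


for reading in (120, 105, 95, 0):
    print(reading, "->", classify_reading(reading).value)
```

In practice the warning threshold would be set by the people who know the asset, and the dated readings would be kept so that the time between P and F can be estimated.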
The reason to change any process, including manufacturing or maintenance, is the desire for improvement: to be better than your competition.
In today’s world, the main driving force for change is quality (of product, service, and work) and cost reduction. To improve quality or save money you need to introduce some sort of change; you need to modify, transform or completely move away from your current practices. We all know that in order to succeed in any change, the entire organization (from top to bottom) needs to understand the necessity for change and be dedicated and supportive in adopting it. The same applies within the Maintenance organization.
RCM is a positive change for the maintenance process –and inspection routes are one of the most important aspects to implement from RCM. Why?
If you are not observing equipment condition continuously –either with some sort of continuous monitoring system (Condition Based Maintenance) or with inspection routes, then your maintenance department will never move away from being reactive. It is virtually impossible to continuously monitor an entire system, or where possible, it can be very expensive. Many companies are now using a combination of both or using only inspection routes with Maintenance Management responsible for Inspector training and motivation, as well as route optimization. The reason for this is that the majority of RCM action plans call for human sense inspections that can only be achieved by inspections and routes.
Recording data during an inspection can be a painful process, especially if you’re printing out routes and entering data manually into the CMMS. But with the use of handheld computers (which Ivara offers), the entire process is automated. An Inspector enters his or her inspection data into the handheld and is immediately warned of any out-of-normal readings based on pre-set thresholds. When the route is done, all data is transferred wirelessly into Ivara EXP, providing Maintenance with a centralized view of all condition data from all sources. With this centralized view of essentially real-time data, Maintenance is empowered to make informed decisions, ensuring the right work is sent to the CMMS for completion.
With diligence to this proactive process, inspection routes catch equipment failures in early stages which results in saving money, time and effort. The whole game is about equipment uptime; elimination of unwanted long downtimes and costly repairs.
One of the common challenges faced when preparing to conduct RCM2 analyses is obtaining the commitment to dedicate resources for the duration of the analysis. Managers are always looking for ways to reduce the involvement their most valued resources will play in the RCM2 analysis. In today’s typically short staffed environment, the required resources are legitimately not available.
So how can we possibly conduct analyses on our most critical equipment when we can’t get the Subject Matter Experts to participate for the planned duration of the analysis? Is there any technique we can use to reduce the SME’s involvement? One possible answer to those questions is re-using RCM2 analysis content.
Admittedly, re-using, or templating, RCM2 content has been a contentious issue in the past due to concerns that the content will be applied without the requisite consideration given to operating context. However, with the appropriate guidelines in place, valuable time can be saved in the analysis phase through the re-use of existing RCM2 analysis content. The reusable content includes: Operating Context, Functions, Functional Failures, Failure Modes, Failure Effects, and Recommended Actions. Re-using content in no way absolves the RCM2 Facilitator from following the RCM2 process as diligently as if the new analysis were being conducted from scratch. However, the steps are slightly different. The following steps assume the analyses have been created in EXP Enterprise or EXP Professional and that RCM2 analysis content will be copied into a new RCM2 analysis. At a high level the process is as follows:
Develop the Operating Context – The Operating Context from the source analysis will form the basis of the Operating Context for the new analysis, with the differences of the new system, if any, added to the new Operating Context.
Copy the re-useable content – RCM2 analyses will for the most part be copied to RCM2 analyses. This allows the leveraging of all the information including Functions, Functional Failures, Failure Modes, Failure Effects and Recommended Actions.
Review the copied analysis with the area SMEs – The analysis meetings will consist of a review of the Operating Context with the SMEs (same team members as a regular analysis), a review of the Functions and Functional Failures (adding or changing any due to system differences), and then a review of the FMs and FEs as though a technical audit were being conducted (and again modifying any information as required due to system differences). If the team deems that the new asset is too different from the old one, then the review must be moved to a full-fledged RCM2 analysis.
Present the findings to Management – The management report and presentation will be based on that from the source analysis with the differences of the new system being accounted for. The management presentation will be a review of the differences, unless the copied analysis originated from an area that had a different management team, in which case the entire analysis will be reviewed with the management team.
By following these steps, experience has shown that SME involvement can be significantly reduced (up to 70%), while maintaining a high quality analysis.
What is the cost of Failure in Maintenance? Let’s take the example of a lost-time injury. The number 1 cause of lost-time injury is ‘slips, trips and falls’.
- Slips – were there oil/steam/water leaks?
- Trips – were there improperly stored hoses?
- Falls – was the person rushing through the job?
What is the cost of an environmental discharge?
- Cost of clean-up?
What is the cost of poor quality?
- Loss of customer,
What is the cost of poor productivity?
- Loss of work?
Now let’s think about Program Costs
No matter which program we discuss, there is a significant investment made by the organization through research, training, registration, external audits, materials, etc.
The problem isn’t the cost of the implementation, since it is either required or desired; it is the wasted effort, since the program becomes an administrative nightmare of paper audits, Corrective Action Reports, meeting minutes and action plans, projects and meetings. It is estimated that as much as 60% of these programs are not sustained. As an example, let’s take a simple 5S program.
As a Maintenance Supervisor or Manager, I have had to participate in a number of 5S initiatives, mostly, in my role, installing hangers and brackets and painting shadows on walls. This never bothered me until I would see only the shadows of shovels, never any shovels, just painted shadows. This program was a significant resource drain on my department, and the effort seemed to have been wasted, so I started to audit the system and issue corrective actions to the department leaders, and suddenly I started seeing shovels, not just shadows. The audit and follow-up created sustainability of the effort, but this was only 1 program (of many), and the audit effort was administratively heavy. Plus it was in addition to H&S audits, Insurance audits, Quality Assurance audits, Fire System audits, Maintenance Inspections, etc. Additionally, all of these programs had their own meetings, identification processes, action plans, minutes, corrective action reports, etc., most of which impacted the maintenance department: copying corrective action plans into work orders, reviewing audit reports (internal & external), and creating work orders to address the identified issues.
Just a note, I haven’t mentioned maintaining the assets yet.
All of the above, whether H&S, environmental, quality, productivity or maintenance, have some things in common: the Operating Context, Function, Performance Criteria, noncompliance with the performance criteria, and corrective actions; i.e., why is the asset/system there and what do we need it to do?
Incorporating all aspects of Industrial performance into the Operating Context and Function Statement will assure us that we are covering all these separate performance criteria in one place at one time. A consolidated Function Statement may look like this;
- To pump a minimum of 80 gpm of 60% Phosphoric Acid at 100 psi, 3.86 mPa, and 60 degree C with 100% containment, in compliance with Safety and Environmental legislation.
A number of important points are ingrained in this statement;
Productivity – pump must produce a minimum flow of 80 GPM
- What will Operations and Maintenance need to do to achieve this?
- What audit processes are in place?
Quality – Product must be maintained at 3.86 mPa and 60 Degree C
- What audit processes are in place?
Safety – 100% containment & in compliance with regulatory standards, in this case, O.H.S.A
- What audit processes are in place?
Environmental – 100% Containment & in compliance with legislation
- What audit processes are in place?
Additionally, this statement provides us with other pearls of wisdom, the materials we will need to use and the level of diligence we will require in maintaining the system. If we are pumping Phosphoric acid, we will need Stainless Steel materials and special seals. Since we need to maintain zero leakage, we would likely decide on welded stainless steel tubing over a threaded pipe system. Additionally, it provides us some insight into what type of maintenance we will be performing, so if we do not allow leakage, we will be inspecting for cracks in the tube connections, wear on seals and assure that a full containment system is in place. Obviously, if the fluid was water, we would adjust our Function Statement, which in turn would adjust our level of compliance and diligence to H&S and the environmental concerns, however the production and quality requirements may remain the same.
We need to track all these potential failure points to provide a safe, environmentally-compliant, quality-driven, and cost-effective production process.
A valuable part of RCM is determining if a task is Worth Doing and Technically Feasible (see RCM2 Decision Diagram)
Consider Health and Safety regulator O.H.S.A with these points to ponder;
- “Obligation of the employer and supervisor…”
- “Appoint competent persons…”
- “Due diligence……”
- “Take every precaution reasonable…for the protection of the worker……”
- “Know of any actual or potential hazards……”
RCM2 answers with:
- Identify Operating Context – obligation
- Understand the Function – due diligence
- Identify all reasonably likely failure modes – due diligence
- Determine FM management strategy based on consequences – every precaution reasonable
- Identify FM management strategies which would include both skill and procedure requirements – competent persons
- Identify Hidden failures – actual or potential hazards
Implement RCM2 on your assets with the greatest business risk to your organization from a safety, environmental, quality, productivity perspective, to prevent the consequences of noncompliance (to all organizations, including your own).
Implement FMEA on the balance, with a focus on the Operating Context and Function of the asset, and centralize this work identification process across all of the audit processes.
As we all know the P-F interval is the only viable method to accurately determine an inspection interval for random failures. A good starting point is to use the half P-F as your inspection frequency as this will buy you at least half the P-F interval from which to take action. Unfortunately many of us stop there.
In order to identify the minimum work necessary to ensure that the equipment does what the user wants it to do in its present operating context we need to look to the Nett P-F. The Nett P-F is the minimum interval likely to elapse between the discovery of a potential failure and the occurrence of the functional failure. Let’s look at a couple of examples:
- Let’s say that the group determines that the P-F interval is 4 months. Actually they probably said 4 to 5 months and you conservatively used 4 months. The half P-F approach would dictate that you inspect every 2 months, or 6 times per year. If you need 1 month to plan, schedule and execute the work, including ordering in parts (the Nett P-F), practical wisdom says that you only need to inspect every 3 months, as this will buy you the 1 month you need to take proactive action ((P-F) 4 months – (inspection) 3 months). In both cases the consequences of failure are mitigated, but with the half P-F approach we have 6 inspections per year while with the Nett P-F approach we have 4 inspections per year: a 33% reduction in inspection costs.
- In another situation the group determines that the P-F interval is 8 weeks. The half P-F approach would dictate an inspection every 4 weeks. This means that we have at least 4 weeks between the discovery of the potential failure and the occurrence of the functional failure. This is a paper machine and it is subjected to a scheduled shutdown every 5 weeks. Therefore, if the potential failure is discovered shortly after the last shutdown, failure may occur before the next shutdown. So the answer to the question “Is the P-F interval long enough to be of any use?” is no. What if we were to conduct an inspection every 3 weeks? The minimum interval likely to elapse between the discovery of a potential failure and the occurrence of the functional failure would then be 5 weeks: our shutdown frequency. The answer to the previous question would then become yes.
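The arithmetic behind these two examples can be checked with a short sketch. The function names are illustrative only, and the calculation assumes the worst case used in the examples: the potential failure becomes detectable just after an inspection, so it is found one full inspection interval later.

```python
def nett_pf(pf_interval: float, inspection_interval: float) -> float:
    """Worst-case time left to act once a potential failure is found."""
    return pf_interval - inspection_interval


def longest_inspection_interval(pf_interval: float, lead_time: float) -> float:
    """Longest inspection interval that still leaves `lead_time` to act."""
    interval = pf_interval - lead_time
    if interval <= 0:
        raise ValueError("P-F interval is too short to be of any use")
    return interval


def inspections_per_year(interval_months: float) -> float:
    return 12 / interval_months


# Example 1: P-F = 4 months, 1 month needed to plan, schedule and execute.
half_pf_interval = 4 / 2
nett_interval = longest_inspection_interval(4, 1)
saving = 1 - inspections_per_year(nett_interval) / inspections_per_year(half_pf_interval)
print(f"half P-F: every {half_pf_interval:g} months, "
      f"Nett P-F: every {nett_interval:g} months, saving {saving:.0%}")

# Example 2: P-F = 8 weeks, shutdown every 5 weeks.
print("Nett P-F at 4-week inspections:", nett_pf(8, 4), "weeks")
print("Nett P-F at 3-week inspections:", nett_pf(8, 3), "weeks")
```

The 33% saving assumes inspection cost is proportional to the number of inspections, which is how the first example treats it.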
A final point: our objective is always to select a task interval that allows the plant to plan, schedule and execute the work following best-in-class practices. Unfortunately this cannot always be achieved. An acceptable alternative is to select an interval that allows the plant to take action in a controlled manner, such as break-in work. We may not be able to fully mitigate the consequences of failure, but it may be sufficient to justify the task.
Efficient process = safe, environmentally friendly, quality driven, cost effective.
If any one of these aspects of our industrial environment is performing poorly, we impact most if not all other areas. So we have the programs, OHSA, ISO 9000, 14001, 18000, RCM, Lean, etc. For example:
- Oil leaks are not only wasteful, but could impact the environment and safety.
- Air leaks are not only wasteful, but could impact quality and productivity.
- Steam leaks are not only wasteful, but could impact safety, quality and productivity.
The trick is to prioritize, optimize and sustain the effort expended to correct current issues. This requires ongoing, on-condition inspections: monitoring the state of the components and receiving an indicator or alarm that the function of a component is starting to fail or has just failed.
To know when a component has failed, we need to identify the Performance Criteria, or what we need out of the component/system. For example, if we decide that our performance criterion allows air leaks, we will inspect air hoses for leaks and that is what we will find. The cost of the air leak? We incur the lost $ due to the leak and the cost of the replacement hose and labor to replace. However, if we decide that our performance criterion is zero leaks, we will look for cracks/brittleness, bulging/blistering, outer cover wear, etc. Once these indicators are severe enough we will replace the hose, incurring only the cost of the hose and the labor, but avoiding the cost of the lost air.
This is the RCM2 approach to Reliability, but it is also the ISO14001 approach to environmental compliance as well as OHSAS18001 approach to Health and Safety. Each identifies the context in which the program exists, the purpose or Function, including performance requirements, the ways in which the system could fail, as well as what is to be done to prevent the consequences of that failure, what is to be done to assure that the consequences are managed appropriately and a method to assure that the system is sustained.
So call it what you will, Lean, ISO, QS, etc.; just make sure that you don’t miss the opportunity to address all of these aspects at once. RCM2 (SAE JA1011/12) will assure that all aspects of performance are addressed in the same methodology and with the same resources.
So, how many Teams do you have?
… Safety Team, Quality Team, Continuous Improvement Team, Operational Improvement Team, Lean Team, Kaizen Team, TPM Team, 5S Team, 6 sigma Team, RCA Team, Maintenance Improvement Team…
And… how many meetings a week? What do the Teams work on? Process/content development or Process/content optimization/improvement? It has been said by many folk, that “RCM takes too many resources and too much time”. RCM to me is compliance to the SAE JA1011 standard.
Personally, I believe it would save time and resources, if it is well understood, which it is not. In our industrial maintenance lives, we end up attending an abundance of various team meetings, and Maintenance is quite likely to have a number of tasks to be executed as a result of the outputs of these team meetings. Likely, we will need to tie up a couple of resources for each of the teams weekly, whether to attend the meetings or deal with the outcome of the meetings.
For instance, let’s say that there was a quality issue, where foreign material entered the product, causing a quality defect. The quality team would identify the issue, the RCA team would analyze the issue and the maintenance team would end up with one or more work orders to execute. Simple or complex, there are likely 3 man-days of effort to identify, analyze and execute the corrective action for this 1 failure mode.
In an RCM2 analysis, a rookie practitioner should be capable of 5 failure modes per hour. So not only would the RCM2 analysis identify and recommend a preventative action, but it would also identify and recommend actions for 4 additional failure modes, with the same RCM Analysis Team. While the RCM2 team was identifying these 4 additional failure modes, they would be determining whether each task is 1) technically feasible and 2) worth doing. Only tasks which are worth doing are acted upon in an RCM analysis, so the tasks are already lean when they come out of the analysis. Certainly there would be opportunity for improvement and optimization, which should be the focus of improvement teams, not identification/development of the issue.
Quality, in an RCM analysis, is part of the Operating Context and Asset Function. The Operating Context of an RCM2 analysis is important because it provides all the information regarding the scope and level the RCM analysis will deal with. An appropriate Function Statement would identify the quality control parameters or quality performance criteria: “To produce 1000 widgets an hour, with a reject rate of < 1%.” One of the Functional Failures would then be “Unable to produce widgets with a reject rate < 1%.” This would then lead the RCM Team to analyze how this could happen, and identify all the FM’s, FE’s and recommended actions to prevent or minimize the risk and consequences of these failure modes occurring in the first place. With one additional step, a corrective action or management strategy could also be developed so there are fewer surprises if the FM were to occur. And one additional bonus with an RCM analysis: it is done before the issue has occurred. We don’t need to write up the issue on a Quality issue form, we don’t have to write a letter to the client who identified the quality reject, we don’t have to attend a Quality Improvement Team meeting to communicate what will be done to prevent this one FM from occurring again, and finally, we don’t need to tie up resources reactively to address the one corrective action. It is estimated that it takes 6-10 times the resources to react than it does to prevent or predict the occurrence.
A diligent and concise RCM2 analysis will identify all FM’s which are reasonably likely to occur, as long as the performance criteria (want/need) are identified in the Operating Context and Function of the asset.
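To make the link between a Function Statement and its Functional Failures concrete, here is a minimal sketch based on the widget example. The class and function names are illustrative, and the throughput failure is an inference from the function statement; only the reject-rate failure is spelled out in the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WidgetFunction:
    """Performance criteria from: 'To produce 1000 widgets an hour,
    with a reject rate of < 1%.'"""
    min_widgets_per_hour: float = 1000.0
    max_reject_rate: float = 0.01  # the reject rate must stay below this


def functional_failures(fn: WidgetFunction,
                        widgets_per_hour: float,
                        rejects_per_hour: float) -> List[str]:
    """List the functional failures present in one hour of production."""
    failures = []
    if widgets_per_hour < fn.min_widgets_per_hour:
        # Inferred from the function statement, not stated in the example.
        failures.append("Unable to produce 1000 widgets an hour")
    if widgets_per_hour > 0:
        reject_rate = rejects_per_hour / widgets_per_hour
        if reject_rate >= fn.max_reject_rate:
            failures.append("Unable to produce widgets with a reject rate < 1%")
    return failures


fn = WidgetFunction()
print(functional_failures(fn, 1000, 5))   # meets both criteria
print(functional_failures(fn, 1000, 20))  # too many rejects
print(functional_failures(fn, 900, 0))    # too few widgets
```

Each functional failure returned here is a starting point for the team to ask how it could happen, which is where the FM’s and FE’s come from.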
Most of the Safety Teams I have participated on dealt with correcting issues which caused an injury or a near-miss incident. An RCM2 analysis will not only identify which FM’s could cause a safety incident, but also identify any hidden FM’s which are not evident to the Operating Crew on their own. This is significant: hidden failures abound in industry, and no other tool that I have seen leads the group to identify these hidden FM’s and provides a tool to assure, based on probability and historical failure rate, that the component/system will be inspected at a frequency which will minimize the risk potential for the safety incident.
How much time did the last safety incident take? 8-10 hrs? 3-4 days?? Again the RCM analysis would identify ~ 5-8 FM’s per hour.
If an organization has a TPM environment, this would be identified in the Operating Context, and to that point, the RCM practitioner and the RCM Team would understand that tasks identified in the analysis, may be assigned to the operations group. No separate Team is required to identify the tasks which would be operational, but would be needed to identify training programs, and improvement of the activities and training plan for the individuals.
All of this and a maintenance program too, focused on preventing/ managing the consequences of a functional failure.
Too much time and too many resources? If you are looking to streamline your operation, look for a means which will identify all potential FM’s, address all the performance criteria and, through the development of a proper function statement, address all aspects of an asset’s performance and the associated tasks needed to maintain a safe, environmentally friendly, quality-driven, leanly produced product. Look to the diligence of an RCM2 analysis.