Machine failure in manufacturing

Anyone who works in manufacturing knows machine malfunctions that shut a machine down and interrupt the production process. The negative consequences are equally familiar. They go beyond the directly visible problems such as reduced output, longer cycle times, scrap, material waste, and ultimately poorer machine utilization. The indirect effects in particular, namely reduced delivery reliability, higher costs, and dissatisfied customers, are strong reasons to address fault management thoroughly.

An Andon light indicates a machine malfunction.

Fault management creates transparency

Without disruption management, valuable information is lost: when a technical disruption occurred (early/late/day shift), which part, item, or tool was affected, and during which manufacturing step it happened. This information is usually known only to the employees working on the machine and is forgotten if it is not recorded. Yet it can help identify the underlying systematic cause of the disruption and derive measures to rectify it.

The first step: paper-based recording and subsequent analysis in Excel

Many companies initially attempt to capture disruptions, downtimes, and their respective reasons systematically on paper. In a subsequent step, the recorded machine failures and downtimes, including any error codes and reasons for the disruptions, are transferred to Excel. Ideally, this is done daily by an administrative staff member who copies the values from paper to Excel for further analysis. The disruptions can then be analyzed, for example, by absolute or relative frequency per cause, or by average duration. This analysis makes it possible to prioritize measures that address the most severe and urgent problems.
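The analysis by frequency and average duration described above can be sketched in a few lines of Python. The sample records and the `summarize` helper are illustrative assumptions, not data or tooling from a real plant; in practice the rows would come from the Excel sheet.

```python
from collections import defaultdict

# Hypothetical sample records: (cause, downtime in minutes).
records = [
    ("tool contaminated", 12.0),
    ("material jam", 4.0),
    ("tool contaminated", 8.0),
    ("sensor fault", 30.0),
    ("material jam", 6.0),
    ("tool contaminated", 10.0),
]


def summarize(records):
    """Return count, relative frequency, and mean duration per cause."""
    durations = defaultdict(list)
    for cause, minutes in records:
        durations[cause].append(minutes)
    total = len(records)
    summary = {
        cause: {
            "count": len(values),
            "share": len(values) / total,
            "mean_minutes": sum(values) / len(values),
        }
        for cause, values in durations.items()
    }
    # Most frequent causes first, so the top rows are the candidates for action.
    return dict(
        sorted(summary.items(), key=lambda item: item[1]["count"], reverse=True)
    )


summary = summarize(records)
for cause, stats in summary.items():
    print(
        f"{cause}: {stats['count']}x, {stats['share']:.0%} of all disruptions, "
        f"{stats['mean_minutes']:.1f} min on average"
    )
```

Sorting by count puts the most frequent cause first; sorting by total or mean duration instead would surface rare but long disruptions.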

Central advantages of this approach include the possibility of quick implementation, the ease of adjusting the type of recording via work instructions, and the ability to quickly obtain initial results. Excel also offers a certain level of flexibility, allowing a wider group of people to perform their own analyses with the data.

However, the disadvantages are numerous. Starting with data capture, it is common practice for the recording of technical disruptions to be overlooked in the heat of the moment. The machine and thus the process stops. In this situation, the primary imperative is to get the machine and process running again. Accordingly, the supervisor or maintenance department is informed, and an attempt is made to quickly isolate and solve the problem. There is often no time for detailed recording. If documentation of the malfunction occurs at all, the associated times, i.e., the start and end of the disruption, are unreliable, as they are roughly estimated afterward and not documented at the time of the disruption.

The temporal inaccuracy is tolerable to a certain extent, but it becomes a problem especially with systematically occurring micro disruptions, as they are often not recorded at all. Micro disruptions refer to brief process interruptions, e.g., of a few seconds, which may seem inconsequential on their own but, due to their frequency, can have significant impacts on the overall availability of the plant or machine.


Excursus: Impact of micro disruptions

To highlight the significance, let’s consider the following calculation example. Suppose there is an interruption of 15 seconds that occurs 10 times per hour, leading to a total disruption time of 2.5 minutes per operational hour. In a three-shift operation (24 hours), these interruptions add up to 60 minutes per day, and in a single-shift operation (8 hours), to 20 minutes. Over 365 assumed working days in three-shift operation, this amounts to an astonishing 21,900 minutes, or 45.63 eight-hour working days; in single-shift operation, it amounts to 7,300 minutes, or 15.21 eight-hour working days.

Thus, due to the seemingly “non-critical” 15-second interruptions, we lose 4.2% of our theoretical machine capacity. There is an urgent need for action here!
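The arithmetic of this example can be reproduced with a short Python sketch. The input values come from the example; the eight-hour shift length used to convert minutes into working days is an assumption, and the function name `yearly_loss` is illustrative.

```python
# Inputs from the example above; SHIFT_HOURS = 8 is an added assumption.
STOP_SECONDS = 15
STOPS_PER_HOUR = 10
WORKING_DAYS = 365
SHIFT_HOURS = 8


def yearly_loss(shifts_per_day):
    """Return lost minutes per day, lost minutes per year, and lost 8-hour days per year."""
    minutes_per_hour = STOP_SECONDS * STOPS_PER_HOUR / 60
    minutes_per_day = minutes_per_hour * SHIFT_HOURS * shifts_per_day
    minutes_per_year = minutes_per_day * WORKING_DAYS
    days_lost = minutes_per_year / 60 / SHIFT_HOURS
    return minutes_per_day, minutes_per_year, days_lost


# Fraction of every operating hour that is lost to the short stops.
capacity_share = STOP_SECONDS * STOPS_PER_HOUR / 3600

for label, shifts in (("three-shift", 3), ("single-shift", 1)):
    per_day, per_year, days_lost = yearly_loss(shifts)
    print(f"{label}: {per_day:.0f} min/day, {per_year:,.0f} min/year, {days_lost:.2f} days")
print(f"capacity lost: {capacity_share:.1%}")
```

The capacity share is independent of the shift model, because the stops scale with the operating hours.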

Reason code classes facilitate analysis

The classification of disruptions, i.e., the division into predefined “reasons”, significantly contributes to the success of data capture. When employees are left to describe disruptions in their own words in a paper-based recording system, the later analysis may face unstructured and, in the worst case, unanalyzable data.

For example, Employee 1 might record the reason for a disruption as “Machine disruption, tool contaminated,” and Employee 2 might note “Tool does not work.” Without asking both colleagues, the person analyzing the data cannot determine whether the two entries describe the same kind of disruption. Even if it could be clarified that the disruptions were identical, the differing texts would prevent meaningful grouping, for example in Excel, without additional effort. Both effects reduce the reliability of the data and make analysis almost impossible. As a result, the project is often quickly abandoned, or its benefit is limited to the most recent documentation per shift. Meaningful long-term analysis is generally not possible with such poor data quality.

One measure to improve this situation could be to create forms that include predefined categories or reasons for disruptions that supplement the free text description. Colleagues would be instructed to select at least one category and to add further information about the disruption and its resolution. This approach has the significant advantage that later analysis can be clustered by the selected categories. Often, this is sufficient to identify the biggest and most pressing problems (e.g., inadequate tool cleaning) and initiate measures to address them.
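A minimal Python sketch shows why predefined categories make grouping straightforward. The reason list, the machine names, and the `Disruption` record are hypothetical; a real form would use the plant's own categories.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical predefined reason codes printed on the form."""
    TOOL_CONTAMINATED = "Tool contaminated"
    TOOL_WORN = "Tool worn"
    MATERIAL_MISSING = "Material missing"
    OTHER = "Other"


@dataclass
class Disruption:
    machine: str
    reason: Reason   # mandatory category
    note: str = ""   # optional free text with further details


entries = [
    Disruption("Press 1", Reason.TOOL_CONTAMINATED, "Machine disruption, tool contaminated"),
    Disruption("Press 1", Reason.TOOL_CONTAMINATED, "Tool does not work"),
    Disruption("Press 2", Reason.MATERIAL_MISSING),
]

# Grouping works on the category, regardless of how the free text is worded.
counts = Counter(entry.reason for entry in entries)
for reason, count in counts.most_common():
    print(f"{reason.value}: {count}")
```

The two differently worded notes from the example above end up in the same group, because the grouping key is the selected category and not the free text.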

It can be done differently: digital capture and analysis

Anyone looking to avoid the aforementioned problems and systematically manage disruptions without additional effort should opt for a digital solution. Below, we will explain several “levels of expansion” available for such implementations.

Capturing and analyzing disruptions using digital forms or "No-Code"/"Low-Code" Apps

Getting started with digital disruption management can initially seem straightforward with form solutions such as Microsoft Forms/SharePoint, Google Forms, or no-code app builders. Simple proprietary database solutions like Microsoft Access or FileMaker Server are also popular. The data from these tools can then be analyzed in Excel or in BI solutions such as Power BI, Tableau, or similar tools.

The aforementioned capture solutions serve to create a standardized input interface through form fields, which can then be saved into a spreadsheet like Excel or directly into a relational database (e.g., MySQL) for subsequent analysis.
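As a rough illustration of saving form input into a relational database, the following sketch uses Python's built-in `sqlite3` module in place of a server database such as MySQL. The table layout, the reason list, and the `save_form` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, enough for a sketch
conn.execute(
    """
    CREATE TABLE disruption (
        id         INTEGER PRIMARY KEY,
        machine    TEXT NOT NULL,
        reason     TEXT NOT NULL
                   CHECK (reason IN ('tool contaminated', 'material missing', 'other')),
        started_at TEXT NOT NULL,
        ended_at   TEXT NOT NULL,
        note       TEXT
    )
    """
)


def save_form(conn, machine, reason, started_at, ended_at, note=""):
    """Store one submitted form; the CHECK constraint rejects unknown reasons."""
    conn.execute(
        "INSERT INTO disruption (machine, reason, started_at, ended_at, note) "
        "VALUES (?, ?, ?, ?, ?)",
        (machine, reason, started_at, ended_at, note),
    )
    conn.commit()


save_form(conn, "Press 1", "tool contaminated", "2024-01-08 06:15:00", "2024-01-08 06:27:00")
save_form(conn, "Press 1", "tool contaminated", "2024-01-08 09:40:00", "2024-01-08 09:48:00")
save_form(conn, "Press 2", "material missing", "2024-01-08 11:05:00", "2024-01-08 11:10:00")

# Free text instead of a predefined reason is rejected by the database.
rejected = False
try:
    save_form(conn, "Press 2", "Tool does not work", "2024-01-08 12:00:00", "2024-01-08 12:05:00")
except sqlite3.IntegrityError:
    rejected = True

# Incidents and total downtime in minutes per reason, most frequent first.
rows = conn.execute(
    """
    SELECT reason,
           COUNT(*) AS incidents,
           ROUND(SUM((julianday(ended_at) - julianday(started_at)) * 1440), 1) AS minutes
    FROM disruption
    GROUP BY reason
    ORDER BY incidents DESC
    """
).fetchall()
print(rows)
```

Enforcing the reason list in the database rather than only in the form keeps the data analyzable even if several input tools write to the same table.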

The advantage of such an approach is that it can be implemented in-house by technically skilled and qualified staff and works up to a certain point.

However, the limits of this approach are often quickly reached. These include issues with permission management (who has access to which data?) and linking with other data sources (for example, assigning disruptions to ERP order data). If no staff with a solid background in application development and databases are available, poorly integrated standalone solutions can arise that create a parallel master data structure. The problem is that those involved often do not notice this for lack of expertise; it only becomes apparent when certain requirements can no longer be met and the system “stagnates”. It can still be used in its existing state, but “that’s it.” By then, the self-built “tools” are so ingrained in operational processes that simply “turning them off” is not feasible either. This situation is regrettable but common.

Capturing and analyzing disruptions using dedicated MES software with process data collection

Currently, the most sustainable and future-proof alternative is a dedicated professional solution for Operational Data Collection and Machine Data Acquisition (MDA), ideally as part of an integrated Manufacturing Execution System (MES). Such a solution typically extracts data directly from the machines through interfaces like OPC UA. Via terminals, it also allows manual input of additional data that only the operator can provide, such as the actions taken to resolve an issue.
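How disruptions can be derived from machine signals may be sketched as follows. The example assumes that the machine interface delivers timestamped "running / not running" samples; the sample data and the `downtime_intervals` function are hypothetical and do not represent the API of OPC UA or of any specific MES.

```python
from datetime import datetime, timedelta


def downtime_intervals(samples):
    """Turn chronological (timestamp, running) samples into (start, end) stop intervals.

    A stop that is still open at the end of the data is not reported.
    """
    intervals = []
    stopped_since = None
    for timestamp, running in samples:
        if not running and stopped_since is None:
            stopped_since = timestamp          # machine has just stopped
        elif running and stopped_since is not None:
            intervals.append((stopped_since, timestamp))  # machine runs again
            stopped_since = None
    return intervals


# Hypothetical signal trace: one 15-second micro disruption, one 12-minute stop.
t0 = datetime(2024, 1, 8, 6, 0, 0)
samples = [
    (t0, True),
    (t0 + timedelta(seconds=10), False),
    (t0 + timedelta(seconds=25), True),
    (t0 + timedelta(minutes=5), False),
    (t0 + timedelta(minutes=17), True),
]

intervals = downtime_intervals(samples)
for start, end in intervals:
    print(f"stopped at {start:%H:%M:%S} for {(end - start).total_seconds():.0f} s")
```

Because start and end come from the signal itself, even stops of a few seconds are captured with their actual duration; the operator only needs to add the reason.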

Advantages of process data collection and MES

The key advantages include significantly better data quality compared to other approaches, as the data is extracted from the controls with second-by-second accuracy. Additionally, such software allows for the simple and straightforward setting of disruption cause categories, which can be continuously analyzed over any time horizon for frequency and duration to derive measures. The classic problem of not knowing whether measures implemented are effective is circumvented by continuously collecting the relevant data and metrics for a DMAIC process. This means it is possible to immediately understand changes in metrics and determine whether a measure has achieved the desired effect, such as a 25% reduction in incidents due to tool contamination.
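Checking whether a measure had the desired effect can be as simple as comparing incident counts before and after it was introduced. The weekly counts below are invented for illustration, and the comparison of means is a deliberately simple sketch, not a statistical test.

```python
from statistics import mean

# Hypothetical weekly counts of incidents caused by tool contamination.
before_measure = [40, 38, 42, 40]
after_measure = [31, 29, 30, 30]


def relative_change(before, after):
    """Relative change of the mean incident count; negative means fewer incidents."""
    baseline = mean(before)
    return (mean(after) - baseline) / baseline


change = relative_change(before_measure, after_measure)
print(f"change in weekly incidents: {change:+.0%}")
```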

Data capture is automated; disruptions are displayed on the shop floor layout; disruptions can be easily evaluated by frequency and duration.

Handling old machines

Even if direct data capture is not possible on individual machines, for example, because the machine dates back to the 1980s, data can still be collected. One option is retrofitting, which involves equipping the machine with sensors to capture operating conditions. In this case, automatic data capture, or MDA (Machine Data Acquisition), is possible just as with an existing interface. In the extremely rare cases where this option is technically unfeasible or not economically viable, data capture can be limited to operational data capture. In this scenario, the operator provides feedback manually, but within a tightly controlled input process with clearly defined categories to minimize input errors.

Unlocking additional potential

An MES with operational data collection and/or MDA always provides the primary advantage: captured data can be analyzed continuously and immediately. This opens up opportunities for internal continuous improvement processes beyond mere disruption management, such as displaying information on digital Andon boards and using the data in daily shop floor meetings.

Have we piqued your interest?

Contact us today. We are happy to discuss possible options for disruption management in your manufacturing process.


Fault Management

Faults are either recorded automatically by the machine or manually by tablet and stored, including the cause of the fault. In the event of a breakdown, employees can be notified automatically. In the web interface, faults over time can be visualized in terms of duration and reason.


Machine data collection

With flexible hardware solutions, we help you continuously capture, store, and provide relevant machine and sensor data for specific articles and orders.


Real-time capture of production data

Use tablets on machines or smartphones to record set-up times, machine downtime and its reasons, and good and bad parts in real time.