Manufacturing Downtime Cost Reduction with Predictive Maintenance

Manufacturers often contend with up to 800 hours of downtime annually, and 30% of facilities experienced an unplanned incident within just the first four months of 2013. For an average automotive manufacturer, total downtime cost (TDC) runs $22,000 per minute, which is more than $1.3M per hour. Predictive analytics can substantially reduce TDC, yet only 14% of the manufacturing industry is taking advantage of its big data, according to a recent survey from MESA.

Introduction

Predictive maintenance applies sophisticated machine learning techniques to equipment condition data collected in real time or near real time. It is now the new standard for reducing cost, risk, and lost production in manufacturing facilities. With accurate predictive maintenance tools, manufacturers can perform maintenance only when needed, avoid costly or dangerous unplanned downtime, and schedule repair and maintenance personnel and resources more efficiently [1], [2].

Interactive and easy-to-use, Arimo’s Predictive Apps allow manufacturers to quickly collect equipment condition data, perform sophisticated machine learning to predict needed maintenance events and operationalize the models.

In this post, we demonstrate the machine learning capabilities of Arimo's Predictive Apps on a hard disk drive condition-monitoring data set. Using best practices for feature engineering, up-sampling to handle rare failure events, and Bayesian optimization to tune model hyperparameters, we are able to capture 100% of failure events with a low number of false failure predictions. We also predict failures a day before they are likely to occur, providing enough time for critical decision making.

“Hard Drive Failure” Dataset

The hard drive failure data set is a time series obtained from [3]. It was collected in 2013 and 2014 and includes 17,673,915 events from 49,107 drives.

The columns in the data set are as follows:

  • Date: The date of the file in yyyy-mm-dd format.
  • Serial Number: The manufacturer-assigned serial number of the drive.
  • Model: The manufacturer-assigned model number of the drive.
  • Capacity: The drive capacity in bytes.
  • Failure: Contains a “0” if the drive is OK. Contains a “1” if this is the last day the drive was operational before failing.
  • SMART Stats: 80 columns of data containing the raw and normalized values of 40 different SMART statistics, each as reported by the given drive.
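
As a rough illustration of working with this data outside of Arimo's platform, the daily CSV files published with the data set can be combined into one table with pandas. This is a sketch only: the data/*.csv path pattern is an assumption, and the column names follow the data set's published schema.

    import glob
    import pandas as pd

    # One CSV per day of drive statistics; the path pattern is assumed.
    files = sorted(glob.glob("data/*.csv"))
    df = pd.concat((pd.read_csv(f, parse_dates=["date"]) for f in files),
                   ignore_index=True)

    # Sanity checks against the figures quoted above.
    print(len(df))                        # ~17.7M daily events
    print(df["serial_number"].nunique())  # ~49K distinct drives
    print(df["failure"].sum())            # total recorded failures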

Data Preprocessing

The hard drive failure data set has 85 columns that could be used in a model. However, most columns contain a large share of NA values or hold only a single value throughout. For a first cut of the model, we removed the columns with more than 30% NA values and those containing only a single value (a pandas sketch follows the list below). The remaining subset of columns includes:

  • serial
  • failure
  • temperature
  • datetime
  • reallocated_sectors_count
  • read_error_rate
  • current_pending_sector_count
  • power_on_hours
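
A minimal sketch of this filtering with pandas, assuming the combined data is in a DataFrame df (illustrative, not the actual Arimo pipeline):

    # Keep columns with at most 30% missing values...
    na_fraction = df.isna().mean()
    df = df.loc[:, na_fraction <= 0.30]

    # ...and drop columns that are constant (only a single distinct value).
    df = df.loc[:, df.nunique(dropna=True) > 1]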

Next, we compute an age column: the number of days a drive has been in service.

To simplify modeling, we also selected the most common drive model, the Hitachi HDS5C3030ALA630, for analysis. This subset of the hard drive failure data contains 2,625,514 events but only 52 failure events, roughly one failure per 50,000 records. Such a highly unbalanced data set makes it challenging to develop an effective predictive model for hard drive failure maintenance.

Other studies [4], [5] in the domain of hard drive failure prediction focus on predicting failures on the current day. We instead target failures on the next day, giving the operator more time to organize a response to a potential failure event.

To support predicting tomorrow's failures, we generated a new dependent column, tomorrow_failure, as the prediction target. Then, for each column in {temperature, age, reallocated_sectors_count, read_error_rate, current_pending_sector_count, power_on_hours}, we generated a corresponding column holding the previous day's value, denoted yesterday_X (e.g., yesterday_temperature, yesterday_power_on_hours).
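
A sketch of these steps using pandas group-wise shifts. The column names follow the post; the pandas implementation itself is illustrative, not Arimo's:

    # Order each drive's history chronologically before shifting.
    df = df.sort_values(["serial", "datetime"])

    # Age: days since the drive first appeared in the data.
    first_seen = df.groupby("serial")["datetime"].transform("min")
    df["age"] = (df["datetime"] - first_seen).dt.days

    # Target: does this drive fail tomorrow? (shift within each drive)
    df["tomorrow_failure"] = df.groupby("serial")["failure"].shift(-1)

    # Lagged copies of each feature: yesterday's value.
    sensors = ["temperature", "age", "reallocated_sectors_count",
               "read_error_rate", "current_pending_sector_count",
               "power_on_hours"]
    for col in sensors:
        df["yesterday_" + col] = df.groupby("serial")[col].shift(1)

    # Drop rows with no "tomorrow" or "yesterday" observation.
    df = df.dropna(subset=["tomorrow_failure"]
                          + ["yesterday_" + c for c in sensors])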

Finally, to address the class imbalance described above, we used up-sampling to balance the classes in the training data.
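
One standard way to up-sample, sketched with scikit-learn's resample. The post does not state the target class ratio, so the 1:1 balance below is an assumption; train denotes the training split described in the next section:

    import pandas as pd
    from sklearn.utils import resample

    majority = train[train["tomorrow_failure"] == 0]
    minority = train[train["tomorrow_failure"] == 1]

    # Draw the rare failure rows with replacement until the classes balance.
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=42)

    train_balanced = pd.concat([majority, minority_up])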

Dependent variable: tomorrow_failure

Independent variables (numbered 1-12, referenced in Table 2):

  1. temperature
  2. reallocated_sectors_count
  3. read_error_rate
  4. current_pending_sector_count
  5. power_on_hours
  6. age
  7. yesterday_temperature
  8. yesterday_reallocated_sectors_count
  9. yesterday_read_error_rate
  10. yesterday_current_pending_sector_count
  11. yesterday_power_on_hours
  12. yesterday_age

Table 1: Features used in predictive modeling.

Predictive Modeling

For this use case, we divided the preprocessed data set into two parts: an up-sampled training data set (Training DS) containing events prior to September 1, 2014, and a test data set (Test DS), without up-sampling, containing events from September 1, 2014 onward. The Training DS includes 42 failure events (before up-sampling) and the Test DS includes 10 failure events.
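
A sketch of the time-based split, applying the September 1, 2014 cutoff to the datetime column (illustrative):

    import pandas as pd

    cutoff = pd.Timestamp("2014-09-01")
    train = df[df["datetime"] < cutoff]    # later up-sampled, as above
    test = df[df["datetime"] >= cutoff]    # kept unbalanced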

Feature Selection

To select a suitable set of features (equivalently, variables) for learning, we compute the correlation matrix of the twelve features, shown in Table 2, and measure variable importance. Variable importance is computed with the arimo.variableImportance function of Arimo's Predictive Apps, which quantifies each variable's impact on a predictive model.

Table 2 shows significant correlations among the original features. The heatmap in Figure 2 visualizes the same numbers and reveals groups of correlated features, such as (5, 6, 11, 12), (4, 10), and (1, 7). Within each group of strongly correlated features, only the most impactful one is kept for model learning, using the arimo.variableImportance function in a two-step process: (a) select the best feature from each group, then (b) refine once more to retain only the most important features.

Using this technique, we finally select temperature, reallocated_sectors_count, and current_pending_sector_count (corresponding to feature numbers 1, 2 and 4) for learning tomorrow_failure.

Feature      1      2      3      4      5      6      7      8      9     10     11     12

   1      1.00  -0.01  -0.01   0.01  -0.23  -0.02   0.95  -0.01  -0.01   0.01  -0.23  -0.02
   2     -0.01   1.00   0.04   0.01   0.02   0.01  -0.01   1.00   0.04   0.01   0.02   0.01
   3     -0.01   0.04   1.00   0.13   0.02   0.01  -0.01   0.04   0.40   0.13   0.02   0.01
   4      0.01   0.01   0.13   1.00   0.01   0.01   0.00   0.01   0.13   1.00   0.01   0.00
   5     -0.23   0.02   0.02   0.01   1.00   0.60  -0.23   0.02   0.02   0.01   1.00   0.60
   6     -0.02   0.01   0.01   0.00   0.60   1.00  -0.02   0.01   0.01   0.00   0.60   1.00
   7      0.95  -0.01  -0.01   0.01  -0.23  -0.02   1.00  -0.01  -0.01   0.01  -0.23  -0.02
   8     -0.01   1.00   0.04   0.01   0.02   0.01  -0.01   1.00   0.04   0.01   0.02   0.01
   9     -0.01   0.04   0.40   0.13   0.02   0.01  -0.01   0.04   1.00   0.13   0.02   0.01
  10      0.01   0.01   0.13   1.00   0.01   0.00   0.01   0.01   0.13   1.00   0.01   0.00
  11     -0.23   0.02   0.02   0.01   1.00   0.60  -0.23   0.02   0.02   0.01   1.00   0.60
  12     -0.02   0.01   0.01   0.00   0.60   1.00  -0.02   0.01   0.01   0.00   0.60   1.00

Table 2: Correlations between features. Features are denoted by numbers (see Table 1).

Figure 2: Correlations between features, shown as a heatmap
(yellower cells indicate stronger correlation).
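
Outside the platform, an equivalent correlation matrix and heatmap can be produced with pandas and matplotlib. This reproduces Table 2 and Figure 2 in spirit only; it is not the arimo.variableImportance workflow:

    import matplotlib.pyplot as plt

    cols = ["temperature", "reallocated_sectors_count", "read_error_rate",
            "current_pending_sector_count", "power_on_hours", "age"]
    cols = cols + ["yesterday_" + c for c in cols]   # features 1-12

    corr = train[cols].corr()                  # Table 2
    plt.imshow(corr, cmap="viridis")           # Figure 2; yellow = high
    plt.colorbar(label="Pearson correlation")
    plt.xticks(range(12), range(1, 13))
    plt.yticks(range(12), range(1, 13))
    plt.show()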

Model Training

As described in our previous post, Bayesian Optimization [6] is used to find the best hyperparameter values. The objective function optimized during hyperparameter tuning is the following score, measured on the validation set:

S = alpha * FNR + (1 - alpha) * FPR

where FPR and FNR are the false positive and false negative rates obtained on the validation set. Our goal is to keep the false positive rate low, so we use alpha = 0.2, which places 80% of the weight on FPR. Because the validation set is highly unbalanced, we found that standard scores such as precision and F1 do not work well; using this custom score proved crucial to achieving good overall performance.
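
This score is straightforward to implement directly; a minimal version (illustrative, not Arimo's internal code):

    import numpy as np

    def custom_score(y_true, y_pred, alpha=0.2):
        """S = alpha * FNR + (1 - alpha) * FPR; lower is better."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        fpr = np.mean(y_pred[y_true == 0] == 1)  # false positive rate
        fnr = np.mean(y_pred[y_true == 1] == 0)  # false negative rate
        return alpha * fnr + (1 - alpha) * fpr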

Note that the score above is used only when running Bayesian Optimization. The logistic regression models themselves are trained with gradient descent on the usual ridge-regularized (L2) loss.
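
For readers outside Arimo's platform, here is a hedged approximation of the whole loop: scikit-learn's SGDClassifier with log loss and an L2 penalty is gradient-descent logistic regression with ridge regularization, and scikit-optimize's gp_minimize performs the Bayesian search. The four hyperparameter names mirror the post; the search ranges, the X_train/y_train/X_valid/y_valid arrays, and the library choice are all assumptions:

    from sklearn.linear_model import SGDClassifier
    from skopt import gp_minimize
    from skopt.space import Integer, Real

    def objective(params):
        ridge_lambda, learning_rate, max_iterations, scoring_threshold = params
        # loss="log_loss" requires scikit-learn >= 1.1 (older versions: "log").
        model = SGDClassifier(loss="log_loss", penalty="l2",
                              alpha=ridge_lambda, learning_rate="constant",
                              eta0=learning_rate, max_iter=max_iterations)
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_valid)[:, 1]
        y_pred = (proba >= scoring_threshold).astype(int)
        return custom_score(y_valid, y_pred, alpha=0.2)  # minimize S

    space = [Real(1e-4, 1.0, prior="log-uniform"),   # ridge_lambda
             Real(1e-4, 1e-1, prior="log-uniform"),  # learning_rate
             Integer(10, 200),                       # max_iterations
             Real(0.5, 0.9999)]                      # scoring_threshold
    result = gp_minimize(objective, space, n_calls=50, random_state=0)
    print(result.x)  # best hyperparameter values found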

Result

The best configuration of hyper-parameters found with Bayesian Optimization is:

  • ridge_lambda = 0.02789616
  • learning_rate = 0.001
  • max_iterations = 85
  • scoring_threshold = 0.9

This configuration of hyperparameters produces a false alarm rate of 1.3% while capturing 100% of the failure events in the test set.

If one wants to reduce the false alarm rate further, Arimo Predictive Apps also lets the user manually tune the scoring_threshold parameter while keeping the other parameters unchanged. The tuning results are given in Table 3.

scoring_threshold     % false alarm rate   % captured failure events
0.9                   1.3                  100
0.99                  1.1                  90
0.99999999            0.8                  80
0.999999999999999     0.6                  70

Table 3: Results of manually tuning the scoring_threshold parameter.
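
The same trade-off can be explored by re-thresholding held-out scores without retraining; a sketch, assuming test_proba holds the model's predicted failure probabilities on the Test DS and y_test the true labels:

    import numpy as np

    for threshold in [0.9, 0.99, 0.99999999, 0.999999999999999]:
        y_pred = (test_proba >= threshold).astype(int)
        false_alarms = 100 * np.mean(y_pred[y_test == 0] == 1)
        captured = 100 * np.mean(y_pred[y_test == 1] == 1)
        print(f"threshold={threshold}: {false_alarms:.1f}% false alarms, "
              f"{captured:.0f}% of failures captured")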

Conclusion

Predictive maintenance is a powerful technique for reducing cost and improving productivity in manufacturing operations. Arimo's Predictive Apps provide a comprehensive platform for ingesting machine condition data, developing predictive models, and delivering them to end users so that action can be taken when a problem is imminent.

References

  [1] Preventive and Predictive Maintenance, [online, March 2016].
  [2] Preventive vs. Predictive Maintenance: What You Should Know, [online, March 2016].
  [3] The Raw Hard Drive Data Set, [online, March 2016].
  [4] J.F. Murray, G.F. Hughes, K. Kreutz-Delgado, “Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application”, The Journal of Machine Learning Research, vol. 6, pp. 783-816, 2005.
  [5] E. Pinheiro, W.D. Weber, L.A. Barroso, “Failure Trends in a Large Disk Drive Population”, in Proceedings of the 5th USENIX Conference on File and Storage Technologies, pp. 17-29, 2007.
  [6] V. Pham, “Bayesian Optimization for Hyperparameter Tuning”, [online, March 2016].