TSML (Time Series Machine Learning)

Over the past years, the industrial sector has seen many innovations brought about by automation. Inherent in this automation is the installation of sensor networks for status monitoring and data collection. One of the major challenges in these data-rich environments is how to extract and exploit information from these large volumes of data to detect anomalies, discover patterns to reduce downtime and manufacturing errors, reduce energy usage, predict faults/failures, derive effective maintenance schedules, etc. To address these issues, we developed TSML. Its technology is based on using a pipeline of lightweight filters as building blocks to process huge amounts of industrial time series data in parallel.


Introduction
TSML [5] is a Julia [1] package for time series data processing, classification, and prediction. It provides a common API for ML (Machine Learning) libraries from Python's Scikit-Learn, R's Caret, and native Julia MLs, enabling seamless integration of heterogeneous libraries to create complex ensembles for robust time series prediction, clustering, and classification. TSML has the following features: (1) data type clustering/classification for automatic data discovery; (2) aggregation based on date/time interval; (3) imputation based on symmetric nearest neighbors; (4) statistical metrics for data quality assessment and classification input features; (5) ML wrappers for more than 100 libraries from Caret, Scikit-Learn, and Julia; (6) date/value matrix conversion of 1-D time series using sliding windows to generate features for ML prediction; (7) a pipeline API for high-level description of the processing workflow; (8) specific cleaning/normalization workflows based on data type; (9) automatic selection of an optimised ML model; (10) automatic segmentation of time series data into matrix form for ML training and prediction; (11) an extensible architecture using just two main interfaces: fit and transform; (12) meta-ensembles for automatic feature and model selection; and (13) support for distributed/threaded computation for scalability and speed.

The TSML package assumes a two-column input for any time series data, composed of dates and values. The first part of the workflow aggregates values based on the specified date/time interval, which minimizes the occurrence of missing values and noise. The aggregated data is then left-joined to the complete sequence of dates in the specified date/time interval. Remaining missing values are replaced by the median/mean or a user-defined aggregation function of the k-nearest neighbors (k-NN), where k is the symmetric distance from the location of the missing value. This step can be applied several times until there are no more missing values.
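The symmetric k-NN imputation step just described can be sketched as follows. This is a simplified, hypothetical stand-in for TSML's imputation filter (here using the mean of up to k neighbors on each side of a missing value), not the actual implementation:

```julia
# Simplified sketch of symmetric nearest-neighbour imputation:
# a missing value at index i is replaced by the mean of the
# non-missing values among its k neighbours on each side.
function knnimpute!(v::Vector{Union{Missing,Float64}}, k::Int)
    for i in eachindex(v)
        if ismissing(v[i])
            lo, hi = max(1, i - k), min(length(v), i + k)
            neighbours = collect(skipmissing(v[lo:hi]))
            isempty(neighbours) || (v[i] = sum(neighbours) / length(neighbours))
        end
    end
    return v
end
```

As described in the text, repeated passes of this operation eventually fill all remaining gaps.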
For prediction tasks, TSML extracts the date features and converts the value column into matrix form, parameterized by the size and stride of the sliding window. The final part joins the date features and the value matrix to serve as input to the ML model, with the output representing the values of the time periods to be predicted ahead of time.
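The windowing step can be sketched as below; `slidingmatrix` and its argument names are illustrative, not TSML's actual API:

```julia
# Illustrative sketch: convert a 1-D value series into a matrix using a
# sliding window of a given size and stride; each row becomes one ML sample.
function slidingmatrix(v::AbstractVector, window::Int, stride::Int)
    starts = 1:stride:(length(v) - window + 1)
    rows = [v[i:i+window-1] for i in starts]
    return permutedims(reduce(hcat, rows))
end
```

For example, a series of 10 values with window 4 and stride 2 yields a 4x4 matrix whose first row holds the first four values.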
TSML uses a pipeline which iteratively calls the fit! and transform! families of functions relying on multiple dispatch to dynamically select the correct algorithm from the steps outlined above. Machine learning functions in TSML are wrappers to the corresponding Scikit-Learn, Caret, and native Julia ML libraries.
Hundreds of classifiers and regression functions are available through TSML's common API.

TSML Workflow
All major data processing types in TSML are subtypes of the Transformer. There are two major types of transformers, namely: filters for data processing and learners for machine learning. Both transformers implement the fit! and transform! multi-dispatch functions. All filters are direct subtypes of the Transformer while all learners are subtypes of the TSLearner. The TSLearner is a direct subtype of the Transformer.
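The hierarchy just described can be summarised in a few lines of Julia; the concrete filter and learner names here are placeholders, not actual TSML types:

```julia
# Placeholder sketch of the TSML type hierarchy: filters subtype
# Transformer directly, while learners subtype TSLearner <: Transformer.
abstract type Transformer end
abstract type TSLearner <: Transformer end

struct ExampleFilter  <: Transformer end   # a filter (hypothetical)
struct ExampleLearner <: TSLearner   end   # a learner (hypothetical)
```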
Filters are normally used for pre-processing tasks such as imputation, normalization, feature extraction, feature transformation, scaling, etc. Consequently, filters' fit! and transform! functions expect one argument, which represents the input data for feature extraction or transformation. Each data type must implement fit! and transform!, although in some cases only the transform! operation is needed. For instance, square root or log filters do not require any initial computation of parameters to transform their inputs. On the other hand, feature transformations such as scaling, normalization, PCA, ICA, etc. require initial computation of certain parameters on their input before applying the transformation to new datasets. In these cases, the initial computation of these parameters is performed by the fit! function while their application to new datasets is done by the transform! function.
Learners, on the other hand, expect two arguments (input and output) and require a training cycle to optimize their parameters for optimal input-output mapping. The training part is handled by the fit! function while the prediction part is handled by the transform! function.
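The fit!/transform! contract for filters might be sketched like this; the structs and field names are illustrative simplifications, not TSML's actual implementations:

```julia
# Illustrative stateful filter: scaling must learn its parameters via fit!
# before transform! can be applied to new data.
mutable struct Scaler
    mean::Float64
    std::Float64
    Scaler() = new(0.0, 1.0)
end

function fit!(s::Scaler, x::Vector{Float64})
    s.mean = sum(x) / length(x)
    s.std  = sqrt(sum((x .- s.mean) .^ 2) / length(x))
    return s
end

transform!(s::Scaler, x::Vector{Float64}) = (x .- s.mean) ./ s.std

# Illustrative stateless filter: a log transform needs no fit! computation.
struct LogFilter end
fit!(::LogFilter, x) = nothing
transform!(::LogFilter, x) = log.(x)
```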
The TSML workflow borrows the idea of the Unix pipeline [3,4]. The Pipeline data type is also a subtype of the Transformer and expects two arguments: input and output. The main elements in a TSML pipeline are a series of transformers, each performing one specific task and doing it well. The series of filters performs pre-processing of the input, while a machine learner at the end of the pipeline learns the input-output mapping. From the perspective of using a Pipeline whose last component is a machine learner, the fit! function is the training phase while the transform! function is the prediction or classification phase.
The fit! function in the Pipeline iteratively calls the fit! and transform! functions in a series of transformers. If the last transformer in the pipeline is a learner, the last transformed output from a series of filters will be used as input features for the fit! or training phase of the said learner.
During the prediction task, the transform! function in the Pipeline iteratively calls the transform! operations in each filter and learner. The transform operation is direct application of the parameters computed during normalization, scaling, training, etc. to the new data. If the last element in the pipeline during transform is a learner, it performs prediction or classification. Otherwise, the transform operation acts as a feature extractor if they are composed of filters only.
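Schematically, the pipeline's fit! loop can be written as below. `pipeline_fit!` is a simplified stand-in for the real Pipeline internals, covering filters only (a trailing learner would additionally receive the output argument in its fit!); the toy filters are hypothetical:

```julia
# Simplified stand-in for Pipeline's fit!: each transformer is fitted,
# then its transformed output feeds the next transformer in the chain.
function pipeline_fit!(transformers, input)
    current = input
    for tr in transformers
        fit!(tr, current)
        current = transform!(tr, current)
    end
    return current   # final extracted features
end

# Two toy filters to exercise the loop (hypothetical, not TSML types).
struct SqrtFilter end
fit!(::SqrtFilter, x) = nothing
transform!(::SqrtFilter, x) = sqrt.(x)

struct DoubleFilter end
fit!(::DoubleFilter, x) = nothing
transform!(::DoubleFilter, x) = 2 .* x
```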
To illustrate, the following describes the main steps in using TSML.

We can then set up a pipeline containing these filters to process the CSV data by aggregating the time series hourly and checking the data quality using the Statifier filter (Fig. 1).

```julia
apipeline = Pipeline(Dict(:transformers => [csvfilter, valgator, stfier]))
fit!(apipeline)
mystats = transform!(apipeline)
@show mystats
```

As mentioned previously, fit! and transform! in the pipeline iteratively call the corresponding fit! and transform! within each filter. This common API, relying on Julia's multiple-dispatch mechanism, greatly simplifies the implementation, operation, and understanding of the entire workflow. In addition, extending TSML's functionality is just a matter of creating a new data type filter and defining its own fit! and transform! functions.
In the Statifier filter result, blocks of missing data are indicated by column names starting with b. Running the code produces the result in Fig. 2, which shows NaN for all missing-data statistics columns because the set of missing blocks is now empty.
We can also visualise our time series data using the Plotter filter instead of the Statifier as shown in

Processing Monotonic Time Series
This subsection discusses additional filters to handle monotonic data, which are commonly produced by energy/water meters and footfall sensors. In the former case, the time series is strictly monotonically increasing, while in the latter case the monotonicity holds only within each day. The presence of outliers due to random errors during meter reading becomes obvious after the normalisation. To remedy this issue, we add the Outliernicer filter, which detects outliers and replaces them using the k-NN imputation technique used by the DateValNNer filter.

Processing Daily Monotonic Time Series
We follow a similar workflow to the previous subsection to normalize daily monotonic time series. First, let us visualize the original data after aggregation and imputation.

Time Series Classification
We can use the knowledge we gained in setting up the TSML pipeline of filters and machine learners to build higher-level operations that solve specific industrial problems. One major problem we consider relevant, because it is a common issue in IoT (Internet of Things), is time series classification. This problem is prevalent nowadays due to the increasing use of sensors to monitor the status of different aspects of industrial operations and the maintenance of cars, buildings, hospitals, supermarkets, homes, and cities.
Rapid deployment of these sensors results in many of them not being properly labeled or classified. Time series classification is a significant first step for optimal prediction and anomaly detection. Identifying the correct sensor data type helps in choosing the optimal prediction model for actuation or pre-emption to minimise waste in resource utilisation. To successfully perform the latter operations, it is necessary to first identify the time series type so that the appropriate model and cleaning routines can be selected for optimal model performance. The TSClassifier filter aims to address this problem, and its usage is described below.
First, we set up the locations of files for training, testing, and saving the model. Next, we start the training phase by calling fit!, which loads the files in the training directory and learns the mapping between their statistical features, extracted by Statifier, and their types, indicated by a substring in their filenames. Once the training is done, the final model is saved in the model directory, to be used for testing accuracy and classifying new time series datasets.
The code below initialises the TSClassifier with the locations of the training, testing, and model repositories. Training is carried out by the fit! function, which extracts the statistical features of the training data and saves them as a dataframe to be processed by the

Extending TSML with Scikit-Learn and Caret
In the latest TSML version (2.3.4 and above), we refactored the base TSML to include only pure Julia code and moved the external libraries and binary dependencies into the TSMLextra package. One major reason is to have a smaller code base that can be easily maintained and rapidly deployed in a dockerized solution for Kubernetes or IBM's OpenShift clusters. Moreover, a smaller code base makes static compilation faster and yields a smaller docker image for cloud deployment.
There are cases where the main task of time series classification requires a more complex ensemble model with a hierarchical or tree structure, whose members are heterogeneous ML learners derived from binaries in different languages. For illustration purposes, we will show how to ensemble ML libraries from Scikit-Learn and Caret using TSML meta-ensembles that support the fit! and transform! APIs.

Parallel TSML Using Distributed Workflow
We will use Julia's built-in support for parallelism through the Distributed standard library. We also let Julia detect the number of processors available and activate them using the following statements:
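The original listing is not reproduced here; the setup typically looks like the sketch below, which adds one worker per available CPU core:

```julia
using Distributed

# Add one worker per logical CPU core, but only if none have been added yet.
nprocs() == 1 && addprocs(Sys.CPU_THREADS)
@show nworkers()
```

Function definitions intended for the workers are then prefixed with the `@everywhere` macro, as discussed in the threaded-workflow section below.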

Finally, we set up the parallelmodel function to run different learners distributed to different workers, relying on Julia's native support for parallelism. Note that there are two levels of parallelism in the code: the first is the distribution of tasks across different trials, and the second is the distribution of tasks among the different models within each trial. It is interesting to note that, with this relatively compact function definition, the Julia language makes it easy to define a parallel task within another parallel task in a straightforward manner.

The data used in the experiment are sample snapshots of the data in our building operations. For reproducibility, the data can be found in the juliacon2019-paper branch of TSML on Github: /data/benchmark/tsclassifier. There are four time series types, namely: AirOffTemp, Energy, Pressure, and RetTemp. We took a minimal number of samples and classes for the sake of discussion and demonstration in this paper. Figures 12 and 13 show a snapshot of the running workers exploiting Julia's Distributed library and the classification performance of each model, respectively. There are 8 workers running in parallel over 12 different machine learning classifiers.
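The nested parallelism just described can be sketched as follows. `parallelmodel` here is a simplified stand-in for the paper's function: each "model" is represented by a zero-argument scoring function, and in a real multi-worker run the models would need to be defined with `@everywhere`:

```julia
using Distributed

# Simplified stand-in for parallelmodel: the outer @distributed loop
# spreads trials across workers, while the inner pmap spreads the models
# of each trial; results are concatenated with vcat.
function parallelmodel(models, ntrials)
    @distributed (vcat) for trial in 1:ntrials
        pmap(models) do m
            (trial = trial, accuracy = m())   # m() is an assumed scoring call
        end
    end
end
```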
From the results, ExtraTree from Scikit-Learn has the best performance with 91.67% accuracy, followed by TreeBag and SVMLinear from the Caret library with 83.33% accuracy each. With this workflow, it becomes trivial to search for the optimal model by running the candidates in parallel, relying on Julia for the low-level tasks of scheduling and queueing as well as fairly allocating the dynamically available compute resources such as CPU cores and memory.

Parallel TSML Using Threads Workflow
Julia 1.3 introduced lightweight multi-threading support. We will be using the pure-Julia ML models because installing external dependencies such as Caret MLs through the RCall package has some issues with the alpha version of Julia 1.3 at this point in time. We will update this documentation and add more MLs once the issues are resolved.
The main difference between Julia's distributed computation model and the threaded model is the presence of the @everywhere macro in the former for each defined function, indicating that these function definitions shall be exported to all running workers. Since threaded tasks share the same memory as the Julia main process, there is no need for this macro. Instead, the threading workflow requires the use of a ReentrantLock when updating the global dataframe that accumulates the prediction performance of the models running in their respective threads. As with the distributed framework, the threadedmodel function contains two levels of parallelism: threads across different trials and threads among the models in each trial. The function is surprisingly compact for implementing threads within threads, and the main bottleneck happens only during the update of the global ctable dataframe.

The FordB dataset is a classification problem to diagnose whether a certain symptom exists in an automotive system. There are 500 measurements of engine noise. The training data were collected in typical operating conditions, while the test data were collected under noisy conditions. The dataset is slightly imbalanced, consisting of 1860 class1 vs 1776 class2 samples in training and 401 class1 vs 409 class2 samples in testing. There are 500 input features and 2 output classes in a total of 4446 samples. One thing to note is that the Refrigeration dataset has almost the same number of features as its total number of samples. We expect this to result in a much harder classification problem even though it does not suffer from an imbalanced class distribution. Ideally, there must be more samples than features in order for the classifier to properly extract the correct subset of features for robust classification.
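The locking pattern described above can be sketched as follows; `threadedmodel` here is a simplified stand-in where each model is a scoring function and a plain vector plays the role of the shared results table:

```julia
using Base.Threads

# Simplified stand-in for threadedmodel: trials run on separate threads and
# a ReentrantLock guards the shared results collection, mirroring how the
# text describes updates to the global ctable dataframe.
function threadedmodel(models, ntrials)
    results = Vector{NamedTuple}()
    lk = ReentrantLock()
    @threads for trial in 1:ntrials
        for m in models
            acc = m()              # hypothetical scoring call
            lock(lk) do
                push!(results, (trial = trial, accuracy = acc))
            end
        end
    end
    return results
end
```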

Results
Due to the data imbalance, the typical accuracy measurement cannot capture the performance of the algorithms because its value may be overshadowed by the dominant class. We therefore use the F-score to measure the performance of a given classifier on each of its classes and take the mean of these F-scores. Using a similar naming convention, Table 3 indicates that Gradient-Boost from Scikit-Learn and Random Forest from Caret are the best classifiers for the Refrigeration Devices problem. The two worst algorithms are the same as in Table 2.
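The mean-of-per-class F-scores used here (macro-averaged F1) can be computed as sketched below; this is the standard definition, not TSML-specific code:

```julia
# Macro-averaged F1: compute precision, recall, and F1 per class from the
# ground truth and predictions, then take the unweighted mean across classes.
function macro_f1(truth::Vector, pred::Vector)
    classes = unique(truth)
    fscores = map(classes) do c
        tp = count(i -> truth[i] == c && pred[i] == c, eachindex(truth))
        fp = count(i -> truth[i] != c && pred[i] == c, eachindex(truth))
        fn = count(i -> truth[i] == c && pred[i] != c, eachindex(truth))
        prec = tp + fp == 0 ? 0.0 : tp / (tp + fp)
        rec  = tp + fn == 0 ? 0.0 : tp / (tp + fn)
        prec + rec == 0 ? 0.0 : 2 * prec * rec / (prec + rec)
    end
    return sum(fscores) / length(fscores)
end
```

Unlike plain accuracy, this metric penalises a classifier that ignores the minority class, since that class contributes an F-score of zero to the mean.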
For Earthquakes classification problem (

Discussion
Comparing the performances of the different classifiers across the four problems indicates that the most difficult problem to classify is Refrigeration Devices, while the easiest one is the Earthquakes problem. As we expected, there are not enough samples for the classifiers to learn the mapping in the Refrigeration dataset relative to its feature dimension. One way to alleviate this problem is to remove features that are highly correlated, or to perform PCA to reduce the feature dimension. This is beyond the scope of the current paper, which focuses mainly on the applicability of TSML's MLs to different sets of problems.
Among the classifiers, Random Forest from Caret is the top performer across all problems, while TreeBag from Caret and Gradient Boosting from Scikit-Learn are the top performers in Refrigeration Devices and Earthquakes, respectively. It is interesting to note that among the Random Forest implementations, the top performer is Caret's version written by Breiman [2], the original author of the algorithm in Fortran. The superior performance of Breiman's Random Forest is highlighted in the FordB classification results: kNN, the next best classifier, is 4% lower than Breiman's Random Forest. In all other cases, the next best performer has almost the same performance as Breiman's Random Forest.
As we expected based on past studies, the different ensemble models dominated the top 5 performers. While not shown in the table, the most consistent worst performer is Caret's RPart. This is consistent with Breiman's observation that Random Forest performs well by using unstable or inferior ML models in its leaves. In Caret's Random Forest, the leaves are composed of RPart models, which have the poorest performance in all problems. The combination of boosting and bagging weak learners such as RPart makes the Random Forest robust to dataset imbalance.

Summary and Conclusion
Packages for time series analysis are becoming important tools with the rapid proliferation of sensor data brought about by IoT. We created TSML as a time series machine learning framework which can easily be extended to handle large volumes of time series data. TSML exploits the following Julia features: multiple dispatch, type inference, custom data types and abstraction, and parallel computation.
TSML's main strength is its adoption of the UNIX pipeline architecture, containing filters and machine learners to perform both preprocessing and modelling tasks. In addition, TSML offers a common machine learning API for both internal and external ML libraries, distributed and threaded support for modeling, and a growing collection of filters for preprocessing, classification, clustering, and prediction.
Extending TSML can easily be done by creating a custom data type filter and defining its corresponding fit! and transform! operations which the TSML pipeline iteratively calls for each transformer in the workflow.