Skip to main content

k-Shape clustering for extracting macro-patterns in intracranial pressure signals

Abstract

Background

Intracranial pressure (ICP) monitoring is a core component of neurosurgical diagnostics. With the introduction of telemetric monitoring devices in the last years, ICP monitoring has become feasible in a broader clinical setting including monitoring during full mobilization and at home, where a greater diversity of ICP waveforms are present. The need for identification of these variations, the so-called macro-patterns lasting seconds to minutes—emerges as a potential tool for better understanding the physiological underpinnings of patient symptoms.

Methods

We introduce a new methodology that serves as a foundation for future automatic macro-pattern identification in the ICP signal to comprehensively understand the appearance and distribution of these macro-patterns in the ICP signal and their clinical significance. Specifically, we describe an algorithm based on k-Shape clustering to build a standard library of such macro-patterns.

Results

In total, seven macro-patterns were extracted from the ICP signals. This macro-pattern library may be used as a basis for the classification of new ICP variation distributions based on clinical disease entities.

Conclusions

We provide the starting point for future researchers to use a computational approach to characterize ICP recordings from a wide cohort of disorders.

Introduction

Intracranial pressure (ICP) monitoring is a mainstay of neurosurgical diagnostics both for intensive care management in acute neurosurgical conditions [1] and for aiding diagnosis in conditions outside the intensive care unit (ICU) for milder degrees of disease such as hydrocephalus, normal pressure hydrocephalus (NPH), or idiopathic intracranial hypertension (IIH).

In the clinical setting, ICP is often interpreted purely as a number within a certain range. Yet, ICP signals are complex time series with wave patterns that go beyond just a simple number. Analysis of ICP waveforms on either a subsecond beat-to-beat basis or in patterns over longer durations, the so-called macro-patterns, gives further insight into brain function [2]. Machine learning tools have the potential to identify these patterns faster and—more importantly—objectively, helping to characterize their appearance and distribution in a standardized fashion compared to the current primary visual inspection by clinicians. Until now, most studies have employed these techniques to analyze the ICP in acute conditions. Mariak et al. used artificial neural networks (ANN) to extract global properties of the entire ICP time series to assess the severity of the clinical state in intensive care patients [3]. Hornero et al. analyzed the complexity of the ICP signal estimated by approximate entropy (ApEn) to determine the presence of patterns in periods of acute elevations in ICP of pediatric patients in intensive care [4].

In the last decade, new telemetric ICP monitoring devices have become available, allowing easier access to perform ICP recordings that are representative of daily life conditions, compared to previous cable-based solutions [1, 5]. Thus, ICP can now be monitored in patients with milder degrees of disease in disease categories such as hydrocephalus, normal pressure hydrocephalus, or idiopathic intracranial hypertension. The ICP signals recorded with these systems ensure sufficient clinical and technical quality to be analyzed as part of the ICP interpretation procedure carried out by neurosurgeons and other clinicians [6,7,8], but the increased monitoring period and signal diversity also means that the analysis of ICP data becomes more demanding.

In this study, we explore the use of machine learning tools to extract macro-patterns from the ICP signal in a diverse cohort of patients with different disease entities. We introduce a new methodology based on k-Shape clustering as a basic building block for future day-to-day ICP evaluation and update of models on stored patient data. Given that telemetric ICP monitoring has allowed us to evaluate the patient’s ICP out of hospital borders, our main context for considering new macro-patterns moves away from ICP monitoring exclusively in the neurointensive care setting, where ICP variations are more accentuated. Specifically, our approach aims to permit a more adequate description of the longer timescale ICP variations seen in the broader clinical setting nowadays including disease types like NPH or IIH. Our approach created a universal library of representative macro-patterns that can later be used to automatically segment each individual ICP signal into shorter sequences based on clinical input. Also, we developed a template matching framework to classify these shorter sequences—which we will refer to as ICP subsequences—into what we estimate to be clinically significant macro-patterns. Finally, we propose a possible visualization strategy to display the pattern-annotated ICP signal in a fashion that is clinically useful.

Methods

Our goal was to create a scalable library of a few macro-pattern templates to use for ICP subsequence classification. We used k-Shape clustering as a method to efficiently group together subsequences characterized by their shape similarity despite differences in amplitude, duration and alignment. We first describe our data selection and processing approach for artifact removal. Next, we discuss our k-Shape based clustering approach to construct the templates. Finally, we show how the stored library can be used to characterize new incoming ICP signals by reproducible macro-patterns. The components of the entire approach are illustrated in Fig. 1.

Fig. 1
figure 1

Workflow of methodology developed in this paper for 88 h of ICP data, and an additional data set of 55 h purely for investigating how new incoming ICP data in the future can be labelled

Data selection

We used a collection of eight randomly selected anonymized overnight monitoring sessions that belong to different subjects from our database in the Department of Neurosurgery, Rigshospitalet, Denmark. A commercially available cable ICP probe (Neurovent-P; Raumedic AG, Germany) was used for these measurements. The length of the sessions spanned from nine to 22 h, summing up to a total of 88 h. The sampling frequency of the recordings was 100 Hz. The dataset was made up of five monitoring sessions (five patients) for a total of 88 h, and an additional set of three monitoring sessions (three patients) for a total of 55 h. By adding the latter dataset, template matching results can provide an indication of whether the algorithm is general enough to cover subjects with different disease entities.

Data preprocessing

The ICP signal recorded is often contaminated by very high and sharp spikes, with unphysiologically high values. These artifacts mask the characteristic appearance of the signal, rendering accurate pattern recognition impossible. We used an Empirical Mode Decomposition (EMD) based method for spike removal [9].

EMD decomposes the signal into a set of intrinsic mode functions (IMFs, i.e., IMF\(_1\), IMF\(_n\), ..., IMF\(_N\)). The first function of this set corresponds to fast oscillations, while the last one corresponds to the slowest ones. Therefore, the higher the IMF order, the lower will be its frequency content.The first IMFs, containing high-frequency oscillations, indicate the presence of artifacts. Because the spikes have band-limited waveforms, their dominant oscillations are found in a subset of consecutive IMFs. In our case, the location of unphysiologically high and rapid spikes aligned with the location of spike events in IMF\(_{1}\) to IMF\(_{4}\), so summing these four IMFs enhances spike episodes. The summation result reveals the peaks with dominant amplitude at the temporal location of the spike, and attenuates the effect of non-spike events. The term \(g_r\) will be used to refer to the partially reconstructed signal calculated as the sum of the first to fourth IMFs.

To identify the peak events in \(g_r\), an adaptive thresholding approach was implemented. ICP values outside the bounded region between [\(-\eta _s\), \(\eta _s\)] were identified as spikes. The threshold was calculated as \(\eta _s= \sigma \sqrt{2\cdot \log (L)}\), where \(\sigma \) and L are the standard deviation (noise level) and number of samples of \(g_r\), respectively. It is a universal threshold first proposed by Donoho and Johnstone [10] for determining a value above background noise. Identified spikes were then imputed with a moving average calculated over a sliding window of 10s.

Template library creation

We implemented the algorithm in MATLAB (R2020b; The MathWorks, Inc., Natick, MA.) using the platform: Intel®with core i7 processor and clock speed 2.6 GHz and 16 GB RAM.

Segmentation

Time series segmentation plays an important role in data mining and refers to the tool for decomposing the signal into a discrete number of contiguous subsequences. The proposed algorithm for segmentation of the ICP signal can be broken down into four sequential steps, as seen in Fig. 2. The following section will cover the details regarding each of the steps.

Fig. 2
figure 2

Workflow of the steps involved in the generation of the ICP subsequences, describing the segmentation step in Fig. 1

ICP segmentation was applied to divide the signal into subsequences of duration varying from seconds to minutes. This poses the challenge of deciding the time location at which to anchor both the start and end points of each subsequence. To address this problem, we first smoothed the signal via a linear phase finite impulse response (FIR) lowpass filter. The filtered signal will only be used in the segmentation step. The cut-off frequency (\(F_{pass}\)) was set to 0.05–0.1 Hz, depending on the degree of smoothing desired for the removal of cardiac and respiratory contributions in each subject. Other filter parameters were \(F_{stop}\) = 0.02–0.05 Hz, \(A_{pass}\) = 0.001 dB, \(A_{stop}\) = 60 dB, and minimum order.

From the smoothed ICP signal, the major extrema were extracted (maxima and minima). Only the minima were used as the start and endpoints for each of the ICP subsequences. Because some minima were located very close—both time and amplitude wise—to a neighboring maximum, we implemented the following rule to identify suitable minima for the segmentation. If we suppose that the discrete ICP signal at this stage can be written as \(g_n\), \(n=1,2,...,N\) and indexed in time order \(t_n\), \(t=1,2,...,N\), we removed a minima \(g_i\) from being a candidate as a boundary point if:

  1. 1)

    the time difference between the minimum \(g_i\) and its neighboring maximum \(g_j\) was smaller than a predefined value \(\eta _{dur}\), between 0.5 and 2 min, i.e. \(|t_j-t_i|<\eta _{dur}\), or

  2. 2)

    the magnitude difference between a minimum \(g_i\) and its neighboring maximum \(g_j\) was smaller than a predefined value \(\eta _{mag}\), between 0.5 and 1.5, i.e. \(|g_j-g_i|<\eta _{mag}\).

We can then define the segmented window (i.e., ICP subsequence) as g[ij] with i and j corresponding to the discrete indices of the selected boundary points. An example of these steps is shown in Fig. 5A–C.

Z-normalization

Z-normalization of the derived subsequences was required before clustering. As many recent studies [11, 12] suggest, this procedure is necessary for data mining algorithms to deal with scale and translation invariance to prioritize shape features over amplitude ones. By z-normalizing each subsequence we ensured that they were linearly transformed to have zero mean and standard deviation close to one:

$$\begin{aligned} z(g[i,j])= \frac{g[i,j]-\mu _{g[i,j]}}{\sigma _{g[i,j]}} \end{aligned}$$
(1)

where \(\mu _{g[i,j]}\) and \(\sigma _{g[i,j]}\) refer to the mean and standard deviation of the ICP subsequence g[ij], respectively. For the sake of simplicity, we will refer to each z-normalized ICP subsequence z(g[ij]) as \(z_{icp}\) in the rest of the paper.

k-Shape clustering

k-Shape was used to divide our extracted ICP subsequences into a number of characteristic-preserving groups, the so-called clusters, such that sequences in the same group were similar in shape. Each cluster is represented by a central vector, the centroid, which is not necessarily part of the original dataset [13]. Each centroid in k-Shape is determined as a sequence that minimizes the sum of squared distances to the rest of the z-normalized ICP subsequences. This novel centroid-based clustering algorithm is fundamentally a variant of k-means with a distance measure derived from the cross-correlation coefficient [14]. As a result, one template is built for each centroid and subsequently stored together with a class label.

Through an iterative procedure, k-Shape:

  1. 1)

    assigned each z-normalized ICP subsequence to the centroid with the maximum shape similarity in the assignment step, and

  2. 2)

    updated the centroids based on the new members of each cluster, in the refinement step.

The previous two steps of the algorithm were repeated either until there was no change in cluster configuration or until the maximum number of 100 iterations was reached [14].

Shape similarity was defined by the so-called Shape-Based Distance (SBD):

$$\begin{aligned} SBD(\overrightarrow{x},\overrightarrow{\text{c}_\text{k}}) = 1-max_w \left( \frac{CC_w(\overrightarrow{x},\overrightarrow{\text{c}_\text{k}})}{\sqrt{R_0(\overrightarrow{x},\overrightarrow{\text{x}})\cdot R_0(\overrightarrow{c_k},\overrightarrow{\text{c}_\text{k}})}} \right) \end{aligned}$$
(2)

where w is the position at which the cross-correlation \(CC_w(\overrightarrow{x},\overrightarrow{\text{c}_\text{k}})\) between the z-normalized ICP subsequence (\(\overrightarrow{x}=z_{icp}\)) and the centroid vector of each cluster (\(\overrightarrow{\text{c}_\text{k}}\)) was maximized; and \(R_0\) the geometric mean of autocorrelation of each individual sequence \(\overrightarrow{x}\) or \(\overrightarrow{c_k}\) [14]. Cross-correlation measures the degree of similarity between two time series, which in our case are \(\overrightarrow{x}\) and \(\overrightarrow{\text{c}_\text{k}}\), calculated as a function of the displacement of \(\overrightarrow{x}\) over \(\overrightarrow{\text{c}_\text{k}}\). Cross-correlation adds shift-invariance to the SBD measure and can be computed on sequences of different lengths.

Determining the optimal number of clusters, K, is a fundamental challenge within partitional clustering and unfortunately, there is not an ideal approach to identify K. Given that we had a large amount of data to be clustered into a number of clusters, and this number was dependent on medical practical experience, the need for an initial estimate of clusters is clear. We relied on a direct method, the so-called silhouette index, as the metric to evaluate the quality of the clustering structure. This metric evaluates the clustering quality based on the similarity between subsequences within the same cluster and across different clusters [15]:

$$\begin{aligned} S(i) = \frac{b(l)-a(i)}{max\{b(l),a(l)\}} \end{aligned}$$
(3)

In Eq. 3, a(l) is the average distance between subsequence l and every subsequence within the same cluster and b(l) is the minimum average distance between subsequence l and every subsequence in different clusters [16]. The optimal estimate of K was the value that maximized the silhouette metric over a range of possible values for K. The window of solutions for which the silhouette index was calculated ranged from 5 to 20.

Cluster validation

Visual inspection of the clustering results is crucial for verifying the accuracy of the partitioning. However, a visual approach is subject to the level of expertise and subjectivity of the investigator. Thus, visualization needs to be combined with standardized cluster validation indices (CVI) tailored to quantitatively evaluate clustering results. Quantitative evaluation of extracted clusters is not straightforward if there is a lack of annotated data. Thus, we need to rely on internal indices. Conclusions from previous studies have shown that there is no best single CVI in each context [17, 18]. Therefore, multiple validation indices will be used in the validation process: Silhouette Index, Davies–Bouldin index (DBI), and Calinski–Harabasz index (CHI).

Silhouette index, introduced in the previous section, is a common metric to measure how well an object lies within a cluster and our selected internal clustering validation index. DBI is the ratio between the average distance of all subsequences of each cluster to their respective centroids and the distance of the centroids of the two clusters, i.e., the ratio between within-cluster compactness and between-cluster separation [19, 20]:

$$\begin{aligned} DBI = \frac{1}{K} \sum _{a=1}^K max \Big \{ \frac{d_a+d_b}{d(c_a,c_b)} \Big \} \, {a\ne b} \end{aligned}$$
(4)

where K is the number of clusters, ab are cluster labels, \(d_a,d_b\) the average distance of all subsequences in clusters a and b to their respective centroids, and \(d(c_a,c_b)\) the distance between centroids. Smaller values indicate better clustering results, as clusters are more separated from each other and less disperse within each cluster. To be in line with the rest of CVIs, we use \(1-DBI\) for comparison of clustering results and thus higher values indicate better clustering solutions.

CHI relates the sum between the cluster dispersion calculated as the distance, \(S_B\), between each within-cluster subsequence and its centroid, to the inter-cluster dispersion calculated as the distance (\(S_W\)) between each centroid to the global centroid (\({\overline{c}}\)) [21]:

$$\begin{aligned} CHI = \frac{tr(S_B)}{tr(S_W)}\cdot \frac{n_p-1}{n_p-K} \end{aligned}$$
(5)

where \(S_B\) and \(S_W\) are the between and within cluster scatter matrices, respectively, tr the trace defined by the sum of the elements of the main diagonal of the scatter matrices, K the number of clusters and \(n_p\) the number of clustered subsequences. The higher the index value, the better the performance of the clustering.

Characterization of ICP signals

Shape-based template matching

The primary goal was to learn what the distinctive shapes for differentiating pattern clusters from each other were. Therefore, when an uncharacterized ICP subsequence entered into our system, we were able to automatically determine if it belonged to a template from the library of patterns or not. For labeling ICP subsequences based on the generated templates, new ICP subsequences from the additional dataset were retrieved and z-normalized to address scaling invariance. To deal with the horizontal shifts and stretching of the subsequence on the templates, we rescaled the time dimension. Query subsequences were then compared to each template for the closest match. For this comparison, we computed the SBD so that the shape similarity could be measured.

This template matching approach is done under the assumption that all queries must be classified to a template, even if the closest match shows a high SBD. This is why apart from defining our template library, we also defined a rule to ensure that the correlation to the closest match is meaningful. Although this parameter can be specified by the user, a reasonable rule is: \(CC(z_{icp},\overrightarrow{c_k}) > 0.50\).

Classification visualization

The amount of data in each ICP recording is very large. With our current template library, we are able to classify a subset of the ICP signal. Visualizing this information must be presented to a clinical end-user in a fashion that is operationally useful. For this purpose, we represented each ICP subsequence as colored boxes with varying dimensions according to their characteristics (Fig. 3). The height of the box was defined by the difference between the absolute maximum and minimum values of the non-normalized sequence (of the raw unfiltered ICP signal), and the width by the duration of the sequence. The vertical center of the box corresponded to the median ICP value of the non-z-normalized subsequence. Each box was colored after the label their corresponding subsequence had been matched to, being black if the matching correlation coefficient was below 0.50.

Fig. 3
figure 3

Detailed description of data reduction for each labeled ICP subsequence

Results

Data demographics

Eight patients were selected for the study: two male and six female. The pooled median age was 55 years; range: 20–74 years old. Subjects were fetched randomly from a continuously updated clinical ICP database. The clinical conditions were hydrocephalus, aneurysm and craniotomy, but signal analysis was performed on the anonymized recordings without reference to clinical information.

Data pre-processing

We decomposed the ICP signal via EMD into sixteen IMFs and a residual. Figure 4 shows an example of an ICP signal of one subject after EMD-based filtering, with unphysiologically high and rapid spikes removed.

On average, 18 spikes of less than one second duration are identified in each ICP monitoring. These spikes are found within a range that spans from two to 43 spikes per recording, that account on average for less that 0.000087\(\%\) of the total monitoring time. Thus, removing the few samples corresponding to these spikes should not have any major consequences on later processing steps, especially since we will be looking at longer variations of the ICP signal.

Fig. 4
figure 4

Example of EMD-based filtering of an ICP signal for removal of high spikes

Template library creation

Segmentation and normalization

We now show how the ICP signal is segmented and illustrate the segmentation results for the five patients whose recordings made up the main dataset. Figure 5 displays the segmentation process described in Fig. 2. From the figure, we can see that some of the minima extracted, marked as black squares, are not minima that could potentially be considered boundary points. To keep only the minima of our interest, marked as squares, we specified \(\eta _{dur}\) and \(\eta _{mag}\) for each ICP signal. From the main dataset of 88 h, we were able to generate 5579 ICP subsequences. The last Fig. 5(d) presents how the segmentation results are z-normalized. Z-normalization of a subsequence was done with the mean and standard deviation of that subsequence.

Fig. 5
figure 5

Example of the Artifact removed (AR)-ICP signal segmentation of one subject. A After low-pass filtering the AR-ICP signal, with respiratory and pulse contributions to the signal removed to generate the AR-low-pass (ARLP)-ICP signal; B After extracted extrema from ARLP-ICP signal; C After segmentation using desired minima; and D After z-normalization of the segmented AR (ZNAR)-ICP signal

k-Shape clustering

Fig. 6
figure 6

Number of optimal K using Silhouette score on the main ICP subsequences

5579 ICP subsequences of varying length generated from the 88 h of the ICP main dataset were clustered with the k-Shape algorithm into seven clusters. The number of optimal clusters to generate the most distinct patterns was calculated using the Silhouette index. We set \(K=7\) because it gave us the maximum silhouette value after performing k-Shape clustering for \(k=5-20\), as seen in Fig. 6. Going beyond twenty will not contribute to generating distinctive clusters with sufficient information about the ICP data and it will only make our clusters more complicated.

Fig. 7
figure 7

Main extracted reproducible subsequences from the 88 h of ICP recordings (main dataset). These patterns are the foundation for identifying clinically relevant macro-patterns across a wide cohort of patients, moving away from Lundberg’s A and B waves. In contrast to the classical approach, our subsequences could be combined to generate a new macro-pattern. For instance, the ascending L6 subsequence could be followed by the descending L3 subsequence, generating a new macro-pattern

To better understand what shapes of centroids were generated, Fig. 7 visualizes the cluster centroids with their corresponding ICP subsequences. This means that the five ICP recordings can be represented by a combination of these seven patterns which can vary in duration and amplitude.

Cluster validation

Silhouette index was used to compare the clustering results of k-Shape applied to the main data with and without the addition of the correlation rule. Figure 8 shows that for Silhouette index, k-Shape together with the correlation rule shows better results over just k-Shape.

Fig. 8
figure 8

Graphical silhouette values for each clusters when \(K=7\) of the main dataset (left), main dataset and correlation rule (right). The dashed vertical line indicates the average silhouette score across all clusters

The value of the Silhouette index lies within the range − 1 to 1. The closer the index is to 1, the more dense and well-separated from other clusters it is. The addition of the correlation rule increases the silhouette average from 0.024 to 0.056. Another aspect to look for is the thickness (in the vertical axis) of the silhouette plot representing each cluster, being more uniform between clusters when the correlation rule is considered. This idea is reinforced by the results seen in Table 1, with all CVIs increasing when the correlation rule is added.

Table 1 Three CVIs for k-Shape clustering without and with correlation rule for the main dataset

Cluster validation indices highly depend on the complexity of the cluster analysis and on the vague definition of what the nature of the cluster is. Such validation requires a visual approach [22]. We used the additional set to visually confirm that the patterns found in the main set can also be observed in this new data.

ICP signal characterization

To allow searching for the minimum distance between each new z-normalized ICP subsequence and each pattern template, we used SBD. Results from the previous section showed the importance of the introduction of a correlation-based rule to ensure that the closest match was significant. We need to bear in mind that if clinicians are our end-users, the ICP template matching output should be clinically intelligible. Figure 9 shows an example of our ICP signal characterization output described in the Methods section.

Fig. 9
figure 9

Example of the ICP signal segmentation and classification into labels for one subject visualized in the raw signal (top) and in the data reduced signal representation (bottom)

Fig. 10
figure 10

Example of the ICP signal segmentation and classification into labels, including unclassified subsequences, for one subject visualized in the raw signal (top) and in the data reduced signal representation (bottom)

Using the pipeline solution we propose in this paper, we are able to characterize an average of 54\(\%\) of the ICP signal. Figure 10 shows an exampled including unclassified ICP subsequences. Further classification details are presented in Table 2.

Table 2 Percentage duration of the classified subsequences in the ICP monitoring of each patient. The first five patients correspond to the main set, while the remaining three are part of the additional set
Fig. 11
figure 11

Variation between patients and labels for how often the seven types of patterns occur (left), pattern mean amplitude (middle) and pattern mean duration (right)

Figure 11 provides further visualization of the macro-pattern amplitude and duration in each subject. The presence of specific patterns in unique subjects, in this case L5 in Subject 1, could suggest that the occurrence frequency of specific macro-patterns could potentially be used to describe the pathological state of each subject.

Discussion

Typical A and B waves are described and classified differently by various authors [23] and do no longer adequately address the waveforms encountered in clinical practice today, where patients are investigated also in non-acute scenarios. Therefore, building on top of these classical macro-patterns, a new workflow was developed for the characterization and visualization of long-term ICP variations. Our adaptable pipeline steps includes Empirical Mode Decomposition (EMD) for artifact removal, segmentation into variable-duration subsequences, z-normalization, k-Shape clustering to divide the extracted ICP subsequences into a number of characteristic preserving labels, template-matching to locate the labels in the segmented ICP signal, and finally produces a box-based subsequence labeling display.

A previous study by Paparrizos and Gravano [14] carried out an extensive analysis on the performance of k-Shape against partitional, hierarchical and spectral clustering methods combined with different state-of-the-art distance measures. k-Shape outperformed traditional scalable and non-scalable clustering, such as k-means with Dynamic Time Warping (DTW) as distance metrics or k-medoids, in terms of both accuracy and/or efficiency [14, 24, 25]. Given that we are working with large ICP monitoring sessions, the possibility of scaling while still ensuring high accuracy and efficiency made k-Shape clustering our chosen clustering technique. Unfortunately, k-Shape also presents a limitation: the number of clusters needs to be pre-specified by the user. If the assumed K value is above the optimal, the algorithm will generate unnecessary additional groups; if below, we will be under-representing associations between subsequences. There is no perfect method to determine the optimal number, as there is no clear definition of a cluster. We tackled this issue by combining visual inspection with the Silhouette index. Initial visual exploration by clinicians suggested that the search for the optimal K should not go beyond \(k=20\), as they do not believe in the existence of a number of clinically relevant macro-patterns beyond that value. Thus, the choice of search range was \(k = 5-20\), with \(K=7\) as the optimal value for our data. We are aware that our methods for selecting K are heuristics, and subject to interpretation. A different choice of K could yield different results if our study is to be replicated with the same methodology by different research groups.

Besides using CVIs for estimating the quality of the clustering, it is important to visually inspect the results. Clusters L1, L2, and L5 could fall into the same category of clinically well-known waveforms since they highly relate to A and B waves. Previous studies classify B waves according to their shape into symmetrical (sinousoidal) and asymmetrical (ramp-like) waveforms [26,27,28,29]. Cluster L1 in our template library resembles the so-called asymmetrical waves, since the duration of the ascending phase is longer than that of the descending. Clusters L2 and L5 present more symmetry, with ascending and descending phases of closer duration. They differ in the presence (L2) or not (L5) of a plateau. For L2 waveforms, the pressure magnitude will determine its degree of similarity to either A or B waves. Remaining extracted clusters have not been described as such in the literature. Clusters L6 and L3 represent ascending and descending segments leading or ending a plateau segment, respectively. This plateau can vary in duration and in some cases contain other clinically relevant macro-patterns. Cluster L7 is likely to represent subsequences containing artifacts, given the shape of the peak. Finally, cluster L4 appears as a new non-classified macro-pattern, whose clinical relevance needs to be further analyzed. It must be noted that the templates in our library are normalized in time, meaning that they can be stretched and compressed when matched to incoming ICP subsequences, but constrained by the correlation rule.

With these templates, approximately half of the ICP recording ends up being labeled. This means that for the data considered in this paper, half of it can be represented by just seven shapes (properly scaled horizontally and vertically). This suggests that many ICP signals are often made up of the same patterns repeated again and again. As we have selected to look for one particular macro-pattern type (B wave) to investigate the feasibility of our approach, and the occurrence of macro-pattern types is related to the clinical diagnosis, it is to be expected that the current macro-pattern library does not cover the entire curve length and that the percentage covered can vary between datasets, as we have included these randomly. With this in mind, we have developed the building blocks of a methodology that—with additional retrospective data—could allow identification of previously unencountered macro-patterns in addition to the immediately useful potential of systematic quantitative multidimensional analysis of ICP data. It would be interesting to investigate whether an increased number of templates, K, would increase the fraction of ICP recording being labeled, and especially whether such an increase in the fraction comes at the price of an exponential increase in K. Finally, in this context, one should bear in mind that if K increases to e.g., 100, then the clinical clarity with respect to visual classification might suffer seriously.

The universal scalable library produced so far is the result of combining clinical knowledge of how ICP changes in different clinical conditions, with an engineering approach that moves ICP signal analysis in a more robust quantitative direction with fewer subjective judgments. The results of the present work may be considered benchmarks for the shape clustering method that will be used in our ongoing research. The evident next step is to relate the generated macro-pattern templates with clinical data to ensure that macro-patterns are reproducible and identifiable across a wide cohort of patients with different disease entities. Subsequently, we aim to investigate if it is possible to match disease categories with the occurrence frequency and distribution of the specific macro-patterns. Knowledge of some macro-patterns possibly being more indicative of particular pathological conditions opens opportunities to individualize management and treatment of each patient and obtain a better prediction and understanding of the possible outcome. Furthermore, looking at each label together with additional monitoring of other physiological signals could help to elucidate the origin of each waveform.

The output of our labeling method must be displayed in a way that ensures readability and clarity for clinicians to easily interpret and to integrate it as a new tool in their daily clinical practice. Our visualization strategy is one of the many alternative ways of looking at the raw ICP signal that could be used to accentuate specific features that might not be easily spotted during the visual interpretation of the ICP monitoring. It can be seen as a prototype, among all possibilities of graphical representations, for how the ICP data analysis workflow can be structured. The box approach highlights the presence of the seven identified labels, with many other visualization alternatives yet to be considered, some of them maybe aiming for a report of a certain clinical state. Internal distribution and clinical weighting of the boxes could reflect the pathological state of the patient.

Limitations

The selection of subjects for the creation of the template library is likely to affect the result, since some patients appear to have more distinguishable macro-patterns than others. Applying this approach to a larger group of subjects is one of the future objectives.

Conclusions

In this paper, a flexible time series pattern recognition scheme customized to handle ICP time series patterns was introduced. In particular, a clustering algorithm k-Shape clustering was first applied to cluster ICP subsequences to generate a standard scalable library of macro-patterns that can further be used for classification of new incoming ICP signals. We worked with 88 h of ICP recordings and showed the resulting seven clusters that best describe them. Our further research will investigate the clinical use of this technique and look at the practicality of its automatic use to quantitatively interpret ICP data, hoping to reveal a better understanding of the patients underlying physiological status.

Availability of data and materials

Not applicable.

Abbreviations

ApEn:

Approximate entropy

CHI:

Calinski–Harabasz index

CVI:

Cluster validation index

DBI:

Davies–Bouldin index

EMD:

Empirical mode decomposition

FIR:

Finite impulse response

ICP:

Intracranial pressure

IIH:

Idiopathic intracranial hypertension

IMF:

Intrinsic mode function

NPH:

Normal pressure hydrocephalus

SBD:

Shape-based distance

TBI:

Traumatic brain injury

References

  1. Lilja-Cyron A, Kelsen J, Andresen M, Fugleholm K, Juhler M. Feasibility of telemetric intracranial pressure monitoring in the neuro intensive care unit. J Neurotrauma. 2018;35(14):1578–86.

    Article  Google Scholar 

  2. Czosnyka M, Pickard JD. Monitoring and interpretation of intracranial pressure. J Neurol Neurosurg Psychiatry. 2004;75(6):813–21.

    Article  CAS  Google Scholar 

  3. Mariak Z, Swiercz M, Krejza J, Lewko J, Lyson T. Intracranial pressure processing with artificial neural networks: classification of signal properties. Acta Neurochirurgica. 2000;142(4):407–12.

    Article  CAS  Google Scholar 

  4. Hornero R, Aboy M, Abasolo D, McNames J, Wakeland W, Goldstein B. Complex analysis of intracranial hypertension using approximate entropy*. Crit Care Med. 2006;34(1):87–95.

    Article  Google Scholar 

  5. Andresen M, Juhler M, Munch TN. Quality and Safety of Home ICP Monitoring Compared with In-Hospital Monitoring; 2012. p. 187–191.

  6. Andresen M, Juhler M. Intracranial pressure following complete removal of a small demarcated brain tumor: a model for normal intracranial pressure in humans. J Neurosurg. 2014;121(4):797–801.

    Article  Google Scholar 

  7. Andresen M, Hadi A, Petersen LG, Juhler M. Effect of postural changes on ICP in healthy and ill subjects. Acta Neurochirurgica. 2015;157(1):109–13.

    Article  Google Scholar 

  8. Andresen M, Juhler M, Thomsen OC. Electrostatic discharges and their effect on the validity of registered values in intracranial pressure monitors. J Neurosurg. 2013;119(5):1119–24.

    Article  Google Scholar 

  9. Martinez-Tejada I, Wilhjelm JE, Juhler M, Andresen M. Empirical mode decomposition-based method for artefact removal in raw intracranial pressure signals. Acta Neurochirurgica. 2021;131:201–5.

    Article  Google Scholar 

  10. Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81(3):425–55.

    Article  Google Scholar 

  11. Aghabozorgi S, Ying Wah T, Herawan T, Jalab HA, Shaygan MA, Jalali A. A hybrid algorithm for clustering of time series data based on affinity search technique. Sci World J. 2014;2014.

  12. Javed A, Lee BS, Rizzo DM. A benchmark study on time series clustering. Mach Learn Appl. 2020;1:100001.

    Google Scholar 

  13. Karim MR, Beyan O, Zappa A, Costa IG, Rebholz-Schuhmann D, Cochez M, et al. Deep learning-based clustering approaches for bioinformatics. Brief Bioinform. 2021;22(1):393–415.

    Article  Google Scholar 

  14. Paparrizos J, Gravano L. K-shape: efficient and accurate clustering of time series. Proceedings of the ACM SIGMOD International Conference on Management of Data. 2015;2015:1855–1870.

  15. Sardá-Espinosa A. Comparing time-series clustering algorithms in R using the dtwclust package. R Journal. 2019;11(1):1–45.

    Article  Google Scholar 

  16. Starczewski A, Krzyżak A. Performance evaluation of the Silhouette Index. Cham: Springer; 2015. p. 49–58.

    Google Scholar 

  17. Hämäläinen J, Jauhiainen S, Kärkkäinen T. Comparison of internal clustering validation indices for prototype-based clustering. Algorithms. 2017;10(3):105.

    Article  Google Scholar 

  18. Maulik U, Bandyopadhyay S. Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell. 2002;24(12):1650–4.

    Article  Google Scholar 

  19. Li H, Zhang S, Ding X, Zhang C, Dale P. Performance evaluation of cluster validity indices (CVIs) on multi/hyperspectral remote sensing datasets. Remote Sensing. 2016;8(4):295.

    Article  Google Scholar 

  20. Kryszczuk K, Hurley P. Estimation of the number of clusters using multiple clustering validity Indices. Berlin: Springer; 2010. p. 114–23.

    Google Scholar 

  21. Calinski T, Harabasz J. A dendrite method for cluster analysis. Commun Stat Theory Methods. 1974;3(1):1–27.

    Article  Google Scholar 

  22. Ng M, Huang J. M-FastMap: a modified FastMap algorithm for visual cluster validation in data mining. Berlin: Springer; 2002. p. 224–36.

    Google Scholar 

  23. Eide PK, Fremming AD. A new method and software for quantitative analysis of continuous intracranial pressure recordings. Acta Neurochirurgica. 2001;143(12):1237–47.

    Article  CAS  Google Scholar 

  24. Paparrizos J, Gravano L. Fast and accurate time-series clustering. ACM Trans Database Syst. 2017;42(2):1–49.

    Article  Google Scholar 

  25. Yang J, Ning C, Deb C, Zhang F, Cheong D, Lee SE, et al. k-Shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement. Energy Build. 2017;7(146):27–37.

    Google Scholar 

  26. Martinez-Tejada I, Arum A, Wilhjelm JE, Juhler M, Andresen M. B waves: a systematic review of terminology, characteristics, and analysis methods. Fluids Barriers CNS. 2019;16(1):33.

    Article  Google Scholar 

  27. Spiegelberg A, Preuß M, Kurtcuoglu V. B-waves revisited. Interdiscipl Neurosurg. 2016;12(6):13–7.

    Article  Google Scholar 

  28. Raftopoulos C, Chaskis C, Delecluse F, Cantrainet F, Bidauti L, Brotchi J. Morphological quantitative analysis of intracranial pressure waves in normal pressure hydrocephalus. Neurol Res. 1992;14(5):389–96.

    Article  CAS  Google Scholar 

  29. Santamarta D, González-Martínez E, Fernández J, Mostaza A. The prediction of shunt response in idiopathic normal-pressure hydrocephalus based on intracranial pressure monitoring and lumbar infusion; 2016. p. 267–274.

Download references

Acknowledgements

Not applicable.

Funding

This project has received funding from the Novo Nordisk Tandem (NNF17OC0024718).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Data analysis was performed by IM. The first draft of the manuscript was written by IM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Isabel Martinez-Tejada.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Martinez-Tejada, I., Riedel, C.S., Juhler, M. et al. k-Shape clustering for extracting macro-patterns in intracranial pressure signals. Fluids Barriers CNS 19, 12 (2022). https://doi.org/10.1186/s12987-022-00311-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12987-022-00311-5

Keywords