Consecutive patients with acute ICH who were admitted to Mie Chuo Medical Center between December 2012 and July 2020, Matsusaka Chuo General Hospital between January 2018 and December 2019, Suzuka Kaisei Hospital between October 2017 and October 2019, and Mie University Hospital between January 2017 and July 2020 were retrospectively reviewed. Patients in Mie Chuo Medical Center, Matsusaka Chuo General Hospital, and Suzuka Kaisei Hospital were assigned to the development cohort, and those in Mie University Hospital were assigned to the validation cohort.
Inclusion criteria were defined as follows: ≥ 18 years of age; baseline CT scan within 24 h of onset; and follow-up CT scan within 30 h after baseline CT scan. Exclusion criteria were defined as follows: traumatic ICH; secondary cause of ICH (e.g., aneurysm, arteriovenous malformation, arteriovenous fistula, hemorrhagic transformation of infarction, and tumor); and surgical evacuation before follow-up CT scan.
Baseline clinical variables included age, sex, medical history (ICH, cerebral infarction, ischemic heart disease, hypertension, diabetes mellitus, and dyslipidemia), anticoagulant use, antiplatelet use, Glasgow Coma Scale, systolic and diastolic blood pressures, prothrombin time-international normalized ratio (PT-INR), white blood cell count, hemoglobin, platelet count, serum creatinine, serum total bilirubin, and time from onset to baseline CT scan.
This study was approved by the following institutional review boards: Mie Chuo Medical Center institutional review board [permit number: MCERB-201926], Matsusaka Chuo General Hospital institutional review board [permit number: 232], Suzuka Kaisei Hospital institutional review board [permit number: 2020–05], and Mie University Hospital institutional review board [permit number: T2019-19]. Because this was a retrospective study, the requirement for separate informed patient consent was waived by the same four institutional review boards. All study protocols and procedures were conducted in accordance with the Declaration of Helsinki. This manuscript was prepared according to the Standards for Reporting of Diagnostic Accuracy (STARD) statement.
CT scans were performed at 120 kVp with a slice thickness of 0.5–10.0 mm, with the patient in the supine position. CT angiography was performed by injecting 50–100 ml of an iodinated contrast material at 3.5–5.0 ml/s, but not all patients underwent CT angiography. Manufacturers and models of CT scanners in the development cohort included Aquilion ONE (Canon Medical Systems, Ohtawara, Japan), Aquilion 64 (Canon Medical Systems), LightSpeed Plus (GE Medical Systems, Milwaukee, WI, USA), LightSpeed VCT (GE Medical Systems), BrightSpeed Elite (GE Medical Systems), and SOMATOM Definition Flash (SIEMENS Healthineers, Erlangen, Germany), and those in the validation cohort included Aquilion 64 and Discovery CT750 HD (GE Medical Systems).
The hemorrhage locations were categorized as basal ganglia, thalamus, lobe, brain stem, and cerebellum. The presence of intraventricular extension of hemorrhage was noted. The hematoma volume was calculated with the ABC/2 formula27. Hematoma expansion was defined as an increase in volume between baseline and follow-up CT scans exceeding 6 cm³ or 33% of the baseline volume16,17,18,19,20,28.
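The volume estimate and the expansion criterion above can be written out as a minimal sketch. It assumes the usual ABC/2 convention, in which A is the greatest hematoma diameter, B is the diameter perpendicular to A on the same slice, and C is the vertical extent, all in centimeters. The function names are hypothetical and are not taken from the study code.

```python
def abc2_volume(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Approximate hematoma volume in cm^3 with the ABC/2 formula.

    Assumed convention: a_cm = greatest diameter, b_cm = perpendicular
    diameter on the same slice, c_cm = vertical extent of the hematoma.
    """
    return a_cm * b_cm * c_cm / 2.0


def is_hematoma_expansion(baseline_cm3: float, followup_cm3: float) -> bool:
    """Return True when growth exceeds 6 cm^3 or 33% of the baseline volume."""
    growth = followup_cm3 - baseline_cm3
    return growth > 6.0 or growth > 0.33 * baseline_cm3
```

For example, a 12 cm³ hematoma that grows to 16.5 cm³ meets the relative criterion (growth of 4.5 cm³ exceeds 33% of 12 cm³), whereas a 30 cm³ hematoma that grows to 35 cm³ meets neither criterion.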
Intrahematoma hypodensities, irregular hematoma shape, and blend sign were identified as noncontrast CT markers. Intrahematoma hypodensities were defined as the presence of any hypodense region encapsulated within the hematoma having any morphology and size, separated from the surrounding parenchyma3,4,12,14. Irregular hematoma shape was defined as the presence of 2 or more hematoma edge irregularities4,7,9,12. Blend sign was defined as blending of a relatively hypoattenuating area with an adjacent hyperattenuating region within a hematoma, with a well-defined margin and a difference of at least 18 Hounsfield units between these regions4,6,8,12. When available, CT angiography spot sign was evaluated, which was defined as follows: (1) ≥ 1 focus (attenuation ≥ 120 Hounsfield units) of any size and morphology of contrast pooling within a hematoma, and (2) discontinuous from normal or abnormal vasculature adjacent to the hematoma15,29. The CT markers were independently evaluated by 2 observers. When the observers disagreed, the CT images were re-evaluated by both observers together until consensus was reached.
After identification of ICH on baseline CT scan, continuous blood pressure monitoring and blood pressure-lowering treatment were initiated. Calcium channel blockers, mainly intravenous nicardipine, were administered as antihypertensive agents throughout the period between baseline and follow-up CT scans. The target systolic blood pressure was set at either less than 140 mmHg or less than 180 mmHg.
Continuous variables were summarized using a mean with standard deviation or a median with interquartile range and compared using Student’s t test or Mann–Whitney U test, depending on the distribution of the variable assessed by the Shapiro–Wilk test. Categorical variables were summarized using a count with percentages and compared using Fisher’s exact test.
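The test selection described above can be illustrated with a short sketch. The study itself used EZR (R); the Python/SciPy version below is only an analogue, and the 0.05 normality threshold, the requirement that both groups pass the Shapiro–Wilk test, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def compare_continuous(x, y, alpha=0.05):
    """Pick Student's t test or the Mann-Whitney U test from Shapiro-Wilk results.

    Assumption: both groups must look normal (p > alpha) to use the t test.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Index [1] is the p value of the Shapiro-Wilk test.
    normal = stats.shapiro(x)[1] > alpha and stats.shapiro(y)[1] > alpha
    if normal:
        name = "Student's t test"
        p_value = stats.ttest_ind(x, y, equal_var=True).pvalue
    else:
        name = "Mann-Whitney U test"
        p_value = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    return name, float(p_value)


def compare_categorical(table_2x2):
    """Two-sided Fisher's exact test on a 2x2 contingency table."""
    _, p_value = stats.fisher_exact(table_2x2)
    return float(p_value)
```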
To test the superiority of predictive models using ML over the previous scoring methods, the BAT, BRAIN, and 9-point scores in the validation cohort were calculated16,17,18,19. For each scoring method, the receiver operating characteristic (ROC) curve was drawn and the best cutoff value was determined by Youden's index. Accuracy, sensitivity, specificity, and the area under the ROC curve (AUC) for the prediction of hematoma expansion were then computed. The AUCs of the three scores and those of the ML models were compared using the DeLong test.
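The cutoff selection can be sketched as follows. Youden's index is J = sensitivity + specificity − 1, and the best cutoff is the threshold that maximizes J. The study performed this step in EZR; the scikit-learn version below is an illustrative analogue with a hypothetical function name, and it does not cover the DeLong comparison.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_true, scores):
    """Return the score cutoff that maximizes Youden's index J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    best = int(np.argmax(tpr - fpr))  # first maximum if several thresholds tie
    return {
        "cutoff": float(thresholds[best]),  # predict expansion if score >= cutoff
        "sensitivity": float(tpr[best]),
        "specificity": float(1.0 - fpr[best]),
        "auc": float(roc_auc_score(y_true, scores)),
    }
```

With a cutoff chosen this way, a patient is classified as likely to expand when the score is greater than or equal to the cutoff.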
All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan)30, which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria).
Machine learning environment and algorithms
The programming language Python (version 3.7.8) and its libraries, NumPy (version 1.19.1), scikit-learn (version 0.23.2), XGBoost (version 1.2.0), imbalanced-learn (version 0.7.0), and matplotlib (version 3.3.1), were used for all data processing. The programming code was executed in Jupyter Notebook (version 6.0.3).
To develop predictive models, supervised ML algorithms were adopted, in which pairs of the input data and the output class were given to the algorithm, which found a way to generate the output class from the input data31. The k-nearest neighbors (k-NN) algorithm, logistic regression, support vector machines (SVMs), random forests, and XGBoost were selected as the supervised algorithms. The k-NN algorithm is the simplest ML algorithm, which finds k neighbors closest to a new observation in the stored training data and makes a prediction by assigning the majority class among these neighbors31. Logistic regression is a binary classifier, in which a linear model is included in a logistic function and the probability that a new observation is a member of each class is computed31. SVMs find the hyperplane that maximizes the margin between classes in the training data, making a prediction based on the distances to the support vectors and the importance of support vectors31. Random forests train many decision trees, where each tree only receives a bootstrapped sample of the training data and each node only considers a subset of features when determining the best split, making a prediction in accordance with the averaged probabilities predicted by all the trees31. XGBoost is a gradient boosting algorithm, which works by building decision trees in a serial manner, where each tree tries to correct the mistakes of the previous one; the probability is computed by summing the weights of the leaves to which a new observation belongs in each decision tree31. With each supervised algorithm, predictive model development using the patient data of the development cohort (training data set) and external validation using that of the validation cohort (test data set) were planned.
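As an illustration, the five algorithms can be instantiated with scikit-learn and XGBoost roughly as below. This is a sketch only: the hyperparameters are library defaults rather than the tuned values of the study, the helper names are hypothetical, and XGBoost is added only if the package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_models(random_state=0):
    """Return one untuned estimator per algorithm (defaults are an assumption)."""
    models = {
        "k-NN": KNeighborsClassifier(),
        "logistic regression": LogisticRegression(),
        # probability=True lets the SVM output class probabilities for the AUC.
        "SVM": SVC(probability=True, random_state=random_state),
        "random forest": RandomForestClassifier(random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass  # XGBoost is optional in this sketch
    return models


def predict_expansion_probability(models, X_train, y_train, X_new):
    """Fit each model and return the predicted probability of class 1."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    return {
        name: model.fit(X_train, y_train).predict_proba(X_new)[:, 1]
        for name, model in models.items()
    }
```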
Feature selection and scaling, and oversampling
The input data comprised baseline clinical variables; CT findings, including hemorrhage locations, intraventricular hematoma extension, baseline hematoma volume, and noncontrast CT markers; and target systolic blood pressure. Hematoma expansion was used as the output class.
Since there were 31 individual properties of the input data, which were called features, feature selection was performed to lead to simpler models that generalize better31. Firstly, univariate analyses with Student’s t test, Mann–Whitney U test, and Fisher’s exact test were performed between expansion and no expansion groups in the training data set. Secondly, the features were ranked in accordance with their P values. Finally, 5 to 10 features with the smallest P values were selected. Feature scaling was performed using standardization in SVMs, which required all the features to vary on a similar scale to perform well.
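The ranking and scaling steps can be sketched as follows. The feature names in the usage note are hypothetical, and fitting the mean and standard deviation on the training data only before applying them to the test data is an assumption; the source states only that standardization was used for SVMs. The scaling shown matches what scikit-learn's StandardScaler computes (population standard deviation).

```python
import numpy as np


def select_features_by_p(p_values: dict, k: int) -> list:
    """Rank features by univariate P value and keep the k smallest."""
    ranked = sorted(p_values, key=p_values.get)
    return ranked[:k]


def standardize(train, test):
    """Scale each feature to zero mean and unit variance.

    Assumption: statistics come from the training set only and are then
    reused for the test set.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (train - mean) / std, (test - mean) / std
```

For instance, `select_features_by_p({"age": 0.2, "volume": 0.001, "pt_inr": 0.04}, k=2)` keeps `"volume"` and `"pt_inr"`.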
Given the imbalance of the output class distribution, random oversampling was employed. Random oversampling involved randomly selecting observations from the minority group with replacement and adding them to the training data set.
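Random oversampling as described above can be sketched in NumPy. The study lists imbalanced-learn among its libraries; the hand-written version below is an illustrative stand-in, and resampling the minority class until both classes have equal size is an assumption, since the source does not state the target ratio.

```python
import numpy as np


def random_oversample(X, y, random_state=0):
    """Duplicate randomly chosen minority observations (with replacement).

    Assumption: every smaller class is grown to the size of the largest one.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # all original observations are retained
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(y == cls)
            keep.append(rng.choice(pool, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]
```

Only the training data set is resampled here; the test data set keeps its original class distribution.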
Predictive model development and external validation
Each supervised ML algorithm was applied to the training data set with 5 to 10 selected features and with all 31 features. In the predictive model development process, stratified 30-fold cross-validation was used to assess generalization performance, in which the training data set was split such that the proportions between output classes were the same in each fold as they were in the whole training data set31. The hyperparameters listed in Table 1 were tuned manually for each algorithm to improve generalization performance, while all other hyperparameters were left at their default values.
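Stratified cross-validation of this kind can be sketched with scikit-learn as below. The number of folds is left as a parameter, the AUC is used as the score, and shuffling with a fixed seed is an assumption; the toy data are synthetic and are not study data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def stratified_cv_auc(model, X, y, n_splits, random_state=0):
    """Return one AUC per fold; each fold preserves the class proportions."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc")


# Toy usage on two synthetic, well-separated classes (not study data).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(3, 1, (60, 3))])
y_toy = np.array([0] * 60 + [1] * 60)
toy_aucs = stratified_cv_auc(LogisticRegression(), X_toy, y_toy, n_splits=5)
```

Each validation fold must contain observations from both classes for the AUC to be defined, so the number of folds cannot exceed the size of the minority class.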
After the model development, each model was evaluated for its performance on the test data set as external validation, where accuracy, sensitivity, specificity, and the AUC for the prediction of hematoma expansion were computed.
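The four reported metrics can be computed from predicted probabilities as in the sketch below. The 0.5 classification threshold and the function name are assumptions; hematoma expansion is coded as class 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC for a binary classifier."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc_score(y_true, y_prob),  # threshold independent
    }
```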