Can we tell what type of earthquake could trigger a tsunami?

Yu Nakamura
Feb 19, 2021

Motivation

A tsunami, a series of waves caused by earthquakes or undersea volcanic eruptions (NOAA), can bring the fear of death. Between 1998 and 2017, tsunamis caused more than 250,000 deaths globally (WHO).

After investigating the damage caused by earthquake and tsunami incidents around the world in my previous story, I came up with one question: what type of earthquake can trigger a tsunami? In that article, I used clustering and PCA to find two distinct groups that are potentially useful for identifying the earthquake types associated with tsunami occurrence. In this study, I build a model that predicts tsunami occurrence from earthquake features, using the earthquake dataset from the USGS catalog and the Python scikit-learn package.

Exploratory Data Analysis and Methodology

Because the USGS catalog notes that some measurements made before the invention of seismographs in the late 1880s are erroneous, I excluded events prior to 1990 to limit data correction bias in the data exploration for this study.
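As a rough illustration of this filtering step, here is a minimal pandas sketch. The column name YEAR and every value in the small table are made up for the example; the actual column names in the USGS-derived file may differ.

```python
import pandas as pd

# Made-up rows for illustration only; YEAR is an assumed column name.
events = pd.DataFrame(
    {
        "YEAR": [1850, 1920, 1989, 1990, 2004, 2015],
        "EQ_PRIMARY": [7.0, 6.5, 6.9, 7.7, 8.0, 6.1],
        "FLAG_TSUNAMI": [1, 0, 0, 1, 1, 0],
    }
)

# Keep only events from the cutoff year onward, as described in the text.
CUTOFF_YEAR = 1990
recent = events[events["YEAR"] >= CUTOFF_YEAR].reset_index(drop=True)

print(len(events), "events before filtering,", len(recent), "after")
```

The same boolean-mask pattern works for any other cutoff year.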

The dataset has 5 explanatory variables for predicting the response variable, tsunami occurrence, which is called “FLAG_TSUNAMI”. Damage description is an ordinal variable that ranks the degree of damage from 1 (least) to 5 (severe).

Explanatory variables and response variable in the dataset

Using these 5 attributes, I applied two classification methods, k-Nearest Neighbors (kNN) and Random Forest, to 1164 earthquake events recorded around the world. To test each model, I split the dataset into a training set (80%) and a test set (20%) and compared model accuracy on the test set.
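The 80/20 split can be sketched with scikit-learn's train_test_split. The arrays below are random placeholders with the same shape as the dataset described above (1164 events, 5 features), not the real USGS data. The stratify option and the random_state value are my assumptions, not something stated in this article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholders standing in for the real table: 1164 events, 5 features.
rng = np.random.default_rng(0)
n_events = 1164
X = rng.normal(size=(n_events, 5))             # placeholder explanatory variables
y = (rng.random(n_events) < 0.2).astype(int)   # placeholder FLAG_TSUNAMI (1 = Yes)

# 80% training, 20% test. stratify=y (an assumption) keeps the Yes/No ratio
# similar in both parts, which matters when "Yes" events are rare.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("training events:", X_train.shape[0])
print("test events:", X_test.shape[0])
```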

Here is the EDA visualization. It’s hard to see any clear indication from the plot, but the orange dots (tsunami occurrence: Yes) tend to sit among earthquake events with higher magnitude (> 5.0) and shallower depth. Some regions appear to have no tsunami-triggering events at all.

EDA Visualization

Prediction Models

  1. k-Nearest Neighbors (kNN)

First, I checked how sensitive the model is to the number of neighbors, k. In the plot below, the accuracy on the test dataset doesn’t change once k is greater than or equal to 6.

Sensitivity check of the number of k
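A sensitivity check like this one can be sketched as a loop over candidate values of k. The data is synthetic (generated with make_classification), so the accuracies it prints will not match the plot above. The range of k from 1 to 15 is an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the earthquake table (not the real data).
X, y = make_classification(
    n_samples=1164, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit one kNN model per k and record its accuracy on the test set.
k_values = range(1, 16)
test_accuracy = {}
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    test_accuracy[k] = model.score(X_test, y_test)

best_k = max(test_accuracy, key=test_accuracy.get)
for k, acc in test_accuracy.items():
    print(f"k = {k:2d}  test accuracy = {acc:.3f}")
print("k with the highest test accuracy:", best_k)
```

Choosing k from test accuracy, as done here, is simple but optimistic. Cross-validation on the training set would be a more cautious alternative.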

Then, I fitted the kNN classifier with k = 6, which gives an accuracy of 0.85. That isn’t a bad result on its face. However, as the confusion matrix below shows, most events are predicted as “No” tsunami occurrence and only 2 events are classified as “Yes”. Because far fewer recorded events caused a tsunami than did not, this class imbalance may bias the prediction model toward “No”.

Confusion Matrix for the kNN classifier
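To show why accuracy alone can hide this problem, here is a small sketch that prints a confusion matrix next to the recall for the “Yes” class. It runs on synthetic imbalanced data, so the counts will differ from the figure above. The 85/15 class ratio is an assumption chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with roughly 85% "No" (0) and 15% "Yes" (1); not the real data.
X, y = make_classification(
    n_samples=1164, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=6).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are true classes, columns are predicted classes, both ordered [No, Yes].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print("accuracy:", (tn + tp) / cm.sum())
print("recall for Yes:", recall_score(y_test, y_pred, pos_label=1, zero_division=0))
# Accuracy a model would reach by always answering "No".
print("share of No in the test set:", (y_test == 0).mean())
```

Comparing the accuracy with the share of “No” events shows how much of the score comes from the majority class alone.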

To check the kNN classification on a simple two-dimensional plot of focal depth vs. primary magnitude (EQ_PRIMARY), here are scatter plots of the true labels and the kNN predictions side by side. The kNN model predicts far fewer tsunami-triggering earthquakes than the true labels contain, which illustrates the data bias described above.

Visual check of classification by kNN model with a simple two-dimensional plot

Next, I standardized the scale of the explanatory variables and refitted the kNN classifier to see whether it improves.

Standardizing the explanatory variables with StandardScaler

As a result, the highest test accuracy was 0.91 at k = 3, an improvement of 6 percentage points over the kNN classifier on the unscaled dataset. This suggests that standardizing the variables is beneficial for predicting tsunami occurrence.

Test accuracy of kNN classifier with standardized data
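Standardization matters for kNN because the method relies on distances, so a variable measured on a large scale can dominate the others. Below is a minimal sketch that compares kNN with and without StandardScaler on synthetic data whose columns are deliberately stretched to different ranges. The stretch factors are invented, and wrapping the scaler in a pipeline so that it is fitted on the training set only is my assumption about good practice, not a description of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the earthquake table (not the real data).
X, y = make_classification(
    n_samples=1164, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
# Invented stretch factors so the columns live on very different scales,
# loosely like a depth in km sitting next to a magnitude.
X = X * np.array([1.0, 100.0, 10.0, 1.0, 50.0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# kNN on the raw features.
raw_knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# kNN on standardized features: z = (x - mean) / std, with the mean and std
# learned from the training set only and reused for the test set.
scaled_knn = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=3)
).fit(X_train, y_train)

raw_acc = raw_knn.score(X_test, y_test)
scaled_acc = scaled_knn.score(X_test, y_test)
print(f"test accuracy without scaling: {raw_acc:.3f}")
print(f"test accuracy with scaling:    {scaled_acc:.3f}")
```

Whether scaling helps, and by how much, depends on the data, so the two numbers printed here say nothing about the 0.85 and 0.91 reported above.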

  2. Random Forest

Next, I built a Random Forest model to predict tsunami occurrence with the standardized explanatory variables, for comparison with the kNN classifier.

Here is the result of the Random Forest model. The model accuracy is 0.92. Note that the precision for predicting “No” tsunami occurrence is higher than the precision for predicting “Yes”. That is, we can predict earthquakes that don’t cause a tsunami more reliably, while predictions of earthquakes that do cause a tsunami are less reliable, with a precision of about 0.73.

I would say that the model accuracy of 0.92 suggests that earthquake characteristics are useful for predicting tsunami occurrence, but providing a precise outcome remains difficult.

Result of Random Forest Model (Tsunami occurrence: 0: No, 1: Yes)
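A per-class report like the one above can be produced with classification_report. The sketch below uses synthetic data and assumed hyperparameters (200 trees, fixed random_state), so its numbers will differ from the reported 0.92 and 0.73. The scaling step is left out for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, precision_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the earthquake table (not the real data).
X, y = make_classification(
    n_samples=1164, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# n_estimators and random_state are assumed values, not the article's settings.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Precision, recall and F1 for each class, plus overall accuracy.
print(classification_report(
    y_test, y_pred, target_names=["No (0)", "Yes (1)"], zero_division=0
))

# Precision per class: of the events predicted as a class, how many were right.
precision_no = precision_score(y_test, y_pred, pos_label=0, zero_division=0)
precision_yes = precision_score(y_test, y_pred, pos_label=1, zero_division=0)
print(f"precision for No:  {precision_no:.3f}")
print(f"precision for Yes: {precision_yes:.3f}")
```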

The bar chart below shows the importance of each explanatory variable in the model. The primary magnitude of the earthquake contributes most to the prediction, followed by focal depth. This makes sense: earthquakes with higher magnitude (Mw > 5.0) tend to cause tsunamis most frequently, and earthquakes at greater depth are less likely to trigger one.

Variable Importance
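The ranking behind a chart like this can be read from the fitted model's feature_importances_ attribute. In the sketch below the data is synthetic. EQ_PRIMARY is the only column name taken from the article; FOCAL_DEPTH and the remaining names are hypothetical placeholders. Because the pairing of names with synthetic columns is arbitrary, the printed order carries no geological meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the earthquake table (not the real data).
X, y = make_classification(
    n_samples=1164, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# Placeholder names; only EQ_PRIMARY appears in the article itself.
feature_names = ["EQ_PRIMARY", "FOCAL_DEPTH", "feature_3", "feature_4", "feature_5"]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative score per feature, summing to 1.
importances = forest.feature_importances_
ranking = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name:12s} {score:.3f}")
```

Impurity-based importances can favor variables with many distinct values, so permutation importance on the test set is a common cross-check.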

Lastly, I created scatter plots of focal depth against primary magnitude to show visually how precise the Random Forest classification is. As you can see, the right plot, colored by the Random Forest result, has a distribution relatively close to that of the true categories in the left plot.

Visual check of Random Forest classification model with a simple two-dimensional plot

Discussion

In conclusion, the Random Forest model predicts tsunami occurrence with 92% accuracy, which is higher than the kNN classifier. Since the accuracy is above 90%, I would conclude that the Random Forest method is a reasonable way to identify the types of earthquakes that can potentially trigger a tsunami. However, uncertainty and data bias remain in the dataset, as described in the sections above. To apply this model in practice, a more accurate database and further model refinement would need to be considered.

Thank you for your time! I would like to continue learning and developing my skillsets of data-driven analysis and explorations.

Yu Nakamura

Geologist. Currently enrolled in Data Science & Analytics Graduate Program at University of Calgary, Canada.