Empirical multi-variable tsunami damage models based on the 2011 Great East Japan dataset: analysis of the performances at different spatial scales
Mario Di Bacco; Anna Rita Scorzini
2023-01-01
Abstract
By implementing data-driven models for the 2011 Great East Japan earthquake and tsunami, the present study investigates how the level of spatial aggregation of the data affects the models' predictive ability and whether region-dependent patterns affect model accuracy and feature importance. An extended version of the dataset compiled by the Japanese Ministry of Land, Infrastructure, Transport and Tourism (MLIT) after the 2011 event in the Tōhoku region was used to generate sub-datasets at different spatial scales, ranging from individual cities of different sizes to clusters at regional and multiregional levels. The results indicate a high variance in accuracy among the models trained on the different subsets, with relative hit rates ranging from 0.68 to 0.89 and correlating positively with the cardinality of the sets, as well as some regional patterns in the prediction errors. The cluster-averaged feature importance is stable across all selections and reflects the results obtained from the models trained on the whole dataset, thus allowing a more informed identification of the most significant influencing factors for tsunami damage modelling.