Abstract: As technology develops, people increasingly need to extract useful information from large amounts of data. Automatic recognition of spatial relations is an important problem that arises in this context and is also an important task in natural language processing. Relation extraction supports the automatic recognition of spatial relations by classifying unseen instances on the basis of labeled instances. Unlike traditional classification, a geographic relation may belong to multiple categories at the same time, so automatic identification of spatial relations is a multi-label classification problem.
After studying multi-label algorithms, we choose the multi-label K nearest neighbors (ML-KNN) algorithm for spatial relation extraction. ML-KNN is a multi-label classification algorithm derived from the traditional K nearest neighbors (KNN) algorithm. First, for each unseen instance, its K nearest neighbors in the training set are identified. Then, based on the number of neighboring instances that belong to each possible class, the maximum a posteriori (MAP) principle is used to predict the label set of the unseen instance. To calculate the similarity between instances, we use the subsequence kernel method.
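The two steps described above (neighbor search, then a per-label MAP decision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class name `MLKNN`, the smoothing parameter `s`, the kernel parameters `n` and `lam`, and the choice of a generic gap-weighted subsequence kernel over token sequences are all assumptions, since the abstract does not specify the exact kernel variant or parameter values.

```python
def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel of order n (assumed variant).

    s and t are sequences (strings or token lists); lam in (0, 1]
    penalizes gaps inside matched subsequences.
    """
    ls, lt = len(s), len(t)
    # kp[l][i][j] is the auxiliary kernel K'_l on prefixes s[:i], t[:j]
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for i in range(ls + 1):
        for j in range(lt + 1):
            kp[0][i][j] = 1.0
    for l in range(1, n):
        for i in range(1, ls + 1):
            kpp = 0.0  # running K''_l for the current prefix of s
            for j in range(1, lt + 1):
                match = lam * lam * kp[l - 1][i - 1][j - 1] if s[i - 1] == t[j - 1] else 0.0
                kpp = lam * kpp + match
                kp[l][i][j] = lam * kp[l][i - 1][j] + kpp
    total = 0.0
    for i in range(1, ls + 1):
        for j in range(1, lt + 1):
            if s[i - 1] == t[j - 1]:
                total += lam * lam * kp[n - 1][i - 1][j - 1]
    return total


def normalized_kernel(s, t, n=2, lam=0.5):
    """Kernel value scaled to [0, 1] so that sequence length matters less."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / (kss * ktt) ** 0.5


class MLKNN:
    """Sketch of ML-KNN with a pluggable similarity function (assumed API)."""

    def __init__(self, sim, k=3, s=1.0):
        self.sim, self.k, self.s = sim, k, s

    def _neighbors(self, x, exclude=None):
        # Indices of the k most similar training instances
        idx = [i for i in range(len(self.X)) if i != exclude]
        idx.sort(key=lambda i: self.sim(x, self.X[i]), reverse=True)
        return idx[: self.k]

    def _counts(self, nbrs):
        # For each label, how many neighbors carry it
        return [sum(self.Y[i][l] for i in nbrs) for l in range(self.L)]

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        m, self.L = len(X), len(Y[0])
        s, k = self.s, self.k
        # Smoothed prior probability that an instance has label l
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(self.L)]
        c1 = [[0] * (k + 1) for _ in range(self.L)]  # instances with label l
        c0 = [[0] * (k + 1) for _ in range(self.L)]  # instances without label l
        for i in range(m):
            cnt = self._counts(self._neighbors(X[i], exclude=i))
            for l in range(self.L):
                (c1 if Y[i][l] else c0)[l][cnt[l]] += 1
        # Smoothed likelihood of seeing j labeled neighbors, given has/lacks label l
        self.like1 = [[(s + c1[l][j]) / (s * (k + 1) + sum(c1[l])) for j in range(k + 1)]
                      for l in range(self.L)]
        self.like0 = [[(s + c0[l][j]) / (s * (k + 1) + sum(c0[l])) for j in range(k + 1)]
                      for l in range(self.L)]
        return self

    def predict(self, x):
        # MAP decision, made independently for each label
        cnt = self._counts(self._neighbors(x))
        return [int(self.prior[l] * self.like1[l][cnt[l]]
                    > (1 - self.prior[l]) * self.like0[l][cnt[l]])
                for l in range(self.L)]
```

With token sequences as instances and 0/1 label vectors as targets, a call such as `MLKNN(sim=normalized_kernel, k=3).fit(X, Y).predict(x)` returns one 0/1 decision per relation category, so an instance can receive several labels at once.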
Our experimental data consist of 188 Chinese documents collected from Encyclopedia. We randomly select 3/4 of these documents as training documents and use the remaining 1/4 as test documents. We then extract spatial relations with the ML-KNN method and analyse the experimental results.
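The random 3/4 to 1/4 split can be illustrated with a short sketch. The function name `split_documents`, the integer document identifiers, and the fixed seed are hypothetical choices made only for illustration; the abstract does not state how the random selection was performed.

```python
import random


def split_documents(doc_ids, train_fraction=0.75, seed=0):
    """Shuffle document identifiers and split them into training and test sets."""
    shuffled = list(doc_ids)
    # A seeded generator keeps the split reproducible (the seed is an assumption)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 188 documents give 141 training documents and 47 test documents
train_docs, test_docs = split_documents(range(188))
```

Splitting at the document level, rather than at the instance level, keeps all relation instances from one document on the same side of the split.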
Keywords: Machine Learning; Spatial Relations; Relation Extraction; Multi-Label Classification; K Nearest Neighbors; Subsequence Kernel