Classification of Linked Data Sources Using Semantic Scoring

dc.contributor.author: Yumusak, Semih
dc.contributor.author: Dogdu, Erdogan
dc.contributor.author: Kodaz, Halife
dc.date.accessioned: 2020-03-26T19:53:06Z
dc.date.available: 2020-03-26T19:53:06Z
dc.date.issued: 2018
dc.department [en_US]: Selçuk Üniversitesi
dc.description [en_US]: 15th International Semantic Web Conference (ISWC) -- OCT 17-21, 2016 -- Kobe, JAPAN
dc.description.abstract [en_US]: Linked data sets are created using Semantic Web technologies; they are usually large, and their number is growing. Query execution is therefore costly, and knowing the content of such data sets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary. Although all linked data sources listed in these projects appear to be classified or tagged, there are only a limited number of studies on automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked data sets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried the textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document analysis techniques, treating every SPARQL endpoint as a separate document. For this analysis, we used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. From the WordNet database, we extracted information about the comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic and usage semantic relations. We obtained significant results for the hypernym and topic semantic relations: we can find words that identify data sets, and these words can be used in automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and different scoring methods, which improved classification accuracy.
dc.description.sponsorship [en_US]: Semant Web Sci Assoc, IBM, Semant Software, Oracle, IOS Press, Recruit Technologies Co Ltd, Fujitsu, NTT Resonant, SYSTAP Metaphacts, Hitachi, Rakuten Inst Technol, Yahoo Japan, Google
dc.description.sponsorship [en_US]: Scientific and Technological Research Council of Turkey / Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) [1059B141500052, B.14.2. TBT.0.06.01-21514107-020-155998]
dc.description.sponsorship [en_US]: This research is supported by The Scientific and Technological Research Council of Turkey with grant number 1059B141500052 (Ref. No: B.14.2. TBT.0.06.01-21514107-020-155998).
dc.identifier.doi [en_US]: 10.1587/transinf.2017SWP0011
dc.identifier.endpage [en_US]: 107
dc.identifier.issn [en_US]: 1745-1361
dc.identifier.issue [en_US]: 1
dc.identifier.scopusquality [en_US]: Q3
dc.identifier.startpage [en_US]: 99
dc.identifier.uri: https://dx.doi.org/10.1587/transinf.2017SWP0011
dc.identifier.uri: https://hdl.handle.net/20.500.12395/36403
dc.identifier.volume [en_US]: E101D
dc.identifier.wos [en_US]: WOS:000431760600015
dc.identifier.wosquality [en_US]: Q4
dc.indekslendigikaynak [en_US]: Web of Science
dc.indekslendigikaynak [en_US]: Scopus
dc.language.iso [en_US]: en
dc.publisher [en_US]: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
dc.relation.ispartof [en_US]: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
dc.relation.publicationcategory [en_US]: Article - International Refereed Journal - Institutional Faculty Member
dc.rights [en_US]: info:eu-repo/semantics/openAccess
dc.selcuk [en_US]: 20240510_oaig
dc.subject [en_US]: linked data
dc.subject [en_US]: semantic classification
dc.subject [en_US]: wordnet
dc.title [en_US]: Classification of Linked Data Sources Using Semantic Scoring
dc.type [en_US]: Article
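
The abstract describes treating every SPARQL endpoint as a separate document and scoring its words with a tf-idf variant. The following is a minimal sketch of that idea using plain tf-idf only; it is not the authors' adapted scoring and omits the WordNet semantic neighbours. The endpoint URLs, the sample label/comment texts, and the function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_per_endpoint(endpoint_texts):
    """Score terms with plain tf-idf, one 'document' per endpoint.

    endpoint_texts maps an endpoint URL to the list of label/comment
    strings retrieved from it (hypothetical input shape).
    """
    counts = {
        endpoint: Counter(tok for text in texts for tok in tokenize(text))
        for endpoint, texts in endpoint_texts.items()
    }
    n_docs = len(counts)

    # Document frequency: in how many endpoints does each term occur?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for endpoint, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[endpoint] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in term_counts.items()
        }
    return scores


def top_terms(scores, endpoint, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores[endpoint].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample data standing in for rdfs:label / rdfs:comment values.
SAMPLE = {
    "http://example.org/bio/sparql": [
        "protein binding site", "gene expression data", "protein structure",
    ],
    "http://example.org/geo/sparql": [
        "river basin data", "mountain range", "river delta",
    ],
}

if __name__ == "__main__":
    sample_scores = tfidf_per_endpoint(SAMPLE)
    for endpoint in SAMPLE:
        print(endpoint, top_terms(sample_scores, endpoint))
```

Terms that occur in every endpoint (here, "data") receive a score of zero, so the highest-ranked terms are the ones that distinguish one data set from the others. These are the kind of identifying words the abstract proposes to use for tagging.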

Files