1st Workshop on Enabling Open Semantic Science (SemSci 2017) @ISWC2017

SemSci 2017: Enabling Open Semantic Science

1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria

In the past few years, a push for open, reproducible research has led to a proliferation of community efforts for publishing raw research objects like datasets, software, methodologies, etc. These efforts make the materials that underpin research outcomes much more explicitly accessible. However, the actual time and effort required to achieve this new form of scientific communication remains a key barrier to reproducibility. Furthermore, scientific experiments are becoming increasingly complex, and ensuring that research outcomes become understandable, interpretable, reusable and reproducible is still a challenge.

The goal of this workshop is to incentivise practical solutions and fundamental thinking to bridge the gap between existing scientific communication methods and the vision of a reproducible and accountable open science. Semantic Web technologies provide a promising means for achieving this goal, enabling more transparent and well-defined descriptions for all scientific objects required for this new form of science and communication. We are particularly interested in four kinds of contributions:

Novel approaches that analyze how publications link to their described methods and research outputs
Novel approaches that use the research outputs of a scientific publication to facilitate its understanding and reuse (e.g., generating explanations of results, interactive visualizations, linking to datasets and methods, etc.)
Novel approaches that help compare and relate software, datasets and methods used in different publications
Novel approaches to apply Semantic Web and Linked Data techniques to scientific workflows

Topics of Interest

Topics for submissions include, but are not limited to:

Tools, methods and use cases for linking existing papers to their research products: data, software, methods and execution traces.
New methods for linking scientific papers to other papers (e.g., papers that use similar approaches, similar methods, common software, common data, etc.)
New methods for visualizing and presenting scientific information to scientists (e.g., provenance-based visualizations, summaries, presenting results at different levels of granularity, etc.)
New approaches for retrieving the method steps expressed in a paper.
New methods for generating automated explanations of scientific results.
New approaches for comparing methods, protocols and methodologies expressed in papers.
New methods to highlight the differences between execution runs of a scientific experiment (based on their configuration, performance, results, etc.)
Tools and methods for discovering data and software used in similar publications or to address similar problems.
Vocabularies and ontologies that help relate and describe software, data, methods and provenance used in a scientific publication.
Vocabularies and ontologies that help capture and present experiment information to scientists.
Automatic annotation of scientific research
Provenance, quality, privacy and trust of scientific information
Novel visualizations of scientific data
Novel approaches to apply Linked Data and Semantic Web techniques to scientific workflows

Workshop schedule - Saturday, October 21st

09:00-09:10	Introduction
Session 1: Extracting and Representing Semantics
09:10-09:50	Keynote speaker: Carole Goble. The Rhetoric of Research Objects (slides)
09:50-10:10	Gully Burns, Pradeep Dasigi and Ed Hovy. Extracting Evidence Fragments for Distant Supervision of Molecular Interactions (Slides)
10:10-10:30	Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia and Oscar Corcho. An initial Analysis of Topic-based Similarity among Scientific Documents based on their Rhetorical Discourse Parts. (Slides)
10:30-11:00	Coffee break
Session 2: Semantic metadata
11:00-11:40	Keynote speaker: Frank van Harmelen. The end of the scientific paper as we know it (in 4 easy steps) (slides)
11:40-12:00	Mihyun Jang, Tejal Patted, Yolanda Gil, Daniel Garijo, Varun Ratnakar, Jie Ji, Prince Wang, Aggie McMahon, Paul Thompson and Neda Jahanshad. Automatic Generation of Portions of Scientific Papers for Large, Multi-Institutional Collaborations Based on Semantic Metadata (slides).
12:00-12:20	Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, John Graybeal and Mark A. Musen. Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies (slides).
12:20-14:00	Lunch
Session 3: Semantic Technologies for Science
14:00-14:20	Sabbir Rashid, Katherine Chastain, Jeanette Stingone, Deborah McGuinness and Jim McCusker. The Semantic Data Dictionary Approach to Data Annotation and Integration (slides).
14:20-14:40	Vera G. Meister. Towards a Knowledge Graph for a Research Group with Focus on Qualitative Analysis of Scholarly Papers (slides)
14:40-15:00	Ali Khalili, Peter van Den Besselaar, Al Koudous Idrissou, Klaas Andries de Graaf and Frank van Harmelen. Semantically Mapping Science (SMS) Platform (slides).
15:00-15:20	Ensar Hadziselimovic, Kaniz Fatema, Harshvardhan J. Pandit and Dave Lewis. Linked Data Contracts to Support Data Protection and Data Ethics in the Sharing of Scientific Data (slides).
15:20-16:00	Coffee break
Session 4: Provenance and Scientific Experiments
16:00-16:20	Joachim Van Herwegen, Ruben Taelman, Sarven Capadisli and Ruben Verborgh. Describing configurations of software experiments as Linked Data (slides).
16:20-16:40	Ben De Meester, Anastasia Dimou, Ruben Verborgh and Erik Mannens. Detailed Provenance Capture of Data Processing (slides).
16:40-17:20	Keynote speaker: Silvio Peroni. The open citations revolution (slides)

Submission Guidelines

Paper submission and reviewing for this workshop will be electronic via EasyChair. The papers should be written in English, following the Springer LNCS format, and be submitted in PDF on or before August 6, 2017. However, SemSci2017 explicitly welcomes alternative and enhanced submission formats, such as communicative online materials. Authors who are preparing such a submission should contact the workshop organizers in advance to make sure we can accommodate them in the submission and review process. All deadlines are midnight Hawaii time.

The following types of contributions are welcome.

Full research papers (8 pages)
Position papers (4-6 pages)
Short research papers (4-6 pages)
System/tool papers (4-6 pages)
Posters (2 pages)

Papers presenting a tool will be reviewed based on the potential impact of the tool for Open Semantic Science, its usability, and its documentation. Accepted papers will be published in the CEUR workshop series.

Important Dates

Workshop papers due: ~~July 21st~~ ~~July 28th~~ August 6th, 2017
Notification of accepted workshop papers: August 24th, 2017
Publication of workshop proceedings: September 21st, 2017
Workshops held: October 21st, 2017

Invited Speakers

Carole Goble, University of Manchester. Title: The Rhetoric of Research Objects (slides). We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge. We can think of “Research Objects” as different types and as packages of all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on. But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange.
I’ll talk about why and how we got here, the framework and examples, and what we need to do.
Frank van Harmelen, VU University Amsterdam. Title: The end of the scientific paper as we know it (in 4 easy steps) (slides).
Silvio Peroni, University of Bologna. Title: The open citations revolution (slides). Citations are the primary tool to acknowledge others’ prior work on a particular topic. They enable one to find key publications within a particular field, and are also used for research purposes – e.g. people working in Bibliometrics, Informetrics, and Scientometrics use them for analysing the complex relationships that exist within huge networks of citations of scholarly works. In addition, citation data are important for the assessment of the quality of research by means of metrics and indicators calculated from citation databases. However, the cruel reality is that citations have been locked up in closed silos for years, and often they can only be accessed by paying significant subscription fees. But the scenario is quickly changing. In the past years, several initiatives (I4OC, OpenCitations, WikiCite, Springer Nature SciGraph, etc.) have started to promote the availability of open citation data. In this talk I will introduce some of the most significant efforts in the area, focusing on the way Semantic Publishing technologies have been used and adopted for enabling a FAIR publication of open citation data.

Program Chairs

Daniel Garijo, Information Sciences Institute, University of Southern California
Tobias Kuhn, VU University Amsterdam
Jun Zhao, Oxford University
Willem Robert van Hage, Netherlands eScience Center
Tomi Kauppinen, Aalto University School of Science in Finland

Program Committee

Marieke van Erp, VU University Amsterdam
Yolanda Gil, University of Southern California, USA
Oscar Corcho, Universidad Politécnica de Madrid, Spain
Idafen Santana Pérez, Universidad Politécnica de Madrid, Spain
Mark Wilkinson, Universidad Politécnica de Madrid, Spain
Craig A. Knoblock, University of Southern California, USA
Gully Burns, University of Southern California, USA
Khalid Belhajjame, University Paris-Dauphine
Amrapali Zaveri, Maastricht University, Netherlands
Alasdair Gray, Heriot-Watt University, UK
Paul Groth, Elsevier Labs, the Netherlands
Anita de Waard, Elsevier Labs
Jeff Pan, University of Aberdeen, UK
Alexander Garcia Castro, Universidad Politécnica de Madrid, Spain

Open Review

The following submissions participate in an open review process, i.e., both the submitted paper and its reviews (if the reviewers agree) will be made available online:

Sheeba Samuel and Birgitta König-Ries. Provenance Oriented Reproducibility of Scripts using the REPRODUCE-ME Ontology (link, reviews)
Vera G. Meister. Towards a Knowledge Graph for a Research Group with Focus on Qualitative Analysis of Scholarly Papers (link, reviews)
Joachim Van Herwegen, Ruben Taelman, Sarven Capadisli and Ruben Verborgh. Describing configurations of software experiments as Linked Data (link, reviews)
Ben De Meester, Anastasia Dimou, Ruben Verborgh and Erik Mannens. Detailed Provenance Capture of Data Processing (link, reviews)
Sabbir Rashid, Katherine Chastain, Jeanette Stingone, Deborah McGuinness and Jim McCusker. The Semantic Data Dictionary Approach to Data Annotation and Integration (link, reviews)
Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, John Graybeal and Mark A. Musen. Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies (link, reviews)
Ensar Hadziselimovic, Kaniz Fatema, Harshvardhan J. Pandit and Dave Lewis. Linked Data Contracts to Support Data Protection and Data Ethics in the Sharing of Scientific Data (link, reviews)
Ali Khalili, Peter van Den Besselaar, Al Koudous Idrissou, Klaas Andries de Graaf and Frank van Harmelen. Semantically Mapping Science (SMS) Platform (link, reviews)
Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia and Oscar Corcho. An initial Analysis of Topic-based Similarity among Scientific Documents based on their Rhetorical Discourse Parts (link, reviews)

Accepted Papers

The proceedings for the workshop are available at http://ceur-ws.org/Vol-1931/

Vera G. Meister. Towards a Knowledge Graph for a Research Group with Focus on Qualitative Analysis of Scholarly Papers. (Slides)
Joachim Van Herwegen, Ruben Taelman, Sarven Capadisli and Ruben Verborgh. Describing configurations of software experiments as Linked Data. (Slides)
Ben De Meester, Anastasia Dimou, Ruben Verborgh and Erik Mannens. Detailed Provenance Capture of Data Processing. (Slides)
Sabbir Rashid, Katherine Chastain, Jeanette Stingone, Deborah McGuinness and Jim McCusker. The Semantic Data Dictionary Approach to Data Annotation and Integration. (Slides)
Rafael S. Gonçalves, Martin J. O'Connor, Marcos Martínez-Romero, John Graybeal and Mark A. Musen. Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies. (Slides)
Ensar Hadziselimovic, Kaniz Fatema, Harshvardhan J. Pandit and Dave Lewis. Linked Data Contracts to Support Data Protection and Data Ethics in the Sharing of Scientific Data. (Slides)
Ali Khalili, Peter van Den Besselaar, Al Koudous Idrissou, Klaas Andries de Graaf and Frank van Harmelen. Semantically Mapping Science (SMS) Platform. (Slides)
Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia and Oscar Corcho. An initial Analysis of Topic-based Similarity among Scientific Documents based on their Rhetorical Discourse Parts. (Slides)
Mihyun Jang, Tejal Patted, Yolanda Gil, Daniel Garijo, Varun Ratnakar, Jie Ji, Prince Wang, Aggie McMahon, Paul Thompson and Neda Jahanshad. Automatic Generation of Portions of Scientific Papers for Large Multi-Institutional Collaborations Based on Semantic Metadata. (Slides)
Gully Burns, Pradeep Dasigi and Ed Hovy. Extracting Evidence Fragments for Distant Supervision of Molecular Interactions. (Slides)