Section outline

  • In this module, you will be introduced to key software tools used to create and explore semantic data in the social sciences and humanities (SSH). Through guided demonstrations, you will see how Protégé and TEDI are used to build ontologies and ontoterminologies, respectively, how SPARQL is used to query RDF data, and how LEAF-Writer supports semantic text encoding with the TEI standard. You will also get an overview of how NLP notebooks can help with basic language processing tasks. The module gives you a practical understanding of the tools behind semantic technologies in SSH.
    • 3a. Protégé

      This session introduces Protégé, a free and open-source ontology editor developed at Stanford University and widely used for building OWL ontologies. It presents the foundational elements of an ontology (individuals, properties, and classes) and stresses that meaning arises from the relationships between objects. Ontologies are structured as class hierarchies supported by logical constraints and property restrictions. The Krater Ontology, which models various types of ancient Greek vases and their characteristics, serves as the running example. Building an ontology in Protégé involves defining the class hierarchy, annotating terms, and populating the ontology with individuals. The presentation also raises two open questions: how to define essential characteristics, and how to integrate the linguistic dimension into an ontology. Protégé works with reasoners that check an ontology's logical consistency, and it complies with W3C standards such as OWL 2 and RDF, which makes it a powerful tool for knowledge representation, particularly in the humanities and social sciences.
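
      To make classes, individuals, and inference concrete, here is a minimal Python sketch (not Protégé's API, and not OWL itself) of a class hierarchy with individuals and one property assertion. The class and individual names (Vase, Krater, CalyxKrater, krater_A, and so on) and the foundAt property are hypothetical, loosely inspired by the krater example rather than copied from the Krater Ontology. The sketch assumes each class has a single parent and that the hierarchy has no cycles; OWL allows several superclasses, and a real reasoner does far more, including consistency checking and handling property restrictions.

```python
# Hypothetical class hierarchy: each class maps to its direct superclass.
# Names are illustrative, not taken from the Krater Ontology.
SUBCLASS_OF = {
    "Krater": "Vase",
    "CalyxKrater": "Krater",
    "ColumnKrater": "Krater",
    "VoluteKrater": "Krater",
}

# Individuals and the most specific class asserted for each one.
ASSERTED_TYPE = {
    "krater_A": "CalyxKrater",
    "krater_B": "ColumnKrater",
}

# Property assertions as (subject, property, object) triples (hypothetical).
PROPERTY_ASSERTIONS = [
    ("krater_A", "foundAt", "site_1"),
]


def superclasses(cls):
    """Follow SUBCLASS_OF upward and return every ancestor, nearest first.

    Assumes the hierarchy is a tree without cycles.
    """
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def inferred_types(individual):
    """The asserted class plus every class entailed by the hierarchy."""
    asserted = ASSERTED_TYPE[individual]
    return [asserted] + superclasses(asserted)


def instances_of(cls):
    """All individuals belonging to cls, whether asserted or inferred."""
    return sorted(i for i in ASSERTED_TYPE if cls in inferred_types(i))


def objects(subject, prop):
    """Values linked to subject through prop."""
    return [o for s, p, o in PROPERTY_ASSERTIONS if s == subject and p == prop]


if __name__ == "__main__":
    print(inferred_types("krater_A"))
    print(instances_of("Krater"))
```

      Neither individual is asserted directly as a Krater, yet both are returned as instances of Krater: membership follows from the subclass relations, which is the kind of conclusion a reasoner draws for an ontology loaded in Protégé.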

    • 3b. TEDI

      This session introduces TEDI, a freely distributed tool from the University of Crete for academic and research use. TEDI is designed for building multilingual ontoterminologies: terminologies whose conceptual system is a formal ontology. The session opens with a recap of ontoterminology and its role in representing and standardizing domain knowledge by combining ontology and terminology. TEDI works on two dimensions. In the conceptual dimension, users define concepts, essential characteristics, relations, and instances following Aristotelian principles. In the linguistic dimension, terms and proper names are assigned independently for each language, while remaining linked through the shared ontology. The session demonstrates TEDI’s editors and shows how to structure and populate an ontoterminology. Export options include HTML (for web-based dictionaries), RDF (for tools such as Protégé), TBX (the ISO standard for exchanging terminological data), and CSV (for example, for CmapTools), which makes TEDI a versatile, standards-compliant tool for humanities research and education.
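
      The split between the conceptual and linguistic dimensions can be illustrated with a small Python data model: concepts carry the conceptual side (parent concept and essential characteristics), while a separate per-language table maps concepts to terms. This is a hypothetical sketch, not TEDI's internal model; the class names, the CSV column layout, and the sample characteristic are assumptions, and TEDI's real CSV export may look different.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Conceptual dimension: a concept placed in the ontology."""
    concept_id: str
    parent: str | None = None
    essential_characteristics: list[str] = field(default_factory=list)


@dataclass
class Ontoterminology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # Linguistic dimension: language code -> {concept_id: term}.
    terms: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def add_term(self, lang: str, concept_id: str, term: str) -> None:
        # A term must name an existing concept; languages are filled independently.
        if concept_id not in self.concepts:
            raise KeyError(f"unknown concept: {concept_id}")
        self.terms.setdefault(lang, {})[concept_id] = term

    def term_for(self, concept_id: str, lang: str) -> str | None:
        return self.terms.get(lang, {}).get(concept_id)

    def to_csv(self, langs: list[str]) -> str:
        """One row per concept, one column per language (hypothetical layout)."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["concept", "parent", *langs])
        for c in self.concepts.values():
            row_terms = [self.term_for(c.concept_id, lang) or "" for lang in langs]
            writer.writerow([c.concept_id, c.parent or "", *row_terms])
        return buf.getvalue()


# Sample data: hypothetical concepts with English and French terms.
ot = Ontoterminology()
ot.add_concept(Concept("Vase"))
ot.add_concept(Concept("Krater", "Vase", ["used for mixing wine and water"]))
ot.add_term("en", "Vase", "vase")
ot.add_term("en", "Krater", "krater")
ot.add_term("fr", "Krater", "cratère")
print(ot.to_csv(["en", "fr"]))
```

      Because each language is filled in independently, the French cell for Vase stays empty until a French term is added, even though the concept already exists in the ontology.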

    • 3c. SPARQL

      This session introduces SPARQL, the W3C standard query language for RDF data. It begins with a recap of RDF, which expresses data as subject–predicate–object triples that together form a directed, labeled graph. It then explains what SPARQL is and why it matters: it plays the same role for RDF datasets that SQL plays for relational databases. Next, it outlines the structure of a SPARQL query: PREFIX declarations, a query form such as SELECT or ASK, a graph pattern, and solution modifiers such as ORDER BY and LIMIT. Worked examples query Linked Data through public endpoints such as DBpedia, retrieving labels and depictions and checking whether a given pattern exists in the data. The session concludes by emphasizing SPARQL’s role in semantic search, large-scale data exploration, and linking diverse data sources.
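
      At the heart of a SELECT or ASK query is basic graph pattern matching: find every assignment of variables that makes all triple patterns occur in the graph. The Python sketch below imitates this over a tiny in-memory graph. It does not parse SPARQL text (patterns are written as Python tuples), the ex: resources are hypothetical, and real engines, such as the one behind the DBpedia endpoint, also handle OPTIONAL, FILTER, solution modifiers, and much more. The comments show the SPARQL each call roughly corresponds to.

```python
# A toy RDF graph: every triple is (subject, predicate, object).
# Prefixed names stand in for full IRIs; the ex: resources are hypothetical.
GRAPH = [
    ("ex:krater_A", "rdf:type", "ex:Krater"),
    ("ex:krater_A", "rdfs:label", "Krater A"),
    ("ex:krater_A", "foaf:depiction", "ex:krater_A.jpg"),
    ("ex:krater_B", "rdf:type", "ex:Krater"),
    ("ex:krater_B", "rdfs:label", "Krater B"),
    ("ex:amphora_C", "rdf:type", "ex:Amphora"),
]


def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def solutions(patterns, graph=GRAPH):
    """Evaluate a basic graph pattern: join the matches of every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select(variables, patterns):
    """Project each solution onto the requested variables, like SELECT."""
    return [tuple(b[v] for v in variables) for b in solutions(patterns)]


def ask(patterns):
    """True if the pattern has at least one solution, like ASK."""
    return bool(solutions(patterns))


# Roughly: SELECT ?item ?label WHERE { ?item rdf:type ex:Krater . ?item rdfs:label ?label }
print(select(["?item", "?label"],
             [("?item", "rdf:type", "ex:Krater"), ("?item", "rdfs:label", "?label")]))

# Roughly: ASK { ex:krater_B foaf:depiction ?img }
print(ask([("ex:krater_B", "foaf:depiction", "?img")]))
```

      The shared variable ?item is what joins the two triple patterns: a label is only returned for resources that also match the rdf:type pattern.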

    • 3d. LEAF-Writer

      This session introduces LEAF-Writer, a free, web-based text encoding tool that requires no installation or configuration and supports collaborative editing. It explains text encoding as the process of making human-readable text machine-readable through markup, and covers structural, presentational, and semantic markup. XML (eXtensible Markup Language) is introduced as a flexible language for structuring and labeling data: because it has no predefined tags, it gives users freedom but makes interoperability harder. The Text Encoding Initiative (TEI) is presented as an XML standard for the humanities, designed for consistent encoding of literary and linguistic material such as manuscripts, historical archives, and critical editions. LEAF-Writer supports TEI schemas, validates documents on the fly, and offers entity tagging, so users can encode texts directly in a web browser. Export options include XML, HTML, and Markdown, which facilitates integration and reuse. The session closes with links for further learning and access to the LEAF-Writer platform.
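
      Semantic markup is what lets software, not just human readers, find the people, places, and dates in a text. The Python sketch below uses the standard library's xml.etree.ElementTree to pull tagged entities out of a small TEI document. The document content is invented, its header contains only the minimal fileDesc structure, and LEAF-Writer's own output may carry richer attributes (for example, links to authority records); the sketch only reads persName, placeName, and date elements.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal TEI document with invented content. <persName>, <placeName>
# and <date> are the kind of semantic markup produced by entity tagging.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Maria</persName> wrote from <placeName>Heraklion</placeName>
        on <date when="1901-05-02">2 May 1901</date>.</p>
    </body>
  </text>
</TEI>"""


def extract_entities(tei_xml, tags=("persName", "placeName", "date")):
    """Return (tag, text, attributes) for each tagged entity in the TEI body."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI_NS}}}body")
    entities = []
    if body is None:
        return entities
    for tag in tags:
        # ElementTree exposes namespaced names in {namespace}localname form.
        for el in body.iter(f"{{{TEI_NS}}}{tag}"):
            text = " ".join("".join(el.itertext()).split())  # normalize whitespace
            entities.append((tag, text, dict(el.attrib)))
    return entities


for entity in extract_entities(TEI_SAMPLE):
    print(entity)
```

      The date element shows why semantic encoding pays off: its human-readable text ("2 May 1901") and its machine-readable when attribute travel together in one element.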

    • 3e. A notebook for NLP

      Designed for newcomers, this hands-on session offers a gentle introduction to Natural Language Processing (NLP) with Python in Google Colab. Participants learn to read and analyze text files, calculate basic text metrics like word and character counts, and perform simple preprocessing tasks such as converting text to lowercase. The session emphasizes practical coding skills and foundational concepts, preparing participants to explore further NLP techniques like lemmatization, part-of-speech tagging, and sentiment analysis with popular libraries such as NLTK. It is a good starting point for anyone who wants to work with text data.
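
      The basic metrics mentioned above need nothing beyond plain Python. The sketch below shows one way to compute them; the function names are hypothetical rather than taken from the course notebook, and splitting on whitespace is a deliberately naive tokenizer (libraries such as NLTK provide proper tokenizers that separate punctuation from words).

```python
import string
from collections import Counter
from pathlib import Path


def load_text(path):
    """Read a whole text file into one string (assumes UTF-8 encoding)."""
    return Path(path).read_text(encoding="utf-8")


def basic_metrics(text):
    """Count characters and whitespace-separated words, and lowercase the text."""
    words = text.split()  # naive tokenization: split on any whitespace
    return {
        "characters": len(text),
        "characters_no_spaces": sum(1 for ch in text if not ch.isspace()),
        "words": len(words),
        "lowercased": text.lower(),
    }


def word_frequencies(text):
    """Lowercase, strip surrounding punctuation, and count each word."""
    tokens = (w.strip(string.punctuation) for w in text.lower().split())
    return Counter(t for t in tokens if t)


if __name__ == "__main__":
    sample = "The cat saw the dog. The dog ran."
    print(basic_metrics(sample))
    print(word_frequencies(sample).most_common(2))
```

      In Colab, load_text would be pointed at an uploaded file; the demo uses an inline string so it runs anywhere. Lowercasing before counting is what makes "The" and "the" count as the same word.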

    • 3f. Quiz on Module 3