This is an old revision of the document!


Semantic Search Engine

General Description

The semantic search module is the core functionality of the DaMSym tool. It lets users query multilingual textual corpora (Arabic, Slavic, Greek, Latin, Sanskrit, and the combined Greek + Latin corpus) through a system based on conceptual similarity between the query terms and the contents of the texts. Unlike traditional search engines, which operate on literal matches, DaMSym evaluates the semantic proximity between concepts. The result is a list of texts related not only by keyword, but also by meaning and linguistic context, supporting comparative analysis and philological study. Semantic search is available to all users, including unauthenticated users (Guest), while some advanced functions, such as resource addition, review, or approval, require authenticated access.
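To illustrate what ranking by semantic proximity rather than literal match can look like, here is a minimal sketch that scores texts by cosine similarity. It assumes that the query and the texts are represented as numeric vectors; the source does not state how DaMSym computes similarity, and the `cosine_similarity` helper and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Made-up vectors, for illustration only (not DaMSym data).
query = [1.0, 0.0, 1.0]
texts = {
    "fragment_a": [1.0, 0.0, 1.0],  # same direction as the query
    "fragment_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}
scores = {name: cosine_similarity(query, vec) for name, vec in texts.items()}
```

Under this assumption, a text can score highly without sharing any literal keyword with the query, as long as its vector points in a similar direction.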


Interface and Structure

The search engine interface is divided into an input area and an output area. All input tools are located on the left side of the screen, arranged vertically:

  • the main search bar,
  • the “Advanced Search” section,
  • the dynamic filters panel.

The right side of the page is dedicated to displaying results, which are updated in real time according to the selected settings. The visual separation between the input area and the output area ensures clarity and immediacy, allowing users to adjust parameters without reloading the page. All search parameters are interdependent: any modification to filters, query terms, or concept weights dynamically influences the semantic context and the list of displayed results.


Supported Languages and Search Domains

The system supports six main linguistic domains, each with its own rules and metadata. Despite internal differences, Arabic, Greek, Latin, and Greek + Latin share the same search structure, while Slavic and Sanskrit include specific functionalities.

Language / Domain | Main Characteristics
Arabic, Greek, Latin, Greek + Latin | Common filter structure: author, work, and chronological range. In the case of the Greek + Latin combination, the search is performed simultaneously on both corpora: selecting one of the two languages returns results from both.
Slavic | Includes a dedicated font selector for correct character rendering. Provides filters for multiple languages or historical/regional events. Does not include author or work management.
Sanskrit | Includes only the “Works” filter. No additional parameters are currently available. Automatic transliteration is provided to improve readability of texts written in Devanagari characters.

The Advanced Search section, located directly below the main search bar, allows users to extend the query by entering up to three words or phrases, including the main one. For each term, it is possible to assign a numerical weight between 0 and 1, indicating its relative importance within the search. Weights can be assigned only to the additional words or phrases entered in the Advanced Search section (up to a maximum of two), while the main word typed in the search bar does not have an associated weight. The sum of the assigned weights must always be equal to 1, and the system automatically recalculates the values to maintain semantic balance.

Practical example: a user types “sacrifice” in the main search bar, then adds “temple” with weight 0.6 and “ritual” with weight 0.4 in the Advanced Search section. The semantic engine balances the additional terms according to these proportions, returning results consistent with the specified conceptual combination. This weighting logic makes the search more flexible and suitable for comparative or multidisciplinary studies (Figure 3).

Figure 3, Advanced Search
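The automatic recalculation of weights can be pictured as a proportional rescaling. This is only a sketch under that assumption: the source says the values are recalculated so that they sum to 1, but not by which rule. The `normalize_weights` function and the even split for all-zero input are hypothetical.

```python
def normalize_weights(weights):
    """Rescale the weights of the additional terms so that they sum to 1.

    `weights` maps each additional Advanced Search term to a value in [0, 1].
    The main search-bar term carries no weight and is not included.
    """
    if len(weights) > 2:
        raise ValueError("at most two additional terms can be weighted")
    if any(not 0 <= w <= 1 for w in weights.values()):
        raise ValueError("each weight must be between 0 and 1")
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # Assumption: with no usable weights, split the importance evenly.
        return {term: 1 / len(weights) for term in weights}
    return {term: w / total for term, w in weights.items()}

# Two additional terms whose raw weights do not yet sum to 1.
balanced = normalize_weights({"temple": 0.3, "ritual": 0.2})
```

Here the raw weights 0.3 and 0.2 keep their 3:2 ratio after rescaling, which is one plausible reading of "maintaining semantic balance".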


Filters and Search Parameters

In addition to semantic terms, users can narrow the search scope through a series of contextual filters, which vary depending on the selected language.

Common filters (Arabic, Greek, Latin, Greek + Latin)

  • Chronological range → selectable via time slider or by manually filling the “From year” and “To year” fields;
  • Authors → multiple selection to include one or more authors;
  • Works → multiple selection of specific works or collections.

Slavic filters

  • Font → selection of the character type used for text rendering;
  • Language → selection of one or more of the available languages;
  • Historical or regional events → geographical or cultural reference context.

Sanskrit filters

  • Works → the only filter currently available.

All filters are dynamic: the list of available values changes according to the combination of other selected parameters. A Reset button clears the entire search, including the selected filters and the text entered in the main search bar.
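The dynamic behaviour of the filters can be sketched as follows: the values offered by one filter are computed from the texts that pass all the other active filters. The `Text` record, the function names, and the placeholder corpus are hypothetical and only illustrate the idea for the common filters (authors, works, chronological range); the actual DaMSym implementation is not described in the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Text:
    author: str
    work: str
    year: int

def matches(text, authors=None, works=None, year_from=None, year_to=None):
    """Return True if `text` passes every filter that is set."""
    if authors and text.author not in authors:
        return False
    if works and text.work not in works:
        return False
    if year_from is not None and text.year < year_from:
        return False
    if year_to is not None and text.year > year_to:
        return False
    return True

def available_values(corpus, field, **selection):
    """List the values of `field` still selectable under the other filters."""
    own_filter = {"author": "authors", "work": "works"}[field]
    # A filter does not restrict its own list of options.
    others = {k: v for k, v in selection.items() if k != own_filter}
    return sorted({getattr(t, field) for t in corpus if matches(t, **others)})

# Placeholder corpus, for illustration only.
corpus = [
    Text("Author A", "Work 1", 100),
    Text("Author A", "Work 2", 150),
    Text("Author B", "Work 3", 300),
]
works_for_a = available_values(corpus, "work", authors={"Author A"})
authors_after_200 = available_values(corpus, "author", year_from=200)
```

Selecting "Author A" narrows the Works list to that author's works, and setting "From year" to 200 leaves only "Author B" in the Authors list.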


Search Results

Search results are displayed as a list on the right side of the screen (Figure 4). Each row includes:

  • the title of the work or fragment;
  • a text excerpt;
  • the Similarity Score, a numerical value expressing the degree of semantic similarity between the result and the user’s query;
  • the “More Details” button, which opens the detailed metadata view.

Results can be sorted according to three criteria:

  • Similarity (default),
  • Date,
  • Author.
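The three sort criteria can be sketched as a lookup of sort keys. The `Result` record and the sort directions are assumptions: the source names the criteria but does not say whether Similarity is descending or whether Date and Author are ascending.

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    author: str
    year: int
    similarity: float

# Assumed directions: highest similarity first, earliest date first,
# authors in case-insensitive alphabetical order.
SORT_KEYS = {
    "similarity": lambda r: -r.similarity,
    "date": lambda r: r.year,
    "author": lambda r: r.author.casefold(),
}

def sort_results(results, criterion="similarity"):
    """Return a new list ordered by one of the three criteria."""
    return sorted(results, key=SORT_KEYS[criterion])

# Placeholder results, for illustration only.
results = [
    Result("T1", "Beta", 50, 0.7),
    Result("T2", "alpha", 100, 0.9),
    Result("T3", "Gamma", 75, 0.8),
]
by_similarity = [r.title for r in sort_results(results)]
by_date = [r.title for r in sort_results(results, "date")]
by_author = [r.title for r in sort_results(results, "author")]
```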

Text highlighting (in yellow) is available only for Arabic, Slavic, and Sanskrit, where the terms most relevant to the query are highlighted. In the Greek and Latin corpora, results are displayed without highlighting.

Figure 4, Search Results


Detailed View (“More Details”)

By clicking “More Details” for a result, the user accesses a detailed panel containing all information associated with the selected text (Figure 5):

  • title, author, and language;
  • full or extended text of the fragment;
  • list of metadata (work, period, place, source, etc.), which vary depending on the selected language;
  • semantic highlights (Arabic, Slavic, and Sanskrit only);
  • Feedback or Rating section (visible only to Reviewer users).

From this same view, authenticated users (Researcher and Reviewer) can propose corrections directly on the text, modify metadata using the Edit button, and add new metadata using the Add Metadata button.
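The role-dependent behaviour described on this page can be summarized as a small permission table. The action names and the `can` helper are hypothetical; only the role-to-capability mapping is taken from the text (search for everyone, editing actions for Researcher and Reviewer, the Feedback or Rating section for Reviewer only).

```python
from enum import Enum

class Role(Enum):
    GUEST = "Guest"
    RESEARCHER = "Researcher"
    REVIEWER = "Reviewer"

# Hypothetical action names mapped to the roles allowed to perform them.
PERMISSIONS = {
    "search": {Role.GUEST, Role.RESEARCHER, Role.REVIEWER},
    "propose_correction": {Role.RESEARCHER, Role.REVIEWER},
    "edit_metadata": {Role.RESEARCHER, Role.REVIEWER},
    "add_metadata": {Role.RESEARCHER, Role.REVIEWER},
    "view_feedback": {Role.REVIEWER},
}

def can(role, action):
    """Return True if `role` may perform `action`; unknown actions are denied."""
    return role in PERMISSIONS.get(action, set())
```

For example, `can(Role.GUEST, "search")` is true, while `can(Role.GUEST, "edit_metadata")` and `can(Role.RESEARCHER, "view_feedback")` are false.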

Figure 5, More Details


Text Editing and Corrections

Within the More Details view, users with the Researcher or Reviewer role can directly interact with the displayed text:

  • select words or fragments to propose a semantic or philological correction using the Edit Text (Corrections) function;
  • modify existing metadata using the Edit buttons next to each metadata field;
  • use the Add Metadata button to add new fields or information (e.g. sources, original titles, notes, bibliographic references).

All modifications are saved as proposals and made visible to the WP Lead in the Dashboard, where they can be approved or rejected.
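The proposal workflow can be sketched as a record with three states. The `Proposal` class, its field names, and the rule that a decision is final are assumptions made for illustration; the source states only that modifications are saved as proposals and then approved or rejected by the WP Lead.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Proposal:
    field_name: str   # e.g. a metadata field or "text"
    old_value: str
    new_value: str
    proposed_by: str
    status: Status = Status.PENDING  # every modification starts as a proposal

    def decide(self, approve):
        """Record the WP Lead's decision; assumed to be final once taken."""
        if self.status is not Status.PENDING:
            raise ValueError("proposal has already been decided")
        self.status = Status.APPROVED if approve else Status.REJECTED

# Placeholder values, for illustration only.
proposal = Proposal("source", "old source", "new source", "researcher_1")
initial_status = proposal.status
proposal.decide(approve=True)
```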


Interdependence Between Search and Filters

Search and filters in DaMSym are interdependent: any change in parameters influences the semantic processing of the query. For example, selecting a specific author automatically restricts the semantic context to the subset of texts associated with that author. This dynamic architecture keeps the results coherent with the active settings, which suits comparative studies and advanced linguistic analysis.

damsym/semantic_search_engine.1768490705.txt.gz · Last modified: by fincons