AutoMLM

The world is increasingly permeated by software and data. The use and maintenance of these digital artifacts requires their linguistic representation. Otherwise, they remain ‘black boxes’ which not only threaten the achievements of the Enlightenment, but also jeopardize the protection of investments in software and data. A linguistic representation, for example in the form of well-documented code or, better still, conceptual models, is, however, hardly sufficient on its own. In order to support efficient and secure adaptation to new requirements, and thus the long-term competitiveness of companies, abstractions are necessary that capture invariant requirements and abstract from time-varying ones. However, common programming and modeling languages offer only limited possibilities for expressing such abstractions. The multi-level language architecture FMMLx offers clear advantages here (see also Prospects of Multi-Level Modeling). Multi-level models not only support an unbounded number of classification levels, but also a common representation of models and code.

The representation of existing digital artifacts using multi-level models is suitable for significantly increasing their reusability and adaptability. However, the subsequent manual creation of multi-level models involves an effort that in many cases cannot be justified. This is where the AutoMLM project comes in. It is aimed at investigating and designing procedures for the (semi-)automatic (re-)construction of static multi-level models from existing representations (code, schemas, flat models) in order to provide a basis for deciding which procedures can be used in individual cases. The project is being carried out in collaboration with the American company Oracle.

The main objectives of the project are (1) the identification of promising automation techniques and (2) the analysis of prospects and challenges of the identified automation techniques to guide the (semi-)automatic construction of multi-level models. Flat representations, i.e., models or programming code devoid of any multi-level language features, serve as the starting point for these analyses. Example flat representations include UML class and object diagrams as well as database schemata or two-level programming code. The transformation from flat, two-level representations to multi-level models is called model deepening.

Within the research project, the following techniques to support (semi-)automatic model deepening are investigated:

  • Formal Concept Analysis
  • Lexical-Semantic Analysis using Structured Vocabularies (such as WordNet)
  • Large Language Models
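To give a flavor of the first technique: Formal Concept Analysis derives formal concepts, i.e., (extent, intent) pairs, from a binary object-attribute context. The following minimal Python sketch (all class and attribute names are invented for illustration and do not stem from the project) enumerates the concepts of a tiny context by closing every subset of objects; shared intents like {maxSpeed, weight} hint at candidate generalizations:

```python
from itertools import combinations

# Toy formal context: objects (classes from a flat model) and the
# attributes they carry. All names are hypothetical illustrations.
context = {
    "Car":   {"maxSpeed", "weight", "serialNo"},
    "Truck": {"maxSpeed", "weight", "serialNo", "payload"},
    "Bike":  {"maxSpeed", "weight"},
}

def common_attributes(objects):
    """Intent: attributes shared by all given objects."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else {a for s in context.values() for a in s}

def objects_with(attrs):
    """Extent: objects carrying all given attributes."""
    return {o for o, s in context.items() if attrs <= s}

# Enumerate all formal concepts (extent, intent) by closing every
# subset of objects -- fine for toy contexts, exponential in general.
concepts = set()
for r in range(len(context) + 1):
    for combo in combinations(context, r):
        intent = common_attributes(list(combo))
        extent = frozenset(objects_with(intent))
        concepts.add((extent, frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "->", sorted(intent))
```

For this context, the enumeration yields three concepts; the concept with extent {Car, Truck} and intent {maxSpeed, serialNo, weight}, for instance, suggests an abstraction over vehicles with a serial number.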

We conducted studies on using ChatGPT to support the automatic construction of pure generalizations and intrinsic classification in flat models. Here, you can access all example models, interactions, and results of these studies by downloading the respective documents.

Pure Generalization Studies with ChatGPT

To conduct studies on pure generalization, we created 16 example models with accompanying reference solutions. Five of these example models serve as negative examples, in which constructing a generalization would diminish the overall model quality. For each example model, we tried to generate pure generalizations with three different prompting strategies, resulting in a total of 48 experimental runs. Four of the 16 example models were aimed at promoting associations, in addition to attributes, to a common superclass. The documents below report in detail on all prompts used and the results obtained.
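The core of a pure generalization over attributes can be sketched mechanically: attributes shared by all classes are pulled up into a new superclass, while class-specific attributes remain. The following Python sketch uses invented class names; note that choosing a meaningful superclass name (here simply assumed to be "Person") is exactly the kind of semantic decision the studied techniques are meant to support:

```python
# Hypothetical flat model: two classes with their attribute sets.
flat_model = {
    "Student":  {"name", "birthDate", "matriculationNo"},
    "Lecturer": {"name", "birthDate", "salary"},
}

# Attributes common to all classes are candidates for pulling up.
shared = set.intersection(*flat_model.values())

if shared:  # only generalize if there is genuine commonality
    # The superclass name "Person" is an assumption for illustration.
    superclass = {"name": "Person", "attributes": sorted(shared)}
    # Subclasses keep only their class-specific attributes.
    subclasses = {c: sorted(attrs - shared) for c, attrs in flat_model.items()}
    print(superclass)
    print(subclasses)
```

A negative example in the sense above would be a model where the shared attributes are incidental (e.g., two unrelated classes both having a `name`), so that introducing a superclass adds structure without adding meaning.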

You can download an overview of all obtained results here (opens PDF file in new tab). Note that in our internal documents, we referred to studies on pure generalization with attributes as GA and to pure generalization with attributes and associations as GAA. Identifiers of the example models follow a rigid naming convention based on particular example categories that are not explained in these documents. The overview PDF provides a mapping between these internal example IDs and simplified example IDs.

Pure Generalization of Attributes:
Pure Generalization of Attributes and Associations:

Intrinsic Classification Studies with ChatGPT

We conducted studies on intrinsic classification not by prompting ChatGPT to perform model-deepening transformations, but by utilizing it as a source of semantic information that may be well suited to detecting model-deepening patterns. We conducted three rounds of experiments. Here, we only provide results for Round 1; Rounds 2 and 3 are subject to current research. In Round 1, we experimented with detecting and validating type-object patterns in flat models. We performed six different types of interactions, labeled R1A to R1F. In R1A, we provided a complete flat model to ChatGPT and asked it to detect type-object patterns in the model. In R1B, we provided ChatGPT with pattern candidates and prompted it to determine whether a candidate is well suited for a model-deepening transformation. We did the same in R1C, only that we first asked ChatGPT three questions that should guide its final verdict. In R1E, we prompted ChatGPT to generate secondary extensions for deepened models by providing types on various classification levels. In R1D and R1F, we used ChatGPT to validate a multi-level model after a model-deepening transformation had been applied.
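The candidates fed into the R1B/R1C interactions can be pictured with a simple naming heuristic: in a flat model, the type-object pattern typically appears as a class C associated with a class CType, an encoding that model deepening replaces by making C an instance of CType on a higher classification level. The sketch below uses made-up class and association names; in the actual studies, candidates were judged semantically by ChatGPT, not by a syntactic rule like this:

```python
# Hypothetical flat model: class names and directed associations.
classes = ["Product", "ProductType", "Order", "Customer"]
associations = [("Product", "ProductType"), ("Order", "Product")]

def type_object_candidates(classes, associations):
    """Flag pairs (C, CType) linked by an association: the classic
    flat encoding of the type-object pattern that model deepening
    resolves across classification levels."""
    candidates = []
    for source, target in associations:
        if target == source + "Type" and target in classes:
            candidates.append((source, target))
    return candidates

print(type_object_candidates(classes, associations))
```

A pair flagged this way would still need validation (as in R1B/R1C) before a model-deepening transformation is applied, since a matching name alone does not guarantee that CType really classifies C.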

Studies on the Type-Object Pattern:

An overview of the results for Round 1 (studies on the type-object pattern) can be downloaded here (opens PDF file in new tab).

Further documents are also available for download:

When using the XModelerML in Alpha Mode (which can be activated by navigating to File > Preferences… > Activate Alpha Mode), the menu item “AutoMLM” appears in the diagram editor. Currently, however, the AutoMLM feature is not supported by the publicly available XModelerML version. Although a dialog appears when clicking on the AutoMLM menu item, it offers no functionality to a regular user.