The FrameNet Brasil Computational Linguistics Lab at the Federal University of Juiz de Fora, Brazil, has been accepted as a mentor organization for Google Summer of Code 2021. This page is the main reference point for students submitting their projects to address the ideas listed below.
A framenet is a semantically oriented computational resource in which language material (words, multi-word expressions and grammatical constructions) is linked to a network of frames that help define its meaning. In the context of Frame Semantics, a frame is a scene, a system of interrelated concepts in which the participants in the scene, the props they use, and the way they interact are defined. The key notion in a framenet is that the meaning of words, as well as the meaning of other levels of linguistic structure, depends on the frames associated with them, that is, words may evoke frames. Take the verb tour, for example. In order to understand this word, a speaker of English recruits the Touring frame, in which there are three core participants: the Tourist, an Attraction and a Place. These three elements must be cognitively present so that the idea of touring can be interpreted: there is no tourism if any one of them is missing. Additionally, frames are connected to each other via a series of relations, providing a cognitive semantic structure against which meaning is defined.
In FrameNet Brasil we apply this kind of semantically oriented structure to tackle important issues in Natural Language Understanding. In the ideas list below, we explain those issues further.
To learn more about FrameNet Brasil, consider the following papers:
TORRENT, T. T.; MATOS, E.; LAGE, L.; LAVIOLA, A.; TAVARES, T.; ALMEIDA, V. G.; SIGILIANO, N. (2018). Towards continuity between the lexicon and the constructicon in FrameNet Brasil. In: LYNGFELT, B.; BORIN, L.; OHARA, K. H.; TORRENT, T. T. (Orgs.). Constructional Approaches to Language. Amsterdam: John Benjamins Publishing Company.
DINIZ DA COSTA, A.; GAMONAL, M. A.; PAIVA, V. M. R. L.; MARÇÃO, N. D.; PERON-CORRÊA, S.; ALMEIDA, V. G.; MATOS, E. E. S.; TORRENT, T. T. (2018). FrameNet-Based Modeling of the Domains of Tourism and Sports for the Development of a Personal Travel Assistant Application. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan: ELRA, p. 6-12.
There are framenets under development for several languages (English, Japanese, German, Swedish, Brazilian Portuguese, Chinese, among others) and also a global initiative to connect them all and develop shared tasks based on framenet data. To learn more about this initiative, visit the Global FrameNet website.
Successful applicants will turn in projects that address the issues in the ideas list, bringing together the kind of structured data FN-Br has been developing over the past decade with the computational techniques they find best suited to achieving the proposed goals. Please note that FN-Br is not only about big data, machine learning or any other purely statistical approach to language. The work in FN-Br is model-based as well as data-driven. The kinds of issues prompting the mentoring process for GSoC 2021 cannot be solved solely by training an algorithm on a large amount of raw data. With that in mind, applicants should follow the steps below to submit their applications:
Two aspects are important to keep in mind while reading the ideas described below:
Mentors: Alexandre Diniz da Costa (FN-Br | UFJF) | Ely Matos (FN-Br | UFJF) | Tiago Torrent (FN-Br | UFJF)
FN-Br has been implementing qualia relations in its database. These relations derive from the idea of qualia structure, proposed by Pustejovsky (1995) in the Generative Lexicon (GL) theory. Basically, GL assumes that the meaning of words is structured on four generative factors, called qualia roles. Each role captures how humans understand objects and the relationships between these objects in the world, in an attempt to explain the linguistic behavior of lexical items. Pustejovsky (1995, p. 85) defines four qualia roles:
At FN-Br, this idea was implemented by creating relations between Lexical Units (LUs). Each relation has specific semantics (e.g. part_of, used_for, used_by, is_a, etc.) and is associated with one of the four qualia roles. Because of its specific semantics, each relation is also associated with a background frame that refines and explains the semantics of the relation. Each LU in the relation is associated with a Frame Element of this background frame. Together, the qualia relations constitute a “lexical ontology”, which is used in several processes, such as the disambiguation of LUs and parsing.
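To make the structure described above more concrete, the following is a minimal Python sketch of how a qualia relation between two LUs could be represented. It is not FN-Br's actual schema: the class names, the field names, the `Part_whole` background frame and its `Part` and `Whole` Frame Elements, and the pairing of part_of with the constitutive role are all assumptions chosen for illustration. Only the four role names come from Pustejovsky (1995).

```python
from dataclasses import dataclass

# The four qualia roles of the Generative Lexicon (Pustejovsky 1995).
QUALIA_ROLES = ("formal", "constitutive", "telic", "agentive")


@dataclass(frozen=True)
class RelationType:
    """A kind of qualia relation (hypothetical schema, for illustration only)."""
    name: str              # relation name, e.g. "part_of"
    qualia_role: str       # one of QUALIA_ROLES
    background_frame: str  # frame that explains the semantics of the relation
    source_fe: str         # Frame Element filled by the source LU
    target_fe: str         # Frame Element filled by the target LU

    def __post_init__(self):
        if self.qualia_role not in QUALIA_ROLES:
            raise ValueError(f"unknown qualia role: {self.qualia_role}")


@dataclass(frozen=True)
class QualiaRelation:
    """A relation instance linking two Lexical Units."""
    source_lu: str
    target_lu: str
    relation: RelationType

    def describe(self) -> str:
        r = self.relation
        return (f"{self.source_lu} ({r.source_fe}) -{r.name}-> "
                f"{self.target_lu} ({r.target_fe}) "
                f"[{r.qualia_role}, frame {r.background_frame}]")


# Illustrative values only; frame and FE names are assumptions.
PART_OF = RelationType("part_of", "constitutive", "Part_whole", "Part", "Whole")
wheel_car = QualiaRelation("wheel.n", "car.n", PART_OF)
print(wheel_car.describe())
```

Keeping the relation type separate from its instances means the qualia role and background frame are stated once per relation, which mirrors the description above: every part_of link inherits the same role and the same Frame Element pairing.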
FN-Br currently has more than 25,000 qualia relations for LUs in English and Brazilian Portuguese. However, this is a relatively small number if we take into account the total number of LUs in the database (more than 27,000, including Portuguese, English and Spanish). These relations were created manually, which is a time-consuming and costly process.
This idea proposes the development of a pipeline that takes advantage of databases available on the Internet (lexical resources, semantic networks, ontologies, etc.) to automatically create new qualia relations. Resources that can be used include VerbNet, ConceptNet, BabelNet and Framester, among others. These resources provide semantic relations between lexical items.
A successful project should implement a human-in-the-loop solution for (semi-)automatically extracting qualia relations between words in existing databases and incorporating them into FN-Br.
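As a rough illustration of what such a human-in-the-loop pipeline might look like, the sketch below takes relation rows in the style of an external resource, maps them to qualia relation names, filters them against the LUs and relations already known, and leaves the final decision to a reviewer. Everything here is an assumption made for the example: the sample rows are made up, the mapping from external relation labels to part_of, used_for and is_a is a guess that a real project would have to validate, and the `approve` callback merely stands in for an annotator's judgment.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Candidate(NamedTuple):
    source_lu: str
    relation: str
    target_lu: str
    origin: str  # where the candidate came from


# Assumed mapping from external relation labels to qualia relation names.
EXTERNAL_TO_QUALIA = {"PartOf": "part_of", "UsedFor": "used_for", "IsA": "is_a"}


def propose(rows: Iterable[Tuple[str, str, str, str]],
            known_lus: Set[str],
            existing: Set[Triple]) -> Iterator[Candidate]:
    """Yield candidates that are mappable, link two known LUs, and are new."""
    for source, ext_rel, target, origin in rows:
        relation = EXTERNAL_TO_QUALIA.get(ext_rel)
        if relation is None:
            continue  # no qualia counterpart for this external relation
        if source not in known_lus or target not in known_lus:
            continue  # at least one word is not an LU in the database
        if (source, relation, target) in existing:
            continue  # relation is already in the database
        yield Candidate(source, relation, target, origin)


def review(candidates: Iterable[Candidate],
           approve: Callable[[Candidate], bool]
           ) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into accepted and rejected using a human decision."""
    accepted: List[Candidate] = []
    rejected: List[Candidate] = []
    for cand in candidates:
        (accepted if approve(cand) else rejected).append(cand)
    return accepted, rejected


# Made-up sample rows, not actual output of any resource.
rows = [
    ("wheel", "PartOf", "car", "sample"),
    ("knife", "UsedFor", "cut", "sample"),
    ("car", "IsA", "vehicle", "sample"),     # already in the database
    ("car", "RelatedTo", "road", "sample"),  # no mapped relation
    ("zorp", "IsA", "vehicle", "sample"),    # not a known LU
]
known_lus = {"wheel", "car", "knife", "cut", "vehicle", "road"}
existing = {("car", "is_a", "vehicle")}

candidates = list(propose(rows, known_lus, existing))
# Stand-in for the annotator: a real tool would ask a person instead.
accepted, rejected = review(candidates, approve=lambda c: c.relation == "part_of")
print(accepted)
print(rejected)
```

Separating `propose` from `review` keeps the automatic step cheap to rerun while guaranteeing that nothing enters the database without an explicit human decision.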
Why this Idea is Innovative:
The innovation presented by FN-Br lies in the use of two complementary theories of lexical semantics. While Frame Semantics allows analyzing the meaning of a lexical item within the context in which it is used (that is, which frame is evoked by that lexical item), the qualia relations from GL allow a more precise specification of the meaning of the item, relating it to other items not through the linguistic context, but through a conceptualization of common sense knowledge, thus forming a lexical ontology. However, applying these theories effectively in computational applications (and evaluating those applications) requires a more complete database, which is the object of this idea.
Mentors: Ely Matos (FN-Br | UFJF) | Collin Baker (FrameNet | ICSI) | Marcelo Viridiano (FN-Br | UFJF)
As the FrameNet Brasil Web Annotation Tool has been used for other projects, as well as in the Global FrameNet Shared Annotation Task, new data compatibility features have been demanded by the community.
This idea is split into two sub-ideas:
This idea revolves around the implementation of features for importing and exporting data in the formats used by other projects and tools, among which the Berkeley FN XML standard, the Universal Dependencies CoNLL-U format and the WebAnno standards should be considered.
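As one possible starting point for the export side, the sketch below serializes a sentence into the ten tab-separated columns of CoNLL-U (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Placing the evoked frame in the MISC column as `Frame=...` is an assumption of this example, not an established FN-Br or Universal Dependencies convention, and the token dictionary layout and the `to_conllu` function name are hypothetical. Fields the sketch does not know are left as `_`, and the `# text` comment is approximated by joining forms with spaces.

```python
from typing import Dict, List


def to_conllu(sent_id: str, tokens: List[Dict[str, str]]) -> str:
    """Serialize one sentence as CoNLL-U, with frames in MISC (assumed layout)."""
    text = " ".join(tok["form"] for tok in tokens)
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for index, tok in enumerate(tokens, start=1):
        # Assumption: the evoked frame, if any, is stored as Frame=<name> in MISC.
        misc = f"Frame={tok['frame']}" if tok.get("frame") else "_"
        columns = [
            str(index),            # ID
            tok["form"],           # FORM
            tok.get("lemma", "_"), # LEMMA
            tok.get("upos", "_"),  # UPOS
            "_",                   # XPOS (not provided)
            "_",                   # FEATS (not provided)
            "_",                   # HEAD (no dependency parse in this sketch)
            "_",                   # DEPREL
            "_",                   # DEPS
            misc,                  # MISC
        ]
        lines.append("\t".join(columns))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


sentence = [
    {"form": "We", "lemma": "we", "upos": "PRON"},
    {"form": "toured", "lemma": "tour", "upos": "VERB", "frame": "Touring"},
    {"form": "Rio", "lemma": "Rio", "upos": "PROPN"},
]
conllu_text = to_conllu("s1", sentence)
print(conllu_text, end="")
```

A full exporter would also need to handle multi-word LUs and Frame Element spans, which do not map onto single token rows as directly as a frame-evoking word does.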
This idea involves a partial migration from the FN database to a graph database, on top of which a visualization tool is to be built. The project can include some common graph traversals, such as frame groups, the shortest path between two frames, and analyses of frame families for polysemous LUs, among others. The visualization tool can be built outside the FrameNet Brasil WebTool, as a new web tool ideally applicable to other complex lexical databases as well as FrameNet.
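One of the traversals mentioned above, the shortest path between two frames, can be sketched with a plain breadth-first search before any graph database is involved. The frame-to-frame edges below are a toy network invented for the example and do not reproduce the actual relations in the FN-Br database; the function names are likewise hypothetical. A graph database would typically offer an equivalent query natively.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from frame-to-frame edges."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Return a shortest path of frames from start to goal, or None."""
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        frame = queue.popleft()
        for neighbor in graph.get(frame, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = frame
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [neighbor]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # the two frames are not connected


# Toy network: these edges are illustrative, not actual FN-Br relations.
toy_edges = [
    ("Touring", "Travel"),
    ("Travel", "Motion"),
    ("Motion", "Self_motion"),
    ("Touring", "Visiting"),
    ("Isolated_a", "Isolated_b"),
]
frame_graph = build_graph(toy_edges)
print(shortest_path(frame_graph, "Touring", "Self_motion"))
print(shortest_path(frame_graph, "Touring", "Isolated_a"))
```

The sketch treats every relation as undirected and unlabeled; a real implementation would probably want to keep the relation type and direction on each edge so that paths can be restricted to particular kinds of frame relations.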
Why this Idea is Innovative:
FrameNet data is rich and dense. This richness, together with the network-based structure of FrameNet, makes traditional list- and table-based data visualization inadequate. However, no suitable data visualization tool has yet been built that illuminates the whole complex structure of Frame Semantic data. These tools would help to meet the urgent need to link the fine-grained semantic representations of FrameNet with other computational tools.
(This idea will be developed in a co-mentorship project with the Red Hen Lab. Applicants may choose whether they will apply to FrameNet Brasil or Red Hen. However, if the student gets accepted, mentors from both labs will be involved in the mentorship.)
Mentors: Francis Steen (Red Hen) | Fred Belcavello (FN-Br | UFJF) | Mark Turner (Red Hen) | Tiago Torrent (FN-Br | UFJF)
Both FrameNet Brasil and Red Hen have been investigating how meaning is construed in multimodal communication. While Red Hen has been focusing more on the relation between speech and co-speech gestures, FN-Br has been looking into how frames are evoked by different modalities, especially audio and video.
In both cases, however, research interest revolves around how different modalities interact for meaning production.
For this idea, we expect projects focused on identifying joint meaning construal patterns. Recent work has defined a non-exhaustive list of construal dimensions, which could be used for inspiration. Also, Red Hen has a collection of multimodal corpora already annotated for the kind of co-speech gesture that accompanies speech. A good example of this kind is the air quotes gesture, which can accompany very different types of speech with very different functions.
Why this Idea is Innovative:
Although research in multimodal communication has advanced greatly in the past decade, practitioners in the field still lack ways of analyzing, in large amounts of data, how meaning is construed from the interaction between modalities. A successful implementation of this idea would then allow for human-in-the-loop solutions for annotating patterns of joint meaning construal in multimodal communication.