Congratulations!

This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Source: https://direct.mit.edu/rss/site_1000003/advanceAccess_1000004.xml

  1. <?xml version="1.0"?>
  2. <rss version="2.0" xmlns:prism="http://purl.org/rss/1.0/modules/prism/">
  3.  <channel>
  4.    <title>Computational Linguistics Advance Access</title>
  5.    <link>https://direct.mit.edu/coli</link>
  6.    <description>
  7.    </description>
  8.    <language>en-us</language>
  9.    <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  10.    <lastBuildDate>Wed, 07 May 2025 22:46:29 GMT</lastBuildDate>
  11.    <generator>Silverchair</generator>
  12.    <managingEditor>editor@direct.mit.edu/coli</managingEditor>
  13.    <webMaster>webmaster@direct.mit.edu/coli</webMaster>
  14.    <item>
  15.      <title>Socially Aware Language Technologies: Perspectives and Practices</title>
  16.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00556/128186/Socially-Aware-Language-Technologies-Perspectives</link>
  17.      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  18.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;Language technologies have advanced substantially, particularly with the introduction of large language models. However, these advancements can exacerbate several issues that models have traditionally faced, including bias, evaluation, and risk. In this perspective piece, we argue that many of these issues share a common core: a lack of awareness of the social factors, interactions, and implications of the social environment in which NLP operates. We call this &lt;strong&gt;social awareness&lt;/strong&gt;. While NLP is improving at addressing linguistic issues, there has been relatively limited progress in incorporating social awareness into models to work in all situations for all users. Integrating social awareness into NLP will improve the naturalness, usefulness, and safety of applications while also opening up new applications. Today, we are only at the start of a new, important era in the field.&lt;/span&gt;</description>
  19.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  20.      <prism:endingPage xmlns:prism="prism">15</prism:endingPage>
  21.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00556</prism:doi>
  22.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00556/128186/Socially-Aware-Language-Technologies-Perspectives</guid>
  23.    </item>
  24.    <item>
  25.      <title>Kallini et al. (2024) Do Not Compare Impossible Languages with Constituency-based Ones</title>
  26.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00554/128121/Kallini-et-al-2024-Do-Not-Compare-Impossible</link>
  27.      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  28.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;A central goal of linguistic theory is to find a precise characterization of the notion “possible human language”, in the form of a computational device that is capable of describing all and only the languages that can be acquired by a typically developing human child. The success of recent large language models (LLMs) in NLP applications arguably raises the possibility that LLMs might be computational devices that meet this goal. This would only be the case if, in addition to succeeding in learning human languages, LLMs struggle to learn “impossible” human languages. Kallini et al. (2024) conducted experiments aiming to test this by training GPT-2 on a variety of synthetic languages, and found that it learns some more successfully than others. They present these asymmetries as support for the idea that LLMs’ inductive biases align with what is regarded as “possible” for human languages, but the most significant comparison has a confound that makes this conclusion unwarranted.&lt;/span&gt;</description>
  29.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  30.      <prism:endingPage xmlns:prism="prism">10</prism:endingPage>
  31.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00554</prism:doi>
  32.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00554/128121/Kallini-et-al-2024-Do-Not-Compare-Impossible</guid>
  33.    </item>
  34.    <item>
  35.      <title>UniASA: A Unified Generative Framework for Argument Structure Analysis</title>
  36.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00553/127893/UniASA-A-Unified-Generative-Framework-for-Argument</link>
  37.      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  38.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;Argumentation is a fundamental human activity that involves reasoning and persuasion, which also serves as the basis for the development of AI systems capable of complex reasoning. In NLP, to better understand human argumentation, argument structure analysis aims to identify argument components, such as claims and premises, and their relations from free text. It encompasses a variety of divergent tasks, such as end-to-end argument mining, argument pair extraction, and argument quadruplet extraction. Existing methods are usually tailored to only one specific argument structure analysis task, overlooking the inherent connections among different tasks. We observe that the fundamental goal of these tasks is similar: identifying argument components and their interrelations. Motivated by this, we present a unified generative framework for argument structure analysis (UniASA). It can uniformly address multiple argument structure analysis tasks in a sequence-to-sequence manner. Further, we enhance UniASA with a multi-view learning strategy based on subtask decomposition. We conduct experiments on seven datasets across three tasks. The results indicate that UniASA can address these tasks uniformly and achieve performance that is either superior to or comparable with the previous state-of-the-art methods. Also, we show that UniASA can be effectively integrated with large language models, such as Llama, through fine-tuning or in-context learning.&lt;/span&gt;</description>
  39.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  40.      <prism:endingPage xmlns:prism="prism">46</prism:endingPage>
  41.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00553</prism:doi>
  42.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00553/127893/UniASA-A-Unified-Generative-Framework-for-Argument</guid>
  43.    </item>
  44.    <item>
  45.      <title>LMLPA: Language Model Linguistic Personality Assessment</title>
  46.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00550/127544/LMLPA-Language-Model-Linguistic-Personality</link>
  47.      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  48.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;Large language models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This article introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs’ language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates the findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, the Artificial Intelligence (AI) rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilizing Principal Component Analysis and reliability validation methods, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Centered AI and Computational Linguistics, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.&lt;/span&gt;</description>
  49.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  50.      <prism:endingPage xmlns:prism="prism">42</prism:endingPage>
  51.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00550</prism:doi>
  52.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00550/127544/LMLPA-Language-Model-Linguistic-Personality</guid>
  53.    </item>
  54.    <item>
  55.      <title>Language Models and Externalism: A Reply to Mandelkern and Linzen</title>
  56.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00551/127543/Language-Models-and-Externalism-A-Reply-to</link>
  57.      <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
  58.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;Do texts generated by language models (LMs) refer? Mandelkern and Linzen (2024) argue that externalist principles point to an affirmative conclusion. What grounds reference, according to their externalism, is a term’s “natural history”. For example, ‘water’ refers to H&lt;sub&gt;2&lt;/sub&gt;O among English speakers, and not to the phenomenally indistinguishable chemical XYZ, because H&lt;sub&gt;2&lt;/sub&gt;O, and not XYZ, is implicated in the natural history of ‘water’. Appealing to the literature on contrastive explanation, I show that a term’s natural history does not generally ground its referential properties. Thus, Mandelkern and Linzen’s quick route to the referentiality of LM-generated texts fails.&lt;/span&gt;</description>
  59.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  60.      <prism:endingPage xmlns:prism="prism">9</prism:endingPage>
  61.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00551</prism:doi>
  62.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00551/127543/Language-Models-and-Externalism-A-Reply-to</guid>
  63.    </item>
  64.    <item>
  65.      <title>Tokenization Changes Meaning in Large Language Models: Evidence from Chinese</title>
  66.      <link>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00557/128327/Tokenization-Changes-Meaning-in-Large-Language</link>
  67.      <pubDate>Thu, 10 Apr 2025 00:00:00 GMT</pubDate>
  68.      <description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;Large language models segment many words into multiple tokens, and there is mixed evidence as to whether tokenization affects how state-of-the-art models represent meanings. Chinese characters present an opportunity to investigate this issue: They contain semantic radicals, which often convey useful information; characters with the same semantic radical tend to begin with the same one or two bytes (when using UTF-8 encodings); and tokens are common strings of bytes, so characters with the same radical often begin with the same token. This study asked GPT-4, GPT-4o, and Llama 3 whether characters contain the same semantic radical, elicited semantic similarity ratings, and conducted odd-one-out tasks (i.e., which character is not like the others). In all cases, misalignment between tokens and radicals systematically corrupted representations of Chinese characters. In experiments comparing characters represented by single tokens to multi-token characters, the models were less accurate for single-token characters, which suggests that segmenting words into fewer, longer tokens obscures valuable information in word form and will not resolve the problems introduced by tokenization. In experiments with 12 European languages, misalignment between tokens and suffixes systematically corrupted categorization of words by all three models, which suggests that the tendency to treat malformed tokens like linguistic units is pervasive.&lt;/span&gt;</description>
  69.      <prism:startingPage xmlns:prism="prism">1</prism:startingPage>
  70.      <prism:endingPage xmlns:prism="prism">30</prism:endingPage>
  71.      <prism:doi xmlns:prism="prism">10.1162/coli_a_00557</prism:doi>
  72.      <guid>https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00557/128327/Tokenization-Changes-Meaning-in-Large-Language</guid>
  73.    </item>
  74.  </channel>
  75. </rss>

If you would like to create a banner that links to this page (i.e. this validation result), do the following:

  1. Download the "valid RSS" banner.

  2. Upload the image to your own server. (This step is important. Please do not link directly to the image on this server.)

  3. Add this HTML to your page (change the image src attribute if necessary):
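
A minimal sketch of such banner markup, assuming you have uploaded the badge image to your own server as valid-rss.png (the filename, alt text, and title text are placeholders, not part of the validator output), could look like this:

  <!-- point src at the copy of the badge hosted on your own server -->
  <a href="http://www.feedvalidator.org/check.cgi?url=https%3A//direct.mit.edu/rss/site_1000003/advanceAccess_1000004.xml">
    <img src="valid-rss.png" alt="Valid RSS" title="Validate my RSS feed" />
  </a>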

If you would like to create a text link instead, here is the URL you can use:

http://www.feedvalidator.org/check.cgi?url=https%3A//direct.mit.edu/rss/site_1000003/advanceAccess_1000004.xml
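
As a rough example, a plain anchor built from that URL (the link text is a placeholder) might be:

  <a href="http://www.feedvalidator.org/check.cgi?url=https%3A//direct.mit.edu/rss/site_1000003/advanceAccess_1000004.xml">Valid RSS</a>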

Copyright © 2002-9 Sam Ruby, Mark Pilgrim, Joseph Walton, and Phil Ringnalda