Bridging Disciplinary Disconnects: The Role of Legal Experts in Legal NLP

Computational Legal Futures is a tri-monthly series exploring the promise of computational law: digital transformation and extended intelligence in the law. This contribution is authored by Robert Mahari, JD-Ph.D. Student at Harvard Law School and MIT.


The potential of Natural Language Processing (NLP) to transform the legal industry is increasingly undeniable. NLP can be used to reimagine legal search, improve legal reasoning and writing, answer legal questions, or extract information from legal documents. These capabilities can help scale the provision of legal services and broaden access to justice by making legal practitioners more effective and by giving individuals tools to independently navigate legal processes. Yet, a troubling disconnect persists between the potential of AI solutions to improve legal practice and the research focus within the legal NLP community. This short piece presents specific opportunities for legal experts to collaborate with legal NLP researchers to pursue practice-oriented research that aims to broaden access to justice.

As part of Harvard’s Leadership in Law Firms program, I had the opportunity to survey 59 senior and managing law firm partners from 17 countries on their views on and engagement with AI. Their excitement and interest in AI were striking. Of the surveyed participants, over 95% reported that their firm is engaged with AI in some capacity by hosting conferences, forming committees, or working with external vendors. Over 90% of them had a positive attitude (ranging from cautious hope to excitement) towards AI in legal practice. This survey provides a glimpse into how practitioners think about AI and suggests that many have embraced the potential of these technologies.

To explore the state of legal NLP and the degree of collaboration with legal experts, together with collaborators at ETH Zurich, we conducted a literature review of legal NLP, looking at 170 of the most influential recent legal NLP research papers. We observed a problematic disconnect: While practitioners are clearly interested in AI, it does not seem that legal NLP researchers are communicating with the legal community. Very few legal NLP papers cite law review articles or include authors with legal backgrounds, and even fewer of them are cited by legal academics. To unlock efficiencies in legal practice and to broaden access to justice, legal NLP must be aligned with the needs of the legal community, but this type of collaboration appears to be rare. We published a call to action for the legal NLP community, highlighting the importance of working closely with legal practitioners and identifying access to justice as a normative goal for legal NLP.

One of our key messages was that NLP researchers should seek out collaborations with legal experts. In this piece, we hope to elucidate the other side of the equation: how can legal experts work more closely with legal NLP communities to foster impactful research? The remainder of this piece will outline four concrete opportunities for this type of interdisciplinary collaboration.

Technology-focused cross-cutting clinics at law schools

Law school clinics allow law school students to gain hands-on experience as legal practitioners. While clinical education at law schools is ubiquitous today, it was a controversial pedagogical device when first proposed, and took many decades to reach wide adoption. As centers for legal practice at law schools, clinics are uniquely positioned to explore the use of technology to enhance access to justice. Clinics are in a prime position to identify the types of repetitive tasks in legal practice that lend themselves to automation. They work closely with low-income individuals who are in desperate need of access to justice tools and, in contrast to most other types of legal aid organizations, they are often co-located with leading computer science research departments. As a result of these synergistic factors, clinical programs could once again be at the vanguard of legal education and could play a role in helping the legal NLP community identify impactful areas for research.

However, clinics typically focus on specific areas of practice and rarely have the resources or talent to support technical development. Moreover, the types of technical innovation that could occur in clinics (such as new approaches to legal research and writing, computational legal risk assessment tools, and studies on human-computer interaction in legal contexts) are not necessarily specific to one clinic. Rather than expecting individual clinics to spin up technical research projects, law schools could establish a cross-cutting technology-focused clinical program that caters to the technical needs of all clinics, identifies overarching research themes, assembles suitable datasets, and interfaces with external research partners. Law schools are already starting to experiment with these types of interdisciplinary models. For example, the Library Innovation Lab (LIL) at Harvard Law School has assembled a massive corpus of judicial opinions that has tremendous potential for NLP. LIL has also provided the faculty with a platform to design digital casebooks. As such, LIL provides a fantastic example of how law schools can foster interdisciplinary technical innovation.

As practice-oriented units within an academic context, technology-focused law school clinical programs could provide a fruitful interface between legal practice and NLP research. In addition, they could help expose law school students to technical thinking and design, preparing them to effectively and responsibly use the technologies that are increasingly common in legal practice. These clinics could also provide an opportunity for law schools to experiment with hiring interdisciplinary faculty as they continue to develop legal pedagogy and academia.

Direct communication with technical research communities

Our review of the legal NLP literature underscored that law review articles are rarely cited by the legal NLP community. While we encourage technical researchers to identify research problems from the legal literature, we also recognize that law review articles are not the most accessible publications. In contrast to the onerous requirements of law review articles, technical conferences and workshops tend to have much lower barriers to entry, routinely accepting short papers that are four pages in length. These venues provide an effective avenue for legal academics to communicate directly with technical research communities. Interdisciplinary workshops like the Natural Legal Language Processing workshop and the GenLaw workshop are co-located with leading computer science conferences and provide especially good opportunities to present new legal tasks and datasets, offer nuanced perspectives on legal NLP tasks, or identify potential legal issues raised by technology. Meanwhile, workshops like SemEval provide explicit opportunities to describe an NLP task and solicit solutions from the community. These venues provide an ideal opportunity for legal academics to share important practice-oriented tasks, ideally together with suitable datasets, with technical researchers. As such, publishing in these venues can allow legal experts to catalyze impactful legal NLP research and recruit technical collaborators.

Contribute to datasets and benchmarks

Structured legal data is notoriously difficult to come by, which stands in the way of many types of data-driven work including legal NLP. Even when data is available, it may be difficult to interpret and analyze without domain expertise. For example, some legal NLP researchers have treated the facts described in judicial opinions as neutral accounts when they are, in fact, likely to be colored by the ruling. Legal experts can help create datasets and guide technical researchers in their use of this data.

Legal experts could contribute to dataset creation in at least three ways. First, they can present existing legal datasets to NLP communities based on their own knowledge of working with this data. It may not always be obvious that a dataset used for empirical legal work might have value from an NLP perspective, but the domain expertise captured by legal scholars’ annotations can have real value for computational applications. For example, the SEC’s EDGAR database is widely used by legal academics studying corporate or financial law, but it also holds tremendous potential as an NLP dataset. Second, legal experts are well-positioned to evaluate omissions or biases in datasets that may lead to issues in model outputs or predictions. This type of contribution is especially critical given the inherently sensitive nature of deploying NLP in legal contexts. Concepts that might appear obvious to legal experts, like the bias contained in a judicial opinion’s “fact” section, may not be apparent to NLP researchers. Finally, legal experts can contribute to the design of benchmarks that can be used to evaluate the performance of NLP models. LegalBench provides an impressive example of a benchmark for legal reasoning that was the product of a diverse collaboration between computer scientists and lawyers. These types of benchmarks are likely to gain importance not just in academic circles, but also as ways to evaluate the performance of commercial systems.

Partner with NLP researchers

It is broadly recognized that domain-specific NLP applications are bottlenecked by their ability to recruit domain experts. By virtue of the high cost of legal experts and the low degree of interaction between NLP and legal communities, this problem seems especially acute in legal NLP. In contrast to legal academia, co-authorship is the norm in computer science, so legal experts would be welcomed as co-authors. In addition to helping identify impactful applications of legal NLP, legal experts can help evaluate solutions and provide a critical perspective to contextualize them. This type of partnership would also expose legal academics to NLP research and may help surface legal research questions or applications.

Beyond Legal NLP

In our recent position paper, we argued that access to justice should be treated as an important normative goal for legal NLP research and that, for this reason, this research should strive to be practice-oriented. The involvement of legal experts in legal NLP is a key step toward facilitating impactful work related to access to justice. However, this type of interdisciplinary collaboration may also have other benefits.

Lawyers, by and large, are not technologists, and the legal industry has, at least so far, been slow to adopt technologies, despite the positive impact they could have. The involvement of legal experts in the design and evaluation of legal NLP models can help ensure the responsible development of these technologies. At the same time, the lawyers who are involved in bridging between law and NLP can help introduce the use of these technologies to legal audiences, communicating relevant technical nuances and limitations. This type of interdisciplinary dialogue can help drive the responsible adoption of AI tools by practitioners.

More generally, there are other areas of NLP and AI, like safety, fairness, and explainability, where legal contributors could help provide important practical input. On the one hand, this can aid technical authors in optimizing their contributions along the dimensions that are most important from a legal perspective and help avoid situations where resources are squandered by pursuing research questions that are not properly aligned with regulatory needs. On the other hand, legal experts involved in NLP research could help communicate technical insights back to their communities. For example, while watermarking AI-generated content is often identified as a solution to AI misinformation and other harms, researchers have expressed doubts about its technical feasibility. Understanding and communicating these types of insights to legal communities can help design superior policy interventions and guide the application of existing legal frameworks to new technologies.


Legal NLP has tremendous potential to broaden access to justice. To live up to this potential, it needs to engage with the real issues experienced by the legal community. Legal experts can help with this process by presenting at relevant workshops and conferences, contributing to practice-oriented datasets and benchmarks, co-authoring with legal NLP researchers, and creating clinical programs at law schools that catalyze impactful technical work. Not only can these collaborations lead to legal NLP research that expands access to justice, but they could also help provide legal perspectives that enhance other areas of AI research, and ultimately help communicate technical insights back to legal audiences.

Robert Mahari


Citation: Robert Mahari, Bridging Disciplinary Disconnects: The Role of Legal Experts in Legal NLP, Network Law Review, Spring 2024.
