
Transformer-Models for Translating Tax Code

Project Information

ai, natural-language-processing, python
Project Status: Recruiting
Project Region: CAREERS
Submitted By: Andrew Sherman
Project Email: phillip.bradford@uconn.edu
Project Institution: University of Connecticut - Stamford
Anchor Institution: CR-Yale
Project Address: Stamford, Connecticut

Preferred Start Date: 01-09-2023

Mentors: Henry Orphys
Students: Recruiting
Student Skill Level Required: Ideally, the student should know Python or a similar programming language.

They should have an interest in NLP (Natural Language Processing) and its applications.

They should also be able to learn to work with one of several Python NLP libraries, such as NLTK (https://realpython.com/nltk-nlp-python/).

Project Description

Opportunity: Building on recent breakthroughs with NLP Transformer, BERT, and GPT models [GG2020], we aim to advance the translation of tax code into ErgoAI (Prolog). There has been prior work applying BERT-based models to translating legal text to Prolog [HBD2020]. We will work to extend the depth and breadth of applying Transformer, BERT, and/or GPT models to translating tax code into ErgoAI (Prolog).

The U.S. tax code and the Connecticut State tax code are both available online in structured formats. We have already captured all of the U.S. tax code and the Connecticut State tax code, and we are preparing to translate them into ErgoAI (Prolog). This is challenging due to the complexity of both English and tax law.
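As a rough illustration of what working with the captured text might look like (the XML fragment and element names below are assumptions for illustration, not the actual schema of the captured tax code), structured statute text can be pulled apart with the Python standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the structured format in which statutes
# are published online; the real published schema will differ.
SAMPLE = """
<section num="12-701">
  <heading>Definitions</heading>
  <subsection num="a">
    <text>"Resident of this state" means any natural person domiciled in this state.</text>
  </subsection>
</section>
"""

def extract_clauses(xml_text):
    """Return (section, subsection, clause-text) triples from a statute fragment."""
    root = ET.fromstring(xml_text)
    section_num = root.get("num")
    clauses = []
    for sub in root.iter("subsection"):
        for t in sub.iter("text"):
            clauses.append((section_num, sub.get("num"), t.text.strip()))
    return clauses

print(extract_clauses(SAMPLE))
```

Flattening the statute into (section, subsection, text) triples like this gives a uniform unit of translation to feed into the downstream English-to-logic step.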

There has been recent work on translating English text to Prolog rules. Some of this has been done using a Controlled Natural Language (CNL); see [GFK2018, KD2022]. In some cases, it has also been done without any CNL [WBGFK2022]. Unfortunately, recasting the tax code in a CNL appears to be as challenging as translating it directly to ErgoAI (Prolog). We want to automate this process; see [VSPUJGKP2017] and [MMP2021, MCP2021]. ErgoAI was developed and is maintained by Coherent Knowledge.

Transformer Models: In the last year, with Krutika Patel and the generous support of the Yale University Center for Research Computing through the CAREERS program, we identified several patterns in legal text. These patterns allowed us to selectively translate small sections of the law into logic. However, to make this practical, we need to scale the translation substantially. Our hope is that Transformer-based models will let us scale the translation of the U.S. and Connecticut tax codes into ErgoAI (Prolog) enough to significantly enhance hand translation.
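To give a flavor of the pattern-based translation described above, here is a minimal sketch in Python. The single "If X, then Y" pattern and the atom-forming helper are illustrative assumptions, far simpler than the patterns actually found in the legal text:

```python
import re

# One illustrative pattern: conditional sentences of the form
# "If X, then Y." become Prolog-style rules "y :- x."
COND = re.compile(r"^If (?P<body>.+?), then (?P<head>.+?)\.$")

def to_atom(phrase):
    """Lower-case a phrase and join its words with underscores to form an atom."""
    return "_".join(re.findall(r"[a-z]+", phrase.lower()))

def translate(sentence):
    """Translate one conditional sentence to a Prolog rule, or return None."""
    m = COND.match(sentence)
    if not m:
        return None
    return f"{to_atom(m.group('head'))} :- {to_atom(m.group('body'))}."

print(translate("If a person is domiciled in this state, then the person is a resident."))
# the_person_is_a_resident :- a_person_is_domiciled_in_this_state.
```

Hand-written patterns like this cover only the sentences they anticipate; the hope is that Transformer-based models generalize across the many phrasings a fixed pattern set misses.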
