We expect to deliver transformer-based translation systems. These systems will first translate the Connecticut tax code into ErgoAI (a Prolog-based logic programming system). The Connecticut tax code contains fewer than 1,500 regulations.
Once we understand the Connecticut tax code translation, we will refine the transformer-based tools on the U.S. tax code. The U.S. tax code is much larger than the Connecticut tax code, so its automatic translation is correspondingly more valuable.
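As a rough illustration of the intended pipeline, the sketch below runs a pretrained sequence-to-sequence transformer from the Hugging Face transformers library on a single regulation sentence. The model name "t5-small", the prompt prefix, and the sample sentence are placeholders, not the system we plan to deliver; a base model would only produce meaningful ErgoAI rules after fine-tuning on paired regulation/rule data, which is part of the work described above.

# Minimal sketch: translate one regulation sentence with a pretrained
# seq2seq transformer. All names here are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")        # placeholder base model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

regulation = ("A resident whose Connecticut adjusted gross income exceeds "
              "the threshold shall file a return.")
inputs = tokenizer("translate regulation to rule: " + regulation,
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Without fine-tuning on regulation/ErgoAI pairs, the output is not a valid rule;
# the sketch only shows the shape of the text-to-text translation step.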
While building these transformer-based tax models, we will author a research paper that surveys the opportunities and outlines our findings.
We will post all deliverables to public GitHub repositories.
Selected References:
[KD2022] Robert Kowalski and Akber Datoo. "Logical English meets legal English for swaps and derivatives." Artificial Intelligence and Law 30, no. 2 (2022): 163-197. https://doi.org/10.1007/s10506-021-09295-3
[GFK2018] Tiantian Gao, Paul Fodor, and Michael Kifer. "Knowledge Authoring for Rule-Based Reasoning." ODBASE, OTM Conferences 2018: 461-480.
[GG2020] Benyamin Ghojogh and Ali Ghodsi. "Attention mechanism, transformers, BERT, and GPT: Tutorial and survey." (2020).
[HBD2020] Nils Holzenberger, Andrew Blair-Stanek, and Benjamin Van Durme. "A dataset for statutory reasoning in tax law entailment and question answering." arXiv preprint arXiv:2005.05257 (2020).
[MMP2021] Denis Merigoux, Raphaël Monat, and Jonathan Protzenko. "A modern compiler for the French tax code." Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction (2021): 71-82.
[MCP2021] Denis Merigoux, Nicolas Chataing, and Jonathan Protzenko. "Catala: a programming language for the law." Proceedings of the ACM on Programming Languages 5, no. ICFP (2021): 1-29.
[TWW] Lewis Tunstall, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers. O'Reilly Media, 2022.
[WBGFK2022] Yuheng Wang, Giorgian Borca-Tasciuc, Nikhil Goel, Paul Fodor, and Michael Kifer. "Knowledge Authoring with Factual English." arXiv preprint arXiv:2208.03094 (2022). https://arxiv.org/abs/2208.03094
[VSPUJGKP2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
{Empty}
Ideally, the student should know Python or a similar programming language.
They should have an interest in NLP (Natural Language Processing) and its applications.
They should also be able to learn to work with one of several Python NLP libraries, such as NLTK (https://realpython.com/nltk-nlp-python/); a minimal sketch follows below.
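The short sketch below shows the kind of first exercise this involves: tokenizing a sample sentence with NLTK. The sample text is invented for illustration; it is not from the Connecticut tax code.

# Minimal NLTK sketch (assumes "pip install nltk"); only tokenizes a sample sentence.
import nltk

nltk.download("punkt")  # tokenizer data; recent NLTK releases may also need "punkt_tab"
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Each resident shall file a return. Payment is due by April 15."
print(sent_tokenize(text))  # splits the text into sentences
print(word_tokenize(text))  # splits the text into word tokens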
{Empty}
Some hands-on experience
{Empty}
University of Connecticut - Stamford
Stamford, Connecticut
CR-Yale
01/09/2023
No
Already behind. Start date is flexible.
6
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
The student will learn about transformer models such as BERT.
The student will learn about ErgoAI.
The student will learn some data architecture.
The student will learn to organize the legal text for storage so that it is easy to retrieve and to map to either ErgoAI or Catala-lang.
This transformed/organized tax code will be stored in a relational database such as MySQL.
The student will learn SQL and how to interact with a relational database through a database workbench.
The student will learn how to work with a relational database from a language like Python; a minimal sketch follows below.
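The sketch below stores and retrieves a piece of regulation text from a relational database using Python. It uses sqlite3 (from the Python standard library) as a stand-in for MySQL so that it runs without a server; with MySQL, a connector library would be used in the same way. The table name, columns, and sample section are illustrative, not the project's actual schema.

# Minimal sketch: store and retrieve regulation text relationally.
# sqlite3 stands in for MySQL; schema and data are illustrative only.
import sqlite3

conn = sqlite3.connect("tax_code.db")
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS regulations (
                   id INTEGER PRIMARY KEY,
                   section TEXT,
                   body TEXT)""")
cur.execute("INSERT INTO regulations (section, body) VALUES (?, ?)",
            ("12-700", "Imposition of tax on income. ..."))
conn.commit()

# Retrieve regulations by section for later mapping to ErgoAI or Catala rules.
for section, body in cur.execute(
        "SELECT section, body FROM regulations WHERE section = ?", ("12-700",)):
    print(section, body)
conn.close()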
{Empty}
{Empty}
Unknown at this time
{Empty}