Contents:
1. Project introduction
2. Expected outcome
3. Training dataset description
4. Training data usage
5. Testing
1. Project introduction
Background information on Shepard's treatment:
https://en.wikipedia.org/wiki/Shepard%27s_Citations
https://www.lexisnexis.com/pdf/lexis-advance/Shepards-Signal-Indicators-and-analysis-phrases-Final.pdf
https://infoguides.wtamu.edu/busi3312/shepardizing
https://depts.washington.edu/uwlawlib/wordpress/wp-content/uploads/2018/01/Dabney2007.pdf
Shepard's treatment decision automation (cite class analysis)
In the case law editorial process, editors manually tag the cites and decide the treatment (cite class) for each one. This treatment decision process is known as Shepardizing. This project aims to use AI to automate it.
Only a few training data files are provided in the folder. Assume, however, that 10,000 labelled training documents are available: given the labelled cite classes, train an ML model to classify the cites in any new case law document.
2. Expected outcome
An approach document covering problem formulation, data considerations, roadmap, methods, metrics, evaluation process, implementation, risks, references, etc.
Commented code, written in Python using open-source libraries.
3. Training dataset description
The training dataset contains 5 case law documents. It includes the raw case text in the folder full text and the labels in the folder citation_class; the two are related by file name. The labels are citations and their treatments, so the dataset can serve as a gold standard for automatic citation analysis tasks such as citation recognition or treatment decision. This project focuses on treatment decision classification. The dataset is self-contained and ready for modeling.
4. Training data usage
The training data is used for treatment decision. It includes both the raw case text and the labels, related by file name. The XML files can be parsed in any programming language, such as Python or Scala.
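Since the raw text and the labels are related by file name, a first step could be to pair the files and inspect the markup in a label file. The following is a minimal sketch using only the Python standard library. The function names pair_by_stem and summarize_markup are hypothetical, the folder paths are passed in by the caller, and no tag or attribute names are assumed: the code simply reports what it finds.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def pair_by_stem(text_dir, label_dir):
    """Map each shared file stem to its (raw text path, label path) pair.

    Assumption: "related by file name" means the two folders contain
    files whose names match once the extension is ignored.
    """
    texts = {p.stem: p for p in Path(text_dir).iterdir() if p.is_file()}
    labels = {p.stem: p for p in Path(label_dir).iterdir() if p.is_file()}
    shared = sorted(texts.keys() & labels.keys())
    return {stem: (texts[stem], labels[stem]) for stem in shared}


def summarize_markup(xml_path):
    """Count the element tags and attribute names found in one XML file."""
    tags, attrs = Counter(), Counter()
    for elem in ET.parse(str(xml_path)).iter():
        tags[elem.tag] += 1
        attrs.update(elem.attrib.keys())
    return tags, attrs
```

Calling pair_by_stem with the two folder paths and then summarize_markup on one of the returned label paths shows which elements and attributes carry the citation and class information before any model code is written.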
For example:
Around the citation in the training data, we have markup
In the above labelled data,
5. Testing
User testing will use a holdout dataset. The inputs are new case documents in which the cite class is set to TO_BE_ADDED. The output is the full document with the cite class filled in. Each test document contains all of the markup except the class information, and the task is to add it. You may strip the class information from the training data to create your own test data.
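Stripping the class information from a labelled document might look like the sketch below. It assumes, purely for illustration, that each cite is an XML element named citation carrying a class attribute; the real tag and attribute names must be taken from the provided label files, so both are parameters. The helper mask_classes is hypothetical. It returns the masked document together with the original labels, which can then serve as the gold standard when scoring predictions.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "TO_BE_ADDED"  # placeholder value named in the testing description


def mask_classes(xml_text, tag="citation", attr="class"):
    """Replace every class label with the placeholder.

    The tag and attr defaults are assumptions, not the dataset's
    confirmed schema. Returns (masked_xml, gold_labels), with the
    gold labels in document order.
    """
    root = ET.fromstring(xml_text)
    gold = []
    for elem in root.iter(tag):
        if attr in elem.attrib:
            gold.append(elem.get(attr))
            elem.set(attr, PLACEHOLDER)
    return ET.tostring(root, encoding="unicode"), gold
```

Keeping the gold labels alongside the masked text means the same training files can be reused for a local holdout evaluation without touching the originals.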