Kevin D. Ashley
University of Pittsburgh
Learning Research and Development Center
Graduate Program in Intelligent Systems
Contact Information
Kevin D. Ashley
3939 O'Hara Street
Pittsburgh, PA 15213
Phone: (412) 624-7496
Fax: (412) 624-9149
Email: ashley+@pitt.edu
http://www.lrdc.pitt.edu/Ashley/Default.htm
WWW PAGE
http://www.pitt.edu/~steffi/CBR/group.html
List of Supported Students and Staff
Stefanie Brüninghaus, GRA, University of Pittsburgh Graduate Program in Intelligent Systems
Project Award Information
case-based reasoning (CBR), automated case indexing, automated text classification, knowledge-guided machine learning, text-oriented CBR, factor-based text classification, legal information retrieval
Project Summary
The work improves current methods for learning to classify texts by incorporating knowledge from an expert domain model. The goal is to classify the texts of legal opinions automatically in terms of the factors that apply to the cases described. Factors are stereotypical fact patterns that tend to strengthen or weaken the underlying legal claims in a case; together with their relations to legal issues, they are a kind of expert domain knowledge useful in legal argumentation. The program takes as input the raw texts of legal opinions and outputs the applicable factors. The program's training instances are drawn from a corpus of legal opinions whose textual descriptions of cases have been represented manually in terms of factors. The problem is hard because the language of the opinions is complex, and the mere fact that an opinion discusses a factor does not imply that the factor actually applies to the case. Most recently, we have employed ID3 to induce decision trees for classifying by factors. We plan to explore ways of supplying certain linguistic information (e.g., about negation) using selective parsing and information extraction techniques.
Publications and Products
Bruninghaus, S. and Ashley, K.D. (1999a). "Toward Adding Knowledge to Learning Algorithms for Indexing Legal Cases," In Proceedings, Seventh International Conference on Artificial Intelligence and Law, Association for Computing Machinery, New York. Oslo. June. Donald H. Berman Award for Best Student Paper. http://www.pitt.edu/~steffi/papers/icail99.ps
Bruninghaus, S. and Ashley, K.D. (1999b). "Bootstrapping Case Base Development with Annotated Case Summaries," In Proceedings of the Third International Conference on Case-Based Reasoning. Munich, Germany. July. Outstanding Research Paper Award. http://www.pitt.edu/~steffi/papers/iccbr99.ps
Bruninghaus, St., and K.D. Ashley (1998a) Evaluation of Textual CBR Approaches. In: Proceedings of the AAAI-98 Workshop on Textual Case-Based Reasoning. Pages 30-34. AAAI Technical Report WS-98-05. AAAI Press, Menlo Park, CA.
Bruninghaus, St., and K.D. Ashley (1998b) How Machine Learning Can be Beneficial for Textual Case-Based Reasoning. In: Proceedings of the AAAI-98/ICML-98 Workshop on Learning for Text Categorization. Pages 71-74. AAAI Technical Report WS-98-05. AAAI Press, Menlo Park, CA.
Bruninghaus, St., and K.D. Ashley (1997a) Finding Factors: Learning to Classify Case Opinions Under Abstract Fact Categories. In: Proceedings of the Sixth International Conference on Artificial Intelligence and Law (ICAIL-97). Pages 123-131. ACM Press, New York, NY. http://www.pitt.edu/~steffi/papers/icail97.ps
Bruninghaus, St., and K.D. Ashley (1997b) Using Machine Learning to Assign Indices to Legal Cases. In: Case-Based Reasoning Research and Development, Proceedings of the Second International Conference on Case-Based Reasoning (ICCBR-97). Pages 303-314. Lecture Notes in Artificial Intelligence 1266. Springer Verlag. Heidelberg, Germany. http://www.pitt.edu/~steffi/papers/iccbr97.ps
Bruninghaus, St. (1998) Case-Based Reasoning From Textual Documents. Invited Talk at the Sixth German Workshop on Case-Based Reasoning. Extended Abstract published in: Proceedings of the Sixth German Workshop on Case-Based Reasoning (GWCBR-98). Pages 55-58. Berlin, Germany. http://www.pitt.edu/~steffi/papers/slides-gwcbr98.ps
The project has enabled a graduate student in the University of Pittsburgh Graduate Program in Intelligent Systems to pursue her ideal research topic. Stefanie Brüninghaus is performing her Ph.D. dissertation project with this funding and plans to enter academia in AI/Computer Science, a field that needs more female faculty. This funding has already bolstered Ms. Brüninghaus’ professional experience and exposure with an invited talk and two "Best Paper" awards. The work also plays a prominent role as an example in my seminar entitled "Artificial Intelligence and Law," which brings graduate students in the Intelligent Systems Program together with law students. Finally, the work will enable us to improve the intelligent tutoring system CATO, which teaches law students basic skills of legal argument, and to expand its database to include other legal domains.
Goals, Objectives, and Targeted Activities
Since the start of the original grant,
we have engaged in several preliminary experiments to assess the feasibility
of applying machine learning to automate index assignment. In our initial
experiments, we found that various statistical learning algorithms working
with texts represented as vectors of weighted terms would not be appropriate
for our task.
These experiments have led us to the
idea of learning rules to classify sentences from manually classified sentences.
We decided to use a symbolic learning algorithm, one that learns classification
rules, which is more appropriate for comparatively small numbers of training
examples of the factors. We chose to implement ID3 (which learns decision
trees in which rules are implicit) (Quinlan 1993). In order to reduce complexity,
we employed marked-up sentences as training instances rather than full
documents. The sentences come from case squibs, brief summaries of the
fact situations of all of CATO’s cases, which we had prepared previously
for use in CATO’s instruction. Our program SMILE (Smart Index LEarner)
employs ID3 to induce decision trees for classifying sentences as positive
or negative instances of a factor. ID3 learns a decision tree by recursively
partitioning the training set according to the feature that best discriminates
positive and negative instances of a factor. The positive training instances
are the sentences in the squibs marked up as substantiating the factor.
All other sentences in the squibs are considered negative training instances.
Each learned decision tree represents a number of rules for assigning a
factor to sentences. We are very excited about the decision trees SMILE
has learned. Intuitively, these decision trees confirm that the idea of
learning text classification rules from sentences has merit. We have also
integrated an application-specific legal thesaurus with the algorithm in
order to improve the performance of the decision tree induction.
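The induction step described above can be illustrated with a minimal sketch. It is not SMILE itself: the training sentences below are invented for illustration (they are not taken from CATO's squibs), the features are simply the presence or absence of single words, and the thesaurus integration is omitted. The function names (entropy, information_gain, id3, classify) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of True/False labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, word):
    """Entropy reduction from splitting on whether a sentence contains word."""
    labels = [label for _, label in instances]
    present = [label for words, label in instances if word in words]
    absent = [label for words, label in instances if word not in words]
    if not present or not absent:
        return 0.0  # the word does not partition the set at all
    remainder = (len(present) * entropy(present)
                 + len(absent) * entropy(absent)) / len(labels)
    return entropy(labels) - remainder

def id3(instances, vocabulary):
    """Recursively build a tree; leaves are True/False (factor applies or not)."""
    labels = [label for _, label in instances]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not vocabulary:
        return majority
    # Pick the word that best discriminates positive from negative instances.
    best = max(sorted(vocabulary), key=lambda w: information_gain(instances, w))
    if information_gain(instances, best) <= 1e-12:
        return majority
    rest = vocabulary - {best}
    present = [(w, l) for w, l in instances if best in w]
    absent = [(w, l) for w, l in instances if best not in w]
    return {"word": best,
            "present": id3(present, rest),
            "absent": id3(absent, rest)}

def classify(tree, words):
    """Follow the tree until a True/False leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["word"] in words else tree["absent"]
    return tree

# Invented sentences, labeled for a Security-Measures-like factor (assumption).
RAW = [
    ("plaintiff required employees to sign nondisclosure agreements", True),
    ("the drawings were kept in a locked safe", True),
    ("access to the plant was restricted by locked doors", True),
    ("defendant was hired as an engineer in 1975", False),
    ("plaintiff sold the product to customers", False),
    ("the court granted a preliminary injunction", False),
]
TRAINING = [(set(text.split()), label) for text, label in RAW]
VOCABULARY = set().union(*(words for words, _ in TRAINING))

tree = id3(TRAINING, VOCABULARY)
print(tree)
print(classify(tree, set("the files were stored in a locked cabinet".split())))
```

Each path from the root to a True leaf corresponds to one implicit rule of the form "if the sentence contains (or lacks) these words, assign the factor", which is the sense in which a decision tree represents a number of classification rules.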
We compared the ID3 algorithm and
two baselines using the F-measure, where we assign somewhat more weight
to recall than precision. The two baselines classify a sentence by determining
whether all (or any) of the words in the factor name are present. Since
the factor names are very descriptive (e.g., Disclosure-in-Negotiations,
Security-Measures, Info-Reverse-Engineerable), a human expert might very
reasonably employ this approach as a first pass in identifying factors
in a text. Our learning approach outperformed the baselines for all but
one of the six factors tested. The recall and precision, calculated by
case, reached as high as 80% for finding which factor applies to a case,
an indication that the methodology holds promise (Brüninghaus &
Ashley, 1999a, 1999b). Integrating a legal thesaurus significantly improved
recall and precision for at least three of the factors. We are examining
why it did not have this effect for all factors.
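The two baselines and the recall-weighted F-measure can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the experiments: the value beta = 2 is an assumption (the text says only that recall receives somewhat more weight than precision), the example sentences are invented, the function names are hypothetical, and the scores here are computed per sentence rather than per case.

```python
def factor_words(factor_name):
    """Split a hyphenated factor name into lowercase words."""
    return {w.lower() for w in factor_name.split("-")}

def baseline_all(sentence, factor_name):
    """Positive only if every word of the factor name occurs in the sentence."""
    return factor_words(factor_name) <= set(sentence.lower().split())

def baseline_any(sentence, factor_name):
    """Positive if at least one word of the factor name occurs in the sentence."""
    return bool(factor_words(factor_name) & set(sentence.lower().split()))

def f_measure(predicted, actual, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Invented sentences with hand-assigned labels (assumption, for illustration).
FACTOR = "Security-Measures"
SENTENCES = [
    ("plaintiff adopted security measures to protect the formula", True),
    ("the drawings were kept in a locked safe", True),
    ("defendant took no measures to verify the claim", False),
    ("the court granted a preliminary injunction", False),
]
actual = [label for _, label in SENTENCES]
pred_all = [baseline_all(text, FACTOR) for text, _ in SENTENCES]
pred_any = [baseline_any(text, FACTOR) for text, _ in SENTENCES]

f_all = f_measure(pred_all, actual)
f_any = f_measure(pred_any, actual)
print(pred_all, f_all)
print(pred_any, f_any)
```

In this toy data the second sentence supports the factor without using any word from its name, so both baselines miss it, while the third sentence shares a word with the factor name without supporting it. Cases of both kinds are what a learned classifier would need to handle better than the baselines.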
Assuming renewal of funding, our general
plan is to: (1) study the performance of the ID3 and other symbolic learning
algorithms, (2) investigate whether adding linguistic knowledge using shallow
Natural Language Processing or Information Extraction techniques may help
process case texts automatically, (3) investigate how to add more domain
dependent knowledge to the learning process and evaluate the resulting
program, and (4) explore how generally the techniques apply to a different
kind of case text.
Project References (See Publications and Products above)
Area Background
Previously, we developed an expert model of case-based reasoning, which is the basis for an intelligent tutoring system to teach law students argumentation with previous cases available as texts. The texts are legal opinions in which judges record their decisions and rationales for litigated disputes. We have compiled a large corpus of full-text descriptions of cases and a parallel abstract representation of some important aspects of those cases, which captures their content and meaning. Our model of expert legal reasoning relates a set of factors, stereotypical fact patterns that tend to strengthen or weaken a legal claim, to the more abstract legal issues to which the factors are relevant. The evidence that a factor applies to a given case consists of passages in the text of the opinion. We have constructed these resources in building the CATO program, an NSF PYI-supported intelligent tutoring environment designed to teach law students to make arguments with cases. CATO's Factor Hierarchy relates factors to more aggregated concepts and ultimately to legal issues raised by the legal claim. Together, the factors and the Factor Hierarchy enable CATO to generate examples of legal arguments and to provide some feedback on a student's work. We think that, using the representation as guidance, a machine learning program trained on the corpus could learn to classify which factors and issues apply in new cases presented as raw texts.
Area References
Rissland, E. L. and Daniels, J. (1995) "A Hybrid CBR-IR Approach to Legal Information Retrieval." In Proceedings of the Fifth International Conference on AI and Law, (ICAIL-95), pp. 52-61. ACM-Press: New York, NY.
Smith, J.C., Gelbart, D., MacKimmon, K., Atherton, B., McClean, J., Shinehoft, M. and Quintana, L. (1995). "Artificial Intelligence and Legal Discourse: The Flexlaw Legal Text Management System". In Artificial Intelligence and Law, Volume 3, Number 1, pp. 55-95. Kluwer Academic Publishers: Dordrecht, The Netherlands.
Turtle, H. (1995) "Text Retrieval in the Legal World". In Artificial Intelligence and Law. Volume 3, Number 1, pp. 5-54. Kluwer Academic Publishers: Dordrecht, The Netherlands.
Potential Related Projects
To be determined