Paper Title
Detecting Anomalous Invoice Line Items in the Legal Case Lifecycle
Paper Authors
Paper Abstract
The United States is the largest distributor of legal services in the world, representing a $437 billion market. Of this, corporate legal departments pay law firms $80 billion for their services. Every month, legal departments receive and process invoices from these law firms and legal service providers. Legal invoice review has long been a pain point for corporate legal department leaders. Legal invoices are complex and often contain several hundred line items, covering everything from tasks such as hands-on legal work to expenses such as copying, meals, and travel. The man-hours and scrutiny that invoice review demands can be overwhelming. Even with common safeguards in place, such as established billing guidelines, experienced invoice reviewers (typically highly paid in-house attorneys), and rule-based electronic billing ("e-billing") tools, many discrepancies go undetected. Using machine learning, we aim to demonstrate the current flaws of the legal invoice review process for invoices submitted by law firms to their corporate clients, and to explore improvements to it. In this work, we detail our approach, which applies several machine learning model architectures to detect anomalous invoice line items based on their suitability in the legal case's lifecycle (modeled using a set of case-level and invoice line-item-level features). To overcome the challenge of unlabeled data, we generate a synthetic dataset that uses subject matter expertise ("SME") to manipulate existing records' attributes so that they reflect an anomalous state in the product lifecycle, and we characterize our method's performance using a set of model architectures. We demonstrate how this process advances solving anomaly detection problems, specifically when the characteristics of the anomalies are well known, and offer lessons learned from applying our approach to real-world data.
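The synthetic-labeling idea in the abstract (rewriting attributes of existing records so that a line item no longer fits its place in the case lifecycle, then scoring a detector against those injected labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the phase and task names, the record layout, the conditional-frequency detector, and the 0.05 threshold are hypothetical and are not the features or model architectures used in the paper. The toy data is separable by construction, so the metrics it prints show only the workflow, not expected real-world performance.

```python
import random
from collections import Counter

# Hypothetical case phases and the task types that plausibly occur in each.
# These names are illustrative assumptions, not taken from the paper.
TYPICAL_TASKS = {
    "intake": ["case_assessment", "client_meeting"],
    "discovery": ["document_review", "deposition"],
    "trial": ["trial_preparation", "court_appearance"],
}


def make_clean_records(n, rng):
    """Draw n line items whose task fits the case phase (label 0)."""
    records = []
    for _ in range(n):
        phase = rng.choice(list(TYPICAL_TASKS))
        task = rng.choice(TYPICAL_TASKS[phase])
        records.append({"phase": phase, "task": task, "label": 0})
    return records


def inject_anomalies(records, fraction, rng):
    """Copy the records and rewrite the task on a random subset so that
    it belongs to a different phase; those copies get label 1."""
    out = [dict(r) for r in records]
    chosen = rng.sample(range(len(out)), int(len(out) * fraction))
    for i in chosen:
        other_phases = [p for p in TYPICAL_TASKS if p != out[i]["phase"]]
        out[i]["task"] = rng.choice(TYPICAL_TASKS[rng.choice(other_phases)])
        out[i]["label"] = 1
    return out


def fit_frequencies(records):
    """Estimate P(task | phase) from reference records."""
    pair_counts = Counter((r["phase"], r["task"]) for r in records)
    phase_counts = Counter(r["phase"] for r in records)
    return {pair: c / phase_counts[pair[0]] for pair, c in pair_counts.items()}


def predict(record, freqs, threshold=0.05):
    """Return 1 when the task is rare (or unseen) for the record's phase."""
    return int(freqs.get((record["phase"], record["task"]), 0.0) < threshold)


def precision_recall(records, freqs):
    """Score the detector against the synthetic labels."""
    tp = fp = fn = 0
    for r in records:
        pred = predict(r, freqs)
        if pred == 1 and r["label"] == 1:
            tp += 1
        elif pred == 1 and r["label"] == 0:
            fp += 1
        elif pred == 0 and r["label"] == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


rng = random.Random(0)
train = make_clean_records(500, rng)  # stands in for historical line items
test = inject_anomalies(make_clean_records(200, rng), 0.1, rng)
freqs = fit_frequencies(train)
precision, recall = precision_recall(test, freqs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the labels come from the injection step, any detector can be swapped in for `predict` and compared on the same synthetic test set, which is the kind of comparison across model architectures that the abstract describes.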