Detecting Potential Upcoding in Medicare Reimbursement via Working Hours
Using the Medicare Part B Fee-for-Service (FFS) Physician Utilization and Payment Data for 2012 and 2013, newly released by the Centers for Medicare and Medicaid Services (CMS), we first construct estimates of the hours physicians spend treating Medicare Part B FFS beneficiaries. Specifically, we estimate the time needed to furnish each service from its Relative Value Unit and from the subset of services whose time requirement is either stated in the service definition or objectively measured in a recent CMS on-site survey. Despite this extremely conservative estimation process, we find that about 2,800 physicians, or 4% of physicians with a significant fraction of Medicare Part B FFS patients, have billed for highly implausible working hours (more than 100 hours per week).
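The estimation and flagging steps described above can be illustrated with a minimal sketch. It is not the estimation procedure used in the study: the proportional least-squares fit of minutes on Relative Value Units, the conversion of annual totals to weekly hours by dividing by 52, and all service codes, RVU values, and billing counts are assumptions made purely for illustration. Only the 100-hours-per-week cutoff comes from the text.

```python
WEEKS_PER_YEAR = 52    # assumption: annual billing spread evenly over the year
HOURS_THRESHOLD = 100  # hours per week; the cutoff stated in the text


def fit_minutes_per_rvu(timed_services):
    """Least-squares slope through the origin: minutes ~ slope * RVU.

    timed_services is a list of (rvu, minutes) pairs for services whose
    time requirement is known. The proportional form is an assumption.
    """
    numerator = sum(rvu * minutes for rvu, minutes in timed_services)
    denominator = sum(rvu * rvu for rvu, _ in timed_services)
    return numerator / denominator


def implied_weekly_hours(billed, rvu_by_code, known_minutes, minutes_per_rvu):
    """Convert one physician's annual service counts into implied weekly hours."""
    total_minutes = 0.0
    for code, count in billed.items():
        # Use the known time where available, otherwise the RVU-based estimate.
        minutes = known_minutes.get(code, minutes_per_rvu * rvu_by_code[code])
        total_minutes += count * minutes
    return total_minutes / 60 / WEEKS_PER_YEAR


def is_flagged(weekly_hours):
    """Flag physicians whose billing implies more than the threshold."""
    return weekly_hours > HOURS_THRESHOLD


# Hypothetical services "A", "B", "C"; only "A" and "B" have known times.
rvu_by_code = {"A": 1.0, "B": 2.0, "C": 3.0}
known_minutes = {"A": 15.0, "B": 30.0}
slope = fit_minutes_per_rvu([(rvu_by_code[c], m) for c, m in known_minutes.items()])

# Hypothetical annual billing counts for a single physician.
billed = {"A": 10_000, "C": 5_000}
hours = implied_weekly_hours(billed, rvu_by_code, known_minutes, slope)
print(f"implied weekly hours: {hours:.1f}, flagged: {is_flagged(hours)}")
```

In this toy case the fitted slope is 15 minutes per RVU, so the hypothetical physician's billing implies roughly 120 hours per week and would be flagged.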
Existing studies on the detection of potential upcoding often look at total revenues, costs for a given treatment, or frequencies of expensive services. Our approach based on working hours has several advantages. First, by focusing on the implied working hours within a given time period, our approach separates out confounding factors such as selection on patient conditions. Second, it is flexible: it can be automated and can be easily extended to a more general setting with augmented data on physicians billing for beneficiaries of other insurance programs.
We then look into the characteristics and billing patterns of physicians who were flagged as having billed implausibly long hours. Even after controlling for heterogeneous exposure to different Medicare markets, we still find that these flagged physicians work in smaller group practices, if they work in a group practice at all, are more likely to be specialists rather than primary care physicians, and provide both more and higher-intensity services to their patients. Interestingly, the increased revenues from these higher-intensity services are not enough to offset their implied "longer" hours based on billing, resulting in substantially lower implied hourly revenues than those of other physicians. This large gap in hourly revenues is hard to reconcile using observable physician characteristics and geographical variations.
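The hourly revenue comparison above reduces to simple arithmetic, sketched here with invented numbers. The function name, the payment totals, and the hour totals are all hypothetical and are not drawn from the data; the sketch only shows how higher total revenue can coexist with lower implied hourly revenue.

```python
def implied_hourly_revenue(total_payment, implied_hours):
    """Total Medicare payment divided by the hours implied by billing."""
    if implied_hours <= 0:
        raise ValueError("implied_hours must be positive")
    return total_payment / implied_hours


# Hypothetical annual totals: the flagged physician bills more in total,
# but the implied hours grow faster than the revenue.
flagged_rate = implied_hourly_revenue(total_payment=450_000.0, implied_hours=6_000.0)
other_rate = implied_hourly_revenue(total_payment=200_000.0, implied_hours=2_000.0)

print(f"flagged: {flagged_rate:.2f} per hour, other: {other_rate:.2f} per hour")
```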
Finally, we construct a propensity score for each service to characterize its likelihood of being used in upcoding, and compare our results with those of the CMS Comprehensive Error Rate Testing (CERT) program, which samples and audits a large number of claims annually. Our results are qualitatively similar to CERT's for the majority of the 1,621 services it sampled, and the propensity scores we constructed for the other 2,559 services it did not sample can help improve CERT.