Optimize sample size and observation time: A case for employer-sourced administrative claims data

Closed administrative claims databases capture the complete range of healthcare services rendered during the period an individual is eligible to receive medical care covered by the insurer. Because of their completeness and the ability to indicate whether an individual does or does not use health services during the period with insurance coverage, closed claims databases have been favored for a wide array of healthcare research, including:

In recent years, claims data and other real-world data (RWD) have increasingly been used for regulatory submissions and post-marketing requirements.

Follow enrollees over time with administrative claims databases

There are many criteria to consider when choosing an administrative claims database for outcomes research or epidemiology studies, but the final decision often comes down to key metrics such as sample size and the available length of observation time. A large patient population with too little time to observe each patient meaningfully can hinder a study's success. In administrative claims databases, patient observation time is directly linked to the concept of continuous enrollment, which is the time from the start of healthcare insurance coverage to the end of, or a significant lapse in, continuous coverage. As such, one informed question to ask when selecting a data source is: “Can the claims database track enrollees when they change insurers, and if so, how?” The answer can differ depending on whether the claims data is sourced from employers, health plans, or other sources.
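As a minimal sketch of the continuous enrollment concept, the Python function below collapses a person's coverage spans into continuous enrollment periods and treats any gap longer than a chosen threshold as a significant lapse. The function name, the 45-day default threshold, and the sample dates are illustrative assumptions, not a MarketScan specification.

```python
from datetime import date

def continuous_periods(spans, max_gap_days=45):
    """Merge (start, end) coverage spans into continuous enrollment periods.

    spans: list of (date, date) tuples with inclusive end dates.
    max_gap_days: largest uncovered gap (in days) still treated as continuous;
                  a hypothetical threshold chosen for illustration.
    """
    merged = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = (start - prev_end).days - 1  # uncovered days between spans
            if gap <= max_gap_days:
                # Short gap or overlap: extend the current period
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        # First span, or a significant lapse: start a new period
        merged.append((start, end))
    return merged

# Hypothetical coverage spans: a 30-day gap, then a lapse of about six months
spans = [
    (date(2020, 1, 1), date(2020, 6, 30)),
    (date(2020, 7, 31), date(2020, 12, 31)),
    (date(2021, 7, 1), date(2021, 12, 31)),
]
periods = continuous_periods(spans)
for start, end in periods:
    print(start, end, (end - start).days + 1, "days")
```

With these sample dates, the 30-day gap is bridged and the six-month lapse starts a new period, so usable observation time depends on both the spans a database can see and the gap rule a study adopts.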

Consider the data capture experience of Ms. X, who has worked for the same employer for the last 4 years. During that time, Ms. X has selected insurance from three different insurers (Insurer A in year 1, Insurer B in year 2, Insurer C in year 3, and Insurer A again in year 4).

In a claims database sourced from a health plan, Ms. X would be viewed as a database member only for the time she is affiliated with that insurer (years 1 and 4 with Insurer A, year 2 with Insurer B, and year 3 with Insurer C), and she would be missing from each insurer's database during the years she is covered elsewhere (Figure 1).

In an employer-sourced claims database, Ms. X would be viewed as a database member for the full period that she is affiliated with the employer (4 years).
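The Ms. X example can be sketched in a few lines of Python. This is a simplified illustration that counts whole years only; the helper name `longest_run` and the data layout are hypothetical and do not reflect any actual database schema.

```python
# Ms. X's insurer in each year with the same employer (from the example above)
insurer_by_year = {1: "A", 2: "B", 3: "C", 4: "A"}

def longest_run(years):
    """Length of the longest run of consecutive years in a collection."""
    best = current = 0
    previous = None
    for year in sorted(years):
        if previous is not None and year == previous + 1:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = year
    return best

# Health-plan-sourced view: each insurer's database sees only its own years
plan_view = {}
for year, insurer in insurer_by_year.items():
    plan_view.setdefault(insurer, []).append(year)

# Employer-sourced view: every year with the employer is visible
employer_view = list(insurer_by_year)

for insurer, years in plan_view.items():
    print(f"Insurer {insurer}: years {years}, "
          f"longest continuous run = {longest_run(years)}")
print(f"Employer: years {employer_view}, "
      f"longest continuous run = {longest_run(employer_view)}")
```

Under these assumptions, Insurer A's database sees Ms. X in years 1 and 4, but the two-year gap means its longest continuous observation is a single year, while the employer-sourced view observes all 4 years without interruption.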

The strengths of employer-sourced claims databases

Administrative claims databases may look similar at first glance, but how they follow enrollees through insurer changes can differ greatly. Employer-sourced claims databases are unaffected by insurer changes and can follow patients across health plans as long as the patient remains at the same employer. One prior study showed that 22% of enrollees disenroll from a commercial insurer annually, and that 34% of those who disenroll reenroll with the original insurer within 5 years.1 In the Merative™ MarketScan® Commercial Database, which sources its data from 350+ self-insured employers, about 10% of enrollees who stayed with the same employer during the years 2020-2022 switched health plans at least once during the 3-year period. Because their data are sourced from employers, changing insurers does not disrupt these enrollees' continuous presence in the MarketScan database.

A recent publication that used three claims databases in the United States to emulate 30 clinical trials showcases how data sources can impact real-world studies.2 While the objective of the publication was not to compare databases, it allowed for a side-by-side comparison of sample sizes, length of follow-up, and statistical power between MarketScan and a health plan-centric database. The results showed that the MarketScan databases:

MarketScan: The Employer-Sourced Claims Databases

Among research-grade closed claims databases, the MarketScan® Databases stand out for their long history, trusted brand, and large size (nearly 300 million total enrollees since 1995), which enables the creation of a nationally representative sample of the US population with employer-sponsored insurance. More importantly, because the MarketScan Databases source data from self-insured employers, they offer superior continuity of patient data over multiple years, with observation periods generally longer than those in claims databases sourced from health plans alone.

In addition to the benefits of tracking enrollees longer, MarketScan is more likely to link medical and pharmacy coverage at the patient level, whereas health plan data is more likely to have medical coverage only. Because self-insured employers receive data from all payers, MarketScan databases are also more likely to include data for carveout benefits like mental health services managed by a different payer. These unique attributes enable MarketScan to capture the fullest healthcare experience of enrollees over a longer time, which makes it better suited for life sciences research needs.

Because people change health plans more often than they switch jobs, employer-sourced claims datasets like MarketScan offer superior longitudinality and, with it, the advantage of optimizing both sample size and observation time for healthcare research.

To learn more about how the MarketScan databases can help with your next study, reach out to the MarketScan team. Every month, our team pulls together a report with our latest findings and perspectives. Next month, we will share an analysis on behavioral health in the pediatric population.



1. Fang H, F.M., Sylwestrzak G, Ukert B. Trends in Disenrollment and Reenrollment Within US Commercial Health Insurance Plans, 2006-2018. JAMA Netw Open, 2022. 5(2).

2. Wang, S.V., et al., Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials. JAMA, 2023. 329(16): p. 1376-1385.