May 18, 2022

What is Data Quality, and How to Enhance it in Research

By James Robert Wagner


We often talk about “data quality” or “data integrity” when we are discussing the collection or analysis of one type of data or another. Yet the definitions of these terms may be unclear, or they may vary across contexts. In any event, the terms are somewhat abstract, which can make data quality difficult to improve in practice. We need to know what we are describing with these terms before we can improve it.

Over the last two years, we have been developing a course on Total Data Quality, now available on Coursera. We start from an error classification scheme adopted by survey methodology many years ago. Known as the “Total Survey Error” perspective, it focuses on the classification of errors into measurement and representation dimensions. One goal of our course is to expand this classification scheme from survey data to other types of data.

The figure shows the classification scheme as we have modified it to include both survey data and organic forms of data, also known as big data or found data. We find that all forms of data are subject to these same sorts of errors in varying degrees.

Figure: Error classification scheme, with error sources grouped under a “Measurement” column and a “Representation” column.

We won’t define all the classes in this post – just two examples.

Data Origin

First, on the measurement side, we define “Data Origin” as the process by which the individual values or data points for a given variable (or field) were recorded, captured, labeled, gathered, computed, or represented. This could be the process of answering a question, filling a field in an administrative record, or labeling an image in a machine learning context. In the image-labeling case, the error could be a human being labeling an image incorrectly; for example, a labeler might not note the difference between a cat and a kitten. In some contexts, that difference could be important.

Missing Data

On the representation side, “Missing Data” is a common problem that impacts many types of data. For example, administrative records can be missing key variables or even entire records. Similar things can happen with surveys. These missing data can impact inferences or predictions if the missing values differ from the observed values in important ways.
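To make this concrete, here is a minimal simulation (a hypothetical sketch, not an example from the course) in which higher values are more likely to go missing; the mean of the observed cases then understates the true mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# A variable of interest, e.g., household income in $1,000s (hypothetical).
income = rng.lognormal(mean=4.0, sigma=0.5, size=100_000)

# Suppose higher-income cases are more likely to be missing:
# the nonresponse probability grows with the value itself.
p_missing = np.clip(income / income.max(), 0.05, 0.90)
observed = income[rng.random(income.size) > p_missing]

print(f"true mean:     {income.mean():.1f}")    # computed on all cases
print(f"observed mean: {observed.mean():.1f}")  # biased downward
```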

Using this classification scheme as a way to think about errors can help guide researchers as they consider quality issues. Further, being aware of these issues may also open the door to enhancing quality along these dimensions! If you’d like to learn more, our new series of open online courses focuses on identifying, measuring, and maximizing quality along all of these dimensions.

The post was originally published on Survey Methods Musings.


What Is Data Quality and Why Is It Important?



By Aaron Moss, PhD & Leib Litman, PhD

The CloudResearch Guide to Data Quality, Part 1: How to Define and Assess Data Quality in Online Research

If you studied human behavior 30 years ago, your options for finding people to take your studies were limited. If you worked at a university, you might be able to rely on a student subject pool, or, with a lot of legwork, identify people in the community to participate in your studies. If you worked in the marketing industry, your company might conduct a focus group or hire an outside firm to conduct a phone survey, a mail survey or an in-person study with your target audience. Either way, the options for finding participants were slow, costly and restricted.

The internet changed all that.

Due to technology, research in the social and behavioral sciences has undergone a rapid revolution. Today, researchers can easily identify participants, quickly collect data and affordably recruit hard-to-reach groups. Online studies allow researchers to examine human behavior in exciting ways and at scales not possible in the past.

Even though online research has benefits for researchers, businesses, and science, it also presents some unique challenges. When conducting studies online, researchers must direct extra attention to data quality, an important and complex issue. So, let’s take a deeper look at what data quality is and why it’s important.

Data quality is a complex and multifaceted construct, making it difficult to precisely define. Nevertheless, perhaps one of the simplest definitions of data quality is that quality data 1) are fit for their intended purpose, and 2) have a close relationship with the construct they are intended to measure.

This definition may sound a bit abstract, so consider the example below.

Valid Market Research Requires High Quality Data

Imagine you are a data scientist at a music streaming service such as Spotify. Your job is to use data from the songs people have listened to in the past to predict what kind of music they might listen to in the future.

The data you have in this case — songs people have listened to in the past — are likely high quality because the music people have listened to in the past probably predicts what they want to hear in the future.

The data are also likely high quality because the music people have listened to in the past is directly related to the construct you’re interested in measuring: musical preferences. In other words, your data possess the defining characteristics of high-quality data.

What Factors Cause Poor Quality Data?

Generally speaking, your data (in this case) can suffer from two factors: a lack of completeness or a lack of credibility.

For example, if a user listens to music on your streaming service only once every six months, your data represent an incomplete picture of your user’s musical preferences. With such limited data, it’s difficult to ascertain what the user truly likes and dislikes.

A second way your data quality may suffer is because of a lack of credibility or consistency. Suppose a user allowed a friend or family member who likes very different music to use their account. Your data for this user would now be an inaccurate and perhaps inconsistent representation of their preferences, meaning the quality of your data would be lower as a result.

Although assessing data quality is relatively easy in the scenario above, measuring the quality of data collected in online research that requires people to answer survey questions, evaluate products, engage with psychological manipulations, walk through user-testing sessions, or reflect on their past experiences is often much more difficult.

How to Measure Data Quality

Researchers typically assess data quality at both the group level and the individual level. At both levels, researchers look for evidence that the data are: 1) consistent, 2) correct, 3) complete and 4) credible.

Evaluating the consistency of people’s responses at the group level often means examining measures of internal reliability, such as a Cronbach’s alpha score.

Measures of internal reliability tell the researcher how consistently a set of items measures the same underlying construct. For validated measures that have been used before, a low reliability score can indicate inconsistent responses from research participants.

Researchers assess the consistency of responses at the individual level by identifying either logical contradictions in people’s responses or inconsistent answers to specific questions designed to elicit the same information (e.g., “What is your age?” “What year were you born?”). People who provide many inconsistent responses are often removed from the dataset.
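As a rough sketch of both levels, using hypothetical survey data, the snippet below computes Cronbach’s alpha for a short scale and flags respondents whose reported age contradicts their reported birth year:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for items scored in the same direction."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses to a 4-item scale.
scale = pd.DataFrame({
    "q1": [5, 4, 2, 5, 1], "q2": [4, 4, 1, 5, 2],
    "q3": [5, 3, 2, 4, 1], "q4": [4, 5, 1, 5, 2],
})
print(f"alpha = {cronbach_alpha(scale):.2f}")

# Individual-level check: flag respondents whose reported age
# contradicts their reported birth year (survey fielded in 2024,
# allowing one year of slack for birthdays not yet passed).
demo = pd.DataFrame({"age": [34, 52, 29], "birth_year": [1990, 1972, 1971]})
demo["inconsistent"] = (2024 - demo["birth_year"] - demo["age"]).abs() > 1
print(demo)
```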

What does it mean for data to be “correct”? Simply put, correct data are data that accurately measure a construct of interest. A construct might be happiness, customer satisfaction, people’s intention to buy a new product or something as complex as feelings of regret.

Regardless of the specific construct, researchers typically assess the group-level correctness of their data by examining whether the data are related to similar constructs they  should  relate to (convergent validity) and dissimilar from constructs they  should not  relate to (discriminant validity).

For example, a researcher studying life satisfaction might look for evidence that people who say they are satisfied with their life also say they are happy and not depressed.

Assessing the correctness of data at the individual level involves evaluating whether people provided consistent responses to similar items. This is not possible for all measures, but when possible, researchers may administer questions that are either synonymous (“On most days I feel happy” and “On most days I am in a good mood”) or antonymous (“I often feel good” and “I seldom feel good”) and examine the distance between participant responses to each item.

At the group level, complete datasets are those where most people answer all items in the survey and those who start the survey finish it (i.e., low attrition).

At the individual level, complete responses often mean the same thing. But a researcher may specify before collecting data that people must have seen or responded to key questions within the study, such as a manipulation, a manipulation check, or important outcome measures.

Credible datasets are those in which respondents make a good faith effort to answer the questions in a study.

At the group level, credibility can sometimes be assessed by comparing the effect size of specific manipulations to those previously obtained with other samples. At the individual level, researchers have several tools for detecting participant responses that lack credibility.

These tools range from measures designed to detect overly positive or negative self-presentation to a variety of measures assessing people’s attention, effort, anomalous response patterns, speed through the survey and deliberate misrepresentation of demographic information.

Because data quality is a complex construct, researchers who collect data over the Internet strive to ensure the credibility of individual participant responses.

How to Tell If Your Participants are Providing Honest Survey Responses

  • Are they attentive? The first step on the ladder to quality data is attention. Are people making a good faith effort to provide honest responses? Attention is the minimum criterion necessary for quality data.
  • Are responses made with appropriate effort? The second element of data quality is effort. Measurements of participant effort typically tap into how much people are willing to engage with the measures and manipulations in a study.
  • Do people answer to the best of their ability? The highest element of data quality is ability. Researchers assess participants’ higher-order functioning by examining: How well do participants solve problems? How creative are they when asked to complete a novel task? How accurate are their predictions? Assessments of ability often boil down to an examination of how well people perform at something.

Everyone knows you can’t make good decisions with poor-quality data. The idiom “garbage in, garbage out” has traveled far beyond the realm of computer science — where it originated — because it captures the idea that if you don’t begin with good information, you can’t make effective decisions. But how, exactly, does low-quality data impair decision-making?

Type 1 Error (False Positive): Mistaking Noise for Significance

Low-quality datasets can lead researchers to make bad decisions by inflating the relationship between variables or making it appear two variables are related when they are not.

Spurious relationships that capitalize on chance might lead a healthcare analyst to conclude that people with a specific set of symptoms prefer one treatment plan to another when people really prefer neither plan. Spurious relationships can also lead a university researcher to find results that later studies cannot replicate.

Type 2 Error (False Negative): Missing Significant Findings

A low-quality dataset can introduce noise (i.e., error variance) that obscures or weakens the relationship between variables.

Noise within a dataset may cause a marketing team to determine there is no difference in the effectiveness of various messages intended to increase brand awareness although there actually is. Noise may also lead a university researcher to decide there is no need to follow up on an exploratory study with an experiment because the primary variables of interest are not related.

Regardless of the exact situation, noisy data produced by inattentive participants can cause researchers to overlook relationships that might actually exist. Whether the data produce spurious relationships or mask real ones, the result is the same: researchers and businesses invest money in some future course of action that won’t pay dividends because the study’s findings are not reliable.
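The attenuation half of this is easy to demonstrate. In the hypothetical simulation below (scipy is assumed), replacing 40% of respondents with random answers weakens a real correlation, pushing the result toward a Type 2 error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 200

# Two truly related variables (population r around 0.5).
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.9, size=n)
r, p = stats.pearsonr(x, y)
print(f"clean sample:    r = {r:.2f}, p = {p:.4f}")

# Replace 40% of respondents with random (inattentive) answers:
# noise attenuates the true relationship, inviting a Type 2 error.
mask = rng.random(n) < 0.4
y_noisy = np.where(mask, rng.normal(size=n), y)
r, p = stats.pearsonr(x, y_noisy)
print(f"40% inattentive: r = {r:.2f}, p = {p:.4f}")
```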

CloudResearch clients know they can rely on quality data. That is why more than 3,000 academic institutions, multiple Fortune 500 companies, and federal agencies all trust CloudResearch with data collection. Get in touch today to learn how we can make your next research project a success.

Continue Reading: Ultimate Guide to Survey Data Quality


Part 2: How to Identify and Handle Invalid Responses to Online Surveys


Part 3: Solving the Challenges of Managing Data Quality in Online Research


Part 4: 9 Strategies to Enhance Quality of Data in Online Research



Data quality measures how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose, and it is critical to all data governance initiatives within an organization.

Data quality standards ensure that companies are making data-driven decisions to meet their business goals. If data issues such as duplicate records, missing values, and outliers aren’t properly addressed, businesses increase their risk of negative business outcomes. According to a Gartner report, poor data quality costs organizations an average of USD 12.9 million each year [1]. As a result, data quality tools have emerged to mitigate the negative impact associated with poor data quality.

When data quality meets the standard for its intended use, data consumers can trust the data and leverage it to improve decision-making, leading to the development of new business strategies or optimization of existing ones. However, when a standard isn’t met, data quality tools provide value by helping businesses to diagnose underlying data issues. A root cause analysis enables teams to remedy data quality issues quickly and effectively.

Data quality isn’t only a priority for day-to-day business operations; as companies integrate artificial intelligence (AI) and automation technologies into their workflows, high-quality data will be crucial for the effective adoption of these tools. As the old saying goes, “garbage in, garbage out”, and this holds true for machine learning algorithms as well. If the algorithm is learning to predict or classify on bad data, we can expect that it will yield inaccurate results.


Data quality, data integrity and data profiling are all interrelated. Data quality is the broader category of criteria that organizations use to evaluate their data for accuracy, completeness, validity, consistency, uniqueness, timeliness, and fitness for purpose. Data integrity focuses on a subset of these attributes, specifically accuracy, consistency, and completeness. It also approaches them from the lens of data security, implementing safeguards against data corruption by malicious actors.

Data profiling, on the other hand, is the process of reviewing and cleansing data to maintain data quality standards within an organization. This can also encompass the technologies that support these processes.

Data quality is evaluated based on a number of dimensions, which can differ based on the source of information. These dimensions are used to categorize data quality metrics:

  • Completeness: This represents the amount of data that is usable or complete. If there is a high percentage of missing values, it may lead to a biased or misleading analysis if the data is not representative of a typical data sample.
  • Uniqueness: This accounts for the amount of duplicate data in a dataset. For example, when reviewing customer data, you should expect that each customer has a unique customer ID.
  • Validity: This dimension measures how well the data matches the required format for any business rules. Formatting usually includes metadata, such as valid data types, ranges, patterns, and more.
  • Timeliness: This dimension refers to the readiness of the data within an expected time frame. For example, customers expect to receive an order number immediately after they have made a purchase, and that data needs to be generated in real-time.
  • Accuracy: This dimension refers to the correctness of the data values based on the agreed upon “source of truth.” Since there can be multiple sources which report on the same metric, it’s important to designate a primary data source; other data sources can be used to confirm the accuracy of the primary one. For example, tools can check to see that each data source is trending in the same direction to bolster confidence in data accuracy.
  • Consistency: This dimension evaluates data records across two different datasets. As mentioned earlier, multiple sources can be identified to report on a single metric. Using different sources to check for consistent data trends and behavior allows organizations to trust any actionable insights from their analyses. The same logic applies to relationships within the data: for example, the number of employees in a department should not exceed the total number of employees in a company.
  • Fitness for purpose: Finally, fitness for purpose helps to ensure that the data asset meets a business need. This dimension can be difficult to evaluate, particularly with new, emerging datasets.

These metrics help teams conduct data quality assessments across their organizations to evaluate how informative and useful data is for a given purpose.
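As a minimal sketch of how such metrics might be scored (on a hypothetical customer table with pandas; this is not IBM’s tooling), each dimension can be expressed as a simple rate:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],           # one duplicate ID
    "email": ["a@x.com", None, "b@x", "c@x.com"],  # one missing, one malformed
    "dept_headcount": [12, 30, 30, 7],
})
company_headcount = 40  # assumed "source of truth" for the consistency rule

# Completeness: share of rows with a usable email value.
completeness = 1 - customers["email"].isna().mean()
# Uniqueness: share of customer IDs that are distinct.
uniqueness = customers["customer_id"].nunique() / len(customers)
# Validity: share of emails matching a simple format rule.
validity = customers["email"].str.match(
    r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False).mean()
# Consistency: departmental counts must not exceed total headcount.
consistency = (customers["dept_headcount"] <= company_headcount).mean()

print(f"completeness={completeness:.0%} uniqueness={uniqueness:.0%} "
      f"validity={validity:.0%} consistency={consistency:.0%}")
```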

Over the last decade, developments within hybrid cloud, artificial intelligence, the Internet of Things (IoT), and edge computing have led to the exponential growth of big data. As a result, the practice of master data management (MDM) has become more complex, requiring more data stewards and rigorous safeguards to ensure good data quality.

Businesses rely on data quality management to support their data analytics initiatives, such as business intelligence dashboards. Without this, there can be devastating consequences, even ethical ones, depending on the industry (e.g. healthcare). Data quality solutions exist to help companies maximize the use of their data, and they have driven key benefits, such as:

  • Better business decisions: High-quality data allows organizations to identify key performance indicators (KPIs) to measure the performance of various programs, which allows teams to improve or grow them more effectively. Organizations that prioritize data quality will have an advantage over their competitors.
  • Improved business processes: Good data also means that teams can identify breakdowns in operational workflows. This is particularly true for the supply chain industry, which relies on real-time data to determine appropriate inventory levels and the location of goods after shipment.
  • Increased customer satisfaction: High data quality provides organizations, particularly marketing and sales teams, with valuable insight into their target buyers. They are able to integrate data from across the sales and marketing funnel, which enables them to sell their products more effectively. For example, the combination of demographic data and web behavior can inform how organizations create their messaging, invest their marketing budget, or staff their sales teams to service existing or potential clients.


[1] Gartner, “How to Improve Your Data Quality,” July 14, 2021.

Everything you need to know about data quality


In the world of market research, data is king. Data allows businesses to connect with consumers, provide valuable products and services, and rise above the competition. But, alas, not all data is helpful. Inaccurate, incomplete, inconsistent, and otherwise skewed data can muddy the waters, making the path to intelligent, informed business decisions unclear and risky. That’s why market researchers must have a solid grasp of data quality and demand it from their partners.

Data quality measures how well data (or a dataset) fulfills an intended purpose; different purposes will require different levels of quality. Objectives can be as varied as gauging brand awareness, mapping seasonality to trigger sales campaigns, and understanding consumer purchasing behavior across demographics. Does this mean that data quality is entirely subjective? No: quality has both subjective and objective components, and on the objective side there are critical measures of good and bad data.

Let us delve deeper into the concept of data quality and discuss data quality metrics and means of improving data quality.

What is data quality?

Data quality measures how well qualitative or quantitative information serves an intended purpose. Put differently, data is deemed high-quality if it accurately represents real-world constructs.

For instance, imagine a company attempting to assess brand awareness. Data quality is high if the data yielded from a survey precisely evaluates consumer sentiments, opinions, and behavior. On the contrary, data quality is compromised if the questionnaire delivers data that paints a grossly skewed picture.

Ergo, data quality is closely aligned with trustworthiness and reliability. When data quality is high, market researchers feel confident using the information to make critical business moves. Comparatively, when data quality is low, market researchers may feel trepidatious about using the information as a springboard for company decisions like boosting production or increasing sales prices.

Why is data quality important?

For decades, companies have often relied on intuition to make critical decisions. Years of experience help build a consensus view of what matters and of the ins and outs of markets and technology. But gut feelings aren’t always warranted, especially when market or technology disruptions are present. Hoping to eliminate the fickleness of human emotion and bias, many contemporary corporations have adopted a data-driven decision-making model.

The value proposition is clear: rather than making choices on a whim or deferring to the loudest person in the room, businesses extract insights from quantitative and qualitative information to ensure the best decision is taken.

Consider this example: C-suite executives of a software development company hope to unveil enhancements to their accounting tool for small businesses. Some existing clients are on version X; others are on version Y.

Before investing time and resources in product development and determining possible upgrade prices, the company’s market research team runs a survey to assess existing demand for the software and price elasticity. Ensuring each respondent’s current version is captured is more important than the tenure of the account. Since information from this questionnaire will determine the company’s next move (that is, whether or not executives give engineering the okay to proceed), data quality is of the utmost importance.

Trusted information can help companies:

  • Extract greater value from market research efforts
  • Reduce risks and costs associated with production
  • Improve tradeoffs between options
  • Target consumers better
  • Develop more effective marketing campaigns
  • Improve customer relations

There is no doubt that proper data management gives businesses a competitive edge. Companies can make efficient and effective decisions to outperform rivals by better-understanding consumer opinions and behaviors.

Consequences of poor data quality

Accurate data allows a company to flourish. The opposite is also true: compromised data quality can quickly tank a business. Low-quality data can result in the following:

  • Reduced efficiency: When market researchers base decisions on flawed data, they risk wasting two essential resources: time and money. They may, for example, release a product for which there is no demand. Or, they may launch a marketing campaign that doesn’t resonate with the target consumer.
  • Missed opportunities: When data quality is compromised, companies miss revenue-generating opportunities. For example, executives may fail to realize there is, in fact, a need for a particular product or service. Or, they may attribute brand awareness to social media outreach when, in reality, out-of-home advertising is the contributing source of conversions. In turn, they may invest marketing dollars in the less effective media vehicle.
  • Strained customer relations: Market research’s primary objective is to understand your target consumer better. Sadly, that understanding becomes imprecise when data is biased or skewed by outliers. In turn, companies might be perceived as turning a blind eye to the market, appear disconnected, or even be dismissed as arrogant.

Aspects of data quality

High-quality data can move businesses forward. But how, exactly, is data quality assessed? How can you determine if data should be used to make critical business decisions or abandoned altogether?

As a general rule, there are seven aspects of data quality. These dimensions can allow you to determine the trustworthiness of a particular dataset.

Fidelity or accuracy

Data fidelity refers to the degree to which data represents reality. In other words, fidelity measures whether or not the information collected is correct.

As with most things about data, fidelity can be compromised by human error. For example, a survey respondent may accidentally mistype their zip code or select the wrong entry from a pull-down menu when completing a questionnaire. Though an honest mistake, this foible can compromise data quality if you assess purchasing behavior by location or brand preference. This has to be distinguished from dishonest survey takers who may purposefully lie about demographic information to qualify for monetary rewards—the latter an example of survey fraud, which must be tackled proactively.

Other factors that influence data fidelity include:

  • Data Decay: Fidelity may be high initially but degrade over time. For example, a survey respondent’s income or number of dependents living in the same home may change.
  • Manual Entry: As previously noted, a survey participant may mistype a value. Similarly, a market researcher may transpose numbers or letters during data analysis. Such errors can be surprisingly impactful.
  • Data Movement and Integration: Data can also be altered inadvertently when it is moved from one system to another where the formatting differs. Is 4/6/23 the 6th of April or the 4th of June? You had better be sure.

Completeness

Completeness measures whether each data entry is “full”; in a tabular dataset, incomplete entries show up as NaNs. In other words, this metric seeks to determine if there are missing records, fields, rows, or columns.

Generally speaking, there are two types of missing records:

  • Unit Nonresponse: This occurs when a member of the survey sample fails to complete the questionnaire.
  • Item Nonresponse: Item nonresponse occurs when a survey participant fails to answer one or more survey questions.

Both phenomena can affect the quality of your survey results, potentially leaving insufficient data to make meaningful insights (such as cross-tab analysis).

It is important to note that the level of completeness needed for a project is subjective: it depends on the purpose of the study. It is up to the market researchers (working with their survey partner) to determine the acceptable response level. Data science methods can assess whether the missing data follow specific patterns, as sketched below. It is also up to market researchers to distinguish between critical data (information integral to the study) and non-critical data.
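Here is that check in miniature, using hypothetical responses: compute the item nonresponse rate per question, then see whether missingness concentrates in particular respondent groups (a crude signal that it is systematic rather than random):

```python
import pandas as pd

responses = pd.DataFrame({
    "respondent": [1, 2, 3, 4, 5],
    "age_group": ["18-34", "18-34", "35-54", "35-54", "55+"],
    "q_income": [55_000, None, 72_000, None, None],
    "q_brand_pref": ["A", "B", "A", None, "C"],
})

# Item nonresponse rate for each survey question.
print(responses.filter(like="q_").isna().mean())

# Does income nonresponse concentrate in certain age groups?
print(responses.groupby("age_group")["q_income"]
      .apply(lambda s: s.isna().mean()))
```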

Consistency

Consistency is the degree to which a survey would yield similar results if conducted again under the same conditions; this relates to the statistical concepts of reliability and confidence levels. In other words, it’s an assessment of whether the questionnaire dependably measures what you aim to measure as a market researcher.

Consistency may also refer to whether specific data points gathered through your questionnaire are congruent with those gathered elsewhere. For example, a respondent may note earning a particular income during a pre-screening survey. However, they may designate a dramatically lower income during the actual study.

Sometimes market researchers intentionally ask the same question twice, or two slightly different versions of it, to check for these conflicting responses. This survey quality check should be used sparingly, however: market researchers risk triggering survey dropout with redundancy.

Timeliness

Timeliness refers to the relevance of the data. In other words, how recently was the data collected?

As a general rule, companies should make decisions using the most up-to-date information possible. Otherwise, stale data could result in erroneous decision-making. Case in point: suppose a company conducted a survey to assess consumer buying behavior before the pandemic. Since COVID-19 shifted purchasing habits, this data is no longer accurate. Hence, a further study should be conducted to evaluate the target audience better.

Validity

Valid data refers to data correctly formatted per predetermined standards set by market researchers.

For example, a survey may ask that respondents provide their birthdays in British English (i.e., day, month, and year). Responses provided in American English (i.e., month, day, and year) would be considered invalid. Other examples include telephone numbers. A survey may ask for the respondent’s phone number using only numbers—no symbols. Any responses submitted with symbols would not be valid.
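A minimal sketch of such format checks, using hypothetical rules and only the Python standard library:

```python
import re
from datetime import datetime

def valid_phone(s: str) -> bool:
    """Exactly 10 digits, no symbols, per the survey's stated rule."""
    return bool(re.fullmatch(r"\d{10}", s))

def valid_uk_date(s: str) -> bool:
    """Day/month/year format, as the survey requested."""
    try:
        datetime.strptime(s, "%d/%m/%Y")
        return True
    except ValueError:
        return False

print(valid_phone("5551234567"))     # True
print(valid_phone("(555) 123-4567")) # False: symbols are invalid here
print(valid_uk_date("06/04/2023"))   # True: 6 April 2023
print(valid_uk_date("2023-04-06"))   # False: wrong format
```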

Uniqueness

Unique data only appears once in a dataset. In other words, there are no duplicates.

Unfortunately, data duplication is a common occurrence. In addition, dishonest and fraudulent survey takers may intentionally disguise their identities to collect rewards. The risk, of course, is that these respondents offer no real insight into your target audience. Worse yet, disingenuous survey takers are often incredibly sophisticated and challenging to detect. That’s why anti-fraud software is a must.

Integrity

Data integrity refers to the fidelity and completeness of data as it’s maintained over time and across formats. Unfortunately, there are various threats to data integrity, from human error (i.e., a market researcher accidentally deleting a row in Excel) to data decay. Hence, maintaining data integrity is a continual, ongoing process that requires a meticulous approach.

How to improve your data quality

High-quality data offers a window into the consumer psyche, allowing market researchers to understand better what motivates the target market. But alas, compromised data can sour results, wasting countless company dollars.

Luckily, there are three simple ways you can improve data quality.

1. Know your niche audience

When market researchers conduct panel surveys, they hope to gain insight into how potential customers think, feel, and act. However, market researchers must first determine their niche audience for these insights to be valuable.

A niche audience, or target audience, is a group of people who are most likely to purchase a product or service. These individuals often share demographic traits like age, gender, location, education, and socioeconomic status.

It’s essential to have a clear idea of your target audience before conducting a survey. Why? Because surveying these specific types of individuals increases the fidelity of your data. If, for example, your company mainly sells products to middle-aged women, the dataset will be more accurate if you survey middle-aged women.

To help with this, Kantar offers an extensive research panel of more than 170 million people. As one of the biggest and best sources of global survey takers, we can easily connect you with your target niche, allowing your business to collect more accurate and representative data.

2. Engage your survey respondents

As a market researcher, boredom is your arch-nemesis. It triggers panelists to speed through questions, straight-line, fill open-ended fields with gibberish, and abandon questionnaires altogether. Unfortunately, these actions can spoil the quality of your data, leaving you with a dataset that is neither as accurate nor as complete as you need.

Kantar has developed an entire library of online survey training modules to support the collection of trustworthy data. Created with award-winning online survey design knowledge and best practices to improve survey effectiveness, these online classes will teach you how to craft surveys that keep respondents happy and engaged rather than listless and weary. In return, you can expect higher-quality responses.

3. Reduce fraud

Kantar found in Q4 2022 that companies discard up to 38% of the data they collect because of quality concerns and panel fraud. Fortunately, market researchers can combat lazy and dishonest panelists through effective survey design. You can, for example, remove superfluous questions to keep the survey length under 10 minutes. Or, you can use iconography to keep survey takers engaged.

Despite these efforts, fraudulent panelists will continue to be an issue as long as there is a monetary reward. Often located overseas, these scammers are highly sophisticated and understand how to disguise their IP address, device type, and the other red flags that would give away their identity. Their goal is to extract as much money as quickly as possible. Hence, they can be pretty aggressive in their methods.

To thwart spammers, Kantar developed Qubed, a proprietary anti-fraud technology based on deep neural networks, the benchmark technique for AI-based classification. Qubed employs the latest artificial intelligence technology to detect fraud where humans or other standard measures cannot.

More specifically, Qubed works using a four-pronged approach:

  • Assessing Domain Knowledge: Qubed’s core AI is trained on five years of data collection and labeling, plus detailed knowledge of breached ISPs/IPs and fraudsters’ and bots’ points of access. Knowing the attack vectors allows Qubed to block most fraudsters before they even attempt a study.
  • Assessing Key Factors: Qubed analyzes the full history of each user. Looking at every data point of every event collected, it evaluates their reconcile/acceptance rate, activity patterns, rate of starting versus finishing studies relative to like-for-like users, open-ended response quality, demographic sensibility/consistency, device/browser fingerprint, and much more.
  • Machine Learning: Qubed is continuously improving through real-time machine learning. That means it evolves and learns new patterns automatically as scammers develop new ploys, placing you under constant, vigilant protection.
  • Identifying Types of Fraudulence: Not all red flags are triggered by actual scammers. Straight-lining, for instance, could result from respondent fatigue rather than fraudulence. With this in mind, Kantar designed Qubed to distinguish and categorize different sources of survey fraud, which are in turn dealt with by different measures. (A simplified sketch of the kinds of per-respondent signals such systems weigh appears below.)
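Qubed itself is proprietary, so the following is purely illustrative: a few generic red flags of the kind described above (speeding, straight-lining, gibberish open ends), computed on hypothetical session data. A production system would feed far richer features into a trained model.

```python
import pandas as pd

DESIGNED_MINUTES = 10  # hypothetical intended completion time

sessions = pd.DataFrame({
    "respondent": ["r1", "r2", "r3"],
    "minutes": [9.5, 1.2, 8.0],
    "answers": [[3, 5, 2, 4, 1], [3, 3, 3, 3, 3], [2, 4, 4, 3, 5]],
    "open_ended": ["Liked the sizing", "asdf", "Too expensive for me"],
})

# Each flag is a weak signal; fatigue, not fraud, can trigger some of them.
sessions["too_fast"] = sessions["minutes"] < 0.3 * DESIGNED_MINUTES
sessions["straight_lined"] = sessions["answers"].map(lambda a: len(set(a)) == 1)
sessions["gibberish"] = sessions["open_ended"].str.len() < 5

flags = ["too_fast", "straight_lined", "gibberish"]
sessions["red_flags"] = sessions[flags].sum(axis=1)
print(sessions[["respondent"] + flags + ["red_flags"]])
```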

Feel confident in your data-informed decisions with Kantar

Improving data quality should be a top priority if your company aims to boost revenue and foster brand awareness. Fortunately, Kantar's Profiles division has developed a science-backed quality data formula that affords market research partners highly accurate, valid, and trustworthy information.

Our formula encompasses three key elements:

  • An expansive research panel of 170M+ people. Quality data begins with a representative sample of survey respondents. To provide you with just that, Kantar offers the biggest and best source of human respondents.
  • Productive panellists. Even better, our survey respondents are satisfied and engaged. This results in a 23 percent higher survey completion rate than the industry average.
  • State-of-the-art fraud protection. To combat pesky bots and fraudulent survey takers, our R&D team has developed proprietary anti-fraud software that prevents four times more fraud than any other tool on the market.

If your business wants to make smart, data-informed decisions, the first step is to partner with Kantar. As an industry leader, we understand how to conduct market research that yields informative, helpful, and high-quality data.

Want more like this?

Read: How to combat survey fraud  

Read: How you can check the quality of your survey data  

Read: 11 Best Practices for More Effective Survey Designs


Child Care and Early Education Research Connections

Assessing research quality

This page presents information and tools to help evaluate the quality of a research study, as well as information on the ethics of research.

The quality of social science and policy research can vary considerably. It is important that consumers of research keep this in mind when reading the findings from a research study or when considering whether or not to use data from a research study for secondary analysis.


Key Questions to Ask

This section outlines key questions to ask in assessing the quality of research.

Research Assessment Tools

This section provides resources related to quantitative and qualitative assessment tools.

Ethics of Research

This section provides an overview of three basic ethical principles.


Total Data Quality

This specialization aims to explore the Total Data Quality framework in depth and provide learners with more information about the detailed evaluation of total data quality that needs to happen prior to data analysis. The goal is for learners to incorporate evaluations of data quality into their process as a critical component for all projects. We sincerely hope to disseminate knowledge about total data quality to all learners, such as data scientists and quantitative analysts, who have not had…

What is quality research? A guide to identifying the key features and achieving success

meaning of quality of data in research

Every researcher worth their salt strives for quality. But in research, what does quality mean?

Simply put, quality research is thorough, accurate, original and relevant. And to achieve this, you need to follow specific standards. You need to make sure your findings are reliable and valid. And when you know they're quality assured, you can share them with absolute confidence.

You’ll be able to draw accurate conclusions from your investigations and contribute to the wider body of knowledge in your field.

Importance of quality research

Quality research helps us better understand complex problems. It enables us to make decisions based on facts and evidence. And it empowers us to solve real-world issues. Without quality research, we can't advance knowledge or identify trends and patterns. We also can’t develop new theories and approaches to solving problems.

With rigorous and transparent research methods, you’ll produce reliable findings that other researchers can replicate. This leads to the development of new theories and interventions. On the other hand, low-quality research can hinder progress by producing unreliable findings that can’t be replicated, wasting resources and impeding advancements in the field.

In all cases, quality control is critical. It ensures that decisions are based on evidence rather than gut feeling or bias.

Standards for quality research

Over the years, researchers, scientists and authors have come to a consensus about the standards used to check the quality of research. Determined through empirical observation, theoretical underpinnings and philosophy of science, these include:

1. Having a well-defined research topic and a clear hypothesis

This is essential to verify that the research is focused and the results are relevant and meaningful. The research topic should be well-scoped and the hypothesis should be clearly stated and falsifiable.

For example, in a quantitative study about the effects of social media on behavior, a well-defined research topic could be, "Does the use of TikTok reduce attention span in American adolescents?"

This is good because:

  • The research topic focuses on a particular platform of social media (TikTok). And it also focuses on a specific group of people (American adolescents).
  • The research question is clear and straightforward, making it easier to design the study and collect relevant data.
  • You can test the hypothesis, and a research team can evaluate it easily. This can be done through the use of various research methods, such as survey research, experiments or observational studies.
  • The hypothesis is focused on a specific outcome (attention span), which can be measured and compared to control groups or previous research studies.

2. Ensuring transparency

Transparency is crucial when conducting research. You need to be upfront about the methods you used, such as:

  • Describing how you recruited the participants.
  • How you communicated with them.
  • How they were incentivized.

You also need to explain how you analyzed the data, so other researchers can replicate your results if necessary. Pre-registering your study is a great way to be as transparent in your research as possible. This involves publicly documenting your study design, methods and analysis plan before conducting the research, which reduces the risk of selective reporting and increases the credibility of your findings.

3. Using appropriate research methods

Depending on the topic, some research methods are better suited than others for collecting data. To use our TikTok example, a quantitative research approach, such as a behavioral test that measures the participants' ability to focus on tasks, might be the most appropriate.

On the other hand, for topics that require a more in-depth understanding of individuals' experiences or perspectives, a qualitative research approach, such as interviews or focus groups, might be more suitable. These methods can provide rich and detailed information that you can’t capture through quantitative data alone.

4. Assessing limitations and the possible impact of systematic bias

When you present your research, it’s important to consider how the limitations of your study could affect the results. This could be systematic bias in the sampling procedure or data analysis, for instance. Let’s say you only study a small sample of participants from one school district. This would limit the generalizability (external validity) of your findings.

5. Conducting accurate reporting

This is an essential aspect of any research project. You need to be able to clearly communicate the findings and implications of your study. Also, provide citations for any claims made in your report. When you present your work, it’s vital that you describe the variables involved in your study accurately and how you measured them.

Curious to learn more? Read our Data Quality eBook.

How to identify credible research findings

To determine whether a published study is trustworthy, consider the following:

  • Peer review: If a study has been peer-reviewed by recognized experts, rest assured that it’s a reliable source of information. Peer review means that other scholars have read and verified the study before publication.
  • Researcher's qualifications: If they're an expert in the field, that’s a good sign that you can trust their findings. However, if they aren't, it doesn’t necessarily mean that the study's information is unreliable. It simply means that you should be extra cautious about accepting its conclusions as fact.
  • Study design: The design of a study can make or break its reliability. Consider factors like sample size and methodology.
  • Funding source: Studies funded by organizations with a vested interest in a particular outcome may be less credible than those funded by independent sources.
  • Statistical significance: You've heard the phrase "numbers don't lie," right? That's what statistical significance is all about. It refers to the likelihood that the results of a study occurred by chance. Results that are statistically significant are more credible.

Achieve quality research with Prolific

Want to ensure your research is high-quality? Prolific can help.

Our platform gives you access to a carefully vetted pool of participants. We make sure they're attentive, honest, and ready to provide rich and detailed answers where needed. This helps to ensure that the data you collect through Prolific is of the highest quality.

With Prolific, you can streamline your research process and feel confident in the results you receive. Our minimum pay threshold and commitment to fair compensation motivate participants to provide valuable responses and give their best effort. This ensures the quality of your research and helps you get the results you need. Sign up as a researcher today to get started!


OPINION 25 April 2024

The crucial role of data quality in market research

Melanie Courtright



Data quality has risen up the market research industry agenda in recent years. Melanie Courtright outlines the importance of data quality to insight. 


In the dynamic landscape of market research, where every decision is propelled by insights, data quality emerges as the unsung hero, quietly steering the course of action with its profound impact. Data quality serves as the bedrock upon which the edifice of meaningful analysis and informed decision-making stands tall.

Regardless of method, mode, tool, technology, industry or topic, quality is the most important element of the outcomes being fit for decision-making. Data quality, as defined by the Global Data Quality Glossary, means: “The measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and how up to date it is.” Quality data for insights fosters credibility, facilitates relevance and drives innovation.

Data quality fosters credibility. In an age where trust is the currency of consumer engagement, businesses cannot afford to base their decisions on shaky foundations. High-quality data lends legitimacy to research findings, enhancing the credibility of business strategies and bolstering stakeholder confidence. Whether it is persuading investors, convincing board members or winning over sceptical consumers, the credibility conferred by accurate and reliable data is invaluable.

Data quality facilitates relevance. In the vast sea of information inundating businesses today, discerning the signal from the noise is paramount. High-quality data ensures that the insights derived from market research are pertinent, actionable and aligned with the strategic objectives of the organisation. By filtering out extraneous or irrelevant data points, businesses can focus their resources on endeavours that truly move the needle, thereby maximising efficiency and efficacy.

Data quality drives agility and innovation. In today’s fast-paced business environment, agility is not just a buzzword but a prerequisite for survival. Timely access to accurate and reliable data empowers businesses to adapt swiftly to changing market dynamics, seize emerging opportunities and mitigate potential threats. Agility enables innovation, and innovation thrives on insights.

By ensuring the integrity, completeness and consistency of data, businesses create fertile ground for innovation to flourish. Whether it’s uncovering untapped market segments, identifying unmet consumer needs or predicting future trends with precision, the transformative potential of data-driven innovation is unleashed when fuelled by high-quality data.

For these reasons, the Global Data Quality ( GDQ ) initiative was formed through a partnership with global associations to take on the yeoman’s task of developing a framework for measuring and improving data quality in the insights profession. That work started with defining the language we use, outlining the research process, developing tools along that research process and leading the profession towards a quantitative conversation about the state of data quality, and an eventual path to quality buying signals. To achieve this, we must bring the profession along with us, getting their buy-in at every step.

To date, GDQ has delivered the glossary, a recommendation on the use of secure end links, a best practice guide on survey design and mobile optimisation guidelines. This quarter, two new tools are being issued: a Technology Solutions Guide and a Buyer’s Guide. In Q3, the first-of-their-kind industry benchmarks will be released, so we can track performance on key data quality metrics over time. As a profession of measurement, we must measure what we care about: data quality.

To engage with the tools, or to become a part of the movement, we invite you to visit the GDQ webpage.

The importance of data quality in market research cannot be overstated. Global associations and leaders in the field are adamant that data quality is either a core strength or an imminent threat to the future of market research. It is the cornerstone upon which informed decision-making, strategic agility and sustainable growth rest.

When trust is high, the profession thrives, and researchers have a greater seat at the table, enabling us to advocate for people’s opinions. When credibility is low, our ability to represent people and influence decisions is at risk. We implore each of you to embrace data quality as the guiding light that illuminates your research choices, and to join us in the movement that ensures the credibility and growth of this great profession.

Melanie Courtright is chief executive officer at the Insights Association.

An interview with MRS managing director Debrah Harding on the Global Data Quality initiative is available here.


Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is summarization and categorization, which together reduce the data and make it easier to identify patterns and themes and to link them. The third is the analysis itself, which researchers carry out in both top-down and bottom-up fashion.
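
These three steps can be illustrated with a short sketch. The following is a minimal example in Python using pandas; the survey data and column names are invented for illustration, not taken from any real study.

import pandas as pd

# Step 1: organize raw responses into a structured table (hypothetical data).
responses = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "satisfaction": [4, 2, 5, 3, 4],
})

# Step 2: summarize and categorize to reduce the data into smaller fragments.
summary = responses.groupby("region")["satisfaction"].agg(["mean", "count"])

# Step 3: inspect the reduced view, top-down or bottom-up, for patterns.
print(summary)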


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research data.”

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem – we call it ‘Data Mining’, and it often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells the most unforeseen yet exciting stories that were not anticipated when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes something once a specific value has been assigned to it. For analysis, these values need to be organized, processed, and presented in a given context to be useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion counts as qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and similar measures all come under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups, where an item cannot belong to more than one group. Example: a survey respondent’s living style, marital status, smoking habit, or drinking habit is categorical data. A chi-square test is a standard method used to analyze this data; a minimal sketch follows this list.
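
As a hedged illustration of that last point, here is a minimal chi-square test of independence in Python using scipy; the contingency counts are invented for illustration.

from scipy.stats import chi2_contingency

# Rows: smoking habit (yes/no); columns: marital status (single/married).
observed = [[20, 30],
            [25, 45]]

# A small p-value would suggest the two categorical variables are related.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")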


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is a complicated process; hence, it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
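
A word-frequency count of this kind is easy to sketch in Python with the standard library; the responses below are invented for illustration.

from collections import Counter
import re

responses = [
    "Food prices keep rising and hunger is widespread",
    "Hunger remains the main concern in rural areas",
    "Access to food is the biggest daily struggle",
]

stop_words = {"the", "and", "is", "in", "to", "a"}
words = [w for text in responses
         for w in re.findall(r"[a-z']+", text.lower())
         if w not in stop_words]

# Repeated content words such as "food" and "hunger" suggest candidate themes.
print(Counter(words).most_common(5))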


Keyword-in-context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.

For example, researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
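
A keyword-in-context listing can be sketched in a few lines of Python; the helper function and the example texts below are hypothetical, written only to show the idea.

def keyword_in_context(texts, keyword, window=3):
    # Yield a window of words on either side of each keyword occurrence.
    for text in texts:
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if keyword in token:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                yield f"{left} [{token}] {right}"

texts = ["My mother manages her diabetes with diet",
         "I worry that diabetes runs in the family"]
for line in keyword_in_context(texts, "diabetes"):
    print(line)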

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, examining how one piece of text is similar to or different from another.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations in large volumes of data.


Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is the most widely accepted and most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context under which, or within which, the communication between researcher and respondent takes place. Discourse analysis also considers lifestyle and day-to-day environment when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis, so that raw data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to check whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages (a minimal completeness check follows this list):

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in interviewer-administered surveys, that the interviewer asked every question devised in the questionnaire.
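
The completeness stage in particular is easy to automate. Here is a minimal sketch in Python using pandas; the respondent data and column names are hypothetical.

import pandas as pd

# Hypothetical survey responses; None marks an unanswered question.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "q1": ["yes", "no", None],
    "q2": [5, None, 4],
})

# Flag respondents with any unanswered question for follow-up or exclusion.
incomplete = df[df.isna().any(axis=1)]
print(f"{len(incomplete)} of {len(df)} responses are incomplete")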

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in fields incorrectly or skip them accidentally. Data editing is the process in which researchers confirm that the provided data is free of such errors. They conduct necessary consistency checks and outlier checks to edit the raw data and make it ready for analysis.
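
One common editing step is an outlier check. Below is a minimal sketch using the interquartile-range rule in Python with numpy; the scores are invented, with one deliberately implausible value.

import numpy as np

scores = np.array([52, 55, 49, 61, 58, 250, 54])  # 250 looks like an entry error

# Flag values more than 1.5 * IQR outside the middle 50% of the data.
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
mask = (scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)
print("Values flagged for review:", scores[mask])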

Phase III: Data Coding

Of all three, this is the most critical phase of data preparation, and it is associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents by age. It then becomes easier to analyze small data buckets than to deal with the massive data pile.
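
Data coding of this kind can be sketched in Python with pandas; the ages are randomly generated and the bracket boundaries are arbitrary choices for illustration.

import numpy as np
import pandas as pd

# 1,000 hypothetical respondent ages.
rng = np.random.default_rng(0)
ages = pd.Series(rng.integers(18, 80, size=1000))

# Code raw ages into brackets so analysis works on a few buckets.
brackets = pd.cut(ages, bins=[17, 29, 44, 59, 80],
                  labels=["18-29", "30-44", "45-59", "60+"])
print(brackets.value_counts().sort_index())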


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers can use various research and data analysis methods to derive meaningful insights. Statistical analysis is by far the most favored approach for numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods fall into two groups: ‘descriptive statistics’, used to describe data, and ‘inferential statistics’, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of many types of research data. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond describing the data; any conclusions drawn are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to summarize a distribution by its central points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest points; variance and standard deviation capture the typical difference between observed scores and the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare individual scores with the average.
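
The four families of descriptive measures above can be computed in a few lines of Python with pandas and scipy; the test scores are invented for illustration.

import pandas as pd
from scipy import stats

scores = pd.Series([62, 71, 71, 58, 90, 77, 66, 84, 71, 69])

# Frequency: how often each score occurs.
print(scores.value_counts())

# Central tendency: mean, median, mode.
print(scores.mean(), scores.median(), scores.mode()[0])

# Dispersion: range, variance, standard deviation.
print(scores.max() - scores.min(), scores.var(), scores.std())

# Position: percentile rank of a score of 84.
print(stats.percentileofscore(scores, 84))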

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are rarely sufficient to demonstrate the rationale behind them. It is therefore necessary to choose the method of research and data analysis that suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare the average vote in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample that represents that population. For example, you could ask some 100 audience members at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.
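
The movie-theater example can be made concrete with a minimal sketch in Python: a point estimate of the population proportion with a normal-approximation 95% confidence interval. The sample numbers are invented.

import math

n, liked = 100, 85          # hypothetical sample: 85 of 100 liked the movie
p_hat = liked / n

# Normal-approximation 95% confidence interval for the population proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Estimated proportion: {p_hat:.2f} (95% CI {low:.2f} to {high:.2f})")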

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rely on the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner.
  • Frequency tables: This procedure summarizes how often each value or category occurs in the data, making it easy to spot dominant responses and unusual values before applying further tests.
  • Analysis of variance: This statistical procedure tests the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are treated as similar. Minimal sketches of cross-tabulation and ANOVA follow this list.
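
Two of the methods above lend themselves to short sketches in Python; the respondent records and group scores are invented for illustration.

import pandas as pd
from scipy.stats import f_oneway

# Cross-tabulation: count respondents by gender and age group.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "age_group": ["18-29", "18-29", "30-44", "30-44", "18-29", "30-44"],
})
print(pd.crosstab(df["gender"], df["age_group"]))

# One-way ANOVA: do mean scores differ across three groups?
group_a = [3.1, 2.9, 3.4, 3.0]
group_b = [3.8, 4.1, 3.9, 4.2]
group_c = [2.5, 2.7, 2.4, 2.8]
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")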

Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing a survey questionnaire, selecting data collection methods, and choosing samples.


  • The primary aim of data research and analysis is to derive insights that are unbiased. Any mistake or bias in collecting data, selecting an analysis method, or choosing an audience sample is likely to produce a biased inference.
  • No amount of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are not clear, the lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data altering, data mining, and graphical representation.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage: in 2018, the total data supply amounted to 2.8 trillion gigabytes. It is clear that enterprises wanting to survive in a hypercompetitive world must have an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.

Systematic Review | Open access | Published: 26 April 2024

Systematic review on the frequency and quality of reporting patient and public involvement in patient safety research

  • Sahar Hammoud   ORCID: orcid.org/0000-0003-4682-9001 1 ,
  • Laith Alsabek 1 , 2 ,
  • Lisa Rogers 1 &
  • Eilish McAuliffe 1  

BMC Health Services Research, volume 24, Article number: 532 (2024)


Background

In recent years, patient and public involvement (PPI) in research has significantly increased; however, the reporting of PPI remains poor. The Guidance for Reporting Involvement of Patients and the Public (GRIPP2) was developed to enhance the quality and consistency of PPI reporting. The objective of this systematic review is to identify the frequency and quality of PPI reporting in patient safety (PS) research using the GRIPP2 checklist.

Methods

Searches were performed in Ovid MEDLINE, EMBASE, PsycINFO, and CINAHL from 2018 to December 2023. Studies on PPI in PS research were included. We included empirical qualitative, quantitative, mixed methods, and case studies. Only articles published in peer-reviewed journals in English were included. The quality of PPI reporting was assessed using the short form of the GRIPP2 checklist (GRIPP2-SF).

Results

A total of 8561 studies were retrieved from database searches, updates, and reference checks, of which 82 met the eligibility criteria and were included in this review. Major PS topics were related to medication safety, general PS, and fall prevention. Patient representatives, advocates, patient advisory groups, patients, service users, and health consumers were the most involved. The main involvement across the studies was in commenting on or developing research materials. Only 6.1% ( n  = 5) of the studies reported PPI as per the GRIPP2 checklist. Regarding the quality of reporting following the GRIPP2-SF criteria, our findings show sub-optimal reporting mainly due to failures in: critically reflecting on PPI in the study; reporting the aim of PPI in the study; and reporting the extent to which PPI influenced the study overall.

Conclusions

Our review shows a low frequency of PPI reporting in PS research using the GRIPP2 checklist. Furthermore, it reveals a sub-optimal quality in PPI reporting following GRIPP2-SF items. Researchers, funders, publishers, and journals need to promote consistent and transparent PPI reporting following internationally developed reporting guidelines such as the GRIPP2. Evidence-based guidelines for reporting PPI should be encouraged and supported as it helps future researchers to plan and report PPI more effectively.

Trial registration

The review protocol is registered with PROSPERO (CRD42023450715).


Background

Patient safety (PS) is defined as “the absence of preventable harm to a patient and reduction of risk of unnecessary harm associated with healthcare to an acceptable minimum” [ 1 ]. It is estimated that one in 10 patients are harmed in healthcare settings due to unsafe care, resulting in over three million deaths annually [ 2 ]. More than 50% of adverse events are preventable, and half of these events are related to medications [ 3 , 4 ]. There are various types of adverse events that patients can experience such as medication errors, patient falls, healthcare-associated infections, diagnostic errors, pressure ulcers, unsafe surgical procedures, patient misidentification, and others [ 1 ].

Over the last few decades, the approach of PS management has shifted toward actively involving patients and their families in managing PS. This innovative approach has surpassed the traditional model where healthcare providers were the sole managers of PS [ 5 ]. Recent research has shown that patients have a vital role in promoting their safety and decreasing the occurrence of adverse events [ 6 ]. Hence, there is a growing recognition of patient and family involvement as a promising method to enhance PS [ 7 ]. This approach includes involving patients in PS policy development, research, and shared decision making [ 1 ].

In the last decade, research involving patients and the public has significantly increased. In the United Kingdom (U.K), the National Institute for Health Research (NIHR) has played a critical role in providing strategic and infrastructure support to integrate Public and Patient Involvement (PPI) throughout publicly funded research [ 8 ]. This has established a context where PPI is recognised as an essential element in research [ 9 ]. In Ireland, the Health Service Executive (HSE), the national government agency responsible for the management and delivery of all public health and social services, emphasises the importance of PPI in research and provides guidance for researchers on how to involve patients and the public in all parts of the research cycle and knowledge translation process [ 10 ]. Similar initiatives are also developing among other European countries, North America, and Australia. However, despite this significant expansion of PPI research, the reporting of PPI in research articles continues to be sub-optimal, inconsistent, and lacking essential information on the context, process, and impact of PPI [ 9 ]. To address this problem, the Guidance for Reporting Involvement of Patients and the Public (GRIPP) was developed in 2011, following the EQUATOR methodology, to enhance the quality, consistency, and transparency of PPI reporting, and to provide guidance for researchers, patients, and the public to advance the quality of the international PPI evidence-base [ 11 ]. The first GRIPP checklist was a significant step towards higher-quality PPI reporting; however, it was developed following a systematic review and did not include any input from the international PPI research community. Given the importance of reaching consensus in generating current reporting guidelines, a second version of the GRIPP checklist (GRIPP2) was developed to tackle this problem by involving the international PPI community in its development [ 9 ]. There are two versions of the GRIPP2 checklist, a long form (GRIPP2-LF) for studies with PPI as the primary focus, and a short form (GRIPP2-SF) for studies with PPI as a secondary or tertiary focus.

Since the publication of the GRIPP2 checklist, several systematic reviews have been conducted to assess the quality of PPI reporting on various topics. For instance, Bergin et al., in their review investigating the nature and impact of PPI in cancer research, reported a sub-optimal quality of PPI reporting using the GRIPP2-SF, mainly due to failure to address PPI challenges [ 12 ]. Similarly, Owyang et al., in their systematic review assessing the prevalence, extent, and quality of PPI in orthopaedic practice, described poor PPI reporting following the GRIPP2-SF checklist criteria [ 13 ]. While a few systematic reviews have been conducted to assess theories, strategies, types of interventions, and barriers and enablers of PPI in PS [ 5 , 14 , 15 , 16 ], no previous review has assessed the quality of PPI reporting in PS research. Thus, our systematic review aims to address this knowledge gap. The objective of this review is to identify the frequency of PPI reporting in PS research using the GRIPP2 checklist from 2018 (the year after GRIPP2 was published) and the quality of reporting following the GRIPP2-SF. The GRIPP2 checklist was chosen as the benchmark as it is the first international, evidence-based, community consensus informed guideline for the reporting of PPI in research, and more specifically in health and social care research [ 9 ]. Additionally, it is the most recent report-focused framework and the one most recommended by several leading journals [ 17 ].

Methods

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to plan and report this review [ 18 ]. The review protocol was published on PROSPERO, the International Database of Prospectively Registered Systematic Reviews, in August 2023 (CRD42023450715).

Search strategy

For this review, we used the PICo framework to define the key elements in our research. These included articles on patients and public (P-Population) involvement (I- phenomenon of Interest) in PS (C-context). Details are presented in Table  1 . Four databases were searched including Ovid MEDLINE, EMBASE, PsycINFO, and CINAHL to identify papers on PPI in PS research. A systematic search strategy was initially developed using MEDLINE. MeSH terms and keywords relevant to specific categories (e.g., patient safety) were combined using the “OR” Boolean term (i.e. patient safety OR adverse event OR medical error OR surgical error) and categories were then combined using the “AND” Boolean term. (i.e. “patient and public involvement” AND “patient safety”). The search strategy was adapted for the other three databases. Full search strategies are provided in Supplementary file 1 . The search was conducted on July 27th, 2023, and was limited to papers published from 2018. As the GRIPP2 tool was published in 2017, this limit ensured the retrieval of relevant studies. An alert system was set on the four databases to receive all new published studies until December 2023, prior to the final analysis. The search was conducted without restrictions on study type, research design, and language. To reduce selection bias, hand searching was carried out on the reference lists of all the eligible articles in the later stages of the review. This was done by the first author. The search strategy was developed by the first author and confirmed by the research team and a Librarian. The database search was conducted by the first author.

Inclusion and exclusion criteria

Studies on PPI in PS research with a focus on health/healthcare were included in this review. We defined PPI as active involvement which is in line with the NIHR INVOLVE definition as “research being carried out ‘with’ or ‘by’ members of the public rather than ‘to’, ‘about’ or ‘for’ them” [ 19 ]. This includes any PPI including, being a co-applicant on a research project or grant application, identifying research priorities, being a member of an advisory or steering group, participating in developing research materials or giving feedback on them, conducting interviews with study participants, participating in recruitment, data collection, data analysis, drafting manuscripts and/or dissemination of results. Accordingly, we excluded studies where patients or the public were only involved as research participants.

We defined patients and public to include patients, relatives, carers, caregivers and community, which is also in line with the NIHR PPI involvement in National Health Service [ 19 ].

Patient safety included topics on medication safety, adverse events, communication, safety culture, diagnostic errors, and others. A full list of the used terms for PPI and PS is provided in Supplementary file 1 . Regarding the research type and design, we included empirical qualitative, quantitative, mixed methods, and case studies. Only articles published in peer-reviewed journals and in English were included.

Any article that did not meet the inclusion criteria was excluded. Studies not reporting outcomes were excluded. Furthermore, review papers, conference abstracts, letters to editor, commentary, viewpoints, and short communications were excluded. Finally, papers published prior to 2018 were excluded.

Study selection

The selection of eligible studies was done by the first and the second authors independently, starting with title and abstract screening to eliminate papers that failed to meet our inclusion criteria. Then, full text screening was conducted to decide on the final included papers in this review. Covidence, an online data management system, supported the review process, ensuring reviewers were blinded to each other’s decisions. Disagreements between reviewers were discussed first; in cases where the disagreement was not resolved, the fourth author was consulted.

Data extraction and analysis

A data extraction sheet was developed using Excel, then piloted, discussed with the research team, and modified as appropriate. The following data were extracted: citation and year of publication, objective of the study, country, PS topic, design, setting, PPI participants, PPI stages (identifying research priorities, being a member of an advisory or steering group, etc.), frequency of PPI reporting as per the GRIPP2 checklist, and the availability of a plain language summary. Additionally, data against the five items of GRIPP2-SF (aim of PPI in the study, methods used for PPI, outcomes of PPI including the results and the extent to which PPI influenced the study overall, and reflections on PPI) were extracted. To avoid multiple publication bias and missing outcomes, data extraction was done by the first and the second authors independently and then compared. Disagreements between reviewers were first discussed, and then resolved by the third and fourth authors if needed.

Quality assessment

The quality of PPI reporting was assessed using the GRIPP2-SF developed by Staniszewska et al. [ 9 ], as it was developed to improve the quality, consistency, and reporting of PPI in social and healthcare research. Additionally, the GRIPP2-SF is suitable for all studies regardless of whether PPI is the primary, secondary, or tertiary focus, whereas the GRIPP2-LF is not suitable for studies where PPI serves as a secondary or tertiary focus. The checklist includes five items (mentioned above) that authors should include in their studies. It is important to mention that Staniszewska et al. noted that “while GRIPP2-SF aims to guide consistent reporting, it is not possible to be prescriptive about the exact content of each item, as the current evidence-base is not advanced enough to make this possible” ([ 9 ] p5). For that reason, we had to develop criteria for scoring the five reporting items. We used a three-level scoring of Yes, No, and Partial for each of the five items of the GRIPP2-SF. Yes was given when the authors presented the PPI information for the item clearly in the paper; No when no information was provided; and Partial when the information only partially met the item requirement. For example, as per the GRIPP2-SF, authors should provide a clear description of the methods used for PPI in the study. In the example given by Staniszewska et al., information was provided on who the patient/public partners were and how many of them there were, as well as the stages of the study they were involved in (i.e. refining the focus of the research questions, developing the search strategy, interpreting results). Thus, in our evaluation of the included studies, we gave a Yes if information on the PPI participants (i.e. patient partners, community partners, or family members, etc.) and how many of them were involved was provided, together with information on the stages or actions of their involvement in the study. We gave a Partial if this information was not fully provided (i.e. information on patient/public partners and how many were involved without describing in what stages or actions they were involved, and vice versa), and a No if no information was presented at all.

The quality assessment of PPI reporting was done by the first and the second authors independently and then compared. Disagreements between reviewers were first discussed, and then resolved by the third and fourth authors when needed.

Assessing the quality or risk of bias of the included studies was omitted, as the focus in this review was on appraising the quality of PPI reporting rather than assessing the quality of each research article.

Data synthesis

After data extraction, a table summarising the included studies was developed. Studies were compared according to the main outcomes of the review: the frequency of PPI reporting following the GRIPP2 checklist, the quality of reporting as per the five GRIPP2-SF items, and the availability of a plain language summary.

Results

Search results and study selection

The database searches yielded a total of 8491 studies. First, 2496 were removed as duplicates. Then, after title and abstract screening, 5785 articles were excluded leaving 210 articles eligible for the full text review. After a careful examination, 68 of these studies were included in this review. A further 38 studies were identified from the alert system that was set on the four databases and 32 studies from the reference check of the included studies. Of these 70 articles, 56 were further excluded and 14 were added to the previous 68 included studies. Thus, 82 studies met the inclusion criteria and were included in this review. A summary of the database search results and the study selection process are presented in Fig.  1 .

Fig. 1: PRISMA flow diagram of the study selection process, detailing the review search results and selection process

Overview of included studies

Details of the study characteristics including first author and year of publication, objective, country, study design, setting, PS topic, PPI participants and involvement stages are presented in Supplementary file 2 . The majority of the studies were conducted in the U.K ( n  = 24) and the United States of America ( n  = 18), with the remaining 39 conducted in other high income countries, the exception being one study in Haiti. A range of study designs were identified, the most common being qualitative ( n  = 31), mixed methods ( n  = 13), interventional ( n  = 5), and quality improvement projects ( n  = 4). Most PS topics concerned medication safety ( n  = 17), PS in general (e.g., developing a PS survey or PS management application) ( n  = 14), fall prevention ( n  = 13), communication ( n  = 11), and adverse events ( n  = 10), with the remaining PS topics listed in Supplementary file 2 .

Patient representatives, advocates, and patient advisory groups ( n  = 33) and patients, service users, and health consumers ( n  = 32) were the main groups involved. The remaining, included community members/ organisations. Concerning PPI stages, the main involvement across the studies was in commenting on or developing research materials ( n  = 74) including, patient leaflets, interventional tools, mobile applications, and survey instruments. Following this stage, involvement in data analysis, drafting manuscripts, and disseminating results ( n  = 30), and being a member of a project advisory or steering group ( n  = 18) were the most common PPI evident in included studies. Whereas the least involvement was in identifying research priorities ( n  = 5), and being a co-applicant on a research project or grant application ( n  = 6).

Regarding plain language summary, only one out of the 82 studies (1.22%) provided a plain language summary in their paper [ 20 ].

Frequency and quality of PPI reporting

The frequency of PPI reporting following the GRIPP2 checklist was 6.1%: only five of the 82 included studies reported PPI in their papers following the checklist. The quality of PPI reporting in those studies is presented in Table  2 . Of these five studies, one (20%) did not report the aim of PPI in the study and one (20%) did not comment on the extent to which PPI influenced the study overall.

The quality of PPI reporting of the remaining 77 studies is presented in Table  3 . The aim of PPI in the study was reported in 62.3% of articles ( n  = 48), while 3.9% ( n  = 3) partially reported this. A clear description of the methods used for PPI in the study was reported in 79.2% of papers ( n  = 61) and partially in 20.8% ( n  = 16). Concerning the outcomes, 81.8% of papers ( n  = 63) reported the results of PPI in the study, while 10.4% ( n  = 8) partially did. Of the 77 studies, 68.8% ( n  = 53) reported the extent to which PPI influenced the study overall and 3.9% ( n  = 3) partially reported this. Finally, 57.1% ( n  = 44) of papers critically reflected on the things that went well and those that did not and 2.6% ( n  = 2) partially reflected on this.

Discussion

Summary of main findings

This systematic review assessed the frequency of reporting PPI in PS research using the GRIPP2 checklist and quality of reporting using the GRIPP2-SF. In total, 82 studies were included in this review. Major PS topics were related to medication safety, general PS, and fall prevention. Patient representatives, advocates, patient advisory groups, patients, service users, and health consumers were the most involved. The main involvement across the studies was in commenting on or developing research materials such as educational and interventional tools, survey instruments, and applications while the least was in identifying research priorities and being a co-applicant on a research project or grant application. Thus, significant effort is still needed to involve patients and the public in the earlier stages of the research process given the fundamental impact of PS on their lives.

Overall completeness and applicability of evidence

A low frequency of reporting PPI in PS research following the GRIPP2 guidelines was revealed in this review, where only five of the 82 studies included mentioned that PPI was reported as per the GRIPP2 checklist. This is despite it being the most recent report-focused framework and the most recommended by several leading journals [ 17 ]. This was not surprising as similar results were reported in recent reviews in other healthcare topics. For instance, Musbahi et al. in their systematic review on PPI reporting in bariatric research reported that none of the 90 papers identified in their review mentioned or utilised the GRIPP2 checklist [ 102 ]. Similarly, a study on PPI in orthodontic research found that none of the 363 included articles reported PPI against the GRIPP2 checklist [ 103 ].

In relation to the quality of reporting following the GRIPP2-SF criteria, our findings show sub-optimal reporting within the 77 studies that did not use GRIPP2 as a guide/checklist to report their PPI. Similarly, Bergin et al. in their systematic review to investigate the nature and impact of PPI in cancer research concluded that substandard reporting was evident [ 12 ]. In our review, this was mainly due to failure to meet three criteria. First, the lowest percentage of reporting (57.1%, n  = 44) was related to critical reflection on PPI in the study (i.e., what went well and what did not). In total, 31 studies (42.9%) did not provide any information on this, and two studies were scored as partial. The first study mentioned that only involving one patient was a limitation [ 27 ] and the other stated that including three patients in the design of the tool was a strength [ 83 ]. Both studies did not critically comment or reflect on these points so that future researchers are able to avoid such problems and enhance PPI opportunities. For instance, providing the reasons/challenges behind the exclusive inclusion of a single patient and explaining how this limits the study findings and conclusion would help future researchers to address these challenges. Likewise, commenting on why incorporating three patients in the design of the study tool could be seen as a strength would have been beneficial. This could be, fostering diverse perspectives and generating novel ideas for developing the tool. Similar to our findings, Bergin et al. in their systematic review reported that 40% of the studies failed to meet this criterion [ 12 ].

Second, only 48 of the 77 articles (62.3%) reported the aim of PPI in their study, unlike the results of Bergin et al., where most of the studies (93.1%) met this criterion [ 12 ]. Of the 29 studies that did not meet this criterion in our review, a few mentioned in their objective developing a consensus-based instrument [ 41 ], reaching a consensus on the patient-reported outcomes [ 32 ], obtaining international consensus on a set of core outcome measures [ 98 ], or facilitating a multi-stakeholder dialogue [ 71 ], yet without indicating anything in relation to patients, patient representatives, community members, or any other PPI participants. Thus, the lack of reporting of the aim of PPI was clearly evident in this review. Reporting the aim of PPI in the study is crucial for promoting transparency, methodological rigor, reproducibility, and impact assessment of the PPI.

Third, 68.8% ( n  = 53) of the studies reported the extent to which PPI influenced the study overall including positive and negative effects if any. This was again similar to the findings of Bergin et al., where 38% of the studies did not meet this criterion mainly due to a failure to address PPI challenges in their respective studies [ 12 ]. Additionally, Owyang et al. in their review on the extent, and quality of PPI in orthopaedic practice, also described a poor reporting of PPI impact on research [ 13 ]. As per the GRIPP2 guidelines, both positive and negative effects of PPI on the study should be reported when applicable. Providing such information is essential as it enhances future research on PPI in terms of both practice and reporting.

Reporting a clear description of the methods used for PPI in the study was acceptable, with 79.2% of the papers meeting this criterion. Most studies provided information in the methods section of their papers on the PPI participants, their number, stages of their involvement and how they were involved. Providing clear information on the methods used for PPI is vital to give the reader a clear understanding of the steps taken to involve patients, and for other researchers to replicate these methods in future research. Additionally, reporting the results of PPI in the study was also acceptable with 81.8% of the papers reporting the outcomes of PPI in the results section. Reporting the results of PPI is important for enhancing methodological transparency, providing a more accurate interpretation for the study findings, contributing to the overall accountability and credibility of the research, and informing decision making.

Out of the 82 studies included in this review, only one study provided a plain language summary. We understand that PS research, or health and medical research in general, is difficult for patients and the public to understand given their diverse health literacy and educational backgrounds. However, if we expect patients and the public to be involved in research, then it is crucial to translate this research, which has a huge impact on their lives, into an easily accessible format. Failing to translate the benefits that such research may have for patient and public lives may result in them underestimating the value of this research and losing interest in being involved in the planning or implementation of future research [ 103 ]. Thus, providing a plain language summary for research is one way to tackle this problem. To our knowledge, only a few health and social care journals (i.e. Cochrane and BMC Research Involvement and Engagement) require a plain language summary as a submission requirement. Having this as a requirement for submission is crucial in bringing the importance of this issue to researchers’ attention.

Research from recent years suggests that poor PPI reporting in articles relates to a lack of submission requirements for PPI reporting in journals and difficulties with word limits for submitted manuscripts [ 13 ]. Price et al. assessed the frequency of PPI reporting in published papers before and after the introduction of PPI reporting obligations by the British Medical Journal (BMJ) [ 104 ]. The authors identified an increase in PPI reporting in papers published by BMJ from 0.5% to 11% between the periods of 2013–2014 and 2015–2016. The study findings demonstrate the impact of journal guidelines in shaping higher quality research outputs [ 13 ]. In our review, we found a low frequency of PPI reporting in PS research using the GRIPP2 checklist, alongside sub-optimal quality of reporting following GRIPP2-SF. This could potentially be attributed to the absence of submission requirements for PPI reporting in journals following the GRIPP2 checklist, as well as challenges posed by word limits.

Strengths and limitations

This systematic review presents an overview of the frequency of PPI reporting in PS research using the GRIPP2 checklist, as well as an evaluation of the quality of reporting following the GRIPP2-SF. As the first review to focus on PS research, it provides useful knowledge on the status of PPI reporting in this field and the extent to which researchers are adopting and adhering to PPI reporting guidelines. Despite these strengths, our review has some limitations that should be mentioned. First, only English language papers were included in this review, as English is the main language of the researchers. Thus, there is a possibility that relevant articles on PPI in PS research may have been omitted. Another limitation relates to our search, which was limited to papers published from 2018, as the GRIPP2 guidelines were published in 2017. It is therefore probable that the protocols of some of these studies were developed earlier than the publication of the GRIPP2 checklist, meaning that PPI reporting following GRIPP2 was not common practice and thus not adopted by these studies. This might limit the conclusions we can draw from this review. Finally, the use of GRIPP2 to assess the quality of PPI reporting might be a limitation, as usability testing has not yet been conducted to understand how the checklist works in practice with various types of research designs. However, the GRIPP2 is the first international, evidence-based, community consensus informed guideline for the reporting of PPI in health and social care research. Reflections and comments from researchers using the GRIPP2 will help improve its use in future studies.

Implications for research and practice

Lack of PPI reporting not only affects the quality of research but also implies that others cannot learn from previous research experience. Additionally, without consistent and transparent reporting it is difficult to evaluate the impact of various PPI in research [ 9 ]: “if it is not reported it cannot be assessed” ([ 105 ] p19). Enhanced PPI reporting will result in a wider range of richer, high-quality, evidence-based PPI research, leading to a better understanding of PPI use and effectiveness [ 103 ]. GRIPP2 reporting guidelines were developed to provide guidance for researchers, patients, and the public to enhance the quality of PPI reporting and improve the quality of the international PPI evidence-base. The guidance can be used prospectively to plan PPI or retrospectively to guide the structure of PPI reporting in research [ 9 ]. To enhance PPI reporting, we recommend the following:

Publishers and journals

First, we encourage publishers and journals to require researchers to report PPI following the GRIPP2 checklist. Utilising the short or the long version should depend on the primary focus of the study (i.e., if PPI is within the primary focus of the research then the GRIPP2-LF is recommended). Second, we recommend that journals and editorial members advise reviewers to evaluate PPI reporting within research articles following the GRIPP2 tool and make suggestions accordingly. Finally, we encourage journals to add a plain language summary as a submission requirement to increase research dissemination and improve the accessibility of research for patients and the public.

Researchers

Though there is greater evidence of PPI in research, it is still primarily researchers who set the research agenda and decide on the research questions to be addressed. Thus, significant effort is still needed to involve patients and the public in the earlier stages of the research process, given the fundamental impact of PS on their lives. To enhance future PPI reporting, adding a GRIPP2-based criterion to existing EQUATOR checklists for reporting research papers, such as STROBE, PRISMA, and CONSORT, may support higher-quality research. Additionally, there is currently no detailed explanation paper for the GRIPP2 in which each criterion is explained in detail with examples. Addressing this gap would be of great benefit to guide the structure of PPI reporting and to explore the applicability of each criterion in relation to different stages of PPI in research. For instance, a detailed explanation of each criterion, illustrated across research studies with PPI at various stages, would be of high value for improving future PPI reporting, given the growing interest in PPI research in recent years and the relatively small PPI evidence base in health and medical research.

Funding bodies can also enhance PPI reporting by adding a requirement for researchers to report PPI following the GRIPP2 checklist. In Ireland, the Health Service Executive (HSE) has already initiated this by requiring all PPI in HSE research to be reported following the GRIPP2 guidelines [ 10 ].

Conclusions

This study represents the first systematic review on the frequency and quality of PPI reporting in PS research using the GRIPP2 checklist. Most PS topics were related to medication safety, general PS, and fall prevention. The main involvement across the studies was in commenting on or developing research materials. Thus, efforts are still needed to involve patients and the public across all aspects of the research process, especially the earlier stages of the research cycle. The frequency of PPI reporting following the GRIPP2 guidelines was low, and the quality of reporting following the GRIPP2-SF criteria was sub-optimal. The lowest percentages of reporting were for critically reflecting on PPI in the study so that future research can learn from this experience and work to improve it, for reporting the aim of PPI in the study, and for reporting the extent to which PPI influenced the study overall, including positive and negative effects. Researchers, funders, publishers, journals, editorial members, and reviewers have a responsibility to promote consistent and transparent PPI reporting following internationally developed reporting guidelines such as the GRIPP2. Evidence-based guidelines for reporting PPI should be supported to help future researchers plan and report PPI more effectively, which may ultimately improve the quality and relevance of research.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its Supplementary information files.

Abbreviations

  • PS: Patient safety
  • UK: United Kingdom
  • NIHR: National Institute for Health Research
  • PPI: Public and Patient Involvement
  • HSE: Health Service Executive
  • GRIPP: Guidance for Reporting Involvement of Patients and the Public
  • GRIPP2: Second version of the GRIPP checklist
  • GRIPP2-LF: Long form of GRIPP2
  • GRIPP2-SF: Short form of GRIPP2
  • PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
  • PROSPERO: The International Database of Prospectively Registered Systematic Reviews
  • BMJ: British Medical Journal

References

Patient safety: World Health Organisation. 2023. Available from: https://www.who.int/news-room/fact-sheets/detail/patient-safety . Updated 11 Sept 2023.

Slawomirski L, Klazinga N. The economics of patient safety: from analysis to action. Paris: Organisation for Economic Co-operation and Development; 2020.

Panagioti M, Khan K, Keers RN, Abuzour A, Phipps D, Kontopantelis E, et al. Prevalence, severity, and nature of preventable patient harm across medical care settings: systematic review and meta-analysis. BMJ. 2019;366:l4185.

Hodkinson A, Tyler N, Ashcroft DM, Keers RN, Khan K, Phipps D, et al. Preventable medication harm across health care settings: a systematic review and meta-analysis. BMC Med. 2020;18(1):313.

Park M, Giap TTT. Patient and family engagement as a potential approach for improving patient safety: A systematic review. J Adv Nurs. 2020;76(1):62–80.

Chegini Z, Janati A, Bababie J, Pouraghaei M. The role of patients in the delivery of safe care in hospital: Study protocol. J Adv Nurs. 2019;75(9):2015–23.

Chegini Z, Arab-Zozani M, Islam SMS, Tobiano G, Rahimi SA. Barriers and facilitators to patient engagement in patient safety from patients and healthcare professionals’ perspectives: A systematic review and meta-synthesis. Nurs Forum. 2021;56(4):938–49.

Going the extra mile: improving the nation’s health and wellbeing through public involvement in research. London: National Institute for Health Research; 2015.

Staniszewska S, Brett J, Simera I, Seers K, Mockford C, Goodlad S, et al. GRIPP2 reporting checklists: tools to improve reporting of patient and public involvement in research. BMJ. 2017;358:j3453.

Minogue V. Knowledge translation, dissemination, and impact: a practical guide for researchers. Guide No 8: patient and public involvement in HSE research. Ireland: Health Service Executive Research and Development; 2021.

Staniszewska S, Brett J, Mockford C, Barber R. The GRIPP checklist: Strengthening the quality of patient and public involvement reporting in research. Int J Technol Assess Health Care. 2011;27(4):391–9.

Bergin RJ, Short CE, Davis N, Marker J, Dawson MT, Milton S, et al. The nature and impact of patient and public involvement in cancer prevention, screening and early detection research: A systematic review. Prev Med. 2023;167:107412.

Owyang D, Bakhsh A, Brewer D, Boughton OR, Cobb JP. Patient and public involvement within orthopaedic research a systematic review. J Bone Joint Surg Am. 2021;103(13):e51.

Busch IM, Saxena A, Wu AW. Putting the patient in patient safety investigations: barriers and strategies for involvement. J Patient Saf. 2021;17(5):358–62.

Lee M, Lee NJ, Seo HJ, Jang H, Kim SM. Interventions to engage patients and families in patient safety: a systematic review. West J Nurs Res. 2021;43(10):972–83.

Ocloo J, Garfield S, Franklin BD, Dawson S. Exploring the theory, barriers and enablers for patient and public involvement across health, social care and patient safety: a systematic review of reviews. Health Res Policy Syst. 2021;19(1):8.

Greenhalgh T, Hinton L, Finlay T, Macfarlane A, Fahy N, Clyde B, et al. Frameworks for supporting patient and public involvement in research: Systematic review and co-design pilot. Health Expect. 2019;22(4):785–801.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. PLoS Med. 2021;18(3):372.

INVOLVE. What is public involvement in research? NIHR; 2019. Available from: https://www.invo.org.uk/find-out-more/what-is-public-involvement-in-research-2/ .

Shahid A, Sept B, Kupsch S, Brundin-Mather R, Piskulic D, Soo A, et al. Development and pilot implementation of a patient-oriented discharge summary for critically Ill patients. World J Crit Care Med. 2022;11(4):255–68.

Bisset CN, Dames N, Oliphant R, Alasadi A, Anderson D, Parson S, et al. Exploring shared surgical decision-making from the patient’s perspective: is the personality of the surgeon important? Colorectal Dis. 2020;22(12):2214–21.

Morris RL, Ruddock A, Gallacher K, Rolfe C, Giles S, Campbell S. Developing a patient safety guide for primary care: A co-design approach involving patients, carers and clinicians. Health Expect. 2021;24(1):42–52.

Tobiano G, Marshall AP, Gardiner T, Jenkinson K, Shapiro M, Ireland M. Development and psychometric testing of the patient participation in bedside handover survey. Health Expect. 2022;25(5):2492–502.

Francis-Coad J, Farlie MK, Haines T, Black L, Weselman T, Cummings P, et al. Revising and evaluating falls prevention education for older adults in hospital. Health Educ J. 2023;82(8):878–91.

Troya MI, Chew-Graham CA, Babatunde O, Bartlam B, Higginbottom A, Dikomitis L. Patient and public involvement and engagement in a doctoral research project exploring self-harm in older adults. Health Expect. 2019;22(4):617–31.

Aharaz A, Kejser CL, Poulsen MW, Jeftic S, Ulstrup-Hansen AI, Jorgensen LM, et al. Optimization of the Danish National Electronic Prescribing System to improve patient safety: Development of a user-friendly prototype of the digital platform shared medication record. Pharmacy (Basel, Switzerland). 2023;11(2):41.

Aho-Glele U, Bouabida K, Kooijman A, Popescu IC, Pomey MP, Hawthornthwaite L, et al. Developing the first pan-Canadian survey on patient engagement in patient safety. BMC Health Serv Res. 2021;21(1):1099.

Albutt A, O’Hara J, Conner M, Lawton R. Involving patients in recognising clinical deterioration in hospital using the patient wellness questionnaire: A mixed-methods study. J Res Nurs. 2020;25(1):68–86.

Bell SK, Bourgeois F, DesRoches CM, Dong J, Harcourt K, Liu SK, et al. Filling a gap in safety metrics: development of a patient-centred framework to identify and categorise patient-reported breakdowns related to the diagnostic process in ambulatory care. BMJ Qual Saf. 2022;31(7):526–40.

Boet S, Etherington N, Lam S, Lê M, Proulx L, Britton M, et al. Implementation of the Operating Room Black Box research program at the Ottawa Hospital through patient, clinical, and organizational engagement: Case study. J Med Internet Res. 2021;23(3):e15443.

Carter J, Tribe RM, Shennan AH, Sandall J. Threatened preterm labour: Women’s experiences of risk and care management: A qualitative study. Midwifery. 2018;64:85–92.

Da Silva Lopes AM, Colomer-Lahiguera S, Mederos Alfonso N, Aedo-Lopez V, Spurrier-Bernard G, Tolstrup LK, et al. Patient-reported outcomes for monitoring symptomatic toxicities in cancer patients treated with immune-checkpoint inhibitors: A Delphi study. Eur J Cancer. 2021;157:225–37.

de Jong LD, Lavender AP, Wortham C, Skelton DA, Haines TP, Hill AM. Exploring purpose-designed audio-visual falls prevention messages on older people’s capability and motivation to prevent falls. Health Soc Care Community. 2019;27(4):e471–82.

Doucette L, Kiely BT, Gierisch JM, Marion E, Nadler L, Heflin MT, et al. Participatory research to improve medication reconciliation for older adults in the community. J Am Geriatr Soc. 2023;71(2):620–31.

Elrod CS, Pappa ST, Heyn PC, Wong RA. Using an academic-community partnership model to deliver evidence-based falls prevention programs in a metropolitan setting: A community case study. Front Public Health. 2023;11:1073520.

Feldman E, Pos FJ, Smeenk RJ, van der Poel H, van Leeuwen P, de Feijter JM, et al. Selecting a PRO-CTCAE-based subset for patient-reported symptom monitoring in prostate cancer patients: a modified Delphi procedure. ESMO Open. 2023;8(1):100775.

Francis-Coad J, Watts T, Bulsara C, Hill A-M. Designing and evaluating falls prevention education with residents and staff in aged care homes: a feasibility study. Health Educ (0965-4283). 2022;122(5):546–63.

Fuller TE, Pong DD, Piniella N, Pardo M, Bessa N, Yoon C, et al. Interactive digital health tools to engage patients and caregivers in discharge preparation: implementation study. J Med Internet Res. 2020;22(4):e15573.

Gibson B, Butler J, Schnock K, Bates D, Classen D. Design of a safety dashboard for patients. Patient Educ Couns. 2020;103(4):741–7.

Giles SJ, Lewis PJ, Phipps DL, Mann F, Avery AJ, Ashcroft DM. Capturing patients’ perspectives on medication safety: the development of a patient-centered medication safety framework. J Patient Saf. 2020;16(4):e324–39.

Gnagi R, Zuniga F, Brunkert T, Meyer-Massetti C. Development of a medication literacy assessment instrument (MELIA) for older people receiving home care. J Adv Nurs. 2022;78(12):4210–20.

Goodsmith N, Zhang L, Ong MK, Ngo VK, Miranda J, Hirsch S, et al. Implementation of a community-partnered research suicide-risk management protocol: case study from community partners in care. Psychiatr Serv (Washington, DC). 2021;72(3):281–7.

Gorman LS, Littlewood DL, Quinlivan L, Monaghan E, Smith J, Barlow S, et al. Family involvement, patient safety and suicide prevention in mental healthcare: ethnographic study. BJPsych open. 2023;9(2):e54.

Green MM, Meyer C, Hutchinson AM, Sutherland F, Lowthian JA. Co-designing being your best program—a holistic approach to frailty in older community dwelling Australians. Health Soc Care Community. 2021;30(5):e2022–32.

Guo X, Wang Y, Wang L, Yang X, Yang W, Lu Z, et al. Effect of a fall prevention strategy for the older patients: A quasi-experimental study. Nurs Open. 2023;10(2):1116–24.

Hahn-Goldberg S, Chaput A, Rosenberg-Yunger Z, Lunsky Y, Okrainec K, Guilcher S, et al. Tool development to improve medication information transfer to patients during transitions of care: A participatory action research and design thinking methodology approach. Res Social Adm Pharm. 2022;18(1):2170–7.

Harrington A, Darke H, Ennis G, Sundram S. Evaluation of an alternative model for the management of clinical risk in an adult acute psychiatric inpatient unit. Int J Ment Health Nurs. 2019;28(5):1099–109.

Harris K, Softeland E, Moi AL, Harthug S, Ravnoy M, Storesund A, et al. Development and validation of patients’ surgical safety checklist. BMC Health Serv Res. 2022;22(1):259.

Hawley-Hague H, Tacconi C, Mellone S, Martinez E, Ford C, Chiari L, et al. Smartphone apps to support falls rehabilitation exercise: app development and usability and acceptability study. JMIR Mhealth Uhealth. 2020;8(9):e15460.

Holmqvist M, Ros A, Lindenfalk B, Thor J, Johansson L. How older persons and health care professionals co-designed a medication plan prototype remotely to promote patient safety: case study. JMIR aging. 2023;6:e41950.

Jayesinghe R, Moriarty F, Khatter A, Durbaba S, Ashworth M, Redmond P. Cost outcomes of potentially inappropriate prescribing in middle-aged adults: A Delphi consensus and cross-sectional study. Br J Clin Pharmacol. 2022;88(7):3404–20.

Johannessen T, Ree E, Stromme T, Aase I, Bal R, Wiig S. Designing and pilot testing of a leadership intervention to improve quality and safety in nursing homes and home care (the SAFE-LEAD intervention). BMJ Open. 2019;9(6):e027790.

Joseph K, Newman B, Manias E, Walpola R, Seale H, Walton M, et al. Engaging with ethnic minority consumers to improve safety in cancer services: A national stakeholder analysis. Patient Educ Couns. 2022;105(8):2778–84.

Khan A, Spector ND, Baird JD, Ashland M, Starmer AJ, Rosenbluth G, et al. Patient safety after implementation of a coproduced family centered communication programme: multicenter before and after intervention study. BMJ. 2018;363:k4764.

Khazen M, Mirica M, Carlile N, Groisser A, Schiff GD. Developing a framework and electronic tool for communicating diagnostic uncertainty in primary care: a qualitative study. JAMA Network Open. 2023;6(3):e232218.

Knight SW, Trinkle J, Tschannen D. Hospital-to-homecare videoconference handoff: improved communication, coordination of care, and patient/family engagement. Home Healthc Now. 2019;37(4):198–207.

Lawrence V, Kimona K, Howard RJ, Serfaty MA, Wetherell JL, Livingston G, et al. Optimising the acceptability and feasibility of acceptance and commitment therapy for treatment-resistant generalised anxiety disorder in older adults. Age Ageing. 2019;48(5):741–50.

Louch G, Reynolds C, Moore S, Marsh C, Heyhoe J, Albutt A, et al. Validation of revised patient measures of safety: PMOS-30 and PMOS-10. BMJ Open. 2019;9(11):e031355.

MacDonald T, Jackson S, Charles M-C, Periel M, Jean-Baptiste M-V, Salomon A, et al. The fourth delay and community-driven solutions to reduce maternal mortality in rural Haiti: a community-based action research study. BMC Pregnancy Childbirth. 2018;18(1):254.

Mackintosh N, Sandall J, Collison C, Carter W, Harris J. Employing the arts for knowledge production and translation: Visualizing new possibilities for women speaking up about safety concerns in maternity. Health Expect. 2018;21(3):647–58.

Marchand K, Turuba R, Katan C, Brasset C, Fogarty O, Tallon C, et al. Becoming our young people’s case managers: caregivers’ experiences, needs, and ideas for improving opioid use treatments for young people using opioids. Subst Abuse Treat Prev Policy. 2022;17(1):1–15.

Mazuz K, Biswas S. Co-designing technology and aging in a service setting: Developing an interpretive framework of how to interact with older age users. Gerontechnology. 2022;21(1):1–13.

McCahon D, Duncan P, Payne R, Horwood J. Patient perceptions and experiences of medication review: qualitative study in general practice. BMC Prim Care. 2022;23(1):293.

McMullen S, Panagioti M, Planner C, Giles S, Angelakis I, Keers RN, et al. Supporting carers to improve patient safety and maintain their well-being in transitions from mental health hospitals to the community: A prioritisation nominal group technique. Health Expect. 2023;26(5):2064–74.

Morris RL, Giles S, Campbell S. Involving patients and carers in patient safety in primary care: A qualitative study of a co-designed patient safety guide. Health Expect. 2023;26(2):630–9.

Morris RL, Stocks SJ, Alam R, Taylor S, Rolfe C, Glover SW, et al. Identifying primary care patient safety research priorities in the UK: a James Lind Alliance Priority Setting Partnership. BMJ Open. 2018;8(2):e020870.

Nether KG, Thomas EJ, Khan A, Ottosen MJ, Yager L. Implementing a robust process improvement program in the neonatal intensive care unit to reduce harm. J Healthc Qual. 2022;44(1):23–30.

Powell C, Ismail H, Cleverley R, Taylor A, Breen L, Fylan B, et al. Patients as qualitative data analysts: Developing a method for a process evaluation of the “Improving the Safety and Continuity of Medicines management at care Transitions” (ISCOMAT) cluster randomised control trial. Health Expect. 2021;24(4):1254–62.

Powell C, Ismail H, Davis M, Taylor A, Breen L, Fylan B, et al. Experiences of patients with heart failure with medicines at transition intervention: Findings from the process evaluation of the Improving the Safety and Continuity of Medicines management at Transitions of care (ISCOMAT) programme. Health Expect. 2022;25(5):2503–14.

Radecki B, Keen A, Miller J, McClure JK, Kara A. Innovating fall safety: engaging patients as experts. J Nurs Care Qual. 2020;35(3):220–6.

Rosgen BK, Plotnikoff KM, Krewulak KD, Shahid A, Hernandez L, Sept BG, et al. Co-development of a transitions in care bundle for patient transitions from the intensive care unit: a mixed-methods analysis of a stakeholder consensus meeting. BMC Health Serv Res. 2022;22(1):10.

Schenk EC, Bryant RA, Van Son CR, Odom-Maryon T. Developing an intervention to reduce harm in hospitalized patients: patients and families in research. J Nurs Care Qual. 2019;34(3):273–8.

Spazzapan M, Vijayakumar B, Stewart CE. A bit about me: Bedside boards to create a culture of patient-centered care in pediatric intensive care units (PICUs). J Healthc Risk Manag. 2020;39(3):11–9.

Stoll JA, Ranahan M, Richbart MT, Brennan-Taylor MK, Taylor JS, Brady L, et al. Development of video animations to encourage patient-driven deprescribing: A team alice study. Patient Educ Couns. 2021;104(11):2716–23.

Subbe CP, Tomos H, Jones GM, Barach P. Express check-in: developing a personal health record for patients admitted to hospital with medical emergencies: a mixed-method feasibility study. Int J Qual Health Care. 2021;33(3):121.

Tai D, Li E, Liu-Ambrose T, Bansback N, Sadatsafavi M, Davis JC. Patient-Reported Outcome Measures (PROMs) to support adherence to falls prevention clinic recommendations: a qualitative study. Patient Prefer Adherence. 2020;14:2105–21.

Thakur T, Chewning B, Zetes N, Lee JTY. Involving caregivers in design and assessment of opioid risk and safety communication intervention in children. Patient Educ Couns. 2021;104(10):2432–6.

Thomas J, Dahm MR, Li J, Georgiou A. Can patients contribute to enhancing the safety and effectiveness of test-result follow-up? Qualitative outcomes from a health consumer workshop. Health Expect. 2021;24(2):222–33.

Tremblay MC, Bradette-Laplante M, Witteman HO, Dogba MJ, Breault P, Paquette JS, et al. Providing culturally safe care to indigenous people living with diabetes: Identifying barriers and enablers from different perspectives. Health Expect. 2021;24(2):296–306.

Troya MI, Dikomitis L, Babatunde OO, Bartlam B, Chew-Graham CA. Understanding self-harm in older adults: A qualitative study. EClinicalMedicine. 2019;12:52–61.

Tyler N, Giles S, Daker-White G, McManus BC, Panagioti M. A patient and public involvement workshop using visual art and priority setting to provide patients with a voice to describe quality and safety concerns: Vitamin B12 deficiency and pernicious anaemia. Health Expect. 2021;24(1):87–94.

Tyler N, Planner C, Shears B, Hernan A, Panagioti M, Giles S. Developing the Resident Measure of Safety in Care Homes (RMOS): A Delphi and think aloud study. Health Expect. 2023;26(3):1149–58.

Van den Bulck SA, Vankrunkelsven P, Goderis G, Van Pottelbergh G, Swerts J, Panis K, et al. Developing quality indicators for Chronic Kidney Disease in primary care, extractable from the Electronic Medical Record. A Rand-modified Delphi method. BMC Nephrol. 2020;21(1):161.

Van Strien-Knippenberg IS, Boshuizen MCS, Determann D, de Boer JH, Damman OC. Cocreation with Dutch patients of decision-relevant information to support shared decision-making about adjuvant treatment in breast cancer care. Health Expect. 2022;25(4):1664–77.

Wilson NA, Reich AJ, Graham J, Bhatt DL, Nguyen LL, Weissman JS. Patient perspectives on the need for implanted device information: Implications for a post-procedural communication framework. Health Expect. 2021;24(4):1391–402.

Winterberg AV, Lane B, Hill LM, Varughese AM. Optimizing Pediatric Induction Experiences Using Human-centered Design. J Perianesth Nurs. 2022;37(1):48–52.

Yang R, Donaldson GW, Edelman LS, Cloyes KG, Sanders NA, Pepper GA. Fear of older adult falling questionnaire for caregivers (FOAFQ-CG): Evidence from content validity and item-response theory graded-response modelling. J Adv Nurs. 2020;76(10):2768–80.

Young A, Menon D, Street J, Al-Hertani W, Stafinski T. A checklist for managed access programmes for reimbursement co-designed by Canadian patients and caregivers. Health Expect. 2018;21(6):973–80.

Yuen EYN, Street M, Abdelrazek M, Blencowe P, Etienne G, Liskaser R, et al. Evaluating the efficacy of a digital App to enhance patient-centred nursing handover: A simulation study. J Clin Nurs. 2023;32(19–20):7626–37.

Jo S, Nabatchi T. Coproducing healthcare: individual-level impacts of engaging citizens to develop recommendations for reducing diagnostic error. Public Manag Rev. 2019;21(3):354–75.

O’Hara JK, Reynolds C, Moore S, Armitage G, Sheard L, Marsh C, et al. What can patients tell us about the quality and safety of hospital care? Findings from a UK multicentre survey study. BMJ Qual Saf. 2018;27(9):673–82.

de Jong LD, Francis-Coad J, Wortham C, Haines TP, Skelton DA, Weselman T, et al. Evaluating audio-visual falls prevention messages with community-dwelling older people using a World Cafe forum approach. BMC Geriatrics. 2019;19(1):345.

O’Donnell D, Shé ÉN, McCarthy M, Thornton S, Doran T, Smith F, et al. Enabling public, patient and practitioner involvement in co-designing frailty pathways in the acute care setting. BMC Health Serv Res. 2019;19(1):797.

Russ S, Latif Z, Hazell A, Ogunmuyiwa H, Tapper J, Wachuku-King S, et al. A Smartphone app designed to empower patients to contribute toward safer surgical care: community-based evaluation using a participatory approach. JMIR Mhealth Uhealth. 2020;8(1):e12859.

Mazuz K, Biswas S, Lindner U. Developing self-management application of fall prevention among older adults: a content and usability evaluation. Front Digital Health. 2020;2:11.

Hjelmfors L, Strömberg A, Friedrichsen M, Sandgren A, Mårtensson J, Jaarsma T. Using co-design to develop an intervention to improve communication about the heart failure trajectory and end-of-life care. BMC Palliat Care. 2018;17:17.

Horgan S, Hegarty J, Andrews E, Hooton C, Drennan J. Impact of a quality improvement intervention on the incidence of surgical site infection in patients undergoing colorectal surgery: Pre-test-post-test design. J Clin Nurs. 2023;32(15–16):4932–46.

Tyler N, Wright N, Grundy A, Waring J. Developing a core outcome set for interventions to improve discharge from mental health inpatient services: a survey, Delphi and consensus meeting with key stakeholder groups. BMJ Open. 2020;10(5):e034215.

Ward ME, De Brún A, Beirne D, Conway C, Cunningham U, English A, et al. Using Co-Design to Develop a Collective Leadership Intervention for Healthcare Teams to Improve Safety Culture. Int J Environ Res Public Health. 2018;15(6):1182.

Berthelsen DB, Simon LS, Ioannidis JPA, Voshaar M, Richards P, Goel N, et al. Stakeholder endorsement advancing the implementation of a patient-reported domain for harms in rheumatology clinical trials: outcome of the OMERACT safety working group. Semin Arthritis Rheum. 2023;63:152288.

Okkenhaug A, Tritter JQ, Landstad BJ. Developing a research tool to detect iatrogenic adverse events in psychiatric health care by involving service users and health professionals. J Psychiatr Ment Health Nurs. 2023;00:1–12.

Musbahi A, Clyde D, Small P, Courtney M, Mahawar K, Lamb PJ, et al. A systematic review of patient and public involvement (PPI) in bariatric research trials: the need for more work. Obes Surg. 2022;32(11):3740–51.

Patel VA, Shelswell J, Hillyard N, Pavitt S, Barber SK. A study of the reporting of patient and public involvement and engagement (PPIE) in orthodontic research. J Orthod. 2021;48(1):42–51.

Price A, Schroter S, Snow R, Hicks M, Harmston R, Staniszewska S, et al. Frequency of reporting on patient and public involvement (PPI) in research studies published in a general medical journal: a descriptive study. BMJ Open. 2018;8:e020452.

Amadea T, Anne-Marie B, Louise L. A researcher’s guide to patient and public involvement. 2017.

Acknowledgements

This research is funded as part of the Collective Leadership and Safety Cultures (Co-Lead) research programme which is funded by the Irish Health Research Board, grant reference number RL-2015–1588 and the Health Service Executive. The funders had no role in the study conceptualisation, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

UCD Centre for Interdisciplinary Research, Education and Innovation in Health Systems (UCD IRIS), School of Nursing, Midwifery and Health Systems, Health Sciences Centre, University College Dublin, Dublin, Ireland

Sahar Hammoud, Laith Alsabek, Lisa Rogers & Eilish McAuliffe

Department of Oral and Maxillofacial Surgery, University Hospital Galway, Galway, Ireland

Laith Alsabek

Contributions

S.H and E.M.A designed the study. S.H developed the search strategies with feedback from L.A, L.R, and E.M.A. S.H conducted all searches. S.H and L.A screened the studies, extracted the data, and assessed the quality of PPI reporting. S.H analysed the data with feedback from E.M.A. S.H drafted the manuscript. All authors revised and approved the submitted manuscript. All authors agreed to be personally accountable for their own contributions and to ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author.

Correspondence to Sahar Hammoud.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Hammoud, S., Alsabek, L., Rogers, L. et al. Systematic review on the frequency and quality of reporting patient and public involvement in patient safety research. BMC Health Serv Res 24 , 532 (2024). https://doi.org/10.1186/s12913-024-11021-z

Received : 10 January 2024

Accepted : 21 April 2024

Published : 26 April 2024

DOI : https://doi.org/10.1186/s12913-024-11021-z

Keywords

  • Patient and public involvement
  • Patient participation
  • Research reporting
  • Research involvement

Digital Health Data Quality Issues: Systematic Review

1 School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia

Rebekah Eden, Tendai Makasi, Ignatius Chukwudi, Azumah Mamudu, Mostafa Kamalpour, Dakshi Kapugama Geeganage, Sareh Sadeghianasl, Sander J J Leemans

2 Rheinisch-Westfälische Technische Hochschule, Aachen University, Aachen, Germany

Kanika Goel, Robert Andrews, Moe Thandar Wynn, Arthur ter Hofstede, Trina Myers

Associated Data

Description of key terms.

Verification of search strategy.

Data coding structures.

Publication outlets.

Data quality definitions.

Evidence of the subtheme for each data quality dimension.

Evidence for the interrelationships among the dimensions of data quality.

Evidence for the outcomes of data quality.

Abstract

Background

The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy evades the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact.

Objective

The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ?

Methods

Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.

Results

The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes.

Conclusions

The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.

Introduction

The health care landscape is changing globally owing to substantial investments in health information systems that seek to improve health care outcomes [ 1 ]. Despite the rapid adoption of health information systems [ 2 ] and the perception of digital health as a panacea [ 3 ] for improving health care quality, the outcomes have been mixed [ 4 , 5 ]. As Reisman [ 6 ] noted, despite substantial investment and effort and widespread application of digital health, many of the promised benefits have yet to be realized.

The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making at the local, national [ 6 ], and global levels [ 7 ]. However, the ability to harness data effectively and meaningfully has proven difficult and elusive, largely because of the quality of the data captured. Darko-Yawson and Ellingsen [ 8 ] highlighted that digital health has resulted in more bad data rather than improving the quality of data. It is widely accepted that the data from digital health are plagued by accuracy and completeness concerns [ 9 - 12 ]. Poor data quality (DQ) can be detrimental to continuity of care [ 13 ], patient safety [ 14 ], clinician productivity [ 15 ], and research [ 16 ].

To assess DQ, scholars have developed numerous DQ taxonomies, which evaluate the extent to which the data contained within digital health systems adhere to multiple dimensions (ie, measurable components of DQ). Weiskopf and Weng [ 17 ] identified 5 dimensions of DQ spanning completeness, correctness, concordance, plausibility, and currency. Subsequently, Weiskopf et al [ 18 ] refined the typology to consist of only 3 dimensions: completeness, correctness, and currency. Similarly, Puttkammer et al [ 13 ] focused on completeness, accuracy, and timeliness, whereas Kahn et al [ 19 ] examined conformance, completeness, and plausibility. Others identified “fitness of use” [ 20 ] and the validity of data to a specific context [ 21 ] as key DQ dimensions. Overall, there are wide-ranging definitions of DQ, with an agreed-upon taxonomy evading the literature. In this paper, upon synthesizing the literature, we define DQ as the extent to which digital health data are accessible, accurate, complete, consistent, contextually valid, and current. When consolidated frameworks are developed, the dimensions are often treated in a fragmented manner, with few attempts to understand the relationships between the dimensions and the resultant outcomes. This is substantiated by Bettencourt-Silva et al [ 22 ], who indicated that DQ is not systematically or consistently assessed.

Research Aims and Questions

Failure of health organizations to leverage high-quality data will compromise the sustainability of an already strained health care system [ 23 ]. Therefore, we undertook a systematic literature review to answer the following research questions: (1) What are the dimensions of digital health DQ? (2) How are the dimensions of digital health DQ related? and (3) What are the impacts of digital health DQ? The aim of this research was to develop, from synthesizing the literature, a consolidated digital health DQ dimension and outcome (DQ-DO) framework, which demonstrates the DQ dimensions and their interrelationships as well as their impact on core health care outcomes. The consolidated DQ-DO framework will be beneficial to both research and practice. For researchers, our review consolidates the digital health DQ literature and provides core areas for future research to rigorously evaluate and improve digital health DQ. For practice, this study provides health care executives and strategic decision makers with insights into both the criticality of digital health DQ by exemplifying the impacts and the complexity of digital health DQ by demonstrating the interrelationships between the dimensions. Multimedia Appendix 1 [ 24 ] provides a list of common acronyms used in this study.

This paper is structured as follows: first, we provide details of the systematic literature review method; second, in line with the research questions, we present our 3 key findings—(1) DQ dimensions, (2) DQ interrelationships, and (3) DQ outcomes; and third, we compare the findings of our study with those of previous studies and discuss the implications of this work.

Methods

We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and the guidelines proposed by Webster and Watson [ 25 ] for systematic literature reviews. Specifically, consistent with Templier and Paré [ 26 ], this systematic literature review was developmental in nature with the goal of developing a consolidated digital health DQ framework.

Literature Search and Selection

To ensure the completeness of the review [ 25 ] and consistent with interdisciplinary reviews, the literature search spanned multiple fields and databases (ie, PubMed, Public Health, Cochrane, SpringerLink, EBSCOhost [MEDLINE and PsycInfo], ABI/INFORM, AISel, Emerald Insight, IEEE Xplore digital library, Scopus, and ACM Digital Library). The search was conducted in October 2021 and was not constrained by the year of publication because the concept of DQ has a long-standing academic history. The search terms were reflective of our research topic and research questions. To ensure comprehensiveness, the search terms were broadened by searching their synonyms. For example, we used search terms such as “electronic health record,” “digital health record,” “e-health,” “electronic medical record,” “EHR,” “EMR,” “data quality,” “data reduction,” “data cleaning,” “data pre-processing,” “information quality,” “data cleansing,” “data preparation,” “intelligence quality,” “data wrangling,” and “data transformation.” Keywords and search queries were reviewed by the reference librarian and subject matter experts in digital health ( Multimedia Appendix 2 ).
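To illustrate how synonym groups like these translate into a single database query, the following Python sketch assembles a Boolean search string. This is a minimal sketch for illustration only: the grouping of terms into two concept groups is our assumption, and the review's verified, database-specific queries are in Multimedia Appendix 2.

```python
# Sketch: combine synonym groups into one Boolean search string.
# The two concept groups below are illustrative assumptions, not the
# review's exact per-database syntax.

system_terms = [
    "electronic health record", "digital health record", "e-health",
    "electronic medical record", "EHR", "EMR",
]
quality_terms = [
    "data quality", "data cleaning", "data pre-processing",
    "information quality", "data cleansing", "data preparation",
]

def or_group(terms):
    """Quote each term and join the group with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Require at least one term from each concept group.
query = f"{or_group(system_terms)} AND {or_group(quality_terms)}"
print(query)
```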

The papers returned from the search were narrowed down in a 4-step process ( Figure 1 ). In the identification step, 5177 articles were identified through multiple database searches, and from these, 3856 (74.48%) duplicates were removed, resulting in 1321 (25.52%) articles. These 1321 articles were randomly divided into 6 batches, which were assigned to separate researchers, who applied the inclusion and exclusion criteria ( Textbox 1 ). As a result of abstract screening, 67.83% (896/1321) of articles were excluded, resulting in 425 (32.17%) articles. Following the same approach as the abstract screening, the 425 articles were again randomly divided into 6 batches and assigned to 1 of the 6 researchers to read and assess the relevance of the article in line with the selection criteria. The assessment of each of the 425 articles was then verified by the research team, resulting in a final set of 227 (53.4%) relevant articles. During this screening phase (ie, abstract and full text), daily meetings were held with the research team in which any uncertainties were raised and discussed until consensus was reached by the team as to whether the article should be included or excluded from the review. In line with Templier and Paré [ 26 ], as this systematic literature review was developmental in nature rather than an aggregative meta-analysis, quality appraisals were not performed on individual articles.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) inclusion process. EHR: electronic health record.

Textbox 1. Inclusion and exclusion criteria.

Inclusion criteria

  • Specifically focuses on data quality in digital health
  • Empirical papers or review articles where conceptual frameworks were either developed or assessed
  • Considers digital health within hospital settings
  • Published in peer-reviewed outlets within any time frame
  • Published in English

Exclusion criteria

  • Development of algorithms for advanced analytics techniques (eg, machine learning and artificial intelligence) without application within hospital settings
  • Descriptive papers without a conceptual framework or an empirical analysis
  • Focused only on primary care (eg, general practice)
  • Pre–go-live considerations (eg, software development)
  • Theses and non–peer-reviewed publications (eg, white papers and editorials)

Literature Analysis

The relevant articles were imported to NVivo (version 12; QSR International), where the analysis was iteratively performed. To ensure reliability and consistency in coding, a coding rule book [ 27 ] was developed and progressively updated to guide the coding process. The analysis involved 6 steps ( Figure 2 ).

Figure 2. Analysis process. DQ: data quality.

In the first step of the analysis, the research team performed open coding [ 27 ], where relevant statements from each article were extracted using verbatim codes and grouped based on similarities [ 28 ]. The first round of coding resulted in 1298 open codes. Second, the open codes were segmented into 2 high-level themes: the first group contained 1044 (80.43%) open codes pertaining directly to DQ dimensions (eg, data accuracy), and the second group contained 254 (19.57%) open codes pertaining to DQ outcomes (eg, financial outcomes).

In the third step, through constant comparison [ 29 ], the 1044 raw DQ codes were combined into 29 DQ subthemes based on commonalities (eg, contextual DQ, fitness for use, granularity, relevancy, accessibility, and availability). In the fourth step, again by performing iterative and multiple rounds of constant comparison, the 254 open codes related to DQ outcomes were used to construct 22 initial DQ outcome subthemes (eg, patient safety, clinician-patient relationship, and continuity of care). The DQ outcome subthemes were further compared with each other, resulting in 5 DQ outcome dimensions (eg, clinical, business process, research-related, clinician, and organizational outcomes). For the DQ subthemes, a constant comparison was performed using the card-sorting method [ 30 ], where an expert panel of 8 DQ researchers split into 4 groups assessed the subthemes for commonalities and differences. The expert groups presented their categorization to each other until a consensus was reached. This resulted in a consolidated set of 6 DQ dimensions (accuracy, consistency, completeness, contextual validity, accessibility, and currency). Multimedia Appendix 3 [ 9 , 12 , 13 , 15 , 16 , 18 , 19 , 21 , 31 - 65 ] provides an example of how the open codes were reflected in the subthemes and themes.

After identifying the DQ dimensions and outcomes, the next stage of coding progressed to identifying the interrelationships (step 5) among the DQ dimensions and the relationships (step 6) between the DQ dimensions and DQ outcomes. To this end, the matrix coding query function using relevant Boolean operators (AND and NEAR) in NVivo was performed. The outcomes of the matrix queries were reviewed and verified by an expert researcher in the health domain.
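A matrix coding query of this kind essentially counts the articles in which two codes co-occur. As a rough analogue of that step (not a reproduction of NVivo's query engine), the following Python sketch tallies co-occurrences over invented per-article code assignments:

```python
# Sketch: count co-occurrences of DQ dimension codes and DQ outcome
# codes per article. The article tags below are invented for illustration.
from collections import Counter
from itertools import product

article_codes = {
    "article_01": {"dimensions": {"consistency", "completeness"},
                   "outcomes": {"clinical"}},
    "article_02": {"dimensions": {"accuracy"},
                   "outcomes": {"clinical", "research-related"}},
    "article_03": {"dimensions": {"consistency", "accessibility"},
                   "outcomes": {"business process"}},
}

# Each (dimension, outcome) pair gains one count per article mentioning both.
matrix = Counter()
for codes in article_codes.values():
    for pair in product(sorted(codes["dimensions"]), sorted(codes["outcomes"])):
        matrix[pair] += 1

for (dimension, outcome), n in sorted(matrix.items()):
    print(f"{dimension} x {outcome}: {n} article(s)")
```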

Throughout the analysis, steps for providing credibility to our findings were performed. First, before commencing the analysis, the research team members who extracted the verbatim codes initially independently reviewed 3 common articles and then convened to review any variations in coding. In addition, they reconvened multiple times a week to discuss their coding and update the codebook to ensure that a consistent approach was followed. Coder corroboration was performed throughout the analysis, with 2 experienced researchers independently verifying all verbatim codes until a consensus was reached [ 27 ]. Subsequent coder corroboration was performed by 2 experienced researchers to ensure that the open codes were accurately mapped to the themes and dimensions. This served to provide internal reliability. Steps for improving external reliability were also performed [ 66 ]. Specifically, the card-sorting method provided an expert appraisal. In addition, the findings were presented to and confirmed by 3 digital health care professionals.

Results

The vast majority of relevant articles were published in journal outlets (169/227, 74.4%), followed by conference proceedings (42/227, 18.5%) and book sections (16/227, 7%). The 169 journal articles were published in 107 journals, with 12% (n=13) of the journals publishing >1 study (these journals are BMC Medical Informatics and Decision Making, eGEMS, International Journal of Medical Informatics, Applied Clinical Informatics, Journal of Medical Internet Research, Journal of the American Medical Informatics Association, PLOS One, BMC Emergency Medicine, Computer Methods and Programs in Biomedicine, International Journal of Population Data Science, JCO Clinical Cancer Informatics, Perspectives in Health Information Management, Studies in Health Technology and Informatics, Australian Health Review, BMC Health Services Research, BMJ Open, Decision Support Systems, Health Informatics Journal, International Journal of Information Management, JAMIA Open, JMIR Medical Informatics, Journal of Biomedical Informatics, Journal of Medical Systems, Malawi Medical Journal, Medical Care, Online Journal of Public Health Informatics, and Telemedicine and e-Health). A complete breakdown of the number of articles published in each outlet is provided in Multimedia Appendix 4.

Overall, as illustrated in Figure 3 , the interest in digital health DQ has been increasing over time, with sporadic interest before 2006.

Figure 3. Publications by year.

In the subsequent sections, we provide an overview of the DQ definitions, DQ dimensions, their interrelationships, and DQ outcomes to develop a consolidated digital health DQ framework.

DQ Definitions

Multiple definitions of DQ were discussed in the literature ( Multimedia Appendix 5 [ 17 , 18 , 20 - 22 , 31 , 54 , 67 - 77 ]). There was no consensus on a single definition of DQ; however, an analysis of the definitions revealed two perspectives, which we labeled as the (1) context-agnostic perspective and (2) context-aware perspective. The context-agnostic perspective defines DQ based on a set of dimensions, regardless of the context within which the data are used. For instance, as Abiy et al [ 67 ] noted “documentation and contents of data within an electronic medical record (EMR) must be accurate, complete, concise, consistent and universally understood by users of the data, and must support the legal business record of the organization by maintaining the required parameters such as consistency, completeness and accuracy.” By contrast, the context-aware perspective evaluates the dimensions of DQ with recognition of the context within which the data are used. For instance, as the International Organization for Standardization and Liu et al [ 78 ] noted, DQ is “the degree to which data satisfy the requirements defined by the product-owner organization” and can be reflected through its dimensions such as completeness and accuracy.
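The distinction between the two perspectives can be made concrete with a small sketch: a context-agnostic check scores a record the same way everywhere, whereas a context-aware check evaluates it against the requirements of a particular use. The record, field names, and per-context requirements below are invented for illustration:

```python
# Sketch: context-agnostic vs context-aware DQ evaluation on a toy record.

record = {"age": 67, "diagnosis": "I21.9", "smoking_status": None}

# Context-agnostic: a fixed completeness score, regardless of intended use.
def completeness(record):
    filled = sum(1 for v in record.values() if v is not None)
    return filled / len(record)

# Context-aware: required fields depend on what the data will be used for.
REQUIREMENTS = {
    "billing": {"age", "diagnosis"},
    "smoking_cessation_study": {"age", "diagnosis", "smoking_status"},
}

def fit_for_use(record, context):
    required = REQUIREMENTS[context]
    missing = {f for f in required if record.get(f) is None}
    return not missing, missing

print(f"context-agnostic completeness: {completeness(record):.0%}")
for context in REQUIREMENTS:
    ok, missing = fit_for_use(record, context)
    print(f"{context}: fit for use = {ok}, missing = {missing or 'none'}")
```

The same record passes for one use and fails for another, which is exactly what the context-aware perspective emphasizes.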

DQ Dimensions

In total, 30 subthemes were identified and grouped into 6 DQ dimensions: accuracy, consistency, completeness, contextual validity, accessibility, and currency ( Table 1 ; Multimedia Appendix 6 [ 8 - 12 , 14 - 16 , 18 - 22 , 31 - 62 , 67 , 69 , 71 , 72 , 76 , 79 - 168 ]). Consistency (164/227, 72.2%), completeness (137/227, 60.4%), and accuracy (123/227, 54.2%) were the main DQ dimensions. Comparatively, less attention was paid to accessibility (28/227, 12.3%), currency (18/227, 7.9%), and contextual validity (26/227, 11.5%).

Table 1. Description of the data quality (DQ) dimensions.

DQ Dimension: Accessibility

The accessibility dimension (28/227, 12.3%) is composed of both the accessibility (15/28, 54%) and availability (13/28, 46%) subthemes, reflecting the feasibility for users to extract data of interest [ 18 ]. Scholars regularly view the accessibility subtheme favorably, with the increased adoption of electronic health record (EHR) systems overcoming physical and chronological boundaries associated with paper records by allowing access to information from multiple locations at any time [ 33 , 80 ]. Top et al [ 33 ] noted that EHR made it possible for nurses to access patient data, resulting in improved decision-making. Furthermore, Rosenlund et al [ 81 ] noted that EHRs benefit health care professionals by providing increased opportunities for searching and using information. The availability subtheme is an extension of the accessibility subtheme and examines whether data exist and whether the existing data are in a format that is readily usable [ 34 ]. For instance, Dentler et al [ 34 ] noted that pathology reports, although accessible, are recorded in a nonstructured, free-text format, making it challenging to readily use the data. Although structuredness may make data more available, Yoo et al [ 82 ] highlighted that structured data entry in the form of drop-down lists and check boxes tends to reduce the narrative description of patients’ medical conditions. Although not explicitly investigating accessibility, Makeleni and Cilliers [ 31 ] also noted the challenges associated with structured data entry.

DQ Dimension: Accuracy

The accuracy dimension (123/227, 54.2%) is composed of 7 subthemes, namely correctness (42/123, 34.1%), validity (23/123, 18.7%), integrity (19/123, 15.4%), plausibility (17/123, 13.8%), accurate diagnostic data (13/123, 10.6%), conformance (7/123, 5.7%), and veracity (2/123, 1.6%). Accuracy refers to the extent to which data reveal the truth about the event being described [ 31 ] and conform to their actual value [ 83 ].

Studies often referred to accuracy as the “ correctness” of data, which is the degree to which data correctly communicate the parameter being represented [ 32 ]. By contrast, other studies focused on plausibility , which is the extent to which data points are believable [ 35 ]. Although accuracy concerns were present for all forms of digital health data, some studies focused specifically on inaccuracies in diagnostic data and stated that “the accurate and precise assignment of structured [diagnostic] data within EHRs is crucial” [ 84 ] and is “key to supporting secondary clinical data” [ 36 ].

To assess accuracy, the literature regularly asserts that data must be validated against metadata constraints, system assumptions, and local knowledge [ 19 ] and conform to structural and syntactical rules. According to Kahn et al [ 19 ] and Sirgo et al [ 85 ], conformance focuses on the compliance of data with internal or external formatting and relational or computational definitions. Accurate, verified, and validated data as well as data conforming to standards contribute to the integrity of the data. Integrity requires that the data stored in health information systems are accurate and consistent, where the “improper use of [health information systems] can jeopardise the integrity of a patient’s information” [ 31 ]. An emerging subtheme of accuracy is the veracity of data, which represents uncertainty in the data owing to inconsistency, ambiguity, latency, deception, and model approximations [ 21 ]. It is particularly important in the context of the secondary use of big data, where “data veracity issues can arise from attempts to preserve privacy,...and is a function of how many sources contributed to the data” [ 86 ].
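To make such rule-based assessment concrete, the following Python sketch checks a single record against hypothetical conformance and plausibility rules. The field names, formats, and ranges are invented for illustration; in practice they would come from local metadata constraints and clinical knowledge:

```python
# Sketch: rule-based accuracy checks against hypothetical metadata
# constraints. All field names, patterns, and ranges are illustrative.
import re
from datetime import date

record = {"patient_id": "P-00123", "birth_date": "1948-13-02",
          "heart_rate": 310, "icd10": "I21.9"}

def check(record):
    issues = []
    # Conformance: values must follow structural and syntactic rules.
    if not re.fullmatch(r"P-\d{5}", record["patient_id"]):
        issues.append("patient_id does not conform to the expected pattern")
    try:
        date.fromisoformat(record["birth_date"])
    except ValueError:
        issues.append("birth_date is not a valid ISO date")
    if not re.fullmatch(r"[A-Z]\d{2}(\.\d{1,2})?", record["icd10"]):
        issues.append("icd10 does not conform to the coding format")
    # Plausibility: values must be believable given domain knowledge.
    if not 20 <= record["heart_rate"] <= 250:
        issues.append("heart_rate is outside the plausible physiological range")
    return issues

for issue in check(record):
    print(issue)  # flags the invalid date and the implausible heart rate
```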

DQ Dimension: Completeness

The completeness dimension (114/227, 50.2%) is composed of 5 subthemes: missing data (66/114, 57.9%), level of completeness (25/114, 21.9%), representativeness (13/114, 11.4%), fragmentation (8/114, 7%), and breadth of documentation (2/114, 1.8%). A well-accepted definition of data completeness considers 4 perspectives: documentation (the presence of observations regarding a patient in data), breadth (the presence of all desired forms of data), density (the presence of a desired frequency of data values over time), and prediction (the presence of sufficient data to predict an outcome) [ 169 ]. Our analysis revealed that these 4 perspectives, although accepted, are rarely systematically examined in the extant literature; rather, papers tended to discuss completeness or the lack thereof as a whole.
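As a minimal sketch of how three of these perspectives could be quantified (the prediction perspective is omitted because it depends on a specific outcome model), consider the following Python example over an invented toy extract; the field names and data are assumptions for illustration only:

```python
# Sketch: documentation, breadth, and density completeness metrics
# over a toy extract. Desired fields and records are illustrative.

desired_fields = {"blood_pressure", "weight", "smoking_status"}

patients = {
    "p1": {"years_observed": 2,
           "observations": [{"blood_pressure": 120}, {"weight": 81}]},
    "p2": {"years_observed": 3, "observations": []},
}

for pid, p in patients.items():
    obs = p["observations"]
    documented = bool(obs)                     # documentation: any data at all
    fields_seen = {k for o in obs for k in o}  # breadth: desired fields present
    breadth = len(fields_seen & desired_fields) / len(desired_fields)
    density = len(obs) / p["years_observed"]   # density: observations per year
    print(f"{pid}: documented={documented}, "
          f"breadth={breadth:.0%}, density={density:.1f} obs/year")
```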

Missing data is a prominent subtheme and represents a common problem in EHR data. For instance, Gloyd et al [ 87 ] argued that incomplete, missing, and implausible data “was by far the most common challenge encountered.” Scholars regularly identified that data fragmentation contributed to incompleteness, with a patient’s medical record deemed incomplete owing to data being required from multiple systems and EHRs [ 18 , 37 , 88 - 93 ]. “Data were also considered hidden within portals, outside systems, or multiple EHRs, frustrating efforts to assemble a complete clinical picture of the patient” [ 89 ]. Positive perspectives pertaining to data completeness focus on the level of completeness, with studies reporting relatively high completeness rates in health data sets [ 34 , 38 , 80 , 94 , 95 , 170 ]. For data to be considered complete, it needs to be captured at sufficient breadth and depth over time [ 12 , 18 ].

Some studies have proposed techniques for improving completeness, including developing fit-for-purpose user interfaces [ 68 , 96 , 97 ], standardizing documentation practices [ 98 , 99 ], automating documentation [ 100 ], and performing quality control [ 99 ].

In some instances, the level of completeness and extent of missing data differed depending on the health status of the patient [ 15 , 16 , 18 , 20 , 39 - 43 , 86 , 90 , 101 , 170 , 171 ], which we classified into the subtheme of representativeness . It has been found that there is “a statistically significant relationship between EHR completeness and patient health status” [ 42 ], with more data recorded for patients who are sick than for patients with less-acute conditions. This strongly aligns with the subtheme of contextual validity.

DQ Dimension: Consistency

The consistency dimension (157/227, 69.2%) is composed of 10 subthemes: inconsistent data capturing (33/157, 21%), standardization (28/157, 17.8%), concordance (22/157, 14%), uniqueness (14/157, 8.9%), data variability (14/157, 8.9%), temporal variability (13/157, 8.3%), system differences (12/157, 7.6%), semantic consistency (10/157, 6.4%), structuredness (7/157, 4.5%), and representational consistency (4/157, 2.5%).

Inconsistent data capturing is a prevalent subtheme caused by the manual nature of data entry in health care settings [ 86 ], especially when data entry involves multiple times, teams, and goals [ 102 ]. Inconsistent data capturing results in data variability and temporal variability. Data variability refers to inconsistencies in the data captured within and between health information systems, whereas temporal variability reflects inconsistencies that occur over time and may be because of changes in policies or medical guidelines [ 20 , 44 - 46 , 87 , 103 - 105 ]. Semantic inconsistency (ie, data with logical contradictions) and representational inconsistency (ie, data variations owing to multiple formats) can also result from inconsistent data capturing [ 47 ].

Standardization in terms of terminology, diagnostic codes, and workflows [ 99 ] is proffered to minimize inconsistency in data entry, yet in practice, there is a “lack of standardized data and terminology” [ 9 ] and “even with a set standard in place not all staff accept and follow the routine” [ 99 ]. The lack of standardization is further manifested because of health information system differences across settings [ 106 ]. As a result of the differences between systems, concordance —the extent of “agreement between elements in the EHR, or between the EHR and another data source”—is hampered [ 107 ].

Furthermore, inconsistent data entry can be caused by redundancy within the system because of structured versus unstructured data [ 108 ], which we label as the subtheme “structuredness,” and by duplication across systems [ 39 , 48 , 104 , 109 , 172 , 173 ], which we label as the subtheme “uniqueness.” Although structured data entry “facilitates information retrieval” [ 33 ] and is “in a format that enables reliable extraction” [ 18 ], the presence of unstructured fields leads to data duplication efforts, hampering uniqueness, as data are recorded in multiple places with varying degrees of granularity and levels of detail.
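A minimal sketch of the uniqueness subtheme follows, assuming records exported from two hypothetical systems: candidate duplicates are found by grouping records on a normalized identity key. The systems, fields, and normalization rule are illustrative assumptions; real record linkage is considerably more involved.

```python
from collections import defaultdict

# Hypothetical exports from two systems; note the inconsistent name formatting.
records = [
    {"source": "ehr_a", "name": "Jane Doe ", "dob": "1980-01-02"},
    {"source": "ehr_b", "name": "jane doe",  "dob": "1980-01-02"},
    {"source": "ehr_a", "name": "John Roe",  "dob": "1975-06-30"},
]

def candidate_duplicates(records: list[dict]) -> dict:
    """Group records on a normalized (name, dob) key; return keys seen twice or more."""
    groups = defaultdict(list)
    for record in records:
        key = (record["name"].strip().lower(), record["dob"])  # normalize first
        groups[key].append(record["source"])
    return {key: sources for key, sources in groups.items() if len(sources) > 1}

print(candidate_duplicates(records))
# {('jane doe', '1980-01-02'): ['ehr_a', 'ehr_b']}
```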

DQ Dimension: Contextual Validity

The contextual validity dimension (26/227, 11.5%) is composed of 4 subthemes: fitness for use (11/26, 42%), contextual DQ (9/26, 35%), granularity (4/26, 15%), and relevancy (2/26, 8%). Contextual validity requires a deep understanding of the context that gives rise to data [ 86 ], including technical, organizational, behavioral, and environmental factors [ 174 ].

Contextual DQ is often described as “fitness for use” [ 20 ], for which understanding the context in which data are collected is deemed important [ 18 , 90 ]. Another factor that contributes to data being fit for use is the granularity of data. Adequate granularity of time stamps [ 49 ], patient information [ 16 ], and data present in the EHR (eg, diagnostic codes [ 16 ]) was considered important to make data fit for use. Finally, for data to be fit for use, they must be relevant. As indicated by Schneeweiss and Glynn [ 41 ], for data to be meaningful, health care databases need to contain relevant information of sufficient quality, which can help answer specific questions. The literature clearly demonstrates the need to take context into consideration when analyzing data and the need to adapt technologies to the health care context so that appropriate data are collected for reliable analysis to be performed.

DQ Dimension: Currency

The currency dimension (18/227, 7.9%) is composed of a single subtheme: timeliness . Currency, or timeliness, is defined by Afshar et al [ 32 ] and Makeleni and Cilliers [ 31 ] as the degree to which data represent reality from the required point in time. From an EHR perspective, data should be up to date, available, and reflect the profile of the patient at the time when the data are accessed [ 32 , 50 ]. Lee et al [ 35 ] extended this to include the recording of an event at the time when it occurs such that a value is deemed current if it is representative of the clinically relevant time of the event. Frequently mentioned causes for lack of currency of data include (1) recording of events (long) after the event actually occurred [ 91 , 99 , 110 , 111 ], (2) incomplete recording of patient characteristics over time [ 16 ], (3) system or interface design not matching workflow and impeding timely recording of data [ 99 ], (4) mixed-mode recording—paper and electronic [ 99 ], and (5) lack of time stamp metadata, meaning that the temporal sequence of events is not reflected in the recorded data [ 16 ].
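Following the definition attributed to Lee et al [ 35 ], one simple operationalization of currency is to compare the clinically relevant event time with the recording time. The sketch below flags records whose recording lag exceeds a threshold; the 24-hour threshold and the timestamps are assumptions for illustration, not a clinical standard.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(hours=24)  # illustrative threshold, not a clinical standard

events = [
    {"event_time": "2023-05-01T08:00", "recorded_time": "2023-05-01T09:15"},
    {"event_time": "2023-05-01T08:00", "recorded_time": "2023-05-04T16:40"},
]

def lacks_currency(event: dict) -> bool:
    """True if the value was recorded long after the clinically relevant event."""
    event_t = datetime.fromisoformat(event["event_time"])
    recorded_t = datetime.fromisoformat(event["recorded_time"])
    return recorded_t - event_t > MAX_LAG

print([lacks_currency(e) for e in events])  # [False, True]
```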

Interrelationships Among the DQ Dimensions

As illustrated in Figure 4 and Multimedia Appendix 7 [ 16 , 34 , 40 , 42 , 78 , 80 , 90 , 91 , 109 ], interrelationships were found among the digital health DQ dimensions.

Figure 4. Interrelationships between the data quality (DQ) dimensions.

Consistency influenced all the DQ dimensions. Commonly, these relationships were expressed in terms of the presence of structured and consistent data entry, which prompts complete and accurate data to be entered into the health information system and provides more readily accessible and current data for health care professionals when treating patients. As Roukema et al [ 80 ] noted, “structured data entry applications can prompt for completeness, provide greater accuracy and better ordering for searching and retrieval, and permit validity checks for DQ monitoring, research, and especially decision support.” When data are entered inconsistently, it impedes the accuracy of the medical record and the contextual validity for secondary uses of the data [ 40 ].

Accessibility of data was found to influence the currency dimension of DQ. When data are not readily accessible, they seldom satisfy the timeliness of information for health care or research purposes [ 34 ]. Currency also influenced the accuracy of data. In a study investigating where DQ issues in EHR arise, it was found that “false negatives and false positives in the problem list sometimes arose when the problem list...[was] out-of-date, either because a resolved problem was not removed or because an active problem was not added” [ 90 ].

Furthermore, completeness influences the accuracy of data; as Makeleni and Cilliers [ 31 ] noted, “data should be complete to ensure it is accurate.” The presence of inaccurate data was regularly linked to information fragmentation [ 88 ], incomplete data entry [ 109 ], and omissions [ 35 ]. Completeness also influenced contextual validity, as it is necessary to have all the data available to complete specific tasks [ 78 ]. When it comes to the secondary use of EHR data, evaluation of “completeness becomes extrinsic, and is dependent upon whether or not there are sufficient types and quantities of data to perform a research task of interest” [ 42 ].

Accuracy and contextual validity exhibited a bidirectional relationship with each other. The literature suggests that accuracy influences contextual validity: data cannot simply be extracted from structured form fields and assumed fit for use, as free-text fields also need to be consulted. For instance, Kim and Kim [ 112 ] identified that “it is sometimes thought that structured data are more completely optimized for clinical research. However, this is not always the case, particularly given that extracted EMR data can still be unstable and contain serious errors.” By contrast, other studies suggest that when only a segment of information regarding a specific clinical event (ie, contextual validity) is captured, inaccuracy can ensue [ 16 ].

Outcomes of Digital Health DQ

The analysis of the literature identified 5 types of digital health DQ outcomes: (1) clinical, (2) business process, (3) clinician, (4) research-related, and (5) organizational outcomes ( Multimedia Appendix 8 [ 15 , 16 , 20 , 31 , 33 , 39 , 40 , 42 , 51 , 52 , 55 , 57 , 58 , 61 , 63 , 64 , 84 , 90 , 105 , 113 , 166 , 175 - 178 ]). Using NVivo’s built-in cross-tab query coupled with subject matter expert analysis, we identified that different DQ dimensions were related to DQ outcomes in different ways ( Table 2 ). Currency was the only dimension that did not have a direct effect on DQ outcomes. However, as shown in Figure 5 , it is plausible that currency affects DQ outcomes by impacting other DQ dimensions. In the subsequent paragraphs, we discuss each DQ dimension and its respective outcomes.

Table 2. The relationships between data quality (DQ) dimensions and data outcomes. A checkmark indicates that the relationship between the DQ dimension and the outcome is reported in the literature; blank cells indicate that there is no evidence to support the relationship.

Figure 5. Consolidated digital health data quality dimension and outcome framework.

We identified that the accessibility DQ dimension influenced clinical, clinician, business process, research-related, and organizational outcomes. In terms of clinical outcomes, Roukema et al [ 80 ] indicated that EHRs substantially enhance the quality of patient care by improving the accessibility and legibility of health care data. The increased accessibility of medical records during the delivery of patient care is further proffered to benefit clinicians by reducing the data entry burden [ 33 ]. By contrast, inconsistency in the availability of data across health settings increases clinician workload; as Wiebe et al [ 15 ] noted, “given the predominantly electronic form of communication between hospitals and general practitioners in Alberta, the inconsistency in availability of documentation in one single location can delay processes for practitioners searching for important health information.” When data are accessible and available, they can improve business process (eg, quality assurance) and research-related (eg, outcome-oriented research) outcomes and can support organizational outcomes through improved billing and financial management [ 179 ].

The literature demonstrates that data accuracy influences clinical outcomes [ 14 , 39 , 51 ] and research-related outcomes [ 14 , 113 ]; as Wang et al [ 14 ] described, “errors in healthcare data are numerous and impact secondary data use and potentially patient care and safety.” Downey et al [ 39 ] observed the negative impact of incorrect data on quality of care (ie, clinical outcomes) and stated, “manual data entry remains a primary mechanism for acquiring data in EHRs, and if the data is incorrect then the impact to patients and patient care could be significant” [ 39 ]. Conversely, precise data are beneficial in producing high-quality research outcomes, whereas poor data accuracy diminishes them. As Gibby [ 113 ] explained, “computerized clinical information systems have considerable advantages over paper recording of data, which should increase the likelihood of their use in outcomes research. Manual records are often inaccurate, biased, incomplete, and illegible.” Closely related to accuracy, contextual validity is an important DQ dimension that considers fitness for research; as stated by Weiskopf et al [ 42 ], “[w]hen repurposed for secondary use, however, the concept of ‘fitness for use’ can be applied.”

The consistency DQ dimension was related to all DQ outcomes. It was commonly reported that inconsistency in data negatively impacts the reusability of EHR data for research purposes, hindering research-related outcomes and negatively impacting business processes and organizational outcomes . For example, Kim et al [ 114 ] acknowledged that inconsistent data labeling in EHR systems may hinder accurate research results, noting that “a system may use local terminology that allows unmanaged synonyms and abbreviations...If local data are not mapped to terminologies, performing multicentre research would require extensive labour.” Alternatively, von Lucadou et al [ 16 ] indicated the impact of inconsistency on clinical outcomes , reporting that the existence of inconsistencies in captured data “could explain the varying number of diagnoses throughout the encounter history of some subjects,” whereas Diaz-Garelli et al [ 84 ] demonstrated the negative impact that inconsistency has on clinicians in terms of increased workload.

Incomplete EMR data were found to impact clinical outcomes (eg, reduced quality of care), business process outcomes (eg, interprofessional communication), research-related outcomes (eg, research facilitation), and organizational outcomes (eg, key performance indicators related to readmissions) [ 15 ]. For example, while reviewing the charts of 3011 nonobstetric inpatients, Wiebe et al [ 15 ] found that a missing discharge summary within an EHR “can present several issues for healthcare processes, including hindered communication between hospitals and general practitioners, heightened risk of readmissions, and poor usability of coded health data,” among other widespread implications. Furthermore, Liu et al [ 69 ] reported that “having incomplete data on patients’ records has posed the greatest threat to patient care.” Owing to the heterogeneous nature (with multiple data points) of EHR data, Richesson et al [ 20 ] emphasized that access to large, complete data sets will allow clinical investigators “to detect smaller clinical effects, identify and study rare disorders, and produce robust, generalisable results.”

Discussion

The following sections discuss the 3 main findings of this research: the (1) dimensions of DQ, (2) interrelationships among the dimensions of DQ, and (3) outcomes of DQ. As described in the Summary of Key Findings section, these 3 findings led to the development of the DQ-DO framework. Subsequently, we compare the DQ-DO framework with related works, which leads to implications for future research. The Discussion section concludes with a reflection on the limitations of this study.

Summary of Key Findings

In summary, we unearthed 3 core findings. First, we identified 6 dimensions of DQ within the digital health domain: consistency, accessibility, completeness, accuracy, contextual validity, and currency. These dimensions were synthesized from 30 subthemes described in the literature. We found that consistency, completeness, and accuracy are the predominant dimensions of DQ. Comparatively, limited attention has been paid to the dimensions of accessibility, currency, and contextual validity. Second, we identified the interrelationships among these 6 dimensions of digital health DQ ( Table 2 ). The literature indicates that the consistency dimension can influence all other DQ dimensions. The accessibility of data was found to influence the currency of data. Completeness impacts accuracy and contextual validity, with the latter two dimensions serving as dependent variables and exhibiting a bidirectional relationship with each other. Third, we identified 5 types of data outcomes ( Table 2 ; Multimedia Appendix 8 ): research-related, organizational, business process, clinical, and clinician outcomes. Consistency was found to be a highly influential dimension, impacting all types of DQ outcomes. By contrast, contextual validity was shown to be particularly important for data reuse (eg, performance measurement and outcome-oriented research). Although currency does not directly impact any outcomes, it impacts the accuracy of data, which in turn impacts clinical and research-related outcomes; therefore, if currency issues are not resolved, accuracy issues will persist. Consistency, accessibility, and completeness were shown to be important considerations for achieving the goal of improving organizational outcomes. By consolidating these 3 core findings, we developed the DQ-DO framework ( Figure 5 ).

Comparison With Literature

Our findings extend those of previous studies on digital health DQ in 3 ways. First, through our rigorous approach, we identified a comprehensive set of DQ dimensions, which both confirmed and extended the existing literature. For instance, Weiskopf and Weng [ 17 ] identified 5 DQ dimensions, namely completeness, correctness, concordance, plausibility, and currency, all of which are present within our DQ framework, although in some instances, we use slightly different terms (referring to correctness as accuracy and concordance as consistency). Extending the framework of Weiskopf and Weng [ 17 ], we view plausibility as a subtheme of accuracy and disentangle accessibility from completeness, and we also stress the importance of contextual validity per Richesson et al [ 20 ]. Others have commonly had a narrower perspective of DQ, focusing on completeness, correctness, and currency [ 18 ] or on completeness, timeliness, and accuracy [ 13 ]. In other domains of digital health, such as physician rating systems, Wang and Strong’s [ 180 ] DQ dimensions of intrinsic, contextual, representational, and accessibility have been adopted. Such approaches to assessing DQ are appropriate, although they remove a level of granularity that is necessary to understand relationships and outcomes. This is particularly necessary given the salience of consistency in our data set and the important role it plays in generating outcomes.

Second, unlike previous studies on DQ dimensions, we also demonstrate how these dimensions are all related to each other. By analyzing the interrelationships between these DQ dimensions, we can determine how a particular dimension influences another and in which direction this relationship is unfolding. This is an important implication for digital health practitioners, as although several studies have examined how to validate [ 38 ] and resolve DQ issues [ 16 ], resolving issues with a specific DQ dimension requires awareness of the interrelated DQ dimensions. For instance, to improve accuracy, one also needs to consider improving consistency and completeness.

Third, although previous studies describe how DQ can impact a particular outcome (eg, the studies by Weiskopf et al [ 18 ], Johnson et al [ 52 ], and Dantanarayana and Sahama [ 115 ]), they focus either on DQ broadly, on a specific dimension of DQ, or on a specific outcome. For instance, Sung et al [ 181 ] noted that poor-quality data were a prominent barrier hindering the adoption of digital health systems. By contrast, Kohane et al [ 182 ] focused on research-related outcomes in terms of publication potential and identified that incompleteness and inconsistency can serve as core impediments. To summarize, the DQ-DO framework ( Figure 5 ) developed through this review provides not only the dimensions and the outcomes but also the interrelationships between these dimensions and how they influence outcomes.

Implications for Future Work

Implication 1: Equal Consideration Across DQ Dimensions

This study highlights the importance of each of the 6 DQ dimensions: consistency, accessibility, completeness, accuracy, contextual validity, and currency. These dimensions have received varying amounts of attention in the literature. Although some DQ dimensions, such as accessibility, contextual validity, and currency, are discussed less frequently than others, this does not mean that they are unimportant for assessment. This is evident in Figure 5 , which shows that all DQ dimensions except currency directly influence DQ outcomes. Although we did not identify a direct relationship between the currency of data and the 5 types of data outcomes, it is likely that the currency of data influences the accuracy of data, which subsequently influences research-related and clinical outcomes. Future research, including consultation with a range of stakeholders, should delve further into the underresearched DQ dimensions. For instance, both the currency and accessibility of data are less frequently discussed in the literature; however, with advances in digital health technologies, both have become highly relevant for real-time clinical decisions [ 21 , 53 ].

Implication 2: Empirical Investigations of the Impact of the DQ Dimensions

The DQ-DO framework identified in this study was developed through a rigorous systematic literature review process that synthesized the literature related to digital health DQ. To extend this study, we advocate for empirical mixed methods case studies to validate the framework, including an examination of the interrelationships between DQ dimensions and DQ outcomes, based on real-life data and consultation with a variety of stakeholders. Existing approaches can be used to identify the presence of issues related to DQ dimensions within digital health system logs [ 38 , 183 ]. The DQ outcomes could be assessed by extracting prerecorded key performance indicators from case hospitals and triangulating them with interview data to capture patients’, clinicians’, and hospitals’ perspectives on the impacts of DQ [ 184 ]. This could then be incorporated into a longitudinal study in which data are collected before and after a DQ improvement intervention, providing evidence of the intervention’s efficacy.

Implication 3: Understanding the Root Causes of DQ Challenges

Although this study provides a first step toward a more comprehensive understanding of DQ dimensions for digital health data and their influences on outcomes, it does not explore the potential causes of such DQ challenges. Without understanding the reasons behind these DQ issues, the true potential of evidence-based health care decision-making remains unfulfilled. Future research should examine the root causes of DQ challenges in health care data to prevent such challenges from occurring in the first place. A framework that may prove useful in illuminating the root causes of DQ issues is the Odigos framework, which indicates that DQ issues emanate from the social world (ie, macro and situational structures, roles, and norms), material world (eg, quality of the EHR system and technological infrastructure), and personal world (eg, characteristics and behaviors of health care professionals) [ 183 ]. These insights could then be incorporated into a data governance roadmap for digital hospitals.

Implication 4: Systematic Assessment and Remedy of DQ Issues

Although prevention remains better than cure (refer to Implication 3), not all DQ errors can be prevented or mitigated. Many health care organizations dedicate resources to data cleaning to obtain high-quality data in a timely manner, and this will remain necessary (although, hopefully, to a lesser degree). Some studies (eg, Weiskopf et al [ 18 ]) advocate evidence-based guidelines and frameworks for a detailed assessment of the quality of digital health data. However, few studies have focused on systematic and automated methods of assessing and remedying common DQ issues. Future research should therefore focus on evidence-based guidelines, best practices, and automated means to assess and remedy digital health DQ issues.

Limitations

This review is scoped to digital health data generated within a hospital setting rather than other health care settings. This scoping was necessary because of the vast differences between acute health care settings and primary care. Future research should investigate the digital health data of primary care settings to identify the DQ dimensions and outcomes relevant to those settings. In addition, this literature review was scoped to peer-reviewed outlets, with “grey” literature excluded, which could have led to publication bias. Although this scoping may have resulted in the exclusion of some relevant articles, it was necessary to ensure the quality of the evidence behind the development of the digital health DQ framework. A further limitation of our method is that, because of the sheer number of articles returned by our search, we did not perform double coding (where 2 independent researchers analyze the same article). To mitigate this limitation, steps were taken to minimize bias by conducting coder corroboration sessions and group validation, as mentioned in the Methods section, with the objective of improving internal and external reliability [ 66 ]. To further improve internal reliability, 2 experienced researchers verified the entirety of the analysis in NVivo; to improve external reliability, card-sorting assessments were performed with DQ experts, and the findings were presented to and confirmed by 3 digital health care professionals. Finally, empirical validation of the framework is required, both with real-life data and with inputs from a range of experts.

Conclusions

The multidisciplinary systematic literature review conducted in this study resulted in the development of a consolidated digital health DQ framework comprising 6 DQ dimensions, the interrelationships among these dimensions, 5 DQ outcomes, and the relationships between these dimensions and outcomes. We identified 4 core implications to motivate future research: specifically, researchers should (1) pay equal consideration to all dimensions of DQ, as the dimensions can both directly and indirectly influence DQ outcomes; (2) seek to empirically assess the DQ-DO framework using a mixed methods case study design; (3) identify the root causes of digital health DQ issues; and (4) develop interventions for mitigating DQ issues or preventing them from arising. The DQ-DO framework provides health care executives (eg, chief information officers and chief clinical informatics officers) with insights into DQ issues and the digital health-related outcomes they affect, which can help them prioritize tackling DQ-related problems.

Acknowledgments

The authors acknowledge the support provided by the Centre of Data Science, Queensland University of Technology.

Abbreviations

DQ: data quality
DQ-DO: data quality-data outcome
EHR: electronic health record
EMR: electronic medical record

Conflicts of Interest: None declared.


Both Republicans and Democrats prioritize family, but they differ over other sources of meaning in life


In the United States, even the meaning of life can have a partisan tinge.

In February 2021, Pew Research Center asked 2,596 U.S. adults the following open-ended question: “What about your life do you currently find meaningful, fulfilling or satisfying? What keeps you going and why?” Researchers then evaluated the answers and grouped them into the most commonly mentioned categories.

Both Republicans and Democrats are most likely to say they derive meaning from their families, and they also commonly mention their friends, careers and material well-being. But Republicans and Democrats differ substantially over several other factors, including faith, freedom, health and hobbies.

A chart showing that Republicans and Democrats largely agree that family, friends and careers give them meaning in life – but differ on other factors including faith and health

In fact, even some of the words that partisans use to describe where they draw meaning in life differ substantially. Republicans, along with independents who lean to the Republican Party, are much more likely than Democrats and Democratic-leaning independents to mention words like “God,” “freedom,” “country,” “Jesus” and “religion.” Democrats are much more likely than Republicans to mention words like “new,” “dog,” “reading,” “outside,” “daughter” and “nature.” (Democrats are most likely to mention “new” in the context of learning something new. But some also mention it in the context of new experiences, meeting new people or other forms of exploration.)

Below, we explore these partisan differences in more detail and look at how attitudes in the United States compare internationally, based on surveys conducted among 16 other publics in spring 2021.

This analysis examines Americans’ responses to an open-ended survey question about what gives them meaning in life and explores how responses in the United States differ from those elsewhere in the world.

In the U.S., Pew Research Center conducted a nationally representative survey of 2,596 U.S. adults from Feb. 1 to 7, 2021. Everyone who took part in the U.S. survey is a member of the Center’s American Trends Panel (ATP), an online survey panel that is recruited through national, random sampling of residential addresses. This way nearly all adults have a chance of selection. The survey is weighted to be representative of the U.S. adult population by gender, race, ethnicity, partisan affiliation, education and other categories. In the U.S., respondents were asked a slightly longer version of the question asked elsewhere: “We’re interested in exploring what it means to live a satisfying life. Please take a moment to reflect on your life and what makes it feel worthwhile – then answer the question below as thoughtfully as you can. What about your life do you currently find meaningful, fulfilling or satisfying? What keeps you going and why?”

The Center also conducted nationally representative surveys of 16,254 adults from March 12 to May 26, 2021, in 16 advanced economies. All surveys were conducted over the phone with adults in Canada, Belgium, France, Germany, Greece, Italy, the Netherlands, Spain, Sweden, the United Kingdom, Australia, Japan, New Zealand, Singapore, South Korea and Taiwan. Responses are weighted to be representative of the adult population in each public. Respondents in these publics were asked a shorter version of the question asked in the U.S.: “We’re interested in exploring what it means to live a satisfying life. What aspects of your life do you currently find meaningful, fulfilling or satisfying?” Responses were transcribed by interviewers in the language in which the interviews were conducted.

Researchers examined random samples of English responses, machine-translated non-English responses and responses translated by a professional translation firm to inductively develop a codebook for the main sources of meaning mentioned across the 17 publics. The codebook was iteratively improved via practice coding and calculations of intercoder reliability until a final selection of codes was formally adopted (see Appendix C of the full report).

To apply the codebook to the full collection of 18,850 responses, a team of Pew Research Center coders and professional translators were trained to code English and non-English responses, respectively. Coders in both groups coded random samples and were evaluated for consistency and accuracy. They were asked to independently code responses only after reaching an acceptable threshold for intercoder reliability. (For more on the codebook, see Appendix A of the full report.)
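The Center does not specify which reliability statistic or threshold it used, but as a rough illustration of the idea, the sketch below computes Cohen's kappa, one common intercoder agreement measure, for two hypothetical coders assigning one code per response; the codes and responses are invented.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two coders."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if both coders assigned codes independently,
    # in proportion to how often each coder actually used each code.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["family", "faith", "health", "family", "hobbies", "faith"]
b = ["family", "faith", "family", "family", "hobbies", "health"]
print(round(cohens_kappa(a, b), 3))  # 0.538
```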

Here is the question used for this analysis, along with the coded responses for each public. Open-ended responses have been lightly edited for clarity (and, in some cases, translated into English by a professional firm). Here are more details about our international survey methodology and country-specific sample designs. For respondents in the U.S., read more about the ATP’s methodology.

Words in the lead graphic were selected first by filtering to the top 100 words that are distinctive of each party, as measured by a likelihood ratio comparing the proportion of responses from Democrats who mentioned each word versus Republicans who did so, and vice versa. Words were then filtered to the top 25 based on overall frequency within each party. Words shown are used at least 50% more often by those in one party relative to the other. Words were reduced to their root form and exclude 354 common English “stop words.”
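As a rough sketch of this selection logic, one might implement the filtering as follows, using toy responses and skipping the stemming and stop-word steps (so function words like "of" survive); the smoothing constant and cutoffs are assumptions, and the Center's exact computation is not reproduced here.

```python
from collections import Counter

def mention_shares(responses: list[str]) -> dict[str, float]:
    """Share of responses mentioning each word at least once."""
    counts = Counter()
    for text in responses:
        counts.update(set(text.lower().split()))
    return {word: c / len(responses) for word, c in counts.items()}

def distinctive_words(party, other, top_n=25, min_ratio=1.5):
    shares_p, shares_o = mention_shares(party), mention_shares(other)
    # Ratio of mention shares, smoothing zero counts in the comparison group.
    ratios = {
        w: shares_p[w] / shares_o.get(w, 1 / (2 * len(other)))
        for w in shares_p
    }
    # Keep the 100 most distinctive words, drop any not clearly lopsided
    # (>= 50% more often means a ratio of at least 1.5), then order the
    # survivors by within-party frequency.
    candidates = sorted(ratios, key=ratios.get, reverse=True)[:100]
    kept = [w for w in candidates if ratios[w] >= min_ratio]
    return sorted(kept, key=lambda w: shares_p[w], reverse=True)[:top_n]

dem = ["my dog and nature", "reading books", "my daughter and nature"]
rep = ["god and country", "freedom of religion", "my country and faith"]
print(distinctive_words(rep, dem))
# ['country', 'god', 'freedom', 'of', 'religion', 'faith']
```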

In item 6 in this analysis, support for the governing party is not the same as partisanship, but it is the best comparative measure across the 16 survey publics where partisan identification is asked (it is not asked in South Korea). Elsewhere in this analysis, we rely on traditional measures of partisanship and look at how Democrats and independents who lean Democratic compare with Republicans and Republican leaners. 

Mentions of political executives were identified by searching responses for particular names as well as generic terms like “president” and “prime minister” using case-insensitive regular expressions, a method for pattern matching.
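A minimal sketch of that pattern-matching step follows, assuming an invented name list: word boundaries keep substrings such as "presidency" from matching, and the flag makes the search case-insensitive.

```python
import re

# Illustrative pattern; the actual list of names searched is not published here.
LEADER_PATTERN = re.compile(r"\b(president|prime minister|biden|trump)\b",
                            re.IGNORECASE)

responses = [
    "Less Donald Trump and his fanatics.",
    "My family keeps me going.",
    "I have faith President Biden will unite our country.",
]
print([bool(LEADER_PATTERN.search(text)) for text in responses])
# [True, False, True]
```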

Republicans are much more likely than Democrats to cite religion as a source of meaning in their life. People in both parties mention spirituality, faith and religion as a source of meaning, with specific references to participating in traditional religious practices (e.g., “attending church services”), as well as more general references to living a life informed by faith. One Republican woman, for example, said, “My faith and the ability to choose to be thankful, optimistic and joyful are what keeps me going.”

A chart showing that Republicans and Democrats in the U.S. differ over some factors that make life meaningful

Overall, though, around one-in-five Republicans and Republican-leaning independents (22%) say spirituality, faith or religion gives them meaning in life, compared with only 8% of Democrats and those who lean to the party. Evangelical Protestants – a heavily Republican group – are especially likely to mention faith and religion as a source of meaning (34%). Smaller shares do so in other religious groups, including those following the historically Black Protestant tradition (18%), mainline Protestants (13%), Catholics (11%) and those who describe themselves as atheist, agnostic or “nothing in particular” (2%).

Republicans are also particularly likely to mention God and Jesus. One Republican man said, “Life without Jesus is meaningless, sad and hopeless. It is only through a daily relationship with Christ that joy, love, peace and goodness can be found.”

Republicans are more likely than Democrats (12% vs. 6%) to bring up freedom and independence as something that gives their life meaning. Some people mention freedom in the personal sense, focusing on their ability to live the way they want, their work-life balance, or having or wanting free time. One Republican woman said, “I like being able to have the freedoms to make my own decisions and to be able to contribute to my country. Being able to express my views without worrying about retribution.”

Others emphasize freedom in a more political sense, highlighting things like freedom of speech and religion. One Republican man had this to say: “Keeping the true meaning of being an American, country first, defending the Constitution and freedom of speech.”

Democrats are more likely than Republicans to cite physical and mental health as part of what gives them meaning in life – and they mention the COVID-19 pandemic more frequently. When the survey was fielded in February, some 13% of Democrats and 9% of Republicans mentioned health – whether people’s current state of well-being, their exercise regimens or the steps they take to lead healthy lives. For some, health is also a precursor for other sources of meaning. One Democratic man put it this way: “The biggest thing for me is health. If you don’t have your health you don’t have much. Everything else can come later but you have to have your health.”

One-in-five Americans who mentioned health also mentioned the COVID-19 pandemic, including 23% of Democrats and 17% of Republicans. And while Democrats and Republicans were about equally likely to mention COVID-19 in the context of difficulties or challenges they faced, the specifics varied by party. One Republican woman, for example, said, “My family is my only driving force. Being forced into a yearlong quarantine isn’t making that easy.” On the other hand, a Democratic woman said, “Though COVID is a constant worry, I have faith we will come through eventually and that President Biden will be able to unite our country.”

Democrats were also much more likely than Republicans to mention COVID-19 in the context of the country and where they live (23% vs. 6%) – suggesting that for Democrats, the pandemic has more of a societal dimension than for Republicans.

Democrats are more likely than Republicans to find meaning in hobbies and recreation, nature and the outdoors, and pets – though small shares of Americans overall mention these things. Overall, only one-in-ten Americans say hobbies are a source of meaning in their life, and even fewer say the same about nature (4%) or pets (3%). But Democrats are about twice as likely as Republicans to cite each one as a source of meaning in their life. Among Democrats, liberals are more likely than moderates and conservatives to find meaning in hobbies, nature and pets, but there are few ideological differences among Republicans on these topics.

Conservative Republicans are particularly likely to mention their country or where they live as a source of meaning. Among Republicans, 16% mention the country, patriotic and national sentiments, or the state of America’s economy or society as a source of meaning, compared with 12% of Democrats. But conservative Republicans (21%) are particularly likely to mention society relative to moderate and liberal Republicans (9%), while there are no major ideological differences among Democrats. One Republican man offered a short and simple description of what gives him meaning in life: “Being born in America.” And one Republican woman said, “I am first-generation American and I think it is the greatest country in the world, and I am very grateful to live here.”

Partisanship is associated with Americans’ views about the meaning of life more than it is in other parts of the world. In most of the 17 publics surveyed, those who support the governing party and those who do not differ little when it comes to the factors that bring them meaning in life. Take the United Kingdom: Those who support the governing Conservative Party are just as likely as those who do not to mention freedom, religion and other factors as sources of meaning in their life. In fact, the sole outlying factor – out of all topics that the Center coded – is material well-being: Conservative Party supporters in the UK are slightly more likely than nonsupporters to say this brings them meaning (16% vs. 10%). Looking more closely at the specific topic of freedom, the partisan differences that are found in the U.S. are generally not on display elsewhere. In fact, the only other place where partisan differences emerge over freedom is Taiwan, where supporters of the governing Democratic Progressive Party (DPP) are more likely than nonsupporters to mention it as a source of meaning (19% vs. 10%).

Though few mention government leaders when discussing the meaning of life, Americans are more likely to do so than people in other countries. In the U.S., 2% of people mentioned President Joe Biden or former President Donald Trump – often by name – when answering the Center’s question about where they find meaning in life. (The survey was conducted soon after Biden was inaugurated as president.)

One Republican woman, for example, said that what gives her meaning in life is “the strength and backbone taught to me by President Trump – the meaning of standing up fiercely in the face of idiocy.” On the other hand, a Democratic man celebrated Trump’s absence from office, declaring that he finds meaning in life through “job satisfaction. Enough free time and money to enjoy life. Less racial inequality. Less Donald Trump and his fanatics.”

In every other place surveyed by the Center, no more than one person – essentially 0% of the overall sample – mentioned a national leader such as a prime minister or president by name, or even the words “prime minister” or “president.” 


Laura Silver is an associate director focusing on global attitudes at Pew Research Center.

Patrick van Kessel is a former senior data scientist at Pew Research Center.


