ANALYTICS FOR DECISIONS

The Importance of Data Analysis in Research

Studying data is one of the everyday chores of researchers. It’s not a big deal for them to go through hundreds of pages per day to extract useful information. However, recent times have seen a massive jump in the amount of data available. While it’s certainly good news for researchers to get their hands on more data that could result in better studies, it can also be a real headache.

Thankfully, the rising trend of data science in the past years has also brought a sharp rise in data analysis techniques. These tools and techniques save a lot of time in the hefty processes researchers have to go through, letting them finish in minutes work that used to take days!

As a famous saying goes,

“Information is the oil of the 21st century, and analytics is the combustion engine.”

– Peter Sondergaard, senior vice president, Gartner Research.

So, if you’re also a researcher or just curious about the most important data analysis techniques in research, this article is for you. Make sure you give it a thorough read, as I’ll be dropping some very important points throughout the article.

What is the Importance of Data Analysis in Research?

Data analysis is important in research because it makes studying data a lot simpler and more accurate. It helps researchers interpret the data in a straightforward way, so that they don’t leave out anything that could help them derive insights from it.

Data analysis is a way to study and analyze huge amounts of data. Research often involves going through heaps of data, and the volume researchers have to handle keeps growing with every passing minute.

Hence, data analysis knowledge is a huge edge for researchers in the current era, making them very efficient and productive.

What is Data Analysis?

Once the data is cleaned, transformed, and ready to use, it can do wonders. Not only does it contain a variety of useful information, but studying the data collectively can also uncover subtle patterns and details that would otherwise have been overlooked.

So, you can see why it has such a huge role to play in research. Research is all about studying patterns and trends, forming hypotheses, and testing them, all supported by appropriate data.

Further in the article, we’ll see some of the most important types of data analysis that you should be aware of as a researcher so you can put them to use.

Types of Data Analysis: Qualitative Vs Quantitative

Looking at it from a broader perspective, data analysis boils down to two major types: qualitative data analysis and quantitative data analysis. The latter deals with numerical data, while the former deals with non-numerical data. Qualitative data can be anything from text and summaries to images, symbols, and so on.

Each type has its own methods, and we’ll take a look at both so you can use whichever suits your requirements.

Qualitative Data Analysis

As mentioned before, qualitative data comprises non-numerical data, and it can be in the form of text or images. So, how do we analyze such data? Before we start, here are a few common tips that you should always follow before applying any techniques.

Now, let’s move ahead and see where the qualitative data analysis techniques come in. Even though there are many professional ways to do this, here are the ones you’ll need to know as a beginner.

Narrative Analysis

If your research is based on collecting answers from people in interviews or other settings, this might be one of the best analysis techniques for you. Narrative analysis helps analyze people’s narratives that are available in textual form. The stories, experiences, and other answers from respondents are used to power the analysis.

The important thing to note here is that the data has to be available in the form of text only. Narrative analysis cannot be performed on other data types such as images.

Content Analysis

Content analysis is amongst the most used methods for analyzing qualitative data. This method doesn’t restrict the form of the data. You can use any kind of data here, whether it’s in the form of images, text, or even real-life items.

It is especially useful when you already know the questions you need answers to. Once you have the answers, you can apply this method to analyze them and then extract insights to use in your research. It’s a full-fledged method, and many analytical studies are based solely on it.
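The counting side of content analysis can be illustrated with a small Python sketch. It codes short text responses against a codebook and tallies how many responses fall under each category. The codebook, category names, and responses are all hypothetical. Real content analysis usually relies on carefully defined coding rules and trained human coders rather than simple keyword matching.

```python
from collections import Counter

# Hypothetical codebook: each category is defined by a few indicator keywords.
CODEBOOK = {
    "cost": {"price", "expensive", "cheap", "cost"},
    "quality": {"quality", "durable", "broken", "reliable"},
    "service": {"staff", "support", "helpful", "rude"},
}

def code_response(text: str) -> set[str]:
    """Return every codebook category whose keywords appear in the response."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return {category for category, keywords in CODEBOOK.items() if words & keywords}

def tally(responses: list[str]) -> Counter:
    """Count how many responses were coded under each category."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response))
    return counts

# Hypothetical open-ended responses.
responses = [
    "The price was too expensive for me.",
    "Support staff were helpful.",
    "Good quality, but the cost is high.",
]
print(tally(responses))
```

A single response can count toward more than one category. In this example, the third response is coded as both "quality" and "cost".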

Grounded Theory

Grounded theory is used when researchers want to know the reason behind the occurrence of a certain event. Following this approach, they may have to go through many different cases and compare them to each other. It’s an iterative approach: the explanations keep being modified or re-created until the researchers reach a suitable conclusion that satisfies their specific conditions.

So, consider this method if you have qualitative data at hand and need to know why something happened, based on that data.

Discourse Analysis

Discourse analysis is quite similar to narrative analysis in the sense that it also uses interactions with people for the analysis. The difference lies in the focus: instead of analyzing the narrative, the researchers focus on the context in which the conversation takes place.

The complete background of the person being questioned, including their everyday environment, is used to perform the research.

Quantitative Analysis

Quantitative analysis covers any kind of analysis done on numbers, from the most basic techniques to the most advanced ones. No matter what level of research you need to do, if it’s based on numerical data, you’ll always have efficient analysis methods to use.

There are two broad approaches here: descriptive statistics and inferential analysis.

However, before applying the analysis methods on numerical data, there are a few pre-processing steps that need to be done. These steps are used to make the data ‘ready’ for applying the analysis methods.

Make sure you don’t skip these steps, or you may end up drawing biased conclusions from the data analysis.

Descriptive Statistics

Descriptive statistics is the most basic step researchers can use to draw conclusions from data. It helps find patterns and lets the data ‘speak’. Let’s look at some of the most common techniques used to perform descriptive statistics.

  • Mean

The mean is simply the average of all the data at hand: the sum of the values divided by the number of values. The formula is simple and tells you what value to expect on average across the data.

  • Median

The median is the middle value in the data. It lets researchers estimate where the mid-point of the data is. It’s important to note that the data needs to be sorted to find the median.

  • Mode

The mode is simply the most frequently occurring value in the dataset. For example, if you’re studying the ages of students in a particular class, the mode will be the age shared by the largest number of students in the class.

  • Standard Deviation

Numerical data is usually spread over a range, and finding out how widely it is spread is quite important. Standard deviation is what lets us do this. It tells us roughly how far a typical data point lies from the mean.
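Here’s a minimal Python sketch of the four measures above, using the standard-library `statistics` module. The class ages are hypothetical. `statistics.median` sorts the data internally, so no manual sort is needed. `statistics.stdev` computes the sample standard deviation, which divides by n − 1.

```python
import statistics

# Hypothetical ages of students in one class.
ages = [18, 19, 19, 20, 21, 19, 22, 20]

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # middle value after sorting (average of the two middle values here)
mode_age = statistics.mode(ages)      # most frequent value
spread = statistics.stdev(ages)       # sample standard deviation

print(mean_age, median_age, mode_age, round(spread, 2))
```

With an even number of values, the median is the average of the two middle values. That is why it comes out as 19.5 here, even though no student is 19.5 years old.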

Inferential Analysis

Inferential statistics refers to the techniques used to draw conclusions about a larger population from sample data, including predicting future occurrences. These methods help establish relationships in the data, and once that’s done, predicting future data becomes possible.

  • Correlation

Correlation is a measure of the relationship between two numerical variables. It measures the degree to which they are related, regardless of whether the relationship is causal.

For example, the age and height of children are highly correlated. As a child’s age increases, their height is also likely to increase. This is called a positive correlation.

A negative correlation means that as one variable increases, the other decreases. An example would be the relationship between the age of a car and its resale value.

  • Regression

Regression aims to find the mathematical relationship between a set of variables. While correlation only measures the strength and direction of a relationship, regression expresses the relationship as an equation linking the variables. Once that relationship is established, one variable can be used to predict another.

This method is widely applied to predicting future data. If your research is based on estimating future values of some data from past data and then testing those estimates, make sure you use this method.

A Summary of Data Analysis Methods

Now that we’re done with some of the most common methods for both quantitative and qualitative data, let’s summarize them in a tabular form so you would have something to take home in the end.

That’s it! We have seen why data analysis is such an important tool in research and how it saves researchers a great deal of time, making them not only more efficient but more productive as well.

Moreover, the article covers some of the most important data analysis techniques that one needs to know for research purposes today. We’ve gone through the analysis methods for both quantitative and qualitative data at a basic level so that they are easy for beginners to understand.

Emidio Amadebai

I am an IT Engineer who is passionate about learning and sharing. For almost the past 5 years, I have worked with and learned quite a bit from Data Engineers, Data Analysts, Business Analysts, and Key Decision Makers. I am interested in learning more about Data Science and how to leverage it for better decision-making in my business, and hopefully in helping you do the same in yours.

Data Analysis in Research: Types & Methods

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third and last is data analysis itself, which researchers do in both a top-down and a bottom-up fashion.

On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and interpretation is a process that applies deductive and inductive logic to research data.

Researchers rely heavily on data because they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is possible to explore data even without a specific problem in mind. We call this ‘data mining’, and it often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them in finding the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells the most unforeseen yet exciting stories, ones that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.

Every kind of data describes things once a specific value is assigned to it. For analysis, you need to organize, process, and present these values in a given context to make them useful. Data can take different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and similar measures all fall under this type of data. You can present such data in graphs or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups. However, an item included in categorical data cannot belong to more than one group. Example: a survey respondent’s lifestyle, marital status, smoking habit, or drinking habit falls under categorical data. A chi-square test is a standard method used to analyze this data.
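The chi-square test mentioned for categorical data can be sketched by hand for a contingency table. The Python snippet below computes the Pearson chi-square statistic and its degrees of freedom for a hypothetical 2×2 table of smoking habit by marital status. It stops at the test statistic. Turning that into a p-value would normally be done with a statistics library, for example SciPy’s `scipy.stats.chi2_contingency`, which is not used here.

```python
def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = smoker / non-smoker, columns = married / single.
observed = [[20, 30],
            [40, 10]]
stat, dof = chi_square(observed)
print(round(stat, 2), dof)
```

For one degree of freedom, the 5% critical value is about 3.84. A statistic of roughly 16.67 would therefore suggest that the two variables are not independent in this made-up sample.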

Data analysis in qualitative research

Data analysis in qualitative research works a little differently than with numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is a complicated process. Hence, it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is often manual. Here, the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find that “food” and “hunger” are the most commonly used words and highlight them for further analysis.
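The word-based method can be partly automated. The Python sketch below counts word frequencies across open-ended answers after dropping a few common stop words. The answers and the tiny stop-word list are hypothetical. A real study would use a fuller stop-word list and would still read the surrounding text before drawing conclusions.

```python
import re
from collections import Counter

# Hypothetical open-ended survey answers.
answers = [
    "Food prices keep rising and hunger is common.",
    "Hunger and lack of clean water.",
    "Access to food and schools.",
]

# A small, illustrative stop-word list.
STOP_WORDS = {"and", "is", "of", "to", "the"}

# Lowercase, split into words, and drop stop words.
words = [w for text in answers
         for w in re.findall(r"[a-z']+", text.lower())
         if w not in STOP_WORDS]
word_counts = Counter(words)
print(word_counts.most_common(2))
```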

Keyword-in-context is another widely used word-based technique. In this method, the researcher tries to understand a concept by analyzing the context in which participants use a particular keyword.

For example, researchers studying the concept of ‘diabetes’ amongst respondents might analyze when and how each respondent has used or referred to the word ‘diabetes’.
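A keyword-in-context listing can be produced with a few lines of Python. This sketch prints each occurrence of a keyword with a fixed window of surrounding words, which the researcher can then read and interpret. The function name, the window size, and the responses are hypothetical.

```python
import re

def keyword_in_context(texts: list[str], keyword: str, window: int = 3) -> list[str]:
    """Return each occurrence of `keyword` with up to `window` words on each side."""
    snippets = []
    for text in texts:
        words = re.findall(r"[\w']+", text.lower())
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                snippets.append(f"{left} [{word}] {right}".strip())
    return snippets

# Hypothetical interview responses.
responses = [
    "My father has diabetes so we changed our diet.",
    "I worry that diabetes runs in my family.",
]
for line in keyword_in_context(responses, "diabetes"):
    print(line)
```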

The scrutiny-based technique is also a highly recommended text analysis method for identifying patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it identifies how specific texts are similar to or different from each other.

For example, to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.

There are several techniques to analyze data in qualitative research, but here are some commonly used methods:

  • Content Analysis: It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions people share are analyzed to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition, discourse analysis also considers the respondent’s lifestyle and day-to-day environment when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at a conclusion.

Data analysis in quantitative research

The first stage in research and data analysis is to prepare the data for analysis so that the raw data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to determine whether the collected data sample meets the pre-set standards or is biased. It is divided into four different stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey or, in an interview, that the interviewer has asked all the questions devised in the questionnaire.
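Two of these stages, completeness and screening, can be checked automatically. The Python sketch below does so. The required question IDs and the 18 to 65 age criterion are hypothetical research rules. Fraud and procedure checks usually need information that is not in the response data itself, such as bot-detection signals or interviewer records, so they are not covered here.

```python
# Hypothetical survey rules: every question must be answered (completeness)
# and respondents must be 18-65 to meet the research criteria (screening).
REQUIRED_QUESTIONS = ("age", "q1", "q2")
MIN_AGE, MAX_AGE = 18, 65

def validate(response: dict) -> list[str]:
    """Return a list of validation problems for one survey response."""
    problems = []
    missing = [q for q in REQUIRED_QUESTIONS if response.get(q) in (None, "")]
    if missing:
        problems.append(f"incomplete: missing {', '.join(missing)}")
    age = response.get("age")
    if isinstance(age, int) and not MIN_AGE <= age <= MAX_AGE:
        problems.append("screening: age outside research criteria")
    return problems

# Hypothetical responses.
responses = [
    {"age": 34, "q1": "Yes", "q2": 4},
    {"age": 15, "q1": "No", "q2": 2},
    {"age": 40, "q1": "", "q2": 5},
]
for r in responses:
    print(validate(r))
```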

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.
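One common way to run an outlier check is Tukey’s fences. This rule flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. The Python sketch below applies it with `statistics.quantiles`. The reported working hours are hypothetical. This is only one of several outlier rules, and a flagged value should be investigated rather than deleted automatically.

```python
import statistics

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical reported weekly working hours; 400 is likely a typing error.
hours = [38, 40, 42, 45, 40, 39, 400, 41]
print(iqr_outliers(hours))
```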

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation. It involves grouping and assigning values to the survey responses. For example, if a survey is completed with a sample size of 1,000, the researcher might create age brackets to group the respondents by age. It is then easier to analyze small data buckets than to deal with the massive data pile.
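The age-bracket example can be sketched in Python. Each raw age is mapped to a bracket label, and the respondents are counted per bracket. The bracket boundaries and ages are hypothetical, and the upper limit of 120 is an arbitrary cap for this illustration.

```python
from collections import Counter

# Hypothetical age brackets used to code raw ages into groups.
BRACKETS = [(18, 24, "18-24"), (25, 34, "25-34"), (35, 44, "35-44"), (45, 120, "45+")]

def code_age(age: int) -> str:
    """Map a raw age to its bracket label."""
    for low, high, label in BRACKETS:
        if low <= age <= high:
            return label
    return "out of range"

# Hypothetical respondent ages.
ages = [19, 23, 31, 40, 52, 27, 33]
bracket_counts = Counter(code_age(a) for a in ages)
print(bracket_counts)
```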

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. For sure, statistical analysis plans are the most favored to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The method is again classified into two groups. First, ‘Descriptive Statistics’ used to describe data. Second, ‘Inferential statistics’ that helps in comparing the data .

Descriptive statistics

This method is used to describe the basic features of various types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. However, descriptive analysis does not go beyond describing the data; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.
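Count, percent, and frequency for a single survey question can be tabulated with `collections.Counter`, as in the sketch below. The answers are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a single-choice survey question.
answers = ["Yes", "No", "Yes", "Unsure", "Yes", "No", "Yes", "Yes"]

counts = Counter(answers)
total = len(answers)

# Map each answer to (count, percent of all responses), most frequent first.
frequency_table = {
    answer: (count, 100 * count / total) for answer, count in counts.most_common()
}
for answer, (count, percent) in frequency_table.items():
    print(f"{answer}: count={count}, percent={percent:.1f}%")
```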

Measures of Central Tendency

  • Mean, Median, Mode
  • These measures are widely used to show where the values of a distribution are centered.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range equals the difference between the high and low points.
  • Variance and standard deviation are based on the difference between each observed score and the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is. It helps them see how far the data is spread and how strongly that spread affects the mean.
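The measures of dispersion can be computed with the Python `statistics` module, as in the sketch below. The test scores are hypothetical. It also shows the difference between sample variance, which divides by n − 1, and population variance, which divides by n. Which one fits depends on whether the scores are a sample or the whole population of interest.

```python
import statistics

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85]

score_range = max(scores) - min(scores)        # high point minus low point
sample_var = statistics.variance(scores)       # squared deviations from the mean, divided by n - 1
population_var = statistics.pvariance(scores)  # same sum of squares, divided by n
std_dev = statistics.stdev(scores)             # square root of the sample variance

print(score_range, round(sample_var, 2), population_var, round(std_dev, 2))
```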

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
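Measures of position can be sketched in Python as well. The snippet below computes quartiles with `statistics.quantiles`, a percentile rank, and a standardized (z) score for one hypothetical exam score. Be aware that several definitions of quartiles and percentile ranks are in use. `statistics.quantiles` defaults to its "exclusive" method. The `percentile_rank` and `z_score` helpers are hypothetical, written only for this illustration.

```python
import statistics

def percentile_rank(values: list[float], score: float) -> float:
    """Percentage of values strictly below `score` (one of several common definitions)."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

def z_score(values: list[float], score: float) -> float:
    """Standardized score: how many sample standard deviations `score` lies from the mean."""
    return (score - statistics.mean(values)) / statistics.stdev(values)

# Hypothetical exam scores for a class of 10.
scores = [48, 52, 57, 60, 63, 67, 70, 74, 81, 90]

q1, q2, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
print(q1, q2, q3)
print(percentile_rank(scores, 74), round(z_score(scores, 74), 2))
```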

In quantitative research, descriptive analysis often gives absolute numbers, but it is never sufficient on its own to demonstrate the rationale behind those numbers. Nevertheless, it is necessary to think about which method of research and data analysis best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average voting in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a collected sample that represents it. For example, you can ask around 100 audience members at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to conclude that about 80-90% of people like the movie.
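The movie example can be made concrete with a confidence interval for a population proportion. The Python sketch below uses the simple normal-approximation (Wald) interval, assuming a hypothetical 85 of 100 moviegoers said they like the film. This interval is easy to compute but can behave poorly for small samples or proportions near 0 or 1. Alternatives such as the Wilson interval are often preferred in those cases.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 85 of 100 moviegoers said they like the film.
low, high = proportion_ci(85, 100)
print(f"{low:.3f} to {high:.3f}")
```

The interval describes a plausible range for the share of all moviegoers who like the film, based on this one sample. It does not mean that exactly that share of people like it.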

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.
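For the multivitamin example, one way to run a hypothesis test is a permutation test. It asks how often a difference in group means at least as large as the observed one appears when the group labels are shuffled at random. The Python sketch below assumes hypothetical game scores for two small groups. A permutation test is used here because it needs only the standard library. A two-sample t-test, for example SciPy’s `scipy.stats.ttest_ind`, is the more common textbook choice.

```python
import random

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def permutation_test(a: list[float], b: list[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means; returns an approximate p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical game scores for children with and without multivitamins.
vitamin = [72, 75, 78, 80, 74, 77]
placebo = [70, 71, 73, 69, 72, 74]
p = permutation_test(vitamin, placebo)
print(f"approximate p-value: {p:.4f}")
```

A small p-value suggests that a difference this large would be unlikely if the multivitamins had no effect. With only six children per group, however, the result should be treated as illustrative.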

These are sophisticated analysis methods used to show the relationship between different variables instead of describing a single variable. They are often used when researchers want to go beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation makes the analysis straightforward by showing the number of males and females in each age category.
  • Regression analysis: To understand the strength of the relationship between two variables, researchers often turn to regression analysis, the primary and most commonly used method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with one or more independent variables. You try to find out the impact of the independent variables on the dependent variable. The values of both the independent and dependent variables are assumed to be ascertained in an error-free, random manner.
  • Frequency tables: A frequency table shows how many times each value or category occurs in the data, often alongside its percentage of the total. It is a simple way to summarize the distribution of a variable.
  • Analysis of variance: This statistical procedure tests the degree to which two or more groups vary or differ in an experiment. A large variation between groups, relative to the variation within groups, suggests that the research findings are significant. In many contexts, ANOVA testing and variance analysis are similar.

Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps in designing a survey questionnaire, selecting data collection methods, and choosing samples.

  • The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake, or a biased mindset, when collecting data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No amount of sophistication in research data and analysis is enough to rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity might mislead readers, so avoid it.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining, or developing graphical representations.

The sheer amount of data generated daily is frightening, especially since data analysis took center stage in 2018. In the last year, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises that want to survive in a hypercompetitive world must have an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.

QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.

Elsevier

When Data Speak, Listen: Importance of Data Collection and Analysis Methods

With the recent advent of digital tools, the rise in data manipulation has become a key challenge. And so, the scientific community has begun taking a more careful look at scientific malpractice involving data manipulation. But why are data so important in scientific research?

Role of data in science

Reliable data facilitates knowledge generation and reproducibility of key scientific protocols and experiments. For each step of a research project, from data collection to knowledge generation, researchers need to pay careful attention to data analysis to ensure that their results are robust.

In science, data are used to confirm or reject a hypothesis, which can fundamentally change the research landscape. Thus, with respect to the outcome of a specific study, data are expected to fit one of two patterns: they either support the hypothesis or contradict it. However, data may not conform to any apparent pattern. When this happens, researchers may engage in malpractice or use unreliable data collection and analysis methods, jeopardising their reputation and career. Hence, it is necessary to resist the temptation to cherry-pick data. Always let the data speak for themselves.

There are two ways to ensure the integrity of data and results.

Data validation

Data validation is a systematic process that ensures the quality and accuracy of collected data. Inaccurate data may keep a researcher from uncovering important discoveries or lead to spurious results. At times, the sheer amount of data collected can help reveal important existing patterns.

The data validation process can also provide a glimpse into the patterns within the data, preventing you from forming incorrect hypotheses.

Data validation can also confirm the legitimacy of your study and help you get a clearer picture of what your study reveals.
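As a minimal sketch of what explicit, repeatable validation checks might look like, the Python snippet below flags missing values, out-of-range values and duplicate IDs in a few hypothetical survey records. The field names (`id`, `age`) and the valid age range of 18 to 99 are assumptions chosen purely for illustration; real rules would come from the study protocol.

```python
from typing import Any


def validate_records(records: list[dict[str, Any]],
                     min_age: int = 18, max_age: int = 99) -> list[str]:
    """Return human-readable problems found in hypothetical survey records.

    Illustrative checks (assumptions, not a standard):
      * every record has a non-missing 'age'
      * 'age' lies within [min_age, max_age]
      * 'id' values are unique
    """
    problems: list[str] = []
    seen_ids: set[Any] = set()
    for rec in records:
        rid = rec.get("id")
        if rid in seen_ids:
            problems.append(f"duplicate id {rid}")
        seen_ids.add(rid)

        age = rec.get("age")
        if age is None:
            problems.append(f"id {rid}: missing age")
        elif not (min_age <= age <= max_age):
            problems.append(f"id {rid}: age {age} out of range")
    return problems


# Hypothetical records containing one missing value, one impossible age
# and one repeated ID.
sample = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 150},
    {"id": 3, "age": 41},
]
for issue in validate_records(sample):
    print(issue)
```

Writing the checks down as code, rather than eyeballing a spreadsheet, means the same rules are applied to every record and can be rerun whenever new data arrive.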

Analytical method validation

Analytical method validation confirms that a method is suitable for its intended purpose and will result in high-quality, accurate results.

Often, different analytical methods can produce surprisingly varying results, despite using the same dataset. Therefore, it is necessary to ensure that the methods fit the purpose of your research, a feature referred to as ‘system suitability’. This is one of the main objectives of analytical method validation. The other objective of analytical method validation is ensuring the results’ robustness (ability of your method to provide reliable results under various conditions) and reproducibility (ease with which your work can be repeated in a new setting). Reproducibility is important because it allows other researchers to confirm your findings (which can make your work more impactful) or refute your results if unique conditions in your lab favour one result over others. Moreover, as a collaborative enterprise, scientific research rewards the use and sharing of clearly defined analytical processes.
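To illustrate how two reasonable methods can disagree on the same dataset, the sketch below summarises one hypothetical set of measurements with both the mean and the median. The values are invented for illustration: a single extreme reading pulls the mean much further than the median.

```python
import statistics

# Hypothetical measurements; the last value is an extreme reading.
measurements = [4.8, 5.1, 5.0, 4.9, 5.2, 19.0]

mean_value = statistics.mean(measurements)      # sensitive to the extreme value
median_value = statistics.median(measurements)  # resistant to the extreme value

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value:.2f}")
```

Which summary is appropriate depends on whether the extreme value is a genuine observation or an error, which is exactly the kind of question that checking a method's suitability for its purpose is meant to answer.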

In the long run, it is more rewarding for researchers to double-check their dataset and analytical methods than to make the data fit an expected pattern.

While data are the crux of a scientific study, unless they are acquired and validated using the most suitable data and method validation approaches, they may fail to produce authentic and legitimate results. To get useful tips on how to collect and validate data, feel free to approach Elsevier Author Services. Our experts will support you throughout your research journey, ensuring that your results are reproducible, robust, and valid.


Data Analysis


What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" (Responsible Conduct in Data Management). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible data management, and maintaining "honest and accurate analysis" (Responsible Conduct in Data Management).

To understand data analysis further, it can be helpful to take a step back and ask, "What is data?" Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" (OMB Circular 110). This broad definition can include information in many formats.

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments (CMU Data 101)

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle, as seen below. 

[Figure: the research data lifecycle (source: University of Virginia)]

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" (Creswell & Creswell, 2018, p. 3). Quantitative research tests objective theories by examining relationships among variables, which are usually measured on instruments and analyzed using statistical procedures (Creswell & Creswell, 2018, p. 4). Quantitative analysis usually uses deductive reasoning.

Qualitative research typically involves words and "open-ended questions and responses" (Creswell & Creswell, 2018, p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" (2018, p. 4). Qualitative analysis therefore usually relies on inductive reasoning.

Mixed methods research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" (Creswell & Creswell, 2018, p. 4).

  • Last Updated: May 3, 2024 9:38 AM
  • URL: https://guides.library.georgetown.edu/data-analysis

Creative Commons

Data Analysis in Quantitative Research

  • Reference work entry
  • First Online: 13 January 2019

  • Yong Moon Jung


Quantitative data analysis serves as part of an essential process of evidence-making in health and social sciences. It is used for all types of research questions and designs, whether descriptive, explanatory, or causal. However, compared with its qualitative counterpart, quantitative data analysis has less flexibility. Conducting quantitative data analysis requires prior statistical knowledge and skills. It also requires rigor in choosing an appropriate analysis model and in interpreting the analysis outcomes. Basically, the choice of appropriate analysis techniques is determined by the type of research question and the nature of the data. In addition, different analysis techniques make different assumptions about the data. This chapter provides introductory guidance to help readers make informed decisions when choosing the correct analysis models. To this end, it begins with a discussion of the levels of measurement: nominal, ordinal, and scale. Some commonly used analysis techniques in univariate, bivariate, and multivariate data analysis are presented as practical examples. Example analysis outcomes are produced using SPSS (Statistical Package for the Social Sciences).
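The point that the level of measurement constrains the choice of analysis can be sketched in a few lines. The mapping below follows common textbook guidance (mode for nominal data, median for ordinal data, mean for scale data); the function name, the `Level` enum and the example data are hypothetical, and in practice the choice also depends on the distribution of the data.

```python
import statistics
from enum import Enum


class Level(Enum):
    NOMINAL = "nominal"   # unordered categories, e.g. blood type
    ORDINAL = "ordinal"   # ordered categories, e.g. Likert ratings 1-5
    SCALE = "scale"       # interval/ratio numbers, e.g. age in years


def central_tendency(values, level: Level):
    """Return a central-tendency summary suited to the level of measurement."""
    if level is Level.NOMINAL:
        return statistics.mode(values)
    if level is Level.ORDINAL:
        # median_low returns an observed category instead of averaging two of them
        return statistics.median_low(values)
    return statistics.mean(values)


blood_types = ["A", "O", "O", "B", "O", "AB"]
likert = [2, 4, 4, 5, 3, 4]
ages = [23, 35, 31, 47, 29]

print(central_tendency(blood_types, Level.NOMINAL))  # O
print(central_tendency(likert, Level.ORDINAL))       # 4
print(central_tendency(ages, Level.SCALE))           # 33
```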


Author information

Authors and affiliations.

Centre for Business and Social Innovation, University of Technology Sydney, Ultimo, NSW, Australia

Yong Moon Jung


Corresponding author

Correspondence to Yong Moon Jung.

Editor information

Editors and affiliations.

School of Science and Health, Western Sydney University, Penrith, NSW, Australia

Pranee Liamputtong


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry.

Jung, Y.M. (2019). Data Analysis in Quantitative Research. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_109


DOI : https://doi.org/10.1007/978-981-10-5251-4_109

Published : 13 January 2019

Publisher Name : Springer, Singapore

Print ISBN : 978-981-10-5250-7

Online ISBN : 978-981-10-5251-4

eBook Packages : Social Sciences Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences


Research Methods


Data Analysis & Interpretation


You will need to tidy, analyse and interpret the data you collected to give meaning to it, and to answer your research question.  Your choice of methodology points the way to the most suitable method of analysing your data.

Quantitative Data

If the data is numeric, you can use a software package such as SPSS, Excel or R to carry out statistical analysis. You can calculate measures such as the mean, median and mode, or identify correlational or causal relationships between variables (a causal claim also requires a suitable research design, such as an experiment).

The University of Connecticut has useful information on statistical analysis.

If your research set out to test a hypothesis your research will either support or refute it, and you will need to explain why this is the case.  You should also highlight and discuss any issues or actions that may have impacted on your results, either positively or negatively.  To fully contribute to the body of knowledge in your area be sure to discuss and interpret your results within the context of your research and the existing literature on the topic.

Qualitative Data

Data analysis for a qualitative study can be complex because of the variety of types of data that can be collected. Qualitative researchers aren’t attempting to measure observable characteristics; they are often attempting to capture an individual’s interpretation of a phenomenon or situation in a particular context or setting. This data could be captured in text from an interview or focus group, a movie, images, or documents. This type of data is usually analysed by examining each artefact against predefined criteria for analysis and then applying a coding system. The researcher can develop the codes before analysis or derive them from the research data. Coding can be done by hand or with qualitative analysis software such as NVivo.
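Qualitative coding is an interpretive task that software such as NVivo supports rather than automates, but the mechanics of applying a codebook defined before analysis can be sketched. The example below is a deliberately simplified, keyword-based sketch: the codebook, its keywords and the interview excerpts are all hypothetical, and a real analysis would rest on the researcher's judgement rather than string matching.

```python
from collections import Counter

# Hypothetical codebook defined before analysis: code -> indicative keywords.
CODEBOOK = {
    "workload": ["deadline", "overtime", "workload"],
    "support": ["mentor", "manager helped", "support"],
    "wellbeing": ["stress", "burnout", "tired"],
}

excerpts = [
    "The deadlines were constant and I felt real stress.",
    "My mentor gave me a lot of support in the first year.",
    "Overtime became normal and everyone was tired.",
]


def code_excerpt(text: str) -> list[str]:
    """Return every code whose keywords appear in the excerpt (case-insensitive)."""
    lowered = text.lower()
    return [code for code, keywords in CODEBOOK.items()
            if any(kw in lowered for kw in keywords)]


# Tally how often each code occurs across all excerpts.
counts = Counter()
for excerpt in excerpts:
    codes = code_excerpt(excerpt)
    counts.update(codes)
    print(codes, "<-", excerpt)

print(counts.most_common())
```

The code frequencies are only a starting point; themes still have to be interpreted and connected to the literature, as described below.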

Interpretation of qualitative data can be presented as a narrative. The themes identified from the research can be organised and integrated with themes in the existing literature to give further weight and meaning to the research. The interpretation should also state whether the aims and objectives of the research were met. Any shortcomings of the research or areas for further research should also be discussed (Creswell, 2009)*.

For further information on analysing and presenting qualitative data, read this article in Nature.

Mixed Methods Data

Data analysis for mixed methods involves aspects of both quantitative and qualitative methods. However, the sequencing of data collection and analysis depends on the mixed methods approach that you are taking. For example, you could be using a convergent, sequential or transformative model, which directly affects how you use different data to inform, support or direct the course of your study.

The intention in using mixed methods is to produce a synthesis of both quantitative and qualitative information to give a detailed picture of a phenomenon in a particular context or setting. To fully understand how best to produce this synthesis, it might be worth looking at why researchers choose this method. Bergin** (2018) states that researchers choose mixed methods because they allow them to triangulate, illuminate or discover a more diverse set of findings. Therefore, when it comes to interpretation, you will need to return to the purpose of your research and discuss and interpret your data in that context. As with quantitative and qualitative methods, interpretation of data should be discussed within the context of the existing literature.

Bergin’s book is available in the Library to borrow. Bolton LTT collection 519.5 BER

Creswell’s book is available in the Library to borrow.  Bolton LTT collection 300.72 CRE

For more information on data analysis look at Sage Research Methods database on the library website.

*Creswell, John W. (2009) Research design: qualitative, quantitative, and mixed methods approaches. Sage, Los Angeles, p. 183

**Bergin, T. (2018) Data analysis: quantitative, qualitative and mixed methods. Sage, Los Angeles, p. 182

  • Last Updated: Sep 7, 2023 3:09 PM
  • URL: https://tudublin.libguides.com/research_methods

Research Method


Data Analysis – Process, Methods and Types


Data Analysis

Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The data analysis process typically involves the following steps:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
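The cleaning step can be sketched as a small function. The Python version below is a minimal illustration that assumes a hypothetical list of raw readings stored as strings: entries that cannot be parsed as numbers are treated as errors and removed, while blank or `None` entries are treated as missing and filled with the median of the valid readings. Median imputation is only one of several options and is chosen here for simplicity.

```python
import statistics
from typing import Optional


def clean_readings(raw: list[Optional[str]]) -> list[float]:
    """Parse raw readings, drop malformed entries, and impute missing ones.

    - None or blank strings are missing: filled with the median of valid
      readings (an illustrative choice, not the only option).
    - Strings that cannot be parsed as numbers are errors: dropped.
    """
    parsed: list[Optional[float]] = []
    for item in raw:
        if item is None or item.strip() == "":
            parsed.append(None)          # missing: keep a placeholder to impute later
            continue
        try:
            parsed.append(float(item))   # float() tolerates surrounding whitespace
        except ValueError:
            pass                         # malformed: drop the entry entirely

    valid = [x for x in parsed if x is not None]
    if not valid:
        raise ValueError("no valid readings to impute from")
    fill = statistics.median(valid)
    return [fill if x is None else x for x in parsed]


# Hypothetical raw export with a blank, a text placeholder and a None.
raw_data = ["12.5", "13.1", "", "n/a", "12.9", None, " 13.4 "]
print(clean_readings(raw_data))
```

Keeping the cleaning rules in one function documents the decisions made (what counts as an error, how gaps are filled), which matters when results are later interpreted.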

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

The types of data analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.
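A descriptive summary of this kind takes only a few lines with Python's standard `statistics` module. The sketch below computes the measures listed above for a hypothetical sample of response times; note that `statistics.stdev` gives the sample standard deviation (dividing by n - 1), whereas `statistics.pstdev` would give the population version.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summarize a sample with common descriptive statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),   # sample standard deviation (n - 1)
        "range": max(values) - min(values),
    }


# Hypothetical sample of response times in seconds.
sample = [12, 15, 15, 18, 20, 22, 25]
for name, value in describe(sample).items():
    print(f"{name:>6}: {value:.2f}")
```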

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.
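One common form of inference is a confidence interval for a population mean estimated from a sample. The sketch below uses a normal approximation via `statistics.NormalDist`, which is reasonable for larger samples; for small samples a t-distribution (available in third-party libraries such as SciPy) would be more appropriate. The scores are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample: list[float],
                             level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample: satisfaction scores (0-100) from 40 respondents.
scores = [72, 68, 75, 80, 66, 71, 77, 69, 74, 73,
          70, 79, 65, 76, 72, 68, 81, 74, 70, 75,
          67, 73, 78, 71, 69, 76, 74, 72, 70, 77,
          73, 66, 75, 79, 71, 68, 74, 72, 76, 70]
low, high = mean_confidence_interval(scores)
print(f"95% CI for the mean: {low:.1f} to {high:.1f}")
```

Loosely, if the sampling were repeated many times, about 95% of intervals built this way would contain the true population mean; a narrower interval reflects a larger or less variable sample.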

Diagnostic Analysis

This type of analysis involves examining a dataset to diagnose problems or issues and to understand why they occurred. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.
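For the outlier-detection part of diagnostic analysis, one widely used rule of thumb (Tukey's fences) flags values more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below applies that rule to a hypothetical series of daily order counts. `statistics.quantiles` requires Python 3.8 or later, and its default method is only one of several ways to compute quartiles, so borderline cases may differ from other software.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [48, 52, 50, 47, 51, 49, 53, 50, 120, 46]
print(iqr_outliers(daily_orders))
```

Flagged values are candidates for investigation rather than automatic deletions: the spike could be a data-entry error or a genuine event worth explaining.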

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

The main data analysis methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.
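A basic network-analysis measure is degree centrality: the number of direct connections each node has, often divided by the maximum possible (n - 1) so that values fall between 0 and 1. The sketch below computes it for a small hypothetical collaboration network stored as an undirected edge list; dedicated libraries (for example, NetworkX in Python) provide this and many other measures.

```python
from collections import defaultdict

# Hypothetical undirected collaboration network: each pair co-authored a paper.
edges = [("Ana", "Ben"), ("Ana", "Chen"), ("Ana", "Dev"),
         ("Ben", "Chen"), ("Dev", "Eli")]


def degree_centrality(edge_list: list[tuple[str, str]]) -> dict[str, float]:
    """Return each node's degree divided by (number of nodes - 1)."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Print nodes from most to least connected.
for node, score in sorted(degree_centrality(edges).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```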

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
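Smoothing is one of the simplest time-series techniques. The sketch below computes a trailing simple moving average over a hypothetical series of weekly counts. The window size of 3 is an arbitrary choice for illustration: larger windows smooth more aggressively but react more slowly to genuine changes.

```python
def moving_average(series: list[float], window: int = 3) -> list[float]:
    """Trailing simple moving average.

    Each output value is the mean of `window` consecutive observations, so the
    result has len(series) - window + 1 values.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Hypothetical weekly counts with noisy week-to-week variation.
weekly = [20, 26, 19, 30, 25, 34, 29, 38]
print(moving_average(weekly, window=3))
```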

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytic hierarchy process (AHP), TOPSIS, and ELECTRE.
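Of the techniques named above, TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is compact enough to sketch. It normalizes each criterion, applies weights, and ranks alternatives by how close they are to an ideal alternative and how far from a worst-case one. The suppliers, criteria, weights and scores below are hypothetical, and the code assumes every criterion is either a benefit (higher is better) or a cost (lower is better) and that no criterion column is all zeros.

```python
import math


def topsis(matrix: list[list[float]], weights: list[float],
           benefit: list[bool]) -> list[float]:
    """Return a closeness score in [0, 1] for each alternative (higher is better).

    matrix[i][j] is the score of alternative i on criterion j.
    benefit[j] is True if higher values of criterion j are better, False for costs.
    """
    n_crit = len(weights)
    # 1. Vector-normalize each criterion column, then 2. apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # 3. Ideal (best) and anti-ideal (worst) value for each criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 4. Euclidean distances to both, then 5. relative closeness to the ideal.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        total = d_best + d_worst
        # total is 0 only if all alternatives are identical on every criterion
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical supplier choice: quality (benefit), delivery days (cost), price (cost).
suppliers = ["Supplier A", "Supplier B", "Supplier C"]
scores_matrix = [
    [8, 5, 250],
    [6, 3, 200],
    [9, 7, 300],
]
weights = [0.5, 0.2, 0.3]
benefit = [True, False, False]

closeness = topsis(scores_matrix, weights, benefit)
for name, c in sorted(zip(suppliers, closeness), key=lambda p: -p[1]):
    print(f"{name}: {c:.3f}")
```

The weights encode the decision-maker's priorities, so a sensible follow-up is to check whether the ranking changes under plausible alternative weightings.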

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL: A query language used to manage and manipulate relational databases.
  • R: An open-source programming language and software environment for statistical computing and graphics.
  • Python: A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau: A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS: A statistical analysis software used for data management, analysis, and reporting.
  • SPSS: A statistical analysis software used for data analysis, reporting, and modeling.
  • MATLAB: A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner: A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business: Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare: Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education: Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance: Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government: Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports: Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing: Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science: Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving: When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization: Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation: Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment: Data analysis can help you assess and mitigate risks, whether financial, operational, or safety-related.
  • Market research: Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some examples of data analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring: Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis: Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management: Energy companies use data analysis to monitor energy consumption in real time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

The key characteristics of data analysis are as follows:

  • Objective: Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic: Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate: Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant: Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive: Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely: Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible: Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable: Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns: Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management: Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error: Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost: Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming: Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



What Is Data Analysis: A Comprehensive Guide

In the contemporary business landscape, gaining a competitive edge is imperative, given the challenges such as rapidly evolving markets, economic unpredictability, fluctuating political environments, capricious consumer sentiments, and even global health crises. These challenges have reduced the room for error in business operations. For companies striving not only to survive but also to thrive in this demanding environment, the key lies in embracing the concept of data analysis . This involves strategically accumulating valuable, actionable information, which is leveraged to enhance decision-making processes.


What Is Data Analysis?

Data analysis inspects, cleans, transforms, and models data to extract insights and support decision-making. As a data analyst, your role involves dissecting vast datasets, unearthing hidden patterns, and translating numbers into actionable information.

Why Is Data Analysis Important?

Data analysis plays a pivotal role in today's data-driven world. It helps organizations harness the power of data, enabling them to make decisions, optimize processes, and gain a competitive edge. By turning raw data into meaningful insights, data analysis empowers businesses to identify opportunities, mitigate risks, and enhance their overall performance.

1. Informed Decision-Making

Data analysis is the compass that guides decision-makers through a sea of information. It enables organizations to base their choices on concrete evidence rather than intuition or guesswork. In business, this means making decisions more likely to lead to success, whether choosing the right marketing strategy, optimizing supply chains, or launching new products. By analyzing data, decision-makers can assess various options' potential risks and rewards, leading to better choices.

2. Improved Understanding

Data analysis provides a deeper understanding of processes, behaviors, and trends. It allows organizations to gain insights into customer preferences, market dynamics, and operational efficiency.

3. Competitive Advantage

Organizations can identify opportunities and threats by analyzing market trends, consumer behavior, and competitor performance. They can pivot their strategies to respond effectively, staying one step ahead of the competition. This ability to adapt and innovate based on data insights can lead to a significant competitive advantage.


4. Risk Mitigation

Data analysis is a valuable tool for risk assessment and management. By analyzing historical data, organizations can assess potential issues and take preventive measures. In the finance industry, for instance, data analysis helps detect fraudulent activities by identifying unusual transaction patterns. This not only helps minimize financial losses but also protects the institution's reputation and its customers' trust.

5. Efficient Resource Allocation

Data analysis helps organizations optimize resource allocation. Whether it's allocating budgets, human resources, or manufacturing capacities, data-driven insights can ensure that resources are utilized efficiently. For example, data analysis can help hospitals allocate staff and resources to the areas with the highest patient demand, ensuring that patient care remains efficient and effective.

6. Continuous Improvement

Data analysis is a catalyst for continuous improvement. It allows organizations to monitor performance metrics, track progress, and identify areas for enhancement. This iterative process of analyzing data, implementing changes, and analyzing again leads to ongoing refinement and excellence in processes and products.

What Is the Data Analysis Process?

The data analysis process is a structured sequence of steps that leads from raw data to actionable insights. The typical steps are:

  • Data Collection: Gather relevant data from various sources, ensuring data quality and integrity.
  • Data Cleaning: Identify and rectify errors, missing values, and inconsistencies in the dataset. Clean data is crucial for accurate analysis.
  • Exploratory Data Analysis (EDA): Conduct preliminary analysis to understand the data's characteristics, distributions, and relationships. Visualization techniques are often used here.
  • Data Transformation: Prepare the data for analysis by encoding categorical variables, scaling features, and handling outliers, if necessary.
  • Model Building: Depending on the objectives, apply appropriate data analysis methods, such as regression, clustering, or deep learning.
  • Model Evaluation: Assess the models' performance using metrics suited to the problem type, such as Mean Absolute Error or Root Mean Squared Error for regression, or accuracy and F1 score for classification.
  • Interpretation and Visualization: Translate the model's results into actionable insights. Visualizations, tables, and summary statistics help in conveying findings effectively.
  • Deployment: Put the insights into practice as real-world solutions or strategies, so that the data-driven recommendations are actually acted upon.
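As a rough illustration of the cleaning, exploration, and transformation steps, the sketch below deduplicates a few records, fills a missing value, summarizes the column, and rescales it. The records, the choice of median imputation, and min-max scaling are all assumptions made for the example; real projects pick these treatments based on the data and the research question.

```python
import statistics

# Hypothetical raw records: (respondent_id, age); None marks a missing value
raw = [(1, 34), (2, None), (3, 29), (3, 29), (4, 41), (5, 38)]

# Cleaning: drop exact duplicates while keeping first-seen order
deduped = list(dict.fromkeys(raw))

# Cleaning: impute missing ages with the median of observed ages (an assumed policy)
observed = [age for _, age in deduped if age is not None]
median_age = statistics.median(observed)
cleaned = [(rid, median_age if age is None else age) for rid, age in deduped]

# Exploratory summary of the cleaned column
ages = [age for _, age in cleaned]
summary = {"n": len(ages), "mean": statistics.mean(ages), "stdev": statistics.stdev(ages)}

# Transformation: min-max scaling to the range [0, 1]
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]

print(summary)
print(scaled)
```

Median imputation is used here only because it is simple and robust to extreme values; dropping incomplete records or model-based imputation are common alternatives.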

Data Analysis Methods

1. Regression Analysis

Regression analysis is a powerful method for understanding the relationship between a dependent variable and one or more independent variables. It is applied in economics, finance, and the social sciences. By fitting a regression model, you can make predictions, estimate how strongly variables are related (and, with a suitable study design, examine cause-and-effect relationships), and uncover trends within your data.
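A minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares, written without libraries so the formulas stay visible. The advertising-spend and sales figures are hypothetical.

```python
# Hypothetical data: advertising spend (x) and sales (y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, slope

intercept, slope = fit_simple_ols(x, y)
prediction = intercept + slope * 6.0  # predicted sales for a spend of 6.0
print(f"y = {intercept:.2f} + {slope:.2f}x; prediction at x=6: {prediction:.2f}")
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept; libraries such as statsmodels add standard errors and diagnostics and handle multiple regression.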

2. Statistical Analysis

Statistical analysis encompasses a broad range of techniques for summarizing and interpreting data. It involves descriptive statistics (mean, median, standard deviation), inferential statistics (hypothesis testing, confidence intervals), and multivariate analysis. Statistical methods help make inferences about populations from sample data, draw conclusions, and assess the significance of results.
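To make hypothesis testing concrete, here is a sketch of a two-sample permutation test, one of several ways to test whether two group means differ (a t-test is the more common textbook choice). The scores are hypothetical, and the fixed random seed only makes the result reproducible.

```python
import random

# Hypothetical test scores for two groups
group_a = [72, 75, 78, 80, 83, 85]
group_b = [65, 68, 70, 71, 74, 76]

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly relabel observations, as under the null hypothesis
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

observed_diff, p = permutation_test(group_a, group_b)
print(f"difference in means = {observed_diff:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the observed difference would be unlikely if the group labels were irrelevant; it says nothing about how large or important the effect is.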

3. Cohort Analysis

Cohort analysis focuses on understanding the behavior of specific groups or cohorts over time. It can reveal patterns, retention rates, and customer lifetime value, helping businesses tailor their strategies.
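The sketch below shows the core of a retention-style cohort analysis: users are grouped by the month of their first activity, and a cohort's retention at offset k is the share of its users who are active k months later. The activity log and the month-index representation are hypothetical.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, month_index), one row per active month
activity = [
    ("u1", 0), ("u1", 1), ("u1", 2),
    ("u2", 0), ("u2", 2),
    ("u3", 1), ("u3", 2),
    ("u4", 1),
]

# A user's cohort is the month of their first activity
first_month = {}
for user, month in activity:
    first_month[user] = min(month, first_month.get(user, month))

# Distinct active users per (cohort, months since joining)
active = defaultdict(set)
for user, month in activity:
    cohort = first_month[user]
    active[(cohort, month - cohort)].add(user)

def retention(cohort, offset):
    """Share of a cohort's users still active `offset` months after joining."""
    size = len(active[(cohort, 0)])
    return len(active[(cohort, offset)]) / size

for cohort in sorted(set(first_month.values())):
    print(cohort, [retention(cohort, k) for k in range(3)])
```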

4. Content Analysis

It is a qualitative data analysis method used to study the content of textual, visual, or multimedia data. Social sciences, journalism, and marketing often employ it to analyze themes, sentiments, or patterns within documents or media. Content analysis can help researchers gain insights from large volumes of unstructured data.

5. Factor Analysis

Factor analysis is a technique for uncovering underlying latent factors that explain the variance in observed variables. It is commonly used in psychology and the social sciences to reduce the dimensionality of data and identify underlying constructs. Factor analysis can simplify complex datasets, making them easier to interpret and analyze.

6. Monte Carlo Method

This method is a simulation technique that uses random sampling to solve complex problems and make probabilistic predictions. Monte Carlo simulations allow analysts to model uncertainty and risk, making it a valuable tool for decision-making.
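A minimal Monte Carlo sketch: estimate the probability that a three-task project misses a 10-day deadline when each task's duration is uncertain. The uniform 2-to-4-day durations are an assumption for illustration; under that model the exact answer is 1/6, which makes it easy to check the simulation.

```python
import random

def prob_over_deadline(n_trials=100_000, deadline=10.0, seed=42):
    """Estimate P(total duration > deadline) by repeated random sampling."""
    rng = random.Random(seed)
    late = 0
    for _ in range(n_trials):
        # Assumed model: three tasks, each taking between 2 and 4 days (uniform)
        total = sum(rng.uniform(2.0, 4.0) for _ in range(3))
        if total > deadline:
            late += 1
    return late / n_trials

estimate = prob_over_deadline()
print(f"Estimated probability of missing the deadline: {estimate:.3f}")
```

The estimate's error shrinks roughly in proportion to 1 / sqrt(n_trials), so more trials buy precision at the cost of run time.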

7. Text Analysis

Also known as text mining, this method involves extracting insights from textual data. It analyzes large volumes of text, such as social media posts, customer reviews, or documents. Text analysis can uncover sentiment, topics, and trends, enabling organizations to understand public opinion, customer feedback, and emerging issues.
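A deliberately simple sketch of two text-analysis building blocks: word frequencies and lexicon-based sentiment scoring. The reviews and the five-word lexicon are hypothetical; practical systems use curated lexicons or trained NLP models and handle negation, which this sketch ignores.

```python
import re
from collections import Counter

# Hypothetical customer reviews
reviews = [
    "Great product, fast delivery!",
    "Terrible support and slow delivery.",
    "Great value. Great support.",
]

# Hypothetical sentiment lexicon: +1 positive, -1 negative
LEXICON = {"great": 1, "fast": 1, "value": 1, "terrible": -1, "slow": -1}

def tokenize(text):
    """Lowercase the text and keep runs of letters (and apostrophes) as tokens."""
    return re.findall(r"[a-z']+", text.lower())

word_counts = Counter(tok for review in reviews for tok in tokenize(review))
scores = [sum(LEXICON.get(tok, 0) for tok in tokenize(r)) for r in reviews]

print(word_counts.most_common(3))
print(scores)
```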

8. Time Series Analysis

Time series analysis deals with data collected at regular intervals over time. It is essential for forecasting, trend analysis, and understanding temporal patterns. Time series methods include moving averages, exponential smoothing, and autoregressive integrated moving average (ARIMA) models. They are widely used in finance for stock price prediction, meteorology for weather forecasting, and economics for economic modeling.
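Two of the methods named above, a trailing moving average and simple exponential smoothing, fit in a few lines. The monthly sales figures and the smoothing factor alpha = 0.5 are hypothetical, and initialising the smoothed series with the first observation is one common convention among several.

```python
def moving_average(series, window):
    """Trailing simple moving average; the first value appears once `window` points exist."""
    return [sum(series[i - window + 1:i + 1]) / window for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

monthly_sales = [10, 12, 14, 13, 15, 18]  # hypothetical
ma3 = moving_average(monthly_sales, 3)
ses = exponential_smoothing(monthly_sales, alpha=0.5)
next_month_forecast = ses[-1]  # simple exponential smoothing gives a flat forecast
print(ma3, ses, next_month_forecast)
```

A larger alpha reacts faster to recent changes; a smaller one smooths more. Neither method models trend or seasonality, which is where extensions such as Holt-Winters or ARIMA come in.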

9. Descriptive Analysis

Descriptive analysis involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.
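Python's standard `statistics` module covers the measures listed above. The exam scores are hypothetical. Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1), whereas `statistics.pstdev` treats the data as the whole population.

```python
import statistics

# Hypothetical exam scores
scores = [70, 85, 85, 90, 95, 60, 75]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1 denominator)
}
print(summary)
```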

10. Inferential Analysis

Inferential analysis aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.
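A sketch of one inferential technique, a 95% confidence interval for a population mean. It uses the normal approximation via `statistics.NormalDist`, which is reasonable for large samples; for a sample this small, a t-based interval (for example with `scipy.stats.t`) would be somewhat wider. The wait-time data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

# Hypothetical sample: patient wait times in minutes
waits = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_confidence_interval(waits)
print(f"95% CI for the mean wait: {low:.2f} to {high:.2f} minutes")
```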

11. Exploratory Data Analysis (EDA)

EDA focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.

12. Diagnostic Analysis

Diagnostic analysis aims to explain why particular outcomes or behaviors occurred. It investigates the factors or variables that contribute to them, looking for cause-and-effect relationships within the data. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis, although correlation alone does not establish causation.
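Correlation analysis is often the first diagnostic step. The sketch computes Pearson's correlation coefficient by hand (Python 3.10+ also offers `statistics.correlation`). The load-time and conversion figures are hypothetical, and a strong correlation on its own does not show that one variable causes the other.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: page load time (s) vs. checkout conversion rate (%)
load_time = [1.0, 2.0, 3.0, 4.0, 5.0]
conversion = [4.8, 4.1, 3.5, 2.2, 1.9]
r = pearson_r(load_time, conversion)
print(f"r = {r:.3f}")  # strongly negative: slower pages go with lower conversion here
```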

13. Predictive Analysis

Predictive analysis involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.

14. Prescriptive Analysis

Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.
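Prescriptive analysis typically pairs a predictive model with an optimizer. In the toy sketch below, the "predictions" are a hypothetical table of expected conversions for each number of budget units given to a channel, and an exhaustive search picks the allocation that maximizes the total within the budget. Real problems usually call for a proper solver (for example, linear or integer programming) rather than brute force.

```python
from itertools import product

# Hypothetical predicted conversions for 0, 1, 2, or 3 budget units per channel
RESPONSE = {
    "search": [0, 50, 90, 120],
    "social": [0, 40, 75, 100],
    "email": [0, 30, 45, 55],
}
BUDGET_UNITS = 4

def best_allocation(response, budget):
    """Exhaustively search integer allocations whose total stays within budget."""
    channels = list(response)
    best_plan, best_value = None, float("-inf")
    for plan in product(*(range(len(response[c])) for c in channels)):
        if sum(plan) > budget:
            continue  # constraint: the budget cannot be exceeded
        value = sum(response[c][units] for c, units in zip(channels, plan))
        if value > best_value:
            best_plan, best_value = dict(zip(channels, plan)), value
    return best_plan, best_value

plan, best_value = best_allocation(RESPONSE, BUDGET_UNITS)
print(plan, best_value)
```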


Applications of Data Analysis

Data analysis is a versatile and indispensable tool that finds applications across various industries and domains. Its ability to extract actionable insights from data has made it a fundamental component of decision-making and problem-solving. Let's explore some of the key applications of data analysis:

1. Business and Marketing

  • Market Research: Data analysis helps businesses understand market trends, consumer preferences, and competitive landscapes. It aids in identifying opportunities for product development, pricing strategies, and market expansion.
  • Sales Forecasting: Data analysis models can predict future sales based on historical data, seasonality, and external factors. This helps businesses optimize inventory management and resource allocation.

2. Healthcare and Life Sciences

  • Disease Diagnosis: Data analysis is vital in medical diagnostics, from interpreting medical images (e.g., MRI, X-rays) to analyzing patient records. Machine learning models can assist in early disease detection.
  • Drug Discovery: Pharmaceutical companies use data analysis to identify potential drug candidates, predict their efficacy, and optimize clinical trials.
  • Genomics and Personalized Medicine: Genomic data analysis enables personalized treatment plans by identifying genetic markers that influence disease susceptibility and response to therapies.

3. Finance

  • Risk Management: Financial institutions use data analysis to assess credit risk, detect fraudulent activities, and model market risks.
  • Algorithmic Trading: Data analysis is integral to developing trading algorithms that analyze market data and execute trades automatically based on predefined strategies.
  • Fraud Detection: Credit card companies and banks employ data analysis to identify unusual transaction patterns and detect fraudulent activities in real time.

4. Manufacturing and Supply Chain

  • Quality Control: Data analysis monitors and controls product quality on manufacturing lines. It helps detect defects and ensure consistency in production processes.
  • Inventory Optimization: By analyzing demand patterns and supply chain data, businesses can optimize inventory levels, reduce carrying costs, and ensure timely deliveries.

5. Social Sciences and Academia

  • Social Research: Researchers in social sciences analyze survey data, interviews, and textual data to study human behavior, attitudes, and trends. It helps in policy development and understanding societal issues.
  • Academic Research: Data analysis is crucial to scientific research in physics, biology, and environmental science. It assists in interpreting experimental results and drawing conclusions.

6. Internet and Technology

  • Search Engines: Google uses complex data analysis algorithms to retrieve and rank search results based on user behavior and relevance.
  • Recommendation Systems: Services like Netflix and Amazon leverage data analysis to recommend content and products to users based on their past preferences and behaviors.

7. Environmental Science

  • Climate Modeling: Data analysis is essential in climate science. It analyzes temperature, precipitation, and other environmental data. It helps in understanding climate patterns and predicting future trends.
  • Environmental Monitoring: Remote sensing data analysis monitors ecological changes, including deforestation, water quality, and air pollution.

Top Data Analysis Techniques to Analyze Data

1. Descriptive Statistics

Descriptive statistics provide a snapshot of a dataset's central tendencies and variability. These techniques help summarize and understand the data's basic characteristics.

2. Inferential Statistics

Inferential statistics involve making predictions or inferences based on a sample of data. Techniques include hypothesis testing, confidence intervals, and regression analysis. These methods are crucial for drawing conclusions from data and assessing the significance of findings.

3. Regression Analysis

It explores the relationship between one or more independent variables and a dependent variable. It is widely used for prediction and understanding causal links. Linear, logistic, and multiple regression are common in various fields.

4. Clustering Analysis

It is an unsupervised learning method that groups similar data points. K-means clustering and hierarchical clustering are examples. This technique is used for customer segmentation, anomaly detection, and pattern recognition.
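A compact sketch of k-means (Lloyd's algorithm): alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points until nothing changes. The 2-D customer points are hypothetical, and initializing from randomly sampled data points is a simplification; scikit-learn's `KMeans`, for example, defaults to the smarter k-means++ initialization.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments will not change either
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customer data: (visits per month, average basket size)
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(centroids))
```

k-means can converge to a poor local optimum on harder data, which is why libraries typically run it several times from different starting points and keep the best result.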

5. Classification Analysis

Classification analysis assigns data points to predefined categories or classes. It's often used in applications like spam email detection, image recognition, and sentiment analysis. Popular algorithms include decision trees, support vector machines, and neural networks.
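Decision trees, support vector machines, and neural networks need a fair amount of code, so this sketch swaps in the simplest classifier, k-nearest neighbours, to show the idea of assigning a new item to a predefined class. The features and labels are hypothetical; in practice, features should be scaled first, because here the word count dominates the distance.

```python
import math
from collections import Counter

# Hypothetical labelled emails: ((word_count, link_count), label)
training = [
    ((120, 0), "ham"), ((200, 1), "ham"), ((150, 0), "ham"),
    ((40, 6), "spam"), ((30, 8), "spam"), ((55, 5), "spam"),
]

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (45, 7)))
print(knn_predict(training, (180, 0)))
```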

6. Time Series Analysis

Time series analysis deals with data collected over time, making it suitable for forecasting and trend analysis. Techniques like moving averages, autoregressive integrated moving averages (ARIMA), and exponential smoothing are applied in fields like finance, economics, and weather forecasting.

7. Text Analysis (Natural Language Processing - NLP)

Text analysis techniques, part of NLP, enable the extraction of insights from textual data. These methods include sentiment analysis, topic modeling, and named entity recognition. Text analysis is widely used for analyzing customer reviews, social media content, and news articles.

8. Principal Component Analysis

It is a dimensionality reduction technique that simplifies complex datasets while retaining important information. It transforms correlated variables into a set of linearly uncorrelated variables, making it easier to analyze and visualize high-dimensional data.
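For two variables, principal component analysis can be done by hand: build the 2x2 covariance matrix, take its eigenvalues and leading eigenvector, and project the centered data onto that direction. The measurements are hypothetical; for more dimensions you would use a library routine such as `numpy.linalg.eigh` or scikit-learn's `PCA` instead.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the closed-form eigendecomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)  # var(x)
    c = sum(y * y for _, y in centered) / (n - 1)  # var(y)
    b = sum(x * y for x, y in centered) / (n - 1)  # cov(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for the largest eigenvalue = first principal component
    if b != 0:
        v = (lam1 - c, b)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    pc1 = (v[0] / norm, v[1] / norm)
    explained = lam1 / (lam1 + lam2)  # share of total variance captured by pc1
    scores = [x * pc1[0] + y * pc1[1] for x, y in centered]  # 1-D projection
    return pc1, explained, scores

# Hypothetical paired measurements
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
pc1, explained, scores = pca_2d(points)
print(f"first component {pc1}, explains {explained:.1%} of the variance")
```

Because the first component here captures most of the variance, the one-dimensional `scores` retain most of the information in the two original columns.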

9. Anomaly Detection

Anomaly detection identifies unusual patterns or outliers in data. It's critical in fraud detection, network security, and quality control. Techniques like statistical methods, clustering-based approaches, and machine learning algorithms are employed for anomaly detection.
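One of the simplest statistical approaches flags values outside the interquartile-range fences (Tukey's rule: below Q1 - 1.5 IQR or above Q3 + 1.5 IQR). The transaction amounts are hypothetical, and the 1.5 multiplier is a convention rather than a universal threshold.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily card transaction amounts for one customer
amounts = [52, 48, 50, 47, 53, 51, 49, 250, 50, 46]
print(iqr_outliers(amounts))  # prints [250]
```

Quartile-based fences are less distorted by the outlier itself than a mean-and-standard-deviation rule, which matters in small samples.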

10. Data Mining

Data mining involves the automated discovery of patterns, associations, and relationships within large datasets. Techniques like association rule mining, frequent pattern analysis, and decision tree mining extract valuable knowledge from data.
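A brute-force sketch of the two basic measures behind association rule mining, support and confidence, for rules between pairs of items. The shopping baskets are hypothetical; algorithms such as Apriori or FP-Growth exist precisely to avoid counting every combination in large datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))

def rule_stats(antecedent, consequent):
    """Support and confidence for the rule {antecedent} -> {consequent}."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / n  # share of baskets containing both items
    confidence = pair_counts[pair] / item_counts[antecedent]  # P(consequent | antecedent)
    return support, confidence

print(rule_stats("bread", "milk"))
print(rule_stats("butter", "bread"))
```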

11. Machine Learning and Deep Learning

ML and deep learning algorithms are applied for predictive modeling, classification, and regression tasks. Techniques like random forests, support vector machines, and convolutional neural networks (CNNs) have revolutionized various industries, including healthcare, finance, and image recognition.

12. Geographic Information Systems (GIS) Analysis

GIS analysis combines geographical data with spatial analysis techniques to solve location-based problems. It's widely used in urban planning, environmental management, and disaster response.

What Is the Importance of Data Analysis in Research?

  • Uncovering Patterns and Trends: Data analysis allows researchers to identify patterns, trends, and relationships within the data. By examining these patterns, researchers can better understand the phenomena under investigation. For example, in epidemiological research, data analysis can reveal the trends and patterns of disease outbreaks, helping public health officials take proactive measures.
  • Testing Hypotheses: Research often involves formulating hypotheses and testing them. Data analysis provides the means to evaluate hypotheses rigorously. Through statistical tests and inferential analysis, researchers can determine whether the observed patterns in the data are statistically significant or simply due to chance.
  • Making Informed Conclusions: Data analysis helps researchers draw meaningful and evidence-based conclusions from their research findings. It provides a quantitative basis for making claims and recommendations. In academic research, these conclusions form the basis for scholarly publications and contribute to the body of knowledge in a particular field.
  • Enhancing Data Quality: Data analysis includes data cleaning and validation processes that improve the quality and reliability of the dataset. Identifying and addressing errors, missing values, and outliers ensures that the research results accurately reflect the phenomena being studied.
  • Supporting Decision-Making: In applied research, data analysis assists decision-makers in various sectors, such as business, government, and healthcare. Policy decisions, marketing strategies, and resource allocations are often based on research findings.
  • Identifying Outliers and Anomalies: Outliers and anomalies in data can hold valuable information or indicate errors. Data analysis techniques can help identify these exceptional cases, whether in medical diagnosis, financial fraud detection, or product quality control.
  • Revealing Insights: Research data often contain hidden insights that are not immediately apparent. Data analysis techniques, such as clustering or text analysis, can uncover these insights. In the social sciences, for example, sentiment analysis of social media data can reveal public sentiment and trends on various topics.
  • Forecasting and Prediction: Data analysis allows for the development of predictive models. Researchers can use historical data to build models forecasting future trends or outcomes. This is valuable in fields like finance for stock price predictions, meteorology for weather forecasting, and epidemiology for disease spread projections.
  • Optimizing Resources: Research often involves resource allocation. Data analysis helps researchers and organizations optimize resource use by identifying areas where improvements can be made or costs can be reduced.
  • Continuous Improvement: Data analysis supports the iterative nature of research. Researchers can analyze data, draw conclusions, and refine their hypotheses or research designs based on their findings. This cycle of analysis and refinement leads to continuous improvement in research methods and understanding.

Future Trends in Data Analysis

Data analysis is an ever-evolving field driven by technological advancements. The future of data analysis promises exciting developments that will reshape how data is collected, processed, and utilized. Here are some of the key trends in data analysis:

1. Artificial Intelligence and Machine Learning Integration

Artificial intelligence (AI) and machine learning (ML) are expected to play a central role in data analysis. These technologies can automate complex data processing tasks, identify patterns at scale, and, given suitable training data, make accurate predictions. AI-driven analytics tools are likely to become more accessible, enabling organizations to harness ML without requiring extensive expertise.

2. Augmented Analytics

Augmented analytics combines AI and natural language processing (NLP) to assist data analysts in finding insights. These tools can automatically generate narratives, suggest visualizations, and highlight important trends within data. They enhance the speed and efficiency of data analysis, making it more accessible to a broader audience.

3. Data Privacy and Ethical Considerations

As data collection becomes more pervasive, privacy concerns and ethical considerations will gain prominence. Future data analysis trends will prioritize responsible data handling, transparency, and compliance with regulations like GDPR. Differential privacy techniques and data anonymization will be crucial in balancing data utility with privacy protection.
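Differential privacy is often introduced through the Laplace mechanism: add random noise with scale sensitivity / epsilon to a query result. For a count, adding or removing one person changes the answer by at most 1, so the sensitivity is 1. The sketch below is for illustration only, with a hypothetical count and epsilon; naive floating-point implementations have known weaknesses, so production systems should rely on a vetted differential privacy library.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng):
    """Release a count under epsilon-differential privacy (a count's sensitivity is 1)."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(7)
# Hypothetical query: number of survey respondents reporting a condition
released = private_count(100, epsilon=1.0, rng=rng)
print(round(released, 2))
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means more accurate releases and weaker guarantees.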

4. Real-time and Streaming Data Analysis

The demand for real-time insights will drive the adoption of real-time and streaming data analysis. Organizations will leverage technologies like Apache Kafka and Apache Flink to process and analyze data as it is generated. This trend is essential for fraud detection, IoT analytics, and monitoring systems.
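Stream processors such as Kafka or Flink handle transport and scaling, but the per-event logic often reduces to incremental updates like the one sketched here: Welford's online algorithm keeps a running mean and variance without storing past events. This is plain Python, not a Kafka or Flink API, and the event values are hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: update mean and variance one event at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the new mean

    @property
    def variance(self):
        """Sample variance of all events seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for latency_ms in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical events arriving one by one
    stats.update(latency_ms)
print(stats.n, stats.mean, stats.variance)
```

Constant memory per metric is what makes this pattern suitable for real-time monitoring and edge devices.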

5. Quantum Computing

Quantum computing could transform data analysis by solving certain classes of complex problems much faster than classical computers. Although the technology is still in its infancy, its impact on optimization, cryptography, and simulations could be significant once practical quantum computers become available.

6. Edge Analytics

With the proliferation of edge devices in the Internet of Things (IoT), data analysis is moving closer to the data source. Edge analytics allows for real-time processing and decision-making at the network's edge, reducing latency and bandwidth requirements.

7. Explainable AI (XAI)

Interpretable and explainable AI models will become crucial, especially in applications where trust and transparency are paramount. XAI techniques aim to make AI decisions more understandable and accountable, which is critical in healthcare and finance.

8. Data Democratization

The future of data analysis will see more democratization of data access and analysis tools. Non-technical users will have easier access to data and analytics through intuitive interfaces and self-service BI tools, reducing the reliance on data specialists.

9. Advanced Data Visualization

Data visualization tools will continue to evolve, offering more interactivity, 3D visualization, and augmented reality (AR) capabilities. Advanced visualizations will help users explore data in new and immersive ways.

10. Ethnographic Data Analysis

Ethnographic data analysis will gain importance as organizations seek to understand human behavior, cultural dynamics, and social trends. Combined with quantitative methods, this qualitative approach can provide a holistic understanding of complex issues.

11. Data Analytics Ethics and Bias Mitigation

Ethical considerations in data analysis will remain a key trend. Efforts to identify and mitigate bias in algorithms and models will become standard practice, ensuring fair and equitable outcomes.


1. What is the difference between data analysis and data science? 

Data analysis primarily involves extracting meaningful insights from existing data using statistical techniques and visualization tools. Data science, by contrast, covers a broader spectrum: it includes data analysis as a subset but also involves machine learning, deep learning, and predictive modeling to build data-driven solutions and algorithms.

2. What are the common mistakes to avoid in data analysis?

Common mistakes to avoid in data analysis include neglecting data quality issues, failing to define clear objectives, overcomplicating visualizations, not considering algorithmic biases, and disregarding the importance of proper data preprocessing and cleaning. It is also crucial to avoid unwarranted assumptions and not to mistake correlation for causation.


Research Guide: Data analysis and reporting findings


Data analysis and findings

Data analysis is the most crucial part of any research. It summarizes the collected data and interprets it, using analytical and logical reasoning to determine patterns, relationships or trends.

Data Analysis Checklist

Cleaning data

* Did you capture and code your data in the right manner?

* Is your data complete, or is some of it missing?

* Do you have enough observations?

* Do you have any outliers? If so, what is the remedy for them?

* Does your data have the potential to answer your questions?
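The missing-data and outlier questions above can be checked with a few lines of code. The sketch below is a minimal Python illustration, not part of the original guide: the data and the check_column helper are hypothetical, missing values are assumed to be stored as None, and outliers are flagged with the common 1.5 × IQR rule (Tukey's fences), which is only one possible criterion.

```python
import statistics

def check_column(values):
    """Count missing entries and flag outliers with the 1.5 x IQR rule."""
    # Treat None as a missing observation
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles (default 'exclusive' method) and interquartile range
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Values outside the fences are flagged, not deleted
    outliers = [v for v in present if v < low or v > high]
    return missing, outliers

# Hypothetical measurements with one missing value and one suspicious entry
missing, outliers = check_column([12, 15, 14, None, 13, 16, 95, 14])
print(missing, outliers)  # 1 [95]
```

Flagged values are reported rather than deleted: whether an outlier is corrected, removed or kept depends on whether it is a recording error or a genuine observation.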

Analyzing data

* Visualize your data, e.g. with charts, tables, and graphs.

* Identify patterns, correlations, and trends

* Test your hypotheses

* Let your data tell a story
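Correlations can be quantified as well as visualized. A minimal sketch, assuming two hypothetical numeric variables measured on the same observations, computes Pearson's correlation coefficient r, which ranges from −1 to 1 (the pearson_r helper name is illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Joint deviation and individual spread around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired observations
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3))  # 0.775
```

r only measures linear association; a strong correlation on its own does not show that one variable causes the other.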

Report the results

* Communicate and interpret the results

* Conclude and recommend

* Your targeted audience must understand your results

* Use more datasets and samples

* Use an accessible and understandable data analysis tool

* Do not delegate your data analysis

* Clean data to confirm that they are complete and free from errors

* Analyze cleaned data

* Understand your results

* Keep in mind who will be reading your results and present them in a way they will understand

* Share the results with your supervisor often

Past presentations

  • PhD Writing Retreat - Analysing_Fieldwork_Data by Cori Wielenga A clear and concise presentation on the ‘now what’ and ‘so what’ of data collection and analysis - compiled and originally presented by Cori Wielenga.

Online Resources


  • Qualitative analysis of interview data: A step-by-step guide
  • Qualitative Data Analysis - Coding & Developing Themes


  • Last Updated: Apr 22, 2024 11:02 AM
  • URL: https://library.up.ac.za/c.php?g=485435


HCA Healthc J Med, v.1(2); 2020. PMC10324782

Introduction to Research Statistical Analysis: An Overview of the Basics

Christian Vandever

HCA Healthcare Graduate Medical Education

Description

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify how they shape the way the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored, in both its linear and logistic forms. Finally, the most common statistics produced by these methods are explained.

Introduction

Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, without diving too deep into any specific methodology. Some of the information is more applicable to retrospective projects, where analysis is performed on data that has already been collected, but most of it will be suitable for any type of research. This primer is meant to help the reader understand research results in coordination with a statistician, not to perform the actual analysis. Analysis is commonly performed using statistical programming software such as R, SAS or SPSS, which allows analyses to be replicated while minimizing the risk of error. Resources are listed later for those working on analysis without a statistician.

After coming up with a hypothesis for a study, including any variables to be used, one of the first steps is to think about the patient population to which the question applies. Results are only relevant to the population that the underlying data represents. Since it is impractical to include everyone with a certain condition, a subset of the population of interest should be taken. This subset should be large enough to have adequate power, meaning there is enough data to detect a real effect if one exists and to accurately reflect the study's population.

The first statistics of interest are related to significance level and power: alpha and beta. Alpha (α) is the significance level and the probability of a type I error, the rejection of the null hypothesis when it is true. The null hypothesis is generally that there is no difference between the groups compared. A type I error is also known as a false positive. An example would be an analysis that finds one medication statistically better than another when in reality there is no difference in efficacy between the two. Beta (β) is the probability of a type II error, the failure to reject the null hypothesis when it is actually false. A type II error is also known as a false negative. This occurs when the analysis finds no difference between two medications when in reality one works better than the other. Power is defined as 1 − β and should be calculated prior to running any sort of statistical testing. Ideally, alpha should be as small as possible while power should be as large as possible. Power generally increases with a larger sample size, but so do cost and the effect of any bias in the study design. Additionally, as the sample size gets bigger, the chance of a statistically significant result goes up, even though these results can reflect small differences that do not matter practically. Power calculators include the magnitude of the effect as an input to guard against this exaggeration, so that a study is sized to detect only differences with an actual impact. The calculators take inputs such as the significance level, the mean, the effect size and the desired power, and output the required minimum sample size for analysis. Effect size is calculated using statistical information on the variables of interest. If that information is not available, most tests have commonly used values for small, medium or large effect sizes.
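The sample-size arithmetic a power calculator performs can be sketched for the simplest case. Assuming a two-sided comparison of two group means with equal group sizes and a standardized effect size d (Cohen's d, for which 0.2, 0.5 and 0.8 are the conventional small, medium and large values), the normal approximation gives n per group = 2 × (z(1 − α/2) + z(1 − β))² / d². The Python helper below is hypothetical; dedicated power software typically uses the t distribution and may return a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power (1 - beta)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up: a fractional patient is not possible

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # 393, 63 and 25 patients per group
```

Halving the effect size roughly quadruples the required sample, which is why the assumed effect size matters so much when planning a study.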

When the desired patient population is decided, the next step is to define the variables previously chosen to be included. Variables come in different types that determine which statistical methods are appropriate and useful. One way variables can be split is into categorical and quantitative variables (Table 1). Categorical variables place patients into groups, such as gender, race and smoking status. Quantitative variables measure or count some quantity of interest. Common quantitative variables in research include age and weight. An important note is that there is often a choice of whether to treat a variable as quantitative or categorical. For example, in a study looking at body mass index (BMI), BMI could be defined as a quantitative variable, or as a categorical variable with each patient's BMI listed as a category (underweight, normal, overweight, and obese) rather than as a numeric value. The decision whether a variable is quantitative or categorical will affect what conclusions can be made when interpreting results from statistical tests. Keep in mind that, since quantitative variables are treated on a continuous scale, it would be inappropriate to transform a variable like which medication was given into a quantitative variable with values 1, 2 and 3.
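Both transformations mentioned above can be sketched in Python. The BMI cut-offs used here are the WHO adult categories (an assumption; a study may define its own), and indicator (dummy) coding is shown as one common alternative to numbering medications 1, 2 and 3: each non-reference medication gets its own 0/1 column, so no order or distance between medications is implied. The function names and the choice of apixaban as the reference level are hypothetical.

```python
# WHO adult BMI cut-offs (kg/m^2), used here as an example categorization
def bmi_category(bmi):
    """Turn a quantitative BMI value into a categorical variable."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

MEDICATIONS = ["apixaban", "rivaroxaban", "dabigatran"]

def dummy_code(medication, levels=MEDICATIONS, reference="apixaban"):
    """Indicator (0/1) columns for a categorical variable, one per non-reference level."""
    return {f"is_{level}": int(medication == level) for level in levels if level != reference}

print(bmi_category(22.0), bmi_category(31.2))  # normal obese
print(dummy_code("rivaroxaban"))  # {'is_rivaroxaban': 1, 'is_dabigatran': 0}
```

A patient on the reference medication gets 0 in every indicator column, so each indicator's effect is read relative to apixaban.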

Table 1. Categorical vs. Quantitative Variables

Both of these types of variables can also be split into response and predictor variables (Table 2). Predictor variables are explanatory, or independent, variables that help explain changes in a response variable. Conversely, response variables are outcome, or dependent, variables whose changes can be partially explained by the predictor variables.

Table 2. Response vs. Predictor Variables

Choosing the correct statistical test depends on the types of variables being compared and the question being answered. Some common statistical tests include t-tests, ANOVA and chi-square tests.

T-tests compare whether there are differences in a quantitative variable between two values of a categorical variable. For example, a t-test could be useful to compare the length of stay for knee replacement surgery patients between those who took apixaban and those who took rivaroxaban. A t-test could examine whether there is a statistically significant difference in the length of stay between the two groups. The t-test will output a p-value, a number between zero and one, which represents the probability that the two groups could be at least as different as they are in the data if they were actually the same. A value closer to zero suggests that the difference, in this case in length of stay, is more statistically significant than a value closer to one. Prior to collecting the data, set a significance level, the previously defined alpha. Alpha is typically set at 0.05, but is commonly reduced in order to limit the chance of a type I error, or false positive. Going back to the example above, if alpha is set at 0.05 and the analysis gives a p-value of 0.039, then a statistically significant difference in length of stay is observed between apixaban and rivaroxaban patients. If the analysis gives a p-value of 0.91, then there is no statistical evidence of a difference in length of stay between the two medications. Other statistical summaries or methods examine how large that difference might be. These other summaries are known as post-hoc analyses, since they are performed after the original test to provide additional context to the results.
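For illustration, the calculation behind a two-sample t-test can be written out directly. This sketch uses Welch's version of the test, which does not assume equal variances in the two groups (an assumption of this example, not a choice made in the article), and hypothetical length-of-stay data. It returns the t statistic and degrees of freedom; the p-value is then read from a t distribution with those degrees of freedom. In practice a library call such as SciPy's scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test and reports the p-value directly.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical length-of-stay data (days)
apixaban = [3, 4, 2, 5, 3, 4]
rivaroxaban = [5, 6, 4, 7, 5, 6]
t, df = welch_t(apixaban, rivaroxaban)
print(round(t, 2), round(df, 1))  # -3.3 10.0
```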

Analysis of variance, or ANOVA, tests can observe mean differences in a quantitative variable between values of a categorical variable, typically one with three or more values (with only two values, a t-test is typically used instead). ANOVA could add patients given dabigatran to the previous population and evaluate whether the length of stay was significantly different across the three medications. If the p-value is lower than the designated significance level, then the hypothesis that length of stay was the same across the three medications is rejected. Summaries and post-hoc tests could also be performed to examine the differences in length of stay and to identify which individual medications showed statistically significant differences in length of stay from the others.

A chi-square test examines the association between two categorical variables. An example would be to consider whether the rate of having a post-operative bleed is the same across patients provided with apixaban, rivaroxaban and dabigatran. A chi-square test can compute a p-value determining whether the bleeding rates were significantly different or not. Post-hoc tests could then give the bleeding rate for each medication, as well as a breakdown of which specific medications may have significantly different bleeding rates from each other.
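The test statistics behind a one-way ANOVA and a chi-square test of independence can also be computed by hand. The sketch below uses hypothetical length-of-stay values and bleeding counts for the three medications in the example, and the helper names are illustrative. It stops at the F and chi-square statistics and their degrees of freedom; the p-values would come from the F and chi-square distributions (for example via scipy.stats.f_oneway and scipy.stats.chi2_contingency).

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs within-group variability."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Sums of squares between groups and within groups
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical length of stay (days) for apixaban, rivaroxaban, dabigatran
f, df_b, df_w = anova_f([[3, 4, 2, 5, 3, 4], [5, 6, 4, 7, 5, 6], [4, 5, 3, 6, 4, 5]])
# Hypothetical counts: rows = medication, columns = [bleed, no bleed]
chi2, df = chi_square([[10, 90], [20, 80], [15, 85]])
print(round(f, 2), df_b, df_w, round(chi2, 2), df)  # 5.45 2 15 3.92 2
```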

A slightly more advanced way of examining a question is multiple regression. Regression allows more predictor variables to be analyzed and can act as a control when looking at associations between variables. Common control variables are age, sex and any comorbidities likely to affect the outcome variable that are not closely related to the other explanatory variables. Control variables can be especially important in reducing the effect of bias in a retrospective population. Since retrospective data was not collected with the research question in mind, it is important to eliminate threats to the validity of the analysis. Testing that controls for confounding variables, such as regression, is often more valuable with retrospective data because it can ease these concerns. The two main types of regression are linear and logistic. Linear regression is used to predict differences in a quantitative, continuous response variable, such as length of stay. Logistic regression predicts differences in a dichotomous, categorical response variable, such as 90-day readmission. So whether the outcome variable is categorical or quantitative, regression can be appropriate. Two similar cases illustrate each type. For both, define the predictor variables as age, gender and anticoagulant usage. In the first, use the predictor variables in a linear regression to evaluate their individual effects on length of stay, a quantitative variable. In the second, use the same predictor variables in a logistic regression to evaluate their individual effects on whether the patient had a 90-day readmission, a dichotomous categorical variable. The analysis can compute a p-value for each included predictor variable to determine whether it is significantly associated with the outcome.

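To make the linear case concrete, here is a minimal sketch of ordinary least squares with two predictors, fitted by solving the normal equations. The data, the choice of only two predictors (age and a hypothetical 0/1 anticoagulant indicator) and the ols helper are all illustrative, and the outcome is constructed without noise so the recovered coefficients are easy to check. Real analyses would normally use a statistics package (for example the OLS and Logit models in statsmodels), which also reports the standard errors and per-predictor p-values this sketch omits; logistic regression is fitted iteratively and is not shown.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *r] for r in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (aug[i][p] - sum(aug[i][j] * coef[j] for j in range(i + 1, p))) / aug[i][i]
    return coef

# Hypothetical patients: (age, on_anticoagulant) -> length of stay (days)
X = [(50, 0), (60, 1), (70, 0), (80, 1), (65, 1)]
y = [3.5, 5.5, 4.5, 6.5, 5.75]
print([round(b, 3) for b in ols(X, y)])  # [1.0, 0.05, 1.5]
```

Each coefficient is the expected change in length of stay for a one-unit change in that predictor while the other predictor is held fixed, which is what "controlling" for a variable means here.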
The statistical tests in this article generate an associated test statistic, from which the probability of obtaining results at least as extreme, given no association between the compared variables, is determined. These results often come with coefficients, which indicate the strength of the association and how much one variable changes with another. Most tests, including all those listed in this article, also produce confidence intervals, which give a range for the estimated association at a specified level of confidence. Even if these tests do not give statistically significant results, the results are still important. Not reporting statistically insignificant findings creates a bias in research: if an idea is tested enough times, statistically significant results will eventually be reached even though there is no true effect. In some cases with very large sample sizes, p-values will almost always be significant. In that case the effect size is critical, as even the smallest, meaningless differences can be found to be statistically significant.
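The point about very large samples can be made concrete with Cohen's d, a standardized effect size (the difference in means divided by the pooled standard deviation). For two equal-sized groups, the pooled t statistic equals d × √(n/2), so a fixed, tiny effect produces an ever larger t as n grows. A minimal sketch with hypothetical numbers and illustrative helper names:

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def t_from_d(d, n_per_group):
    """Pooled two-sample t statistic implied by effect size d with equal group sizes."""
    return d * math.sqrt(n_per_group / 2)

# A tiny effect (d = 0.02) only becomes "significant" once the groups are large
for n in (100, 10_000, 100_000):
    print(n, round(t_from_d(0.02, n), 2))
```

With 100,000 patients per group, an effect of d = 0.02 gives t ≈ 4.47, well beyond the roughly 1.96 needed for significance at α = 0.05, even though a difference that small is unlikely to matter clinically.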

These variables and tests are just some things to keep in mind before, during and after the analysis process in order to make sure that the statistical reports are supporting the questions being answered. The patient population, types of variables and statistical tests are all important things to consider in the process of statistical analysis. Any results are only as useful as the process used to obtain them. This primer can be used as a reference to help ensure appropriate statistical analysis.

Funding Statement

This research was supported (in whole or in part) by HCA Healthcare and/or an HCA Healthcare affiliated entity.

Conflicts of Interest

The author declares he has no conflicts of interest.

Christian Vandever is an employee of HCA Healthcare Graduate Medical Education, an organization affiliated with the journal’s publisher.

The views expressed in this publication represent those of the author(s) and do not necessarily represent the official views of HCA Healthcare or any of its affiliated entities.

Analyst Answers


Data Analysis: Types, Methods & Techniques (a Complete List)

(Updated Version)

While the term sounds intimidating, “data analysis” is nothing more than making sense of information in a table. It consists of filtering, sorting, grouping, and manipulating data tables with basic algebra and statistics.

In fact, you don’t need experience to understand the basics. You have already worked with data extensively in your life, and “analysis” is nothing more than a fancy word for good sense and basic logic.

Over time, people have intuitively categorized the best logical practices for treating data. These categories are what we today call types, methods, and techniques.

This article provides a comprehensive list of types, methods, and techniques, and explains the difference between them.

For a practical intro to data analysis (including types, methods, & techniques), check out our Intro to Data Analysis eBook for free.

Descriptive, Diagnostic, Predictive, & Prescriptive Analysis

If you Google “types of data analysis,” the first few results will explore descriptive, diagnostic, predictive, and prescriptive analysis. Why? Because these names are easy to understand and are used a lot in “the real world.”

Descriptive analysis is an informational method, diagnostic analysis explains “why” a phenomenon occurs, predictive analysis seeks to forecast the result of an action, and prescriptive analysis identifies solutions to a specific problem.

That said, these are only four branches of a larger analytical tree.

Good data analysts know how to position these four types within other analytical methods and tactics, allowing them to leverage the strengths and weaknesses of each to unearth the most valuable insights.

Let’s explore the full analytical tree to understand how to appropriately assess and apply these four traditional types.

Tree diagram of Data Analysis Types, Methods, and Techniques

Here’s a picture to visualize the structure and hierarchy of data analysis types, methods, and techniques.


Note: basic descriptive statistics such as mean, median, and mode, as well as standard deviation, are not shown because most people are already familiar with them. In the diagram, they would fall under the “descriptive” analysis type.

Tree Diagram Explained

The highest-level classification of data analysis is quantitative vs. qualitative. Quantitative implies numbers, while qualitative implies information other than numbers.

Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis. Mathematical types then branch into descriptive, diagnostic, predictive, and prescriptive.

Methods falling under mathematical analysis include clustering, classification, forecasting, and optimization. Qualitative data analysis methods include content analysis, narrative analysis, discourse analysis, framework analysis, and/or grounded theory.

Moreover, mathematical techniques include regression, Naïve Bayes, Simple Exponential Smoothing, cohorts, factors, linear discriminants, and more, whereas techniques falling under the AI type include artificial neural networks, decision trees, evolutionary programming, and fuzzy logic. Techniques under qualitative analysis include text analysis, coding, idea pattern analysis, and word frequency.

It’s a lot to remember! Don’t worry, once you understand the relationship and motive behind all these terms, it’ll be like riding a bike.

We’ll move down the list from top to bottom, and I encourage you to open the tree diagram above in a new tab so you can follow along.

But first, let’s just address the elephant in the room: what’s the difference between methods and techniques anyway?

Difference between methods and techniques

Though often used interchangeably, methods and techniques are not the same. By definition, methods are the process by which techniques are applied, and techniques are the practical application of those methods.

For example, consider driving. Methods include staying in your lane, stopping at a red light, and parking in a spot. Techniques include turning the steering wheel, braking, and pushing the gas pedal.

Data sets: observations and fields

It’s important to understand the basic structure of data tables to comprehend the rest of the article. A data set consists of one far-left column containing observations, then a series of columns containing the fields (aka “traits” or “characteristics”) that describe each observation. For example, imagine we want a data table for fruit: each row would be one fruit (an observation), and each column a trait describing that fruit.
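In code, the same structure is often held as a list of records. A minimal Python sketch; the fruits, field names and values are hypothetical illustrations, not taken from the original table:

```python
# Each dict is one observation (row); each key after "fruit" is a field (column).
fruit_table = [
    {"fruit": "apple", "color": "red", "weight_g": 150},
    {"fruit": "banana", "color": "yellow", "weight_g": 120},
    {"fruit": "grape", "color": "green", "weight_g": 5},
]

observations = [row["fruit"] for row in fruit_table]        # the far-left column
fields = [key for key in fruit_table[0] if key != "fruit"]  # traits describing each observation
print(observations)  # ['apple', 'banana', 'grape']
print(fields)        # ['color', 'weight_g']
```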

Now let’s turn to types, methods, and techniques. Each heading below consists of a description, relative importance, the nature of data it explores, and the motivation for using it.

Quantitative Analysis

  • It accounts for more than 50% of all data analysis and is by far the most widespread and well-known type of data analysis.
  • As you have seen, it holds the descriptive, diagnostic, predictive, and prescriptive types, which in turn hold some of the most important methods and techniques available today, such as clustering and forecasting.
  • It can be broken down into mathematical and AI analysis.
  • Importance: Very high. Quantitative analysis is a must for anyone interested in becoming or improving as a data analyst.
  • Nature of Data: data treated under quantitative analysis is, quite simply, quantitative. It encompasses all numeric data.
  • Motive: to extract insights. (Note: we’re at the top of the pyramid, this gets more insightful as we move down.)

Qualitative Analysis

  • It accounts for less than 30% of all data analysis and is common in social sciences.
  • It can refer to the simple recognition of qualitative elements, which is not analytic in any way, but most often refers to methods that assign numeric values to non-numeric data for analysis.
  • Because of this, some argue that it’s ultimately a quantitative type.
  • Importance: Medium. In general, knowing qualitative data analysis is not common or even necessary for corporate roles. However, for researchers working in social sciences, its importance is very high.
  • Nature of Data: data treated under qualitative analysis is non-numeric. However, as part of the analysis, analysts turn non-numeric data into numbers, at which point many argue it is no longer qualitative analysis.
  • Motive: to extract insights. (This will be more important as we move down the pyramid.)

Mathematical Analysis

  • Description: mathematical data analysis is a subtype of quantitative data analysis that designates methods and techniques based on statistics, algebra, and logical reasoning to extract insights. It stands in opposition to artificial intelligence analysis.
  • Importance: Very High. The most widespread methods and techniques fall under mathematical analysis. In fact, it’s so common that many people use “quantitative” and “mathematical” analysis interchangeably.
  • Nature of Data: numeric. By definition, all data under mathematical analysis are numbers.
  • Motive: to extract measurable insights that can be used to act upon.

Artificial Intelligence & Machine Learning Analysis

  • Description: artificial intelligence and machine learning analyses designate techniques based on the titular skills. They are not traditionally mathematical, but they are quantitative since they use numbers. Applications of AI & ML analysis techniques are developing, but they’re not yet mainstream enough to show promise across the field.
  • Importance: Medium. As of today (September 2020), you don’t need to be fluent in AI & ML data analysis to be a great analyst. BUT, if it’s a field that interests you, learn it. Many believe that in 10 years’ time its importance will be very high.
  • Nature of Data: numeric.
  • Motive: to create calculations that build on themselves in order to extract insights without direct input from a human.

Descriptive Analysis

  • Description: descriptive analysis is a subtype of mathematical data analysis that uses methods and techniques to provide information about the size, dispersion, groupings, and behavior of data sets. This may sound complicated, but just think about mean, median, and mode: all three are types of descriptive analysis. They provide information about the data set. We’ll look at specific techniques below.
  • Importance: Very high. Descriptive analysis is among the most commonly used data analyses in both corporations and research today.
  • Nature of Data: the nature of data under descriptive statistics is sets. A set is simply a collection of numbers that behaves in predictable ways. Data reflects real life, and there are patterns everywhere to be found. Descriptive analysis describes those patterns.
  • Motive: the motive behind descriptive analysis is to understand how numbers in a set group together, how far apart they are from each other, and how often they occur. As with most statistical analysis, the more data points there are, the easier it is to describe the set.
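Python's standard statistics module covers the basic descriptive measures mentioned above. A minimal example on a hypothetical data set, showing measures of size (mean, median, mode) and of dispersion (sample standard deviation):

```python
import statistics

# Hypothetical data set
data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))             # 5
print(statistics.median(data))           # 4.0 (average of the two middle values)
print(statistics.mode(data))             # 3 (most frequent value)
print(round(statistics.stdev(data), 2))  # 3.03 (sample standard deviation)
```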

Diagnostic Analysis

  • Description: diagnostic analysis answers the question “why did it happen?” It is an advanced type of mathematical data analysis that manipulates multiple techniques, but does not own any single one. Analysts engage in diagnostic analysis when they try to explain why.
  • Importance: Very high. Diagnostics are probably the most important type of data analysis for people who don’t do analysis because they’re valuable to anyone who’s curious. They’re most common in corporations, as managers often only want to know the “why.”
  • Nature of Data: data under diagnostic analysis are data sets. These sets in themselves are not enough under diagnostic analysis. Instead, the analyst must know what’s behind the numbers in order to explain “why.” That’s what makes diagnostics so challenging yet so valuable.
  • Motive: the motive behind diagnostics is to diagnose — to understand why.

Predictive Analysis

  • Description: predictive analysis uses past data to project future data. It’s very often one of the first kinds of analysis new researchers and corporate analysts use because it is intuitive. It is a subtype of the mathematical type of data analysis, and its three notable techniques are regression, moving average, and exponential smoothing.
  • Importance: Very high. Predictive analysis is critical for any data analyst working in a corporate environment. Companies always want to know what the future will hold — especially for their revenue.
  • Nature of Data: Because past and future imply time, predictive data always includes an element of time. Whether it’s minutes, hours, days, months, or years, we call this time series data. In fact, this data is so important that I’ll mention it twice so you don’t forget: predictive analysis uses time series data.
  • Motive: the motive for investigating time series data with predictive analysis is to predict the future in the most analytical way possible.
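Of the three predictive techniques named above, the moving average is the simplest to sketch. This minimal Python example, with hypothetical monthly revenue and an illustrative helper name, forecasts the next period as the average of the last three observations; the window length of 3 is an arbitrary assumption.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly revenue (time series, oldest first)
revenue = [100, 110, 105, 120, 125]
print(round(moving_average_forecast(revenue), 2))  # 116.67
```

A shorter window reacts faster to recent changes; a longer one smooths out more noise.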

Prescriptive Analysis

  • Description: prescriptive analysis is a subtype of mathematical analysis that answers the question “what will happen if we do X?” It’s largely underestimated in the data analysis world because it requires diagnostic and descriptive analyses to be done before it even starts. More than simple predictive analysis, prescriptive analysis builds entire data models to show how a simple change could impact the whole system.
  • Importance: High. Prescriptive analysis is most common under the finance function in many companies. Financial analysts use it to build a financial model of the financial statements that show how that data will change given alternative inputs.
  • Nature of Data: the nature of data in prescriptive analysis is data sets. These data sets contain patterns that respond differently to various inputs. Data that is useful for prescriptive analysis contains correlations between different variables. It’s through these correlations that we establish patterns and prescribe action on this basis. This analysis cannot be performed on data that exists in a vacuum; it must be viewed against the backdrop of the tangibles behind it.
  • Motive: the motive for prescriptive analysis is to establish, with an acceptable degree of certainty, what results we can expect given a certain action. As you might expect, this necessitates that the analyst or researcher be aware of the world behind the data, not just the data itself.

Clustering Method

  • Description: the clustering method groups data points together based on their relative closeness, so they can be further explored and treated based on these groupings. There are two ways to group clusters: intuitively and statistically (for example, with k-means).
  • Importance: Very high. Though most corporate roles group clusters intuitively based on management criteria, a solid understanding of how to group them mathematically is an excellent descriptive and diagnostic approach to allow for prescriptive analysis thereafter.
  • Nature of Data: the nature of data useful for clustering is sets with 1 or more data fields. While most people are used to looking at only two dimensions (x and y), clustering becomes more accurate the more fields there are.
  • Motive: the motive for clustering is to understand how data sets group and to explore them further based on those groups.
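A minimal sketch of statistical clustering with k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points until nothing changes. The points, the number of clusters (two) and the fixed starting centroids are hypothetical choices made to keep the example deterministic; practical implementations usually pick starting centroids randomly or with k-means++, and the analyst must also choose k.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-field data set
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```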

Classification Method

  • Description: the classification method aims to separate and group data points based on common characteristics. This can be done intuitively or statistically.
  • Importance: High. While simple on the surface, classification can become quite complex. It’s very valuable in corporate and research environments, but can feel like it’s not worth the work. A good analyst can execute it quickly to deliver results.
  • Nature of Data: the nature of data useful for classification is data sets. As we will see, it can be used on qualitative data as well as quantitative. This method requires knowledge of the substance behind the data, not just the numbers themselves.
  • Motive: the motive for classification is to group data not based on mathematical relationships (which would be clustering), but by predetermined outputs. This is why it’s less useful for diagnostic analysis, and more useful for prescriptive analysis.
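Statistical classification can be illustrated with k-nearest neighbors, one of many classification techniques, chosen here only because it is short. Unlike clustering, the classes are predetermined: each training example already carries a label, and a new point receives the majority label of its k closest examples. The fruit measurements, labels and helper name below are hypothetical.

```python
import math
from collections import Counter

def knn_classify(point, labeled_examples, k=1):
    """Assign `point` the majority label among its k nearest labeled examples."""
    nearest = sorted(labeled_examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (weight in g, diameter in cm) -> predetermined class
examples = [((150, 7.0), "apple"), ((120, 6.5), "apple"),
            ((5, 1.5), "grape"), ((8, 1.8), "grape")]
print(knn_classify((140, 7.2), examples))     # apple
print(knn_classify((6, 1.6), examples, k=3))  # grape
```

In practice the fields would usually be rescaled first, because a field with large numbers (here, weight) otherwise dominates the distance.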

Forecasting Method

  • Description: the forecasting method uses past time series data to forecast the future.
  • Importance: Very high. Forecasting falls under predictive analysis and is arguably the most common and most important method in the corporate world. It is less useful in research, which prefers to understand the known rather than speculate about the future.
  • Nature of Data: data useful for forecasting is time series data, which, as we’ve noted, always includes a variable of time.
  • Motive: the motive for the forecasting method is the same as that of predictive analysis: to confidently estimate future values.

Optimization Method

  • Description: the optimization method maximizes or minimizes values in a data set given a set of criteria. It is arguably most common in prescriptive analysis. In mathematical terms, it means maximizing or minimizing a function subject to certain constraints.
  • Importance: Very high. The idea of optimization applies to more analysis types than any other method. In fact, some argue that it is the fundamental driver behind data analysis. You would use it everywhere in research and in a corporation.
  • Nature of Data: the nature of optimizable data is a data set of at least two points.
  • Motive: the motive behind optimization is to achieve the best result possible given certain conditions.
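To make "maximizing a function subject to constraints" concrete, here is a minimal brute-force sketch. The two products, their profits, the labor hours, and the demand caps are hypothetical numbers chosen for illustration, and exhaustive search only works for tiny problems; real projects would typically use a dedicated solver (for example, linear programming).

```python
def best_production_plan(labor_hours=40, max_a=12, max_b=20):
    """Brute-force search for the most profitable mix of two hypothetical products.

    Assumed data: product A earns 30 per unit and needs 2 labor hours;
    product B earns 50 per unit and needs 4 labor hours.
    Returns (profit, units_of_a, units_of_b).
    """
    best = None
    for a in range(max_a + 1):          # demand cap on A (constraint)
        for b in range(max_b + 1):      # demand cap on B (constraint)
            if 2 * a + 4 * b > labor_hours:  # labor-hours constraint
                continue
            profit = 30 * a + 50 * b         # objective function to maximize
            if best is None or profit > best[0]:
                best = (profit, a, b)
    return best


print(best_production_plan())  # (560, 12, 4)
```

With these assumed numbers, the optimum makes as much of the higher-margin-per-hour product A as demand allows and spends the remaining hours on B. Changing any constraint moves the optimum, which is what makes the method useful for prescriptive questions.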

Content Analysis Method

  • Description: content analysis is a method of qualitative analysis that quantifies textual data to track themes across a document. It’s most common in academic fields and in social sciences, where written content is the subject of inquiry.
  • Importance: High. In a corporate setting, content analysis as such is less common. If anything, Naïve Bayes (a technique we’ll look at below) is the closest corporations come to text analysis. However, it is of the utmost importance for researchers. If you’re a researcher, check out this article on content analysis.
  • Nature of Data: data useful for content analysis is textual data.
  • Motive: the motive behind content analysis is to understand themes expressed in a large text.

Narrative Analysis Method

  • Description: narrative analysis is a method of qualitative analysis that quantifies stories to trace themes in them. It differs from content analysis because it focuses on stories rather than research documents, and its techniques are slightly different from those used in content analysis (the differences are nuanced and outside the scope of this article).
  • Importance: Low. Unless you are highly specialized in working with stories, narrative analysis is rare.
  • Nature of Data: the nature of the data useful for the narrative analysis method is narrative text.
  • Motive: the motive for narrative analysis is to uncover hidden patterns in narrative text.

Discourse Analysis Method

  • Description: the discourse analysis method falls under qualitative analysis and uses thematic coding to trace patterns in real-life discourse. Because real-life discourse is usually spoken, it must first be transcribed into text.
  • Importance: Low. Unless you are focused on understanding real-world idea sharing in a research setting, this kind of analysis is less common than the others on this list.
  • Nature of Data: the nature of data useful in discourse analysis is first audio files, then transcriptions of those audio files.
  • Motive: the motive behind discourse analysis is to trace patterns of real-world discussions. (As a spooky sidenote, have you ever felt like your phone microphone was listening to you and making reading suggestions? If it was, the method was discourse analysis.)

Framework Analysis Method

  • Description: the framework analysis method falls under qualitative analysis and uses similar thematic coding techniques to content analysis. However, where content analysis aims to discover themes, framework analysis starts with a framework and only considers elements that fall in its purview.
  • Importance: Low. As with the other textual analysis methods, framework analysis is less common in corporate settings. Even in the world of research, only some use it. Strangely, it’s very common for legislative and political research.
  • Nature of Data: the nature of data useful for framework analysis is textual.
  • Motive: the motive behind framework analysis is to understand what themes and parts of a text match your search criteria.

Grounded Theory Method

  • Description: the grounded theory method falls under qualitative analysis and uses thematic coding to build theories around those themes.
  • Importance: Low. Like other qualitative analysis techniques, grounded theory is less common in the corporate world. Even among researchers, you would be hard pressed to find many using it. Though powerful, it’s simply too rare to spend time learning.
  • Nature of Data: the nature of data useful in the grounded theory method is textual.
  • Motive: the motive of grounded theory method is to establish a series of theories based on themes uncovered from a text.

Clustering Technique: K-Means

  • Description: k-means is a clustering technique in which each data point is grouped into the cluster whose mean (centroid) is closest to it. It is an unsupervised machine learning algorithm: it alternates between assigning points to the nearest cluster mean and recomputing the means, so clusters must be reevaluated as data points are added. Clustering techniques can be used in diagnostic, descriptive, & prescriptive data analyses.
  • Importance: Very important. If you only take 3 things from this article, k-means clustering should be one of them. It is useful in any situation where n observations have multiple characteristics and we want to put them in groups.
  • Nature of Data: the nature of data is at least one characteristic per observation, but the more the merrier.
  • Motive: the motive for clustering techniques such as k-means is to group observations together and either understand or react to them.
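The assign-then-update loop behind k-means can be sketched in plain Python. This is a minimal illustration that assumes two-dimensional points and, for reproducibility, uses the first k points as starting centroids; library implementations (such as scikit-learn’s `KMeans`) use smarter initialization and several restarts.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    centroids = list(points[:k])  # simple deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid (mean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels

# Hypothetical observations with two characteristics each
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The choice of k is up to the analyst; in practice, people compare several values of k before settling on one.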

Regression Technique

  • Description: simple and multivariable regressions use either one independent variable or a combination of multiple independent variables to model a single dependent variable, estimating constants (coefficients) that describe the relationship. Regressions are almost synonymous with correlation today.
  • Importance: Very high. Along with clustering, if you only take 3 things from this article, regression techniques should be one of them. They’re everywhere in corporate and research fields alike.
  • Nature of Data: the nature of data used in regressions is data sets with “n” number of observations and as many variables as are reasonable. It’s important, however, to distinguish between time series data and regression data. You cannot use regressions on time series data without accounting for time. The easier way is to use techniques under the forecasting method.
  • Motive: The motive behind regression techniques is to understand correlations between independent variable(s) and a dependent one.
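For the simple (one independent variable) case, ordinary least squares has a closed form. The sketch below computes the slope and intercept and, since regression is so closely tied to correlation, the Pearson correlation coefficient as well. The data is made up; multivariable regression would normally be solved with matrix methods or a statistics library.

```python
import math

def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if all x are equal (no fit possible)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data: ad spend (x) vs. sales (y)
slope, intercept, r = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r)  # 2.0 1.0 1.0
```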

Naïve Bayes Technique

  • Description: Naïve Bayes is a classification technique that uses simple probability to classify items based on previous classifications. In plain English, the formula would be “the chance that a thing with trait x belongs to class c equals (=) the chance of seeing trait x among things in class c, multiplied by the overall chance of class c, divided by the overall chance of trait x.” As a formula, it’s P(c|x) = P(x|c) * P(c) / P(x).
  • Importance: High. Naïve Bayes is a very common, simple classification technique because it’s effective with large data sets and it can be applied to any instance in which there is a class. Google, for example, might use it to sort webpages into groups for certain search engine queries.
  • Nature of Data: the nature of data for Naïve Bayes is at least one class and at least two traits in a data set.
  • Motive: the motive behind Naïve Bayes is to classify observations based on previous data. It’s thus considered part of predictive analysis.
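The Bayes formula can be turned into a small classifier. This sketch assumes categorical traits, uses add-one (Laplace) smoothing so an unseen trait value does not zero out a class, and drops P(x) because it is the same for every class and cannot change which class scores highest. The “naïve” part is the assumption that traits are independent within each class, which lets P(x|c) be computed as a product of per-trait probabilities. The labeled data is hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class trait values from labeled rows."""
    priors = Counter(labels)
    # counts[c][i][v]: number of class-c rows whose trait i has value v
    counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each trait i
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1
            values[i].add(v)
    return priors, counts, values

def classify(model, row, alpha=1.0):
    """Return the class c that maximizes P(c) times the product of P(trait | c)."""
    priors, counts, values = model
    total = sum(priors.values())
    scores = {}
    for c, n_c in priors.items():
        score = n_c / total  # P(c)
        for i, v in enumerate(row):
            # Smoothed estimate of P(trait i = v | c)
            score *= (counts[c][i][v] + alpha) / (n_c + alpha * len(values[i]))
        scores[c] = score
    # P(x) is identical for every class, so it is omitted from the comparison.
    return max(scores, key=scores.get)

# Hypothetical labeled data: (outlook, temperature) -> class
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(classify(model, ("rainy", "cool")))  # yes
```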

Cohorts Technique

  • Description: cohorts technique is a type of clustering method used in behavioral sciences to separate users by common traits. As with clustering, it can be done intuitively or mathematically, the latter of which would simply be k-means.
  • Importance: Very high. Although it resembles k-means, the cohort technique is more of a high-level counterpart. In fact, most people are familiar with it as a part of Google Analytics. It’s most common in marketing departments in corporations, rather than in research.
  • Nature of Data: the nature of cohort data is data sets in which users are the observation and other fields are used as defining traits for each cohort.
  • Motive: the motive for cohort analysis techniques is to group similar users and analyze how you retain them and how they churn.
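Cohort retention needs little more than grouping and counting. This sketch assumes signups are keyed by user with integer month numbers and that activity is a set of (user, month) pairs; these data shapes are illustrative and are not taken from Google Analytics or any other tool.

```python
from collections import defaultdict

def cohort_retention(signups, activity, months_after=1):
    """Share of each signup-month cohort that is active `months_after` months later.

    signups: {user_id: signup_month}; activity: set of (user_id, month) pairs.
    """
    cohorts = defaultdict(list)
    for user, month in signups.items():
        cohorts[month].append(user)  # group users by the month they signed up
    return {
        start: sum((user, start + months_after) in activity for user in users) / len(users)
        for start, users in sorted(cohorts.items())
    }

# Hypothetical users and activity log
signups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
activity = {("a", 2), ("b", 2), ("d", 3), ("e", 2)}
print(cohort_retention(signups, activity))  # {1: 0.6666666666666666, 2: 0.5}
```

Churn for each cohort over the same period is simply one minus its retention.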

Factor Technique

  • Description: the factor analysis technique is a way of grouping many traits into a single factor to expedite analysis. For example, factors can be used as traits for Naïve Bayes classifications instead of more general fields.
  • Importance: High. While not commonly employed in corporations, factor analysis is hugely valuable. Good data analysts use it to simplify their projects and communicate them more clearly.
  • Nature of Data: the nature of data useful in factor analysis techniques is data sets with a large number of fields on its observations.
  • Motive: the motive for using factor analysis techniques is to reduce the number of fields in order to more quickly analyze and communicate findings.

Linear Discriminants Technique

  • Description: linear discriminant analysis techniques are similar to regressions in that they use one or more independent variables to determine a dependent variable; however, the linear discriminant technique falls under the classification method since it uses traits as independent variables and class as a dependent variable. In this way, it becomes a classifying method AND a predictive method.
  • Importance: High. Though the analyst world speaks of and uses linear discriminants less commonly, it’s a highly valuable technique to keep in mind as you progress in data analysis.
  • Nature of Data: the nature of data useful for the linear discriminant technique is data sets with many fields.
  • Motive: the motive for using linear discriminants is to classify observations that would otherwise be too complex for simple techniques like Naïve Bayes.

Exponential Smoothing Technique

  • Description: exponential smoothing is a technique falling under the forecasting method that applies a smoothing factor to prior data in order to predict future values. It can be linear or adjusted for seasonality. The basic principle behind exponential smoothing is to put a percent weight (a value between 0 and 1 called alpha) on the most recent value in a series and progressively smaller weights on less recent values. The formula is: smoothed value = alpha * current period value + (1 - alpha) * previous period’s smoothed value. Because each smoothed value already folds in all earlier observations, the weights on older values shrink exponentially.
  • Importance: High. Most analysts still use the moving average technique (covered next) for forecasting because it’s easy to understand, even though it is less efficient than exponential smoothing. However, good analysts will have exponential smoothing techniques in their pocket to increase the value of their forecasts.
  • Nature of Data: the nature of data useful for exponential smoothing is time series data. Time series data has time as one of its fields.
  • Motive: the motive for exponential smoothing is to forecast future values with a smoothing variable.
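Here is a minimal sketch of simple (non-seasonal) exponential smoothing in its standard recursive form, where each smoothed value is alpha times the current observation plus (1 - alpha) times the previous smoothed value. Initializing with the first observation is one common convention among several, and the series and alpha are illustrative.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series; the last value serves as the next-period forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        # s_t = alpha * x_t + (1 - alpha) * s_(t-1)
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

history = [10, 12, 14]  # hypothetical observations
levels = exponential_smoothing(history, alpha=0.5)
print(levels)      # [10, 11.0, 12.5]
print(levels[-1])  # 12.5, the forecast for the next period
```

A higher alpha reacts faster to recent changes; a lower alpha produces a smoother, slower-moving forecast. Trend and seasonality require extended variants (such as Holt and Holt-Winters).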

Moving Average Technique

  • Description: the moving average technique falls under the forecasting method and uses an average of recent values to predict future ones. For example, to predict rainfall in April, you would take the average of rainfall from January to March. It’s simple, yet highly effective.
  • Importance: Very high. While I’m personally not a huge fan of moving averages due to their simplistic nature and lack of consideration for seasonality, they’re the most common forecasting technique and therefore very important.
  • Nature of Data: the nature of data useful for moving averages is time series data.
  • Motive: the motive for moving averages is to predict future values in a simple, easy-to-communicate way.
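The rainfall example translates directly into code. This sketch uses made-up rainfall figures in millimeters and a three-month window; both are assumptions for illustration.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("need a positive window and at least `window` observations")
    return sum(series[-window:]) / window

def rolling_averages(series, window=3):
    """Moving average at each point where a full window is available."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Hypothetical rainfall (mm) for January, February, March
rainfall = [60.0, 75.0, 90.0]
print(moving_average_forecast(rainfall))  # 75.0, the April forecast
```

Every month in the window gets the same weight, which is exactly why a plain moving average ignores seasonality and lags behind trends.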

Neural Networks Technique

  • Description: neural networks are a highly complex artificial intelligence technique loosely modeled on the neurons of the human brain: layers of interconnected nodes whose connection weights are adjusted through a long series of rapid computations and comparisons during training. This technique is so complex that an analyst must use computer programs to perform it.
  • Importance: Medium. While the potential for neural networks is theoretically unlimited, it’s still little understood and therefore uncommon. You do not need to know it by any means in order to be a data analyst.
  • Nature of Data: the nature of data useful for neural networks is large data sets, often with many fields and a very large number of rows.
  • Motive: the motive for neural networks is to understand wildly complex phenomena and data in order to act on them.

Decision Tree Technique

  • Description: the decision tree technique uses artificial intelligence algorithms to rapidly calculate possible decision pathways and their outcomes on a real-time basis. It’s so complex that computer programs are needed to perform it.
  • Importance: Medium. As with neural networks, decision trees with AI are too little understood and are therefore uncommon in corporate and research settings alike.
  • Nature of Data: the nature of data useful for the decision tree technique is hierarchical data sets that show multiple optional fields for each preceding field.
  • Motive: the motive for decision tree techniques is to compute the optimal choices to make in order to achieve a desired result.

Evolutionary Programming Technique

  • Description: the evolutionary programming technique uses a series of neural networks, sees how well each one fits a desired outcome, and selects only the best to test and retest. It’s called evolutionary because it resembles the process of natural selection by weeding out weaker options.
  • Importance: Medium. As with the other AI techniques, evolutionary programming just isn’t well-understood enough to be usable in many cases. Its complexity also makes it hard to explain in corporate settings and difficult to defend in research settings.
  • Nature of Data: the nature of data in evolutionary programming is data sets of neural networks, or data sets of data sets.
  • Motive: the motive for using evolutionary programming is similar to decision trees: understanding the best possible option from complex data.

Fuzzy Logic Technique

  • Description: fuzzy logic is a type of computing based on “approximate truths” rather than simple truths such as “true” and “false.” It is essentially two tiers of classification. For example, to say whether “Apples are good,” you need to first classify that “Good is x, y, z.” Only then can you say apples are good. Another way to see it is as helping a computer judge truth the way humans do: “definitely true, probably true, maybe true, probably false, definitely false.”
  • Importance: Medium. Like the other AI techniques, fuzzy logic is uncommon in both research and corporate settings, which means it’s less important in today’s world.
  • Nature of Data: the nature of fuzzy logic data is huge data tables that include other huge data tables with a hierarchy including multiple subfields for each preceding field.
  • Motive: the motive for using fuzzy logic to replicate human truth valuations in a computer is to model human decisions based on past data. The most obvious application is marketing.
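The two tiers described in the apple example (first define what “good” means, then judge how true a statement is) can be sketched with a membership function. The 0 to 10 rating scale, the breakpoints at 4 and 8, and the verbal cut-offs are all hypothetical choices; real fuzzy systems define these with domain experts.

```python
def membership_good(rating):
    """Degree (0.0 to 1.0) to which a hypothetical 0-10 rating counts as "good"."""
    # Ramp membership: not good at or below 4, fully good at or above 8, linear in between.
    if rating <= 4:
        return 0.0
    if rating >= 8:
        return 1.0
    return (rating - 4) / 4

def verbal_truth(degree):
    """Map a degree of truth onto the five human-style labels from the text."""
    if degree >= 0.9:
        return "definitely true"
    if degree >= 0.6:
        return "probably true"
    if degree > 0.4:
        return "maybe true"
    if degree > 0.1:
        return "probably false"
    return "definitely false"

# "Apples are good" when apples are rated 7 out of 10
degree = membership_good(7)
print(degree, verbal_truth(degree))  # 0.75 probably true
```

Combining fuzzy statements is commonly done with the minimum of the degrees for AND and the maximum for OR.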

Text Analysis Technique

  • Description: text analysis techniques fall under the qualitative data analysis type and use text to extract insights.
  • Importance: Medium. Text analysis techniques, like all qualitative analysis types, are most valuable for researchers.
  • Nature of Data: the nature of data useful in text analysis is words.
  • Motive: the motive for text analysis is to trace themes in a text across sets of very long documents, such as books.

Coding Technique

  • Description: the coding technique is used in textual analysis to turn ideas into uniform phrases (codes) and analyze how many times and in what ways those ideas appear. For this reason, some consider it a quantitative technique as well. You can learn more about coding and the other qualitative techniques here.
  • Importance: Very high. If you’re a researcher working in social sciences, coding is THE analysis technique, and for good reason. It’s a great way to add rigor to analysis. That said, it’s less common in corporate settings.
  • Nature of Data: the nature of data useful for coding is long text documents.
  • Motive: the motive for coding is to make tracing ideas on paper more than an exercise of the mind by quantifying them and understanding them through descriptive methods.
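Coding is normally done by a researcher reading the text, but the counting half of the process can be sketched with a keyword codebook. The codes and indicator phrases below are hypothetical, and simple substring matching will both over-match and under-match in real documents.

```python
from collections import Counter

# Hypothetical codebook: a code is assigned when any of its indicator phrases appears.
CODEBOOK = {
    "COST": ["expensive", "price", "afford"],
    "TRUST": ["trust", "reliable", "honest"],
}

def code_segments(segments, codebook=CODEBOOK):
    """Return the codes assigned to each text segment and how often each code occurs."""
    coded, totals = [], Counter()
    for segment in segments:
        text = segment.lower()
        codes = sorted(code for code, phrases in codebook.items()
                       if any(phrase in text for phrase in phrases))
        coded.append(codes)
        totals.update(codes)
    return coded, totals

interviews = [
    "The service is too expensive.",
    "I trust them, but the price went up.",
    "Staff were friendly.",
]
coded, totals = code_segments(interviews)
print(coded)   # [['COST'], ['COST', 'TRUST'], []]
print(totals)  # Counter({'COST': 2, 'TRUST': 1})
```

The per-code totals are the kind of output that idea pattern analysis then summarizes or clusters.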

Idea Pattern Technique

  • Description: the idea pattern analysis technique fits into coding as the second step of the process. Once themes and ideas are coded, simple descriptive analysis tests may be run. Some people even cluster the ideas!
  • Importance: Very high. If you’re a researcher, idea pattern analysis is as important as the coding itself.
  • Nature of Data: the nature of data useful for idea pattern analysis is already coded themes.
  • Motive: the motive for the idea pattern technique is to trace ideas in otherwise unmanageably-large documents.

Word Frequency Technique

  • Description: word frequency is a qualitative technique that stands in opposition to coding and uses an inductive approach to locate specific words in a document in order to understand its relevance. Word frequency is essentially the descriptive analysis of qualitative data because it uses stats like mean, median, and mode to gather insights.
  • Importance: High. As with the other qualitative approaches, word frequency is very important in social science research, but less so in corporate settings.
  • Nature of Data: the nature of data useful for word frequency is long, informative documents.
  • Motive: the motive for word frequency is to locate target words to determine the relevance of a document in question.
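A word-frequency count needs little more than tokenizing and counting. This sketch assumes a small hand-picked stopword list and a simple regular expression for words; both are simplifications, and the sample text is made up.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "is", "in"})

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words in `text`, ignoring common stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)

def target_share(text, targets):
    """Fraction of (non-stopword) words in `text` that are target words."""
    freqs = word_frequencies(text)
    total = sum(freqs.values())
    return sum(freqs[t] for t in targets) / total if total else 0.0

doc = "Data analysis is the core of research. Analysis of data drives research decisions."
print(word_frequencies(doc)["analysis"])                    # 2
print(round(target_share(doc, {"analysis", "research"}), 3))  # 0.444
```

A higher share of target words suggests the document is more relevant to the topic those words represent.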

Types of data analysis in research

Types of data analysis in research methodology include every item discussed in this article. As a list, they are:

  • Quantitative
  • Qualitative
  • Mathematical
  • Machine Learning and AI
  • Descriptive
  • Prescriptive
  • Classification
  • Forecasting
  • Optimization
  • Grounded theory
  • Artificial Neural Networks
  • Decision Trees
  • Evolutionary Programming
  • Fuzzy Logic
  • Text analysis
  • Idea Pattern Analysis
  • Word Frequency Analysis
  • Nïave Bayes
  • Exponential smoothing
  • Moving average
  • Linear discriminant

Types of data analysis in qualitative research

As a list, the types of data analysis in qualitative research are the following methods:

Types of data analysis in quantitative research

As a list, the types of data analysis in quantitative research are:

Data analysis methods

As a list, data analysis methods are:

  • Content (qualitative)
  • Narrative (qualitative)
  • Discourse (qualitative)
  • Framework (qualitative)
  • Grounded theory (qualitative)

Quantitative data analysis methods

As a list, quantitative data analysis methods are:

Tabular View of Data Analysis Types, Methods, and Techniques

About the author.

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in a growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.

MBA Notes

Analysis of Data: Techniques and Importance in Research

Data analysis is the process of examining data to extract meaningful information and insights. It is an essential component of research, enabling researchers to draw conclusions and make informed decisions. In this blog, we will discuss the various techniques used for data analysis and their significance in research.

Importance of Data Analysis

Data analysis is critical in research as it helps to identify patterns, relationships, and correlations between variables. By analyzing data, researchers can draw inferences, make predictions, and identify trends. The insights derived from data analysis help to inform decision-making, assess the impact of interventions, and evaluate the effectiveness of programs.

Techniques for Data Analysis

The techniques used for data analysis can broadly be divided into two categories – descriptive and inferential.

Descriptive Techniques

Descriptive techniques are used to summarize and describe the characteristics of the data. These techniques include:

  • Measures of central tendency: mean, median, and mode
  • Measures of dispersion: range, standard deviation, and variance
  • Frequency distributions: histograms, frequency polygons, and bar graphs

Descriptive techniques are used to provide an overview of the data, enabling researchers to identify patterns and trends.
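Python’s standard `statistics` module covers the measures listed above. The sample values are invented; note that `statistics.variance` and `statistics.stdev` compute the sample (n - 1) versions, while `statistics.pvariance` and `statistics.pstdev` give the population versions.

```python
import statistics
from collections import Counter

def describe(values):
    """Summarize a numeric sample with common descriptive statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "frequencies": dict(sorted(Counter(values).items())),  # basis for a histogram or bar graph
    }

summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["median"], summary["mode"], summary["range"])  # 4.5 4 7
```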

Inferential Techniques

Inferential techniques are used to make inferences about the population based on the data collected from a sample. These techniques include:

  • Hypothesis testing: t-tests, ANOVA, and chi-square tests
  • Correlation analysis: Pearson correlation and Spearman correlation
  • Regression analysis: linear regression and logistic regression

Inferential techniques are used to draw conclusions about the population, based on the data collected from a sample.
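As one example of an inferential technique, this sketch computes Welch’s two-sample t statistic and its approximate degrees of freedom, which do not assume equal variances in the two groups. Turning t into a p-value requires the t distribution (for example, `scipy.stats.ttest_ind` with `equal_var=False` performs the whole test); that step is left out to keep the example dependency-free. The samples are invented.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    va = statistics.variance(sample_a) / n_a  # squared standard error of mean A
    vb = statistics.variance(sample_b) / n_b  # squared standard error of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores for a treatment group and a control group
t, df = welch_t([5, 6, 7], [1, 2, 3])
print(round(t, 3), round(df, 3))  # 4.899 4.0
```

A large absolute t relative to the t distribution with df degrees of freedom is evidence that the population means differ.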

Data Analysis Process

The data analysis process typically involves the following steps:

  • Data cleaning: handling outliers, missing values, and inconsistencies in the data.
  • Data exploration: examining the data to identify patterns and trends.
  • Data preparation: transforming the data to make it suitable for analysis.
  • Data analysis: using the appropriate techniques to analyze the data.
  • Data interpretation: interpreting the results of the analysis.
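The first step, data cleaning, can be sketched for a small list of records. The field names (`region`, `sales`), the rule of dropping incomplete rows, and the label normalization are illustrative assumptions; whether to drop, impute, or keep missing values and outliers depends on the study.

```python
def clean_records(records):
    """Drop incomplete rows, normalize region labels, and remove exact duplicates."""
    cleaned, seen = [], set()
    for record in records:
        region, sales = record.get("region"), record.get("sales")
        if region is None or sales is None:  # missing value: drop the row in this sketch
            continue
        region = region.strip().title()      # inconsistent labels, e.g. " north" -> "North"
        key = (region, float(sales))
        if key in seen:                      # exact duplicate after normalization
            continue
        seen.add(key)
        cleaned.append({"region": region, "sales": float(sales)})
    return cleaned

raw = [
    {"region": " north", "sales": 100},
    {"region": "North", "sales": 100},
    {"region": "south", "sales": None},
    {"region": "SOUTH", "sales": 80},
]
print(clean_records(raw))
# [{'region': 'North', 'sales': 100.0}, {'region': 'South', 'sales': 80.0}]
```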

Data analysis is a crucial step in research, enabling researchers to draw conclusions and make informed decisions. The techniques used for data analysis can broadly be divided into descriptive and inferential techniques. The insights derived from data analysis help to inform decision-making, assess the impact of interventions, and evaluate the effectiveness of programs.

PW Skills | Blog

Data Analysis: Importance, Types, Methods of Data Analytics

Methods of Data Analytics: Data isn't just information; it's the heartbeat of decision-making. The ability to harness and make sense of this vast sea of information has become paramount in a world driven by information, where data flows like a digital river.

In our increasingly data-driven world, the ability to extract valuable insights from raw information has become a coveted skill! In this blog, we’ll talk about some effective methods of data analytics, data analytics processes, importance, etc. 

If you want to venture into the field of data analytics, PhysicsWallah’s Full-Stack Data Analytics course could help you a lot! Our comprehensive curriculum, taught by industry experts, will equip you with the knowledge and experience to handle any data analytics challenge.

What Is Data Analysis?

Analysing data means checking, cleaning, changing, and modelling information to find valuable insights, make conclusions, and aid decision-making. It’s a systematic way of looking at and explaining data, helping organisations understand their operations, customer actions, and market patterns better.

Purpose of Data Analysis

The main aim of data analysis is to get useful insights from raw data. Whether the data is structured or not, the objective is to expose patterns, connections, and trends that can guide important choices, boost efficiency, and lead to business success.

Role of Data Analysis in Extracting Meaningful Insights

Data analysis serves as the bridge between raw data and valuable insights. By applying statistical and mathematical techniques, analysts can transform complex datasets into understandable and actionable information. This process is crucial for organisations seeking a competitive edge in their respective industries.

Why is Data Analytics Important?

Decision-Making and Strategy

Effective organisational choices hinge on smart decision-making. Data analysis empowers decision-makers with the info necessary for wise choices. Strategic planning, resource allocation, and risk management all gain from insights obtained through thorough data analysis.

Identifying Patterns and Trends

Data analysis enables the identification of patterns and trends within datasets that may not be immediately apparent. Whether it’s recognizing changing consumer preferences or anticipating market shifts, the ability to spot trends early on is a key advantage in today’s fast-paced business environment.

Driving Business Performance

In the competitive landscape of the business world, performance is paramount. Data analysis contributes to optimising business processes, improving efficiency, and fostering innovation. By leveraging data insights, organisations can streamline operations and enhance overall performance.

What Is the Data Analytics Process?

The data analysis process involves several key steps, each playing a crucial role in transforming raw data into actionable insights.

Data Collection

The journey of data analysis begins with the collection of relevant data. This may involve gathering information from various sources, including databases, surveys, and external datasets. The accuracy and completeness of the collected data set the foundation for meaningful analysis.

Data Cleaning and Preprocessing

Raw data is seldom flawless, often riddled with errors, inconsistencies, and missing values. Data cleaning, or scrubbing, identifies and corrects errors, boosting dataset quality. Preprocessing transforms raw data into an analysis-friendly format, addressing missing data and outliers.

Data Exploration

With cleaned and preprocessed data, analysts perform exploratory data analysis (EDA) to get an initial grasp of the dataset. This involves generating summary statistics and visualisations and applying other exploratory techniques to uncover patterns or anomalies.

Data Modeling

Data modelling involves the application of statistical and mathematical models to the dataset. The goals of this step are identifying links among variables, predicting outcomes, and categorising data. Techniques include regression analysis, machine learning, and predictive modelling.

Data Visualization

Data visualisation is a potent means to present intricate information clearly. Visual elements like charts and graphs aid in conveying findings to both technical and non-technical audiences, enhancing comprehension of the derived insights from the data.

Interpretation and Communication of Results

Analysing data wraps up with interpreting results and sharing findings. It means turning technical analysis into practical insights for decision-making.

Types of Data Analysis

Descriptive Analysis

Descriptive analysis involves summarising and presenting key features of a dataset. This type of analysis provides a snapshot of the main characteristics, such as the mean, median, mode, and standard deviation, allowing stakeholders to understand the central tendencies and variability within the data.

Diagnostic Analysis

Diagnostic analysis aims to uncover the root causes of specific events or trends within a dataset. Digging deeper into variable connections, it explains why specific outcomes happened. This analysis is vital for problem-solving and finding areas to enhance.

Predictive Analysis

Predictive analysis uses historical data and statistical algorithms to make predictions about future events. By identifying patterns and trends, analysts can build models that forecast potential outcomes. This type of analysis is valuable for businesses looking to anticipate market changes, customer behaviours, or financial trends.
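Simple exponential smoothing is one simple way to forecast from historical data. It predicts the next value as a weighted blend of past observations, and the smoothing factor `alpha` controls how quickly older data is discounted. This is a minimal sketch with a hypothetical helper name and hypothetical data. Production forecasts usually also model trend and seasonality.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # New level blends the latest observation with the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly sales.
print(ses_forecast([10, 12, 11, 13], alpha=0.5))  # 12.0
```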


Prescriptive Analysis

Prescriptive analysis goes beyond predicting future outcomes. It suggests actions to optimise results based on the predictions made by predictive models. This type of analysis provides actionable insights, guiding decision-makers on the most effective strategies to achieve desired outcomes.
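The link between prediction and prescription can be sketched as a small decision problem. Given a predicted demand distribution, the goal is to choose the order quantity with the highest expected profit. The demand probabilities, prices, and function names below are hypothetical assumptions for illustration.

```python
def expected_profit(order_qty, demand_probs, unit_cost, unit_price):
    """Expected profit of ordering order_qty units under uncertain demand."""
    total = 0.0
    for demand, prob in demand_probs.items():
        sold = min(order_qty, demand)  # cannot sell more than was stocked
        total += prob * (sold * unit_price - order_qty * unit_cost)
    return total


def recommend_order(candidates, demand_probs, unit_cost, unit_price):
    """Pick the candidate quantity with the highest expected profit."""
    return max(
        candidates,
        key=lambda q: expected_profit(q, demand_probs, unit_cost, unit_price),
    )


# Hypothetical output of a predictive model: demand level -> probability.
forecast = {80: 0.3, 100: 0.5, 120: 0.2}
print(recommend_order([80, 100, 120], forecast, unit_cost=6, unit_price=10))  # 100
```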

Methods of Data Analytics

1. Quantitative Analysis

Quantitative analysis uses numerical data and mathematical models to understand patterns and relationships. Statistical methods, hypothesis testing, and regression analysis are common techniques. This approach is prevalent in fields such as finance, economics, and the experimental sciences.
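The sketch below gives a small example of hypothesis testing. It computes Welch's t statistic, which compares two group means without assuming equal variances, together with its approximate degrees of freedom. Converting the statistic into a p-value requires the t distribution. A library such as SciPy provides this through `scipy.stats.ttest_ind` with `equal_var=False`. The data and the helper name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard errors of each mean (variance() is the sample variance).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical task-completion times (minutes) under two interface designs.
design_a = [4, 5, 6, 5, 5]
design_b = [6, 7, 8, 7, 7]
t, df = welch_t(design_a, design_b)
print(round(t, 3), round(df, 1))  # -4.472 8.0
```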

2. Qualitative Analysis

Qualitative analysis examines non-numerical data, such as interview transcripts, observations, and documents, to understand meanings, themes, and experiences. Techniques like content analysis, thematic analysis, and grounded theory are employed in qualitative analysis, making it essential in the social sciences and humanities.

3. Mixed-Methods Analysis

Mixed-methods analysis combines both quantitative and qualitative approaches to gain a comprehensive understanding of a research question. This integrative approach allows researchers to triangulate findings and provides a more robust interpretation of complex phenomena.

4. Exploratory Data Analysis (EDA)

Exploratory Data Analysis involves visually and statistically exploring datasets to identify patterns, outliers, and trends. Techniques like histograms, scatter plots, and box plots are commonly used in EDA. This method is particularly useful in the initial stages of analysis to guide further investigation.

How to Analyse Data?

Choosing the Right Data Analysis Approach

Selecting the appropriate data analysis approach depends on the nature of the data and the research question. Quantitative data may require statistical techniques, while qualitative data may involve coding and thematic analysis. A mixed-methods approach can provide a holistic perspective.

Selecting Appropriate Tools and Techniques

Choosing the right tools and techniques is crucial for accurate and efficient analysis. Statistical software and programming languages such as R, Python, and SAS are popular for quantitative analysis, while qualitative analysis may involve tools like NVivo or ATLAS.ti. It’s essential to match the tools to the specific requirements of the analysis.

Ensuring Data Accuracy and Reliability

Data accuracy is paramount in analysis. Analysts must verify the reliability of the data source, address missing or inconsistent data, and ensure that the chosen analysis methods are appropriate for the dataset. Rigorous validation processes contribute to the credibility of the analysis.
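Part of this validation can be automated with explicit rules. The sketch below uses a hypothetical `validate` helper and made-up records. It reports missing required fields and values outside a plausible range so that they can be reviewed before analysis.

```python
def validate(rows, required, ranges):
    """Return (row_index, field, problem) for each failed check."""
    problems = []
    for i, row in enumerate(rows):
        # Required fields must be present and not None.
        for field in required:
            if row.get(field) is None:
                problems.append((i, field, "missing"))
        # Present values must fall inside their plausible range.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not low <= value <= high:
                problems.append((i, field, "out of range"))
    return problems


rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 212},  # likely a data-entry error
]
print(validate(rows, required=["id", "age"], ranges={"age": (0, 120)}))
# [(1, 'age', 'missing'), (2, 'age', 'out of range')]
```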

Methods of Data Analytics in Research

Role of Data Analytics in Research

Data analytics is crucial in research because it organises data systematically for analysis and interpretation. Researchers use it to test hypotheses, find patterns, and draw evidence-based conclusions, which strengthens the rigour and objectivity of their studies.

Integrating Data Analysis into the Research Process

Effective integration of data analysis into the research process involves defining research questions, selecting appropriate data sources, and choosing relevant analysis methods. Researchers should align their data analysis plan with the overall research design to ensure a cohesive and comprehensive study.

Examples of Successful Data Analytics in Research Studies

Numerous research studies across disciplines showcase the power of data analytics. From epidemiology and social sciences to business and technology, data analytics has facilitated groundbreaking discoveries and insights. Case studies and examples demonstrate how data-driven approaches enhance the validity and reliability of research findings.

Top Data Analysis Tools

Several tools cater to the diverse needs of data analysts. Tools like Microsoft Excel, R, Python, SAS, and Tableau are widely used. Each has unique features, and understanding their strengths and limitations is key to choosing the right one for analysis.

Features and Capabilities of Each Tool

Microsoft Excel is user-friendly and often used for basic analysis. R and Python are powerful programming languages with extensive libraries for statistical analysis and machine learning. SAS is renowned for its robust statistical procedures, while Tableau excels in data visualisation.

Choosing the Right Tool for Specific Analysis Needs

Tool choice depends on the complexity of the analysis, the scale of the dataset, and the user’s expertise. Analysts weigh factors such as data visualisation needs, statistical depth, and automation requirements to select the best tool for a specific task.


Choose the Right Program

Considerations for Selecting Data Analysis Programs

When considering data analysis programs, individuals should assess their specific needs, their skill level, and the demands of their industry.

Comparison of Different Programs

When comparing data analysis programs, factors like usability, scalability, and community support come into play. Each program boasts unique strengths, and the decision frequently hinges on user preferences and analysis specifics. Valuable insights can be gleaned from online reviews, tutorials, and community forums.

Tips for Learning and Mastering Data Analysis Tools

Learning data analysis tools requires a combination of theoretical knowledge and hands-on practice. Online courses, tutorials, and interactive exercises can help individuals acquire the necessary skills. Mastering a tool involves continuous learning and staying updated with new features and functionalities.

How to Become a Data Analyst?

Educational Background and Skills Required

Becoming a data analyst usually requires a solid grasp of mathematics, statistics, and computer science, and a bachelor’s degree in a relevant field is often the baseline. Proficiency in programming languages such as Python or R, data visualisation skills, and experience handling databases are also crucial for excelling in the role.

Steps to Enter the Field of Data Analysis

In practice, entering the field usually involves a sequence of steps. These include building a foundation in mathematics, statistics, and computer science, often through a bachelor’s degree in a relevant field, and learning a programming language such as Python or R. They also include developing data visualisation skills and gaining experience in handling databases.

Career Paths and Opportunities for Data Analysts

Diverse opportunities unfold for data analysts across sectors such as finance, healthcare, marketing, and technology. The specific role—be it business analyst, financial analyst, or data scientist—depends on personal proficiency and preferences. Progressing in this domain hinges on ongoing professional growth and keeping pace with industry shifts.

Types of Data Analysis Methods in Research

Quantitative Research Methods

Quantitative research methods involve the collection and analysis of numerical data to test hypotheses and make predictions. Surveys, experiments, and statistical analyses are common in quantitative research. This approach provides measurable results whose statistical significance can be tested, contributing to the empirical understanding of phenomena.

Qualitative Research Methods

Qualitative research methods focus on exploring and understanding non-numerical data, emphasising context, meanings, and experiences. Techniques such as interviews, focus groups, and content analysis are employed in qualitative research. This approach is valuable for gaining in-depth insights into complex social, cultural, and psychological phenomena.

Integrating Multiple Methods for Comprehensive Analysis

Some research studies benefit from combining both quantitative and qualitative methods. This mixed-methods approach allows researchers to triangulate findings, providing a more comprehensive understanding of the research question. The integration of multiple methods enhances the robustness and validity of research outcomes.

What Is the Importance of Data Analysis in Research?

In the realm of research, data analysis serves a crucial role in validating hypotheses, drawing conclusions, and contributing to the broader body of knowledge.

  • Validating Hypotheses: Data analysis is the means by which researchers test hypotheses and determine the statistical significance of their findings.
  • Making Informed Conclusions: The process of data analysis enables researchers to draw informed conclusions based on evidence. This contributes to the reliability and validity of research outcomes.
  • Contributing to Scientific Knowledge: By analysing data and publishing results, researchers contribute to the collective knowledge within their field. This iterative process builds on existing understanding and propels the advancement of science.

Data Analysis Example

Hypothetical Case Study Illustrating Data Analysis

Consider a retail business aiming to optimise its product offerings. Through data analysis, the business collects and analyses customer purchase data, demographic information, and market trends. The analysis reveals patterns indicating a growing demand for eco-friendly products among a specific demographic. Armed with this insight, the business adjusts its inventory and marketing strategy, resulting in increased sales and customer satisfaction.
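The core calculation behind a finding like this might look like the sketch below. It computes the share of purchases that are eco-friendly within each demographic group. The age bands, records, and function name are hypothetical and are not part of the scenario’s data.

```python
from collections import defaultdict

# Hypothetical records: (age band, whether the item bought was eco-friendly).
purchases = [
    ("18-34", True), ("18-34", True), ("18-34", False),
    ("35-54", True), ("35-54", False),
    ("55+", False), ("55+", False),
]


def eco_share_by_group(records):
    """Fraction of purchases in each group that were eco-friendly."""
    totals, eco = defaultdict(int), defaultdict(int)
    for group, is_eco in records:
        totals[group] += 1
        if is_eco:
            eco[group] += 1
    # Groups with no eco purchases read as 0 from the defaultdict.
    return {group: eco[group] / totals[group] for group in totals}


shares = eco_share_by_group(purchases)
print(max(shares, key=shares.get), shares)
```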

Step-by-Step Breakdown of the Analysis Process

In this hypothetical case, the analysis would move through the stages described earlier. It starts by collecting purchase and demographic data, then cleans, explores, and models that data, and finally interprets the results. Each stage involves specific techniques, tools, and decisions made by the analysts to derive meaningful insights and drive strategic changes.

Lessons Learned from the Example

The example illustrates the practical application of data analysis in real-world scenarios. Key lessons include the importance of understanding customer behaviour, the need for accurate and relevant data, and the impact of data-driven decisions on business outcomes. Such case studies serve as valuable learning tools for aspiring data analysts.

Data Analysis Techniques in Qualitative Research

Coding and Categorization

Coding involves the systematic labelling of data to identify themes, patterns, or concepts. Researchers assign codes to segments of data, creating a structure for analysis. Categorization involves organising codes into broader categories, facilitating the interpretation of qualitative data.
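Qualitative analysis software mainly helps with the bookkeeping side of coding. The sketch below shows that side only: segments already coded by a researcher are grouped under broader categories drawn from a codebook. The codebook, codes, excerpts, and function name are hypothetical. Assigning the codes remains an interpretive task for the researcher.

```python
from collections import defaultdict

# Hypothetical codebook mapping each code to a broader category.
CODEBOOK = {
    "long wait": "Access to care",
    "cost of travel": "Access to care",
    "kind nurse": "Quality of interaction",
    "rushed visit": "Quality of interaction",
}

# (participant, excerpt, codes assigned by the researcher)
segments = [
    ("P1", "I waited three hours to be seen.", ["long wait"]),
    ("P2", "The nurse took time to explain everything.", ["kind nurse"]),
    ("P3", "The bus fare really adds up.", ["cost of travel"]),
]


def categorise(segments, codebook):
    """Group coded segments under the category each code belongs to."""
    by_category = defaultdict(list)
    for participant, _excerpt, codes in segments:
        for code in codes:
            by_category[codebook[code]].append((participant, code))
    return dict(by_category)


print(categorise(segments, CODEBOOK))
```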

Thematic Analysis

Thematic analysis aims to identify and analyse themes or patterns within qualitative data. It involves systematically coding data, searching for recurring themes, and interpreting their significance. Thematic analysis provides a flexible and accessible method for uncovering meaning in diverse datasets.
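The sketch below is a rough aid to the search for recurring themes. It counts how many transcripts each code appears in and keeps the codes that recur in at least a chosen number of them. The data, threshold, and function name are hypothetical. Frequency is only a starting point, because the researcher still has to interpret whether a recurring code reflects a meaningful theme.

```python
from collections import Counter

# Hypothetical codes the researcher assigned within each transcript.
coded_transcripts = {
    "P1": {"long wait", "cost of travel"},
    "P2": {"kind nurse", "long wait"},
    "P3": {"long wait", "rushed visit"},
    "P4": {"kind nurse"},
}


def recurring_codes(coded, min_transcripts):
    """Codes appearing in at least `min_transcripts` transcripts."""
    counts = Counter()
    for codes in coded.values():
        # Each transcript holds a set, so a code counts once per transcript.
        counts.update(codes)
    return {code: n for code, n in counts.items() if n >= min_transcripts}


print(recurring_codes(coded_transcripts, min_transcripts=2))
```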

Grounded Theory

Grounded theory is an inductive research method that involves developing theories or explanations from the data itself. Researchers iteratively collect, code, and analyse data to generate concepts and theories. Grounded theory is particularly useful when exploring complex and poorly understood phenomena.

Narrative Analysis

Narrative analysis focuses on the stories people tell. Researchers examine narratives, whether in written or spoken form, to understand the meanings individuals attribute to their experiences. This approach is valuable for exploring subjective interpretations and cultural contexts.


Data quality is at the core of effective data analysis, and understanding the various aspects of the analysis process is essential for extracting meaningful insights. From choosing the right data analysis methods to selecting appropriate tools, this comprehensive guide provides a roadmap for navigating the dynamic field of data analysis.  Whether you’re a seasoned data analyst or a novice exploring the possibilities, continuous learning and a commitment to data quality are key to success in the data-driven world. Stay curious, stay analytical, and unlock the power of data to drive informed decisions and innovation.


Are there ethical considerations in data analysis?

Ethical considerations in data analysis include ensuring the privacy of individuals, reporting findings transparently, and avoiding biased interpretations. Researchers must prioritise ethical practices to maintain the integrity of their analyses.

Can data analysis be applied to environmental studies?

Yes, data analysis plays a crucial role in environmental studies by examining trends in climate data, analysing the impact of human activities, and guiding conservation efforts. It helps researchers understand complex ecosystems and inform sustainable practices.

How does data analysis contribute to innovation in business strategies?

Data analysis contributes to innovation in business strategies by uncovering market trends, identifying emerging opportunities, and predicting consumer preferences. It enables businesses to adapt and stay competitive in dynamic markets.

Can data analysis help in crisis management?

Absolutely. Data analysis aids crisis management by providing real-time insights, assessing the impact of crises, and facilitating data-driven decision-making. This helps organisations respond effectively and allocate resources where they are most needed.

Is there a difference between exploratory and confirmatory data analysis?

Exploratory data analysis involves uncovering patterns without preconceived hypotheses, while confirmatory data analysis tests specific hypotheses. Both approaches are valuable in different stages of research, providing a holistic view of the data.

How does data analysis contribute to personalised medicine?

Data analysis in personalised medicine involves analysing genetic, clinical, and lifestyle data to tailor medical treatments to individual patients. It enables more precise diagnoses, treatment plans, and better patient outcomes.

Can data analysis help in detecting fraud in financial transactions?

Yes, data analysis is instrumental in detecting fraud in financial transactions by identifying unusual patterns, anomalies, and suspicious activities. It allows financial institutions to take preventive measures and enhance security.

How does data analysis contribute to educational research?

In educational research, data analysis helps assess the effectiveness of teaching methods, identify learning trends, and inform curriculum development. It supports evidence-based decision-making to enhance the educational experience.

What is the significance of data visualisation in data analysis?

Data visualisation is crucial in data analysis as it transforms complex datasets into accessible and understandable visual representations. It helps communicate findings effectively and facilitates decision-making across various audiences.

Can data analysis be used to measure the success of employee training programs?

Absolutely. Data analysis can assess the effectiveness of employee training programs by analysing performance metrics, feedback, and skill acquisition. It provides insights for refining training strategies and maximising impact.

Data analysis in qualitative research

Sally Thorne, RN, PhD, School of Nursing, University of British Columbia, Vancouver, British Columbia, Canada

Evidence-Based Nursing, Volume 3, Issue 3

https://doi.org/10.1136/ebn.3.3.68


Unquestionably, data analysis is the most complex and mysterious of all of the phases of a qualitative project, and the one that receives the least thoughtful discussion in the literature. For neophyte nurse researchers, many of the data collection strategies involved in a qualitative project may feel familiar and comfortable. After all, nurses have always based their clinical practice on learning as much as possible about the people they work with, and detecting commonalities and variations among and between them in order to provide individualised care. However, creating a database is not sufficient to conduct a qualitative study. In order to generate findings that transform raw data into new knowledge, a qualitative researcher must engage in active and demanding analytic processes throughout all phases of the research. Understanding these processes is therefore an important aspect not only of doing qualitative research, but also of reading, understanding, and interpreting it.

For readers of qualitative studies, the language of analysis can be confusing. It is sometimes difficult to know what the researchers actually did during this phase and to understand how their findings evolved out of the data that were collected or constructed. Furthermore, in describing their processes, some authors use language that accentuates this sense of mystery and magic. For example, they may claim that their conceptual categories “emerged” from the data 1 —almost as if they left the raw data out overnight and awoke to find that the data analysis fairies had organised the data into a coherent new structure that explained everything! In this EBN notebook, I will try to help readers make sense of some of the assertions that are made about qualitative data analysis so that they can develop a critical eye for when an analytical claim is convincing and when it is not.

Qualitative data

Qualitative data come in various forms. In many qualitative nursing studies, the database consists of interview transcripts from open ended, focused, but exploratory interviews. However, there is no limit to what might possibly constitute a qualitative database, and increasingly we are seeing more and more creative use of such sources as recorded observations (both video and participatory), focus groups, texts and documents, multi-media or public domain sources, policy manuals, photographs, and lay autobiographical accounts.

Qualitative analytic reasoning processes

What makes a study qualitative is that it usually relies on inductive reasoning processes to interpret and structure the meanings that can be derived from data. Distinguishing inductive from deductive inquiry processes is an important step in identifying what counts as qualitative research. Generally, inductive reasoning uses the data to generate ideas (hypothesis generating), whereas deductive reasoning begins with the idea and uses the data to confirm or negate the idea (hypothesis testing). 2 In actual practice, however, many quantitative studies involve much inductive reasoning, whereas good qualitative analysis often requires access to a full range of strategies. 3 A traditional quantitative study in the health sciences typically begins with a theoretical grounding, takes direction from hypotheses or explicit study questions, and uses a predetermined (and auditable) set of steps to confirm or refute the hypothesis. It does this to add evidence to the development of specific, causal, and theoretical explanations of phenomena. 3 In contrast, qualitative research often takes the position that an interpretive understanding is only possible by way of uncovering or deconstructing the meanings of a phenomenon. Thus, a distinction between explaining how something operates (explanation) and why it operates in the manner that it does (interpretation) may be a more effective way to distinguish quantitative from qualitative analytic processes involved in any particular study.

Because data collection and analysis processes tend to be concurrent, with new analytic steps informing the process of additional data collection and new data informing the analytic processes, it is important to recognise that qualitative data analysis processes are not entirely distinguishable from the actual data. The theoretical lens from which the researcher approaches the phenomenon, the strategies that the researcher uses to collect or construct data, and the understandings that the researcher has about what might count as relevant or important data in answering the research question are all analytic processes that influence the data. Analysis also occurs as an explicit step in conceptually interpreting the data set as a whole, using specific analytic strategies to transform the raw data into a new and coherent depiction of the thing being studied. Although there are many qualitative data analysis computer programs available on the market today, these are essentially aids to sorting and organising sets of qualitative data, and none are capable of the intellectual and conceptualising processes required to transform data into meaningful findings.

Specific analytic strategies

Although a description of the actual procedural details and nuances of every qualitative data analysis strategy is well beyond the scope of a short paper, a general appreciation of the theoretical assumptions underlying some of the more common approaches can be helpful in understanding what a researcher is trying to say about how data were sorted, organised, conceptualised, refined, and interpreted.

CONSTANT COMPARATIVE ANALYSIS

Many qualitative analytic strategies rely on a general approach called “constant comparative analysis”. Originally developed for use in the grounded theory methodology of Glaser and Strauss, 4 which itself evolved out of the sociological theory of symbolic interactionism, this strategy involves taking one piece of data (one interview, one statement, one theme) and comparing it with all others that may be similar or different in order to develop conceptualisations of the possible relations between various pieces of data. For example, by comparing the accounts of 2 different people who had a similar experience, a researcher might pose analytical questions like: why is this different from that? and how are these 2 related? In many qualitative studies whose purpose it is to generate knowledge about common patterns and themes within human experience, this process continues with the comparison of each new interview or account until all have been compared with each other. A good example of this process is reported in a grounded theory study of how adults with brain injury cope with the social attitudes they face (see Evidence-Based Nursing, April 1999, p64).

Constant comparison analysis is well suited to grounded theory because this design is specifically used to study those human phenomena for which the researcher assumes that fundamental social processes explain something of human behaviour and experience, such as stages of grieving or processes of recovery. However, many other methodologies draw from this analytical strategy to create knowledge that is more generally descriptive or interpretive, such as coping with cancer, or living with illness. Naturalistic inquiry, thematic analysis, and interpretive description are methods that depend on constant comparative analysis processes to develop ways of understanding human phenomena within the context in which they are experienced.

PHENOMENOLOGICAL APPROACHES

Constant comparative analysis is not the only approach in qualitative research. Some qualitative methods are not oriented toward finding patterns and commonalities within human experience, but instead seek to discover some of the underlying structure or essence of that experience through the intensive study of individual cases. For example, rather than explain the stages and transitions within grieving that are common to people in various circumstances, a phenomenological study might attempt to uncover and describe the essential nature of grieving and represent it in such a manner that a person who had not grieved might begin to appreciate the phenomenon. The analytic methods that would be employed in these studies explicitly avoid cross comparisons and instead orient the researcher toward the depth and detail that can be appreciated only through an exhaustive, systematic, and reflective study of experiences as they are lived.

Although constant comparative methods might well permit the analyst to use some pre-existing or emergent theory against which to test all new pieces of data that are collected, these more phenomenological approaches typically challenge the researcher to set aside or “bracket” all such preconceptions so that they can work inductively with the data to generate entirely new descriptions and conceptualisations. There are numerous forms of phenomenological research; however, many of the most popular approaches used by nurses derive from the philosophical work of Husserl on modes of awareness (epistemology) and the hermeneutic tradition of Heidegger, which emphasises modes of being (ontology). 5 These approaches differ from one another in the degree to which interpretation is acceptable, but both represent strategies for immersing oneself in data, engaging with data reflectively, and generating a rich description that will enlighten a reader as to the deeper essential structures underlying a particular human experience. Examples of the kinds of human experience that are amenable to this type of inquiry are the suffering experienced by individuals who have a drinking problem (see Evidence-Based Nursing, October 1998, p134) and the emotional experiences of parents of terminally ill adolescents (see Evidence-Based Nursing, October 1999, p132). Sometimes authors explain their approaches not by the phenomenological position they have adopted, but by naming the theorist whose specific techniques they are borrowing. Colaizzi and Giorgi are phenomenologists who have rendered the phenomenological attitude into a set of manageable steps and processes for working with such data and have therefore become popular reference sources among phenomenological nurse researchers.

ETHNOGRAPHIC METHODS

Ethnographic research methods derive from anthropology's tradition of interpreting the processes and products of cultural behaviour. Ethnographers documented such aspects of human experience as beliefs, kinship patterns and ways of living. In the healthcare field, nurses and others have used ethnographic methods to uncover and record variations in how different social and cultural groups understand and enact health and illness. An example of this kind of study is an investigation of how older adults adjust to living in a nursing home environment (see Evidence-Based Nursing, October 1999, p136). When a researcher claims to have used ethnographic methods, we can assume that he or she has come to know a culture or group through immersion and engagement in fieldwork or participant observation and has also undertaken to portray that culture through text. 6 Ethnographic analysis uses an iterative process in which cultural ideas that arise during active involvement “in the field” are transformed, translated, or represented in a written document. It involves sifting and sorting through pieces of data to detect and interpret thematic categorisations, search for inconsistencies and contradictions, and generate conclusions about what is happening and why.

NARRATIVE ANALYSIS AND DISCOURSE ANALYSIS

Many qualitative nurse researchers have discovered the extent to which human experience is shaped, transformed, and understood through linguistic representation. The vague and subjective sensations that characterise cognitively unstructured life experiences take on meaning and order when we try to articulate them in communication. Putting experience into words, whether we do this verbally, in writing, or in thought, transforms the actual experience into a communicable representation of it. Thus, speech forms are not the experiences themselves, but a socially and culturally constructed device for creating shared understandings about them. Narrative analysis is a strategy that recognises the extent to which the stories we tell provide insights about our lived experiences. 7 For example, it was used as a strategy to learn more about the experiences of women who discover that they have a breast lump (see Evidence-Based Nursing, July 1999, p93). Through analytic processes that help us detect the main narrative themes within the accounts people give about their lives, we discover how they understand and make sense of their lives.

By contrast, discourse analysis recognises speech not as a direct representation of human experience, but as an explicit linguistic tool constructed and shaped by numerous social or ideological influences. Discourse analysis strategies draw heavily upon theories developed in such fields as sociolinguistics and cognitive psychology to try to understand what is represented by the various ways in which people communicate ideas. They capitalise on critical inquiry into the language that is used and the way that it is used to uncover the societal influences underlying our behaviours and thoughts. 8 Thus, although discourse analysis and narrative analysis both rely heavily on speech as the most relevant data form, their reasons for analysing speech differ. The table below illustrates the distinctions among the analytic strategies described above using breast cancer research as an example.

Table. General distinctions between selected qualitative research approaches: an illustration using breast cancer research

Cognitive processes inherent in qualitative analysis

The term “qualitative research” encompasses a wide range of philosophical positions, methodological strategies, and analytical procedures. Morse 1 has summarised the cognitive processes involved in qualitative research in a way that can help us to better understand how the researcher's cognitive processes interact with qualitative data to bring about findings and generate new knowledge. Morse believes that all qualitative analysis, regardless of the specific approach, involves:

  • comprehending the phenomenon under study
  • synthesising a portrait of the phenomenon that accounts for relations and linkages within its aspects
  • theorising about how and why these relations appear as they do, and
  • recontextualising, or putting the new knowledge about phenomena and relations back into the context of how others have articulated the evolving knowledge.

Although the form that each of these steps will take may vary according to such factors as the research question, the researcher's orientation to the inquiry, or the setting and context of the study, this set of steps helps to depict a series of intellectual processes by which data in their raw form are considered, examined, and reformulated to become a research product.

Quality measures in qualitative analysis

It used to be a tradition among qualitative nurse researchers to claim that such issues as reliability and validity were irrelevant to the qualitative enterprise. Instead, they might say that the proof of the quality of the work rested entirely on the reader's acceptance or rejection of the claims that were made. If the findings “rang true” to the intended audience, then the qualitative study was considered successful. More recently, nurse researchers have taken a lead among their colleagues in other disciplines in trying to work out more formally how the quality of a piece of qualitative research might be judged. Many of these researchers have concluded that systematic, rigorous, and auditable analytical processes are among the most significant factors distinguishing good from poor quality research.[9] Researchers are therefore encouraged to articulate their findings in such a manner that the logical processes by which they were developed are accessible to a critical reader, the relation between the actual data and the conclusions about data is explicit, and the claims made in relation to the data set are rendered credible and believable. Through this short description of analytical approaches, readers will be in a better position to critically evaluate individual qualitative studies, and decide whether and when to apply the findings of such studies to their nursing practice.

  1. Morse JM. “Emerging from the data”: the cognitive processes of analysis in qualitative inquiry. In: JM Morse, editor. Critical issues in qualitative research methods. Thousand Oaks, CA: Sage, 1994:23–43.
  2. Holloway I. Basic concepts for qualitative research. Oxford: Blackwell Science, 1997.
  3. Schwandt TA. Qualitative inquiry: a dictionary of terms. Thousand Oaks, CA: Sage, 1997.
  4. Glaser BG, Strauss AL. The discovery of grounded theory. Hawthorne, NY: Aldine, 1967.
  5. Ray MA. The richness of phenomenology: philosophic, theoretic, and methodologic concerns. In: JM Morse, editor. Critical issues in qualitative research methods. Thousand Oaks, CA: Sage, 1994:117–33.
  6. Boyle JS. Styles of ethnography. In: JM Morse, editor. Critical issues in qualitative research methods. Thousand Oaks, CA: Sage, 1994:159–85.
  7. Sandelowski M. We are the stories we tell: narrative knowing in nursing practice. J Holist Nurs 1994;12:23–33.
  8. Boutain DM. Critical language and discourse study: their transformative relevance for critical nursing inquiry. ANS Adv Nurs Sci 1999;21:1–8.
  9. Thorne S. The art (and science) of critiquing qualitative research. In: JM Morse, editor. Completing a qualitative project: details and dialogue. Thousand Oaks, CA: Sage, 1997:117–32.


Justjooz

7 Reasons Why Data Analysis is Important for Research



Data analysis is an integral part of any research process.

Strong research publications share one thing: they use data analysis to draw meaningful insights from the collected information.

In this blog post, we’ll discuss seven reasons why data analysis is essential in research and provide examples for each point.

If that’s what you want to find out, read on to get started!


What is the Importance of Data Analysis in Research?

We know data analysis is important, but here are some specific reasons why it is crucial for research purposes:

1. Data analysis provides a reliable source of evidence

By analyzing data, researchers can identify patterns and trends in the gathered information that they may not be able to uncover on their own. This allows them to draw conclusions with greater accuracy and confidence.

Summary statistics such as percentages, averages and measures of spread (for example, standard deviations) help researchers gauge how reliable a research outcome is.

For example, if an experiment evaluates the effectiveness of a new drug, researchers can compare the outcomes from multiple groups of participants with different treatments in order to determine which one is more effective.
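To make the drug example concrete, here's a minimal Python sketch that summarises two treatment groups with an average, a standard deviation and a percentage. The group names, the scores and the "improved" threshold of 6.5 are all hypothetical values made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical outcome scores (higher = better) for two illustrative groups.
outcomes = {
    "new_drug": [7.1, 6.8, 7.5, 8.0, 6.9, 7.7],
    "standard_care": [6.2, 6.5, 5.9, 6.8, 6.1, 6.4],
}

def summarise(scores, threshold=6.5):
    """Return the mean, standard deviation and % of scores at or above threshold."""
    pct = 100 * sum(s >= threshold for s in scores) / len(scores)
    return mean(scores), stdev(scores), pct

for group, scores in outcomes.items():
    m, sd, pct = summarise(scores)
    print(f"{group}: mean={m:.2f}, sd={sd:.2f}, improved={pct:.0f}%")

# The group with the higher average outcome (descriptive only, not a test).
best = max(outcomes, key=lambda g: mean(outcomes[g]))
print("Higher mean outcome:", best)
```

A higher average on its own doesn't prove one treatment is more effective. A real study would follow this kind of summary with an inferential test that accounts for sample size and variability.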

2. Data analysis helps make informed decisions

Data analysis can help identify the factors that are most likely to lead to successful outcomes for a research project.

Using various data analysis methods, such as statistical analysis, machine learning, and visualization, researchers can identify patterns, trends, and relationships in the data to inform decision-making.

For example, in a study about employee motivation, data analysis can provide information about which incentives impact employee performance most.

This can help researchers determine which strategies are most likely to be effective in motivating employees.

3. Data analysis improves accuracy

Data analysis is also essential for drawing accurate and reliable conclusions from research data.

Using various statistical techniques, researchers can identify patterns and trends in the data that would otherwise go unnoticed.

This enables them to draw more robust conclusions about the subject matter they are studying, leading to better research outcomes.

For example, in a study about customer preferences, data analysis can identify which products customers prefer, allowing researchers to make more accurate decisions about product design and marketing.

4. Data analysis saves time and money

Data analysis tools allow researchers to process and analyze data faster than manual methods, which helps them save time and money.

Data analysis techniques can help researchers to identify and eliminate unnecessary or redundant experiments.

By analyzing data from previous experiments, researchers can identify the factors that are most likely to impact the outcome of their research.

This allows them to focus their efforts on the most promising areas, reducing the need for costly and time-consuming experimentation.

In some cases, this can spare researchers arduous steps, such as a full new data collection phase, when existing data already answer the question.

For example, in a study about customer service, data analysis can quickly identify areas of improvement that would be slow and costly to find with traditional methods.

5. Data analysis provides insights into new research areas

Data analysis can help researchers uncover new trends and relationships that may have been overlooked. The hypothesis postulated at the start of a study may turn out to be less relevant than first thought.

Because research often involves refining its aims as new areas of interest emerge, data analysis helps to uncover potential new research directions.

For example, in a study about the economy, data analysis can reveal correlations between different economic indicators that were previously unknown. This can provide valuable insights for economists looking for new areas of research.

6. Data analysis can increase the statistical power of a study

Another benefit of data analysis is that it can increase the statistical power of a study.

Researchers can identify multiple factors influencing their results using advanced techniques like multivariate analysis.

Accounting for these factors can reduce the unexplained variation in the outcome, which increases the chances of detecting real effects and supports more generalizable conclusions.

Resampling techniques such as bootstrapping complement this approach.

These statistical analysis techniques allow researchers to estimate the sampling distribution of a statistic of interest, such as the mean or the difference between means.

This helps researchers quantify the range of plausible values for a statistic (for example, as a confidence interval), which makes it easier to judge whether an observed effect is likely to be real.
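Here's a minimal sketch of a percentile bootstrap for the difference between two group means, using only Python's standard library. The function names, the 5,000 resamples, the fixed seed and the data are assumptions chosen for illustration, not a prescribed procedure.

```python
import random
from statistics import mean

def bootstrap_diff_means(a, b, n_resamples=5000, seed=42):
    """Approximate the sampling distribution of mean(a) - mean(b).

    Each resample draws len(a) values from a and len(b) values from b
    with replacement, then records the difference in their means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    return sorted(diffs)

def percentile_interval(sorted_values, level=0.95):
    """Simple percentile confidence interval from sorted bootstrap estimates."""
    lower_idx = int((1 - level) / 2 * len(sorted_values))
    upper_idx = int((1 + level) / 2 * len(sorted_values)) - 1
    return sorted_values[lower_idx], sorted_values[upper_idx]

# Hypothetical measurements from two groups (illustrative values only).
group_a = [7.1, 6.8, 7.5, 8.0, 6.9, 7.7, 7.2, 7.4]
group_b = [6.2, 6.5, 5.9, 6.8, 6.1, 6.4, 6.6, 6.0]

observed = mean(group_a) - mean(group_b)
diffs = bootstrap_diff_means(group_a, group_b)
low, high = percentile_interval(diffs)
print(f"Observed difference: {observed:.2f}")
print(f"95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```

If the interval excludes zero, the data are consistent with a real difference between the groups. Keep in mind that resampling quantifies uncertainty in the existing sample; it does not add new observations.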

7. Data analysis helps communicate research findings

Finally, data analysis plays an essential role in communicating research findings to others.

By using various data visualization techniques, researchers can present their findings in a clear and concise manner. This makes it easier for others to understand and interpret the results, leading to better dissemination of the research findings.

The data analysis process often relies on data visualization techniques to represent both qualitative and quantitative data.

Data visualization techniques can be used to communicate research findings in a variety of ways, such as:

  • Research papers and articles, where visualizations can be used to supplement text-based explanations
  • Presentations, where visualizations can be used to convey key findings to an audience
  • Online platforms, where visualizations can be used to make research findings more accessible to a wider audience

What is Statistical Analysis?

Statistical analysis examines data to identify patterns and trends, which can be used to draw conclusions about a given population or system.

It involves collecting, organizing, summarizing, interpreting, and presenting data in order to answer questions or provide evidence for making decisions.

In research, statistical analysis is often used to test hypotheses and construct models of behavior.

This can help researchers better understand the underlying phenomena they are studying and make informed decisions about how best to move forward with their research.

What are Data Analysis Methods?

Data analysis methods are used to collect, organize, summarize, interpret, and present data. These methods can be used to answer questions or draw conclusions about a given population or system.

Data analysis methods come in a range of forms, such as descriptive statistics (mean and standard deviation), inferential statistics (chi-square tests, t-tests), and regression analysis.

These methods can be used to analyze data from experiments, surveys, and other research activities.

Final Thoughts

In conclusion, data analysis is an invaluable tool for any research project.

It provides reliable evidence, helps make informed decisions, improves accuracy, saves time and money, provides insights into new research areas, can increase statistical power, and helps communicate research findings.

This makes data analysis an essential part of any successful research project.

By understanding the importance and capabilities of data analysis, researchers can leverage its power to improve their studies and better understand the phenomena they are studying.


Justin Chia

Justin is the author of Justjooz and is a data analyst and AI expert. He is also an alumnus of Nanyang Technological University (NTU), where he majored in Biological Sciences.

He regularly posts AI and analytics content on LinkedIn, and writes a weekly newsletter, The Juicer, on AI, analytics, tech, and personal development.

To unwind, Justin enjoys gaming and reading.


  • Open access
  • Published: 13 May 2024

What are the strengths and limitations to utilising creative methods in public and patient involvement in health and social care research? A qualitative systematic review

Olivia R. Phillips, Cerian Harries, Jo Leonardi-Bee, Holly Knight, Lauren B. Sherar, Veronica Varela-Mato & Joanne R. Morling

Research Involvement and Engagement, volume 10, Article number 48 (2024)


There is increasing interest in using patient and public involvement (PPI) in research to improve the quality of healthcare. Traditionally, methods such as interviews or focus groups have been used. However, these methods tend to engage a similar demographic of people. Thus, creative methods are being developed to involve patients for whom traditional methods are inaccessible or non-engaging.

This review aimed to determine the strengths and limitations of using creative PPI methods in health and social care research.

Electronic searches of five databases (Web of Science, PubMed, ASSIA, CINAHL and Cochrane Library) were conducted on 14th April 2023. Studies that involved traditional, non-creative PPI methods were excluded. Included studies had to use creative PPI methods to engage with people as research advisors, rather than as study participants. Only primary data published in English from 2009 onwards were accepted. Title, abstract and full-text screening was undertaken by two independent reviewers before inductive thematic analysis was used to generate themes.

Twelve papers met the inclusion criteria. The creative methods used included songs, poems, drawings, photograph elicitation, drama performance, visualisations, social media, photography, prototype development, cultural animation, card sorting and persona development. Analysis identified four limitations and five strengths to the creative approaches. Limitations included the time and resource intensive nature of creative PPI, the lack of generalisation to wider populations and ethical issues. External factors, such as the lack of infrastructure to support creative PPI, also affected their implementation. Strengths included the disruption of power hierarchies and the creation of a safe space for people to express mundane or “taboo” topics. Creative methods are also engaging, inclusive of people who struggle to participate in traditional PPI and can also be cost and time efficient.

‘Creative PPI’ is an umbrella term encapsulating many different methods of engagement, and there are strengths and limitations to each. The choice of method should be determined by the aims and requirements of the research, as well as the characteristics of the PPI group and practical limitations. Creative PPI can be advantageous over more traditional methods; however, a hybrid approach could be considered to reap the benefits of both. Creative PPI methods are not widely used; however, this could change over time as PPI becomes even more embedded in research.

Plain English Summary

It is important that patients and the public are included in the research process from initial brainstorming, through design, to delivery. This is known as public and patient involvement (PPI). Their input means that research closely aligns with their wants and needs. Traditionally, to get this input, interviews and group discussions are held, but this can exclude people who find these activities non-engaging or inaccessible, for example those with language challenges, learning disabilities or memory issues. Creative methods of PPI can overcome this. This is a broad term describing different (non-traditional) ways of engaging patients and the public in research, such as through the use of art, animation or performance. This review investigated the reasons why creative approaches to PPI could be difficult (limitations) or helpful (strengths) in health and social care research. After searching 5 online databases, 12 studies were included in the review. PPI groups included adults, children and people with language and memory impairments. Creative methods included songs, poems, drawings, the use of photos and drama, visualisations, Facebook, creating prototypes, personas and card sorting. Limitations included the time, cost and effort associated with creative methods, the lack of application to other populations, ethical issues and limited buy-in from the wider research community. Strengths included the feeling of equality between academics and the public, the creation of a safe space for people to express themselves, inclusivity, and that creative PPI can be cost and time efficient. Overall, this review suggests that creative PPI is worthwhile; however, each method has its own strengths and limitations, and the choice of which to use will depend on the research project, PPI group characteristics and other practical limitations, such as time and financial constraints.


Introduction

Patient and public involvement (PPI) is the term used to describe the partnership between researchers and patients (including caregivers, potential patients, healthcare users, etc.) or the public (community members with no known interest in the topic). It describes research that is done “‘with’ or ‘by’ the public, rather than ‘to,’ ‘about’ or ‘for’ them” [ 1 ]. In 2009, it became a legislative requirement for certain health and social care organisations to involve patients, families, carers and communities not only in the planning of health and social care services, but in their commissioning, delivery and evaluation too [ 2 ]. For example, funding applications to the National Institute for Health and Care Research (NIHR), a UK funding body, must demonstrate how researchers plan to include patients/service users, the public and carers at each stage of the project [ 3 ]. However, this should not simply be a tokenistic, tick-box exercise. PPI should help formulate initial ideas and should be an instrumental, continuous part of the research process. Input from PPI can provide unique insights not yet considered and can ensure that research and health services are closely aligned to the needs and requirements of service users. PPI also generally makes research more relevant, with clearer outcomes and impacts [ 4 ]. Although this review refers to both patients and the public using the umbrella term ‘PPI’, it is important to acknowledge that these are two different groups with different motivations, needs and interests when it comes to health research and service delivery [ 5 ].

Despite continuing recognition of the need for PPI to improve the quality of healthcare, researchers have also recognised that there is no ‘one size fits all’ method for involving patients [ 4 ]. Traditionally, PPI methods invite people to take part in interviews, focus groups, surveys or questionnaires. However, these can sometimes be inaccessible or non-engaging for certain populations. For example, someone with communication difficulties may find it difficult to engage in focus groups or interviews. If individuals lack the skills needed to interact in these settings, they cannot take advantage of the participation opportunities these settings provide [ 6 ]. Creative methods aim to resolve these issues. They are a relatively new approach in which researchers use creative methods (e.g., artwork, animations, Lego) to make PPI more accessible and engaging for those whose voices would otherwise go unheard. They ensure that all populations can engage in research, regardless of their background or skills. Seminal work has previously been conducted in this area, which brought to light the use of creative methodologies in research. Leavy (2008) [ 7 ] discussed how traditional interviews limited what could be expressed because of their sterile, jargon-filled and formulaic structure, and how their outputs were read by only a few specialised academics. This called for more creative approaches, including narrative enquiry, fiction-based research, poetry, music, dance, art, theatre, film and visual art. These practices, which can be used at any stage of the research cycle, supported greater empathy, self-reflection and longer-lasting learning experiences compared with interviews [ 7 ]. They also pushed traditional academic boundaries, making research accessible not only to researchers but to the public too.

Leavy explains that there are similarities between arts-based and scientific approaches: both attempt to investigate what it means to be human through exploration, and, used together, these complementary approaches can progress our understanding of the human experience [ 7 ]. Further, it is important to acknowledge the parallels and nuances between creative and inclusive methods of PPI. Although creative methods aim to be inclusive (this should underlie any PPI activity, whether creative or not), they do not incorporate all types of accessible, inclusive methodologies, e.g., using sign language for people with hearing impairments or audio recordings for people who cannot read. Because there was not enough scope to evaluate all possible inclusive methodologies, this review focuses on creative methods of PPI only.

We aimed to conduct a qualitative systematic review to highlight the strengths of creative PPI in health and social care research, as well as the limitations, which might act as a barrier to their implementation. A qualitative systematic review “brings together research on a topic, systematically searching for research evidence from primary qualitative studies and drawing the findings together” [ 8 ]. This review can then advise researchers of the best practices when designing PPI.

Public involvement

The PHIRST-LIGHT Public Advisory Group (PAG) consists of a team of experienced public contributors with a diverse range of characteristics from across the UK. The PAG was involved in the initial question setting and study design for this review.

Search strategy

For the purpose of this review, the JBI approach for conducting qualitative systematic reviews was followed [ 9 ]. The search terms were (“creativ*” OR “innovat*” OR “authentic” OR “original” OR “inclu*”) AND (“public and patient involvement” OR “patient and public involvement” OR “public and patient involvement and engagement” OR “patient and public involvement and engagement” OR “PPI” OR “PPIE” OR “co-produc*” OR “co-creat*” OR “co-design*” OR “cooperat*” OR “co-operat*”). This search string was modified according to the requirements of each database. Papers were filtered by title, abstract and keywords (see Additional file 1 for search strings). The databases searched included Web of Science (WoS), PubMed, ASSIA and CINAHL. The Cochrane Library was also searched to identify relevant reviews which could lead to the identification of primary research. The search was conducted on 14/04/23. As our aim was to report on the use of creative PPI in research, rather than more generic public engagement, we used electronic databases of scholarly peer-reviewed literature, which represent a wide range of recognised databases. These identified studies published in general international journals (WoS, PubMed), those in social sciences journals (ASSIA), those in nursing and allied health journals (CINAHL), and trials of interventions (Cochrane Library).
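As an illustration of how such a Boolean string is assembled, the following Python sketch joins each block of terms with OR and combines the two blocks with AND. The variable and function names are hypothetical. The sketch uses straight quotes rather than the typographic quotes shown above, and it does not reproduce the database-specific modifications described in Additional file 1.

```python
# Term blocks copied from the search strategy above.
creative_terms = ["creativ*", "innovat*", "authentic", "original", "inclu*"]
ppi_terms = [
    "public and patient involvement",
    "patient and public involvement",
    "public and patient involvement and engagement",
    "patient and public involvement and engagement",
    "PPI",
    "PPIE",
    "co-produc*",
    "co-creat*",
    "co-design*",
    "cooperat*",
    "co-operat*",
]

def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Any creative term AND any PPI term.
query = f"{or_group(creative_terms)} AND {or_group(ppi_terms)}"
print(query)
```

Keeping each concept as a separate list makes it easier to adapt the string to each database's syntax without retyping the terms.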

Inclusion criteria

Only full-text, English-language, primary research papers from 2009 to 2023 were included. This timeframe was chosen because in 2009 the Health and Social Reform Act made it mandatory for certain health and social care organisations to involve the public and patients in planning, delivering, and evaluating services [ 2 ]. Only creative methods of PPI were accepted, rather than traditional methods such as interviews or focus groups. For the purposes of this paper, creative PPI included creative art or arts-based approaches (e.g., stories, songs, drama, drawing, painting, poetry, photography) used to enhance engagement. Studies had to relate to health and social care, and the creative PPI had to be used to engage with people as research advisors, not as study participants. Meta-analyses, conference abstracts, book chapters, commentaries and reviews were excluded. There were no limits concerning study location or the demographic characteristics of the PPI groups. Only qualitative data were accepted.

Quality appraisal

Quality appraisal using the Critical Appraisal Skills Programme (CASP) checklist [ 10 ] was conducted by the primary authors (ORP and CH). This was done independently, and discrepancies were discussed and resolved. If a consensus could not be reached, a third independent reviewer was consulted (JRM). The full list of quality appraisal questions can be found in Additional file 2 .

Data extraction

ORP extracted the study characteristics and a subset of these were checked by CH. Discrepancies were discussed and amendments made. Extracted data included author, title, location, year of publication, year study was carried out, research question/aim, creative methods used, number of participants, mean age, gender, ethnicity of participants, setting, limitations and strengths of creative PPI and main findings.

Data analysis

The included studies were analysed using inductive thematic analysis [ 11 ], where themes were determined by the data. The familiarisation stage took place during full-text reading of the included articles. Anything identified as a strength or limitation to creative PPI methods was extracted verbatim as an initial code and inputted into the data extraction Excel sheet. Similar codes were sorted into broader themes, either under ‘strengths’ or ‘limitations’ and reviewed. Themes were then assigned a name according to the codes.
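The sorting step can be pictured as grouping coded extracts under themes within the two categories. The Python sketch below is a hypothetical illustration only. The code labels are short paraphrases of findings reported later in this review, and in the actual analysis the assignment of codes to themes was an interpretive judgement made by the reviewers, not something computed.

```python
from collections import defaultdict

# Hypothetical (code, category, theme) triples, paraphrased for illustration.
# Choosing the theme for each code is the researcher's judgement; this only records it.
coded_extracts = [
    ("persona data time-consuming to analyse", "limitations", "Time and resource intensive"),
    ("aphasia-accessible materials labour intensive to prepare", "limitations", "Time and resource intensive"),
    ("visualisations individualised and highly personal", "limitations", "Lack of generalisation"),
    ("risk of over-disclosure and emotional distress", "limitations", "Ethical issues"),
]

def group_into_themes(extracts):
    """Nest codes as category -> theme -> list of codes."""
    themes = defaultdict(lambda: defaultdict(list))
    for code, category, theme in extracts:
        themes[category][theme].append(code)
    return themes

themes = group_into_themes(coded_extracts)
for category, theme_map in themes.items():
    print(category)
    for theme, codes in theme_map.items():
        print(f"  {theme}: {len(codes)} code(s)")
```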

The search yielded 9978 titles across the 5 databases: Web of Science (1480 results), PubMed (94 results), ASSIA (2454 results), CINAHL (5948 results) and Cochrane Library (2 results), resulting in 8553 different studies after deduplication. ORP and CH independently screened their titles and abstracts, excluding those that did not meet the criteria. After assessment, 12 studies were included (see Fig.  1 ).
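The record counts above can be checked with simple arithmetic. The short Python sketch below (variable names are illustrative) sums the per-database results and derives the number of duplicates removed. Exclusions at the title/abstract and full-text stages are combined here, because their breakdown appears only in Fig. 1.

```python
# Per-database results reported above.
records_per_database = {
    "Web of Science": 1480,
    "PubMed": 94,
    "ASSIA": 2454,
    "CINAHL": 5948,
    "Cochrane Library": 2,
}

total = sum(records_per_database.values())
after_dedup = 8553   # unique studies after deduplication
included = 12        # studies in the final review

duplicates_removed = total - after_dedup
excluded_at_screening = after_dedup - included  # title/abstract and full text combined
print(total, duplicates_removed, excluded_at_screening)  # 9978 1425 8541
```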

Figure 1. PRISMA flowchart of the study selection process

Study characteristics

The included studies were published between 2018 and 2022. Seven were conducted in the UK [ 12 , 14 , 15 , 17 , 18 , 19 , 23 ], two in Canada [ 21 , 22 ], one in Australia [ 13 ], one in Norway [ 16 ] and one in Ireland [ 20 ]. The PPI activities occurred across various settings, including a school [ 12 ], social club [ 12 ], hospital [ 17 ], university [ 22 ], theatre [ 19 ], hotel [ 20 ], or online [ 15 , 21 ]; however, this information was not reported in five studies [ 13 , 14 , 16 , 18 , 23 ]. The number of people attending the PPI sessions ranged from 6 to 289; however, the majority (ten studies) had fewer than 70 participants [ 13 , 14 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 ]. Seven studies did not provide information on the age or gender of the PPI groups. Of those that did, ages ranged from 8 to 76 and participants were mostly female. The ethnicities of the PPI group members were also rarely recorded (see Additional file 3 for data extraction table).

Types of creative methods

The types of creative methods used to engage the PPI groups varied. These included songs, poems, drawings, photograph elicitation, drama performance, visualisations, Facebook, photography, prototype development, cultural animation, card sorting and creating personas (see Table  1 ). These were sometimes accompanied by traditional methods of PPI such as interviews and focus group discussions.

The 12 included studies were all deemed to be of good methodological quality, with scores ranging from 6/10 to 10/10 with the CASP critical appraisal tool [ 10 ] (Table  2 ).

Thematic analysis

Analysis identified four limitations and five strengths to creative PPI (see Fig.  2 ). Limitations included the time and resource intensity of creative PPI methods, its lack of generalisation, ethical issues and external factors. Strengths included the disruption of power hierarchies, the engaging and inclusive nature of the methods and their long-term cost and time efficiency. Creative PPI methods also allowed mundane and “taboo” topics to be discussed within a safe space.

Figure 2. Theme map of strengths and limitations

Limitations of creative PPI

Creative PPI methods are time and resource intensive

The time and resource intensive nature of creative PPI methods is a limitation, most notably for the persona-scenario methodology. Valaitis et al. [ 22 ] used 14 persona-scenario workshops with 70 participants to co-design a healthcare intervention that aimed to promote optimal aging in Canada. Using the persona method, pairs composed of patients, healthcare providers, community service providers and volunteers developed a fictional character who they believed represented an ‘end-user’ of the healthcare intervention. Due to the depth and richness of the data produced, the authors reported that it was time-consuming to analyse. Further, they commented that the amount of information was difficult to disseminate to scientific leads and present at team meetings. Additionally, highly skilled facilitators were needed to ensure the production of high-quality data, probe for details and lead group discussion. The resource intensive nature of creative co-production was also noted in a study that used persona scenarios and creative worksheets to develop a prototype decision support tool for individuals with malignant pleural effusion [ 17 ]. With approximately 50 people involved, this study was also likely to yield a high volume of data to consider.

Preparing materials for populations who cannot engage in traditional methods of PPI was also time-consuming. Kearns et al. [ 18 ] developed a feedback questionnaire for people with aphasia to evaluate ICT-delivered rehabilitation. To ensure people could participate effectively, the resources used during the workshops, such as PowerPoints, online images and photographs, had to be aphasia-accessible, which was labour and time intensive. The authors warned that this time commitment should not be underestimated.

There are further practical limitations to implementing creative PPI, such as the cost of materials for activities and of hiring a space for workshops. For example, the studies included in this review used pens, paper, worksheets, laptops, arts and craft supplies and magazines, and took place in venues such as universities, a social club and a hotel. Further, as in most studies involving the public (not only those using creative PPI methods), a financial incentive was often offered for participation, as well as food, parking, transport and accommodation [ 21 , 22 ].

Creative PPI lacks generalisation

Another barrier to the use of creative PPI methods in health and social care research was the individual nature of their output. Those who participate, usually small in number, produce unique creative outputs specific to their own experiences, opinions and location. Craven et al. [ 13 ] used arts-based visualisations to develop a toolbox for adults with mental health difficulties. They commented that “such an approach might still not be worthwhile”, as the visualisations were individualised and highly personal. This indicates that the output may fail to meet the needs of its end-users. Further, these creative PPI groups were based in specific geographical regions, such as Stoke-on-Trent [ 19 ], Sheffield [ 23 ], South Wales [ 12 ] or Ireland [ 20 ], which limits the extent to which the findings can be applied to wider populations, even within the same area, due to individual nuances. Similarly, the study by Galler et al. [ 16 ] is specific to the Norwegian context, and perhaps only to a sub-group of the Norwegian population, as the sample was of higher socioeconomic status.

However, Grindell et al. [17], who used persona scenarios, creative worksheets and prototype development, pointed out that the purpose of this type of research is to improve a particular setting rather than to apply findings to other populations and locations. Individualised output may, therefore, only be a limitation for research that aims to conduct PPI on a large scale.

If, however, greater generalisability within PPI is deemed necessary, social media may offer a solution. Fedorowicz et al. [15] used Facebook to gather feedback from the public on the use of video-recording methodology for an upcoming project. This had the benefit of including a more diverse range of people (289 people joined the closed group), who were spread geographically across the UK, as well as seven people from overseas.

Creative PPI raises ethical issues

As with other research, ethical issues must be taken into consideration. Because of the nature of creative approaches, and the personal effort put into them, people often want to be recognised for their work. However, this compromises principles deeply embedded in research, such as anonymity and confidentiality. To explore issues related to health and well-being in a town in South Wales, Byrne et al. [12] asked Year 4/5 and Year 10 pupils to create poems, songs, drawings and photographs. Community members also created a performance, mainly of monologues, to explore how poverty and inequalities are dealt with. Byrne et al. noted the risks of these arts-based approaches, namely the possibility of over-disclosure and consequent emotional distress, as well as people’s desire to be named for their work. On the one hand, anonymity reduces the sense of ownership of the output, as it no longer portrays a particular individual’s lived experience. On the other hand, it could promote a more honest account of lived experience. Supporting this, Webber et al. [23], who used the persona method to co-design a prototype back pain educational resource, claimed that the anonymity provided by this creative technique allowed individuals to externalise and anonymise their own personal experience, thus creating a more authentic and genuine resource for future users. This implies that anonymity can be both a limitation and a strength here.

The use of creative PPI methods is impeded by external factors

Beyond the limitations above, perhaps the most influential barrier to implementing creative PPI techniques is that creative methodologies are simply not mainstream [19]. This could be linked to the issues already discussed, such as time and resource intensity, generalisability and ethics, but it is also likely to involve more systemic factors within the research community. Micsinszki et al. [21], who co-designed a hub for the health and well-being of vulnerable populations, commented that there is insufficient infrastructure to conduct meaningful co-design, as well as a dominant medical model. Through a more holistic lens, there are “sociopolitical environments that privilege individualism over collectivism, self-sufficiency over collaboration, and scientific expertise over other ways of knowing based on lived experience” [21]. This, it could be suggested, leads creative co-design methodologies, which are founded on collectivism, collaboration and imagination, to be regarded as invalid in a research field heavily dominated by more scientific methods offering reproducibility, objectivity and reliability.

Although we acknowledge that creative PPI techniques are not always appropriate, their main limitation may be a lack of awareness of these methods or a lack of willingness to use them. Further, there is always the risk that PPI, despite often being a mandatory part of research applications, is used in a tokenistic or tick-box fashion [20], without considering the contribution that meaningful PPI could make to the research. It may be that PPI, let alone creative PPI, is not at the forefront of researchers’ minds when planning research.

Strengths of creative PPI

Creative PPI disrupts power hierarchies

One of the main strengths of creative PPI techniques, cited most frequently in the included literature, was that they disrupt traditional power hierarchies [12, 13, 17, 19, 23]. For example, theatre performance blurred the lines between professional and lay roles between the community and policy makers [12]. Individuals created a monologue to portray how poverty and inequality affect daily life and presented it to representatives of the National Assembly for Wales, the Welsh Government, the Local Authority, the Arts Council and Westminster. Byrne et al. [12] state that this medium allowed the community to engage with the people who make decisions about their lives in an environment of respect and understanding, where hierarchies are less visible than in other settings, e.g., political surgeries. Creative PPI methods have also removed traditional power hierarchies between researchers and adolescents. Cook et al. [13] used arts-based approaches to explore adolescents’ ideas about the “perfect” condom. They used the “Life Happens” resource, in which adolescents drew a person and then decorated the drawing with their thoughts about sexual relationships, not unlike the persona-scenario method. This was then combined with hypothetical scenarios about sexuality. A condom-mapping exercise followed, in which groups recorded the characteristics that make a condom “perfect” on large pieces of paper. Cook et al. [13] noted that power imbalances usually make it difficult to elicit information from adolescents; however, these imbalances were reduced by the use of creative co-design techniques.

The same reduction in power hierarchies was noted by Grindell et al. [17], who used the persona-scenario method and creative worksheets with individuals with malignant pleural effusion, with the aim of developing a prototype decision support tool to help patients with treatment options. Although this process involved a variety of stakeholders, such as patients, carers and healthcare professionals, creative co-design was cited as a mechanism that reduced power imbalances, a known limitation of more traditional research methods. Creative co-design blurred the boundaries between end-users and clinical staff and enabled ideas to be shared from multiple, valuable perspectives, meaning the prototype could suit user needs whilst addressing clinical problems.

Similarly, a specific creative method named cultural animation was also reported to dissolve hierarchies and encourage equal contributions from participants. Using this arts-based approach, Kelemen et al. [19] explored the concept of “good health” with individuals from Stoke-on-Trent. Members of the group created art installations using ribbons, buttons, cardboard and straws to depict their idea of a “healthy community”, accompanied by a poem. They also created a 3D Facebook page and produced another poem or song addressing the government to communicate their version of a “picture of health”. Public participants said that they found the process empowering, honest, democratic, valuable and practical.

This dissolving of hierarchies and levelling of power is beneficial because it increases the sense of ownership experienced by the creators of the output [12, 17, 23], which has been suggested to improve its quality [23].

Creative PPI allows the unsayable to be said

Creative PPI fosters a safe space for sharing mundane or taboo topics that may be difficult to communicate using traditional methods of PPI. For example, the hypothetical nature of condom mapping and persona-scenarios meant that adolescents could discuss a personal topic without fear of discrimination, judgement or personal disclosure [13]. The safe space allowed peers to generate a greater volume of ideas than they might otherwise have done. Similarly, Webber et al. [23], who used the persona method to co-design the prototype back pain educational resource, noted how this method creates anonymity whilst giving people the opportunity to externalise personal experiences, thoughts and feelings. Other creative methods were also used, such as drawing, collaging, role play and creating mood boards. A cardboard cube (labelled a “magic box”) served as a physical representation of the final prototype. These creative methods levelled the playing field and made personal experiences accessible in a safe, open environment that fostered trust, as well as understanding from the researchers.

It is not only sensitive subjects that were made easier to articulate through creative PPI. Creative PPI also facilitated the communication of mundane, everyday experiences that are typically considered ‘unsayable’, specifically in the context of describing intangible aspects of everyday health and well-being [11]. Graphic designers can also be used to represent the outputs of creative PPI visually. Their illustrations captured the movement and fluidity of people as well as the relationships between them, things that cannot be spoken but can be depicted [21].

Creative PPI methods are inclusive

Another strength of creative PPI was that it is inclusive and accessible [17, 19, 21]. The safe space it fosters, as well as the dismantling of hierarchies, welcomed people from a diverse range of backgrounds and provided equal opportunities [21], especially for those with communication and memory difficulties who might otherwise be excluded from PPI. Kelemen et al. [19], who used creative methods to explore health and well-being in Stoke-on-Trent, described how people from different backgrounds came together, connected, discussed and reached a consensus on a topic that they all had in common and that evoked strong emotions. Individuals said that the techniques used “sets people to open up as they are not overwhelmed by words”. Similarly, creative activities such as the persona method have been reported to allow people to express themselves in an inclusive environment using a common language. Kearns et al. [18], who used aphasia-accessible material to develop a questionnaire with people with aphasia, described how participants felt comfortable contributing to workshops (although this material was time-consuming to make; see ‘Limitations of creative PPI’).

Despite the general inclusivity of creative PPI, it can also be exclusive, particularly if online media are used. Fedorowicz et al. [15] used Facebook to create a PPI group, and although this may address the lack of generalisability of creative methods noted above (as Facebook can reach a greater number of people globally), it excluded those who are not digitally active or who have limited internet access or knowledge of technology. Online methods have other issues too. Maintaining the online group was reported to be challenging, and the volume of responses required researchers to interact outside their working hours. Nevertheless, online methods such as Facebook are very accessible for people with physical disabilities.

Creative PPI methods are engaging

The process of creative PPI is typically more engaging and produces more colourful data than traditional methods [13]. Individuals are permitted and encouraged to explore a creative self [19], which can lead to the exploration of new ideas and greater overall enjoyment of the process. This increased engagement is particularly beneficial for younger PPI groups. For example, to involve children in the development of healthy food products, Galler et al. [16] asked 9- to 12-year-olds to take photos of their food and present them to other children in a “show and tell” fashion. The children then created a newspaper article describing a new healthy snack. In this creative focus group, children were given lab coats to reinforce their identity as inventors. Galler et al. [16] note that the methods were highly engaging and facilitated teamwork and group learning. This collaborative problem-solving was also observed in adults who used personas and creative worksheets to develop the resource for lower back pain [23]. People living with dementia have also been reported to enjoy the creative and informal approach to idea generation [20].

The use of cultural animation allowed people to connect with each other in a way that traditional methods do not [19, 21]. These connections were held in place by boundary objects, such as ribbons, buttons, fabric and picture frames, which symbolised a shared meaning between people and an exchange of knowledge and emotion. Asking groups to create an art installation using these objects further fostered teamwork and collaboration, at both an individual and a collective level. The exploration of a creative self increased energy levels and encouraged productive discussion and problem-solving [19]. Objects also encouraged a solution-focused approach and allowed people to think beyond their usual everyday scope [17]. They also allowed facilitators to probe more deeply into the wider meanings carried by each object, which acted as a metaphor [21].

From the researchers’ point of view, co-creative methods gave rise to ideas they might not have initially considered. Valaitis et al. [22] found that over 40% of the creative outputs were novel ideas brought to light by patients, healthcare or community care providers, community service providers and volunteers. One researcher commented, “It [the creative methods] took me on a journey, in a way that when we do other pieces of research it can feel disconnected” [23]. Another researcher stated that they could not return to their previous way of doing research, as they had learnt so much about their own health and community and how they are perceived [19]. This demonstrates that creative processes benefit not only the project outcomes and the PPI group but also facilitators and researchers. However, although engaging, creative methods have been criticised for not demonstrating academic rigour [17]. Moreover, creative PPI may also exclude people who do not like or enjoy creative activities.

Creative PPI methods are cost and time efficient

Creative PPI workshops can often produce output that is visible and tangible. This can save time and money in the long run, as the output is either ready to be implemented in a healthcare setting or a first iteration has already been developed. This may also offset the time and cost of implementing creative PPI. For example, the prototype decision support tool for people with malignant pleural effusion was developed using personas and creative worksheets. The end result was two tangible prototypes to drive the initial idea forward towards use in practice [17]. Creative co-design in this case saved clinician time, as well as the time it would have taken to develop the product without the help of its end-users. In the development of this prototype, analysis was iterative and informed the next stage of development, which again saved time. The same applies to the feedback questionnaire for assessing ICT-delivered aphasia rehabilitation: the co-created questionnaire, designed with people with aphasia, was ready to be used in practice [18]. This suggests that, to overcome time and resource barriers to creative PPI, researchers should aim for activities that are engaging whilst also producing tangible output.

That usable products are generated during creative workshops signals to participating patients and members of the public that they have been listened to and that their thoughts and opinions have been acted upon [23]. For example, developing the back pain resource from patient experiences implies that their suggestions were valid and valuable. Further, those who participated in the cultural animation workshop reported that the process visualises change, and that it already felt as though change had begun [19].

The most cost- and time-efficient method of creative PPI in this review is most likely the use of Facebook to gather feedback on project methodology [15]. Although this approach had drawbacks, researchers could involve more people from a range of geographical areas at little or no cost. Feedback was instantaneous and no training was required. From the perspective of the PPI group, members could interact as much or as little as they wished, with no fixed time commitment.

This systematic review identified four limitations and five strengths of the use of creative PPI in health and social care research. Creative PPI is time- and resource-intensive, can raise ethical issues and lacks generalisability. It is also not yet mainstream. These factors may act as barriers to the implementation of creative PPI. However, creative PPI disrupts traditional power hierarchies and creates a safe space for taboo or mundane topics. It is also engaging and inclusive, and it can be time- and cost-efficient in the long term.

Something that became apparent during data analysis was that these are not blanket strengths and limitations of creative PPI as a whole. The umbrella term ‘creative PPI’ is broad and encompasses a wide range of activities, from music and poems to prototype development and persona-scenarios, to simpler activities such as using sticky notes and sorting cards. Many different activities can be deemed ‘creative’, and the strengths and limitations of one do not necessarily apply to another. For example, cultural animation takes greater effort to prepare than sticky notes and sorting cards, and Facebook is cheaper and wider-reaching than persona development. Researchers should use their discretion and weigh up the benefits and drawbacks of each method to choose a technique that suits the project. What is a limitation of creative PPI in one project may not be in another. In some cases, creative PPI may not be suitable at all.

Furthermore, the choice of creative PPI method also depends on the needs and characteristics of the PPI group. Children, adults and people living with dementia or language difficulties all have different engagement needs and capabilities. This indicates that creative PPI is not one-size-fits-all and that the most appropriate method will change depending on the composition of the group. The choice of method will also be determined by the constraints of the research project, namely time, money and the research aim. For example, if time is constrained, a method that yields a lot of data and requires a lot of preparation may not be appropriate. If generalisability is important, an online method may be more suitable. Together, this indicates that the choice of creative PPI method is highly individualised and dependent on multiple factors.

Although the limitations discussed in this review apply to creative PPI, they are not exclusive to it. Ethical issues are a consideration within PPI generally, especially when working with more vulnerable populations, such as children or adults living with a disability. Traditional PPI methods can also lack generalisability, as people who volunteer to join such groups are more likely to be older, middle class and retired [24]. Most research is vulnerable to this type of bias; however, generalisation is not always a goal, and research remains valid and meaningful in its absence. Although online methods may partly address issues of generalisability, they still exclude people who do not have access to the internet or technology, or who choose not to use it, implying that online PPI methods may not be wholly representative of the general population. That said, the accessibility of creative PPI techniques differs from person to person: for some, online media may be more accessible (for example, for those with a physical disability), whereas for others face-to-face methods may be. To address this, a range of methods should be used.

Planning multiple focus groups and interviews for traditional PPI is also time- and resource-intensive, although the extra resources required to make these activities creative may be even greater. The rich data provided may nevertheless be worth the preparation and analysis time, which is also likely to depend on the number of participants and workshop sessions required. PPI in general, not just creative PPI, often requires the provision of a financial incentive, refreshments, parking and accommodation, which increase costs. These, however, are imperative and non-negotiable, as they increase the accessibility of research, especially to minority and lower-income groups who are less likely to participate. Adequate funding is also important for co-design studies where repeated engagement is required. One barrier to implementation that does appear to be exclusive to creative methods, however, is that they are not mainstream. This cannot be said of traditional PPI, which is often a mandatory part of research applications.

Regarding the strengths of creative PPI, it could be argued that most appear to be exclusive to creative methodologies. These methods are inclusive by nature, as multiple approaches can be taken to elicit ideas from different populations, approaches that do not necessarily rely on verbal or written communication as interviews and focus groups do. Given the anonymity provided by some creative methods, such as personas, people may be more likely to discuss their personal experiences under the guise of a general end-user, which might be more difficult to maintain when an interviewer is asking an individual questions directly. Additionally, creative methods are by nature more engaging and interactive than traditional methods, although this is a generalisation and some people may find the question-and-answer or group discussion format more engaging. Creative methods have also been reported to reduce the power imbalances that exist in traditional research [12, 13, 17, 19, 23]. These imbalances exist between researchers or policy makers on the one hand and adolescents, adults and the community on the other. Lastly, although this may occur to a greater extent in creative methods such as prototype development, it could be suggested that PPI in general, whether or not it is creative, is more time- and cost-efficient in the long term than not using any PPI to guide or refine the research process. It must be noted that these are observations based on the literature. To establish whether these differences between creative and traditional methods of PPI exist, both should be evaluated directly and empirically.

To the best of our knowledge, this is the first review to identify the strengths and limitations of creative PPI; however, similar literature has identified barriers to and facilitators of PPI in general. In the context of clinical trials, recruitment difficulties were cited as a barrier, as was finding public contributors who were free during work or school hours. Trial managers reported finding group dynamics difficult to manage, and the academic environment made some public contributors feel nervous and lacking in confidence to speak. Facilitators, however, included shared ownership of the research, something that has also been identified in the current review. In addition, planning and the provision of knowledge, information and communication were identified as facilitators [25]. Barriers to meaningful PPI in trial oversight committees identified by other research included trialist confusion or scepticism over the PPI role and the difficulty of finding PPI members who had a basic understanding of research [26]. However, it could be argued that such members are not representative of the average patient or member of the public. The formality of oversight meetings and the technical language used also acted as barriers, which may imply that the informal nature of creative methods and their lack of dependence on literacy skills could overcome this. Further, a review of 42 reviews on PPI in health and social care identified financial compensation, resources, training and general support as necessary to conduct PPI, much like the current review, where the resource intensiveness of creative PPI was identified as a limitation. However, other barriers were identified too, such as the recruitment and representativeness of public contributors [27]. As in the current review, power imbalances were also noted, although there they were identified as both a barrier and a facilitator. Collaboration seemed to diminish hierarchies, but not always, as imbalances sometimes remained between public contributors and healthcare staff, described as a ‘them and us’ culture [27]. Although these studies complement the findings of the current review, a direct comparison cannot be made, as they do not concern creative methods. However, they do suggest that some strengths and weaknesses are shared between creative and traditional methods of PPI.

Strengths and limitations of this review

Although a general definition of creative PPI exists, it was at our discretion to decide exactly which activities were deemed creative for this review. For example, we included sorting cards, interactive whiteboards and sticky notes. Other researchers may apply more or less stringent criteria. However, two reviewers were involved in this decision, which aids the reliability of the selection of included articles. Further, some of the strengths and limitations may not be fully attributable to the creative nature of the PPI process but rather to its co-created nature; this is hard to disentangle, as the included papers involved both aspects.

During screening, it was difficult to decide whether an article was using creative qualitative methodology or creative PPI, as this was often not explicitly stated. Regardless, both approaches involved the public or patients refining a healthcare product or service. This implies that if this review were replicated, others might make different decisions. This may call for greater standardisation in the reporting of public involvement in research. For example, the NIHR outlines different approaches to PPI, namely “consultation”, “collaboration”, “co-production” and “user-controlled”, each signifying an increased level of public power and influence [28]. Papers with elements of PPI could use these labels to clarify the extent of public involvement, or explicitly state that there was no PPI. Further, given our decision to include only scholarly peer-reviewed literature, relevant data may have been missed in the grey literature. Similarly, the literature search will not have identified all papers relating to different types of accessible inclusion. However, the intent of the review was to focus solely on methods within the definition of creative PPI.

This review fills a gap in the literature and helps circulate and promote the concept of creative PPI. The screening and quality appraisal stages of this review were each conducted by two independent reviewers. However, four full texts could not be accessed during the full-text reading stage, so data that could have altered or contributed to the findings of this review are missing.

Research recommendations

Given that creative PPI can require effort to prepare, perform and analyse, sufficient time and funding should be allocated in the research protocol to enable meaningful and continuous PPI. This is worthwhile, as PPI can substantially change the research output so that it aligns closely with the needs of the group it is intended to benefit. Researchers should also consider prototype development as a creative PPI activity, as this might reduce future time and resource constraints. Shifting from a top-down to a bottom-up approach within research can benefit all stakeholders and can help move creative PPI towards the mainstream. This, however, is the collective responsibility of funding bodies, universities and researchers, as well as the committees that approve research bids.

A few of the included studies used creative techniques alongside traditional methods, such as interviews; this hybrid approach to PPI could suit researchers who are unfamiliar with creative techniques or who wish to reap the benefits of both. The characteristics of the PPI group, such as age, gender and ethnicity, were often not reported. Including such information would help assess how representative the PPI group is of the population of interest.

Creative PPI is a relatively novel approach to engaging the public and patients in research, and it has both advantages and disadvantages compared with more traditional methods. There are many approaches to implementing creative PPI, and the choice of technique will be unique to each piece of research and depend on several factors, including the age and abilities of the PPI group and the resource limitations of the project. Each method has benefits and drawbacks, which should be considered at the protocol-writing stage. However, given adequate funding, time and planning, creative PPI is a worthwhile and engaging method of generating ideas with end-users of research, ideas that might not otherwise be generated using traditional methods.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

CASP: Critical Appraisal Skills Programme

JBI: The Joanna Briggs Institute

NIHR: National Institute for Health and Care Research

PAG: Public Advisory Group

PPI: Public and Patient Involvement

WoS: Web of Science

References

National Institute for Health and Care Research. What Is Patient and Public Involvement and Public Engagement? https://www.spcr.nihr.ac.uk/PPI/what-is-patient-and-public-involvement-and-engagement Accessed 01 Sept 2023.

Department of Health. Personal and Public Involvement (PPI) https://www.health-ni.gov.uk/topics/safety-and-quality-standards/personal-and-public-involvement-ppi#:~:text=The Health and Social Care Reform Act (NI) 2009 placed,delivery and evaluation of services . Accessed 01 Sept 2023.

National Institute for Health and Care Research. Policy Research Programme – Guidance for Stage 1 Applications https://www.nihr.ac.uk/documents/policy-research-programme-guidance-for-stage-1-applications-updated/26398 Accessed 01 Sept 2023.

Greenhalgh T, Hinton L, Finlay T, Macfarlane A, Fahy N, Clyde B, Chant A. Frameworks for supporting patient and public involvement in research: systematic review and co-design pilot. Health Expect. 2019. https://doi.org/10.1111/hex.12888

Street JM, Stafinski T, Lopes E, Menon D. Defining the role of the public in health technology assessment (HTA) and HTA-informed decision-making processes. Int J Technol Assess Health Care. 2020. https://doi.org/10.1017/S0266462320000094

Morrison C, Dearden A. Beyond tokenistic participation: using representational artefacts to enable meaningful public participation in health service design. Health Policy. 2013. https://doi.org/10.1016/j.healthpol.2013.05.008

Leavy P. Method meets art: arts-based research practice. New York: Guilford; 2020.

Seers K. Qualitative systematic reviews: their importance for our understanding of research relevant to pain. Br J Pain. 2015. https://doi.org/10.1177/2049463714549777

Lockwood C, Porritt K, Munn Z, Rittenmeyer L, Salmond S, Bjerrum M, Loveday H, Carrier J, Stannard D. Chapter 2: Systematic reviews of qualitative evidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. https://synthesismanual.jbi.global. https://doi.org/10.46658/JBIMES-20-03

CASP. CASP Checklists https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf (2022).

Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Res Psychol. 2006. https://doi.org/10.1191/1478088706qp063oa

Byrne E, Elliott E, Saltus R, Angharad J. The creative turn in evidence for public health: community and arts-based methodologies. J Public Health. 2018. https://doi.org/10.1093/pubmed/fdx151

Cook S, Grozdanovski L, Renda G, Santoso D, Gorkin R, Senior K. Can you design the perfect condom? Engaging young people to inform safe sexual health practice and innovation. Sex Educ. 2022. https://doi.org/10.1080/14681811.2021.1891040

Craven MP, Goodwin R, Rawsthorne M, Butler D, Waddingham P, Brown S, Jamieson M. Try to see it my way: exploring the co-design of visual presentations of wellbeing through a workshop process. Perspect Public Health. 2019. https://doi.org/10.1177/1757913919835231

Fedorowicz S, Riley V, Cowap L, Ellis NJ, Chambers R, Grogan S, Crone D, Cottrell E, Clark-Carter D, Roberts L, Gidlow CJ. Using social media for patient and public involvement and engagement in health research: the process and impact of a closed Facebook group. Health Expect. 2022. https://doi.org/10.1111/hex.13515

Galler M, Myhrer K, Ares G, Varela P. Listening to children voices in early stages of new product development through co-creation – creative focus group and online platform. Food Res Int. 2022. https://doi.org/10.1016/j.foodres.2022.111000

Grindell C, Tod A, Bec R, Wolstenholme D, Bhatnagar R, Sivakumar P, Morley A, Holme J, Lyons J, Ahmed M, Jackson S, Wallace D, Noorzad F, Kamalanathan M, Ahmed L, Evison M. Using creative co-design to develop a decision support tool for people with malignant pleural effusion. BMC Med Inf Decis Mak. 2020. https://doi.org/10.1186/s12911-020-01200-3

Kearns Á, Kelly H, Pitt I. Rating experience of ICT-delivered aphasia rehabilitation: co-design of a feedback questionnaire. Aphasiology. 2020. https://doi.org/10.1080/02687038.2019.1649913

Kelemen M, Surman E, Dikomitis L. Cultural animation in health research: an innovative methodology for patient and public involvement and engagement. Health Expect. 2018. https://doi.org/10.1111/hex.12677

Keogh F, Carney P, O’Shea E. Innovative methods for involving people with dementia and carers in the policymaking process. Health Expect. 2021. https://doi.org/10.1111/hex.13213

Micsinszki SK, Buettgen A, Mulvale G, Moll S, Wyndham-West M, Bruce E, Rogerson K, Murray-Leung L, Fleisig R, Park S, Phoenix M. Creative processes in co-designing a co-design hub: towards system change in health and social services in collaboration with structurally vulnerable populations. Evid Policy. 2022. https://doi.org/10.1332/174426421X16366319768599

Valaitis R, Longaphy J, Ploeg J, Agarwal G, Oliver D, Nair K, Kastner M, Avilla E, Dolovich L. Health TAPESTRY: co-designing interprofessional primary care programs for older adults using the persona-scenario method. BMC Fam Pract. 2019. https://doi.org/10.1186/s12875-019-1013-9

Webber R, Partridge R, Grindell C. The creative co-design of low back pain education resources. Evid Policy. 2022. https://doi.org/10.1332/174426421X16437342906266

National Institute for Health and Care Research. A Researcher’s Guide to Patient and Public Involvement. https://oxfordbrc.nihr.ac.uk/wp-content/uploads/2017/03/A-Researchers-Guide-to-PPI.pdf Accessed 01 Nov 2023.

Selman L, Clement C, Douglas M, Douglas K, Taylor J, Metcalfe C, Lane J, Horwood J. Patient and public involvement in randomised clinical trials: a mixed-methods study of a clinical trials unit to identify good practice, barriers and facilitators. Trials. 2021. https://doi.org/10.1186/s13063-021-05701-y

Coulman K, Nicholson A, Shaw A, Daykin A, Selman L, Macefield R, Shorter G, Cramer H, Sydes M, Gamble C, Pick M, Taylor G, Lane J. Understanding and optimising patient and public involvement in trial oversight: an ethnographic study of eight clinical trials. Trials. 2020. https://doi.org/10.1186/s13063-020-04495-9

Ocloo J, Garfield S, Franklin B, Dawson S. Exploring the theory, barriers and enablers for patient and public involvement across health, social care and patient safety: a systematic review of reviews. Health Res Policy Sys. 2021. https://doi.org/10.1186/s12961-020-00644-3

National Institute for Health and Care Research. Briefing notes for researchers - public involvement in NHS, health and social care research. https://www.nihr.ac.uk/documents/briefing-notes-for-researchers-public-involvement-in-nhs-health-and-social-care-research/27371 Accessed 01 Nov 2023.

Acknowledgements

With thanks to the PHIRST-LIGHT public advisory group and consortium for their thoughts and contributions to the design of this work.

The research team is supported by a National Institute for Health and Care Research grant (PHIRST-LIGHT Reference NIHR 135190).

Author information

Olivia R. Phillips and Cerian Harries share joint first authorship.

Authors and Affiliations

Nottingham Centre for Public Health and Epidemiology, Lifespan and Population Health, School of Medicine, University of Nottingham, Clinical Sciences Building, City Hospital Campus, Hucknall Road, Nottingham, NG5 1PB, UK

Olivia R. Phillips, Jo Leonardi-Bee, Holly Knight & Joanne R. Morling

National Institute for Health and Care Research (NIHR) PHIRST-LIGHT, Nottingham, UK

Olivia R. Phillips, Cerian Harries, Jo Leonardi-Bee, Holly Knight, Lauren B. Sherar, Veronica Varela-Mato & Joanne R. Morling

School of Sport, Exercise and Health Sciences, Loughborough University, Epinal Way, Loughborough, Leicestershire, LE11 3TU, UK

Cerian Harries, Lauren B. Sherar & Veronica Varela-Mato

Nottingham Centre for Evidence Based Healthcare, School of Medicine, University of Nottingham, Nottingham, UK

Jo Leonardi-Bee

NIHR Nottingham Biomedical Research Centre (BRC), Nottingham University Hospitals NHS Trust, University of Nottingham, Nottingham, NG7 2UH, UK

Joanne R. Morling

Contributions

Author contributions: study design: ORP, CH, JRM, JLB, HK, LBS, VVM, literature searching and screening: ORP, CH, JRM, data curation: ORP, CH, analysis: ORP, CH, JRM, manuscript draft: ORP, CH, JRM, Plain English Summary: ORP, manuscript critical review and editing: ORP, CH, JRM, JLB, HK, LBS, VVM.

Corresponding author

Correspondence to Olivia R. Phillips.

Ethics declarations

Ethics approval and consent to participate

The Ethics Committee of the Faculty of Medicine and Health Sciences, University of Nottingham advised that ethics committee approval and consent to participate were not required for systematic review studies.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

40900_2024_580_MOESM1_ESM.docx

Additional file 1: Search strings: Description of data: the search strings and filters used in each of the 5 databases in this review

Additional file 2: Quality appraisal questions: Description of data: CASP quality appraisal questions

40900_2024_580_moesm3_esm.docx.

Additional file 3: Table 1: Description of data: elements of the data extraction table that are not in the main manuscript

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Phillips, O.R., Harries, C., Leonardi-Bee, J. et al. What are the strengths and limitations to utilising creative methods in public and patient involvement in health and social care research? A qualitative systematic review. Res Involv Engagem 10, 48 (2024). https://doi.org/10.1186/s40900-024-00580-4

Received: 28 November 2023

Accepted: 25 April 2024

Published: 13 May 2024

DOI: https://doi.org/10.1186/s40900-024-00580-4

  • Public and patient involvement
  • Creative PPI
  • Qualitative systematic review

Research Involvement and Engagement

ISSN: 2056-7529

  • Systematic Review
  • Open access
  • Published: 17 May 2024

Risk factors and incidence of central venous access device-related thrombosis in hospitalized children: a systematic review and meta-analysis

  • Maoling Fu 1,2,
  • Quan Yuan 2,
  • Qiaoyue Yang 1,2,
  • Yaqi Yu 1,2,
  • Wenshuai Song 1,2,
  • Xiuli Qin 1,
  • Ying Luo 1,
  • Xiaoju Xiong 1 &
  • Genzhen Yu 1

Pediatric Research (2024)

The risk factors for central venous access device-related thrombosis (CRT) in children are not fully understood. Using an evidence-based medicine approach, we pooled current studies reporting risk factors for CRT to identify these factors and guide clinical diagnosis and treatment.

A systematic search of PubMed, Web of Science, Embase, Cochrane Library, Scopus, CNKI, Sinomed, and Wanfang databases was conducted. RevMan 5.4 was employed for data analysis.

The review included 47 studies evaluating 262,587 children with CVAD placement. Qualitative synthesis and quantitative meta-analysis identified D-dimer, location of insertion, type of catheter, number of lumens, catheter indwelling time, and central line-associated bloodstream infection as the most critical risk factors for CRT. Primarily because the included studies were observational, the certainty of evidence for these risk factors was rated low according to the GRADE approach.

Because few high-quality studies are available, larger, well-designed prospective studies are still needed to clarify the risk factors for CRT. Developing pediatric-specific CRT risk assessment tools is an important future goal. Stratifying preventive strategies for CRT by assessed risk level will help improve clinical efficiency, prevent CRT, and spare children unnecessary suffering.

This is the latest systematic review of risk factors and incidence of CRT in children.

A total of 47 studies involving 262,587 patients were included in our meta-analysis, according to which the pooled prevalence of CRT was 9.1%.

This study identified several of the most critical risk factors affecting CRT in children, including D-dimer, insertion location, type of catheter, number of lumens, catheter indwelling time, and central line-associated bloodstream infection (CLABSI).

Introduction

A central venous access device (CVAD) is an infusion device inserted through one of several access sites so that the catheter tip lies in the vena cava. Clinically, CVADs fall into four main categories: tunneled central venous catheters (CVCs), nontunneled CVCs, peripherally inserted central catheters (PICCs), and totally implantable venous access ports (TIVAPs). 1 Pediatric patients often require stable, multifunctional, and comfortable long-term vascular access because of poor cooperation with venipuncture, small vessel diameters, poor peripheral venous visibility and tolerance, a high body water content that predisposes them to dehydration, and rapid changes in clinical condition during illness. 2 CVAD use can substantially reduce the frequency of venipuncture, lessen drug-induced irritation of the veins, relieve children's pain and fear, improve medication compliance, ensure effective intravenous infusion, and improve the quality of disease treatment. 3–5 CVADs are therefore widely used in pediatric clinics and have become indispensable to the complex medical care of children with severe and chronic diseases.

Although CVADs have become an important tool in pediatric treatment and nursing, they carry a risk of complications, including CVAD-related thrombosis (CRT), phlebitis, fluid and blood leakage at the puncture site, catheter displacement, catheter obstruction, and central line-associated bloodstream infection (CLABSI). 6,7 Among these, CRT is one of the most common and serious. The prevalence of CRT in children varies considerably by country, age, disease, and medical institution, ranging from 2 to 81%, 4,8–10 and from 20 to 66% in Chinese children not receiving prophylaxis. 11,12 Early CRT often has no obvious clinical symptoms, yet it can still cause serious harm: it increases patients' pain and medical costs, delays treatment, worsens prognosis and quality of life, and in severe cases may lead to life-threatening thromboembolism. 13–15

Identifying the risk factors for and incidence of CRT helps clinicians recognize high-risk patients early and design specific preventive strategies, treatment regimens, and management plans, thereby effectively reducing the incidence of CRT in hospitalized children and sparing them unnecessary suffering. However, most current research on CRT involves only small groups in isolated nursing units or specific disease types, and to date no up-to-date systematic review provides pooled estimates of the risk factors for and prevalence of CRT in children. This study therefore had two aims: (1) to explore potential risk factors for CRT in children and determine its pooled prevalence; and (2) to provide evidence-based recommendations for improving the recognition, control, and treatment of CRT in children, as well as its nursing management.

Methods

This review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 16 The detailed research protocol is available on the PROSPERO website (registration number: CRD42023421353).

Search strategy

Eight electronic databases were searched: PubMed, Web of Science, Embase, Cochrane Library, Scopus, China National Knowledge Infrastructure (CNKI), Sinomed, and Wanfang, from the earliest available records up to 31 January 2024. The search strategy combined MeSH terms and free-text words, mainly: "child," "children," "adolescent," "infant," "pediatrics," "central venous access device-related thrombosis," "CRT," "catheter-related thrombosis," "catheter-related venous thrombosis," "CVC-related thrombosis," "risk factors," "protective factors," "predictors," "causality," and "influencing factors." The full search strategy for each database is available in the Supplementary Materials. In addition, we screened the reference lists of all included studies for eligible studies and searched the grey literature. Some authors were contacted by email to obtain additional information or clarify uncertainties.

Inclusion criteria

The study population was hospitalized children aged ≤18 years.

The primary research objective was to explore the risk factors for CRT.

The study reported at least one statistically significant predictor.

Case-control studies or cohort studies.

Published in English or Chinese.

Exclusion criteria

Catheter-related infection, catheter dysfunction, or other catheter complications as the primary outcome indicators.

Duplicate publications.

Case reports, study protocols, or clinical trials.

Reviews, editorials, letters, and conference abstracts.

In vitro or animal research.

Data were incomplete and could not be extracted.

Unable to find the original article.

Data extraction

Two reviewers independently extracted data from each eligible study using a pre-designed data collection form; disagreements were resolved through discussion among all authors. The following characteristics were obtained from all included studies (see Supplementary Table S1 for details):

Basic information: first author, country, year of publication, study duration, and study design.

Demographic characteristics: study population, sample size, number of CRT cases, and CRT rate.

Catheter-related features: catheter type, CRT type, and diagnostic method.

Potential risk factors for CRT: odds ratios (ORs) or relative risks (RRs) with 95% confidence intervals (CIs) were extracted for each risk factor. If a study did not report these values, they were calculated from a 2 × 2 contingency table.
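Where a study reported only raw counts, the OR and its 95% CI can be derived from the 2 × 2 table. The Python sketch below assumes the usual layout (exposed/unexposed by CRT/no CRT) and Woolf's log-scale interval; the source does not state which interval method or zero-cell correction was used, so the function name `odds_ratio_2x2` and its 0.5 correction are illustrative assumptions.

```python
import math

def odds_ratio_2x2(a: float, b: float, c: float, d: float, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) 95% CI from a 2 x 2 table.

    a: exposed with CRT      b: exposed without CRT
    c: unexposed with CRT    d: unexposed without CRT
    """
    # Assumption: add 0.5 to every cell when any cell is zero (Haldane-Anscombe correction).
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    # The standard error of ln(OR) is the square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from any included study.
or_value, lower, upper = odds_ratio_2x2(a=20, b=80, c=10, d=90)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The interval is symmetric on the log scale, which is also why ln(OR) and its SE are the natural inputs for inverse-variance pooling.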

Quality assessment

Two reviewers independently evaluated the quality of each study using the Risk of Bias Assessment for Nonrandomized Studies tool, 17 and settled any differences through group discussion. The tool assesses six domains of risk of bias: participant selection, confounding variables, exposure measurement, blinding of outcome assessment, incomplete outcome data, and selective outcome reporting. A study's overall risk of bias was low if all six domains were rated low risk; moderate if at least one domain was rated unclear risk and none was rated high risk; and high if one or more domains were rated high risk.
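The overall rating is a precedence rule over the six domain ratings (high outranks unclear, which outranks low). A minimal Python sketch, assuming each domain is recorded as low, unclear, or high; the names `Risk`, `DOMAINS`, and `overall_risk_of_bias` are hypothetical and not part of the published tool:

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"

# The six domains assessed by the tool, as listed in the text.
DOMAINS = (
    "participant selection",
    "confounding variables",
    "exposure measurement",
    "blinding of outcome assessment",
    "incomplete outcome data",
    "selective outcome reporting",
)

def overall_risk_of_bias(ratings: dict) -> str:
    """Any high domain -> high; otherwise any unclear domain -> moderate; all low -> low."""
    missing = set(DOMAINS) - set(ratings)
    if missing:
        raise ValueError(f"missing domain ratings: {sorted(missing)}")
    values = [ratings[domain] for domain in DOMAINS]
    if Risk.HIGH in values:
        return "high"
    if Risk.UNCLEAR in values:
        return "moderate"
    return "low"

# Hypothetical study: every domain low except confounding, which is unclear.
example = {domain: Risk.LOW for domain in DOMAINS}
example["confounding variables"] = Risk.UNCLEAR
print(overall_risk_of_bias(example))
```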

To ensure the accuracy of the assessment results, a third reviewer randomly selected five studies to check the data extraction and quality assessment.

Qualitative synthesis and quantitative meta-analysis

Each risk factor was qualitatively classified as a definite, likely, unclear, or non-risk factor based on the total number of studies with low or moderate risk of bias and the proportion of those studies demonstrating a positive association (Box 1 in the Supplementary Material). If a risk factor was reported by more than two studies with low or moderate risk of bias, and its definition and reference range were sufficiently consistent, a quantitative meta-analysis was performed to estimate the combined OR.

Data were analyzed using RevMan 5.4. For the meta-analyses of risk factors and CRT rates, the generic inverse variance method was applied, which requires only an effect estimate and its standard error (SE). 18 The SE was back-calculated from the 95% CI using the standard normal distribution. Heterogeneity tests were performed to assess whether the results of the individual studies could be combined. When P ≥ 0.05 and I² < 50%, heterogeneity was considered low and a fixed-effect model was used; when P < 0.05 or I² ≥ 50%, heterogeneity was considered substantial and a random-effects model was used.
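The pooling steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of RevMan: the SE of ln(OR) is back-calculated as (ln upper − ln lower) / (2 × 1.96), the heterogeneity P value comes from Cochran's Q referred to a chi-square distribution with k − 1 degrees of freedom, and the random-effects branch assumes the DerSimonian-Laird estimator of between-study variance (τ²), which the source does not name. All function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a 95% CI

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total

def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate the SE of ln(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

def pool_odds_ratios(studies: list) -> dict:
    """Generic inverse-variance pooling of (OR, lower, upper) tuples on the log scale."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    y = [math.log(or_value) for or_value, _, _ in studies]
    se = [se_from_ci(low, high) for _, low, high in studies]
    w = [1 / s ** 2 for s in se]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q, its chi-square P value, and I^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    p = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Model choice as described in the text: fixed effect only if P >= 0.05 and I^2 < 50%.
    tau2 = 0.0
    if p >= 0.05 and i2 < 0.5:
        model = "fixed"
    else:
        model = "random"
        # Assumption: DerSimonian-Laird moment estimate of tau^2.
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
    w_used = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_used, y)) / sum(w_used)
    se_pooled = math.sqrt(1 / sum(w_used))
    return {
        "model": model,
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * se_pooled), math.exp(pooled + Z_95 * se_pooled)),
        "q": q, "p": p, "i2": i2, "tau2": tau2,
    }

# Hypothetical (OR, lower, upper) inputs, not data from the review.
result = pool_odds_ratios([(1.0, 0.5, 2.0), (5.0, 4.0, 6.25)])
print(result["model"], round(result["or"], 2), round(result["i2"], 2))
```

Because the rule switches to random effects when either criterion is met, studies with consistent estimates stay fixed-effect, while widely divergent ones (such as the hypothetical pair above) are pooled with τ²-inflated weights and a wider CI.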

Certainty of the evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) method was used to assess the certainty of the evidence. Evidence from observational studies was initially classified as low certainty and then downgraded or upgraded according to five downgrading factors (risk of bias, inconsistency, indirectness, imprecision, and publication bias) and three upgrading factors (the magnitude of an effect, a dose-response gradient, and the effect of plausible residual confounding). On this basis, the overall certainty of each piece of evidence was rated as high, moderate, low, or very low.
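The GRADE bookkeeping described above can be pictured as a walk along the four certainty levels. This Python example is a simplification, assuming each downgrade or upgrade is recorded as a number of levels (1 for serious, 2 for very serious) and that the result is clamped to the four levels; real GRADE judgements are not purely additive, and the function and factor labels here are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_FACTORS = {"risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias"}
UPGRADE_FACTORS = {"magnitude of effect", "dose-response gradient", "plausible residual confounding"}

def grade_certainty(downgrades: dict, upgrades: dict, observational: bool = True) -> str:
    """Start at low (observational) or high (randomized) and move one level per step."""
    unknown = (set(downgrades) - DOWNGRADE_FACTORS) | (set(upgrades) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown GRADE factors: {sorted(unknown)}")
    level = LEVELS.index("low" if observational else "high")
    level += sum(upgrades.values()) - sum(downgrades.values())
    # Clamp to the four GRADE levels.
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]

# Hypothetical evidence item downgraded for inconsistency and imprecision.
print(grade_certainty({"inconsistency": 1, "imprecision": 1}, {}))
```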

Results

The initial database search retrieved 4193 records, of which 1656 duplicates were removed. The titles and abstracts of the remaining 2537 records were screened against the inclusion criteria, and 142 were selected for full-text review. After a rigorous eligibility assessment, 45 articles met the inclusion criteria, and two further eligible articles were identified from the reference lists of the selected articles and the grey literature. In total, 47 articles were included in this review, of which 43 contributed to the qualitative synthesis and quantitative meta-analysis (Fig. 1).
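The record counts in the flow above can be cross-checked with simple arithmetic. In this Python check the screened total (2537) and the final count (47) match the text; the two exclusion counts it derives (2395 at title/abstract screening and 97 at full-text review) follow from the reported numbers but are not stated in the text.

```python
# Record counts as reported in the text.
identified = 4193
duplicates = 1656
full_text_assessed = 142
included_from_databases = 45
included_from_other_sources = 2  # reference lists and grey literature

screened = identified - duplicates
excluded_at_title_abstract = screened - full_text_assessed
excluded_at_full_text = full_text_assessed - included_from_databases
total_included = included_from_databases + included_from_other_sources

print(f"screened={screened}, excluded at title/abstract={excluded_at_title_abstract}, "
      f"excluded at full text={excluded_at_full_text}, included={total_included}")
```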

Figure 1

Flow diagram of the screening and inclusion process for the systematic literature search.

Of the 47 studies, 19 were prospective 4,13,19–35 and the rest were retrospective; 9,12,36–61 10 were multicenter 4,9,13,21,23,26–28,49,59 and 37 were single-center. 12,19,20,22,24,25,29–48,50–58,60,61 The sample sizes ranged from 47 to 158,299; the two largest studies 13,59 had sample sizes of 71,782 and 158,299, respectively. In addition, three studies constructed clinical prediction models. 22,28,47 Table 1 lists the summary characteristics of the included studies.

Study populations and CRT rates in included studies

These studies investigated hospitalized children of different ages and from different departments: 12 studies enrolled all hospitalized children, 12 enrolled children in the PICU, six enrolled infants in the NICU, one enrolled all ICU patients, four enrolled children with leukemia, two enrolled infants under 1 year of age, and the remaining ten enrolled children with a specific disease.

The combined CRT rate was 9.1% (95% CI: 5.7–14.5%), with a high degree of heterogeneity (I² = 100%). The combined CRT rate was 11.5% (95% CI: 5.7–23.1%; I² = 99%) in both male and female children. CRT frequencies in the PICU and NICU were available from 13 articles (234,464 children) and 7 articles (6093 infants), with combined CRT rates of 10.7% (95% CI: 3.8–23.7%; I² = 100%) and 2.9% (95% CI: 1.0–6.5%; I² = 96%), respectively. The combined CRT rate in children with leukemia was 13.0% (95% CI: 2.9–38.3%; I² = 98%) (Supplementary Material Figs. S1–S6).
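The source does not state how study-level CRT rates were transformed before generic inverse-variance pooling. The Python sketch below assumes a log transformation, with the delta-method standard error SE(ln p) = √(1/events − 1/n), and a fixed-effect pool for brevity; with I² values near 100%, as reported here, a random-effects model (adding a between-study variance to each study's SE²) would be the appropriate choice. The function names and counts are hypothetical.

```python
import math

def log_rate_and_se(events: int, n: int) -> tuple:
    """Return ln(events / n) and its delta-method standard error."""
    if not 0 < events < n:
        raise ValueError("need 0 < events < n")
    # Var[ln(p)] is approximately (1 - p) / (n * p) = 1/events - 1/n.
    return math.log(events / n), math.sqrt(1 / events - 1 / n)

def pool_rates_fixed(studies: list, z: float = 1.96) -> tuple:
    """Fixed-effect inverse-variance pooled rate with 95% CI, back-transformed from the log scale."""
    estimates = [log_rate_and_se(events, n) for events, n in studies]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)

# Hypothetical (CRT events, children) pairs, not data from the review.
rate, lower, upper = pool_rates_fixed([(30, 300), (12, 200), (50, 400)])
print(f"pooled CRT rate {100 * rate:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f}%)")
```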

Quality of the CRT studies

The methodological quality of the included studies varied (Fig. 2 and Supplementary Material Fig. S7). Nine studies had a low overall risk of bias, with all six domains rated low risk. Four studies had a high overall risk of bias, three because of confounding variables and one because of participant selection. The remaining 34 studies had a moderate overall risk of bias, with at least one of the six domains rated unclear risk.

Figure 2

Summary of the risk-of-bias assessment results for the 47 studies.

Risk factors of CRT in included studies

The 47 included studies reported 61 statistically significant risk factors for CRT (Table 1). These factors were classified into three categories: patient-related risk factors (37.7%, 23/61), CVAD-related risk factors (34.4%, 21/61), and treatment-related risk factors (27.9%, 17/61).

Based on the qualitative synthesis, six variables were considered to be definite risk factors for CRT, including D-dimer, location of insertion, type of catheter, number of lumens, catheter indwelling time, and CLABSI. Eleven variables were considered likely associated with CRT, including gastrointestinal diseases, history of catheterization, thrombophilia, geographic location of line placement, catheter dysfunction, number of catheters, insertion length (cm), catheter to vein ratio, dialysis, hypertonic liquid, and cardiac catheterization. For 42 variables, the relationship with CRT was deemed unclear due to conflicting results from studies assessed as having low and moderate risk of bias, or because they were positively associated in only one study. Additionally, birth weight and gestational age were considered non-risk factors (Table  2 ).
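The category shares and the qualitative classification can both be checked against the reported total of 61 factors. A short Python check using only the counts stated in the two paragraphs above:

```python
# Counts as reported in the text.
by_category = {"patient-related": 23, "CVAD-related": 21, "treatment-related": 17}
by_classification = {"definite": 6, "likely": 11, "unclear": 42, "not a risk factor": 2}

total = sum(by_category.values())
for name, count in by_category.items():
    print(f"{name}: {count}/{total} = {100 * count / total:.1f}%")
print("classified factors:", sum(by_classification.values()))
```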

Meta-analyses were performed for risk factors reported by at least two studies with low or moderate risk of bias and with a consistent definition and reference range (Table 3 and Figs. 3–6).

Figure 3

Forest plots of the odds ratios (ORs) included in the quantitative meta-analysis and the associated overall OR. For each OR, the size of the red square is proportional to the study's weight. Diamonds represent the overall OR. I² represents the fraction of variability among the individual ORs that cannot be explained by sampling variability.

Figure 4

Forest plots of the odds ratios (ORs) included in the quantitative meta-analysis and the associated overall OR. For each OR, the size of the red square is proportional to the study's weight. Diamonds represent the overall OR. I² represents the fraction of variability among the individual ORs that cannot be explained by sampling variability.

Figure 5

GRADE assessment of evidence

Supplementary Table S2 shows the GRADE assessments of the certainty of evidence. Because all included studies were observational, all evidence was initially rated as low certainty. After applying the five downgrading and three upgrading principles, 17 pieces of evidence remained at low certainty, and the remaining 44 were downgraded to very low certainty for serious inconsistency and imprecision.

Discussion

Our study is the latest systematic review of the risk factors for, and incidence of, CRT in hospitalized children. Based on the 47 studies included in the current meta-analysis, involving a total of 262,587 patients, the pooled prevalence of CRT is 9.1%. We conducted a qualitative synthesis of 61 predictive factors and a quantitative meta-analysis of 38 factors, identifying six definite, 11 likely, and 42 unclear factors associated with CRT. The definite predictors were elevated D-dimer, location of insertion, type of catheter, number of lumens, catheter indwelling time, and CLABSI. These findings provide the latest comprehensive evidence summary to inform the early identification of children at risk for CRT and the development of measures to prevent and reduce CRT.

Implantable and temporary medical devices such as CVADs are exposed to blood for weeks to years, depending on the type of CVAD in place. Because a CVAD is an artificial surface lacking the endothelial layer that inhibits coagulation and platelet adhesion, it is thought to activate the contact pathway, ultimately leading to thrombosis. Contact-system activation on artificial surfaces may be part of the host defense against foreign material, but it can also generate kinins and thrombin and activate complement, 62 which in turn promotes thrombosis and inflammation. The presence of a CVAD is the most common risk factor for venous thromboembolism (VTE): CRT accounts for 10% of deep vein thrombosis (DVT) in adults and 50–80% in children. 10,55,63 The incidence of CRT in hospitalized children has increased significantly, by 30–70%, over the past 20 years, 64,65 and CRT may cause serious medical complications in addition to increasing healthcare expenditure and length of stay.

We found that a higher D-dimer level is an independent risk factor for CRT in hospitalized children, consistent with the results of adult studies. 66 D-dimer is a soluble fibrin degradation product derived from plasmin-mediated degradation of cross-linked fibrin; it is increased or positive in secondary hyperfibrinolysis, such as hypercoagulable states, disseminated intravascular coagulation, and thrombolytic therapy. 67,68 Elevated D-dimer therefore indicates thrombotic activity of various origins and increased fibrinolytic activity. D-dimer has been extensively investigated for excluding a diagnosis of VTE and is used routinely for this purpose. 67,69 For early recognition and to reduce the incidence of CRT, D-dimer levels should therefore be closely monitored before and after catheterization. However, an elevated D-dimer result cannot by itself explain the cause or location of CRT and must be interpreted together with clinical findings and other test results. Inherited thrombophilia, caused by genetic defects that produce deficient or abnormal coagulation-related proteins, including protein C, protein S, and antithrombin deficiencies, the factor V Leiden mutation, and the factor II G20210A mutation, 70 is considered a potential risk factor for CRT. The prevalence of thrombophilia varies widely among populations; in pediatric VTE patients, reported prevalence ranges from 10% to 59%. 71 Children with gastrointestinal diseases such as short bowel syndrome (SBS) and inflammatory bowel disease (IBD) have an increased risk of developing CRT during hospitalization. The precise mechanism behind this association remains uncertain. It may be attributable to heightened inflammation at the time of catheterization, particularly in patients with active IBD flares or surgical admissions, which also lead to periods of reduced mobility. 55 This suggests that delaying placement until the most active period of inflammation has passed may reduce the rate of thrombosis.

A narrative review pointed out that age is one of the most significant risk factors for VTE; in children, CRT shows a bimodal distribution, with the highest incidence in infancy and adolescence. 10 The higher incidence in infancy may be partly due to the smaller diameter of infant veins, which makes insertion difficult and often requires multiple attempts. However, whether age is a risk factor for CRT remains highly controversial. Chojnacka et al. did not find a statistically significant difference, 39 although their study population showed a trend toward a similar bimodal distribution. Cancer, cardiovascular disease, sepsis, asphyxia, and neurological diseases were also classified as unclear factors for CRT. Pediatric patients with leukemia have multiple risk factors for VTE, such as hypercoagulable blast cells, the prothrombotic nature of the cancer itself, and treatment with steroids and L-asparaginase; Chen et al. 38 and Jaffray et al. 4 concluded that children with leukemia are more likely to develop CRT. Sepsis dysregulates the coagulation system, activating coagulation and promoting thrombosis. 72 However, a study by Onyeama et al. 52 showed that sepsis was significantly associated with a reduced incidence of CRT; the exact mechanism is currently unknown.

The insertion site and catheter type are critical risk factors for CRT. In children, the incidence of CRT is higher with femoral vein catheterization than with subclavian or jugular vein catheterization, which is contrary to findings in adult patients [73]. The femoral vein is a larger vessel and accommodates a larger catheter. Femoral CVADs are prioritized in urgent and emergency situations, in which patients tend to be more critically ill and often immobilized, further exacerbating the low-flow state. In addition, the vein may be compressed and kinked beneath the inguinal ligament with leg movement, which may increase the risk of CRT [27]. PICCs provide a reliable medium- to long-term route for intravenous therapy in children, but they carry a higher risk of CRT than other catheter types. We speculate that the long catheter length and the relatively large lumen of the PICC compared with the diameter of the vessel at the insertion site may obstruct blood flow [52]. Additionally, patients with PICCs may be more likely than those with tunneled lines (TLs) to be diagnosed with symptomatic VTE, because PICCs are often placed in smaller vessels and run through the arm or leg, causing limb pain and swelling, whereas TLs are located in the chest.

The risk of CRT increases with the number of lumens. A possible explanation is that multilumen catheters tend to be larger and thus occupy more of the vessel lumen, obstructing normal venous blood flow. The relationship between CRT and CLABSI is bidirectional. After catheter insertion, a fibrin sheath forms around the catheter. Microorganisms, especially Staphylococcus aureus, readily adhere to fibrin sheaths and may cause CLABSI [74]. Conversely, CLABSI can trigger inflammatory reactions that promote further thrombosis. CVAD duration is positively associated with the risk of CRT. Catheter placement may cause mechanical injury to the vein. As the indwelling duration increases, many damaged smooth muscle and endothelial cells become embedded within the fibrin, resulting in thrombus formation. In addition, prolonged indwelling increases platelet contact with the vessel lining, activating coagulation factors and thrombin and increasing the risk of thrombosis [22]. Therefore, nurses should perform routine catheter maintenance in children who require long-term CVAD indwelling. The duration of CVAD use should be monitored, the need for the catheter should be assessed daily, and the catheter should be removed as soon as treatment allows.

Because obstruction of venous blood flow by the CVAD is considered an essential mechanism in the development of VTE, a high ratio between catheter size and vein diameter could be a risk factor for CRT. The 2012 international guidelines on pediatric CVC insertion recommend that the ratio between the catheter’s external diameter and the cannulated vein’s diameter should not exceed 0.33 [75]. However, this recommendation is based only on expert opinion and currently lacks supporting clinical data, so further research is needed to verify it. Catheter dysfunction is mainly caused by small clots or fibrin sheaths wrapping around the catheter tip. Prolonged accumulation may lead to partial or complete blockage of the vessel and become a nidus for thrombosis [74]. Journeycake et al. observed that the risk of VTE was highest in pediatric cancer patients with multiple episodes of catheter dysfunction [76]. A study of pediatric brain tumor patients reported that VTE was more common in patients with catheter dysfunction [77]. Thus, these studies and the current data support considering catheter dysfunction as a possible risk factor for CRT and designing further screening and intervention studies for its early identification and prevention.
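The 0.33 threshold is easier to apply with a quick calculation. The sketch below assumes the French catheter gauge convention (1 Fr = 1/3 mm outer diameter) and a vein diameter measured beforehand, for example by ultrasound. The function names and example diameters are hypothetical illustrations, not a validated clinical tool and not part of the cited recommendations.

```python
# Hypothetical helpers, assuming the French gauge convention: 1 Fr = 1/3 mm outer diameter.
MM_PER_FRENCH = 1.0 / 3.0
GUIDELINE_MAX_RATIO = 0.33  # catheter-to-vein diameter ratio cited in the text


def catheter_outer_diameter_mm(french_size: float) -> float:
    """Convert a catheter's French size to its outer diameter in millimetres."""
    return french_size * MM_PER_FRENCH


def catheter_to_vein_ratio(french_size: float, vein_diameter_mm: float) -> float:
    """Ratio of the catheter's outer diameter to the cannulated vein's diameter."""
    if vein_diameter_mm <= 0:
        raise ValueError("vein diameter must be positive")
    return catheter_outer_diameter_mm(french_size) / vein_diameter_mm


def within_recommended_ratio(french_size: float, vein_diameter_mm: float,
                             max_ratio: float = GUIDELINE_MAX_RATIO) -> bool:
    """True if the catheter-to-vein ratio does not exceed max_ratio."""
    return catheter_to_vein_ratio(french_size, vein_diameter_mm) <= max_ratio


def min_vein_diameter_mm(french_size: float,
                         max_ratio: float = GUIDELINE_MAX_RATIO) -> float:
    """Smallest vein diameter (mm) that keeps the ratio at or below max_ratio."""
    return catheter_outer_diameter_mm(french_size) / max_ratio


if __name__ == "__main__":
    # Hypothetical example: a 4 Fr catheter (about 1.33 mm) in veins of 4.5 mm and 3.0 mm.
    for vein in (4.5, 3.0):
        ratio = catheter_to_vein_ratio(4, vein)
        ok = within_recommended_ratio(4, vein)
        print(f"4 Fr in {vein} mm vein: ratio {ratio:.2f}, within 0.33: {ok}")
    print(f"minimum vein diameter for 4 Fr: {min_vein_diameter_mm(4):.2f} mm")
```

Under these assumptions, a 4 Fr catheter gives a ratio of about 0.30 in a 4.5 mm vein but about 0.44 in a 3.0 mm vein, and it needs a vein of roughly 4.04 mm or more to stay within 0.33.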

The rationale for studying the relationship between the insertion side of the CVAD and the risk of CRT lies in the anatomy of the upper-body venous system. The left brachiocephalic vein is longer and courses more horizontally than the right, so it enters the superior vena cava at a sharper angle. The right jugular vein offers the most direct and shortest route for the CVAD to reach the heart. By contrast, a CVAD in the left jugular vein travels a greater distance to the heart and passes through two angles in the venous system, which may cause endothelial damage and increase the likelihood of blood flow obstruction and adhesion to the venous wall [26]. However, our meta-analysis did not find a statistically significant increase in the risk of CRT with left-sided compared with right-sided placement. The ideal location for the catheter tip is the junction of the superior vena cava and the right atrium, preferred because the higher blood flow rate there may protect against thrombosis [43]. Currently, the pediatric literature on the effect of tip position on CRT is scarce and inconclusive. In addition, catheter tips do not always remain in that position after initial placement. Therefore, tip migration should be a significant concern in pediatric patients, especially those who are active, growing, and require long-term catheter use.

Providing renal replacement therapy is a lifelong task for pediatric patients with end-stage renal disease (ESRD). Although successful transplantation can be achieved even in young patients, graft lifespan is limited, so many transplant recipients may return to dialysis as part of their ESRD treatment [78]. CVCs remain the main vascular access for hemodialysis in children. Long-term reliance on CVCs is associated with a high incidence of catheter dysfunction and failure, and the frequent need for repeated CVC placement in these patients raises the risk of central vein stenosis and CRT. Cardiac catheterization is also a possible risk factor for CRT. Appropriate anticoagulation is required during catheterization, without which the risk of thrombosis is up to 40%. However, using unfractionated heparin in pediatric patients is challenging because their coagulation system and heparin response differ from those of adults [79]. Further research is needed to determine whether children receive heparin doses during cardiac catheterization that are adequate to prevent thrombosis without increasing the risk of bleeding complications. In adults, the incidence of VTE in chronically bedridden, immobilized patients is 3.59 times higher than in patients with normal activity levels [80]. Critically ill or surgical children often require mechanical ventilation in the early stages, with continuous use of multiple sedative or inotropic drugs to reduce cardiac load and protect pulmonary function. During sedation, the child is immobilized, limb activity is reduced or absent, blood flow slows, and blood stagnates in the veins, increasing platelet adhesion to the endothelium and possibly the risk of CRT. Therefore, passive movements such as limb abduction, internal rotation, and elbow flexion and extension should be performed when the child’s condition permits.

Nutritional support, including enteral and parenteral nutrition (PN), is an important part of critical illness treatment. The CVAD is the delivery route for total parenteral nutrition (TPN), and some children need TPN to provide calories for a long time. High glucose and calcium concentrations in PN are both possible triggers of CRT, and PN has been shown to upregulate the extrinsic coagulation cascade, especially with long-term use [60]. Diamanti et al. reported a CRT incidence of 20% in patients receiving TPN [81]. Mannitol and glycerol fructose are hypertonic drugs widely used in clinical practice; after administration, they increase plasma osmolality to dehydrate tissues. They may also cause a cellular stress response, induce apoptosis, and activate inflammatory cytokines and coagulation pathways, thereby inducing thrombosis. Jiang et al. [22] found vasoactive drugs to be a risk factor for CRT, possibly because these drugs cause strong vasoconstriction, impair endothelial function, and promote fibrinogen synthesis. However, this contrasts with the findings of Marquez et al. [28] and Faustino et al. [21]. Therefore, larger prospective studies are still needed to assess this risk factor more precisely.

The strengths of this study include the systematic identification of all relevant studies of risk factors for CRT in hospitalized children and the classification of risk factors into three categories (patient-related, CVAD-related, and treatment-related) to offer a logical progression of the possible causes of CRT in children. However, this systematic review has several limitations. First, as most of the studies originate from Western countries, extrapolating these results to Eastern populations is questionable. Second, significant heterogeneity was encountered in our analysis, potentially stemming from variations in regimen, duration, enrolled population, and center setting, among other factors; this diversity calls for cautious interpretation of the results. Third, few studies were of high quality with a low risk of bias, and many suffered from significant sources of bias. Furthermore, many effects were assessed by very few studies, so the supporting evidence is weak and needs to be validated in future studies. Finally, because most of the studies were retrospective, no causal claims can be made about the identified risk factors for CRT.
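Heterogeneity of the kind mentioned above is conventionally quantified with Cochran's Q and the I² statistic, I² = max(0, (Q − df)/Q), where df is the number of studies minus one. The sketch below computes both from study-level effect estimates (for example, log odds ratios) and their variances using inverse-variance weights. It is a generic illustration only: this excerpt does not state which heterogeneity statistics or pooling model the review used, and the example inputs are hypothetical, not data from the included studies.

```python
def heterogeneity(effects: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Return (pooled fixed-effect estimate, Cochran's Q, I^2) for study-level effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation beyond what sampling error alone would produce
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances from four studies.
    log_ors = [0.2, 0.9, 1.4, 0.1]
    variances = [0.04, 0.06, 0.09, 0.05]
    pooled, q, i2 = heterogeneity(log_ors, variances)
    print(f"pooled log OR = {pooled:.3f}, Q = {q:.2f}, I^2 = {i2:.0%}")
```

I² values of roughly 25%, 50%, and 75% are conventionally read as low, moderate, and high heterogeneity, which is why a high I² calls for cautious interpretation of a pooled estimate.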

Conclusions

In conclusion, we identified several critical factors that affect CRT, including D-dimer, insertion site, catheter type, number of lumens, catheter indwelling time, and CLABSI. Nevertheless, none of the included studies considered the impact of socio-demographic factors on CRT, such as parental education level, occupation, and family economic status. Therefore, larger, well-designed prospective studies are still needed to clarify the predictors of CRT. In addition, pediatric-specific CRT risk assessment tools are lacking and need to be developed and validated. Machine learning (ML), which can be used to build risk assessment models that efficiently extract useful information from data, has been widely used in recent years to solve a variety of challenging medical problems. Likewise, applying ML to CRT risk assessment may contribute to a more precise evaluation. In clinical practice, stratified preventive measures should be taken according to each child’s assessed level of CRT risk, to improve clinical efficiency, reduce workload, and minimize the occurrence of CRT while ensuring the safety of children.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Yeow, M. et al. A systematic review and network meta-analysis of randomized controlled trials on choice of central venous access device for delivery of chemotherapy. J. Vasc. Surg. Venous Lymphat. Disord. 10 , 1184–91.e8 (2022).

Cellini, M. et al. Guidelines of the Italian Association of Pediatric Hematology and Oncology for the management of the central venous access devices in pediatric patients with onco-hematological disease. J. Vasc. Access 23 , 3–17 (2022).

Ares, G. & Hunter, C. J. Central venous access in children: indications, devices, and risks. Curr. Opin. Pediatr. 29 , 340–346 (2017).

Jaffray, J. et al. Peripherally inserted central catheters lead to a high risk of venous thromboembolism in children. Blood 135 , 220–226 (2020).

Zhang, J. J. et al. Factors affecting mechanical complications of central venous access devices in children. Pediatr. Surg. Int. 38 , 1067–1073 (2022).

Akhtar, N. & Lee, L. Utilization and Complications of Central Venous Access Devices in Oncology Patients. Curr. Oncol. 28 , 367–377 (2021).

Ullman, A. J., Marsh, N., Mihala, G., Cooke, M. & Rickard, C. M. Complications of Central Venous Access Devices: A Systematic Review. Pediatrics 136 , e1331–e1344 (2015).

Östlund, Å. et al. Erratum to ‘Incidence of and risk factors for venous thrombosis in children with percutaneous non-tunnelled central venous catheters’ (Br J Anaesth 2019; 123: 316-24). Br. J. Anaesth. 123 , 918 (2019).

McLaughlin, C. M. et al. Symptomatic catheter-associated thrombosis in pediatric trauma patients: Choose your access wisely. Surgery 166 , 1117–1121 (2019).

Citla Sridhar, D., Abou-Ismail, M. Y. & Ahuja, S. P. Central venous catheter-related thrombosis in children and adults. Thromb. Res. 187 , 103–112 (2020).

Zhou, X. et al. A retrospective analysis of risk factors associated with catheter-related thrombosis: a single-center study. Perfusion 35 , 806–813 (2020).

Li, S. et al. Risk factors for central venous catheter-related thrombosis in hospitalized children: a single-center a retrospective cohort study. Transl. Pediatr. 11 , 1840–1851 (2022).

Patel, N., Petersen, T. L., Simpson, P. M., Feng, M. & Hanson, S. J. Rates of Venous Thromboembolism and Central Line-Associated Bloodstream Infections Among Types of Central Venous Access Devices in Critically Ill Children. Crit. Care Med. 48 , 1340–1348 (2020).

Timsit, J. F. et al. A state of the art review on optimal practices to prevent, recognize, and manage complications associated with intravascular devices in the critically ill. Intensive Care Med. 44 , 742–759 (2018).

Ullman, A. J. et al. Pediatric central venous access devices: practice, performance, and costs. Pediatr. Res. 92 , 1381–1390 (2022).

Hutton, B. et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann. Intern Med. 162 , 777–784 (2015).

Kim, S. Y. et al. Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity. J. Clin. Epidemiol. 66 , 408–414 (2013).

Borenstein, M., Hedges, L. V., Higgins, J. P. & Rothstein, H. R. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res. Synth. Methods 1 , 97–111 (2010).

Beck, C., Dubois, J., Grignon, A., Lacroix, J. & David, M. Incidence and risk factors of catheter-related deep vein thrombosis in a pediatric intensive care unit: a prospective study. J. Pediatr. 133 , 237–241 (1998).

Dubois, J. et al. Incidence of deep vein thrombosis related to peripherally inserted central catheters in children and adolescents. CMAJ 177 , 1185–1190 (2007).

Faustino, E. V. et al. Incidence and acute complications of asymptomatic central venous catheter-related deep venous thrombosis in critically ill children. J. Pediatr. 162 , 387–391 (2013).

Jiang, W. et al. Construction and validation of a risk prediction model for central venous catheter-associated deep venous thromboses in children with congenital heart disease after surgery. Chin. J. Nurs. 57 , 2217–2224 (2022).

Faustino, E. V. et al. Factor VIII May Predict Catheter-Related Thrombosis in Critically Ill Children: A Preliminary Study. Pediatr. Crit. Care Med. 16 , 497–504 (2015).

Jones, S., Butt, W., Monagle, P., Cain, T. & Newall, F. The natural history of asymptomatic central venous catheter-related thrombosis in critically ill children. Blood 133 , 857–866 (2019).

Kim, E. H. et al. Central venous catheter-related thrombosis in pediatric surgical patients: A prospective observational study. Paediatr. Anaesth. 32 , 563–571 (2022).

Male, C. et al. Central venous line-related thrombosis in children: association with central venous line location and insertion technique. Blood 101 , 4273–4278 (2003).

Male, C., Julian, J. A., Massicotte, P., Gent, M. & Mitchell, L. Significant association with location of central venous line placement and risk of venous thrombosis in children. Thromb. Haemost. 94 , 516–521 (2005).

Marquez, A., Shabanova, V. & Faustino, E. V. Prediction of Catheter-Associated Thrombosis in Critically Ill Children. Pediatr. Crit. Care Med. 17 , e521–e528 (2016).

Menéndez, J. J. et al. Incidence and risk factors of superficial and deep vein thrombosis associated with peripherally inserted central catheters in children. J. Thromb. Haemost. 14 , 2158–2168 (2016).

Rubio Longo, M. C. et al. Catheter-related deep vein thrombosis in newborn infants. Arch. Argent. Pediatr. 119 , 32–38 (2021).

Östlund, Å. et al. Incidence of and risk factors for venous thrombosis in children with percutaneous non-tunnelled central venous catheters. Br. J. Anaesth. 123 , 316–324 (2019).

Sol, J. J. et al. Chronic Complications After Femoral Central Venous Catheter-related Thrombosis in Critically Ill Children. J. Pediatr. Hematol. Oncol. 37 , 462–467 (2015).

van Rooden, C. J. et al. Infectious complications of central venous catheters increase the risk of catheter-related thrombosis in hematology patients: a prospective study. J. Clin. Oncol. 23 , 2655–2660 (2005).

Zeng, X., Zhang, C. & Shi, Y. Analysis of risk factors for complicated catheter-related thrombosis in children. Chin. J. Emerg. Med. 29 , 719–723 (2020).

Wei, Y. et al. The incidence and risk factors of catheter-related-thrombosis during induction chemotherapy in acute lymphocytic leukemia children. Chin. J. Hematol. 38 , 313–317 (2017).

Deng, G. & Liao, Q. Analysis of risk factors for venous thrombosis after PICC placement in critically ill children. Int. J. Nurs. 39 , 775–777 (2020).

Badheka, A. V. et al. Catheter related thrombosis in hospitalized infants: A neural network approach to predict risk factors. Thromb. Res. 200 , 34–40 (2021).

Chen, K. et al. Risk factors for central venous catheter-related thrombosis in children: a retrospective analysis. Blood Coagul. Fibrinolysis 27 , 384–388 (2016).

Chojnacka, K., Krasiński, Z., Wróblewska-Seniuk, K. & Mazela, J. Catheter-related venous thrombosis in NICU: A case-control retrospective study. J. Vasc. Access 23 , 88–93 (2022).

Derderian, S. C., Good, R., Vuille-Dit-Bille, R. N., Carpenter, T. & Bensard, D. D. Central venous lines in critically ill children: Thrombosis but not infection is site dependent. J. Pediatr. Surg. 54 , 1740–1743 (2019).

Diamond, C. E. et al. Catheter-Related Venous Thrombosis in Hospitalized Pediatric Patients with Inflammatory Bowel Disease: Incidence, Characteristics, and Role of Anticoagulant Thromboprophylaxis with Enoxaparin. J. Pediatr. 198 , 53–59 (2018).

Noailly Charny, P. A. et al. Increased Risk of Thrombosis Associated with Peripherally Inserted Central Catheters Compared with Conventional Central Venous Catheters in Children with Leukemia. J. Pediatr. 198 , 46–52 (2018).

Gnannt, R. et al. Increased risk of symptomatic upper-extremity venous thrombosis with multiple peripherally inserted central catheter insertions in pediatric patients. Pediatr. Radiol. 48 , 1013–1020 (2018).

Gray, B. W. et al. Characterization of central venous catheter-associated deep venous thrombosis in infants. J. Pediatr. Surg. 47 , 1159–1166 (2012).

Haddad, H. et al. Routine surveillance ultrasound for the management of central venous catheters in neonates. J. Pediatr. 164 , 118–122 (2014).

Lambert, I., Tarima, S., Uhing, M. & Cohen, S. S. Risk Factors Linked to Central Catheter-Associated Thrombosis in Critically Ill Infants in the Neonatal Intensive Care Unit. Am. J. Perinatol. 36 , 291–295 (2019).

Li, H. et al. Prediction of central venous catheter-associated deep venous thrombosis in pediatric critical care settings. BMC Med. Inf. Decis. Mak. 21 , 332 (2021).

Lovett, M. E. et al. Catheter-associated deep vein thrombosis in children with severe traumatic brain injury: A single-center experience. Pediatr. Blood Cancer 70 , e30044 (2023).

MacLean, J. et al. Need for tissue plasminogen activator for central venous catheter dysfunction is significantly associated with thrombosis in pediatric cancer patients. Pediatr. Blood Cancer 65 , e27015 (2018).

Noonan, P. J., Hanson, S. J., Simpson, P. M., Dasgupta, M. & Petersen, T. L. Comparison of Complication Rates of Central Venous Catheters Versus Peripherally Inserted Central Venous Catheters in Pediatric Patients. Pediatr. Crit. Care Med. 19 , 1097–1105 (2018).

Pei, L. et al. Clinical characteristics and risk factors of symptomatic central venous catheter-related deep vein thrombosis in children. Chin. Pediatr. Emerg. Med. 23 , 450–454 (2016).

Onyeama, S. N. et al. Central Venous Catheter-associated Venous Thromboembolism in Children With Hematologic Malignancy. J. Pediatr. Hematol. Oncol. 40 , e519–e524 (2018).

Shah, S. H. et al. Clinical risk factors for central line-associated venous thrombosis in children. Front. Pediatr. 3 , 35 (2015).

Shin, H. S., Towbin, A. J., Zhang, B., Johnson, N. D. & Goldstein, S. L. Venous thrombosis and stenosis after peripherally inserted central catheter placement in children. Pediatr. Radiol. 47 , 1670–1675 (2017).

Smitherman, A. B. et al. The incidence of catheter-associated venous thrombosis in noncritically ill children. Hosp. Pediatr. 5 , 59–66 (2015).

Steen, E. H. et al. Central Venous Catheter-Related Deep Vein Thrombosis in the Pediatric Cardiac Intensive Care Unit. J. Surg. Res 241 , 149–159 (2019).

Wang, J. & Ren, G. Peripherally inserted central catheter related venous thromboembolism in children with acute leukemia: a factorial analysis. Chin. J. Biomed. Eng. 27 , 288–293 (2021).

Dubbink-Verheij, G. H. et al. Femoral Vein Catheter is an Important Risk Factor for Catheter-related Thrombosis in (Near-)term Neonates. J. Pediatr. Hematol. Oncol. 40 , e64–e68 (2018).

Tran, M., Shein, S. L., Ji, X. & Ahuja, S. P. Identification of a “VTE-rich” population in pediatrics - Critically ill children with central venous catheters. Thromb. Res 161 , 73–77 (2018).

Wisecup, S., Eades, S. & Turiy, Y. Characterizing the Risk Factors Associated With Venous Thromboembolism in Pediatric Patients After Central Venous Line Placement. J. Pediatr. Pharm. Ther. 20 , 358–366 (2015).

Zhu, W., Zhang, H., Xing, Y. Clinical Characteristics of Venous Thrombosis Associated with Peripherally Inserted Central Venous Catheter in Premature Infants. Children 9 , https://doi.org/10.3390/children9081126 (2022).

Ekdahl, K. N. et al. Innate immunity activation on biomaterial surfaces: a mechanistic model and coping strategies. Adv. Drug Deliv. Rev. 63 , 1042–1050 (2011).

Takemoto, C. M. et al. Hospital-associated venous thromboembolism in children: incidence and clinical characteristics. J. Pediatr. 164 , 332–338 (2014).

Boulet, S. L. et al. Trends in venous thromboembolism-related hospitalizations, 1994-2009. Pediatrics 130 , e812–e820 (2012).

Raffini, L., Huang, Y. S., Witmer, C. & Feudtner, C. Dramatic increase in venous thromboembolism in children’s hospitals in the United States from 2001 to 2007. Pediatrics 124 , 1001–1008 (2009).

Lin, S., Zhu, N., Zhang, Y., Du, L. & Zhang, S. Development and validation of a prediction model of catheter-related thrombosis in patients with cancer undergoing chemotherapy based on ultrasonography results and clinical information. J. Thromb. Thrombolysis 54 , 480–491 (2022).

Johnson, E. D., Schell, J. C. & Rodgers, G. M. The D-dimer assay. Am. J. Hematol. 94 , 833–839 (2019).

Favresse, J. et al. D-dimer: Preanalytical, analytical, postanalytical variables, and clinical applications. Crit. Rev. Clin. Lab Sci. 55 , 548–577 (2018).

Weitz, J. I., Fredenburgh, J. C. & Eikelboom, J. W. A Test in Context: D-Dimer. J. Am. Coll. Cardiol. 70 , 2411–2420 (2017).

Darlow, J. & Mould, H. Thrombophilia testing in the era of direct oral anticoagulants. Clin. Med. 21 , e487–e491 (2021).

Monagle, P. et al. American Society of Hematology 2018 Guidelines for management of venous thromboembolism: treatment of pediatric venous thromboembolism. Blood Adv. 2 , 3292–3316 (2018).

Meziani, F., Gando, S. & Vincent, J. L. Should all patients with sepsis receive anticoagulation? Yes. Intensive Care Med 43 , 452–454 (2017).

Saber, W. et al. Risk factors for catheter-related thrombosis (CRT) in cancer patients: a patient-level data (IPD) meta-analysis of clinical trials and prospective studies. J. Thromb. Haemost. 9 , 312–319 (2011).

Journeycake, J. M. & Buchanan, G. R. Thrombotic complications of central venous catheters in children. Curr. Opin. Hematol. 10 , 369–374 (2003).

Lamperti, M. et al. International evidence-based recommendations on ultrasound-guided vascular access. Intensive Care Med 38 , 1105–1117 (2012).

Journeycake, J. M. & Buchanan, G. R. Catheter-related deep venous thrombosis and other catheter complications in children with cancer. J. Clin. Oncol. 24 , 4575–4580 (2006).

Deitcher, S. R., Gajjar, A., Kun, L. & Heideman, R. L. Clinically evident venous thromboembolic events in children with brain tumors. J. Pediatr. 145 , 848–850 (2004).

Mandel-Shorer, N., Tzvi-Behr, S., Harvey, E. & Revel-Vilk, S. Central venous catheter-related venous thrombosis in children with end-stage renal disease undergoing hemodialysis. Thromb. Res 172 , 150–157 (2018).

Chen, D., Långström, S., Petäjä, J., Heikinheimo, M. & Pihkala, J. Thrombin formation and effect of unfractionated heparin during pediatric cardiac catheterization. Catheter. Cardiovasc. Interv. 81 , 1174–1179 (2013).

Reynolds, P. M. et al. Evaluation of Prophylactic Heparin Dosage Strategies and Risk Factors for Venous Thromboembolism in the Critically Ill Patient. Pharmacotherapy 39 , 232–241 (2019).

Diamanti, A. et al. Prevalence of life-threatening complications in pediatric patients affected by intestinal failure. Transpl. Proc. 39 , 1632–1633 (2007).

This study was supported by the Fundamental Research Funds for the Central Universities [grant numbers YCJJ20230244] and Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology Research Fund [grant numbers 2022C09].

Author information

Authors and affiliations.

Department of Nursing, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Maoling Fu, Qiaoyue Yang, Yaqi Yu, Wenshuai Song, Xiuli Qin, Ying Luo, Xiaoju Xiong & Genzhen Yu

School of Nursing, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Maoling Fu, Quan Yuan, Qiaoyue Yang, Yaqi Yu & Wenshuai Song

Contributions

GY and YL framed the review questions on the basis of input from MF and QY. YY and XQ conducted the literature search. MF, WS, and QY screened and evaluated the identified papers. GY and YY performed data extraction and analysis. MF, WS, XQ and QY prepared the initial manuscript with revisions and comments from GY, YL, and XX. All authors approved the final manuscript as submitted and agreed to be accountable for all aspects of the work.

Corresponding author

Correspondence to Genzhen Yu .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary checklist, supplemental digital tables 1, supplemental digital tables 2, supplemental digital.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Fu, M., Yuan, Q., Yang, Q. et al. Risk factors and incidence of central venous access device-related thrombosis in hospitalized children: a systematic review and meta-analysis. Pediatr Res (2024). https://doi.org/10.1038/s41390-024-03225-0

Received : 06 October 2023

Revised : 18 March 2024

Accepted : 25 March 2024

Published : 17 May 2024

DOI : https://doi.org/10.1038/s41390-024-03225-0

  • Open access
  • Published: 07 May 2024

Integrative analysis of transcriptome and target metabolites uncovering flavonoid biosynthesis regulation of changing petal colors in Nymphaea ‘Feitian 2’

Xian Zhou 1,2,3, Xiaohan Wang 1,2,3, Haohui Wei 1,2,4, Huijin Zhang 1,2, Qian Wu 1,2 & Liangsheng Wang 1,2,3

BMC Plant Biology volume 24, Article number: 370 (2024)

Nymphaea (waterlily) is known for its rich colors and is an important aquatic ornamental plant worldwide. Nymphaea atrans and some hybrids, including N. ‘Feitian 2’, are particularly appealing because their petals gradually change color at different flower developmental stages. The petals of N. ‘Feitian 2’ gradually change from light blue-purple to deep rose-red over the course of flowering. The mechanism underlying this phenomenon remains unclear.

In this work, flavonoids in the petals of N. ‘Feitian 2’ at six flowering stages were examined to identify the influence of flavonoid components on flower color change. Additionally, six cDNA libraries of N. ‘Feitian 2’ from two flowering stages were constructed, and the transcriptome was sequenced to identify the molecular mechanism governing petal color change. As a result, 18 flavonoid metabolites were identified, including five anthocyanins and 13 flavonols. Anthocyanin accumulation during flower development is the primary driver of petal color change. A total of 12 differentially expressed genes (DEGs) in the flavonoid biosynthesis pathway were identified, and these DEGs were significantly positively correlated with anthocyanin accumulation. Six structural genes were ultimately selected for further study because their expression levels varied significantly across flowering stages. Moreover, 104 differentially expressed transcription factors (TFs) were identified, and three MYBs associated with flavonoid biosynthesis were screened. The RT-qPCR results were generally consistent with the high-throughput sequencing results.
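The abstract reports that the flavonoid-pathway DEGs correlated positively with anthocyanin accumulation. As a hedged illustration of how a gene-metabolite correlation of this kind is commonly computed, the sketch below calculates a Pearson coefficient across samples. The excerpt does not say which correlation method the authors used, and the gene label ("geneX") and all expression and anthocyanin values below are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation and spread around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: expression of a structural gene ("geneX") and total
    # anthocyanin content measured in the same six samples (arbitrary units).
    gene_x_expression = [3.1, 4.0, 6.5, 8.2, 11.0, 12.4]
    anthocyanin = [0.12, 0.20, 0.35, 0.41, 0.66, 0.70]
    print(f"r = {pearson_r(gene_x_expression, anthocyanin):.3f}")
```

A coefficient alone does not establish significance; a test such as the t statistic t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom would be needed before calling a correlation significant, and it is omitted here for brevity.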

Conclusions

This research provides a foundation for clarifying the mechanisms underlying petal color change in waterlilies.

Flower color is a remarkable and diverse characteristic across the plant kingdom. It not only plays a vital role in plant survival and reproduction but also brings significant economic benefits to the flower industry [1, 2]. Factors influencing flower color include the classes and levels of pigments in the petals, co-pigmentation effects, vacuolar pH, petal epidermal cell structure, and complexes with metal ions [3]. Among these, the type and content of pigments in the petals exert the greatest influence [4]. Anthocyanins, a class of flavonoids, influence the flower colors of 80% of angiosperms, including yellow, pink, red, purple, and blue [5, 6, 7].

In addition to single-color flowers, some plants possess complex flower color patterns. Some flowers gradually change color over the course of the flowering period, including those of Lonicera japonica (white to gold), Xanthoceras sorbifolium (yellow or orange to red), Lantana camara (yellow to orange to scarlet to magenta), Brunfelsia acuminata (deep purple to light purple to white), Hibiscus mutabilis (white to pink to red), Combretum indicum (white to red), and Victoria (white to pink or ruby red). The mechanism of color change varies among flowers. In L. japonica, the change from white to golden is caused by a dramatic increase in β-carotene concentration, accompanied by a drop in lutein concentration [8]. In B. acuminata, petal color changes as anthocyanin content decreases [9]. In other cases, petal color change has been attributed to elevated concentrations of cyanidin or delphinidin derivatives in the petal vacuoles [10, 11, 12, 13, 14]. The molecular mechanisms underlying color changes are receiving increasing attention. In Paeonia ‘Coral Sunset’, the change in flower color from coral to pink to pale yellow is due to a significant decrease in anthocyanin content: eight structural genes related to anthocyanin synthesis were highly expressed during the S1 stage and weakly expressed during other stages, causing the petal color change [15]. In Nelumbo ‘Qiusanse’, petal color fades during flowering, and researchers found that anthocyanin biosynthesis repressors, anthocyanin-degrading genes, and pH regulators are involved in controlling this fading [16].

Waterlilies, a general term for plants of the genus Nymphaea, are perennial aquatic plants in the family Nymphaeaceae with high ornamental, cultural, and economic value [17, 18, 19]. The waterlily is also the national flower of Egypt, Thailand, and other countries. Waterlilies belong to an early-diverging clade of flowering plants and hold a unique position in angiosperm phylogeny [20, 21, 22]. Nymphaea is the most diverse and widespread genus of the Nymphaeaceae and comprises five subgenera: Lotos, Hydrocallis, Anecphya, Brachyceras, and Nymphaea [23, 24]. Waterlilies are distributed worldwide, with over 50 species, abundant germplasm resources, and more than 1,000 cultivars.

Waterlilies display various flower colors, including white, yellow, red, purple, and blue. Flavonoids are the primary pigments responsible for floral color in waterlilies; to date, 117 flavonoids have been identified in waterlilies, including 20 anthocyanins [25]. Some studies of waterlily flower color have focused on the mechanism of blue flower formation [22, 26]. Beyond blue flowers, N. atrans and its hybrids change petal color throughout the flowering period, which gives them higher ornamental value than other waterlilies [27]. N. ‘Feitian 2’, an intersubgeneric hybrid of N. atrans, fully inherits the color-changing petals of N. atrans. In addition, each of its flowers has a longer flowering period, its stamens are deep purple-red instead of yellow, and its petal color change is more pronounced. The petals of N. ‘Feitian 2’ gradually change from light blue-purple to deep rose-red during flowering. However, the mechanism underlying this color change remains unclear.

To uncover this mechanism, N. ‘Feitian 2’, whose petals change color during flowering, was selected as the experimental material. In the present study, we identified flavonoids, especially anthocyanins, in petals on each of the six days after flower opening using high-performance liquid chromatography (HPLC) and examined the relationship between flower coloration and pigment content. In addition, transcriptomic analysis of petals from two flowering stages was conducted to reveal the gene regulation underlying the color transition of N. ‘Feitian 2’. This research helps elucidate the mechanism of flower color change in waterlilies and can guide the molecular breeding of ornamental plants.

Materials and methods

Plant materials

The intersubgeneric day-blooming waterlily cultivar N. ‘Feitian 2’ was used in this study. This cultivar is a hybrid with a subgenus Anecphya female parent and a subgenus Brachyceras male parent. A single flower of N. ‘Feitian 2’ blooms for 6 days (D1 to D6), and its color differs each day (Fig. 1A). Petals (inner, middle, and outer petals) of N. ‘Feitian 2’ were collected from D1 to D6 at the China National Botanical Garden/Institute of Botany, Chinese Academy of Sciences, Beijing, China.

Petal color measurement

The color parameters of fresh petals (inner, middle, and outer petals) were measured with an NF555 spectrophotometer (Nippon Denshoku Industries Co. Ltd., Japan) under CIE C/2° illumination/viewer conditions. Flower color was described by the CIELAB parameters L*, a*, and b*, and each value is the average of five measurements. The petals were then snap-frozen in liquid nitrogen and stored at −80 °C. Sampling was conducted with three biological replicates to reduce analysis bias.

Flavonoid extraction and qualitative and quantitative analysis

Flavonoids were extracted from the petals (D1 to D6; a mixture of inner, middle, and outer petals) as described previously, with minor modifications [26]. Samples were freeze-dried before the experiment. Approximately 0.05 g of freeze-dried petals was ground in liquid nitrogen, extracted with 1 mL of extraction solution (methanol:formic acid, 99.8:0.2, v/v) in a test tube, sonicated in a KQ-500DE ultrasonic cleaner (Ultrasonic Instruments, Kunshan, Jiangsu, China) at 20 °C for 20 min, and centrifuged in a SIGMA 3K30 centrifuge (SIGMA, Germany) at 10,000 × g for 10 min. The supernatant was collected into a fresh tube, and the extraction was repeated three times. All extracts were combined and filtered through 0.22 μm reinforced nylon membrane filters (Shanghai ANPEL, Shanghai, China) before HPLC-DAD and HPLC-MS analyses. Three replicates were analyzed for each sample, and all concentrations in this study are expressed on a dry weight (DW) basis.

An Agilent 1260 Infinity II LC system (Agilent Technologies, Santa Clara, CA, USA) was used for the analysis. The HPLC conditions were as follows: column, Kromasil 100-5 C18 (250 mm × 4.6 mm; AKZO NOBEL, Sweden); solvent system, 2% aqueous formic acid (phase A) and 15% methanol acetonitrile (phase B); gradient program (phase A:phase B), 90:10 at 0 min, 80:20 at 15.0 min, 77:23 at 25.0 min, 60:40 at 45.0 min, 10:90 at 47.0 min, and 10:90 at 50.0 min; flow rate, 0.80 mL/min; column temperature, 28 °C; injection volume, 10 µL. Chromatograms of anthocyanins and other flavonoids were acquired at 525 nm and 350 nm, respectively.
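The gradient program above is specified only at breakpoints. The sketch below is a hypothetical helper, not part of the authors' workflow, that stores those breakpoints and interpolates the percentage of phase B at any time in the run. It assumes linear ramps between breakpoints, which is the usual convention for HPLC gradient tables but is not stated in the source.

```python
# Gradient breakpoints from the Methods: (time in min, % phase A, % phase B).
# Assumption: composition changes linearly between consecutive breakpoints.
GRADIENT = [
    (0.0, 90, 10),
    (15.0, 80, 20),
    (25.0, 77, 23),
    (45.0, 60, 40),
    (47.0, 10, 90),
    (50.0, 10, 90),
]


def phase_b_percent(t_min: float) -> float:
    """Return the interpolated % of phase B at time t_min (minutes)."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    # Walk consecutive breakpoint pairs and interpolate inside the matching segment.
    for (t0, _, b0), (t1, _, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


for t in (0, 15, 20, 46, 50):
    print(f"t = {t:>2} min: {phase_b_percent(t):.1f}% B")
```

Keeping the program as data makes it easy to check, for example, that the steep 45 to 47 min ramp (40% to 90% B) acts as the column wash at the end of the run.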

An Agilent 1260 Infinity II LC system coupled to an Agilent 6520 accurate-mass Q-TOF-MS/MS (Agilent Technologies, Santa Clara, CA, USA) was used for qualitative analysis. The liquid chromatographic conditions, mobile phase composition, and elution program were the same as above. The mass spectrometry conditions were as follows: positive-ion (PI) mode for anthocyanins and negative-ion (NI) mode for other flavonoids; capillary voltage, 3.50 kV; nebulizer pressure, 0.103 MPa; desolvation gas (N2) flow, 12 L/min; drying gas temperature, 350 °C; scan range, m/z 50 to 2000. Data were acquired and analyzed using MassHunter Qualitative Analysis Software B.04.00.

The contents of anthocyanins and other flavonoids were semi-quantified using cyanidin 3-O-glucoside (Cy3G) and rutin as standards, respectively. Mean values and SDs were calculated from three biological replicates.
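Semi-quantification against a single standard means each compound is reported as an equivalent of Cy3G (anthocyanins) or rutin (other flavonoids). A minimal sketch of the conversion from peak area to mg g−1 DW follows. The linear calibration form, the calibration slope and intercept, the peak area, and the 3 mL combined extract volume (three 1 mL extractions) are all assumptions for illustration, not values reported in the paper.

```python
def content_mg_per_g_dw(peak_area: float, slope: float, intercept: float,
                        extract_volume_ml: float, dry_mass_g: float) -> float:
    """Convert a peak area into standard-equivalent content (mg per g dry weight).

    Assumes a linear external-standard calibration:
        peak_area = slope * conc_ug_per_ml + intercept
    where conc is the standard-equivalent concentration in the final extract.
    """
    if slope <= 0 or dry_mass_g <= 0:
        raise ValueError("slope and dry mass must be positive")
    conc_ug_per_ml = (peak_area - intercept) / slope          # invert the calibration line
    total_mg = conc_ug_per_ml * extract_volume_ml / 1000.0    # ug in the whole extract -> mg
    return total_mg / dry_mass_g                              # normalize to dry weight


# Hypothetical Cy3G calibration and sample values (illustrative only)
cy3g_equiv = content_mg_per_g_dw(peak_area=1000.0, slope=10.0, intercept=0.0,
                                 extract_volume_ml=3.0, dry_mass_g=0.05)
print(f"{cy3g_equiv:.2f} mg Cy3G equivalents per g DW")
```

Because every anthocyanin is expressed through the Cy3G curve, differences in molar absorptivity between compounds are ignored; this is the trade-off that makes the values semi-quantitative.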

RNA isolation, library construction and sequencing

Total RNA was extracted from D1 and D4 petals (a mixture of inner, middle, and outer petals) using the E.Z.N.A. Plant RNA Kit (Omega Bio-tek, USA) according to the manufacturer’s instructions. RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA), and RNA purity and concentration were measured using a NanoDrop 2000 (Thermo Fisher Scientific, Wilmington, DE). A total of 1 µg RNA per sample was used as input material for library preparation. Sequencing libraries were generated using the Hieff NGS Ultima Dual-mode mRNA Library Prep Kit for Illumina (Yeasen Biotechnology (Shanghai) Co., Ltd.) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was isolated using oligo(dT)-attached magnetic beads, randomly fragmented with fragmentation buffer, and reverse transcribed into cDNA with random primers. Second-strand cDNA was synthesized with the addition of PCR buffer, dNTPs, RNase H, and DNA polymerase I. The cDNA fragments were purified with AMPure XP beads, end-repaired, A-tailed, and ligated to Illumina sequencing adapters, and the ligation products were size-selected with AMPure XP beads. To ensure library quality, cDNA concentration and insert size were examined using a Qubit 2.0 and an Agilent 2100, and qPCR was performed to obtain a more accurate library concentration. Finally, the six libraries were sequenced on an Illumina NovaSeq platform to generate 150 bp paired-end reads according to the manufacturer’s instructions (NCBI BioProject accession number: PRJNA1056490). cDNA library construction and RNA-seq were performed by Biomarker Technologies Co., Ltd. (Beijing, China).

Transcriptome data analysis

Clean reads were obtained by removing reads containing adapters, reads containing poly-N, and low-quality reads from the raw data. The clean reads were then mapped to the N. colorata reference genome using HISAT2 with default parameters [22]. Gene function was annotated based on the following databases: Nr ( http://www.ncbi.nlm.nih.gov ), Pfam ( https://pfam.xfam.org/ ), KOG/COG ( http://www.ncbi.nlm.nih.gov/COG ), Swiss-Prot ( https://www.expasy.org/ ), KEGG ( http://www.genome.jp/kegg ), and GO ( http://www.geneontology.org/ ). Fragments per kilobase of transcript per million mapped reads (FPKM) was used to quantify gene expression levels. Differentially expressed genes (DEGs) were identified with DESeq2 using an absolute log2FC ≥ 1 and a false discovery rate (FDR) < 0.05 as thresholds [28]. Gene Ontology (GO) enrichment analysis of the DEGs was implemented with the GOseq R package, which is based on the Wallenius non-central hypergeometric distribution [29]. The KOBAS database and clusterProfiler software were used to test the statistical enrichment of DEGs in KEGG pathways [30].
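The expression measure and the DEG thresholds above can be sketched as follows. FPKM is fragments × 10^9 / (gene length in bp × total mapped fragments). The filter simply applies |log2FC| ≥ 1 and FDR < 0.05 to values assumed to come from DESeq2 output; the gene IDs and numbers are hypothetical, and the class and function names are illustrative, not part of any tool used in the study.

```python
from dataclasses import dataclass


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


@dataclass
class DeResult:
    gene_id: str
    log2_fc: float  # log2 fold change (e.g. D4 vs. D1) as reported by DESeq2
    fdr: float      # multiple-testing adjusted p-value


def is_deg(r: DeResult, min_abs_log2fc: float = 1.0, max_fdr: float = 0.05) -> bool:
    """Apply the |log2FC| >= 1 and FDR < 0.05 thresholds from the Methods."""
    return abs(r.log2_fc) >= min_abs_log2fc and r.fdr < max_fdr


def direction(r: DeResult) -> str:
    return "up" if r.log2_fc > 0 else "down"


# Hypothetical DESeq2-style results (illustrative only)
results = [
    DeResult("gene_a", 2.3, 0.001),
    DeResult("gene_b", -1.4, 0.02),
    DeResult("gene_c", 0.7, 0.0001),  # fold change too small
    DeResult("gene_d", 3.1, 0.20),    # not significant
]
degs = [(r.gene_id, direction(r)) for r in results if is_deg(r)]
print(degs)                           # [('gene_a', 'up'), ('gene_b', 'down')]
print(fpkm(500, 2000, 25_000_000))    # 10.0
```

`max_fdr` is a parameter because the Results section applies a stricter FDR < 0.01. Note that DESeq2 estimates fold changes from raw counts; FPKM serves here only as the reported expression level (e.g. for the FPKM ≥ 10 filter used later).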

RT-qPCR expression analysis of genes involved in flavonoid biosynthesis

For gene validation and expression analysis, 16 candidate genes related to flavonoid biosynthesis were analyzed by real-time quantitative PCR (RT-qPCR) on an ABI StepOnePlus™ Real-Time PCR System (Applied Biosystems, USA). cDNA synthesis and RT-qPCR were performed using the EX RT Kit (gDNAremover) and 2×HQ SYBR qPCR Mix (Zomanbio, Beijing, China). Actin 11 was used as the internal reference gene to normalize expression data [31]. The gene-specific primers are listed in Supplementary Table S1. Relative gene expression was calculated using the 2^−ΔΔCT method [32].
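The 2^−ΔΔCT calculation normalizes a target gene's Ct to the reference gene (here Actin 11) and then to a calibrator sample. The sketch below assumes D1 as the calibrator and uses hypothetical Ct values; the method itself assumes close to 100% amplification efficiency for both the target and reference genes.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Livak 2^-ddCt relative expression of a target gene.

    ct_target, ct_ref: Ct of the target and reference gene in the sample of interest.
    ct_target_cal, ct_ref_cal: the same in the calibrator sample (assumed D1 here).
    """
    delta_ct_sample = ct_target - ct_ref           # normalize sample to reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal      # normalize calibrator the same way
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical mean Ct values for one candidate gene, Actin 11 as reference
d1 = {"target": 25.0, "actin11": 18.0}  # calibrator
d4 = {"target": 22.0, "actin11": 18.0}
print(relative_expression(d4["target"], d4["actin11"], d1["target"], d1["actin11"]))  # 8.0
```

A Ct drop of three cycles with an unchanged reference gene corresponds to an eightfold increase, since each cycle ideally doubles the product.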

Correlation analysis of metabolite profiling and transcriptome

Means and standard deviations were calculated from three replicates of each sample for further analysis. Heatmaps were generated using TBtools [33]. Correlations were assessed by calculating Pearson correlation coefficients (PCCs) in SPSS 21.0 (SPSS Inc., Chicago, IL), with PCC ≥ 0.80 or ≤ −0.80 as the screening criterion. Interaction networks were visualized in Cytoscape (version 3.9.1; The Cytoscape Consortium, USA).
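The screening step can be sketched as computing the Pearson coefficient for every gene and metabolite pair and keeping pairs with |r| ≥ 0.80 as network edges, in a form that could be exported to Cytoscape. This is a minimal sketch with hypothetical series and names; SPSS additionally reports a significance test, which this sketch does not apply.

```python
import math
from itertools import product


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(genes, metabolites, threshold=0.80):
    """Return (gene, metabolite, r, sign) tuples for pairs with |r| >= threshold."""
    edges = []
    for (g, g_values), (m, m_values) in product(genes.items(), metabolites.items()):
        r = pearson(g_values, m_values)
        if abs(r) >= threshold:
            edges.append((g, m, round(r, 3), "positive" if r > 0 else "negative"))
    return edges


# Hypothetical per-sample expression and metabolite values (illustrative only)
genes = {
    "gene_up": [1.0, 2.0, 3.0, 4.0],
    "gene_down": [4.0, 3.0, 2.0, 1.0],
    "gene_weak": [2.0, 1.0, 4.0, 3.0],  # r = 0.6, filtered out
}
metabolites = {"a1": [1.0, 2.0, 3.0, 4.0]}
for edge in correlation_edges(genes, metabolites):
    print(edge)
```

With only a handful of samples per series, high |r| values arise easily by chance, which is why the threshold is usually paired with a significance test.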

Results

Petal color phenotype analysis of Nymphaea ‘Feitian 2’ at different flowering stages

A single flower of Nymphaea ‘Feitian 2’ bloomed above the water for six days. Petal color differed across flowering stages, and even petals within a flower differed in their degree of color change (Fig. 1A). In general, the outer petals changed color first, whereas the inner petals began to change color on the third day of blooming (D3). Therefore, the petals were divided into inner, middle, and outer petals for color parameter determination. In the CIE L*a*b* color system, L* describes lightness, ranging from black (0) to white (100); a* spans green (negative values) to red (positive values); and b* spans blue (negative values) to yellow (positive values). Across all petals, the measured coordinates ranged as follows: L* from 38.20 to 84.88, a* from −4.09 to 36.45, and b* from −11.22 to 3.11 (Supplementary Table S2). Overall, N. ‘Feitian 2’ showed a gradual color change, with its L*, a*, and b* values distributed roughly along a three-dimensional curve (Fig. 1B). The colors of the inner, middle, and outer petals were generally consistent, as their a* and b* values followed similar distributions (Fig. 1C).
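The a* and b* coordinates are often summarized as chroma, C*ab = sqrt(a*² + b*²), and hue angle, h_ab = atan2(b*, a*), standard CIELAB quantities that this study does not itself report. The sketch below computes both for two hypothetical points chosen within the measured ranges (they are not paired measurements from the study): a bluish point with negative a* and b*, and a red point with large positive a*.

```python
import math


def chroma(a_star: float, b_star: float) -> float:
    """CIELAB chroma C*ab = sqrt(a*^2 + b*^2)."""
    return math.hypot(a_star, b_star)


def hue_angle(a_star: float, b_star: float) -> float:
    """CIELAB hue angle h_ab in degrees, mapped into [0, 360)."""
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


# Hypothetical readings within the measured a* and b* ranges (illustrative only)
samples = {"bluish": (-4.0, -11.0), "red": (36.0, 3.0)}
for label, (a, b) in samples.items():
    print(f"{label}: C*ab = {chroma(a, b):.1f}, h_ab = {hue_angle(a, b):.1f} deg")
```

A shift from roughly 250° (blue region) toward roughly 5° (red region) is one compact way to express the light blue-purple to rose-red trajectory in a single number.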

Figure 1. Various flowering stages and flower color distribution of N. ‘Feitian 2’ in color coordinate systems. A, various flowering stages; B, flower color distribution in the three-variable (L*, a*, and b*) coordinate system; C, distribution of a* and b* values at different flowering stages. D1 to D6 represent the first to the sixth blooming day, respectively.

Identification and quantification of flavonoids

To determine why the petals become increasingly purple-red during flowering, we examined the flavonoid metabolites in the petals of N. ‘Feitian 2’ at six flowering stages (D1 to D6). A total of 18 flavonoid metabolites were identified, including five anthocyanins and 13 flavonols (Fig. 2A, Supplementary Table S3). Total anthocyanin content (TA) increased steadily over the first five days of anthesis and decreased on the last day; TA at D5 was nearly 53 times that at D1 (Fig. 2B, C, Supplementary Table S4). Only two anthocyanidin aglycones, delphinidin and cyanidin, were identified. Delphinidin derivatives were more abundant than cyanidin derivatives; their share of TA was highest at D1 (88.57%) and then gradually decreased to 50% (Supplementary Table S4).

Flavonols, including kaempferol derivatives (Km), quercetin derivatives (Qu), and myricetin derivatives (My), were also identified. Total flavonol content (TF) ranged from approximately 147.46 to 183.88 mg g−1 DW, with the highest level at D3. From D1 to D6, TF followed a bell-shaped trend, first increasing and then declining. Myricetin derivatives were the main flavonols, accounting for more than 50% of TF, followed by quercetin derivatives (34.41% of TF at D2 and 36.70% at D6). Kaempferol derivative levels in the petals were lower (Supplementary Table S4). Unlike the anthocyanins, individual flavonols changed in complex ways across flowering stages, following three types of curves: an initial decrease followed by an increase, a continuous increase, and an initial increase followed by a decrease (Fig. 2C).

Figure 2. Analysis of flavonoid metabolites in the petals of N. ‘Feitian 2’. A, HPLC chromatograms of flavonoids detected at 525 nm and 350 nm. B, accumulation of anthocyanins and flavonols in petals, including cyanidin derivatives (Cy), delphinidin derivatives (Dp), kaempferol derivatives (Ka), quercetin derivatives (Qu), and myricetin derivatives (My). Three independent biological experiments were conducted; values represent means ± SD. C, heatmap of the 18 flavonoid metabolites across six flowering stages.

Correlation analysis indicated that 10 flavonoids (five anthocyanins and five flavonols) were significantly correlated with color parameters (Supplementary Figure S1). The anthocyanins were significantly negatively correlated with L* and positively correlated with a* in the inner, middle, and outer petals; they were negatively correlated with b* only in the inner petals, with no significant correlation in the middle or outer petals. The flavonols significantly correlated with color parameters were quercetin 3-O-galactoside (f2), myricetin 3-O-(3’’-O-acetyl)-rhamnoside (f4), quercetin 3-O-(3’’-O-acetyl)-rhamnoside (f8), kaempferol 3-O-(3’’-O-acetyl)-rhamnoside (f10), and kaempferol 3-O-(3’’-O-malonyl)-rhamnoside (f13). Of these, f2 and f13 were positively correlated only with the a* value of the outer petals, whereas f4, f8, and f10 were negatively correlated with L* and positively correlated with a* in all petals.

Across flowering stages, the anthocyanins showed the largest changes in content among the flavonoids. Delphinidin 3’-O-(2’’-O-galloyl-6’’-O-acetyl-galactoside) (a3) had the lowest content of all anthocyanins, and its content increased approximately 5.74-fold, from 0.0629 mg g−1 DW at D1 to 0.3612 mg g−1 DW at D6 (Supplementary Table S4). In contrast, the five flavonols (f2, f4, f8, f10, and f13) changed much less than the anthocyanins. These results indicate that the flower color change of N. ‘Feitian 2’ is closely associated with the accumulation of the five anthocyanins.
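As an arithmetic check on the a3 figures above, the fold change is the ratio of the D6 content to the D1 content. A 5.74-fold ratio corresponds to an increase of about 474%. The helper name below is illustrative; the two input values are taken from the text.

```python
import math


def fold_change(initial: float, final: float) -> float:
    """Ratio of final to initial content (same units, e.g. mg g-1 DW)."""
    if initial <= 0:
        raise ValueError("initial content must be positive")
    return final / initial


a3_d1 = 0.0629  # mg g-1 DW at D1, from the text
a3_d6 = 0.3612  # mg g-1 DW at D6, from the text

fc = fold_change(a3_d1, a3_d6)
print(f"fold change: {fc:.2f}")                      # fold change: 5.74
print(f"percent increase: {(fc - 1) * 100:.0f}%")    # percent increase: 474%
print(f"log2 fold change: {math.log2(fc):.2f}")
```

Distinguishing "5.74-fold" (a ratio) from "increased by 474%" avoids the common off-by-one reading of fold changes.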

RNA-Seq analysis

To further investigate the molecular mechanism of waterlily flower color change during anthesis, six libraries (D1-1, D1-2, D1-3, D4-1, D4-2, and D4-3) were constructed from N. ‘Feitian 2’ petals at two flowering stages, with three replicates per stage. After cleaning and quality control, approximately 64.36 Gb of clean data were obtained, with no less than 10.33 Gb per library. The Q30 of all libraries exceeded 92%, and 62.92% to 64.73% of the sequenced reads were aligned to the waterlily reference genome [22] (Supplementary Table S5). Using an absolute log2FC ≥ 1 and FDR < 0.01 as criteria for differentially expressed genes (DEGs), a total of 2,912 DEGs were found in D1 vs. D4, including 1,220 up-regulated and 1,692 down-regulated DEGs (Supplementary Figure S2). These results suggest that a large number of genes may participate in the petal color change of N. ‘Feitian 2’.

To characterize the major functional categories of the DEGs, Gene Ontology (GO) enrichment analysis was performed. The 2,912 DEGs were assigned to three GO categories: cellular component, biological process, and molecular function. In the biological process category, ‘metabolic process’, ‘cellular process’, and ‘single-organism process’ were the most enriched terms. In the cellular component category, ‘membrane’, ‘membrane part’, and ‘cell’ were the most represented, and in the molecular function category, ‘catalytic activity’, ‘binding’, and ‘transporter activity’ were the most represented (Supplementary Figure S3).

Pathway analysis helps clarify biological functions and gene interactions. The KEGG pathways with the most DEGs were plant hormone signal transduction (ko04075), followed by starch and sucrose metabolism (ko00500) and phenylpropanoid biosynthesis (ko00940). Many DEGs were also assigned to ‘flavonoid biosynthesis’ (ko00941), ‘flavone and flavonol biosynthesis’ (ko00944), and ‘anthocyanin biosynthesis’ (ko00942), pathways that are important for petal coloration (Fig. 3).

Figure 3. Bar chart of the top 20 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched in differentially expressed genes (DEGs) in N. ‘Feitian 2’.

Combined metabolomic and transcriptomic analysis

By integrating the metabolomic and transcriptomic data, we characterized flavonoid biosynthesis in N. ‘Feitian 2’ petals. To analyze the mechanism of the altered flower color, we examined genes in the flavonoid pathways, especially the anthocyanin biosynthesis pathway, and identified key genes of flavonoid metabolism. In total, 159 unigenes encoding nine enzymes involved in the three pathways above were examined (Table 1). We then compared the transcriptional profiles of these genes between D1 and D4: 26 key unigenes were both differentially expressed and had FPKM values ≥ 10, including 19 up-regulated and seven down-regulated unigenes (Supplementary Table S6).

The identified DEGs included both upstream genes (CHS, CHI, etc.) and downstream genes (UFGT, etc.) (Fig. 4). Most upstream genes (CHS, CHI, and F3H) were more highly expressed at D4; however, LOC116265581 (CHS-2), LOC116245897 (F3H-1), and LOC116263301 (F3H-3) were more highly expressed at D1. F3’H, F3’5'H, DFR, and ANS were all more highly expressed at D4, consistent with anthocyanin accumulation. FLS and UFGT encode key enzymes for flavonoid biosynthesis and modification, respectively. The expression of LOC116261229 (FLS) was higher at D1, consistent with the trend in flavonol content. In addition, LOC116249850 (UFGT-3), LOC116254426 (UFGT-7), and LOC116265943 (UFGT-11), which were more highly expressed at D1, might be key genes for the modification of flavonols. The remaining eight UFGT genes were more highly expressed at D4 than at D1; among them, LOC116247679 (UFGT-2), LOC116253945 (UFGT-4), and LOC116257005 (UFGT-8) had higher FPKM values and may be the key unigenes for anthocyanin modification. These results suggest that the petal color change in waterlily is driven by the co-expression of many unigenes.

Figure 4. Evaluation of DEGs in the flavonoid biosynthesis pathway. CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3’H, flavonoid 3’-hydroxylase; F3’5'H, flavonoid 3’,5'-hydroxylase; FLS, flavonol synthase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; UFGT, flavonoid 3-O-glucosyltransferase.

We next evaluated the correlations between the 26 DEGs above and the five anthocyanins associated with flower color change. The analysis (Fig. 5) showed that 19 DEGs were significantly positively correlated with the five anthocyanins, including one CHS, three CHI, one F3H, one F3’H, two F3’5'H, one DFR, two ANS, and eight UFGT genes, whereas seven DEGs were significantly negatively correlated with them. After analyzing the expression levels of the 19 positively correlated genes (Supplementary Table S6), we selected 12 candidate key structural genes: LOC116265292 (CHS-1), LOC116256153 (CHI-1), LOC116262004 (CHI-3), LOC116246718 (F3H-2), LOC116257842 (F3’H), LOC116260989 (F3’5'H-1), LOC116268364 (DFR), LOC116249327 (ANS-1), LOC116260841 (ANS-2), LOC116247679 (UFGT-2), LOC116253945 (UFGT-4), and LOC116257005 (UFGT-8).

Figure 5. Co-expression network of differentially expressed structural genes and flavonoid metabolites. Dark gray edges indicate positive correlations, and light gray edges indicate negative correlations. a1, delphinidin 3-O-galactoside; a2, cyanidin 3-O-glucoside; a3, delphinidin 3’-O-(2’’-O-galloyl-6’’-O-acetyl-galactoside); a4, delphinidin 3-O-(6’’-O-acetyl)-galactoside; a5, cyanidin 3-O-(6’’-O-acetyl)-galactoside.

RT-qPCR analysis

Based on the correlation analysis between the key flavonoids and gene expression levels, 12 candidate enzyme genes were selected for RT-qPCR. Their expression generally increased from D1 to D4 (Fig. 6), mirroring the trend in the transcriptome data and supporting the reliability of the sequencing results.

Figure 6. Relative expression levels of 12 candidate genes in the flavonoid biosynthetic pathway determined by RT-qPCR. Data are presented as means ± SD.

All 12 candidate genes showed similar bell-shaped expression curves, first rising to a maximum and then declining, although the timing of the peak differed. Expression of LOC116265292 (CHS-1), LOC116256153 (CHI-1), LOC116249327 (ANS-1), and LOC116260841 (ANS-2) peaked at D4, whereas LOC116262004 (CHI-3), LOC116246718 (F3H-2), LOC116257842 (F3’H), LOC116260989 (F3’5'H-1), LOC116268364 (DFR), LOC116247679 (UFGT-2), and LOC116253945 (UFGT-4) peaked at D5. Expression of LOC116257005 (UFGT-8) peaked at D2, suggesting that UFGT-8 may be associated with flavonol synthesis. CHI-3 and F3H-2 changed most rapidly across stages, followed by F3’H, F3’5'H-1, ANS-1, and ANS-2, whereas CHS-1, CHI-1, DFR, UFGT-2, and UFGT-4 changed less from D1 to D6. Differential expression of these structural genes may underlie the different flower colors of N. ‘Feitian 2’ across flowering stages, with CHI-3, F3H-2, F3’H, F3’5'H-1, ANS-1, and ANS-2 being the most important.

Specific transcription factor analysis

Transcription factors (TFs) play critical roles in plant growth and development by modulating gene expression. In our data, 1,208 unigenes were predicted to encode TFs, of which 104 (FPKM ≥ 10) were differentially expressed (Supplementary Table S7). Among these, the most abundant families were MYB (17), followed by AP2/ERF (14), bHLH (14), and WRKY (10). TFs that regulate the expression of flavonoid structural genes belong mainly to the MYB, bHLH, and WDR families [34, 35, 36, 37]; in total, 17 MYB, 14 bHLH, and 3 WDR unigenes were identified in our data. In a phylogenetic tree constructed with Arabidopsis MYBs, three MYBs, LOC116245731 (MYB-1), LOC116259798 (MYB-2), and LOC116261829 (MYB-3), clustered with the Arabidopsis thaliana MYB subgroups S4, S5, S6, and S7, which are related to the regulation of flavonoid biosynthesis (Fig. 7A) [38], and were therefore considered crucial candidate regulatory genes. RT-qPCR showed that these three genes had similar bell-shaped expression curves (Fig. 7B). Expression of LOC116245731 (MYB-1) peaked at D4, whereas LOC116259798 (MYB-2) and LOC116261829 (MYB-3) peaked at D5. Among the three, LOC116261829 (MYB-3) changed most rapidly from D1 to D6. MYBs may act as the main transcription factors regulating flavonoid biosynthesis in N. ‘Feitian 2’, because phylogenetic analysis showed that the candidate bHLHs did not cluster with subgroup S29 (IIIf), which is related to the regulation of flavonoid biosynthesis in A. thaliana (Supplementary Figure S4) [39, 40].

Figure 7. Phylogenetic tree and relative expression levels of MYBs from N. ‘Feitian 2’. A, phylogenetic tree of N. ‘Feitian 2’ MYBs with Arabidopsis MYBs; B, relative expression levels of the three candidate MYBs.

Discussion

Flavonoids, particularly anthocyanins, are vital for plant survival, acting as pollinator attractants and as protectants under various stresses [41]. To fulfill these roles, they are often produced transiently, undergoing regulated accumulation or degradation. Anthocyanin degradation causes color to change from deep to light, and the degradation mechanism has been studied most comprehensively in Brunfelsia calycina, Brunfelsia acuminata, and Nelumbo ‘Qiusanse’ [9, 16, 42, 43]. Although much is known about anthocyanin biosynthesis, the mechanism of flower color change differs among species, and in waterlilies, a basal angiosperm lineage, it remains unclear.

In many plants, petal color deepens from light to dark, primarily because of anthocyanin accumulation. In Rhododendron simsii, for instance, the cyanidin biosynthesis pathway is activated to generate a red color [44]. Similarly, in the Japanese tree peony cultivar ‘Taiyoh’, activation of the pelargonidin biosynthesis pathway in petals produces a vivid red color [45]. In Tulipa gesneriana ‘Queen of Night’, the accumulation of anthocyanins (delphinidin 3-O-rutinoside, cyanidin 3-O-rutinoside, and pelargonidin 3-O-rutinoside) triggers a color change from green to black [46]. In the waterlily cultivar ‘King of Siam’, the change in petal color from colorless to violet-blue during development is thought to be caused by the cyanidin and delphinidin biosynthesis pathways [26]. Coloration is a gradual process: many flowers complete coloration before opening, whereas others, such as Nymphaea ‘Feitian 2’, begin coloring after opening in response to certain signals. Anthocyanin accumulation in petals plays a crucial role in this process. In this study, flavonoids were detected in N. ‘Feitian 2’ petals from D1 to D6, and 18 flavonoids were characterized, including 13 flavonols and five anthocyanins (Supplementary Table S3). Anthocyanins accumulated from D1 to D5 and declined at D6, whereas flavonols accumulated from D1 to D3 and declined from D4 to D6 (Fig. 2). Only cyanidin and delphinidin derivatives were identified, consistent with previous studies [25, 47, 48]. Although the anthocyanidin backbones are few, glycosylation and acylation of anthocyanins are extensive, producing the rich and varied colors of waterlilies.

Anthocyanins are produced at the end of the phenylpropanoid metabolic pathway, and the precursors of anthocyanin biosynthesis are malonyl-CoA and 4-coumaroyl-CoA. Most anthocyanins are synthesized through condensation by CHS and isomerization by CHI, hydroxylation by F3H, F3’H, or F3’5’H, catalysis by DFR and ANS/LDOX, and modification by GTs and ATs [4, 6]. In our study, we conducted a transcriptome analysis of N. ‘Feitian 2’ at two stages (D1 and D4) and identified 26 significantly differentially expressed enzyme genes (Table 1; Fig. 4). Of these, 19 were up-regulated and the remaining seven were down-regulated. DFR and ANS are critical enzymes of the anthocyanin pathway that together convert dihydroflavonols to anthocyanidins [49]. Of the five DFR genes identified, only one was up-regulated. Both ANS genes detected were up-regulated, which is consistent with the pattern of floral color development. Cyanidin and delphinidin are further glycosylated by GTs, which stabilizes them [50]. GTs also contribute importantly to flower color formation. The bicolor petals of the lotus ‘Dasajin’ are primarily caused by defective accumulation of NnUFGT2 in the white portion of the petals, which prevents the formation of glycosylated anthocyanins in the final step of the pigment pathway [51]. In Lobelia erinus, rhamnosylation is essential for lobelinin synthesis, and expression of the rhamnosyltransferase (RT) genes ABTR2 and ABTR4 is required for the blue color of Lobelia flowers [52]. Red flower formation in peony also depends on GTs [45]. UFGT homologous unigenes were identified in N. ‘Feitian 2.’ Of these, 11 were differentially expressed, with eight up-regulated and three down-regulated (Table 1). Correlation analysis showed that the eight up-regulated unigenes were positively correlated with flower color formation. Among them, LOC116247679 (UFGT-2), LOC116253945 (UFGT-4), and LOC116257005 (UFGT-8) had higher FPKM values and may be critical unigenes for anthocyanin modification.
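The correlation screen described above can be sketched in two steps. First, read counts are converted to FPKM, the normalization named in the abbreviations list. Second, a Pearson coefficient is computed between a unigene's expression per stage (from RNA-seq or qRT-PCR) and anthocyanin content. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the choice of Pearson correlation, and all stage values are hypothetical.

```python
import math

def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """FPKM: fragments per kilobase of exon model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # count / (length in kb) / (mapped fragments in millions)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)

# Hypothetical counts for one UFGT-like unigene over six stages (D1-D6),
# assuming a 1,500 bp transcript and 20 million mapped fragments per library.
counts = [600, 1500, 2400, 3000, 3300, 2700]
expression = [fpkm(c, 1500, 20_000_000) for c in counts]
anthocyanin = [0.1, 0.4, 0.8, 1.1, 1.3, 1.0]  # hypothetical content per stage
r = pearson_r(expression, anthocyanin)
print(f"r = {r:.3f}")
```

A unigene would be kept as a candidate when r is strongly positive. In practice a significance test on r, or a rank-based coefficient, may be preferred.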

Anthocyanin biosynthesis is regulated predominantly at the transcriptional level. Many kinds of transcription factors (TFs), including MYB, bHLH, WD40, DOF, MADS-box, and WRKY proteins, have been found to modulate anthocyanin biosynthesis [53, 54, 55, 56]. Among them, MYB TFs exert a crucial influence on the regulation of flower color. MYB TFs can act independently. For instance, AtPAP1 and AtPAP2 both act as master regulators of anthocyanin biosynthesis in Arabidopsis thaliana [57, 58]. MYB TFs can also interact with other MYBs to carry out their functions. For example, PrMYBa1 activates PrF3H by interacting with PrMYBa2 to form an ‘MM’ complex during the formation of red-purple blotches in Paeonia rockii ‘ShuShengPengMo’ [59]. In addition, MYBs can bind bHLH proteins or form MBW complexes with bHLH and WDR proteins. In Actinidia chinensis, the AcMYBF110-AcbHLH1-AcWDR1 complex directly targets the promoters of anthocyanin biosynthetic genes to promote fruit coloration [60]. In this study, the transcriptome data showed that 104 important TFs were significantly differentially expressed between D1 and D4 (Supplementary Table S6). These included members of the MYB, AP2/ERF, WRKY, bHLH, WD40, NAC, and bZIP families, among others. MYB TFs were selected for phylogenetic analysis. LOC116245731 (MYB-1), LOC116259798 (MYB-2), and LOC116261829 (MYB-3) clustered with Arabidopsis MYBs associated with the regulation of flavonoid biosynthesis (Fig. 7B) [38]. We therefore speculate that these three TFs are candidate regulators of anthocyanin biosynthesis in N. ‘Feitian 2’ flowers.
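Calls such as "significantly differentially expressed between D1 and D4" usually combine a fold-change cutoff with a false discovery rate (FDR) threshold. DESeq2 (Love et al., in the reference list) reports Benjamini–Hochberg adjusted p-values for this purpose. The sketch below implements that adjustment and filter by hand. The thresholds |log2FC| ≥ 1 and FDR < 0.05 are common defaults assumed here, not values taken from this study. The gene IDs and statistics are hypothetical.

```python
from typing import Dict, List, Tuple

def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_degs(genes: Dict[str, Tuple[float, float]],
              lfc_cutoff: float = 1.0,
              fdr_cutoff: float = 0.05) -> Dict[str, str]:
    """Label genes passing both thresholds as 'up' or 'down'.

    `genes` maps a gene ID to (log2 fold change D4 vs D1, raw p-value).
    """
    ids = list(genes)
    fdr = bh_adjust([genes[g][1] for g in ids])
    calls = {}
    for g, q in zip(ids, fdr):
        lfc = genes[g][0]
        if q < fdr_cutoff and abs(lfc) >= lfc_cutoff:
            calls[g] = "up" if lfc > 0 else "down"
    return calls

# Hypothetical statistics for four genes.
genes = {
    "geneA": (2.0, 0.001),   # strong, significant increase
    "geneB": (-1.5, 0.002),  # significant decrease
    "geneC": (0.5, 0.0001),  # significant but small fold change
    "geneD": (3.0, 0.5),     # large fold change, not significant
}
print(call_degs(genes))
```

In this example only geneA and geneB are kept. geneC fails the fold-change cutoff and geneD fails the FDR cutoff.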

In this study, flavonoid profiles at different flowering stages and transcriptome data were used to investigate the color change of N. ‘Feitian 2’ petals. Eighteen flavonoids were identified in the petals, and changes in the content of the five anthocyanins detected constitute a chemical basis for the change in flower color. In addition, 26 differentially expressed genes (DEGs) encoding structural enzymes of the flavonoid biosynthesis pathway were uncovered. Of these, six were selected as candidate genes because they were significantly positively correlated with anthocyanin accumulation and also changed rapidly across flowering stages. Furthermore, 104 differentially expressed transcription factors (TFs) were identified, and three MYBs associated with flavonoid biosynthesis were screened by phylogenetic analysis. These findings help clarify the molecular mechanism and regulatory network underlying petal color change in waterlilies and provide a biological basis for breeding novel cultivars.

Data availability

All relevant supporting data sets are included in the article and its supplementary files. The raw RNA-seq data have been submitted to the SRA database under accession number PRJNA1056490 and are freely available at https://www.ncbi.nlm.nih.gov/sra/PRJNA1056490.

Abbreviations

High-performance liquid chromatography

Fragments per kilobase of exon model per million mapped reads

False discovery rate

Kyoto Encyclopedia of Genes and Genomes

Gene Ontology

Total anthocyanin contents

Total flavonol contents

Cyanidin derivatives

Delphinidin derivatives

Kaempferol derivatives

Quercetin derivatives

Myricetin derivatives

Cyanidin 3-O-glucoside

Chalcone synthase

Chalcone isomerase

Flavanone 3-hydroxylase

Flavonoid 3’-hydroxylase

Flavonoid 3’,5’-hydroxylase

Flavonol synthase

Dihydroflavonol 4-reductase

Anthocyanidin synthase

Flavonoid 3-O-glucosyltransferase

Leucoanthocyanin dioxygenase

V-myb avian myeloblastosis viral oncogene homolog

Basic helix-loop-helix

1. Cuthill IC, Allen WL, Arbuckle K, Caspers B, Chaplin G, Hauber ME, et al. The biology of color. Science. 2017;357:eaan0221.

2. Tanaka Y, Katsumoto Y, Brugliera F, Mason J. Genetic engineering in floriculture. Plant Cell Tissue Organ Cult. 2005;80:1–24.

3. Stavenga DG, Leertouwer HL, Dudek B, van der Kooi CJ. Coloration of flowers by flavonoids and consequences of pH dependent absorption. Front Plant Sci. 2021;11:600124.

4. Grotewold E. The genetics and biochemistry of floral pigments. Annu Rev Plant Biol. 2006;57:761–80.

5. Fu HS, Zeng T, Zhao YY, Luo TT, Deng HJ, Meng CW, et al. Identification of chlorophyll metabolism- and photosynthesis-related genes regulating green flower color in Chrysanthemum by integrative transcriptome and weighted correlation network analyses. Genes. 2021;12:449.

6. Tanaka Y, Sasaki N, Ohmiya A. Biosynthesis of plant pigments: anthocyanins, betalains and carotenoids. Plant J. 2008;54:733–49.

7. Weiss MR. Floral color change: a widespread functional convergence. Am J Bot. 1995;82:167–85.

8. Pu XD, Li Z, Tian Y, Gao RR, Hao LJ, Hu YT, et al. The honeysuckle genome provides insight into the molecular mechanism of carotenoid metabolism underlying dynamic flower coloration. New Phytol. 2020;227:930–43.

9. Li M, Sun YT, Lu XC, Debnath B, Mitra S, Qiu DL. Proteomics reveal the profiles of color change in Brunfelsia acuminata flowers. Int J Mol Sci. 2019;20:2000.

10. Ghissing U, Goswami A, Mitra A. Temporal accumulation of pigments during colour transformation from white to red in Combretum indicum (L.) DeFilipps (syn. Quisqualis indica L.) flowers. Nat Prod Res. 2021;37:529–33.

11. Ghissing U, Kutty NN, Bimolata W, Samanta T, Mitra A, Wittstock U. Comparative transcriptome analysis reveals an insight into the candidate genes involved in anthocyanin and scent volatiles biosynthesis in colour changing flowers of Combretum indicum. Plant Biol. 2023;25:85–95.

12. Wu Q, Li PC, Zhang HJ, Feng CY, Li SS, Yin DD, et al. Relationship between the flavonoid composition and flower colour variation in Victoria. Plant Biol. 2018;20:674–81.

13. Yang YZ, Liu XD, Shi XQ, Ma J, Zeng XM, Zhu ZS, et al. A high-quality, chromosome-level genome provides insights into determinate flowering time and color of cotton rose (Hibiscus mutabilis). Front Plant Sci. 2022;13:818206.

14. Zhu ZS, Zeng XM, Shi XQ, Ma J, Liu XL, Li Q. Transcription and metabolic profiling analysis of three discolorations in a day of Hibiscus mutabilis. Biology. 2023;12:1115.

15. Guo LP, Wang YJ, da Silva JAT, Fan YM, Yu XN. Transcriptome and chemical analysis reveal putative genes involved in flower color change in Paeonia ‘Coral Sunset’. Plant Physiol Biochem. 2019;138:130–9.

16. Liu J, Wang YX, Zhang MH, Wang YM, Deng XB, Sun H, et al. Color fading in lotus (Nelumbo nucifera) petals is manipulated both by anthocyanin biosynthesis reduction and active degradation. Plant Physiol Biochem. 2022;179:100–7.

17. Anand A, Komati A, Katragunta K, Shaik H, Nagendla NK, Kuncha M, et al. Phytometabolomic analysis of boiled rhizome of Nymphaea nouchali (Burm. f.) using UPLC-Q-TOF-MSE, LC-QqQ-MS & GC-MS and evaluation of antihyperglycemic and antioxidant activities. Food Chem. 2021;342:128313.

18. Naznin M, Badrul Alam M, Alam R, Islam S, Rakhmat S, Lee SH, et al. Metabolite profiling of Nymphaea rubra (Burm. f.) flower extracts using cyclic ion mobility-mass spectrometry and their associated biological activities. Food Chem. 2023;404:134544.

19. Yin DD, Yuan RY, Wu Q, Li SS, Shao S, Xu YJ, et al. Assessment of flavonoids and volatile compounds in tea infusions of water lily flowers and their antioxidant activities. Food Chem. 2015;187:20–8.

20. Povilus RA, Dacosta JM, Grassa C, Satyaki PRV, Moeglein M, Jaenisch J, et al. Water lily (Nymphaea thermarum) genome reveals variable genomic signatures of ancient vascular cambium losses. PNAS. 2020;201922873.

21. Xiong XH, Zhang J, Yang YZ, Chen YC, Su Q, Zhao Y, et al. Water lily research: past, present, and future. Trop Plants. 2023;2:1–8.

22. Zhang LS, Chen F, Zhang XT, Li Z, Zhao YY, Lohaus R, et al. The water lily genome and the early evolution of flowering plants. Nature. 2020;577:79–84.

23. Huang GZ, Deng HQ, Li ZX, Li G. Water lily. Beijing: China Forestry Publishing House; 2009.

24. Slocum PD. Waterlilies and lotuses. Portland Cambridge: Timber; 2005.

25. Wu Q, Zhang HJ, Wang XH, Zhao W, Zhou X, Wang LS. Research progress on flower color of waterlily (Nymphaea). Acta Horticulturae Sinica. 2021;48:1–13.

26. Wu Q, Wu J, Li SS, Zhang HJ, Feng CY, Yin DD, et al. Transcriptome sequencing and metabolite analysis for revealing the blue flower formation in waterlily. BMC Genomics. 2016;17:897.

27. Wei Q, Liu AC, Chen C, Lu Y, Zhang Y, Li SJ. The complete chloroplast genome of Nymphaea atrans (Surrey Wilfrid Laurance Jacobs, 1992: Nymphaeaceae). Mitochondrial DNA B. 2023;8:430–3.

28. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

29. Young MD, Wakefield MJ, Smyth GK, Oshlack A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 2010;11:R14.

30. Mao XZ, Cai T, Olyarchuk JG, Wei LP. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics. 2005;21:3787–93.

31. Luo HL, Chen SM, Wan HJ, Chen FD, Gu CS, Liu ZL. Candidate reference genes for gene expression studies in water lily. Anal Biochem. 2010;404:100–2.

32. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods. 2001;25:402–8.

33. Chen CJ, Chen H, Zhang Y, Thomas HR, Frank MH, He YH, et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant. 2020;13:1194–202.

34. LaFountain AM, McMahon HE, Reid NM, Yuan YW. To stripe or not to stripe: the origin of a novel foliar pigmentation pattern in monkeyflowers (Mimulus). New Phytol. 2023;237:310–22.

35. Mol J, Grotewold E, Koes R. How genes paint flowers and seeds. Trends Plant Sci. 1998;3:1360–85.

36. Ramsay NA, Glover BJ. MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci. 2005;10:63–70.

37. Yuan Y, Li X, Yao X, Fu XH, Cheng J, Shan HY, et al. Mechanisms underlying the formation of complex color patterns on Nigella orientalis (Ranunculaceae) petals. New Phytol. 2023;237:2450–66.

38. Dubos C, Stracke R, Grotewold E, Weisshaar B, Martin C, Lepiniec L. MYB transcription factors in Arabidopsis. Trends Plant Sci. 2010;15:573–81.

39. Baudry A, Heim MA, Dubreucq B, Caboche M, Weisshaar B, Lepiniec L. TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant J. 2004;39:366–80.

40. Qian YC, Zhang TY, Yu Y, Gou LP, Yang JT, Xu J, et al. Regulatory mechanisms of bHLH transcription factors in plant adaptive responses to various abiotic stresses. Front Plant Sci. 2021;12:677611.

41. Jezek M, Allan AC, Jones JJ, Geilfus CM. Why do plants blush when they are hungry? New Phytol. 2023;239:494–505.

42. Vaknin H, Bar-Akiva A, Ovadia R, Nissim-Levi A, Forer I, Weiss D, et al. Active anthocyanin degradation in Brunfelsia calycina (yesterday-today-tomorrow) flowers. Planta. 2005;222:19–26.

43. Zipor G, Duarte P, Carqueijeiro I, Shahar L, Ovadia R, Teper-Bamnolker P, et al. In planta anthocyanin degradation by a vacuolar class III peroxidase in Brunfelsia calycina flowers. New Phytol. 2014;205:653–65.

44. Du H, Lai L, Wang F, Sun W, Zhang L, Li X, et al. Characterisation of flower colouration in 30 Rhododendron species via anthocyanin and flavonol identification and quantitative traits. Plant Biol. 2018;20:121–9.

45. Wang QY, Zhu J, Li B, Li SS, Yang Y, Wang QY, et al. Functional identification of anthocyanin glucosyltransferase genes: a Ps3GT catalyzes pelargonidin to pelargonidin 3-O-glucoside painting the vivid red flower color of Paeonia. Planta. 2023;257:65.

46. Guo XY, Fu XQ, Li X, Tang DQ. Effect of flavonoid dynamic changes on flower coloration of Tulipa gesneiana ‘Queen of night’ during flower development. Horticulturae. 2022;8:510.

47. Zhu ML, Zheng XC, Shu QY, Li H, Zhong PX, Zhang HJ, et al. Relationship between the composition of flavonoids and flower colors variation in tropical water lily (Nymphaea) cultivars. PLoS ONE. 2012b;7:e34335.

48. Zhu ML, Wang LS, Zhang HJ, Xu YJ, Zheng XC, Wang LJ. Relationship between the composition of anthocyanins and flower color variation in hardy water lily (Nymphaea spp.) cultivars. Chin Bull Bot. 2012a;47:437–53.

49. Jiang T, Zhang MD, Wen CX, Xie XL, Tian W, Wen SQ, et al. Integrated metabolomic and transcriptomic analysis of the anthocyanin regulatory networks in Salvia miltiorrhiza Bge. flowers. BMC Plant Biol. 2020;20:349.

50. Ross J, Li Y, Lim E-K, Bowles DJ. Higher plant glycosyltransferases. Genome Biol. 2001;2:3004.3001-3004.3006.

51. Deng J, Su MY, Zhang XY, Liu XL, Damaris RN, Lv SY, et al. Proteomic and metabolomic analyses showing the differentially accumulation of NnUFGT2 is involved in the petal red-white bicolor pigmentation in lotus (Nelumbo nucifera). Plant Physiol Biochem. 2023;198:107675.

52. Hsu YH, Tagami T, Matsunaga K, Okuyama M, Suzuki T, Noda N, et al. Functional characterization of UDP-rhamnose-dependent rhamnosyltransferase involved in anthocyanin modification, a key enzyme determining blue coloration in Lobelia erinus. Plant J. 2017;89:325–37.

53. Li C, Wu J, Hu KD, Wei SW, Sun HY, Hu LY, et al. PyWRKY26 and PybHLH3 cotargeted the PyMYB114 promoter to regulate anthocyanin biosynthesis and transport in red-skinned pears. Hortic Res. 2020;7:37.

54. Lloyd A, Brockman A, Aguirre L, Campbell A, Bean A, Cantero A, et al. Advances in the MYB-bHLH-WD repeat (MBW) pigment regulatory model: addition of a WRKY factor and co-option of an anthocyanin MYB for betalain regulation. Plant Cell Physiol. 2017;58:1431–41.

55. Qi FT, Liu YT, Luo YL, Cui YM, Lu CF, Li H, et al. Functional analysis of the ScAG and ScAGL11 MADS-box transcription factors for anthocyanin biosynthesis and bicolour pattern formation in Senecio cruentus ray florets. Hortic Res. 2022;9:uhac071.

56. Skirycz A, Jozefczuk S, Stobiecki M, Muth D, Zanor MI, Witt I, et al. Transcription factor AtDOF4;2 affects phenylpropanoid metabolism in Arabidopsis thaliana. New Phytol. 2007;175:425–38.

57. Maier A, Schrader A, Kokkelink L, Falke C, Welter B, Iniesto E, et al. Light and the E3 ubiquitin ligase COP1/SPA control the protein stability of the MYB transcription factors PAP1 and PAP2 involved in anthocyanin accumulation in Arabidopsis. Plant J. 2013;74:638–51.

58. Teng S, Keurentjes J, Bentsink L, Koornneef M, Smeekens S. Sucrose-specific induction of anthocyanin biosynthesis in Arabidopsis requires the MYB75/PAP1 gene. Plant Physiol. 2005;139:1840–52.

59. Zhu J, Wang YZ, Wang QY, Li B, Wang XH, Zhou X, et al. The combination of DNA methylation and positive regulation of anthocyanin biosynthesis by MYB and bHLH transcription factors contributes to the petal blotch formation in Xibei tree peony. Hortic Res. 2023;10:uhad100.

60. Liu YF, Ma KX, Qi YW, Lv GW, Ren XL, Liu ZD, et al. Transcriptional regulation of anthocyanin synthesis by MYB-bHLH-WDR complexes in kiwifruit (Actinidia chinensis). J Agric Food Chem. 2021;69:3677–91.


This study was financially supported by the National Natural Science Foundation of China (Grant No. 32102413) and Biological Resources Programme CAS (KFJ-BRP-017-44).

Author information

Authors and affiliations.

State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China

Xian Zhou, Xiaohan Wang, Haohui Wei, Huijin Zhang, Qian Wu & Liangsheng Wang

China National Botanical Garden, Beijing, 100093, China

University of Chinese Academy of Sciences, Beijing, 100049, China

Xian Zhou, Xiaohan Wang & Liangsheng Wang

Hunan Agricultural University, Changsha, 410128, China


Contributions

QW conceived and designed the experiments. XZ performed the experiments. XHW and HHW analyzed the data. HJZ cultivated the waterlily plants. QW and XZ wrote the manuscript. LSW reviewed and edited the manuscript. All authors contributed to the article and approved the final manuscript.

Corresponding authors

Correspondence to Qian Wu or Liangsheng Wang .

Ethics declarations

Ethics approval and consent to participate.

The plant materials (not endangered materials or species) comply with local institutional guidelines and legislation.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Materials 1–11

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Zhou, X., Wang, X., Wei, H. et al. Integrative analysis of transcriptome and target metabolites uncovering flavonoid biosynthesis regulation of changing petal colors in Nymphaea ‘Feitian 2’. BMC Plant Biol 24, 370 (2024). https://doi.org/10.1186/s12870-024-05078-5


Received : 05 March 2024

Accepted : 28 April 2024

Published : 07 May 2024

DOI : https://doi.org/10.1186/s12870-024-05078-5


  • Nymphaea ‘Feitian 2’
  • Transcriptome
  • Color change

BMC Plant Biology

ISSN: 1471-2229


Unlocking the potential of Industry 4.0 in BRICS nations: a systematic literature review and meta-analysis

International Journal of Quality & Reliability Management

ISSN : 0265-671X

Article publication date: 17 May 2024

This study aims to introduce and summarise Industry 4.0 practices in BRICS nations and to determine, from past literature, each nation’s current contribution to implementing those practices. "BRICS" is an acronym formed from the initials of the member countries: Brazil, Russia, India, China and South Africa. As the BRICS countries continue to play an essential role in the global economy, it is important to understand Industry 4.0 with a focus on these emerging economies.

Design/methodology/approach

To assess existing research on Industry 4.0 practices in BRICS nations, a systematic literature review (SLR) is performed using articles available in the SCOPUS database. The study includes a descriptive analysis based on the frequency and year of publications and on the most influential universities, journals and articles. It also includes a category analysis of the multi-criteria decision-making (MCDM) methods, research designs, research methods and data analysis techniques used. The category analysis further covers the Industry 4.0 technologies applied to different problems in the BRICS nations.

According to the analysis of past literature, the primary practices identified centre on operations productivity, waste management, energy reduction and sustainable processes. The review also found that, despite the abundance of research on Industry 4.0, major academic journal publications are restricted to a small number of industries and issues, with the manufacturing and automotive industries as front-runners. Categorising the selected papers by year of publication shows that the number of publications has been rising. Among the BRICS countries, China and India have contributed significantly to Industry 4.0-related publications, together accounting for 61 percent of the articles identified. The study also found that qualitative research design is the most widely adopted framework in this field and empirical triangulation the least adopted. Categorising the selected articles further reveals numerous gaps. For example, 67.14% of the research literature is qualitative.
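Because the abstract reports the qualitative share to two decimals, that share fixes the underlying article count. The short check below assumes the 67.14% is computed over all 423 reviewed articles. The 61 percent figure is rounded too coarsely to identify a unique count, since anything from 256 to 260 articles rounds to it.

```python
def share_percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)

TOTAL_ARTICLES = 423  # number of reviewed articles stated in the abstract

# Find the whole-article count whose share rounds to the reported 67.14%.
qualitative = next(n for n in range(TOTAL_ARTICLES + 1)
                   if share_percent(n, TOTAL_ARTICLES) == 67.14)
print(qualitative, "of", TOTAL_ARTICLES, "articles")
```

This prints "284 of 423 articles". Neighbouring counts give 66.90% (283) and 67.38% (285), so the reported figure identifies 284 uniquely.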

Practical implications

Understanding Industry 4.0 in the BRICS nations helps to identify opportunities for international collaboration and future cooperation. This study helps to promote collaboration between BRICS countries and other nations, organisations or businesses interested in capitalising on these growing economies' assets and capabilities in Industry 4.0 technologies. It also provides essential insights into the economic, technological and societal impacts, supporting effective decision-making and strategic planning for a sustainable and competitive future. This contribution therefore connects the wider world with these economies through the application of cutting-edge technologies for nearly half of the world’s population. The expected gains include better utilisation of resources, reduced downtime, improved product quality, personalised products and stronger human resource capabilities.

Originality/value

In this study, BRICS nations are selected because of their significant social, economic and environmental impact on the world. The review covers 423 articles published up to August 2022, selected from the SCOPUS database. Each BRICS nation is compared in terms of its Industry 4.0 applications, primary area of focus, leading industries, industry involvement with universities and areas that need attention. To the best of our knowledge, this is the most recent SLR and meta-analysis of Industry 4.0 in BRICS nations. It analyses the available literature, 423 articles in total, across nine descriptive and category-wise classifications. Based on this SLR, the study makes important recommendations and suggests future directions that will help achieve social, economic and environmental sustainability in BRICS nations.

  • Industry 4.0
  • BRICS nations
  • Emerging economies
  • Competitive advantages
  • Literature review
  • Meta-analysis

Yadav, A., Yadav, G. and Desai, T.N. (2024), "Unlocking the potential of Industry 4.0 in BRICS nations: a systematic literature review and meta-analysis", International Journal of Quality & Reliability Management, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/IJQRM-06-2023-0180

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited



Nanjing Agricultural University


Multi-omics analysis reveals key regulatory defense pathways and genes involved in salt tolerance of rose plants

These authors contributed equally to this work.


Haoran Ren, Wenjing Yang, Weikun Jing, Muhammad Owais Shahid, Yuming Liu, Xianhan Qiu, Patrick Choisy, Tao Xu, Nan Ma, Junping Gao, Xiaofeng Zhou, Multi-omics analysis reveals key regulatory defense pathways and genes involved in salt tolerance of rose plants, Horticulture Research , Volume 11, Issue 5, May 2024, uhae068, https://doi.org/10.1093/hr/uhae068


Salinity stress causes serious damage to crops worldwide, limiting plant production. However, the metabolic and molecular mechanisms underlying the response to salt stress in rose (Rosa spp.) remain poorly studied. We therefore performed a multi-omics investigation of Rosa hybrida cv. Jardin de Granville (JDG) and Rosa damascena Mill. (DMS) under salt stress to determine the mechanisms underlying rose adaptability to salinity stress. Salt treatment of both JDG and DMS led to the buildup of reactive oxygen species (H2O2). Palisade tissue was more severely damaged in DMS than in JDG, while the relative electrolyte permeability was lower and the soluble protein content was higher in JDG than in DMS. Metabolome profiling revealed significant alterations in phenolic acid, lipid, and flavonoid metabolite levels in JDG and DMS under salt stress. Proteome analysis identified enrichment of flavone and flavonol pathways in JDG under salt stress. RNA sequencing showed that salt stress influenced primary metabolism in DMS, whereas it substantially affected secondary metabolism in JDG. Integrating these datasets revealed that the phenylpropane pathway, especially the flavonoid pathway, is strongly enhanced in rose under salt stress. Consistent with this, weighted gene coexpression network analysis (WGCNA) identified the key regulatory gene chalcone synthase 1 (CHS1), which is important in the phenylpropane pathway. Moreover, luciferase assays indicated that the bHLH74 transcription factor binds to the CHS1 promoter to block its transcription. These results clarify the role of the phenylpropane pathway, especially flavonoid and flavonol metabolism, in the response to salt stress in rose.

Rose (Rosa spp.) is a popular ornamental crop that is also used in cosmetics, perfume, and medicine. Rose plants contain various bioactive substances, including flavonoids, fragrant components, and hydrolysable and condensed tannins, which have high value and market potential [1]. However, soil salinization is common in many rose-growing regions, and high soil salt concentrations can severely inhibit rose growth, reduce flower quality, and cause significant economic losses [2]. Salt stress can also increase the levels of rose secondary metabolites such as citronellol, geraniol, and phenylethyl alcohol [3, 4], and such alterations may help regulate salt tolerance. Research on roses has focused mainly on flower quality, petal development, and flowering [5–7]. Few data are available on the signaling pathways that link plant development with the secondary metabolites associated with salt stress.

In plants, salt stress induces osmotic imbalance, which leads to stomatal closure, limits photosynthesis, and affects growth and metabolism [8]. To alleviate osmotic stress and protect themselves from its adverse effects, plants accumulate numerous compatible solutes (such as soluble proteins, soluble sugars, and proline), collectively known as osmoprotectants [9]. Moreover, plants generate reactive oxygen species (ROS) to cope with salt stress [10]. Nevertheless, excessive ROS accumulation can cause oxidative DNA damage and impair protein biosynthesis, ultimately resulting in cell damage and death [11, 12]. Plant cells use both enzymatic and nonenzymatic antioxidant mechanisms to lower ROS levels and prevent oxidative damage. Superoxide dismutase (SOD), peroxidase (POD), ascorbate peroxidase (APX), catalase (CAT), and glutathione peroxidase (GPX) are antioxidant enzymes that scavenge O2− and H2O2 [13, 14]. Nonenzymatic antioxidants, such as ascorbate, glutathione, phenols, and flavonoids, also play vital roles in ROS scavenging [15, 16].
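For reference, the textbook reactions by which three of the enzymes listed above detoxify ROS are summarized below. These are general biochemistry, not measurements from this study.

```latex
\begin{aligned}
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\xrightarrow{\text{SOD}} \mathrm{H_2O_2} + \mathrm{O_2} \\
2\,\mathrm{H_2O_2} &\xrightarrow{\text{CAT}} 2\,\mathrm{H_2O} + \mathrm{O_2} \\
\mathrm{H_2O_2} + 2\,\text{ascorbate} &\xrightarrow{\text{APX}} 2\,\mathrm{H_2O} + 2\,\text{monodehydroascorbate}
\end{aligned}
```

SOD converts superoxide into H2O2, so it works in tandem with CAT and APX, which remove the H2O2 it produces.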

Flavonoids are naturally occurring bioactive substances found in fruits, vegetables, tea, and medicinal plants [17]. Flavonoids comprise more than 9000 compounds and constitute a substantial category of plant secondary metabolites [18]. They have diverse biological functions in the growth and development of plants, including improving pollen fertility, imparting color, and influencing seed dormancy and germination [19, 20]. In addition, flavonoids have protective roles against biotic and abiotic stresses, such as pathogen infections, ultraviolet (UV)-B, cold, drought, and salinity [21–23]. Flavonoids have also received widespread attention due to their possible benefits for human health [24].

The molecular mechanism of flavonoid biosynthesis has been elucidated in many plants [25]. Chalcone synthase (CHS) catalyzes the first step in flavonoid production, condensing three molecules of malonyl-CoA with one molecule of 4-coumaroyl-CoA to form naringenin chalcone. Chalcone isomerase (CHI) then rapidly converts naringenin chalcone into the flavanone naringenin, which subsequent enzymes in the pathway convert into different flavonoids [26]. Although flavonoid biosynthesis has attracted increasing attention, current research does not fully explain how regulatory factors affect the transcription and activity of the major enzymes in flavonoid metabolism. Further research on the signaling molecules, regulatory pathways, and regulatory mechanisms associated with flavonoids is therefore needed to elucidate their physiological activity.
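The two opening steps described above can be written as balanced reactions. This is a standard textbook summary of the CHS and CHI steps, not a result of this study.

```latex
\begin{aligned}
\text{4-coumaroyl-CoA} + 3\,\text{malonyl-CoA} &\xrightarrow{\text{CHS}} \text{naringenin chalcone} + 4\,\text{CoA} + 3\,\mathrm{CO_2} \\
\text{naringenin chalcone} &\xrightarrow{\text{CHI}} (2S)\text{-naringenin}
\end{aligned}
```

Each malonyl-CoA extension releases one CO2, and all four CoA groups are freed when the polyketide cyclizes into the chalcone.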

Rosa hybrida cv. Jardin de Granville (JDG) is a new hybrid rose developed by 'Les Roses Anciennes André Eve' for the Prestige range of Christian Dior skin care products. JDG possesses twice the vitality of a traditional rose and grows and blooms vigorously in the salty air and harsh winds of coastal climates. JDG is also rich in beneficial bioactive substances that are mainly used in cosmetics and anti-aging skin care creams [27, 28]. Rosa damascena Mill. (DMS) is one of the most common fragrant roses in the Rosaceae family. Its essential oils and aromatic compounds are used extensively in the cosmetic and food industries worldwide [29]. DMS is considered an excellent rose throughout the world due to its high resistance to abiotic stress and abundance of beneficial secondary metabolites [30].

Here, we conducted an integrated analysis of the transcriptomes, proteomes, and metabolomes of JDG and DMS to explore the relationship between plant development and secondary metabolites in rose under salt stress. We used WGCNA and Cytoscape to decipher the similarities and differences in the complex metabolic pathways and regulatory genes of JDG and DMS under salt stress. These results provide comprehensive information on the metabolic and molecular mechanisms of the salt stress response in rose and should support the breeding of new rose varieties that are both salt tolerant and rich in beneficial secondary metabolites.

JDG is more tolerant than DMS to salt stress

To explore the salt tolerance of rose, JDG and DMS plants were treated with 400 mM NaCl for 2 weeks. DMS plants showed typical damage, with yellowing and death of leaves, whereas JDG leaves exhibited only slight wilting ( Fig. 1A ). In addition, detached leaves were treated with salt for 4 days; DMS leaves showed significantly more necrosis than JDG leaves ( Fig. 1B ). To observe the responses of the cultivars to salt stress quickly and to simplify sampling, subsequent experiments mainly used detached leaves. To examine the overall anatomy and morphology of leaves treated with NaCl for 2 days, we stained treated and control leaves with toluidine blue and prepared thin sections. Salt-induced damage to the palisade tissue was more severe in DMS than in JDG (red arrowheads in Fig. 1C ). To investigate ROS accumulation in response to salt stress, we performed 3,3'-diaminobenzidine (DAB) staining. After salt stress, DMS leaves accumulated substantially more ROS (deeper staining) than JDG leaves, whereas the two cultivars did not differ in ROS content under normal conditions ( Fig. 1D, E ). Soluble protein content in JDG leaves was higher after 4 days of salt stress than after 2 days, whereas in DMS leaves it was much higher than the pretreatment level after 2 days and then decreased by day 4 ( Fig. 1F ). Relative electrolyte permeability of JDG leaves increased slightly after 2 days of salt treatment and more substantially after 4 days, while that of DMS leaves was much higher than that of JDG leaves at both time points ( Fig. 1G ). These phenotypic and physiological analyses indicate that JDG is more salt tolerant than DMS.

Phenotypes of JDG and DMS under salt stress. (A) Phenotypes of JDG and DMS plants after 2 weeks of treatment with 400 mM NaCl. Left, whole plant; right, enlarged image of the region indicated by the red circle. Bars, 3 cm. (B) Detached rose leaves on different days after the onset of salt stress (400 mM NaCl). (C) Anatomical analysis of the leaves in (B). Red arrowheads indicate the palisade tissue. Mock (0 mM NaCl); NaCl (400 mM NaCl). Bars, 50 μm. (D) DAB staining of rose leaves under salt stress. (E) Relative staining intensity in (D). Brown-stained area and total leaf area were measured using ImageJ, and their ratio was taken as the relative staining intensity. (F) Soluble protein content of rose leaves on different days of salt treatment. (G) Relative electrolyte permeability of rose leaves on different days of salt treatment. Data are means ± SE of at least three independent biological experiments.
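Fig. 1E reports a relative staining intensity, and each panel reports mean ± SE. Both can be computed as in the minimal Python sketch below. It assumes the brown-stained and total leaf areas have already been measured in ImageJ (for example, in pixels). The replicate values are hypothetical and are not data from this study.

```python
import math
from statistics import mean, stdev


def relative_staining_intensity(brown_area: float, leaf_area: float) -> float:
    """Ratio of DAB-stained (brown) area to total leaf area, as in Fig. 1E."""
    if leaf_area <= 0:
        raise ValueError("leaf area must be positive")
    return brown_area / leaf_area


def mean_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error (sample SD / sqrt(n)) across biological replicates."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate the SE")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical (brown area, leaf area) pairs for three leaves of one group
areas = [(1200.0, 8000.0), (1500.0, 7500.0), (900.0, 9000.0)]
ratios = [relative_staining_intensity(b, t) for b, t in areas]
m, se = mean_se(ratios)
print(f"relative staining intensity: {m:.3f} ± {se:.3f}")
```

The SE here is the sample standard deviation divided by the square root of the number of replicates. This is the usual definition behind "mean ± SE".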

Flavonoid metabolites play an important role in the salinity tolerance of rose

To better understand how salt stress affects rose metabolites, we performed a comprehensive untargeted metabolite analysis using ultra-performance liquid chromatography/mass spectrometry (UPLC/MS). Fig. S1A shows the metabolites detected, and Fig. S1B shows the curves of the quality control samples, which indicate that the mass spectral data were highly reproducible and reliable. Principal component analysis (PCA) was used to reduce the dimensionality of the data and clarify the relationships among samples. PC1 and PC2 explained 50.07% and 23.36% of the variance, respectively; PC1 separated the samples by genotype, whereas PC2 separated them by duration of salt exposure. Thus, the metabolite-based PCA revealed clear metabolic differences between the two cultivars ( Fig. S2A ).
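The explained-variance percentages quoted for PC1 and PC2 come from a PCA. A PCA of this kind can be sketched with a singular value decomposition (SVD). The Python/NumPy sketch below assumes a samples × metabolites matrix that has already been log-transformed, and it autoscales each metabolite to unit variance. The paper does not state its preprocessing, so both choices are assumptions. The input here is random rather than the study's data.

```python
import numpy as np


def pca_explained_variance(X: np.ndarray, n_components: int = 2):
    """PCA of a samples x features matrix via SVD.

    Returns the sample scores on the first n_components PCs and the
    fraction of total variance explained by each of those PCs.
    """
    Xc = X - X.mean(axis=0)                 # centre each feature
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                       # leave constant features unscaled
    Xs = Xc / sd                            # autoscale to unit variance
    U, S, _ = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)         # variance fraction per PC
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Hypothetical data: 12 samples x 50 metabolite features
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
scores, ev = pca_explained_variance(X)
print("PC1, PC2 explained variance (%):", np.round(100 * ev, 2))
```

The squared singular values are proportional to the variance captured by each component. Their normalized values are therefore the explained-variance fractions that a PCA score plot reports on its axes.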

Our screening for differentially accumulated metabolites (DAMs) identified hundreds of metabolites with significantly altered accumulation under salt stress ( Fig. 2A , Table S1 ). Preliminary analysis indicated that DAMs included amino acids and their derivatives, nucleotides and their derivatives, phenolic acids, flavonoids, lipids, tannins, lignans and coumarins, organic acids, alkaloids, and terpenoids, and most of the DAMs were upregulated under salt stress ( Fig. 2B ). Phenolic acids, lipids, and flavonoid metabolites showed significantly altered accumulation under salt stress in both JDG and DMS. Compared with their levels in DMS, flavonoid metabolites, phenolic acid metabolites, and lipids were differentially accumulated in JDG leaves under both control conditions and salt stress ( Table S1 ). These results indicate that flavonoid metabolites, phenolic acid metabolites, and lipids may play important roles in the salt tolerance of rose.
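Screening for DAMs usually combines a fold-change cutoff with a multivariate importance score. The sketch below assumes the widely used thresholds of fold change ≥ 2 or ≤ 0.5 together with a variable importance in projection (VIP) ≥ 1. The criteria used in this study are not given in this section. The thresholds, the metabolite identifiers, and the abundances are therefore hypothetical.

```python
import math
from collections import Counter


def classify_dams(treat: dict[str, float], ctrl: dict[str, float],
                  vip: dict[str, float], fc_cutoff: float = 2.0,
                  vip_cutoff: float = 1.0) -> dict[str, str]:
    """Return {metabolite: 'up' | 'down'} for metabolites passing both cutoffs.

    treat and ctrl map metabolite IDs to mean (positive) abundances;
    vip maps metabolite IDs to VIP scores, assumed to come from a separate
    multivariate model (e.g. OPLS-DA) that is not fitted here.
    """
    lfc_cutoff = math.log2(fc_cutoff)
    calls = {}
    for met, c in ctrl.items():
        lfc = math.log2(treat[met] / c)      # log2 fold change, treatment / control
        if vip.get(met, 0.0) >= vip_cutoff and abs(lfc) >= lfc_cutoff:
            calls[met] = "up" if lfc > 0 else "down"
    return calls


# Hypothetical metabolites and abundances
ctrl = {"met_A": 10.0, "met_B": 10.0, "met_C": 10.0, "met_D": 10.0}
treat = {"met_A": 40.0, "met_B": 4.0, "met_C": 15.0, "met_D": 30.0}
vip = {"met_A": 1.5, "met_B": 1.2, "met_C": 2.0, "met_D": 0.6}
dams = classify_dams(treat, ctrl, vip)
print(dams, Counter(dams.values()))
```

In this toy input, met_C fails the fold-change cutoff and met_D fails the VIP cutoff. Only met_A (up) and met_B (down) are called as DAMs.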

Metabolomic analysis of JDG and DMS under salt stress. (A) Number of DAMs in different comparison groups. (B) Classification of DAMs in each comparison. (C) Classification of DAMs upregulated in both JDG and DMS under salt treatment. (D) Classification of DAMs upregulated in JDG compared with DMS under both control and salt treatments. (E, F) KEGG pathway enrichment of DAMs under salt stress: (E) JDG-NaCl vs JDG-Mock and (F) DMS-NaCl vs DMS-Mock.

To determine how metabolites differ between JDG and DMS, we summarized the differences in metabolite accumulation among comparison groups using Venn diagrams. Groups JDG-NaCl vs JDG-Mock and DMS-NaCl vs DMS-Mock shared 109 DAMs, of which 79 increased and 15 decreased. Among the upregulated metabolites, phenolic acids and flavonoids accounted for 21.52% and 7.59%, respectively. These metabolites included ferulic acid, coniferaldehyde, pinocembrin (dihydrochrysin), eucalyptin (5-hydroxy-7,4'-dimethoxy-6,8-dimethylflavone), patuletin (quercetagetin-6-methyl ether), naringenin-7-O-rutinoside-4'-O-glucoside, naringin (naringenin-7-O-neohesperidoside), and sudachitin ( Fig. 2C , Fig. S2B–D , Table S1 ). Notably, 5,7,8,4'-tetramethoxyflavone, vanillic acid-4-O-glucoside, and 3',4',5',5,7-pentamethoxyflavone were upregulated in JDG but downregulated in DMS under salt stress, whereas kaempferol-3-O-arabinoside-7-O-rhamnoside was upregulated in DMS but downregulated in JDG. Groups JDG-Mock vs DMS-Mock and JDG-NaCl vs DMS-NaCl shared 408 metabolites with the same direction of change, of which 188 increased and 202 decreased. Among the upregulated metabolites, phenolic acids and flavonoids accounted for 29.26% and 33.51%, respectively ( Fig. 2D ). Notably, under control conditions JDG contained 12.74-fold more genkwanin (apigenin 7-methyl ether), 15.64-fold more 5,7-dihydroxy-6,3′,4′,5′-tetramethoxyflavone (arteanoflavone), 13-fold more naringenin-4′,7-dimethyl ether, and 13.30-fold more naringin dihydrochalcone than DMS; all of these are flavonoid metabolites. Venn analysis also showed that many salt-responsive metabolites were genotype specific, indicating that the cultivars respond to salinity through different mechanisms.
Seventy-seven metabolites accumulated specifically in JDG under salt stress, and these may represent the major metabolites in the salt stress response of JDG. Notably, four of them accumulated specifically in JDG after salt treatment and were also more abundant in JDG than in DMS under control conditions: ethylsalicylate (a phenolic acid), salidroside (a phenolic acid), L-ornithine (an amino acid), and epiafzelechin (a flavonoid) ( Fig. S2B–D , Table S1 ).
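The two-set Venn comparisons above reduce to set operations on the features called in each contrast. The hypothetical sketch below takes the up/down calls from two contrasts, for example JDG-NaCl vs JDG-Mock and DMS-NaCl vs DMS-Mock. It separates the shared features by whether they change in the same or opposite directions. The feature IDs are illustrative only.

```python
def compare_contrasts(calls_a: dict[str, str],
                      calls_b: dict[str, str]) -> dict[str, set[str]]:
    """Partition differential features from two contrasts, as in a two-set Venn diagram.

    Each argument maps a feature ID to 'up' or 'down' (differential features only).
    """
    shared = set(calls_a) & set(calls_b)
    both_up = {f for f in shared if calls_a[f] == calls_b[f] == "up"}
    both_down = {f for f in shared if calls_a[f] == calls_b[f] == "down"}
    return {
        "only_a": set(calls_a) - shared,
        "only_b": set(calls_b) - shared,
        "shared": shared,
        "both_up": both_up,
        "both_down": both_down,
        "opposite": shared - both_up - both_down,  # e.g. up in A but down in B
    }


# Hypothetical calls for two contrasts
jdg = {"m1": "up", "m2": "up", "m3": "down", "m4": "up"}
dms = {"m1": "up", "m3": "down", "m4": "down", "m5": "up"}
venn = compare_contrasts(jdg, dms)
for key, members in venn.items():
    print(key, sorted(members))
```

Reporting the "opposite" set separately shows why the same-direction counts can be smaller than the shared total. Metabolites such as 5,7,8,4'-tetramethoxyflavone, which rise in one cultivar and fall in the other, fall into that set.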

All DAMs were subjected to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis ( Fig. 2E, F , Fig. S3A, B ). In JDG (JDG-NaCl vs JDG-Mock), salt-induced changes in metabolites mainly involved 'purine metabolism,' 'phenylpropanoid biosynthesis,' 'linoleic acid metabolism,' and 'alpha-linolenic acid metabolism' ( Fig. 2E ). In DMS (DMS-NaCl vs DMS-Mock), the DAMs in leaves under salt stress were mainly associated with 'phenylpropanoid biosynthesis,' 'alpha-linolenic acid metabolism,' 'linoleic acid metabolism,' and 'pentose and glucuronate interconversions' ( Fig. 2F ). In the JDG-Mock vs DMS-Mock group, DAMs between DMS and JDG leaves were mostly associated with 'flavonoid biosynthesis,' 'flavone and flavonol biosynthesis,' and 'phenylpropanoid biosynthesis' ( Fig. S3A ). In the JDG-NaCl vs DMS-NaCl group, DAMs were largely involved in 'flavonoid biosynthesis,' 'flavone and flavonol biosynthesis,' and 'linoleic acid metabolism' ( Fig. S3B ). 'Linoleic acid metabolism,' 'alpha-linolenic acid metabolism,' and 'phenylpropanoid biosynthesis' were significantly enriched under salt stress in both cultivars, indicating that fatty acid metabolism and phenylpropanoid biosynthesis play important roles in the salt stress response of rose. Regardless of salt treatment, DAMs between DMS and JDG were concentrated in the flavone, flavonoid, and flavonol biosynthetic pathways, suggesting that differential accumulation of these metabolites may be the main reason for the different salt sensitivities of the two cultivars. Notably, 'caffeine metabolism' was enriched in JDG, whereas 'starch and sucrose metabolism' was significantly enriched in DMS.
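KEGG enrichment of this kind is commonly tested with a one-sided hypergeometric test. The pure-Python sketch below assumes that test and an enrichment ratio defined as (k/n)/(K/N). The counts are hypothetical. Some tools instead plot a "rich factor" (k/K), so the sketch may not match the exact statistic behind Fig. 2E, F.

```python
from math import comb


def kegg_enrichment(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """One-sided hypergeometric enrichment test for a single pathway.

    k: differential features annotated to the pathway
    n: differential features with any KEGG annotation
    K: background features annotated to the pathway
    N: all background features with any KEGG annotation
    Returns (enrichment ratio, P(X >= k)).
    """
    if n <= 0 or K <= 0 or n > N or K > N or not 0 <= k <= min(n, K):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the hypergeometric probability mass from k up to its maximum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)
    ratio = (k / n) / (K / N)
    return ratio, p_value


# Hypothetical pathway: 12 of 150 annotated DAMs vs 40 of 1500 background metabolites
ratio, p = kegg_enrichment(k=12, n=150, K=40, N=1500)
print(f"enrichment ratio = {ratio:.2f}, P = {p:.2e}")
```

Exact integer binomial coefficients avoid floating-point underflow in the tail sum. In practice, the P values for all tested pathways would then be corrected for multiple testing.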

Salt stress causes dynamic changes in distinct sets of proteins

To delve deeper into the molecular mechanisms of the salt stress response in rose, we profiled the proteome under the same salt treatment and control conditions used for the metabolome analysis and characterized proteins by the fold change in their accumulation. Under salt stress, we identified 119 differentially accumulated proteins (DAPs) in JDG (87 upregulated and 32 downregulated) and 163 in DMS (80 upregulated and 83 downregulated) ( Fig. 3A, B ). Only 18 DAPs overlapped between the two cultivars: 13 were upregulated and 4 were downregulated in both, while one DUF1279 domain–containing protein was upregulated in JDG but downregulated in DMS. Moreover, 101 DAPs were unique to JDG, whereas 145 were unique to DMS ( Table S2 ).

Proteomic analysis of rose under salt stress. (A) Number of DAPs in JDG and DMS. (B) Venn diagram of the DAPs in JDG and DMS. (C) Localizations of DAPs identified in JDG. (D) Functional categorization of DAPs unique to JDG. (E, F) KEGG enrichment analysis of DAPs in JDG (upregulated, E) and DMS (upregulated, F).

According to WoLF PSORT predictions, most DAPs in rose are located in chloroplasts ( Fig. 3C , Fig. S4A ). Gene Ontology (GO) and KEGG analyses were performed to annotate protein functions. The 20 most highly enriched GO terms associated with the DAPs are shown in circle diagrams ( Fig. S5A, B , Table S2 ). Among them, GO:0046658 (anchored component of plasma membrane), GO:0051554 (flavonol metabolic process), GO:0047893 (flavonol 3-O-glucosyltransferase activity), and GO:0051555 (flavonol biosynthetic process) were highly enriched in JDG under salt stress. In DMS, GO:0006720 (isoprenoid catabolic process), GO:0005764 (lysosome), and GO:0004602 (glutathione peroxidase activity) were the most enriched GO terms. In addition, the DAPs specific to JDG were highly enriched in 'icosanoid metabolic process,' 'diterpenoid metabolic process,' and 'diterpenoid biosynthetic process' ( Fig. 3D ), whereas those specific to DMS were enriched in 'cellular hyperosmotic salinity response,' 'monocarboxylic acid catabolic process,' 'terpenoid catabolic process,' 'sesquiterpenoid catabolic process,' and 'apocarotenoid catabolic process' ( Fig. S4B ). DAPs shared by JDG and DMS included Q2VA35 (xyloglucan endotransglucosylase/hydrolase) and A0A2P6P708 (glutathione peroxidase), both predicted to localize only to the extracellular region ( Table S2 ). The DAPs in different comparison groups were classified and then clustered according to the enrichment of their associated GO terms ( Fig. S4C ). This analysis showed that salinity mainly affects flavone and flavonol metabolism in JDG; flavones and flavonols are antioxidants with diverse bioactivities [ 24 ]. In DMS, salinity mainly affects the osmotic response, response to water stimulus, and salt stress response pathways, most of which are stress related [ 31 ].
We used KEGG enrichment to identify the metabolic pathways associated with the DAPs in JDG and DMS under salt stress ( Fig. 3E, F ). Many DAPs in JDG were associated with phenylpropanoid biosynthesis and alpha-linolenic acid metabolism, including lipoxygenase (A0A2P6S713), 12-oxophytodienoate reductase (A0A2P6PFD8), peroxidase (A0A2P6R8H8), and flavone 3′-O-methyltransferase (A0A2P6RK21). The DAPs upregulated in DMS under salt stress were frequently associated with alpha-linolenic acid metabolism and glutathione metabolism, whereas the downregulated DAPs were associated with the ribosome ( Table S2 ). Notably, alpha-linolenic acid metabolism was significantly enriched among upregulated DAPs in both JDG and DMS under salt stress. Collectively, the GO and KEGG enrichment results show that salt stress causes dynamic changes in distinct sets of proteins in rose.

Salt stress differentially alters the transcriptomes of JDG and DMS

To identify the genes involved in the salt stress response and explore the molecular mechanisms of salt tolerance in DMS and JDG, we sequenced the transcriptomes of JDG and DMS leaves by RNA sequencing (RNA-seq). We obtained high-quality reads for transcriptome analysis ( Table S3 ). In the PCA, PC1 clearly separated the two cultivars and PC2 separated the salt treatment from the control. The three biological replicates of each group clustered closely together, indicating acceptable reproducibility among replicates ( Fig. 4A ).

Transcriptomic analysis of JDG and DMS under salt stress. (A) PCA score plot of transcriptomic profiles from different cultivars. (B) Number of DEGs in JDG and DMS. (C–E) Venn diagrams of DEGs in JDG and DMS: (C) total DEGs, (D) upregulated DEGs, and (E) downregulated DEGs. (F, G) KEGG enrichment analysis of DEGs in JDG (F) and DMS (G).

Correlation analysis of transcriptome, proteome, and metabolome data. (A, B) KEGG enrichment analysis of combined transcriptome, proteome, and metabolome data: (A) JDG-NaCl vs JDG-Mock and (B) DMS-NaCl vs DMS-Mock. The x-axis shows the enrichment factor of each pathway in the different omics layers, and the y-axis shows the KEGG pathway name; colors from red to green indicate enrichment significance (P value) from high to low. Bubble size indicates the number of DEGs, DAPs, or DAMs. Bubble shape indicates the omics layer: circles, transcriptome; triangles, metabolome; squares, proteome. (C) Co-expression network of major genes, proteins, and metabolites in the phenylpropanoid pathway. Colors indicate log2 fold change (NaCl/Mock): red, upregulated and blue, downregulated genes, proteins, or metabolites.

We analyzed differentially expressed genes (DEGs) in JDG and DMS under control and salt stress conditions. Under salt stress, 10,662 genes were differentially expressed in DMS (4651 upregulated and 6011 downregulated), whereas only 1990 were differentially expressed in JDG (1102 upregulated and 888 downregulated) ( Fig. 4B ). The smaller number of DEGs in JDG suggests that its transcriptome is less perturbed by salt stress. Venn diagrams were used to compare the DEGs of the two cultivars: groups DMS-NaCl vs DMS-Mock and JDG-NaCl vs JDG-Mock shared 1120 DEGs, including 577 upregulated and 433 downregulated genes ( Fig. 4C–E ).

Next, we performed GO analysis of DEGs in the categories cellular component (CC), biological process (BP), and molecular function (MF). The 21 most enriched GO terms associated with DEGs in JDG-NaCl vs JDG-Mock and DMS-NaCl vs DMS-Mock are presented in circle diagrams ( Fig. S6 , Table S4 ). For JDG-NaCl vs JDG-Mock, seven of these GO terms belonged to the BP category; among them, GO:0016052 (carbohydrate catabolic process), GO:0009813 (flavonoid biosynthetic process), and GO:0009812 (flavonoid metabolic process) contained the most DEGs (43, 26, and 27, respectively), most of which were upregulated. Thirteen GO terms belonged to the MF category, among which GO:0010427 (abscisic acid binding), GO:0016832 (aldehyde-lyase activity), and GO:0019840 (isoprenoid binding) were highly significant, and one belonged to the CC category: GO:0031226 (intrinsic component of plasma membrane). For DMS-NaCl vs DMS-Mock, 19 GO terms belonged to the BP category, among which GO:0036294 (cellular response to decreased oxygen levels), GO:0048511 (rhythmic process), and GO:0048585 (negative regulation of response to stimulus) contained the most DEGs (85, 95, and 146, respectively), most of which were downregulated. One GO term was enriched in the MF category, GO:0016854 (racemase and epimerase activity), and one in the CC category, GO:0009501 (amyloplast). KEGG pathway enrichment analysis of JDG-NaCl vs JDG-Mock revealed that the DEGs were mainly involved in metabolic pathways, plant hormone signal transduction, biosynthesis of secondary metabolites, and glycolysis/gluconeogenesis ( Fig. 4F , Table S4 ). In DMS-NaCl vs DMS-Mock, the DEGs were chiefly enriched in metabolic pathways, plant hormone signal transduction, the MAPK signaling pathway, biosynthesis of cofactors, and ubiquitin-mediated proteolysis ( Fig. 4G , Table S4 ). These findings suggest that the biosynthesis of secondary metabolites is substantially enhanced under salt stress in JDG but not in DMS, whereas the biosynthesis of cofactors associated with primary metabolism is enhanced in DMS. We therefore speculate that salinity causes large changes in primary metabolism in DMS but mainly influences secondary metabolism in JDG.

Transcription factors (TFs) are essential for regulating the expression of stress response genes. Among the DEGs, we identified 114 TFs in JDG and 491 TFs in DMS, covering 39 TF families ( Table S4 ). The most represented families were AP2/ERF-ERF, MYB, NAC, bHLH, and C2C2 ( Fig. S7A, B ). Moreover, 64 TFs were differentially expressed in both cultivars in response to salinity. We speculate that these TFs form a highly complex transcriptional regulatory network and could perform critical functions in the mechanism of salt tolerance in rose.

Expression of phenylpropanoid-related genes is correlated with proteins and metabolites affected by salt stress

Integrated analysis of multi-omics data is a powerful tool for identifying significantly altered pathways and crucial metabolites in biological processes. Here, we integrated our transcriptome, proteome, and metabolome data to compare the responses of the two rose cultivars to salt stress. Alpha-linolenic acid metabolism, phenylpropanoid biosynthesis, and starch and sucrose metabolism were significantly enriched in JDG under salt stress ( Fig. 5A ), whereas starch and sucrose metabolism, cyanoamino acid metabolism, and phenylpropanoid biosynthesis were enriched in DMS ( Fig. 5B ). Starch and sucrose metabolism is a primary metabolic function common to different cultivars [ 32 ], whereas alpha-linolenic acid metabolism is related to the biosynthesis of jasmonic acid, a phytohormone involved in responses to fungal invasion and in senescence [ 7 ]. The phenylpropanoid biosynthesis pathway produces multiple secondary metabolites that confer a range of colors, flavors, nutritional components, and bioactivities in plants. Flavonoids are an important class of phenylpropanoids that play key roles in resistance to biotic and abiotic stresses [ 24 ]. We therefore focused on the phenylpropanoid pathway.

Gene–protein–metabolite correlation networks can be used to elucidate functional relationships and identify regulatory factors. Therefore, we analyzed the regulatory networks of the DEGs, DAPs, and DAMs related to phenylpropanoid metabolism. We identified 14 DEGs that were strongly correlated with one DAP and six DAMs in JDG under salt stress. Similarly, 25 DEGs were strongly correlated with one DAP and eight DAMs in DMS under salt stress ( Table S5 ). For example, in JDG, there was a strong correlation between the expression of one gene (RchiOBHmChr4g0430951) and the abundance of one protein (A0A2P6PM56) and two metabolites [coniferyl alcohol (mws0093) and sinapyl alcohol (mws0853)]. Epiafzelechin (mws1422) was also significantly associated with the expression of the gene RchiOBHmChr2g0092641. In DMS, there was a close association between the expression of three genes (RchiOBHmChr2g0092671, RchiOBHmChr3g0480401, and RchiOBHmChr5g0041231) and the abundance of one protein (A0A2P6QM41) and one metabolite [L-tyrosine (mws0250)]. The strong association of particular genes with phenylpropanoid proteins or metabolites suggests that these genes play a major role in phenylpropanoid biosynthesis under salt stress.
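Correlation networks of this kind are typically built by computing pairwise correlations across samples and keeping pairs above a cutoff. The sketch below assumes Pearson correlation with a hypothetical |r| ≥ 0.8 cutoff. The threshold and any P-value filter used for Table S5 are not stated here, and the identifiers and values are illustrative. The resulting (source, target, weight) edge list is the kind of table that can be imported into Cytoscape.

```python
import numpy as np


def correlation_edges(genes: dict[str, list[float]],
                      features: dict[str, list[float]],
                      r_cutoff: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (gene, feature, r) for pairs with |Pearson r| >= r_cutoff.

    Each vector holds one value per sample, in the same sample order.
    """
    edges = []
    for g, gx in genes.items():
        for f, fx in features.items():
            r = float(np.corrcoef(gx, fx)[0, 1])
            if abs(r) >= r_cutoff:   # NaN (constant vector) fails this test
                edges.append((g, f, r))
    return edges


# Hypothetical values across six samples (3 Mock + 3 NaCl)
genes = {"gene_1": [1, 2, 3, 4, 5, 6]}
features = {
    "met_a": [2, 4, 6, 8, 10, 12],   # rises with gene_1
    "met_b": [6, 5, 4, 3, 2, 1],     # falls with gene_1
    "met_c": [1, 3, 2, 3, 1, 2],     # uncorrelated with gene_1
}
edges = correlation_edges(genes, features)
for edge in edges:
    print(edge)
```

Keeping the sign of r in the edge weight preserves whether a gene and a metabolite change in the same or opposite directions. A plain |r| filter would otherwise hide that information.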

We selected 20 important genes in the phenylpropanoid biosynthetic pathway and compared their expression between the rose cultivars ( Table S6 ). The transcript levels of many genes ( 4CL1 , CCR1 , HCT1 , HCT2 , HCT3 , HCT4 , CHS1 , CHS2 , CHI , DFR , F3H , and ANR ) were higher in JDG than in DMS, which may contribute to salt tolerance by promoting flavonoid production in JDG. Our multi-omics analysis revealed that ferulic acid, sinapic acid, and coniferaldehyde accumulated to high levels in JDG under salt stress ( Fig. 5C , Table S1 ). We also compared the flavonoid compounds in the two cultivars. Quercetin-3,3′-dimethyl ether, 5,7-dihydroxy-6,3′,4′,5′-tetramethoxyflavone (arteanoflavone), naringenin-4′,7-dimethyl ether, naringin dihydrochalcone, genkwanin (apigenin 7-methyl ether), and mearnsetin accumulated to higher levels in JDG than in DMS under control conditions. Similarly, the flavonoids brickellin, 3-O-methylquercetin, 5,2′,5′-trihydroxy-3,7,4′-trimethoxyflavone-2′-O-glucoside, and kaempferol-3-O-(6′′-acetyl)glucosyl-(1→3)-galactoside were more abundant in JDG than in DMS under salt stress. By contrast, naringenin-4′,7-dimethyl ether, aromadendrin (dihydrokaempferol), pinocembrin-7-O-(6′′-O-malonyl)glucoside, and quercetin-3-O-(2′′-O-glucosyl)glucuronide accumulated specifically in DMS. Moreover, 3′,4′,5′,5,7-pentamethoxyflavone, 3,5,7,3′,4′-pentamethoxyflavone, and 5,7,8,4′-tetramethoxyflavone were abundant in JDG under salt stress but decreased in DMS ( Table S7 ). Overall, integration of the three omics datasets indicates that the phenylpropanoid pathway, especially the flavonoid branch, is strongly enhanced under salinity and that this enhancement contributes to salt tolerance in rose, particularly in JDG.

Networks of co-expressed genes associated with phenylpropanoid biosynthesis are involved in the salt stress response

To identify candidate genes associated with phenylpropanoid biosynthesis, we constructed co-expression network modules using weighted gene correlation network analysis (WGCNA). A clustering tree based on correlations between expression levels (fragments per kilobase of transcript per million mapped fragments, FPKM) partitioned the genes into 11 modules ( Fig. 6A, B ). To identify candidate genes with significant roles in the networks, we extracted annotation information for all genes from the Rosa chinensis 'Old Blush' reference genome annotation database. We selected 16 genes contributing to phenylpropanoid biosynthesis and four genes associated with flavonoid biosynthesis; Table S8 lists the annotated genes participating in flavonoid-related pathways in JDG. Among the 11 modules, the green module contained 10 of these genes ( CHS1 , CHS2 , CCR1 , HCT3 , HCT4 , CCoAOMT , F3H , DFR , ANR , and CHI ), the turquoise module contained three ( CCR2 , HCT1 , and CAD2 ), the blue module contained three ( PRDX1 , 4CL1 , and ANS ), and the red, yellow, brown, and black modules each contained one ( CAD1 , PRDX2 , HCT2 , and 4CL2 , respectively) ( Table S8 ). We validated the expression of selected module genes that were also DEGs by reverse-transcription quantitative PCR (RT-qPCR); the expression trends of eight DEGs from the phenylpropanoid and flavonoid biosynthesis pathways matched the RNA-seq results ( Fig. S8 ).
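The FPKM normalization and the first steps of WGCNA can be sketched in NumPy as follows. This is a simplified illustration, not the WGCNA R package. It assumes an unsigned network with a hypothetical soft-thresholding power β = 6. It also approximates total mapped fragments by the column sums of the count matrix. The sketch stops at the topological overlap (TOM) dissimilarity, which WGCNA then clusters hierarchically and cuts into modules such as the 11 reported here. Low-variance genes should be filtered first, because a constant expression profile has an undefined correlation.

```python
import numpy as np


def fpkm(counts: np.ndarray, lengths_bp: np.ndarray) -> np.ndarray:
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments).

    counts: genes x samples matrix of fragment counts.
    Total mapped fragments are approximated by the per-sample column sums.
    """
    total = counts.sum(axis=0)
    return counts * 1e9 / (lengths_bp[:, None] * total[None, :])


def tom_dissimilarity(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Unsigned WGCNA-style adjacency -> topological overlap -> 1 - TOM.

    expr: genes x samples matrix (e.g. log2(FPKM + 1)).
    """
    adj = np.abs(np.corrcoef(expr)) ** beta    # soft-thresholded |Pearson r|
    np.fill_diagonal(adj, 0.0)
    k = adj.sum(axis=1)                        # connectivity of each gene
    shared = adj @ adj                         # sum over u of a_iu * a_uj
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom


counts = np.array([[100, 200], [50, 50]], dtype=float)  # hypothetical counts
lengths = np.array([1000.0, 2000.0])                    # hypothetical lengths (bp)
print(fpkm(counts, lengths))

rng = np.random.default_rng(1)
expr = np.log2(rng.gamma(2.0, 50.0, size=(5, 6)) + 1.0)  # 5 hypothetical genes, 6 samples
diss = tom_dissimilarity(expr)
print(np.round(diss, 3))
```

Raising |r| to a power keeps strong correlations close to their original value while shrinking weak ones toward zero. The topological overlap then rewards gene pairs that share many strong neighbours.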

Co-expression network related to flavonoid biosynthesis. (A) Clustering tree based on the correlation between gene expression levels. (B) Module–sample relationships. Each row represents a gene module, with the same colors as in (A); each column represents a sample; each cell shows the corresponding correlation and P value. (C–E) Networks built from correlations among structural genes and TFs. Circles represent genes, and circle size represents the number of connections between a gene and the surrounding genes in the network. Lines represent regulatory relationships between genes, and line color indicates connection strength: red, strong; green, weak. (F) Heat map of the expression profiles of the 15 TF genes. The scale bar denotes fold change/(mean expression level across the three treatment groups). Colors indicate relative expression levels; rows represent the different treatments in JDG, and columns represent the TFs. (G) Representative images of transient expression of bHLH74 and LUC driven by the CHS1 promoter in Nicotiana benthamiana leaves. The color scale represents signal intensity from high (strong) to low (weak). (H) Relative LUC/REN ratio. Data are means ± SE of at least three independent biological experiments. Significance was determined using Student’s t-test (**P < 0.01).

To identify the regulatory genes involved in phenylpropanoid biosynthesis in JDG, we constructed three subnetworks from the different modules, using the 20 phenylpropanoid biosynthesis–related DEGs as nodes ( Table S9 ). In these regulatory networks, we identified 15 TF genes from seven TF families: AP2/ERF-ERF (5 unigenes), bHLH (3 unigenes), MYB (3 unigenes), Alfin-like (1 unigene), SBP (1 unigene), C2C2-GATA (1 unigene), and TCP (1 unigene). bHLH62 and bHLH74 were strongly associated with CHS1 , CHS2 , CHI , CCR1 , and F3H ; ERF81 was strongly associated with 4CL1 ; and ERF110 and a MYB-related TF were strongly associated with 4CL2 ( Fig. 6C–E ), suggesting that CHS and 4CL are major target genes in phenylpropanoid biosynthesis. We therefore speculated that flavonoid abundance is increased through enhanced expression of upstream flavonoid biosynthesis genes. Fig. 6F shows a heat map of the expression of the 15 TF genes after NaCl treatment. The green module contained many phenylpropanoid biosynthesis genes, among which CHS1 was closely related to the TFs bHLH74 and bHLH62. We therefore conducted dual-luciferase reporter assays to determine their regulatory relationships ( Fig. 6G, H ). bHLH74 and bHLH62 driven by the CaMV35S promoter served as effectors in a transient expression system, with the CHS1 promoter fused to LUC as the reporter. When Nicotiana benthamiana leaves were co-transformed with the effectors and the reporter, the LUC/REN ratio for the CHS1 promoter was 0.3 relative to a control value of 1, drastically lower than that of the controls ( Fig. 6G, H , Fig. S9A, B ). These results indicate that bHLH74, but not bHLH62, inhibits the expression of CHS1 .
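The relative LUC/REN values in Fig. 6H and the Student's t-test reported in its caption can be sketched as follows. The sketch assumes that each sample's firefly luciferase (LUC) signal is divided by its Renilla luciferase (REN) signal. Each ratio is then normalized to the mean ratio of a control group, assumed here to be an empty-vector control. All readings are hypothetical. The code computes the pooled-variance t statistic and compares it with the two-tailed critical value for P = 0.01 at df = 4 (4.604).

```python
import math
from statistics import mean, variance

CRIT_T_DF4_P001 = 4.604  # two-tailed critical t for df = 4, P = 0.01 (valid for df = 4 only)


def relative_luc_ren(luc: list[float], ren: list[float], ref_ratio: float) -> list[float]:
    """LUC/REN per sample, divided by a reference (control mean) ratio."""
    return [(lu / re) / ref_ratio for lu, re in zip(luc, ren)]


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled (equal) variance, and its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical readings from three biological replicates each
ctrl_luc, ctrl_ren = [500.0, 550.0, 450.0], [500.0, 500.0, 500.0]
eff_luc, eff_ren = [150.0, 140.0, 160.0], [500.0, 500.0, 500.0]
ref = mean(lu / re for lu, re in zip(ctrl_luc, ctrl_ren))
ctrl = relative_luc_ren(ctrl_luc, ctrl_ren, ref)
eff = relative_luc_ren(eff_luc, eff_ren, ref)
t, df = student_t(ctrl, eff)
print(f"mean relative LUC/REN: control {mean(ctrl):.2f}, effector {mean(eff):.2f}")
print(f"t = {t:.2f}, df = {df}, significant at P < 0.01: {abs(t) > CRIT_T_DF4_P001}")
```

Normalizing LUC to REN within each sample corrects for differences in transformation efficiency between leaves. Only after this step are ratios compared between effector and control groups.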

Salt stress damages the structure and osmotic potential of rose leaves

Roses belong to the Rosaceae family and are among the most important commercial flower crops. Extracts from various parts of the rose plant have also been shown to have excellent biological activity and are used in industries such as cosmetics, perfume, and medicine [1]. Meanwhile, an increasing number of wild rose varieties with significant health benefits are being domesticated and brought into mainstream cultivation [33]. Salt stress is one of the most widespread abiotic constraints on rose cultivation. It threatens plant survival and growth but can stimulate the biosynthesis of secondary metabolites [34]. Previous studies have shown that optimal coordination between leaf structure and photosynthetic processes is essential for plants to tolerate salt stress [35]. Under salt treatment, leaves become thicker and smaller, the palisade and spongy tissues become loose and disorganized, and the intercellular spaces of the mesophyll become narrower [36–39]. Under salt stress, the palisade tissue of DMS was loose, disordered, and severely damaged compared with that of JDG (Fig. 1C), indicating that DMS is more sensitive to salt stress than JDG. Under stress conditions, excessive ROS typically accumulate, which can lead to oxidative membrane damage (lipid peroxidation) [40]. Silencing of GmNAC06 in soybean (Glycine max) leads to ROS accumulation under salt stress, which in turn causes significant losses in soybean production [41]. In Arabidopsis, the sibp1 mutant accumulates more ROS than wild-type or AtSIBP1-overexpressing plants, resulting in a lower survival rate under salt treatment [42]. In this study, DAB staining showed that salinity caused greater ROS accumulation in DMS than in JDG (Fig. 1D, E), indicating that DMS suffers greater damage under salinity stress.
Excessive ROS accumulation in cells can cause oxidative membrane damage and trigger enzymatic or nonenzymatic free radical scavenging systems to counter this damage [10]. Here, antioxidant enzymes such as peroxidase (A0A2P6R8H8) and glutathione peroxidase (A0A2P6P708) were upregulated in roses under salt treatment (Table S2). This suggests that rose plants maintain lower ROS levels by upregulating antioxidant enzymes, thereby protecting the photosynthetic machinery and maintaining growth under salt stress. Among nonenzymatic antioxidants, phenols and flavonoids accumulate in various tissues and contribute to free radical scavenging, which enhances plant salt tolerance [43]. Consistent with this, we identified significant differences in the contents of phenolic acids, lipids, and flavonoid metabolites between JDG and DMS under control and salt stress conditions (Table S1). Moreover, our transcriptomic and proteomic analyses revealed activation of genes and proteins in the phenylpropanoid and flavonol pathways. This activation leads to the accumulation of various phenolic compounds, potentially enhancing the plants' capacity to scavenge ROS.

Flavonoids help alleviate salt stress in rose

Phenolic compounds, such as flavonoids, are among the most widespread secondary metabolites in the plant kingdom [44]. They fulfill various biochemical and molecular functions in plants, including roles in plant defense, signal transduction, antioxidant action, and free radical scavenging [45]. Environmental changes commonly activate the flavonoid pathway, which helps shield plants from the harmful effects of ultraviolet radiation, salt, heat, and drought [23, 46, 47]. Moreover, flavonoids show potent biological activity and serve as significant antioxidants [48]. Recently, researchers and consumers have become interested in plant-based polyphenols and flavonoids for their antioxidant potential, their dietary accessibility, and their role in preventing fatal diseases such as cardiovascular disease and cancer [49]. Our transcriptomic analysis showed that salinity significantly altered secondary metabolism in JDG, whereas it affected primary metabolism in DMS. Proteomic analysis showed that phenylpropanoid biosynthesis was significantly enhanced in JDG under salt stress, especially via the flavonoid pathway, whereas glutathione metabolism was significantly enhanced in DMS, indicating that the two cultivars rely on different salt tolerance pathways. Our metabolome data indicated that the abundance of phenolic acid and flavonoid metabolites was significantly altered in both JDG and DMS under salt stress. Furthermore, comparing leaf contents under salt stress and control conditions showed that more flavonoids accumulated in DMS than in JDG under salt stress. This suggests that DMS requires greater flavonoid accumulation to withstand salinity damage.
By contrast, salinity stress did not trigger substantial flavonoid accumulation in JDG, possibly because the flavonoid levels already present under normal conditions provided ample tolerance to salt-induced stress. This could also explain the higher salt tolerance of JDG (Table S1). When we compared phenylpropanoid pathway metabolites to identify those associated with salt tolerance, we found that 17 phenolic acid metabolites and 6 flavonoid metabolites were significantly differentially accumulated in both genotypes. Among these compounds, ferulic acid acts as a free radical scavenger, inhibits enzymes that generate free radicals, and boosts the activity of scavenger enzymes [49]. Sinapic acid is a bioactive phenolic acid with anti-inflammatory and anti-anxiety effects [50]. Pinocembrin, a naturally occurring flavonoid found in fruits, vegetables, nuts, seeds, flowers, and tea, is an anti-inflammatory, antimicrobial, and antioxidant agent [51]. Thus, both rose cultivars contain beneficial metabolites of potential economic value. To investigate how these metabolites might confer salt tolerance in rose, we compared specific DAMs between JDG and DMS. Under salt treatment, eight of these DAMs were upregulated and six were downregulated in JDG compared with DMS. Among the eight upregulated DAMs, 3-O-methylquercetin, brickellin, 5,2′,5′-trihydroxy-3,7,4′-trimethoxyflavone-2′-O-glucoside, and kaempferol-3-O-(6′′-acetyl)glucosyl-(1→3)-galactoside accumulated significantly with salinity (Table S7). These metabolites have important functions. For example, 3-O-methylquercetin has potent anticancer, antioxidant, antiallergy, and antimicrobial activities and shows strong antiviral activity against tomato ringspot virus [52].
Kaempferol, a biologically active compound found in numerous fruits, vegetables, and herbs, has various pharmacological benefits, including antimicrobial, antioxidant, and anticancer properties [53]. These results suggest that JDG is an excellent rose cultivar that is both salt tolerant and rich in beneficial bioactive substances.

bHLH74 regulates flavonoid biosynthesis

Flavonoid biosynthesis begins with the amino acid phenylalanine, which gives rise to phenylpropanoids that subsequently enter the flavonoid-anthocyanin pathway [25]. CHS occupies a crucial regulatory position at the entry point of the flavonoid biosynthetic pathway, directing flux from the phenylpropanoid pathway toward flavonoid production; this role has been extensively documented in many plant species [54, 55]. In rice (Oryza sativa), defects in the flavonoid biosynthesis gene CHS can alter the distribution of flavonoids and lignin [56]. In eggplant (Solanum melongena L.), CHS regulates anthocyanin content in the fruit peel under heat stress [57]. In apple (Malus domestica), overexpression of CHS increases flavonoid accumulation and enhances nitrogen absorption [58]. Consistent with these previous reports, we identified a positive correlation between flavonoid accumulation and CHS expression. bHLH TFs regulate flavonoid biosynthesis in either a MYB-dependent or a MYB-independent manner. For example, DvIVS, a bHLH TF in dahlia (Dahlia variabilis), activates flavonoid biosynthesis by regulating the expression of Chalcone synthase 1 (CHS1) [59]. The Arabidopsis bHLH proteins TRANSPARENT TESTA 8 (AtTT8) and ENHANCER OF GLABRA 3 (AtEGL3) are both involved in the biosynthesis of various flavonoids [60–62]. In chrysanthemum (Chrysanthemum morifolium), CmbHLH2 significantly activates CmDFR transcription, leading to anthocyanin accumulation, especially in coordination with CmMYB6 [63]. In blueberry (Vaccinium sect. Cyanococcus), the bHLH25 and bHLH74 TFs may interact with MYB proteins or directly repress the expression of flavonoid biosynthesis genes, thereby regulating flavonoid accumulation [64].
In apple (Malus domestica), expression of bHLH62, bHLH74, and bHLH162 is significantly negatively correlated with anthocyanin content, and these genes have been shown to inhibit anthocyanin biosynthesis [65]. In apple fruit skin, hypermethylation of bHLH74 in the mCG context leads to transcriptional inhibition of downstream anthocyanin biosynthesis genes [66]. In rose, our co-expression network revealed a strong correlation between CHS and TF genes such as bHLH74 and bHLH62 in the key gene network. bHLH proteins can bind to the promoter regions of key enzyme-encoding genes and may thereby play important roles in regulating DAMs under salt stress. Dual-luciferase reporter assays showed that LUC bioluminescence was strongly suppressed relative to the control in Nicotiana benthamiana leaves infiltrated with pCHS1:LUC plus 35S:bHLH74, but not with pCHS1:LUC plus 35S:bHLH62 (Fig. 6G, H; Fig. S9A, B). Thus, we conclude that the TF bHLH74 negatively regulates flavonoid biosynthesis by directly inhibiting the expression of CHS1, a gene in the flavonoid biosynthetic pathway.

We examined the morphological phenotypes, transcriptomes, proteomes, and widely targeted metabolomes of JDG and DMS under salt stress. Multi-omics analysis revealed that the phenylpropanoid pathway, especially the flavonoid pathway, contributes strongly to salt tolerance in rose, particularly in JDG. Meanwhile, the TF bHLH74 negatively regulates flavonoid biosynthesis by repressing the expression of CHS1, a gene in the flavonoid biosynthetic pathway. This research advances our understanding of the regulatory mechanisms of plant development and secondary metabolism underlying salt stress responses in rose, offering insights that could inform new strategies for improving plant tolerance to salinity.

Plant materials and growth conditions

Rosa hybrida cv. Jardin de Granville (JDG) and Rosa damascena Mill. (DMS) were planted in the Science and Technology Park of China Agricultural University (40°03′N, 116°29′E). Rose plants were propagated from cuttings. Shoots with at least two nodes and approximately 6 cm in length were used as cuttings and inserted into square flowerpots (diameter 8 cm) containing a 1:1 (v/v) mixture of vermiculite and peat soil. Before insertion, cuttings were soaked in 0.15% (v/v) indole-3-butyric acid (IBA); they were then grown in a growth chamber at 25°C and 50% relative humidity under a 16-h light/8-h dark photoperiod for 1 month until rooting [67].

Nicotiana benthamiana plants were used for transient expression assays. Seeds were sown in square flowerpots (diameter 8 cm), and after 1 week, seedlings were transplanted into individual pots. The soil and growth conditions for N. benthamiana were the same as those for roses.

Salt treatment

Twenty JDG and 20 DMS rose cuttings with good rooting and uniform appearance were selected for salt treatment experiments. Plants of each cultivar were randomly divided into two groups and watered with either 0 or 400 mM NaCl. Phenotypes were recorded after 2 weeks. This experiment was repeated three times [68].

Salt treatment of rose leaves was performed as described previously [68]. Thirty JDG and 30 DMS rose cuttings with good rooting and uniform appearance were selected, and mature leaves of similar size were collected. The leaves were divided into two treatment groups of 30 leaves each: group A was immersed in deionized water, and group B was immersed in 400 mM NaCl. Phenotypes were observed after 0, 2, and 4 days. By day 2, the leaves showed obvious differences; by day 4, they had become soft or had died. Therefore, samples collected on day 2 were used for sequencing. Three independent biological replicates were assayed.

Relative electrolyte permeability

Relative electrolyte permeability was determined as previously reported [69], with the following modifications. Salt-treated leaves (0.1 g) were weighed, placed in a 50-ml centrifuge tube, and covered with 20 ml deionized water. The conductivity of the water was measured and defined as EC0. After shaking for 20 minutes at 60 rpm on an orbital shaker, the conductivity at room temperature was measured and defined as EC1. The centrifuge tube was then placed in boiling water for 10 minutes and cooled to room temperature, and the conductivity of the solution was measured and defined as EC2. Relative electrolyte permeability (%) was calculated as (EC1 − EC0)/(EC2 − EC0) × 100.
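The permeability formula above can be expressed as a small helper. This is a minimal sketch. The function name and the conductivity readings are hypothetical illustrations and are not data from the study.

```python
def relative_electrolyte_permeability(ec0: float, ec1: float, ec2: float) -> float:
    """Relative electrolyte permeability (%) = (EC1 - EC0) / (EC2 - EC0) x 100.

    ec0: conductivity of the deionized water alone (background)
    ec1: conductivity after 20 min of shaking at room temperature
    ec2: conductivity after boiling for 10 min and cooling (total electrolytes)
    """
    total_leakage = ec2 - ec0
    if total_leakage <= 0:
        raise ValueError("EC2 must be greater than EC0")
    # Multiply before dividing so that simple inputs stay exact in floating point
    return 100.0 * (ec1 - ec0) / total_leakage


# Illustrative readings (hypothetical, e.g. in uS/cm); the units cancel in the ratio
print(relative_electrolyte_permeability(ec0=2.0, ec1=42.0, ec2=202.0))  # 20.0
```

Subtracting EC0 from both the numerator and the denominator removes the contribution of the water itself. The result therefore reflects only electrolytes released from the leaf tissue.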

Soluble protein content

Soluble protein content was determined following the method of Bradford (1976) [70]. Leaf samples (0.5 g) were placed in a mortar with 8 ml distilled water and a small amount of quartz sand, ground thoroughly, and incubated at room temperature for 0.5 hours. After centrifugation at 3,000 × g for 20 minutes at 4°C, the supernatant was transferred to a 10-ml volumetric flask and brought to 10 ml with distilled water. Two 1.0-ml aliquots of this extract (or distilled water as a blank) were transferred to clean test tubes, 5 ml of Coomassie Brilliant Blue reagent was added, and the tubes were mixed well. After 2 minutes, when the reaction was complete, the absorbance at 595 nm was measured, and protein content was determined from a standard curve.
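Reading protein content from a standard curve amounts to a linear fit followed by interpolation and scaling to the extract volume and sample mass. The sketch below makes several assumptions that the text does not state: BSA standards quantified in µg per assay tube, blank-corrected A595 values, and the 10-ml extract, 1.0-ml aliquot and 0.5-g sample described above as defaults. The function names and the standard values are hypothetical.

```python
def fit_standard_curve(protein_ug, absorbance):
    """Ordinary least-squares line: A595 = slope * protein_ug + intercept."""
    n = len(protein_ug)
    mean_x = sum(protein_ug) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in protein_ug)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(protein_ug, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def soluble_protein_mg_per_g(a595, slope, intercept,
                             extract_ml=10.0, aliquot_ml=1.0, sample_g=0.5):
    """Convert a blank-corrected A595 reading into mg protein per g of leaf sample."""
    ug_in_aliquot = (a595 - intercept) / slope               # read off the standard curve
    ug_in_extract = ug_in_aliquot * extract_ml / aliquot_ml  # scale up to the whole extract
    return ug_in_extract / 1000.0 / sample_g                 # ug -> mg, per g of tissue


# Hypothetical BSA standards (ug per assay tube) and their A595 readings
standards_ug = [0, 20, 40, 60, 80, 100]
standards_a595 = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]
slope, intercept = fit_standard_curve(standards_ug, standards_a595)
print(round(soluble_protein_mg_per_g(0.30, slope, intercept), 3))  # 1.0
```

Keeping the curve fit separate from the unit conversion makes it easy to check the fit (for example, its linearity) before applying it to samples.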

Leaf anatomical structure

Paraffin sections were prepared as described previously, with some modifications [71]. Leaves from the control and NaCl treatments were collected, gently rinsed with deionized water at room temperature, and stored at 4°C until use. A 3-mm × 5-mm sample was cut from the same part of each leaf and fixed in 2.5% (v/v) glutaraldehyde. Samples were dehydrated through an acetone gradient of 30%, 50%, 70%, 80%, 95%, and 100% (v/v) and then embedded in paraffin. The embedded tissues were cut into 3-μm sections using a Leica RM2265 rotary microtome (Leica Microsystems, Wetzlar, Germany). Slides were stained with 0.02% (v/v) toluidine blue for 5 minutes, and residual stain was removed with distilled water. Slides were air-dried and observed under a microscope (OLYMPUS BH-2, Tokyo, Japan). Three independent biological replicates were examined.

DAB (3,3′-diaminobenzidine) staining for H₂O₂

H₂O₂ was detected using the DAB staining method [72]. NaCl-treated and control leaves were rinsed with distilled water, immersed in DAB solution (1 mg/ml, pH 3.8), and placed under vacuum at approximately 0.8 MPa for 5 minutes; this process was repeated three to six times until the leaves were completely infiltrated. Leaves were then incubated in the dark for 8 hours until a brown precipitate was observed. Chlorophyll was removed by repeated washing with eluent (ethanol:lactic acid:glycerol, 3:1:1, v/v/v). Decolorized leaves were photographed, and the stained areas were quantified using ImageJ.

UPLC-QQQ-based widely targeted metabolome analysis

Metabolomics analysis was performed on four groups of samples: JDG-Mock, JDG-NaCl, DMS-Mock, and DMS-NaCl. Metabolite extraction and determination were performed with the assistance of Wuhan Metware Biotechnology Co., Ltd. Samples were ground using a mixer mill with zirconia beads (MM 400, Retsch). Freeze-dried samples (0.1 g) were incubated overnight with 1.2 ml 70% (v/v) methanol at 4°C and then centrifuged at 13,400 × g for 10 minutes. The extracts were filtered and subjected to LC-MS/MS analysis [73]. Analytical conditions and metabolite quantification followed a previously described procedure [74], using LC-ESI-QTRAP-MS/MS in multiple reaction monitoring (MRM) mode. PCA was performed with the R function prcomp, significantly different metabolites were defined as those with |log2 fold change| ≥ 1, and annotated metabolites were mapped to the KEGG pathway database (http://www.kegg.jp/kegg/pathway.html). Comparisons are denoted as, e.g., JDG-NaCl vs JDG-Mock, meaning that the treated sample is compared with the untreated sample. Metabolites described as upregulated or downregulated are those that increase or decrease in the NaCl sample relative to the Mock sample.
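The fold-change filter and the PCA step can be sketched with NumPy. This is an illustration under stated assumptions, not the pipeline used for the study. The text gives only the |log2 fold change| ≥ 1 cutoff and the use of prcomp. It does not say whether intensities were log-transformed or scaled before PCA, so the sketch mirrors prcomp's defaults (center = TRUE, scale. = FALSE). The function names and intensity values are hypothetical.

```python
import numpy as np


def log2_fold_change(nacl: np.ndarray, mock: np.ndarray) -> np.ndarray:
    """log2(mean NaCl / mean Mock) per metabolite; rows = metabolites, columns = replicates."""
    return np.log2(nacl.mean(axis=1) / mock.mean(axis=1))


def split_by_threshold(lfc: np.ndarray, threshold: float = 1.0):
    """Indices of metabolites with log2FC >= threshold (up) or <= -threshold (down)."""
    up = np.flatnonzero(lfc >= threshold)
    down = np.flatnonzero(lfc <= -threshold)
    return up, down


def pca_scores(samples_by_features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PC scores of a centered, unscaled matrix (analogous to prcomp(x)$x)."""
    centered = samples_by_features - samples_by_features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical intensities: 3 metabolites x 3 replicates per group
mock = np.array([[100.0, 100.0, 100.0], [100.0, 100.0, 100.0], [100.0, 100.0, 100.0]])
nacl = np.array([[200.0, 220.0, 180.0], [50.0, 55.0, 45.0], [100.0, 110.0, 90.0]])
lfc = log2_fold_change(nacl, mock)
up, down = split_by_threshold(lfc)
print(up, down)  # [0] [1]
scores = pca_scores(np.hstack([mock, nacl]).T)  # 6 samples x 3 metabolites
print(scores.shape)  # (6, 2)
```

The signs of the components returned by SVD are arbitrary, so score plots can appear mirrored relative to prcomp output without any change in meaning.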

Tandem mass tag-based proteomic analysis

Experiments were carried out with the assistance of Hangzhou Jingjie Biotechnology Co., Ltd. Samples were ground thoroughly into powder in liquid nitrogen, and proteins were extracted using the phenol extraction method. Proteins were digested with trypsin overnight, and the resulting peptides were labeled with TMT tags. LC-MS/MS analysis was performed using an EASY-nLC 1200 UPLC system (ThermoFisher Scientific) coupled to a Q Exactive™ HF-X mass spectrometer (ThermoFisher Scientific) [75]. A fold-change threshold of 1.3 was used to define significant changes. GO (http://www.ebi.ac.uk/GOA/) and KEGG categories were used to annotate DAPs, and WoLF PSORT software (https://wolfpsort.hgc.jp/) was used to predict subcellular localization.
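The 1.3 threshold can be applied to TMT ratios as sketched below. This sketch assumes, beyond what the text states, that the ratio is the mean NaCl reporter intensity divided by the mean Mock intensity and that the cutoff is symmetric: above 1.3 counts as up and below 1/1.3 counts as down. Whether the boundaries are inclusive is not stated. Many TMT workflows also apply a p-value cutoff, which is not mentioned here and is therefore omitted. The function names and intensities are hypothetical.

```python
from statistics import mean


def protein_ratio(nacl_intensities, mock_intensities):
    """Ratio of mean TMT reporter intensities, NaCl over Mock (hypothetical normalization)."""
    return mean(nacl_intensities) / mean(mock_intensities)


def classify_change(ratio: float, fc_threshold: float = 1.3) -> str:
    """Label a protein as up, down, or unchanged using a symmetric fold-change cutoff."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio > fc_threshold:
        return "up"
    if ratio < 1.0 / fc_threshold:  # 1/1.3 is about 0.769
        return "down"
    return "unchanged"


print(classify_change(protein_ratio([15.0, 16.0, 14.0], [10.0, 10.0, 10.0])))  # up
print(classify_change(0.70))  # down
print(classify_change(1.10))  # unchanged
```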

Transcriptome sequencing

We constructed 12 cDNA libraries (three biological replicates for each of JDG and DMS under each treatment) for RNA-seq. Transcriptome sequencing was performed by Wuhan Metware Biotechnology Co., Ltd. RNA purity and integrity were assessed using a NanoPhotometer spectrophotometer and an Agilent 2100 Bioanalyzer, respectively. The libraries were sequenced on the Illumina HiSeq platform. Raw data were filtered using fastp v0.19.3, and clean reads were mapped to the reference genome (https://lipm-browsers.toulouse.inra.fr/pub/RchiOBHm-V2/). FPKM (fragments per kilobase of transcript per million fragments mapped) was used to measure gene expression levels. Significant differential expression was defined as |log2 fold change| ≥ 1 and false discovery rate (FDR) < 0.05. GO and KEGG categories were used to annotate DEGs [76].
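The FPKM definition and the DEG thresholds above can be made concrete as follows. The text does not name the differential-expression tool or the method used to compute FDR. The sketch therefore assumes Benjamini–Hochberg adjustment, a common choice, and uses hypothetical counts and p-values throughout.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(log2fc: float, fdr: float, lfc_cutoff: float = 1.0, fdr_cutoff: float = 0.05) -> bool:
    """Thresholds stated in the text: |log2FC| >= 1 and FDR < 0.05."""
    return abs(log2fc) >= lfc_cutoff and fdr < fdr_cutoff


# Hypothetical gene: 500 fragments on a 1,000-bp transcript, 1,000,000 mapped fragments
print(fpkm(500, 1000, 1_000_000))  # 500.0
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([is_deg(lfc, q) for lfc, q in zip([2.0, -1.5, 0.4, 3.0], fdr)])  # [True, False, False, False]
```

The second gene passes the fold-change cutoff (|−1.5| ≥ 1) but fails after FDR adjustment. This illustrates why both criteria are reported.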

To identify modules of highly correlated genes, co-expression network analysis was performed using the R package WGCNA (v1.69) with default parameters [77]. Genes with low or stable expression across all samples were removed using the varFilter function of the R package genefilter. Modules were identified based on correlations between gene expression levels, and correlations between each module and each sample were calculated using WGCNA. The module networks were visualized using Cytoscape (v3.7.2).
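In WGCNA, each module is summarized by a module eigengene: the first principal component of the module's standardized expression. Module–sample relationships are then correlations between that eigengene and sample traits. The NumPy sketch below illustrates this concept only. It does not reproduce the WGCNA package or the network-construction steps (soft thresholding, topological overlap, tree cutting), and encoding a sample group as a 0/1 indicator is an assumption. The gene values are hypothetical.

```python
import numpy as np


def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component of standardized expression (samples x genes in one module)."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # SVD signs are arbitrary; orient the eigengene along the module's average scaled expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def module_sample_correlation(eigengene: np.ndarray, indicator: np.ndarray) -> float:
    """Pearson correlation between an eigengene and a 0/1 sample-group indicator."""
    return float(np.corrcoef(eigengene, indicator)[0, 1])


# Hypothetical module of 3 genes measured in 6 samples (3 Mock, then 3 NaCl)
base = np.array([1.0, 1.2, 0.9, 5.0, 5.5, 4.8])
expr = np.column_stack([base, 2 * base + 1, base + 0.5 * np.array([0, 1, 0, 1, 0, 1])])
me = module_eigengene(expr)
r = module_sample_correlation(me, np.array([0, 0, 0, 1, 1, 1]))
print(round(r, 2))
```

Orienting the eigengene along average expression, as WGCNA does by default, means that a positive module–sample correlation corresponds to higher expression of the module's genes in that sample group.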

To verify the accuracy of the high-throughput sequencing data, RT-qPCR was performed on eight DEGs in the phenylpropanoid pathway. Total RNA was extracted using the hot borate method [72] and reverse transcribed using HiScript III All-in-one RT SuperMix (R333-01, Vazyme Biotech Co., Ltd., Nanjing, China). Gene expression was then quantified using 2× ChamQ SYBR qPCR Master Mix (Q331, Vazyme Biotech Co., Ltd., Nanjing, China). Relative expression was calculated using the 2^−ΔΔCt method [76], with GAPDH as the endogenous control. Primers for RT-qPCR are listed in Table S10.
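The 2^−ΔΔCt calculation normalizes each target Ct to the reference gene (GAPDH here) and then to a calibrator sample. The sketch below assumes, beyond what the text states, that Mock samples serve as the calibrator and that both amplicons amplify with roughly 100% efficiency, as the Livak method requires. The Ct values are hypothetical.

```python
from statistics import mean


def relative_expression_ddct(target_treated, ref_treated, target_control, ref_control):
    """Livak 2^-ddCt; each argument is a list of Ct values (technical replicates)."""
    d_ct_treated = mean(target_treated) - mean(ref_treated)  # normalize to the reference gene
    d_ct_control = mean(target_control) - mean(ref_control)
    dd_ct = d_ct_treated - d_ct_control                      # normalize to the calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene vs GAPDH, NaCl-treated vs Mock
print(relative_expression_ddct([22.0, 22.0], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0]))  # 4.0
```

In this example the target reaches threshold two cycles earlier after treatment, relative to GAPDH, which corresponds to a fourfold increase.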

Dual-LUC reporter assay

A transactivation assay was designed to evaluate the effect of bHLH74/bHLH62 on the CHS1 promoter, using methods described previously [78]. First, a 2000-bp fragment of the CHS1 promoter was cloned into the pGreenII 0800-LUC vector to generate the ProCHS1:LUC reporter plasmid. In parallel, the coding sequences of bHLH74/bHLH62 were inserted into the pGreenII 0029 62-SK vector to construct the Pro35S:bHLH74/bHLH62 effector plasmids. The pGreenII 0800-LUC vector, which carries REN under the control of the 35S promoter, was used as a positive control.

After construction, the plasmids were introduced into Agrobacterium tumefaciens strain GV3101 harboring the pSoup plasmid. A. tumefaciens cultures carrying different combinations of effector and reporter plasmids were then infiltrated into N. benthamiana plants with six to eight young leaves. After 3 days of incubation, LUC/REN ratios were quantified using the Bio-Lite Luciferase Assay System (DD1201, Vazyme Biotech Co., Ltd., Nanjing, China). Images of LUC signals were acquired using a CCD camera (Night Shade LB 985, Germany). Primer sequences are listed in Table S10.
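In a dual-luciferase assay, firefly LUC activity from the promoter reporter is divided by Renilla REN activity to correct for differences in transformation efficiency. The ratio is then expressed relative to a control combination. In the sketch below, the control is assumed to be the reporter co-infiltrated with an empty effector. The readings are hypothetical and are not the values reported in this study.

```python
from statistics import mean


def luc_ren_ratios(luc, ren):
    """Firefly LUC normalized to Renilla REN for each infiltrated leaf (replicate)."""
    return [l / r for l, r in zip(luc, ren)]


def relative_to_control(effector_ratios, control_ratios):
    """Express each effector LUC/REN ratio relative to the mean control ratio (control = 1)."""
    control_mean = mean(control_ratios)
    return [x / control_mean for x in effector_ratios]


# Hypothetical luminescence readings from three leaves per combination
control = luc_ren_ratios([1000.0, 1100.0, 900.0], [100.0, 100.0, 100.0])
effector = luc_ren_ratios([400.0, 440.0, 360.0], [100.0, 100.0, 100.0])
print([round(x, 2) for x in relative_to_control(effector, control)])  # [0.4, 0.44, 0.36]
```

A mean relative ratio well below 1 indicates that the effector represses the promoter. Per-leaf values can then be compared with the control group using the t-test described under statistical analysis.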

Statistical analysis

Statistical analyses were conducted using IBM SPSS Statistics, and graphs were created using GraphPad Prism 8.0.1. Comparisons between pairs of groups were assessed using Student's t-tests (*P < 0.05, **P < 0.01, ***P < 0.001). Each experiment was performed with at least three biological replicates, and error bars denote the standard error (SE) of the mean. The Metware Cloud platform (https://cloud.metware.cn) and OmicShare tools (https://www.chiplot.online/) were used for bioinformatics analyses and plotting.
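The error bars and significance tests described above rest on two small formulas: SE = SD/√n and the pooled-variance Student's t statistic. The standard-library sketch below computes both for hypothetical triplicates. It assumes independent groups; for matched samples, a paired test (for example, scipy.stats.ttest_rel) would apply instead. The two-sided p-value comes from the t distribution with the returned degrees of freedom. scipy.stats.ttest_ind(a, b), with its default equal_var=True, returns the same statistic together with that p-value.

```python
import math
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate measurements for two groups
mock, nacl = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(round(standard_error(mock), 3))  # 0.577
t, df = student_t(mock, nacl)
print(round(t, 3), df)  # -3.674 4
```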

This work was supported by the Consult of Flower Industry of Jinning District (202204BI090022) and the General Project of the Shenzhen Science and Technology and Innovation Commission (Grant No. 6020330006K0).

ZX and MN conceived and designed the experiments. RH and YW conducted the experiments. RH, YW, and ZX analyzed the data. LY, JW, QX, CP, XT, GJ, and MN performed the research. RH, SM, and ZX wrote the manuscript. All authors read and approved the manuscript. RH and YW contributed equally to this work.

The datasets generated and analyzed during the current study are available in the National Center for Biotechnology Information (NCBI) BioProject repository under accession number PRJNA1030783.

The authors declare that they have no competing interests.

Mileva M, Ilieva Y, Jovtchev G et al. Rose flowers—a delicate perfume or a natural healer? Biomolecules. 2021;11:127

Katsoulas N, Kittas C, Dimokas G et al. Effect of irrigation frequency on rose flower production and quality. Biosyst Eng. 2006;93:237–44

Isah T. Stress and defense responses in plant secondary metabolites production. Biol Res. 2019;52:39

Feng D, Zhang H, Qiu X et al. Comparative transcriptomic and metabonomic analysis revealed the relationships between biosynthesis of volatiles and flavonoid metabolites in Rosa rugosa. Ornam Plant Res. 2021;1:1–10

Wang X, Zhao F, Wu Q et al. Physiological and transcriptome analyses to infer regulatory networks in flowering transition of Rosa rugosa. Ornam Plant Res. 2023;3:1–12

Jia Y, Chen C, Gong F et al. An aux/IAA family member, RhIAA14, involved in ethylene-inhibited petal expansion in rose (Rosa hybrida). Genes. 2022;13:1041

Ren H, Bai M, Sun J et al. RcMYB84 and RcMYB123 mediate jasmonate-induced defense responses against Botrytis cinerea in rose (Rosa chinensis). Plant J. 2020;103:1839–49

Chaves MM, Flexas J, Pinheiro C. Photosynthesis under drought and salt stress: regulation mechanisms from whole plant to cell. Ann Bot. 2009;103:551–60

Askari Kelestani A, Ramezanpour S, Borzouei A et al. Application of gamma rays on salinity tolerance of wheat (Triticum aestivum L.) and expression of genes related to biosynthesis of proline, glycine betaine and antioxidant enzymes. Physiol Mol Biol Plants. 2021;27:2533–47

Qi S, Wang X, Wu Q et al. Morphological, physiological and transcriptomic analyses reveal potential candidate genes responsible for salt stress in Rosa rugosa. Ornam Plant Res. 2023;3:21

Gill SS, Tuteja N. Reactive oxygen species and antioxidant machinery in abiotic stress tolerance in crop plants. Plant Physiol Biochem. 2010;48:909–30

Ye C, Zheng S, Jiang D et al. Initiation and execution of programmed cell death and regulation of reactive oxygen species in plants. Int J Mol Sci. 2021;22:12942

He L, He T, Farrar S et al. Antioxidants maintain cellular redox homeostasis by elimination of reactive oxygen species. Cell Physiol Biochem. 2017;44:532–53

Challabathula D, Analin B, Mohanan A et al. Differential modulation of photosynthesis, ROS and antioxidant enzyme activities in stress-sensitive and -tolerant rice cultivars during salinity and drought upon restriction of COX and AOX pathways of mitochondrial oxidative electron transport. J Plant Physiol. 2022;268:153583

Li C, Mur LAJ, Wang Q et al. ROS scavenging and ion homeostasis is required for the adaptation of halophyte Karelinia caspia to high salinity. Front Plant Sci. 2022;13:

Ren G, Yang P, Cui J et al. Multiomics analyses of two sorghum cultivars reveal the molecular mechanism of salt tolerance. Front Plant Sci. 2022;13:

Petrussa E, Braidot E, Zancani M et al. Plant Flavonoids--Biosynthesis, Transport and Involvement in Stress Responses. Int J Mol Sci. 2013;14:14950–73

Das S, Rosazza JPN. Microbial and enzymatic transformations of flavonoids. J Nat Prod. 2006;69:499–508

Gao Y, Liu J, Chen Y et al. Tomato SlAN11 regulates flavonoid biosynthesis and seed dormancy by interaction with bHLH proteins but not with MYB proteins. Hortic Res. 2018;5:

Zhang Z, Liu Y, Yuan Q et al. The bHLH1-DTX35/DFR module regulates pollen fertility by promoting flavonoid biosynthesis in Capsicum annuum L. Hortic Res. 2022;9:

Ramaroson M, Koutouan C, Helesbeux JJ et al. Role of Phenylpropanoids and flavonoids in plant resistance to pests and diseases. Molecules. 2022;27:8371

Schulz E, Tohge T, Winkler JB et al. Natural variation among Arabidopsis accessions in the regulation of flavonoid metabolism and stress gene expression by combined UV radiation and cold. Plant Cell Physiol. 2021;62:502–14

Wang F, Zhu H, Kong W et al. The antirrhinum AmDEL gene enhances flavonoids accumulation and salt and drought tolerance in transgenic Arabidopsis. Planta. 2016;244:59–73

Shen N, Wang T, Gan Q et al. Plant flavonoids: classification, distribution, biosynthesis, and antioxidant activity. Food Chem. 2022;383:132531

Liu W, Feng Y, Yu S et al. The flavonoid biosynthesis network in plants. Int J Mol Sci. 2021;22:12824

Zhang X, Abrahan C, Colquhoun TA et al. A proteolytic regulator controlling chalcone synthase stability and flavonoid biosynthesis in Arabidopsis. Plant Cell. 2017;29:1157–74

Riffault-Valois L, Blanchot L, Colas C et al. Molecular fingerprint comparison of closely related rose varieties based on UHPLC-HRMS analysis and chemometrics. Phytochem Anal. 2017;28:42–9

Riffault L, Destandau E, Pasquier L et al. Phytochemical analysis of Rosa hybrida cv. ‘Jardin de Granville’ by HPTLC, HPLC-DAD and HPLC-ESI-HRMS: polyphenolic fingerprints of six plant organs. Phytochemistry. 2014;99:127–34

Omidi M, Khandan-Mirkohi A, Kafi M et al. Biochemical and molecular responses of Rosa damascena mill. cv. Kashan to salicylic acid under salinity stress. BMC Plant Biol. 2022;22:373

Azizi S, Seyed Hajizadeh H, Aghaee A et al. In vitro assessment of physiological traits and ROS detoxification pathways involved in tolerance of damask rose genotypes under salt stress. Sci Rep. 2023;13:17795

Zhao S, Zhang Q, Liu M et al. Regulation of plant responses to salt stress. Int J Mol Sci. 2021;22:4609

Zhang C, Zhang H, Zhan Z et al. Transcriptome analysis of sucrose metabolism during bulb swelling and development in onion (Allium cepa L.). Front Plant Sci. 2016;7:1425

Kumari P, Raju DVS, Prasad KV et al. Characterization of anthocyanins and their antioxidant activities in Indian rose varieties (Rosa × hybrida) using HPLC. Antioxidants. 2022;11:2032

Akula R, Ravishankar GA. Influence of abiotic stress signals on secondary metabolites in plants. Plant Signal Behav. 2011;6:1720–31

Barhoumi Z, Djebali W, Chaïbi W et al. Salt impact on photosynthesis and leaf ultrastructure of Aeluropus littoralis. J Plant Res. 2007;120:529–37

Jiang D, Lu B, Liu L et al. Exogenous melatonin improves the salt tolerance of cotton by removing active oxygen and protecting photosynthetic organs. BMC Plant Biol. 2021;21:331

Liu D, Dong S, Miao H et al. A large-scale genomic association analysis identifies the candidate genes regulating salt tolerance in cucumber (Cucumis sativus L.) seedlings. Int J Mol Sci. 2022;23:8260

Garrido Y, Tudela JA, Marín A et al. Physiological, phytochemical and structural changes of multi-leaf lettuce caused by salt stress. J Sci Food Agric. 2014;94:1592–9

Yao X, Meng L, Zhao W et al. Changes in the morphology traits, anatomical structure of the leaves and transcriptome in Lycium barbarum L. under salt stress. Front Plant Sci. 2023;14:1090366

Tan Y, Duan Y, Chi Q et al. The role of reactive oxygen species in plant response to radiation. Int J Mol Sci. 2023;24:3346

Li M, Chen R, Jiang Q et al. GmNAC06, a NAC domain transcription factor enhances salt stress tolerance in soybean. Plant Mol Biol. 2021;105:333–45

Wan X, Peng L, Xiong J et al. AtSIBP1, a novel BTB domain-containing protein, positively regulates salt signaling in Arabidopsis thaliana. Plants. 2019;8:573

Rezayian M, Niknam V, Ebrahimzadeh H. Oxidative damage and antioxidative system in algae. Toxicol Rep. 2019;6:1309–13

Liu X, Cheng X, Cao J, et al. GOLDEN 2-LIKE transcription factors regulate chlorophyll biosynthesis and flavonoid accumulation in response to UV-B in tea plants. Hortic Plant J. 2023;9:1055–66

Barreca D, Gattuso G, Bellocco E, et al. Flavanones: citrus phytochemical with health-promoting properties. Biofactors. 2017;43:495–506

Zhang F, Huang J, Guo H, et al. OsRLCK160 contributes to flavonoid accumulation and UV-B tolerance by regulating OsbZIP48 in rice. Sci China Life Sci. 2022;65:1380–94

Cui M, Liang Z, Liu Y, et al. Flavonoid profile of Anoectochilus roxburghii (Wall.) Lindl. under short-term heat stress revealed by integrated metabolome, transcriptome, and biochemical analyses. Plant Physiol Biochem. 2023;201:107896

Dias MC, Pinto DCGA, Silva AMS. Plant flavonoids: chemical characteristics and biological activity. Molecules. 2021;26:5377

Kumar S, Pandey AK. Chemistry and biological activities of flavonoids: an overview. Sci World J. 2013;2013:1–16

Chen C. Sinapic acid and its derivatives as medicine in oxidative stress-induced diseases and aging. Oxidative Med Cell Longev. 2016;2016:1–10

Rasul A, Millimouno FM, Ali Eltayb W, et al. Pinocembrin: a novel natural compound with versatile pharmacological and biological activities. Biomed Res Int. 2013;2013:1–9

Doneda E, Bianchi SE, Pittol V, et al. 3-O-methylquercetin from Achyrocline satureioides: cytotoxic activity against A375-derived human melanoma cell lines and its incorporation into cyclodextrins-hydrogels for topical administration. Drug Deliv Transl Res. 2021;11:2151–68

Alam W, Khan H, Shah MA, et al. Kaempferol as a dietary anti-inflammatory agent: current therapeutic standing. Molecules. 2020;25:4073

Chen Y, Mao Y, Liu H, et al. Transcriptome analysis of differentially expressed genes relevant to variegation in peach flowers. PLoS One. 2014;9:e90842

Duan B, Tan X, Long J, et al. Integrated transcriptomic-metabolomic analysis reveals that cinnamaldehyde exposure positively regulates the phenylpropanoid pathway in postharvest Satsuma mandarin (Citrus unshiu). Pestic Biochem Physiol. 2023;189:105312

Lam PY, Wang L, Lui ACW, et al. Deficiency in flavonoid biosynthesis genes CHS, CHI, and CHIL alters rice flavonoid and lignin profiles. Plant Physiol. 2022;188:1993–2011

Wu X, Zhang S, Liu X, et al. Chalcone synthase (CHS) family members analysis from eggplant (Solanum melongena L.) in the flavonoid biosynthetic pathway and expression patterns in response to heat stress. PLoS One. 2020;15:e0226537

Wang X, Chai X, Gao B, et al. Multi-omics analysis reveals the mechanism of bHLH130 responding to low-nitrogen stress of apple rootstock. Plant Physiol. 2023;191:1305–23

Ohno S, Hosokawa M, Hoshino A, et al. A bHLH transcription factor, DvIVS, is involved in regulation of anthocyanin synthesis in dahlia (Dahlia variabilis). J Exp Bot. 2011;62:5105–16

Baudry A, Caboche M, Lepiniec L. TT8 controls its own expression in a feedback regulation involving TTG1 and homologous MYB and bHLH factors, allowing a strong and cell-specific accumulation of flavonoids in Arabidopsis thaliana. Plant J. 2006;46:768–79

Gao C, Guo Y, Wang J, et al. Brassica napus GLABRA3-1 promotes anthocyanin biosynthesis and trichome formation in true leaves when expressed in Arabidopsis thaliana. Plant Biol (Stuttg). 2018;20:3–9

Feyissa DN, Løvdal T, Olsen KM, et al. The endogenous GL3, but not EGL3, gene is necessary for anthocyanin accumulation as induced by nitrogen depletion in Arabidopsis rosette stage leaves. Planta. 2009;230:747–54

Lim S, Kim D, Jung J, et al. Alternative splicing of the basic helix-loop-helix transcription factor gene CmbHLH2 affects anthocyanin biosynthesis in ray florets of chrysanthemum (Chrysanthemum morifolium). Front Plant Sci. 2021;12:

Song Y, Ma B, Guo Q, et al. UV-B induces the expression of flavonoid biosynthetic pathways in blueberry (Vaccinium corymbosum) calli. Front Plant Sci. 2022;13:

Li W, Mao J, Yang SJ, et al. Anthocyanin accumulation correlates with hormones in the fruit skin of 'Red Delicious' and its four generation bud sport mutants. BMC Plant Biol. 2018;18:363

Li W, Ning GX, Mao J, et al. Whole-genome DNA methylation patterns and complex associations with gene expression associated with anthocyanin biosynthesis in apple fruit skin. Planta. 2019;250:1833–47

Sun J, Lu J, Bai M, et al. Phytochrome-interacting factors interact with transcription factor CONSTANS to suppress flowering in rose. Plant Physiol. 2021;186:1186–201

Su L, Zhang Y, Yu S, et al. RcbHLH59-RcPRs module enhances salinity stress tolerance by balancing Na+/K+ through callose deposition in rose (Rosa chinensis). Hortic Res. 2023;10:

Liu W, Zhang R, Xiang C, et al. Transcriptomic and physiological analysis reveal that α-linolenic acid biosynthesis responds to early chilling tolerance in pumpkin rootstock varieties. Front Plant Sci. 2021;12:

Bradford MM. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem. 1976;72:248–54

Cheng C, Yu Q, Wang Y, et al. Ethylene-regulated asymmetric growth of the petal base promotes flower opening in rose (Rosa hybrida). Plant Cell. 2021;33:1229–51

Zhang Y, Wu Z, Feng M, et al. The circadian-controlled PIF8-BBX28 module regulates petal senescence in rose flowers by governing mitochondrial ROS homeostasis at night. Plant Cell. 2021;33:2716–35

Meng Y, Zhang H, Fan Y, et al. Anthocyanins accumulation analysis of correlated genes by metabolome and transcriptome in green and purple peppers (Capsicum annuum). BMC Plant Biol. 2022;22:358

Deng H, Wu G, Zhang R, et al. Comparative nutritional and metabolic analysis reveals the taste variations during yellow rambutan fruit maturation. Food Chem X. 2023;17:100580

Liu D, Pan Y, Li K, et al. Proteomics reveals the mechanism underlying the inhibition of Phytophthora sojae by propyl gallate. J Agric Food Chem. 2020;68:8151–62

Yang B, He S, Liu Y, et al. Transcriptomics integrated with metabolomics reveals the effect of regulated deficit irrigation on anthocyanin biosynthesis in Cabernet Sauvignon grape berries. Food Chem. 2020;314:126170

Umer MJ, Bin Safdar L, Gebremeskel H, et al. Identification of key gene networks controlling organic acid and sugar metabolism during watermelon fruit development by integrating metabolic phenotypes and gene expression profiles. Hortic Res. 2020;7:193

Liang Y, Jiang C, Liu Y, et al. Auxin regulates sucrose transport to repress petal abscission in rose (Rosa hybrida). Plant Cell. 2020;32:3485–99

  • Online ISSN 2052-7276
  • Print ISSN 2662-6810
  • Copyright © 2024 Nanjing Agricultural University

COMMENTS

  1. The Importance of Data Analysis in Research

    Data analysis is important in research because it makes studying data simpler and more accurate. It helps researchers interpret the data systematically, so that they don't overlook anything that could help them derive insights from it. Data analysis is a way to study and analyze large amounts of data.

  2. Data analysis

    Data analysis is the process of systematically collecting, cleaning, transforming, describing, modeling, and interpreting data, generally employing statistical techniques. Data analysis is an important part of both scientific research and business, where demand for data-driven decision making has grown in recent years.

  3. Data Analysis in Research: Types & Methods

    Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense. Three essential things occur during the data ...

  4. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  5. What is Data Analysis? An Expert Guide With Examples

    Data analysis is a comprehensive method of inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It is a multifaceted process involving various techniques and methodologies to interpret data from various sources in different formats, both structured and unstructured.

  6. Importance of Data Collection and Analysis Methods

    Data validation is a streamlined process that ensures the quality and accuracy of collected data. Inaccurate data may keep a researcher from uncovering important discoveries or lead to spurious results. At times, the amount of data collected might help unravel existing patterns that are important. The data validation process can also provide a ...

  7. Guides: Data Analysis: Introduction to Data Analysis

    Data analysis can be quantitative, qualitative, or mixed methods. Quantitative research typically involves numbers and "close-ended questions and responses" (Creswell & Creswell, 2018, p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures (Creswell & Creswell, 2018, p. 4).

  8. Learning to Do Qualitative Data Analysis: A Starting Point

    For many researchers unfamiliar with qualitative research, determining how to conduct qualitative analyses is often quite challenging. Part of this challenge is due to the seemingly limitless approaches that a qualitative researcher might leverage, as well as simply learning to think like a qualitative researcher when analyzing data. From framework analysis (Ritchie & Spencer, 1994) to content ...

  9. Data Analysis in Quantitative Research

    Abstract. Quantitative data analysis serves as part of an essential process of evidence-making in health and social sciences. It is adopted for any type of research question and design, whether descriptive, explanatory, or causal. However, compared with its qualitative counterpart, quantitative data analysis has less flexibility.

  10. LibGuides: Research Methods: Data Analysis & Interpretation

    Qualitative Data. Data analysis for a qualitative study can be complex because of the variety of types of data that can be collected. Qualitative researchers aren't attempting to measure observable characteristics; they are often attempting to capture an individual's interpretation of a phenomenon or situation in a particular context or setting.

  11. Data Analysis

    Data Analysis. Definition: Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets.

  12. What Is Data Analysis: A Comprehensive Guide

    Data analysis is a catalyst for continuous improvement. It allows organizations to monitor performance metrics, track progress, and identify areas for enhancement. This iterative process of analyzing data, implementing changes, and analyzing again leads to ongoing refinement and excellence in processes and products.

  13. Basic statistical tools in research and data analysis

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if ...

  14. Data Analysis Techniques In Research

    Data analysis methods refer to the techniques and procedures used to analyze, interpret, and draw conclusions from data. These methods are essential for transforming raw data into meaningful insights, facilitating decision-making processes, and driving strategies across various fields. ... The importance of data analysis in research cannot be ...

  15. Research Guide: Data analysis and reporting findings

    Analyzing Group Interactions by Matthias Huber (Editor); Dominik E. Froehlich (Editor) Analyzing Group Interactions gives a comprehensive overview of the use of different methods for the analysis of group interactions. International experts from a range of different disciplines within the social sciences illustrate their step-by-step procedures of how they analyze interactions within groups ...

  16. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  17. Data Analysis: Types, Methods & Techniques (a Complete List)

    Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis. Mathematical types then branch into descriptive, diagnostic, predictive, and prescriptive. Methods falling under mathematical analysis include clustering, classification, forecasting, and optimization.

  18. Data Analysis in Research

    There are two major types of data analysis methods that are used in research: qualitative analysis, which is characteristics-focused, and quantitative analysis, which is numbers-focused. Within ...

  19. Analysis of Data: Techniques and Importance in Research

    Importance of Data Analysis. Data analysis is critical in research as it helps to identify patterns, relationships, and correlations between variables. By analyzing data, researchers can draw inferences, make predictions, and identify trends. The insights derived from data analysis help to inform decision-making, assess the impact of ...

  20. Data Analysis: Importance, Types, Methods of Data Analytics

    Data accuracy is paramount in analysis. Analysts must verify the reliability of the data source, address missing or inconsistent data, and ensure that the chosen analysis methods are appropriate for the dataset. Rigorous validation processes contribute to the credibility of the analysis.

  21. Data analysis in qualitative research

    Unquestionably, data analysis is the most complex and mysterious of all of the phases of a qualitative project, and the one that receives the least thoughtful discussion in the literature. For neophyte nurse researchers, many of the data collection strategies involved in a qualitative project may feel familiar and comfortable. After all, nurses have always based their clinical practice on ...

  22. 7 Reasons Why Data Analysis is Important for Research

    4. Data analysis saves time and money. Data analysis allows researchers to collect and analyze data faster than with manual data analysis methods, which helps them save time and money. Data analysis techniques can help researchers to identify and eliminate unnecessary or redundant experiments. By analyzing data from previous experiments ...

  23. PDF Data Analysis and its Importance

    The process of scrutinizing raw data with the purpose of drawing conclusions about that information is called "Data Analysis". The main aim of Data Analysis is to convert the available cluttered data into a format that is easy to understand, more legible, conclusive, and supportive of decision-making.

  24. Recent Advances in Functional Data Analysis for Electronic Device Data

    Methods for the principled statistical analysis of electrical device data are discussed and two current areas of research are described that are expected to produce widely applicable methods. Accurate understanding of the behavior of commercial-off-the-shelf electrical devices is important in many applications. This paper discusses methods for the principled statistical analysis of electrical ...

  25. What are the strengths and limitations to utilising creative methods in

    There is increasing interest in using patient and public involvement (PPI) in research to improve the quality of healthcare. Ordinarily, traditional methods have been used such as interviews or focus groups. However, these methods tend to engage a similar demographic of people. Thus, creative methods are being developed to involve patients for whom traditional methods are inaccessible or non ...

  26. Risk factors and incidence of central venous access device ...

    The risk factors for central venous access device-related thrombosis (CRT) in children are not fully understood. We used evidence-based medicine to find the risk factors for CRT by pooling current ...

  27. USDA

    Access the portal of NASS, the official source of agricultural data and statistics in the US, and explore various reports and products.

  28. Integrative analysis of transcriptome and target metabolites uncovering

    Background Nymphaea (waterlily) is known for its rich colors and role as an important aquatic ornamental plant globally. Nymphaea atrans and some hybrids, including N. 'Feitian 2,' are more appealing due to the gradual color change of their petals at different flower developmental stages. The petals of N. 'Feitian 2' gradually change color from light blue-purple to deep rose-red ...

  29. Unlocking the potential of Industry 4.0 in BRICS nations: a systematic

    Similarly, this study consists of a category analysis based on multi-criteria decision-making (MCDM) methods, the research design used, the research method utilised, the different data analysis techniques, and the different Industry 4.0 technologies used to solve different applications in the BRICS nations. According to the analysis of past literature, the ...

  30. Multi-omics analysis reveals key regulatory defense pathways and genes

    Abstract. Salinity stress causes serious damage to crops worldwide, limiting plant production. However, the metabolic and molecular mechanisms underlying the response to salt stress in rose (Rosa spp.) remain poorly studied. We therefore performed a multi-omics investigation of Rosa hybrida cv. Jardin de Granville (JDG) and Rosa damascena Mill. (DMS) under salt stress to determine the ...
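Several of the snippets above (items 2, 5, 11, and 20) describe data analysis as a sequence of steps: validating and cleaning raw data, transforming it, and then describing it. As an illustration only, the following is a minimal Python sketch of those first steps using the standard library `statistics` module. The sample responses, the helper names `clean`, `transform`, and `describe`, and the assumed 1 to 5 score scale are all hypothetical and do not come from any of the sources listed.

```python
from statistics import mean, median, stdev

# Hypothetical raw survey responses (assumed 1-5 scale); some are missing or malformed.
raw_scores = ["4.5", "3.0", "", "5.0", "n/a", "4.0", "2.5", "4.5"]


def clean(values):
    """Validation/cleaning: keep only entries that parse as numbers."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(v))
        except ValueError:
            continue  # drop empty or non-numeric entries
    return cleaned


def transform(values, lo=1.0, hi=5.0):
    """Transformation: rescale scores from the assumed [lo, hi] scale to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def describe(values):
    """Description: basic summary statistics of the transformed data."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation; needs at least 2 values
    }


scores = clean(raw_scores)
summary = describe(transform(scores))
print(summary)
```

Dropping unparseable entries is only one possible cleaning policy; depending on the study design, flagging or imputing missing values may be more appropriate, and that choice should be reported alongside the results.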