Secondary Data – Types, Methods and Examples
Secondary Data
Definition:
Secondary data refers to information that has been collected, processed, and published by someone else, rather than the researcher gathering the data firsthand. This can include data from sources such as government publications, academic journals, market research reports, and other existing datasets.
Secondary Data Types
Types of secondary data are as follows:
- Published data: Published data refers to data that has been published in books, magazines, newspapers, and other print media. Examples include statistical reports, market research reports, and scholarly articles.
- Government data: Government data refers to data collected by government agencies and departments. This can include data on demographics, economic trends, crime rates, and health statistics.
- Commercial data: Commercial data is data collected by businesses for their own purposes. This can include sales data, customer feedback, and market research data.
- Academic data: Academic data refers to data collected by researchers for academic purposes. This can include data from experiments, surveys, and observational studies.
- Online data: Online data refers to data that is available on the internet. This can include social media posts, website analytics, and online customer reviews.
- Organizational data: Organizational data is data collected by businesses or organizations for their own purposes. This can include data on employee performance, financial records, and customer satisfaction.
- Historical data: Historical data refers to data that was collected in the past and is still available for research purposes. This can include census data, historical documents, and archival records.
- International data: International data refers to data collected from other countries for research purposes. This can include data on international trade, health statistics, and demographic trends.
- Public data: Public data refers to data that is available to the general public. This can include data from government agencies, non-profit organizations, and other sources.
- Private data: Private data refers to data that is not available to the general public. This can include confidential business data, personal medical records, and financial data.
- Big data: Big data refers to large, complex datasets that are difficult to manage and analyze using traditional data processing methods. This can include social media data, sensor data, and other types of data generated by digital devices.
Secondary Data Collection Methods
Secondary Data Collection Methods are as follows:
- Published sources: Researchers can gather secondary data from published sources such as books, journals, reports, and newspapers. These sources often provide comprehensive information on a variety of topics.
- Online sources: With the growth of the internet, researchers can now access a vast amount of secondary data online. This includes websites, databases, and online archives.
- Government sources: Government agencies often collect and publish a wide range of secondary data on topics such as demographics, crime rates, and health statistics. Researchers can obtain this data through government websites, publications, or data portals.
- Commercial sources: Businesses often collect and analyze data for marketing research or customer profiling. Researchers can obtain this data through commercial data providers or by purchasing market research reports.
- Academic sources: Researchers can also obtain secondary data from academic sources such as published research studies, academic journals, and dissertations.
- Personal contacts: Researchers can also obtain secondary data from personal contacts, such as experts in a particular field or individuals with specialized knowledge.
Secondary Data Formats
Secondary data can come in various formats depending on the source from which it is obtained. Here are some common formats of secondary data, followed by a short loading sketch:
- Numeric Data: Numeric data is often in the form of statistics and numerical figures that have been compiled and reported by organizations such as government agencies, research institutions, and commercial enterprises. This can include data such as population figures, GDP, sales figures, and market share.
- Textual Data: Textual data is often in the form of written documents, such as reports, articles, and books. This can include qualitative data such as descriptions, opinions, and narratives.
- Audiovisual Data: Audiovisual data is often in the form of recordings, videos, and photographs. This can include data such as interviews, focus group discussions, and other types of qualitative data.
- Geospatial Data: Geospatial data is often in the form of maps, satellite images, and geographic information systems (GIS) data. This can include data such as demographic information, land use patterns, and transportation networks.
- Transactional Data: Transactional data is often in the form of digital records of financial and business transactions. This can include data such as purchase histories, customer behavior, and financial transactions.
- Social Media Data: Social media data is often in the form of user-generated content from social media platforms such as Facebook, Twitter, and Instagram. This can include data such as user demographics, content trends, and sentiment analysis.
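To make these formats concrete, here is a minimal Python sketch (using pandas) that loads and inspects two hypothetical secondary sources, one numeric CSV file and one JSON file of textual records. The file names and columns are assumptions for illustration, not references to any real dataset.

```python
import pandas as pd

# Hypothetical secondary sources; file and column names are illustrative only.
population = pd.read_csv("census_population.csv")   # numeric data (e.g., population figures)
reviews = pd.read_json("customer_reviews.json")     # textual data (e.g., online reviews)

# Inspect what each source actually contains before planning any analysis.
print(population.dtypes)        # column types
print(population.describe())    # summary statistics for numeric columns
print(reviews.head())           # first few textual records
```

Inspecting the types and summary statistics up front helps confirm that a secondary source actually contains the measures you need before you commit to it.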
Secondary Data Analysis Methods
Secondary data analysis involves the use of pre-existing data for research purposes. Here are some common methods of secondary data analysis, with a short illustrative sketch after the list:
- Descriptive Analysis: This method involves describing the characteristics of a dataset, such as the mean, standard deviation, and range of the data. Descriptive analysis can be used to summarize data and provide an overview of trends.
- Inferential Analysis: This method involves making inferences and drawing conclusions about a population based on a sample of data. Inferential analysis can be used to test hypotheses and determine the statistical significance of relationships between variables.
- Content Analysis: This method involves analyzing textual or visual data to identify patterns and themes. Content analysis can be used to study the content of documents, media coverage, and social media posts.
- Time-Series Analysis: This method involves analyzing data over time to identify trends and patterns. Time-series analysis can be used to study economic trends, climate change, and other phenomena that change over time.
- Spatial Analysis: This method involves analyzing data in relation to geographic location. Spatial analysis can be used to study patterns of disease spread, land use patterns, and the effects of environmental factors on health outcomes.
- Meta-Analysis: This method involves combining data from multiple studies to draw conclusions about a particular phenomenon. Meta-analysis can be used to synthesize the results of previous research and provide a more comprehensive understanding of a particular topic.
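As a brief illustration of descriptive and time-series analysis on secondary data, the following Python sketch computes summary statistics and a simple year-over-year trend from a hypothetical registry file. The file name and the "year" and "cases" columns are assumptions made for the example.

```python
import pandas as pd

# Hypothetical secondary dataset with one row per region per year;
# the "year" and "cases" columns are assumptions for this example.
df = pd.read_csv("disease_registry.csv")

# Descriptive analysis: mean, standard deviation, and range of reported cases.
print(df["cases"].describe())

# Time-series view: total cases per year to reveal the overall trend.
yearly = df.groupby("year")["cases"].sum().sort_index()
print(yearly)

# A crude trend summary: the average year-over-year change.
print("Mean annual change:", yearly.diff().mean())
```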
Secondary Data Gathering Guide
Here are some steps to follow when gathering secondary data; a brief data-validation sketch follows the list:
- Define your research question: Start by defining your research question and identifying the specific information you need to answer it. This will help you identify the type of secondary data you need and where to find it.
- Identify relevant sources: Identify potential sources of secondary data, including published sources, online databases, government sources, and commercial data providers. Consider the reliability and validity of each source.
- Evaluate the quality of the data: Evaluate the quality and reliability of the data you plan to use. Consider the data collection methods, sample size, and potential biases. Make sure the data is relevant to your research question and is suitable for the type of analysis you plan to conduct.
- Collect the data: Collect the relevant data from the identified sources. Use a consistent method to record and organize the data to make analysis easier.
- Validate the data: Validate the data to ensure that it is accurate and reliable. Check for inconsistencies, missing data, and errors. Address any issues before analyzing the data.
- Analyze the data: Analyze the data using appropriate statistical and analytical methods. Use descriptive and inferential statistics to summarize and draw conclusions from the data.
- Interpret the results: Interpret the results of your analysis and draw conclusions based on the data. Make sure your conclusions are supported by the data and are relevant to your research question.
- Communicate the findings: Communicate your findings clearly and concisely. Use appropriate visual aids such as graphs and charts to help explain your results.
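As one sketch of the validation step above, the snippet below checks a hypothetical secondary dataset for missing values, duplicate rows, and implausible entries. The file name, the "age" column, and the plausible range are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical secondary dataset; the file and column names are illustrative.
df = pd.read_csv("secondary_dataset.csv")

# Missing data: count gaps per column before deciding how to handle them.
print(df.isna().sum())

# Duplicates: repeated records can inflate counts and bias results.
print("Duplicate rows:", df.duplicated().sum())

# Simple consistency check: flag values outside a plausible range.
if "age" in df.columns:
    implausible = df[(df["age"] < 0) | (df["age"] > 120)]
    print("Rows with implausible ages:", len(implausible))
```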
Examples of Secondary Data
Here are some examples of secondary data from different fields:
- Healthcare: Hospital records, medical journals, clinical trial data, and disease registries are examples of secondary data sources in healthcare. These sources can provide researchers with information on patient demographics, disease prevalence, and treatment outcomes.
- Marketing: Market research reports, customer surveys, and sales data are examples of secondary data sources in marketing. These sources can provide marketers with information on consumer preferences, market trends, and competitor activity.
- Education: Student test scores, graduation rates, and enrollment statistics are examples of secondary data sources in education. These sources can provide researchers with information on student achievement, teacher effectiveness, and educational disparities.
- Finance: Stock market data, financial statements, and credit reports are examples of secondary data sources in finance. These sources can provide investors with information on market trends, company performance, and creditworthiness.
- Social Science: Government statistics, census data, and survey data are examples of secondary data sources in social science. These sources can provide researchers with information on population demographics, social trends, and political attitudes.
- Environmental Science: Climate data, remote sensing data, and ecological monitoring data are examples of secondary data sources in environmental science. These sources can provide researchers with information on weather patterns, land use, and biodiversity.
Purpose of Secondary Data
The purpose of secondary data is to provide researchers with information that has already been collected by others for other purposes. Secondary data can be used to support research questions, test hypotheses, and answer research objectives. Some of the key purposes of secondary data are:
- To gain a better understanding of the research topic: Secondary data can be used to provide context and background information on a research topic. This can help researchers understand the historical and social context of their research and gain insights into relevant variables and relationships.
- To save time and resources: Collecting new primary data can be time-consuming and expensive. Using existing secondary data sources can save researchers time and resources by providing access to pre-existing data that has already been collected and organized.
- To provide comparative data: Secondary data can be used to compare and contrast findings across different studies or datasets. This can help researchers identify trends, patterns, and relationships that may not have been apparent from individual studies.
- To support triangulation: Triangulation is the process of using multiple sources of data to confirm or refute research findings. Secondary data can be used to support triangulation by providing additional sources of data to support or refute primary research findings.
- To supplement primary data: Secondary data can be used to supplement primary data by providing additional information or insights that were not captured by the primary research. This can help researchers gain a more complete understanding of the research topic and draw more robust conclusions.
When to use Secondary Data
Secondary data can be useful in a variety of research contexts, and there are several situations in which it may be appropriate to use secondary data. Some common situations in which secondary data may be used include:
- When primary data collection is not feasible: Collecting primary data can be time-consuming and expensive, and in some cases, it may not be feasible to collect primary data. In these situations, secondary data can provide valuable insights and information.
- When exploring a new research area: Secondary data can be a useful starting point for researchers who are exploring a new research area. Secondary data can provide context and background information on a research topic, and can help researchers identify key variables and relationships to explore further.
- When comparing and contrasting research findings: Secondary data can be used to compare and contrast findings across different studies or datasets. This can help researchers identify trends, patterns, and relationships that may not have been apparent from individual studies.
- When triangulating research findings: Triangulation is the process of using multiple sources of data to confirm or refute research findings. Secondary data can be used to support triangulation by providing additional sources of data to support or refute primary research findings.
- When validating research findings: Secondary data can be used to validate primary research findings by providing additional sources of data that support or refute the primary findings.
Characteristics of Secondary Data
Secondary data have several characteristics that distinguish them from primary data. Here are some of the key characteristics of secondary data:
- Non-reactive: Secondary data are non-reactive, meaning that they are not collected for the specific purpose of the research study. This means that the researcher has no control over the data collection process, and cannot influence how the data were collected.
- Time-saving: Secondary data are pre-existing, meaning that they have already been collected and organized by someone else. This can save the researcher time and resources, as they do not need to collect the data themselves.
- Wide-ranging: Secondary data sources can provide a wide range of information on a variety of topics. This can be useful for researchers who are exploring a new research area or seeking to compare and contrast research findings.
- Less expensive: Secondary data are generally less expensive than primary data, as they do not require the researcher to incur the costs associated with data collection.
- Potential for bias: Secondary data may be subject to biases that were present in the original data collection process. For example, data may have been collected using a biased sampling method or the data may be incomplete or inaccurate.
- Lack of control: The researcher has no control over the data collection process and cannot ensure that the data were collected using appropriate methods or measures.
- Requires careful evaluation: Secondary data sources must be evaluated carefully to ensure that they are appropriate for the research question and analysis. This includes assessing the quality, reliability, and validity of the data sources.
Advantages of Secondary Data
There are several advantages to using secondary data in research, including:
- Time-saving: Collecting primary data can be time-consuming and expensive. Secondary data can be accessed quickly and easily, which can save researchers time and resources.
- Cost-effective: Secondary data are generally less expensive than primary data, as they do not require the researcher to incur the costs associated with data collection.
- Large sample size: Secondary data sources often have larger sample sizes than primary data sources, which can increase the statistical power of the research.
- Access to historical data: Secondary data sources can provide access to historical data, which can be useful for researchers who are studying trends over time.
- Fewer ethical concerns: Because secondary data already exist, there are typically fewer ethical concerns related to collecting data from human subjects.
- May be more objective: Secondary data may be more objective than primary data, as the data were not collected for the specific purpose of the research study.
Limitations of Secondary Data
While there are many advantages to using secondary data in research, there are also some limitations that should be considered. Some of the main limitations of secondary data include:
- Lack of control over data quality: Researchers do not have control over the data collection process, which means they cannot ensure the accuracy or completeness of the data.
- Limited availability: Secondary data may not be available for the specific research question or study design.
- Lack of information on sampling and data collection methods: Researchers may not have access to information on the sampling and data collection methods used to gather the secondary data. This can make it difficult to evaluate the quality of the data.
- Data may not be up-to-date: Secondary data may not be up-to-date or relevant to the current research question.
- Data may be incomplete or inaccurate: Secondary data may be incomplete or inaccurate due to missing or incorrect data points, data entry errors, or other factors.
- Biases in data collection: The data may have been collected using biased sampling or data collection methods, which can limit the validity of the data.
- Lack of control over variables: Researchers have limited control over the variables that were measured in the original data collection process, which can limit the ability to draw conclusions about causality.
Secondary research: definition, methods, & examples.
This ultimate guide to secondary research helps you understand changes in market trends, customers’ buying patterns, and your competition using existing data sources.
In situations where you’re not involved in the data gathering process (primary research), you have to rely on existing information and data to arrive at specific research conclusions or outcomes. This approach is known as secondary research.
In this article, we’re going to explain what secondary research is, how it works, and share some examples of it in practice.
What is secondary research?
Secondary research, also known as desk research, is a research method that involves compiling existing data sourced from a variety of channels. This includes internal sources (e.g., in-house research) or, more commonly, external sources (such as government statistics, organizational bodies, and the internet).
Secondary research comes in several formats, such as published datasets, reports, and survey responses, and can also be sourced from websites, libraries, and museums.
The information is usually free — or available at a limited access cost — and gathered using surveys, telephone interviews, observation, face-to-face interviews, and more.
When using secondary research, researchers collect, verify, analyze, and incorporate the existing data to address the goals of their research.
As well as the above, it can be used to review previous research into an area of interest. Researchers can look for patterns across data spanning several years and identify trends — or use it to verify early hypothesis statements and establish whether it’s worth continuing research into a prospective area.
How to conduct secondary research
There are five key steps to conducting secondary research effectively and efficiently:
1. Identify and define the research topic
First, understand what you will be researching and define the topic by thinking about the research questions you want answered.
Ask yourself: What is the point of conducting this research? Then, ask: What do we want to achieve?
The aim may be exploratory (understanding why something happened) or confirmatory (testing a hypothesis). The answers may point to ideas that need primary or secondary research (or a combination) to investigate them.
2. Find research and existing data sources
If secondary research is needed, think about where you might find the information. This helps you narrow down your secondary sources to those that help you answer your questions. What keywords do you need to use?
Which organizations are closely working on this topic already? Are there any competitors that you need to be aware of?
Create a list of the data sources, information, and people that could help you with your work.
3. Begin searching and collecting the existing data
Now that you have the list of data sources, start accessing the data and collect the information into an organized system. This may mean you start setting up research journal accounts or making telephone calls to book meetings with third-party research teams to verify the details around data results.
As you search and access information, remember to check the data’s date, the credibility of the source, the relevance of the material to your research topic, and the methodology used by the third-party researchers. Start small and as you gain results, investigate further in the areas that help your research’s aims.
4. Combine the data and compare the results
When you have your data in one place, you need to understand, filter, order, and combine it intelligently. Data may come in different formats; some of it may be unusable, while other information may need to be cleaned or removed.
After this, you can start to look at different data sets to see what they tell you. You may find that you need to compare the same datasets over different periods for changes over time or compare different datasets to notice overlaps or trends. Ask yourself: What does this data mean to my research? Does it help or hinder my research?
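For instance, combining two hypothetical sources on a shared key and comparing a measure across periods might look like the pandas sketch below. The file names, the shared "region" key, and the "year" and "sales" columns are assumptions, not a prescribed workflow.

```python
import pandas as pd

# Two hypothetical secondary sources that share a "region" column.
sales = pd.read_csv("industry_sales.csv")            # e.g., commercial market data
population = pd.read_csv("census_population.csv")    # e.g., government statistics

# Combine on the shared key so each region carries both measures.
combined = sales.merge(population, on="region", how="inner")

# Compare the same measure across periods to spot changes over time.
by_period = combined.pivot_table(index="region", columns="year",
                                 values="sales", aggfunc="sum")
print(by_period)
```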
5. Analyze your data and explore further
In this last stage of the process, look at the information you have and ask yourself if this answers your original questions for your research. Are there any gaps? Do you understand the information you’ve found? If you feel there is more to cover, repeat the steps and delve deeper into the topic so that you can get all the information you need.
If secondary research can’t provide these answers, consider supplementing your results with data gained from primary research. As you explore further, add to your knowledge and update your findings. This will help you present clear, credible information.
Primary vs secondary research
Unlike secondary research, primary research involves creating data first-hand by directly working with interviewees, target users, or a target market. Primary research focuses on the method for carrying out research, asking questions, and collecting data using approaches such as:
- Interviews (panel, face-to-face or over the phone)
- Questionnaires or surveys
- Focus groups
Using these methods, researchers can get in-depth, targeted responses to questions, making results more accurate and specific to their research goals. However, these methods take time to carry out and administer.
Unlike primary research, secondary research uses existing data, which also includes published results from primary research. Researchers summarize the existing research and use the results to support their research goals.
Both primary and secondary research have their places. Primary research can support the findings found through secondary research (and fill knowledge gaps), while secondary research can be a starting point for further primary research. Because of this, these research methods are often combined for optimal research results that are accurate at both the micro and macro level.
| Primary research | Secondary research |
|---|---|
| First-hand research to collect data. May require a lot of time | The researcher collects existing, published data. May require a little time |
| Creates raw data that the researcher owns | The researcher has no control over the data collection method or data ownership |
| Relevant to the goals of the research | May not be relevant to the goals of the research |
| The researcher conducts the research. May be subject to researcher bias | The researcher collects existing results. No information on what researcher bias exists |
| Can be expensive to carry out | More affordable due to access to free data |
Sources of Secondary Research
There are two types of secondary research sources: internal and external. Internal data refers to in-house data that can be gathered from the researcher’s organization. External data refers to data published outside of and not owned by the researcher’s organization.
Internal data
Internal data is a good first port of call for insights and knowledge, as you may already have relevant information stored in your systems. Because you own this information — and it won’t be available to other researchers — it can give you a competitive edge. Examples of internal data include:
- Database information on sales history and business goal conversions
- Information from website applications and mobile site data
- Customer-generated data on product and service efficiency and use
- Previous research results or supplemental research areas
- Previous campaign results
External data
External data is useful when you: 1) need information on a new topic, 2) want to fill in gaps in your knowledge, or 3) want data that breaks down a population or market for trend and pattern analysis. Examples of external data include:
- Government, non-government agencies, and trade body statistics
- Company reports and research
- Competitor research
- Public library collections
- Textbooks and research journals
- Media stories in newspapers
- Online journals and research sites
Three examples of secondary research methods in action
How and why might you conduct secondary research? Let’s look at a few examples:
1. Collecting factual information from the internet on a specific topic or market
There are plenty of sites that hold data for people to view and use in their research. For example, Google Scholar, ResearchGate, or Wiley Online Library all provide previous research on a particular topic. Researchers can create free accounts and use the search facilities to look into a topic by keyword, before following the instructions to download or export results for further analysis.
This can be useful for exploring a new market that your organization wants to consider entering. For instance, by viewing the U.S. Census Bureau demographic data for that area, you can see what the demographics of your target audience are, and create compelling marketing campaigns accordingly.
2. Finding out the views of your target audience on a particular topic
If you’re interested in seeing the historical views on a particular topic, for example, attitudes to women’s rights in the US, you can turn to secondary sources.
Textbooks, news articles, reviews, and journal entries can all provide qualitative reports and interviews covering how people discussed women’s rights. There may be multimedia elements like video or documented posters of propaganda showing biased language usage.
By gathering this information, synthesizing it, and evaluating the language, who created it and when it was shared, you can create a timeline of how a topic was discussed over time.
3. When you want to know the latest thinking on a topic
Educational institutions, such as schools and colleges, create a lot of research-based reports on younger audiences or their academic specialisms. Dissertations from students can also be submitted to research journals, making these journals useful places to see the latest insights from a new generation of academics.
Information can be requested — and sometimes academic institutions may want to collaborate and conduct research on your behalf. This can provide key primary data in areas that you want to research, as well as secondary data sources for your research.
Advantages of secondary research
There are several benefits of using secondary research, which we’ve outlined below:
- Easily and readily available data – There is an abundance of readily accessible data sources that have been pre-collected for use, in person at local libraries and online using the internet. This data is usually sorted by filters or can be exported into spreadsheet format, meaning that little technical expertise is needed to access and use the data.
- Faster research speeds – Since the data is already published and in the public arena, you don’t need to collect this information through primary research. This can make the research easier to do and faster, as you can get started with the data quickly.
- Low financial and time costs – Most secondary data sources can be accessed for free or at a small cost to the researcher, so the overall research costs are kept low. In addition, by saving on preliminary research, the time costs for the researcher are kept down as well.
- Secondary data can drive additional research actions – The insights gained can support future research activities (like conducting a follow-up survey or specifying future detailed research topics) or help add value to these activities.
- Secondary data can be useful pre-research insights – Secondary source data can provide pre-research insights and information on effects that can help resolve whether research should be conducted. It can also help highlight knowledge gaps, so subsequent research can consider this.
- Ability to scale up results – Secondary sources can include large datasets (like Census data results across several states) so research results can be scaled up quickly using large secondary data sources.
Disadvantages of secondary research
The disadvantages of secondary research are worth considering in advance of conducting research:
- Secondary research data can be out of date – Secondary sources can be updated regularly, but if you’re exploring the data between two updates, the data can be out of date. Researchers will need to consider whether the data available provides the right research coverage dates, so that insights are accurate and timely, or if the data needs to be updated. Also, in fast-moving markets, secondary data may go out of date very quickly.
- Secondary research needs to be verified and interpreted – Where there’s a lot of data from one source, a researcher needs to review and analyze it. The data may need to be verified against other data sets or your hypotheses for accuracy and to ensure you’re using the right data for your research.
- The researcher has had no control over the secondary research – As the researcher has not been involved in the secondary research, invalid data can affect the results. It’s therefore vital that the methodology and controls are closely reviewed so that the data is collected in a systematic and error-free way.
- Secondary research data is not exclusive – As data sets are commonly available, there is no exclusivity and many researchers can use the same data. This can be problematic where researchers want to have exclusive rights over the research results and risk duplication of research in the future.
When do we conduct secondary research?
Now that you know the basics of secondary research, when do researchers normally conduct secondary research?
It’s often used at the beginning of research, when the researcher is trying to understand the current landscape. In addition, if the research area is new to the researcher, it can form crucial background context to help them understand what information exists already. This can plug knowledge gaps, supplement the researcher’s own learning or add to the research.
Secondary research can also be used in conjunction with primary research. Secondary research can become the formative research that helps pinpoint where further primary research is needed to find out specific information. It can also support or verify the findings from primary research.
You can use secondary research where high levels of control aren’t needed by the researcher, but a lot of knowledge on a topic is required from different angles.
Secondary research should not be used as a substitute for primary research; the two approaches are different and suit different circumstances.
Questions to ask before conducting secondary research
Before you start your secondary research, ask yourself these questions:
- Is there internal data that we have already created for a similar area in the past?
If your organization has past research, it’s best to review this work before starting a new project. The older work may provide you with answers, and give you a starting dataset and context for how your organization approached the research before. However, be mindful that the earlier work may be out of date, and interpret it with that in mind. Read through and look for where this helps your research goals or where more work is needed.
- What am I trying to achieve with this research?
When you have clear goals, and understand what you need to achieve, you can look for the right type of secondary or primary research to support those aims. Different secondary data will provide different information. For example, news stories will tell you less about your market’s buying patterns than internal or external e-commerce and sales data sources.
- How credible will my research be?
If you are looking for credibility, you want to consider how accurate the research results will need to be, and if you can sacrifice credibility for speed by using secondary sources to get you started. Bear in mind which sources you choose — low-credibility data sites, like political party websites that are highly biased to favor their own party, would skew your results.
- What is the date of the secondary research?
When you’re looking to conduct research, you want the results to be as useful as possible, so using data that is 10 years old won’t be as accurate as using data that was created a year ago. Since a lot can change in a few years, note the date of each dataset and look for the most recent data that can give you an up-to-date picture of results. One caveat to this is using data collected over a long-term period for comparisons with earlier periods, which can tell you about the rate and direction of change.
- Can the data sources be verified? Does the information you have check out?
If you can’t verify the data by looking at the research methodology, speaking to the original team or cross-checking the facts with other research, it could be hard to be sure that the data is accurate. Think about whether you can use another source, or if it’s worth doing some supplementary primary research to replicate and verify results to help with this issue.
J Gen Intern Med. 2011 Aug; 26(8)
Conducting High-Value Secondary Dataset Analysis: An Introductory Guide and Resources
Alexander K. Smith
1 Division of Geriatrics, Department of Medicine, University of California, San Francisco, 4150 Clement St (181G), 94121 San Francisco, CA USA
2 Veterans Affairs Medical Center, San Francisco, CA USA
John Z. Ayanian
3 Harvard Medical School, Boston, MA USA
4 Department of Health Care Policy, Harvard School of Public Health, Boston, MA USA
5 Division of General Medicine, Brigham and Women’s Hospital, Boston, MA USA
Kenneth E. Covinsky
Bruce E. Landon
6 Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA USA
Ellen P. McCarthy
Christina C. Wee and Michael A. Steinman
Secondary analyses of large datasets provide a mechanism for researchers to address high impact questions that would otherwise be prohibitively expensive and time-consuming to study. This paper presents a guide to assist investigators interested in conducting secondary data analysis, including advice on the process of successful secondary data analysis as well as a brief summary of high-value datasets and online resources for researchers, including the SGIM dataset compendium (www.sgim.org/go/datasets). The same basic research principles that apply to primary data analysis apply to secondary data analysis, including the development of a clear and clinically relevant research question, study sample, appropriate measures, and a thoughtful analytic approach. A real-world case description illustrates key steps: (1) define your research topic and question; (2) select a dataset; (3) get to know your dataset; and (4) structure your analysis and presentation of findings in a way that is clinically meaningful. Secondary dataset analysis is a well-established methodology. Secondary analysis is particularly valuable for junior investigators, who have limited time and resources to demonstrate expertise and productivity.
INTRODUCTION
Secondary data analysis is analysis of data that was collected by someone else for another primary purpose. Increasingly, generalist researchers start their careers conducting analyses of existing datasets, and some continue to make this the focus of their career. Using secondary data enables one to conduct studies of high-impact research questions with dramatically less time and resources than required for most studies involving primary data collection. For fellows and junior faculty who need to demonstrate productivity by completing and publishing research in a timely manner, secondary data analysis can be a key foundation to successfully starting a research career. Successful completion demonstrates content and methodological expertise, and may yield useful data for future grants. Despite these attributes, conducting high quality secondary data research requires a distinct skill set and substantial effort. However, few frameworks are available to guide new investigators as they conduct secondary data analyses. 1–3
In this article we describe key principles and skills needed to conduct successful analysis of secondary data and provide a brief description of high-value datasets and online resources. The primary target audience of the article is investigators with an interest but limited prior experience in secondary data analysis, as well as mentors of these investigators, who may find this article a useful reference and teaching tool. While we focus on analysis of large, publicly available datasets, many of the concepts we cover are applicable to secondary analysis of proprietary datasets. Datasets we feature in this manuscript encompass a wide range of measures, and thus can be useful to evaluate not only one disease in isolation, but also its intersection with other clinical, demographic, and psychosocial characteristics of patients.
REASONS TO CONDUCT OR TO AVOID A SECONDARY DATA ANALYSIS
Many worthwhile studies simply cannot be done in a reasonable timeframe or at reasonable cost with primary data collection. For example, if you wanted to examine racial and ethnic differences in health services utilization over the last 10 years of life, you could enroll a diverse cohort of subjects with chronic illness and wait a decade (or longer) for them to die, or you could find a dataset that includes a diverse sample of decedents. Even for less dramatic examples, primary data collection can be difficult without incurring substantial costs, including time and money—scarce resources for junior researchers in particular. Secondary datasets, in contrast, can provide access to large sample sizes, relevant measures, and longitudinal data, allowing junior investigators to formulate a generalizable answer to a high impact question. For those interested in conducting primary data collection, beginning with a secondary data analysis may provide a “bird’s eye view” of epidemiologic trends that future primary data studies examine in greater detail.
Secondary data analyses, however, have disadvantages that are important to consider. In a study focused on primary data, you can tightly control the desired study population, specify the exact measures that you would like to assess, and examine causal relationships (e.g., through a randomized controlled design). In secondary data analyses, the study population and measures collected are often not exactly what you might have chosen to collect, and the observational nature of most secondary data makes it difficult to assess causality (although some quasi-experimental methods, such as instrumental variable or regression discontinuity analysis, can partially address this issue). While not unique to secondary data analysis, another disadvantage to publicly available datasets is the potential to be “scooped,” meaning that someone else publishes a similar study from the same data set before you do. On the other hand, intentional replication of a study in a different dataset can be important in that it either supports or refutes the generalizability of the original findings. If you do find that someone has published the same study using the same dataset, try to find a unique angle to your study that builds on their findings.
STEPS TO CONDUCTING A SUCCESSFUL SECONDARY DATA ANALYSIS
The same basic research principles that apply to studies using primary data apply to secondary data analysis, including the development of a clear research question, study sample, appropriate measures, and a thoughtful analytic approach. For purposes of secondary data analysis, these principles can be conceived as a series of four key steps, described in Table 1 and the sections below. Table 2 provides a glossary of terms used in secondary analysis, including dataset types and common sampling terminology; a brief code sketch after the glossary illustrates how survey weights are applied.
Table 1
A Practical Approach to Successful Research with Large Datasets
| Steps | Practical advice |
|---|---|
| (1) Define your research topic and question | (1) Start with a thorough literature review |
| | (2) Ensure that the research question has clinical or policy relevance and is based on sound a priori reasoning. A good question is what makes a study good, not a large sample size |
| | (3) Be flexible to adapt your question to the strengths and limitations of the potential datasets |
| (2) Select a dataset | (1) Use a resource such as the Society of General Internal Medicine’s Online Compendium (www.sgim.org/go/datasets) (Table 3) |
| | (2) To increase the novelty of your work, consider selecting a dataset that has not been widely used in your field or link datasets together to gain a fresh perspective |
| | (3) Factor in complexity of the dataset |
| | (4) Factor in dataset cost and time to acquire the actual dataset |
| | (5) Consider selecting a dataset your mentor has used previously |
| (3) Get to know your dataset | (1) Learn the answers to the following questions: |
| | • Why does the database exist? |
| | • Who reports the data? |
| | • What are the incentives for accurate reporting? |
| | • How are the data audited, if at all? |
| | • Can you link your dataset to other large datasets? |
| | (2) Read everything you can about the database |
| | (3) Check to see if your measures have been validated against other sources |
| | (4) Get a close feel for the data by analyzing it yourself or closely reviewing outputs if someone else is doing the programming |
| (4) Structure your analysis and presentation of findings in a way that is clinically meaningful | (1) Think carefully about the clinical implications of your findings |
| | (2) Be cautious when interpreting statistical significance (i.e., p-values). Large sample sizes can yield associations that are highly statistically significant but not clinically meaningful |
| | (3) Consult with a statistician for complex datasets and analyses |
| | (4) Think carefully about how you portray the data. A nice figure sometimes tells the story better than rows of data |
Table 2
Glossary of Terms Used in Secondary Dataset Analysis Research
Term | Meaning |
---|---|
Types of datasets (not mutually exclusive) | |
Administrative or claims data | Datasets generated from reimbursement claims, such as ICD-9 codes used to bill for clinical encounters, or discharge data such as discharge diagnoses |
Longitudinal data | Datasets that measure factors of interest within the same subjects over time |
Clinical registries | Datasets generated from registries of specific clinical conditions, such as regional cancer registries used to create the Surveillance Epidemiology and End Results Program (SEER) dataset |
Population-based survey | A target population is available and well-defined, and a systematic approach is used to select members of that population to take part in the study. For example, SEER is a population-based survey because it aims to include data on all individuals with cancer cared for in the included regions |
Nationally representative survey | Survey sample that is designed to be representative of the target population on a national level. Often uses a complex sampling scheme. The Health and Retirement Study (HRS), for example, is nationally representative of community-dwelling adults over age 50 |
Panel survey | A longitudinal survey in which data are collected in the same panel of subjects over time. As one panel is at the middle or end of its participation, a panel of new participants is enrolled. In the Medical Expenditures Panel Survey (MEPS), for example, individuals in the same household are surveyed several times over the course of 2 years |
Statistical sampling terms | |
Clustering | Even simple random samples can be prohibitively expensive for practical reasons such as geographic distance between selected subjects. Identifying subjects within defined clusters, such as geographic regions or subjects treated by the same physicians, reduces cost and improves the feasibility of the study but may decrease the precision of the estimated variance (e.g., wider confidence intervals) |
Complex survey design | A survey design that is not a simple random selection of subjects. Surveys that incorporate stratification, clustering and oversampling (with patient weights) are examples of complex data. Statistical software is available that can account for complex survey designs and is often needed to generate accurate findings |
Oversampling | Intentionally sampling a greater proportion of a subgroup, increasing the precision of estimates for that subgroup. For example, in the HRS, African-Americans, Latinos, and residents of Florida are oversampled (see also survey weights) |
Stratification | In stratification, the target population is divided into relatively homogeneous groups, and a pre-specified number of subjects is sampled from within each stratum. For example, in the National Ambulatory Medical Care Survey physicians are divided by specialty within each geographic area targeted for the survey, and a certain number of each type of physician is then identified to participate and provide data about their patients |
Survey weights | Weights are used to account for the unequal probability of subject selection due to purposeful over- or under-sampling of certain types of subjects and non-response bias. The survey weight is the inverse probability of being selected. By applying survey weights, the effects of over- and under-sampling of certain types of patients can be corrected such that the data are representative of the entire target population |
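To illustrate how the survey weights described in the glossary are applied, here is a minimal sketch of a weighted estimate using made-up numbers. Real analyses of complex surveys should use statistical software that also accounts for stratification and clustering when estimating variances.

```python
import numpy as np

# Hypothetical respondent-level values from a complex survey.
incomes = np.array([30_000, 52_000, 41_000, 75_000])
weights = np.array([1_200, 800, 950, 400])   # e.g., inverse probability of selection

# The unweighted mean treats every respondent equally.
print("Unweighted mean:", incomes.mean())

# The weighted mean corrects for over- and under-sampling so the estimate
# better reflects the target population rather than the sample design.
print("Weighted mean:", np.average(incomes, weights=weights))
```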
Define your Research Topic and Question
A fellow in general medicine has a strong interest in studying palliative and end-of-life care. Building on his interest in racial and ethnic disparities, he wants to examine disparities in use of health services at the end of life. He is leaning toward conducting a secondary data analysis and is not sure if he should begin with a more focused research question or a search for a dataset.
Investigators new to secondary data research are frequently challenged by the question “which comes first, the question or the dataset?” In general, we advocate that researchers begin by defining their research topic or question. A good question is essential—an uninteresting study with a huge sample size or extensively validated measures is still uninteresting. The answer to a research question should have implications for patient care or public policy. Imagine the possible findings and ask the dreaded question: "so what?" If possible, select a question that will be interesting regardless of the direction of the findings: positive or negative. Also, determine a target audience who would find your work interesting and useful.
It is often useful to start with a thorough literature review of the question or topic of interest. This effort both avoids duplicating others’ work and develops ways to build upon the literature. Once the question is established, identify datasets that are the best fit, in terms of the patient population, sample size, and measures of the variables of interest (including predictors, outcomes, and potential confounders). Once a candidate dataset has been identified, we recommend being flexible and adapting the research question to the strengths and limitations of the dataset, as long as the question remains interesting and specific and the methods to answer it are scientifically sound. Be creative. Some measures of interest may not have been ascertained directly, but data may be available to construct a suitable proxy. In some cases, you may find a dataset that initially looked promising lacks the necessary data (or data quality) to answer research questions in your area of interest reliably. In that case, you should be prepared to search for an alternative dataset.
A specific research question is essential to good research. However, many researchers have a general area of interest but find it difficult to identify specific research questions without knowing the specific data available. In that case, combing research documentation for unexamined yet interesting measures in your area of interest can be fruitful. Beginning with the dataset and no focused area of interest may lead to data dredging—simply creating cross tabulations of unexplored variables in search of significant associations is bad science. Yet, in our experience, many good studies have resulted from a researcher with a general topic area of interest finding a clinically meaningful yet underutilized measure and having the insight to frame a research question that uses that measure to answer a novel and clinically compelling question (see references for examples). 4 – 8 Dr. Warren Browner once exhorted, “just because you were not smart enough to think of a research question in advance doesn’t mean it’s not important!” [quote used with permission].
Select a Dataset
Case continued.
After a review of available datasets that fit his topic area of interest, the fellow decides to use data from the Surveillance Epidemiology and End Results Program linked to Medicare claims (SEER-Medicare).
The range and intricacy of large datasets can be daunting to a junior researcher. Fortunately, several online compendia are available to guide researchers (Table 3), including one recently developed by this manuscript’s authors for the Society of General Internal Medicine (SGIM) (www.sgim.org/go/datasets). The SGIM Research Dataset Compendium was developed and is maintained by members of the SGIM research committee. SGIM Compendium developers consulted with experts to identify and profile high-value datasets for generalist researchers. The Compendium includes a description of and links to over 40 high-value datasets used for health services, clinical epidemiology, and medical education research. The SGIM Compendium provides detailed information of use in selecting a dataset, including sample sizes and characteristics, available measures and how data was measured, comments from expert users, links to the dataset, and example publications (see Box for example). A selection of datasets from this Compendium is listed in Table 4. SGIM members can request a one-time telephone consultation with an expert user of a large dataset (see details on the Compendium website).
Table 3
Online Compendia of Secondary Datasets
| Compendium | Description |
|---|---|
| Society of General Internal Medicine (SGIM) Research Dataset Compendium (www.sgim.org/go/datasets) | Designed to assist investigators conducting research on existing datasets, with a particular emphasis on health services research, clinical epidemiology, and research on medical education. Includes information on strengths and weaknesses of datasets and the insights of experienced users about making best use of the data |
| National Information Center on Health Services Research and Health Care Technology (NICHSR) | This group of sites provides links to a wide variety of data tools and statistics, including research datasets, data repositories, health statistics, survey instruments, and more. It is sponsored by the National Library of Medicine |
| Inter-University Consortium for Political and Social Research (ICPSR) | World’s largest archive of digital social science data, including many datasets with extensive information on health and health care. ICPSR includes many sub-archives on specific topic areas, including minority health, international data, substance abuse, and mental health, and more |
| Partners in Information Access for the Public Health Workforce | Provides links to a variety of national, state, and local health and public health datasets. Also provides links to sites providing a wide variety of health statistics, information on health information technology and standards, and other resources. Sponsored by a collaboration of US government agencies, public health organizations, and health sciences libraries |
| Canadian Research Data Centres | Links to datasets available for analysis through Canada’s Research Data Centres (RDC) program |
| Directory of Health and Human Services Data Resources (US Dept. of Health and Human Services) | This site provides brief information and links to almost all datasets from National Institutes of Health (NIH), Centers for Disease Control and Prevention (CDC), Centers for Medicare and Medicaid Services (CMS), Agency for Healthcare Research and Quality (AHRQ), Food and Drug Administration (FDA), and other agencies of the US Department of Health and Human Services |
| National Center for Health Statistics (NCHS) | This site links to a variety of datasets from the National Center for Health Statistics, several of which are profiled in Table 4. These datasets are available for downloading at no cost |
| Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems | These sites link to a variety of datasets from the Centers for Medicare and Medicaid Services (CMS) |
| Veterans Affairs (VA) data | A series of datasets using administrative and computerized clinical data to describe care provided in the VA health care system, including information on outpatient visits, pharmacy data, inpatient data, cost data, and more. With some exceptions, use is generally restricted to researchers with VA affiliations (this can include a co-investigator with a VA affiliation) |
Table 4
Examples of High Value Datasets
Cost, availability, and complexity | Dataset | Description | Sample publications
---|---|---|---
Free. Readily available. Population-based survey with cross-sectional design. Does not require special statistical techniques to address complex sampling | Surveillance, Epidemiology and End Results Program (SEER) | Population-based multi-regional cancer registry database. SEER data are updated annually. Can be linked to Medicare claims and files (see Medicare below) | Trends in breast-conserving surgery among Asian Americans and Pacific Islanders, 1992–2000; Treatment and outcomes of gastric cancer among US-born and foreign-born Asians and Pacific Islanders
Free. Readily available. Requires statistical considerations to account for complex sampling design and use of survey weights | National Ambulatory Medical Care Survey (NAMCS) & National Hospital Ambulatory Care Survey (NHAMCS) | Nationally-representative serial cross-sectional surveys of outpatient and emergency department visits. Can combine survey years to increase sample sizes (e.g., for uncommon conditions) or evaluate temporal trends. Provides national estimates. The NAMCS and NHAMCS are conducted annually. Do not link to other datasets | Preventive health examinations and preventive gynecological examinations in the US; Primary care physician office visits for depression by older Americans
 | National Health Interview Survey (NHIS) | Nationally-representative serial cross-sectional survey of individuals and families including information on health status, injuries, health insurance, access and utilization information. The NHIS is conducted annually. Can combine survey years to look at rare conditions. Can be linked to National Center for Health Statistics Mortality Data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and National Immunization Provider Records Check Survey (NIPRCS) data from 1997–1999 | Psychological distress in long-term survivors of adult-onset cancer: results from a national survey; Diabetes and Cardiovascular Disease among Asian Indians in the US
 | Behavioral Risk Factor Surveillance System (BRFSS) | Serial cross-sectional nationally-representative survey of health risk behaviors, preventative health practices, and health care access. Provides national and state estimates. Since 2002, the Selected Metropolitan/Micropolitan Area Risk Trends (SMART) project has also used BRFSS data to identify trends in selected metropolitan and micropolitan statistical areas (MMSAs) with 500 or more respondents. BRFSS data are collected monthly. Does not link to other datasets | Perceived discrimination in health care and use of preventive health services; Use of recommended ambulatory care services: is the Veterans Affairs quality gap narrowing?
Free or minimal cost. Readily available. Can do more complex studies by combining data from multiple waves and/or records. Accounting for complex sampling design and use of survey weights can be more complex when using multiple waves—seek support from a statistician. Or can restrict sample to single waves for ease of use | Nationwide Inpatient Sample (NIS) | The largest US database of inpatient hospital stays that incorporates data from all payers, containing data from approximately 20% of US community hospitals. Sampling frame includes approximately 90% of discharges from US hospitals. NIS data is collected annually. For most states, the NIS includes hospital identifiers that permit linkages to the American Hospital Association (AHA) Annual Survey Database and county identifiers that permit linkages to the Area Resource File (ARF) | Factors associated with patients who leave acute-care hospitals against medical advice; Impact of hospital volume on racial disparities in cardiovascular procedure mortality
 | National Health and Nutrition Examination Survey (NHANES) | Nationally-representative series of studies combining data from interviews, physical examination, and laboratory tests. NHANES data are collected annually. Can be linked to National Death Index (NDI) mortality data; Medicare enrollment and claims data; Social Security Benefit History Data; Medical Expenditure Panel Survey (MEPS) data; and Dual Energy X-Ray Absorptiometry (DXA) Multiple Imputation Data Files from 1999–2004 | Demographic differences and trends of vitamin D insufficiency in the US population, 1988–2004; Association of hypertension, diabetes, dyslipidemia, and metabolic syndrome with obesity: findings from the National Health and Nutrition Examination Survey, 1999 to 2004
 | The Health and Retirement Study (HRS) | A nationally-representative longitudinal survey of adults older than 50 designed to assess health status, employment decisions, and economic security during retirement. HRS data is collected every 2 years. Can be linked to Social Security Administration data; Internal Revenue Service data; Medicare claims data (see Medicare below); and Minimum Data Set (MDS) data | Chronic conditions and mortality among the oldest old; Advance directives and surrogate decision making before death
 | Medical Expenditure Panel Survey (MEPS) | Serial nationally-representative panel survey of individuals, families, health care providers, and employers covering a variety of topics. MEPS data are collected annually. Can be linked by request to the Agency for Healthcare Research and Quality to numerous datasets including the NHIS, Medicare data, and Social Security data | Loss of health insurance among non-elderly adults in Medicaid; Influence of patient-provider communication on colorectal cancer screening
Data costs are in the thousands to tens of thousands of dollars. Requires an extensive application, and time to acquire data is on the order of months at a minimum. Databases frequently have observations on the order of 100,000 to >1,000,000. Require additional statistical considerations to account for complex sampling design, use of survey weights, or longitudinal analysis. Multiple records per individual. Complex database structure requires a higher degree of analytic and programming skill to create a study dataset efficiently | Medicare claims data (alone), SEER-Medicare, and HRS-Medicare | Claims data on Medicare beneficiaries including demographics and resource utilization in a wide variety of inpatient and outpatient settings. Medicare claims data are collected continually and made available annually. Can be linked to other Medicare datasets that use the same unique identifier numbers for patients, providers, and institutions, for example, the Medicare Current Beneficiary Survey, the Long-Term Care Minimum Data Set, the American Hospital Association Annual Survey, and others. SEER and the HRS offer linkages to Medicare data as well (as described above) | Long-term outcomes and costs of ventricular assist devices among Medicare beneficiaries; Association between the Medicare Modernization Act of 2003 and patient wait times and travel distance for chemotherapy
 | Medicare Current Beneficiary Survey (MCBS) | Panel survey of a nationally-representative sample of Medicare beneficiaries including health status, health care use, health insurance, socioeconomic and demographic characteristics, and health expenditures. MCBS data are collected annually. Can be linked to other Medicare data | Cost-related medication nonadherence and spending on basic needs following implementation of Medicare Part D; Medicare beneficiaries and free prescription drug samples: a national survey
Dataset complexity, cost, and the time needed to acquire the data and obtain institutional review board (IRB) approval are critical considerations for junior researchers, who are often new to secondary analysis and have few financial resources and limited time to demonstrate productivity. Table 4 illustrates the complexity and cost of a range of high-value datasets used by generalist researchers. Dataset complexity increases with the number of subjects, the file structure (e.g., single versus multiple records per individual), and the complexity of the survey design. Many publicly available datasets are free, while others can cost tens of thousands of dollars to obtain. Time to acquire the data and obtain IRB approval also varies. Some datasets can be downloaded from the web; others require multiple layers of permission and security, and in some cases data must be analyzed in a central data processing center. If the project requires linking new data to an existing database, the linkage will add to the time needed to complete the project and will probably require enhanced data security. One advantage of most secondary studies using publicly available datasets is the rapid time to IRB approval: many such datasets contain de-identified data and are therefore eligible for expedited review or exempt status. If you can download the dataset from the web, it is probably exempt, but your local IRB must make this determination.
Linking datasets can be a powerful method for examining an issue by providing multiple perspectives of patient experience. Many datasets, including SEER, for example, can be linked to the Area Resource File to examine regional variation in practice patterns. However, linking datasets together increases the complexity and cost of data management. A new researcher might consider first conducting a study only on the initial database, and then conducting their next study using the linked database. For some new investigators, this approach can progressively advance programming skills and build confidence while demonstrating productivity.
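In practice, linking usually means merging files on a shared identifier. The Python sketch below shows the general pattern with invented file and column names (for example, a county-level identifier similar to the one used to join the Area Resource File); the actual linkage variables depend on the datasets involved and their data use agreements.

```python
# Generic pattern for linking two datasets on a shared identifier (invented names).
import pandas as pd

patients = pd.read_csv("patient_level_file.csv")   # one row per patient, includes county_fips
county = pd.read_csv("area_level_file.csv")        # one row per county, e.g., physician supply

# Left join keeps every patient record; validate guards against accidental duplication.
linked = patients.merge(county, on="county_fips", how="left", validate="many_to_one")

# Always check how many records failed to link before analyzing.
print("unlinked records:", linked["physicians_per_100k"].isna().sum())
```

Checking the number of unlinked records up front saves considerable debugging later, because a silent partial linkage can bias every downstream estimate.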
Get to Know your Dataset
The fellow’s primary mentor encourages him to closely examine the accuracy of the primary predictor for his study—race and ethnicity—as reported in SEER-Medicare. The fellow has a breakthrough when he finds an entire issue of the journal Medical Care dedicated to SEER-Medicare, including a whole chapter on the accuracy of coding of sociodemographic factors. 9
In an analysis of primary data you select the patients to be studied and choose the study measures. This process gives you a close familiarity with the study subjects and with how and what data were collected, which is invaluable in assessing the validity of the measures, the potential for bias in measuring associations between predictors and outcome variables (internal validity), and the generalizability of the findings to target populations (external validity). The importance of this familiarity with the strengths and weaknesses of the dataset cannot be overemphasized. Secondary data research requires considerable effort to reach the same level of familiarity with the data, so knowing your data in detail is critical. Practically, this means scouring online documentation and technical survey manuals, searching PubMed for validation studies, and closely reading previous studies that used your dataset, to answer questions such as: Who collected the data, and for what purpose? How did subjects get into the dataset? How were they followed? Do the measures capture what you think they capture?
We strongly recommend taking advantage of help offered by the dataset managers, typically described on the dataset’s website. For example, the Research Data Assistance Center (ResDAC) is a dedicated resource for researchers using data from the Centers for Medicare and Medicaid Services (CMS).
Assessing the validity of your measures is one of the central challenges of large dataset research. For large survey datasets, a good first step in assessing the validity of your measures is to read the questions as they were asked in the survey. Some questions simply have face validity. Others, unfortunately, were collected in a way that makes the measure meaningless, problematic, or open to a range of interpretations. These ambiguities can occur in how the question was asked or in how the data were recorded into response categories.
Another essential step is to search the online documentation and published literature for previous validation studies. A PubMed search using the dataset name or measure name/type and the publication type “validation studies” is a good starting point. The key question for a validity study relates to how and why the question was asked and data were collected (e.g., self-report, chart abstraction, physical measurements, billing claims) in relationship to a gold standard. For example, if you are using claims data you should recognize that the primary purpose of those data was not research, but reimbursement. Consequently, claims data are limited by the scope of services that are reimbursable and the accuracy of coding by clinicians completing encounter forms for billing or by coders in the claims departments of hospitals and clinics. Some clinical measures can be assessed by asking subjects if they have the condition of interest, such as a self-reported diagnosis of hypertension. Self-reported data may be adequate for some research questions (e.g., does a diagnosis of hypertension lead people to exercise more?), but inadequate for others (e.g., the prevalence of hypertension among people with diabetes). Even measured data, such as blood pressure, have limitations in that methods of measurement for a study may differ from methods used to diagnose a disorder in the clinician’s office. In the National Health and Nutrition Examination Survey, for example, a subject’s blood pressure is based on the average of several measurements taken during a single visit. This differs from the standard clinical practice of measuring blood pressure at separate office visits before diagnosing hypertension. Rarely do available measures capture exactly what you are trying to study. In our experience, measures in existing datasets are often good enough to answer the research question, with proper interpretation to account for what the measures actually assess and how they differ from the underlying constructs.
Finally, we suggest paying close attention to the completeness of measures, and evaluating whether missing data are random or non-random (the latter might result in bias, whereas the former is generally acceptable). Statistical approaches to missing data are beyond the scope of this paper, and most statisticians can help you address this problem appropriately. However, pay close attention to “skip patterns”; some data are missing simply because the survey item is only asked of a subset for which it applies. For example, in the Health and Retirement Study the question about need for assistance with toileting is only asked of subjects who respond that they have difficulty using the toilet. If you were unaware of this skip pattern and attempted to study assistance with toileting, you would be distressed to find over three-quarters of respondents had missing responses for this question (because they reported no difficulty using the toilet).
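The skip-pattern check described above can be automated in a few lines. The sketch below uses Python with pandas on a small synthetic example; the variable names (difficulty_toileting, needs_toileting_help) are hypothetical stand-ins for the actual survey items, not real field names.

```python
# A minimal sketch of checking whether "missing" responses are explained by a
# skip pattern, using hypothetical variable names on synthetic data.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    # 1 = reports difficulty using the toilet, 0 = no difficulty
    "difficulty_toileting": [0, 0, 0, 1, 1, 0, 1, 0],
    # Asked only of respondents who report difficulty; otherwise missing
    "needs_toileting_help": [np.nan, np.nan, np.nan, 1, 0, np.nan, 1, np.nan],
})

# Overall missingness looks alarming...
print(df["needs_toileting_help"].isna().mean())  # 0.625

# ...but cross-tabulating missingness against the gating question shows that the
# "missing" values are simply respondents who were never asked the item.
print(pd.crosstab(df["difficulty_toileting"],
                  df["needs_toileting_help"].isna(),
                  margins=True))
```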
Fellows and other trainees usually do their own computer programming. Although this may be daunting, we encourage this practice so fellows can get a close feel for the data and become more skilled in statistical analysis. Datasets, however, range in complexity (Table 4 ). In our experience, fellows who have completed introductory training in SAS, STATA, SPSS, or other similar statistical software have been highly successful analyzing datasets of moderate complexity without the on-going assistance of a statistical programmer. However, if you do have a programmer who will do much of the coding, be closely involved and review all data cleaning and statistical output as if you had programmed it yourself. Close attention can reveal all sorts of patterns, problems, and opportunities with the data that are obscured by focusing only on the final outputs prepared by a statistical programmer. Programmers and statisticians are not clinicians; they will often not recognize when the values of variables or patterns of missingness don’t make sense. If estimates seem implausible or do not match previously published estimates, then the analytic plan, statistical code, and measures should be carefully rechecked.
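Simple range and frequency checks catch many of the problems described above. The sketch below is a generic illustration in Python with pandas; the file name, column names, and plausibility limits are made up for the example and would need to be replaced with values appropriate to your dataset.

```python
# Generic sanity checks to run on an analytic file before modeling (hypothetical names).
import pandas as pd

df = pd.read_csv("analytic_file.csv")  # hypothetical study dataset

# 1. Frequencies of categorical variables: look for unexpected or undocumented codes.
print(df["race_ethnicity"].value_counts(dropna=False))

# 2. Ranges of continuous variables: flag clinically implausible values.
implausible_age = df[(df["age"] < 0) | (df["age"] > 110)]
print(f"{len(implausible_age)} rows with implausible ages")

# 3. Missingness by variable: large gaps deserve an explanation (e.g., skip patterns).
print(df.isna().mean().sort_values(ascending=False).head(10))
```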
Keep in mind that “the perfect may be the enemy of the good.” No one expects perfect measures (this is also true for primary data collection). The closer you are to the data, the more you see the warts—don’t be discouraged by this. The measures need to pass the sniff test; in other words, they should have clinical validity based primarily on the judgement that they make sense clinically or scientifically, supported where possible by validation procedures, auditing procedures, or other studies that have independently validated the measures of interest.
Structure your Analysis and Presentation of Findings in a Way that Is Clinically Meaningful
Case continued.
The fellow finds that Blacks are less likely to receive chemotherapy in the last 2 weeks of life (Blacks 4%, Whites 6%, p < 0.001). He debates the meaning of this statistically significant 2% absolute difference.
Often, the main challenge for investigators who are new to secondary data analysis is carefully structuring the analysis and presentation of findings in a way that tells a meaningful story. Based on what you’ve found, what is the story that you want your target audience to understand? When appropriate, it can be useful to conduct carefully planned sensitivity analyses to evaluate the robustness of your primary findings. A sensitivity analysis assesses the effect of variation in assumptions on the outcome of interest. For example, if 10% of subjects did not answer a “yes” or “no” question, you could conduct sensitivity analyses to estimate the effects of excluding missing responses, or categorizing them as all “yes” or all “no.” Because large datasets may contain multiple measures of interest, covariates, and outcomes, a frequent temptation is to present huge tables with multiple rows and columns. This is a mistake. These tables can be challenging to sort through, and the clinical importance of the story resulting from the analysis can be lost. In our experience, a thoughtful figure often captures the take-home message in a way that is more interpretable and memorable to readers than rows of data tables.
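As a concrete illustration of the missing-response example above, here is a minimal Python sketch (with made-up counts) showing how the estimated proportion of “yes” responses shifts under three assumptions about the 10% of subjects who did not answer.

```python
# Sensitivity analysis for item non-response, using made-up counts:
# 900 respondents answered, 100 did not.
n_yes, n_no, n_missing = 540, 360, 100
n_total = n_yes + n_no + n_missing

scenarios = {
    "exclude missing (complete case)": n_yes / (n_yes + n_no),
    "treat all missing as 'yes'":      (n_yes + n_missing) / n_total,
    "treat all missing as 'no'":       n_yes / n_total,
}

for label, proportion in scenarios.items():
    print(f"{label}: {proportion:.1%}")
```

If the substantive conclusion holds across all three scenarios, the missing responses are unlikely to change the story; if it does not, that limitation belongs in the paper.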
You should keep careful track of subjects you decide to exclude from the analysis and why. Editors, reviewers, and readers will want to know this information. The best way to keep track is to construct a flow diagram from the original denominator to the final sample.
Don’t confuse statistical significance with clinical importance in large datasets. Due to large sample sizes, associations may be statistically significant but not clinically meaningful. Be mindful of what is meaningful from a clinical or policy perspective. One concern that frequently arises at this stage in large database research is the acceptability of “exploratory” analyses, or the practice of examining associations between multiple factors of interest. On the one hand, exploratory analyses risk finding a significant association by chance alone from testing multiple associations (a false-positive result). On the other hand, the critical consideration is not a statistical one, but rather whether the question being explored is important. 10 Exploratory analyses are acceptable if done in a thoughtful way that serves an a priori hypothesis, but not if they are merely data dredging in search of associations.
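The 4% versus 6% example in the case shows how very large samples make small absolute differences highly “significant.” The sketch below, using invented counts at roughly claims-database scale, shows a chi-square test returning a p-value far below 0.001 for a 2-percentage-point difference; whether that difference matters clinically is a separate judgement.

```python
# Statistical vs. clinical significance with a large sample (invented counts).
from scipy.stats import chi2_contingency

# Rows: chemotherapy in the last 2 weeks of life (yes, no)
# Columns: group A (10,000 patients), group B (90,000 patients)
table = [[400, 5400],      # 4% of 10,000 vs 6% of 90,000
         [9600, 84600]]

chi2, p_value, dof, expected = chi2_contingency(table)
risk_a = 400 / 10_000
risk_b = 5400 / 90_000
print(f"absolute difference: {risk_b - risk_a:.1%}")  # 2.0 percentage points
print(f"p-value: {p_value:.2e}")                      # far below 0.001
```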
We recommend consulting with a statistician when using data from a complex survey design (see Table 2) or developing a conceptually advanced study design, for example, using longitudinal data, multilevel modeling with clustered data, or survival analysis. The value of input (even if informal) from a statistician or other advisor with substantial methodological expertise cannot be overstated.
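To illustrate why survey weights matter, here is a toy Python example (synthetic numbers, not drawn from any real survey) comparing an unweighted mean with a survey-weighted mean. Proper variance estimation for complex designs additionally requires design information such as strata and clusters, which is one reason specialized survey procedures and statistical advice are recommended.

```python
# Unweighted vs. survey-weighted estimate on synthetic data.
import numpy as np

# Hypothetical respondents: systolic blood pressure and survey weights
# (each weight is the number of people in the population the respondent represents).
sbp = np.array([118, 122, 150, 145, 130, 160])
weights = np.array([2000, 2500, 500, 400, 3000, 300])

print("unweighted mean:", sbp.mean().round(1))                      # 137.5
print("weighted mean:  ", np.average(sbp, weights=weights).round(1))  # about 127.8
# The weighted mean is lower here because the high-blood-pressure respondents
# carry small weights, i.e. they were oversampled relative to the population.
```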
CONCLUSIONS
Case conclusion.
Two years after he began the project the fellow completes the analysis and publishes the paper in a peer-reviewed journal. 11
A 2-year timeline from inception to publication is typical for large database research. Academic potential is commonly assessed by the ability to see a study through to publication in a peer-reviewed journal. This timeline allows a fellow who began a secondary analysis at the start of a 2-year training program to search for a job with an article under review or in press.
In conclusion, secondary dataset research has tremendous advantages, including the ability to assess outcomes that would be difficult or impossible to study using primary data collection, such as those involving exceptionally long follow-up times or rare outcomes. For junior investigators, the potential for a shorter time to publication may help secure a job or career development funding. Some of the time “saved” by not collecting data yourself, however, needs to be “spent” becoming familiar with the dataset in intimate detail. Ultimately, the same factors that apply to successful primary data analysis apply to secondary data analysis, including the development of a clear research question, study sample, appropriate measures, and a thoughtful analytic approach.
Contributors
The authors would like to thank Sei Lee, MD, Mark Freidberg, MD, MPP, and J. Michael McWilliams, MD, PhD, for their input on portions of this manuscript.
Grant Support
Dr. Smith is supported by a Research Supplement to Promote Diversity in Health Related Research from the National Institute on Aging (R01AG028481), the National Center for Research Resources UCSF-CTSI (UL1 RR024131), and the National Palliative Care Research Center. Dr. Steinman is supported by the National Institute on Aging and the American Federation for Aging Research (K23 AG030999). An unrestricted grant from the Society of General Internal Medicine (SGIM) supported development of the SGIM Research Dataset Compendium.
Prior Presentations
An earlier version of this work was presented as a workshop at the Annual Meeting of the Society of General Internal Medicine in Minneapolis, MN, April 2010.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Conflict of Interest
None disclosed.
What Is a Research Design | Types, Guide & Examples
Published on June 7, 2021 by Shona McCombes. Revised on September 5, 2024 by Pritha Bhandari.
A research design is a strategy for answering your research question using empirical data. Creating a research design means making decisions about:
- Your overall research objectives and approach
- Whether you’ll rely on primary research or secondary research
- Your sampling methods or criteria for selecting subjects
- Your data collection methods
- The procedures you’ll follow to collect data
- Your data analysis methods
A well-planned research design helps ensure that your methods match your research objectives and that you use the right kind of analysis for your data.
You might have to write up a research design as a standalone assignment, or it might be part of a larger research proposal or other project. In either case, you should carefully consider which methods are most appropriate and feasible for answering your question.
Table of contents
- Step 1: Consider your aims and approach
- Step 2: Choose a type of research design
- Step 3: Identify your population and sampling method
- Step 4: Choose your data collection methods
- Step 5: Plan your data collection procedures
- Step 6: Decide on your data analysis strategies
- Other interesting articles
- Frequently asked questions about research design
Before you can start designing your research, you should already have a clear idea of the research question you want to investigate.
There are many different ways you could go about answering this question. Your research design choices should be driven by your aims and priorities—start by thinking carefully about what you want to achieve.
The first choice you need to make is whether you’ll take a qualitative or quantitative approach.
Qualitative approach | Quantitative approach
---|---
Used to explore ideas and experiences in depth, with data expressed in words | Used to measure variables and describe frequencies, averages, and correlations about relationships between variables, with data expressed in numbers
Qualitative research designs tend to be more flexible and inductive , allowing you to adjust your approach based on what you find throughout the research process.
Quantitative research designs tend to be more fixed and deductive , with variables and hypotheses clearly defined in advance of data collection.
It’s also possible to use a mixed-methods design that integrates aspects of both approaches. By combining qualitative and quantitative insights, you can gain a more complete picture of the problem you’re studying and strengthen the credibility of your conclusions.
Practical and ethical considerations when designing research
As well as scientific considerations, you need to think practically when designing your research. If your research involves people or animals, you also need to consider research ethics .
- How much time do you have to collect data and write up the research?
- Will you be able to gain access to the data you need (e.g., by travelling to a specific location or contacting specific people)?
- Do you have the necessary research skills (e.g., statistical analysis or interview techniques)?
- Will you need ethical approval ?
At each stage of the research design process, make sure that your choices are practically feasible.
Within both qualitative and quantitative approaches, there are several types of research design to choose from. Each type provides a framework for the overall shape of your research.
Types of quantitative research designs
Quantitative designs can be split into four main types.
- Experimental and quasi-experimental designs allow you to test cause-and-effect relationships
- Descriptive and correlational designs allow you to measure variables and describe relationships between them.
Type of design | Purpose and characteristics
---|---
Experimental | Tests cause-and-effect relationships by manipulating an independent variable and measuring its effect on a dependent variable, with participants randomly assigned to groups
Quasi-experimental | Tests cause-and-effect relationships, but without random assignment to groups (e.g., using pre-existing groups or naturally occurring conditions)
Correlational | Measures the strength and direction of associations between variables without manipulating them
Descriptive | Describes the characteristics, frequencies, and trends of a population or phenomenon without testing relationships
With descriptive and correlational designs, you can get a clear picture of characteristics, trends and relationships as they exist in the real world. However, you can’t draw conclusions about cause and effect (because correlation doesn’t imply causation ).
Experiments are the strongest way to test cause-and-effect relationships without the risk of other variables influencing the results. However, their controlled conditions may not always reflect how things work in the real world. They’re often also more difficult and expensive to implement.
Types of qualitative research designs
Qualitative designs are less strictly defined. This approach is about gaining a rich, detailed understanding of a specific context or phenomenon, and you can often be more creative and flexible in designing your research.
The table below shows some common types of qualitative design. They often have similar approaches in terms of data collection, but focus on different aspects when analyzing the data.
Type of design | Purpose and characteristics
---|---
Grounded theory | Aims to develop a theory inductively, by systematically collecting and analyzing qualitative data and building concepts from the ground up
Phenomenology | Aims to understand and describe the lived experience of a phenomenon from the perspective of the people who have experienced it
Your research design should clearly define who or what your research will focus on, and how you’ll go about choosing your participants or subjects.
In research, a population is the entire group that you want to draw conclusions about, while a sample is the smaller group of individuals you’ll actually collect data from.
Defining the population
A population can be made up of anything you want to study—plants, animals, organizations, texts, countries, etc. In the social sciences, it most often refers to a group of people.
For example, will you focus on people from a specific demographic, region or background? Are you interested in people with a certain job or medical condition, or users of a particular product?
The more precisely you define your population, the easier it will be to gather a representative sample.
Sampling methods
Even with a narrowly defined population, it’s rarely possible to collect data from every individual. Instead, you’ll collect data from a sample.
To select a sample, there are two main approaches: probability sampling and non-probability sampling . The sampling method you use affects how confidently you can generalize your results to the population as a whole.
Probability sampling | Non-probability sampling
---|---
Every member of the population has a known, non-zero chance of being selected (e.g., simple random, stratified, or cluster sampling), which supports statistical generalization | Selection is based on non-random criteria such as convenience, availability, or researcher judgment, which limits how confidently results can be generalized
Probability sampling is the most statistically valid option, but it’s often difficult to achieve unless you’re dealing with a very small and accessible population.
For practical reasons, many studies use non-probability sampling, but it’s important to be aware of the limitations and carefully consider potential biases. You should always make an effort to gather a sample that’s as representative as possible of the population.
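As a small illustration of probability sampling, the sketch below draws a simple random sample and a stratified random sample from a synthetic sampling frame using Python and pandas; the column names are invented for the example.

```python
# Simple random sampling vs. stratified sampling from a synthetic sampling frame.
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
frame = pd.DataFrame({
    "person_id": range(10_000),
    "region": rng.choice(["north", "south", "east", "west"], size=10_000),
})

# Option 1: simple random sample of 500 people from the whole frame.
srs = frame.sample(n=500, random_state=42)

# Option 2: stratified sample, drawing 5% from each region,
# which guarantees every region is represented proportionally.
stratified = frame.groupby("region", group_keys=False).sample(frac=0.05, random_state=42)

print(srs["region"].value_counts())
print(stratified["region"].value_counts())
```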
Case selection in qualitative research
In some types of qualitative designs, sampling may not be relevant.
For example, in an ethnography or a case study , your aim is to deeply understand a specific context, not to generalize to a population. Instead of sampling, you may simply aim to collect as much data as possible about the context you are studying.
In these types of design, you still have to carefully consider your choice of case or community. You should have a clear rationale for why this particular case is suitable for answering your research question .
For example, you might choose a case study that reveals an unusual or neglected aspect of your research problem, or you might choose several very similar or very different cases in order to compare them.
Data collection methods are ways of directly measuring variables and gathering information. They allow you to gain first-hand knowledge and original insights into your research problem.
You can choose just one data collection method, or use several methods in the same study.
Survey methods
Surveys allow you to collect data about opinions, behaviors, experiences, and characteristics by asking people directly. There are two main survey methods to choose from: questionnaires and interviews .
Questionnaires | Interviews
---|---
Respondents answer a fixed set of written questions (on paper or online), making it easy to collect data from many people quickly | A researcher asks questions verbally (in person, by phone, or by video), allowing follow-up questions and more in-depth responses
Observation methods
Observational studies allow you to collect data unobtrusively, observing characteristics, behaviors or social interactions without relying on self-reporting.
Observations may be conducted in real time, taking notes as you observe, or you might make audiovisual recordings for later analysis. They can be qualitative or quantitative.
Quantitative observation | Qualitative observation
---|---
Systematically counts or measures predefined behaviors or events, producing numerical data | Records detailed, open-ended notes on behaviors, interactions, and context, producing descriptive data
Other methods of data collection
There are many other ways you might collect data depending on your field and topic.
Field | Examples of data collection methods |
---|---|
Media & communication | Collecting a sample of texts (e.g., speeches, articles, or social media posts) for data on cultural norms and narratives |
Psychology | Using technologies like neuroimaging, eye-tracking, or computer-based tasks to collect data on things like attention, emotional response, or reaction time |
Education | Using tests or assignments to collect data on knowledge and skills |
Physical sciences | Using scientific instruments to collect data on things like weight, blood pressure, or chemical composition |
If you’re not sure which methods will work best for your research design, try reading some papers in your field to see what kinds of data collection methods they used.
Secondary data
If you don’t have the time or resources to collect data from the population you’re interested in, you can also choose to use secondary data that other researchers already collected—for example, datasets from government surveys or previous studies on your topic.
With this raw data, you can do your own analysis to answer new research questions that weren’t addressed by the original study.
Using secondary data can expand the scope of your research, as you may be able to access much larger and more varied samples than you could collect yourself.
However, it also means you don’t have any control over which variables to measure or how to measure them, so the conclusions you can draw may be limited.
As well as deciding on your methods, you need to plan exactly how you’ll use these methods to collect data that’s consistent, accurate, and unbiased.
Planning systematic procedures is especially important in quantitative research, where you need to precisely define your variables and ensure your measurements are high in reliability and validity.
Operationalization
Some variables, like height or age, are easily measured. But often you’ll be dealing with more abstract concepts, like satisfaction, anxiety, or competence. Operationalization means turning these fuzzy ideas into measurable indicators.
If you’re using observations , which events or actions will you count?
If you’re using surveys , which questions will you ask and what range of responses will be offered?
You may also choose to use or adapt existing materials designed to measure the concept you’re interested in—for example, questionnaires or inventories whose reliability and validity has already been established.
Reliability and validity
Reliability means your results can be consistently reproduced, while validity means that you’re actually measuring the concept you’re interested in.
Reliability | Validity
---|---
The extent to which your measurements are consistent (e.g., across time, items, or observers) | The extent to which your measurements actually capture the concept you intend to measure
For valid and reliable results, your measurement materials should be thoroughly researched and carefully designed. Plan your procedures to make sure you carry out the same steps in the same way for each participant.
If you’re developing a new questionnaire or other instrument to measure a specific concept, running a pilot study allows you to check its validity and reliability in advance.
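One common reliability check for a multi-item questionnaire is internal consistency, often summarized with Cronbach's alpha. The sketch below computes it from scratch with NumPy on made-up pilot data; it illustrates the standard formula rather than any particular software package.

```python
# Cronbach's alpha for a pilot questionnaire (synthetic responses).
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
import numpy as np

# Rows = respondents, columns = items (e.g., five Likert-scale questions)
items = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [4, 4, 4, 5, 4],
    [1, 2, 1, 2, 1],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")  # values above roughly 0.7 are often considered acceptable
```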
Sampling procedures
As well as choosing an appropriate sampling method , you need a concrete plan for how you’ll actually contact and recruit your selected sample.
That means making decisions about things like:
- How many participants do you need for an adequate sample size?
- What inclusion and exclusion criteria will you use to identify eligible participants?
- How will you contact your sample—by mail, online, by phone, or in person?
If you’re using a probability sampling method , it’s important that everyone who is randomly selected actually participates in the study. How will you ensure a high response rate?
If you’re using a non-probability method , how will you avoid research bias and ensure a representative sample?
Data management
It’s also important to create a data management plan for organizing and storing your data.
Will you need to transcribe interviews or perform data entry for observations? You should anonymize and safeguard any sensitive data, and make sure it’s backed up regularly.
Keeping your data well-organized will save time when it comes to analyzing it. It can also help other researchers validate and add to your findings (high replicability ).
On its own, raw data can’t answer your research question. The last step of designing your research is planning how you’ll analyze the data.
Quantitative data analysis
In quantitative research, you’ll most likely use some form of statistical analysis . With statistics, you can summarize your sample data, make estimates, and test hypotheses.
Using descriptive statistics , you can summarize your sample data in terms of:
- The distribution of the data (e.g., the frequency of each score on a test)
- The central tendency of the data (e.g., the mean to describe the average score)
- The variability of the data (e.g., the standard deviation to describe how spread out the scores are)
The specific calculations you can do depend on the level of measurement of your variables.
Using inferential statistics , you can:
- Make estimates about the population based on your sample data.
- Test hypotheses about a relationship between variables.
Regression and correlation tests look for associations between two or more variables, while comparison tests (such as t tests and ANOVAs ) look for differences in the outcomes of different groups.
Your choice of statistical test depends on various aspects of your research design, including the types of variables you’re dealing with and the distribution of your data.
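The sketch below ties these ideas together in Python: descriptive statistics summarize a synthetic sample, and two common inferential tests (a t test for a group difference and a correlation test for an association) are run with SciPy. The data are generated purely for illustration.

```python
# Descriptive and inferential statistics on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=72, scale=10, size=200)   # e.g., test scores, group A
group_b = rng.normal(loc=75, scale=10, size=200)   # e.g., test scores, group B
hours_studied = rng.normal(loc=5, scale=2, size=200)
score = 60 + 2.5 * hours_studied + rng.normal(scale=8, size=200)

# Descriptive statistics: central tendency and variability
print("mean:", group_a.mean().round(2), "std:", group_a.std(ddof=1).round(2))

# Comparison test: do the two groups differ on average?
t_stat, p_val = stats.ttest_ind(group_a, group_b)
print("t test p-value:", round(p_val, 4))

# Correlation test: is studying associated with higher scores?
r, p_corr = stats.pearsonr(hours_studied, score)
print("Pearson r:", round(r, 2), "p-value:", round(p_corr, 4))
```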
Qualitative data analysis
In qualitative research, your data will usually be very dense with information and ideas. Instead of summing it up in numbers, you’ll need to comb through the data in detail, interpret its meanings, identify patterns, and extract the parts that are most relevant to your research question.
Two of the most common approaches to doing this are thematic analysis and discourse analysis .
Approach | Characteristics |
---|---|
Thematic analysis | |
Discourse analysis |
There are many other ways of analyzing qualitative data depending on the aims of your research. To get a sense of potential approaches, try reading some qualitative research papers in your field.
If you want to know more about the research process , methodology , research bias , or statistics , make sure to check out some of our other articles with explanations and examples.
- Simple random sampling
- Stratified sampling
- Cluster sampling
- Likert scales
- Reproducibility
Statistics
- Null hypothesis
- Statistical power
- Probability distribution
- Effect size
- Poisson distribution
Research bias
- Optimism bias
- Cognitive bias
- Implicit bias
- Hawthorne effect
- Anchoring bias
- Explicit bias
A research design is a strategy for answering your research question . It defines your overall approach and determines how you will collect and analyze data.
A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis to answer your questions, utilizing credible sources . This allows you to draw valid , trustworthy conclusions.
Quantitative research designs can be divided into two main categories:
- Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
- Experimental and quasi-experimental designs are used to test causal relationships .
Qualitative research designs tend to be more flexible. Common types of qualitative design include case study , ethnography , and grounded theory designs.
The priorities of a research design can vary depending on the field, but you usually have to specify:
- Your research questions and/or hypotheses
- Your overall approach (e.g., qualitative or quantitative )
- The type of design you’re using (e.g., a survey , experiment , or case study )
- Your data collection methods (e.g., questionnaires , observations)
- Your data collection procedures (e.g., operationalization , timing and data management)
- Your data analysis methods (e.g., statistical tests or thematic analysis )
A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
Operationalization means turning abstract conceptual ideas into measurable observations.
For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.
Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.
A research project is an academic, scientific, or professional undertaking to answer a research question . Research projects can take many forms, such as qualitative or quantitative , descriptive , longitudinal , experimental , or correlational . What kind of research approach you choose will depend on your topic.
Cite this Scribbr article
McCombes, S. (2024, September 05). What Is a Research Design | Types, Guide & Examples. Scribbr. Retrieved September 10, 2024, from https://www.scribbr.com/methodology/research-design/
Secondary Research: Definition, Methods, Sources, Examples, and More
Table of Contents
What is Secondary Research? Secondary Research Meaning
Secondary research involves the analysis and synthesis of existing data and information that has been previously collected and published by others. This method contrasts with primary research , which entails the direct collection of original data from sources like surveys, interviews, and ethnographic studies.
The essence of secondary research lies in its efficiency and accessibility. Researchers who leverage secondary sources, including books, scholarly articles, government reports, and market analyses, gather valuable insights without the need for time-consuming and costly data collection efforts. This approach is particularly vital in marketing research, where understanding broad market trends and consumer behaviors is essential, yet often constrained by budgets and timelines. Secondary research serves as a fundamental step in the research process, providing a solid foundation upon which additional, targeted research can be built.
Secondary research enables researchers to quickly grasp the landscape of existing knowledge, identify gaps in the literature, and refine their research questions or business strategies accordingly. In marketing research, for instance, secondary research aids in understanding competitive landscapes, identifying market trends, and benchmarking against industry standards, thereby guiding strategic decision-making.
When to Use Secondary Research
Choosing between secondary and primary research methods depends significantly on the objectives of your study or project. Secondary research is particularly beneficial in the initial stages of research planning and strategy, offering a broad understanding of the topic at hand and helping to pinpoint areas that may require more in-depth investigation through primary methods.
In academic contexts, secondary research is often used to build a theoretical foundation for a study, allowing researchers to position their work within the existing body of knowledge. Professionally, it serves as a cost-effective way to inform business strategies, market analyses, and policy development, providing insights into industry trends, consumer behaviors, and competitive landscapes.
Combining secondary research with primary research methods enhances the comprehensiveness and validity of research findings. For example, secondary research might reveal general trends in consumer behavior, while subsequent primary research could delve into specific consumer motivations and preferences, offering a more nuanced understanding of the market.
Key considerations for integrating secondary research into your research planning and strategy include:
- Research Objectives : Clearly defining what you aim to discover or decide based on your research.
- Availability of Data : Assessing the extent and relevance of existing data related to your research question.
- Budget and Time Constraints : Considering the resources available for conducting research, including time, money, and personnel.
- Research Scope : Determining the breadth and depth of the information needed to meet your research objectives.
Secondary research is a powerful tool when used strategically, providing a cost-effective, efficient way to gather insights and inform decision-making processes across academic and professional contexts.
How to Conduct Secondary Research
Conducting secondary research is a systematic process that involves several key steps to ensure the relevance, accuracy, and utility of the information gathered. Here's a step-by-step guide to effective secondary research:
- Identifying Research Objectives, Topics, and Questions : Begin with a clear understanding of what you aim to achieve with your research. This includes defining your research objectives, topics, and specific questions you seek to answer. This clarity guides the entire research process, ensuring that you remain focused on relevant information.
- Finding Relevant Data Sources : Search for secondary data sources that are likely to contain the information you need. This involves exploring a variety of sources such as academic journals, industry reports, government databases, and news archives. Prioritize sources known for their credibility and authority in the subject matter.
- Collecting and Verifying Existing Data : Once you've identified potential sources, collect the data that pertains to your research questions. Pay close attention to the publication date, authorship, and the methodology used in collecting the original data to ensure its relevance and reliability.
- Data Compilation and Analysis : Compile the collected data in a structured format that allows for analysis. Employ analytical methods suited to your research objectives, such as trend analysis, comparative analysis, or thematic analysis, to draw insights from the data.
The success of secondary research hinges on the critical evaluation of sources for their credibility, relevance, and timeliness. It's essential to approach this process with a discerning eye, acknowledging the limitations of secondary data and the potential need for further investigation through primary research.
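To make the data compilation and analysis step concrete, here is a minimal Python sketch of a secondary analysis workflow: loading an existing dataset, restricting it to the variables of interest, and running a simple trend analysis. The file name and column names are placeholders for whatever published dataset you have obtained.

```python
# Minimal secondary-data workflow: compile, clean, and analyze an existing dataset.
import pandas as pd

# Hypothetical file downloaded from a government or industry data portal.
data = pd.read_csv("existing_dataset.csv")

# Keep only the variables relevant to the research question and drop obvious duplicates.
subset = data[["year", "region", "households_online_pct"]].drop_duplicates()

# Trend analysis: average value per year across regions.
trend = subset.groupby("year")["households_online_pct"].mean()
print(trend)

# Year-over-year change helps translate the trend into a finding you can report.
print(trend.diff())
```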
Types of Secondary Research Methods with Examples
Secondary research methods offer a range of approaches for leveraging existing data, each providing value in extracting insights relevant to various business and academic needs. Understanding the unique advantages of each method can guide researchers in choosing the most appropriate approach for their specific objectives.
Literature Reviews
Literature reviews synthesize existing research and publications to identify trends, gaps, and consensus within a field of study. This method provides a comprehensive overview of what is already known about a topic, saving time and resources by building on existing knowledge rather than starting from scratch.
Real-World Example : A marketing firm conducting a literature review on consumer behavior in the digital age might uncover a trend towards increased mobile shopping. This insight leads to a strategic recommendation for a retail client to prioritize mobile app development and optimize their online store for mobile users, directly impacting the client's digital marketing strategy.
Data Mining
Data mining involves analyzing large sets of data to discover patterns, correlations, or trends that are not immediately apparent. This method can uncover hidden insights from the data that businesses can use to inform decision-making, such as identifying new market opportunities or optimizing operational efficiencies.
Real-World Example : Through data mining of customer purchase histories and online behavior data, a retail company identifies a previously unnoticed correlation between the purchase of certain products and the time of year. Utilizing this insight, the company adjusts its inventory levels and marketing campaigns seasonally, significantly boosting sales and customer satisfaction.
Meta-Analysis
Meta-analysis aggregates and systematically analyzes results from multiple studies to draw general conclusions about a research question. This method provides a high level of evidence by combining findings, offering a powerful tool for making informed decisions based on a broader range of data than any single study could provide.
Real-World Example : A pharmaceutical company uses meta-analysis to combine findings from various clinical trials of a new drug. The meta-analysis reveals a statistically significant benefit of the drug that was not conclusive in individual studies. This insight supports the company's application for regulatory approval and guides the development of marketing strategies targeting specific patient demographics.
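Under the hood, a basic fixed-effect meta-analysis is an inverse-variance weighted average of the individual study estimates. The sketch below illustrates the calculation in Python with invented log risk ratios and standard errors from three hypothetical trials; real meta-analyses also assess heterogeneity and often use random-effects models.

```python
# Fixed-effect (inverse-variance) meta-analysis on invented study results.
import numpy as np

# Log risk ratios and their standard errors from three hypothetical trials.
log_rr = np.array([-0.10, -0.25, -0.18])
se = np.array([0.12, 0.15, 0.09])

weights = 1 / se**2                               # inverse-variance weights
pooled = np.sum(weights * log_rr) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

print("pooled risk ratio:", round(np.exp(pooled), 2))
print("95% CI:", round(np.exp(ci_low), 2), "to", round(np.exp(ci_high), 2))
```

Pooling narrows the confidence interval relative to any single trial, which is how a combined analysis can reach a conclusion the individual studies could not.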
Data Analysis
Secondary data analysis applies statistical techniques to analyze existing datasets, offering a cost-effective way to gain insights without the need for new data collection. This method can identify trends, patterns, and relationships that inform strategic planning and decision-making.
Real-World Example : An investment firm analyzes historical economic data and stock market trends using secondary data analysis. They identify a recurring pattern preceding market downturns. By applying this insight to their investment strategy, the firm successfully mitigates risk and enhances portfolio performance for their clients.
Content Analysis
Content analysis systematically examines the content of communication mediums to understand messages, themes, or biases. This qualitative method can reveal insights into public opinion, media representation, and communication strategies, offering valuable information for marketing, public relations, and media strategies.
Real-World Example : A technology company employs content analysis to review online customer reviews and social media mentions of its products. The analysis uncovers a common concern among customers about the usability of a product feature. Responding to this insight, the company revises its product design and launches a targeted communication campaign to address the concerns, improving customer satisfaction and brand perception.
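A very simple form of the content analysis described above can be scripted by counting how often predefined themes appear in a body of text. The Python sketch below uses a handful of invented review snippets and keyword lists; real content analysis usually involves a more careful coding scheme and human validation.

```python
# Keyword-based content analysis of customer reviews (invented examples).
from collections import Counter

reviews = [
    "The app is great but the export feature is confusing to use",
    "Love the design, though setup was hard to figure out",
    "Battery life is excellent, no complaints",
    "The feature menu is confusing and hard to navigate",
]

themes = {
    "usability": ["confusing", "hard to", "figure out", "navigate"],
    "design": ["design", "look"],
    "battery": ["battery"],
}

counts = Counter()
for review in reviews:
    text = review.lower()
    for theme, keywords in themes.items():
        if any(keyword in text for keyword in keywords):
            counts[theme] += 1

print(counts)  # e.g., usability flagged in 3 of 4 reviews
```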
Historical Research
Historical research examines past records and documents to understand historical contexts and trends, offering insights that can inform future predictions, strategy development, and understanding of long-term changes. This method is particularly valuable for understanding the evolution of markets, industries, or consumer behaviors over time.
Real-World Example : A consultancy specializing in sustainable business practices conducts historical research into the adoption of green technologies in the automotive industry. The research identifies key drivers and barriers to adoption over the decades. Leveraging these insights, the consultancy advises new green tech startups on strategies to overcome market resistance and capitalize on drivers of adoption, significantly impacting their market entry strategy.
Each of these secondary research methods provides distinct advantages and can yield valuable insights for businesses and researchers. By carefully selecting and applying the most suitable method(s), organizations can enhance their understanding of complex issues, inform strategic decisions, and achieve competitive advantage.
Examples of Secondary Sources in Research
Secondary sources are crucial for researchers across disciplines, offering a wealth of information that can provide insights, support hypotheses, and inform strategies. Understanding the unique value of different types of secondary sources can help researchers effectively harness this wealth of information. Below, we explore various secondary sources, highlighting their unique contributions and providing real-world examples of how they can yield valuable business insights.
Books
Books provide comprehensive coverage of a topic, offering depth and context that shorter pieces might miss. They are particularly useful for gaining a thorough understanding of a subject's historical background and theoretical framework.
Example : A corporation exploring the feasibility of entering a new international market utilizes books on the country's cultural and economic history. This deep dive helps the company understand market nuances, leading to a tailored market entry strategy that aligns with local consumer preferences and cultural norms.
Scholarly Journals
Scholarly journals offer peer-reviewed, cutting-edge research findings, making them invaluable for staying abreast of the latest developments in a field. They provide detailed methodologies, rigorous data analysis, and discussions of findings in a specific area of study.
Example : An investment firm relies on scholarly articles to understand recent advancements in financial technology. Discovering research on blockchain's impact on transaction security and efficiency, the firm decides to invest in fintech startups specializing in blockchain technology, positioning itself ahead in the market.
Government Reports
Government reports deliver authoritative data on a wide range of topics, including economic indicators, demographic trends, and regulatory guidelines. Their reliability and the breadth of topics covered make them an essential resource for informed decision-making.
Example : A healthcare provider examines government health reports to identify trends in public health issues. Spotting an increase in lifestyle-related diseases, the provider expands its wellness programs, directly addressing the growing demand for preventive care services.
Market Research Reports
Market research reports provide insights into industry trends, consumer behavior, and competitive landscapes. These reports are invaluable for making informed business decisions, from product development to marketing strategies.
Example : A consumer goods company reviews market research reports to analyze trends in eco-friendly packaging. Learning about the positive consumer response to sustainable packaging, the company redesigns its packaging to be more environmentally friendly, resulting in increased brand loyalty and market share.
White Papers
White papers offer in-depth analysis or arguments on specific issues, often highlighting solutions or innovations. They are a key resource for understanding complex problems, technological advancements, and industry best practices.
Example : A technology firm exploring the implementation of AI in customer service operations consults white papers on AI applications. Insights from these papers guide the development of an AI-powered customer service chatbot, enhancing efficiency and customer satisfaction.
Private Company Data
Data from private companies, such as annual reports or case studies, provides insight into business strategies, performance metrics, and operational challenges. This information can be instrumental in benchmarking and strategic planning.
Example : By analyzing competitor annual reports, a retail chain identifies a gap in the market for affordable luxury products. This insight leads to the launch of a new product line that successfully captures this underserved segment, boosting the company's revenue and market positioning.
Advantages and Disadvantages of Secondary Research
Secondary research offers a foundation upon which organizations can build their knowledge base, informing everything from strategic planning to day-to-day decision-making. However, like any method, it comes with its own set of advantages and disadvantages. Understanding these can help researchers and businesses make the most of secondary research while being mindful of its limitations.
Advantages of Secondary Research
- Cost-Effectiveness : Secondary research is often less expensive than primary research, as it involves the analysis of existing data, eliminating the need for costly data collection processes like surveys or experiments.
- Time Efficiency : Accessing and analyzing existing data is generally faster than conducting primary research, allowing organizations to make timely decisions based on available information.
- Broad Scope of Data : Secondary research provides access to a wide range of data across different geographies and time periods, enabling comprehensive market analyses and trend identification.
- Basis for Primary Research : It can serve as a preliminary step to identify gaps in existing research, helping to pinpoint areas where primary research is needed.
Disadvantages of Secondary Research
- Relevance and Specificity : Existing data may not perfectly align with the current research objectives, leading to potential mismatches in relevance and specificity.
- Data Quality and Accuracy : The quality and accuracy of secondary data can vary, depending on the source. Researchers must critically assess the credibility of their sources to ensure the reliability of their findings.
- Timeliness : Data may be outdated, especially in fast-moving sectors where recent information is crucial for making informed decisions.
- Limited Control Over Data : Researchers have no control over how data was collected and processed, which may affect its suitability for their specific research needs.
Secondary research, when approached with an understanding of its strengths and weaknesses, has the potential to be a powerful tool. By effectively navigating its advantages and limitations, businesses can lay a solid foundation for informed decision-making and strategic planning.
Primary vs. Secondary Research: A Comparative Analysis
When undertaking a research project, understanding the distinction between primary and secondary research is pivotal. Both forms of research serve their own purposes and can complement each other in providing a comprehensive overview of a given topic.
What is Primary Research?
Primary research involves the collection of original data directly from sources. This method is firsthand and is specific to the researcher's questions or hypotheses.
The main advantage of primary research is its specificity and relevancy to the particular issue or question at hand. It offers up-to-date and highly relevant data that is directly applicable to the research objectives.
Example : A company planning to launch a new beverage product conducts focus groups and survey research to understand consumer preferences. Through this process, they gather firsthand insights on flavors, packaging, and pricing preferences specific to their target market.
What is Secondary Research?
Secondary research involves the analysis of existing information compiled and collected by others. It includes studies, reports, and data from government agencies, trade associations, and other organizations.
Secondary research provides a broad understanding of the topic at hand, offering insights that can help frame primary research. It is cost-effective and time-saving, as it leverages already available data.
Example : The same company explores industry reports, academic research, and market analyses to understand broader market trends, competitor strategies, and consumer behavior within the beverage industry.
Comparative Analysis
Aspect | Primary Research | Secondary Research |
---|---|---|
Data Type | Original, firsthand data | Pre-existing, compiled data |
Collection Method | Surveys, interviews, observations | Analysis of existing sources |
Cost and Time | Higher cost, more time-consuming | Lower cost, less time-consuming |
Specificity | High specificity to research question | General overview of the topic |
Application | In-depth analysis of specific issues | Preliminary understanding, context setting |
Synergistic Use in Research
The most effective research strategies often involve a blend of both primary and secondary research. Secondary research can serve as a foundation, helping to inform the development of primary research by identifying gaps in existing knowledge and refining research questions.
Understanding the distinct roles and benefits of primary and secondary research is crucial for any successful research project. By effectively leveraging both types of research, researchers can gain a deeper, more nuanced understanding of their subject matter, leading to more informed decisions and strategies. Remember, the choice between primary and secondary research should be guided by your research objectives, resources, and the specificity of information required.
Research Design | Step-by-Step Guide with Examples
Published on 5 May 2022 by Shona McCombes. Revised on 20 March 2023.
A research design is a strategy for answering your research question using empirical data. Creating a research design means making decisions about:
- Your overall aims and approach
- The type of research design you’ll use
- Your sampling methods or criteria for selecting subjects
- Your data collection methods
- The procedures you’ll follow to collect data
- Your data analysis methods
A well-planned research design helps ensure that your methods match your research aims and that you use the right kind of analysis for your data.
Table of contents
- Step 1: Consider your aims and approach
- Step 2: Choose a type of research design
- Step 3: Identify your population and sampling method
- Step 4: Choose your data collection methods
- Step 5: Plan your data collection procedures
- Step 6: Decide on your data analysis strategies
- Frequently asked questions
Step 1: Consider your aims and approach
Before you can start designing your research, you should already have a clear idea of the research question you want to investigate.
There are many different ways you could go about answering this question. Your research design choices should be driven by your aims and priorities – start by thinking carefully about what you want to achieve.
The first choice you need to make is whether you’ll take a qualitative or quantitative approach.
Qualitative approach | Quantitative approach |
---|---|
Flexible and inductive, allowing you to adjust your approach based on what you find throughout the research process | Fixed and deductive, with variables and hypotheses clearly defined in advance of data collection |
It’s also possible to use a mixed methods design that integrates aspects of both approaches. By combining qualitative and quantitative insights, you can gain a more complete picture of the problem you’re studying and strengthen the credibility of your conclusions.
Practical and ethical considerations when designing research
As well as scientific considerations, you need to think practically when designing your research. If your research involves people or animals, you also need to consider research ethics .
- How much time do you have to collect data and write up the research?
- Will you be able to gain access to the data you need (e.g., by travelling to a specific location or contacting specific people)?
- Do you have the necessary research skills (e.g., statistical analysis or interview techniques)?
- Will you need ethical approval ?
At each stage of the research design process, make sure that your choices are practically feasible.
Step 2: Choose a type of research design
Within both qualitative and quantitative approaches, there are several types of research design to choose from. Each type provides a framework for the overall shape of your research.
Types of quantitative research designs
Quantitative designs can be split into four main types. Experimental and quasi-experimental designs allow you to test cause-and-effect relationships, while descriptive and correlational designs allow you to measure variables and describe relationships between them.
Type of design | Purpose and characteristics |
---|---|
Experimental | Tests cause-and-effect relationships by manipulating an independent variable and measuring its effect on a dependent variable, with participants randomly assigned to groups |
Quasi-experimental | Tests cause-and-effect relationships in more natural settings, but without random assignment to groups |
Correlational | Measures two or more variables as they naturally occur in order to describe the strength and direction of the relationships between them |
Descriptive | Describes the characteristics, frequencies, or trends of a population or phenomenon without testing cause and effect |
With descriptive and correlational designs, you can get a clear picture of characteristics, trends, and relationships as they exist in the real world. However, you can’t draw conclusions about cause and effect (because correlation doesn’t imply causation ).
Experiments are the strongest way to test cause-and-effect relationships without the risk of other variables influencing the results. However, their controlled conditions may not always reflect how things work in the real world. They’re often also more difficult and expensive to implement.
Types of qualitative research designs
Qualitative designs are less strictly defined. This approach is about gaining a rich, detailed understanding of a specific context or phenomenon, and you can often be more creative and flexible in designing your research.
The table below shows some common types of qualitative design. They often have similar approaches in terms of data collection, but focus on different aspects when analysing the data.
Type of design | Purpose and characteristics |
---|---|
Grounded theory | Aims to develop a theory inductively, with concepts and categories emerging from systematic collection and analysis of the data themselves |
Phenomenology | Aims to understand a phenomenon by describing and interpreting participants' lived experiences of it |
Step 3: Identify your population and sampling method
Your research design should clearly define who or what your research will focus on, and how you’ll go about choosing your participants or subjects.
In research, a population is the entire group that you want to draw conclusions about, while a sample is the smaller group of individuals you’ll actually collect data from.
Defining the population
A population can be made up of anything you want to study – plants, animals, organisations, texts, countries, etc. In the social sciences, it most often refers to a group of people.
For example, will you focus on people from a specific demographic, region, or background? Are you interested in people with a certain job or medical condition, or users of a particular product?
The more precisely you define your population, the easier it will be to gather a representative sample.
Sampling methods
Even with a narrowly defined population, it’s rarely possible to collect data from every individual. Instead, you’ll collect data from a sample.
To select a sample, there are two main approaches: probability sampling and non-probability sampling . The sampling method you use affects how confidently you can generalise your results to the population as a whole.
Probability sampling | Non-probability sampling |
---|---|
Every member of the population has a known, non-zero chance of being selected (e.g., simple random or stratified sampling), so results can be statistically generalised to the population | Individuals are selected on non-random criteria such as convenience or availability (e.g., convenience or snowball sampling), so results cannot be confidently generalised |
Probability sampling is the most statistically valid option, but it’s often difficult to achieve unless you’re dealing with a very small and accessible population.
For practical reasons, many studies use non-probability sampling, but it’s important to be aware of the limitations and carefully consider potential biases. You should always make an effort to gather a sample that’s as representative as possible of the population.
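To make the distinction concrete, here is a minimal Python sketch; the customer IDs and sample size are invented for illustration. A probability sample gives every member of the sampling frame a known, equal chance of selection, while a convenience sample simply takes whoever is easiest to reach.

```python
import random

# Hypothetical sampling frame: a list of 10,000 customer IDs (illustrative only)
population = [f"customer_{i}" for i in range(10_000)]

random.seed(42)  # for reproducibility

# Probability sampling: every member has a known, equal chance of selection
probability_sample = random.sample(population, k=200)

# Non-probability (convenience) sampling: take the 200 easiest cases to reach,
# e.g. the earliest sign-ups -- convenient, but prone to systematic bias
convenience_sample = population[:200]

print(len(probability_sample), len(convenience_sample))
```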
Case selection in qualitative research
In some types of qualitative designs, sampling may not be relevant.
For example, in an ethnography or a case study, your aim is to deeply understand a specific context, not to generalise to a population. Instead of sampling, you may simply aim to collect as much data as possible about the context you are studying.
In these types of design, you still have to carefully consider your choice of case or community. You should have a clear rationale for why this particular case is suitable for answering your research question.
For example, you might choose a case study that reveals an unusual or neglected aspect of your research problem, or you might choose several very similar or very different cases in order to compare them.
Step 4: Choose your data collection methods
Data collection methods are ways of directly measuring variables and gathering information. They allow you to gain first-hand knowledge and original insights into your research problem.
You can choose just one data collection method, or use several methods in the same study.
Survey methods
Surveys allow you to collect data about opinions, behaviours, experiences, and characteristics by asking people directly. There are two main survey methods to choose from: questionnaires and interviews.
Questionnaires | Interviews |
---|---|
Lists of closed or open-ended questions that respondents answer themselves, distributed online, by post, or in person | Questions asked and answered verbally, one-to-one or in groups, ranging from highly structured to unstructured formats |
Observation methods
Observations allow you to collect data unobtrusively, observing characteristics, behaviours, or social interactions without relying on self-reporting.
Observations may be conducted in real time, taking notes as you observe, or you might make audiovisual recordings for later analysis. They can be qualitative or quantitative.
Quantitative observation | Qualitative observation |
---|---|
Systematically counting or measuring predefined events, behaviours, or characteristics | Recording detailed, open-ended field notes about a setting, its participants, and their interactions |
Other methods of data collection
There are many other ways you might collect data depending on your field and topic.
Field | Examples of data collection methods |
---|---|
Media & communication | Collecting a sample of texts (e.g., speeches, articles, or social media posts) for data on cultural norms and narratives |
Psychology | Using technologies like neuroimaging, eye-tracking, or computer-based tasks to collect data on things like attention, emotional response, or reaction time |
Education | Using tests or assignments to collect data on knowledge and skills |
Physical sciences | Using scientific instruments to collect data on things like weight, blood pressure, or chemical composition |
If you’re not sure which methods will work best for your research design, try reading some papers in your field to see what data collection methods they used.
Secondary data
If you don’t have the time or resources to collect data from the population you’re interested in, you can also choose to use secondary data that other researchers already collected – for example, datasets from government surveys or previous studies on your topic.
With this raw data, you can do your own analysis to answer new research questions that weren’t addressed by the original study.
Using secondary data can expand the scope of your research, as you may be able to access much larger and more varied samples than you could collect yourself.
However, it also means you don’t have any control over which variables to measure or how to measure them, so the conclusions you can draw may be limited.
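As a minimal sketch of this workflow, assuming a hypothetical CSV export from a government survey portal with columns named region, age, and income, you could load the secondary dataset with pandas and re-analyse it to answer a question the original study never asked:

```python
import pandas as pd

# Hypothetical file exported from a government data portal (illustrative path)
df = pd.read_csv("national_household_survey_2022.csv")

# Re-analyse the existing data for a new question:
# how does median income vary by region and age band?
df["age_band"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 120],
                        labels=["<25", "25-44", "45-64", "65+"])
summary = df.groupby(["region", "age_band"])["income"].median()
print(summary)
```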
Step 5: Plan your data collection procedures
As well as deciding on your methods, you need to plan exactly how you’ll use these methods to collect data that’s consistent, accurate, and unbiased.
Planning systematic procedures is especially important in quantitative research, where you need to precisely define your variables and ensure your measurements are reliable and valid.
Operationalisation
Some variables, like height or age, are easily measured. But often you’ll be dealing with more abstract concepts, like satisfaction, anxiety, or competence. Operationalisation means turning these fuzzy ideas into measurable indicators.
If you’re using observations , which events or actions will you count?
If you’re using surveys , which questions will you ask and what range of responses will be offered?
You may also choose to use or adapt existing materials designed to measure the concept you’re interested in – for example, questionnaires or inventories whose reliability and validity has already been established.
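As a purely hypothetical sketch of operationalisation, a fuzzy concept such as social anxiety could be turned into a measurable indicator by averaging self-rating items; every item name and the cut-off below are invented for illustration, not drawn from any validated instrument.

```python
# Hypothetical Likert responses (1 = never, 5 = always) for one participant
responses = {
    "avoids_crowded_places": 4,
    "worries_before_social_events": 5,
    "physical_symptoms_in_groups": 3,
}

# Illustrative operational definition: social anxiety score = mean of item ratings
social_anxiety_score = sum(responses.values()) / len(responses)

# Invented threshold purely for illustration, not a clinical cut-off
high_social_anxiety = social_anxiety_score >= 3.5
print(social_anxiety_score, high_social_anxiety)
```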
Reliability and validity
Reliability means your results can be consistently reproduced , while validity means that you’re actually measuring the concept you’re interested in.
Reliability | Validity |
---|---|
The extent to which your results can be consistently reproduced when the measurement is repeated under the same conditions | The extent to which your measurements actually capture the concept you intend to measure |
For valid and reliable results, your measurement materials should be thoroughly researched and carefully designed. Plan your procedures to make sure you carry out the same steps in the same way for each participant.
If you’re developing a new questionnaire or other instrument to measure a specific concept, running a pilot study allows you to check its validity and reliability in advance.
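One common way to summarise the internal consistency of a new multi-item questionnaire during a pilot study is Cronbach's alpha. The sketch below, assuming pilot responses stored with one row per respondent and one column per item, computes it from the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the pilot data are invented for illustration.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: rows = respondents, columns = questionnaire items."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical pilot data: 5 respondents answering 4 Likert items (1-5)
pilot = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(pilot), 2))
```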
Sampling procedures
As well as choosing an appropriate sampling method, you need a concrete plan for how you’ll actually contact and recruit your selected sample.
That means making decisions about things like:
- How many participants do you need for an adequate sample size?
- What inclusion and exclusion criteria will you use to identify eligible participants?
- How will you contact your sample – by mail, online, by phone, or in person?
If you’re using a probability sampling method, it’s important that everyone who is randomly selected actually participates in the study. How will you ensure a high response rate?
If you’re using a non-probability method, how will you avoid bias and ensure a representative sample?
Data management
It’s also important to create a data management plan for organising and storing your data.
Will you need to transcribe interviews or perform data entry for observations? You should anonymise and safeguard any sensitive data, and make sure it’s backed up regularly.
Keeping your data well organised will save time when it comes to analysing them. It can also help other researchers validate and add to your findings.
Step 6: Decide on your data analysis strategies
On their own, raw data can’t answer your research question. The last step of designing your research is planning how you’ll analyse the data.
Quantitative data analysis
In quantitative research, you’ll most likely use some form of statistical analysis . With statistics, you can summarise your sample data, make estimates, and test hypotheses.
Using descriptive statistics , you can summarise your sample data in terms of:
- The distribution of the data (e.g., the frequency of each score on a test)
- The central tendency of the data (e.g., the mean to describe the average score)
- The variability of the data (e.g., the standard deviation to describe how spread out the scores are)
The specific calculations you can do depend on the level of measurement of your variables.
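As a minimal illustration of these three summaries, assuming a small set of hypothetical test scores, Python's standard library is enough:

```python
import statistics
from collections import Counter

scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 78]  # hypothetical test scores

distribution = Counter(scores)       # frequency of each score
mean = statistics.mean(scores)       # central tendency: the average score
std_dev = statistics.stdev(scores)   # variability: sample standard deviation

print(distribution, mean, round(std_dev, 2))
```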
Using inferential statistics , you can:
- Make estimates about the population based on your sample data.
- Test hypotheses about a relationship between variables.
Regression and correlation tests look for associations between two or more variables, while comparison tests (such as t tests and ANOVAs ) look for differences in the outcomes of different groups.
Your choice of statistical test depends on various aspects of your research design, including the types of variables you’re dealing with and the distribution of your data.
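The sketch below illustrates both families of tests with SciPy; the scores for two teaching methods and the study hours are invented, but the function calls are standard SciPy routines for an independent-samples t test and a Pearson correlation.

```python
from scipy import stats

# Hypothetical data: exam scores under two teaching methods, plus study hours per student
method_a = [72, 85, 90, 64, 78, 88, 91, 70]
method_b = [80, 92, 95, 75, 84, 96, 99, 81]
hours_studied = [5, 9, 11, 3, 7, 10, 12, 6]

# Comparison test: do mean scores differ between the two methods?
t_stat, p_value = stats.ttest_ind(method_a, method_b)

# Association test: are study hours correlated with scores in group A?
r, p_corr = stats.pearsonr(hours_studied, method_a)

print(f"t = {t_stat:.2f} (p = {p_value:.3f}); r = {r:.2f} (p = {p_corr:.3f})")
```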
Qualitative data analysis
In qualitative research, your data will usually be very dense with information and ideas. Instead of summing it up in numbers, you’ll need to comb through the data in detail, interpret its meanings, identify patterns, and extract the parts that are most relevant to your research question.
Two of the most common approaches to doing this are thematic analysis and discourse analysis .
Approach | Characteristics |
---|---|
Thematic analysis | Involves coding the data and then identifying, reviewing, and interpreting recurring themes or patterns of meaning |
Discourse analysis | Focuses on how language is used in its social context, examining communication, meaning-making, and power relations |
There are many other ways of analysing qualitative data depending on the aims of your research. To get a sense of potential approaches, try reading some qualitative research papers in your field.
Frequently asked questions
A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research.
For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
Statistical sampling allows you to test a hypothesis about the characteristics of a population. There are various sampling methods you can use to ensure that your sample is representative of the population as a whole.
Operationalisation means turning abstract conceptual ideas into measurable observations.
For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioural avoidance of crowded places, or physical anxiety symptoms in social situations.
Before collecting data , it’s important to consider how you will operationalise the variables that you want to measure.
The research methods you use depend on the type of data you need to answer your research question .
- If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
- If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
- If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.
McCombes, S. (2023, March 20). Research Design | Step-by-Step Guide with Examples. Scribbr. Retrieved 9 September 2024, from https://www.scribbr.co.uk/research-methods/research-design/
A Guide To Secondary Data Analysis
What is secondary data analysis? How do you carry it out? Find out in this post.
Historically, the only way data analysts could obtain data was to collect it themselves. This type of data is often referred to as primary data and is still a vital resource for data analysts.
However, technological advances over the last few decades mean that much past data is now readily available online for data analysts and researchers to access and utilize. This type of data—known as secondary data—is driving a revolution in data analytics and data science.
Primary and secondary data share many characteristics. However, there are some fundamental differences in how you prepare and analyze secondary data. This post explores the unique aspects of secondary data analysis. We’ll briefly review what secondary data is before outlining how to source, collect, and validate it. We’ll cover:
- What is secondary data analysis?
- How to carry out secondary data analysis (5 steps)
- Summary and further reading
Ready for a crash course in secondary data analysis? Let’s go!
1. What is secondary data analysis?
Secondary data analysis uses data collected by somebody else. This contrasts with primary data analysis, which involves a researcher collecting predefined data to answer a specific question. Secondary data analysis has numerous benefits, not least that it is a time and cost-effective way of obtaining data without doing the research yourself.
It’s worth noting here that secondary data may be primary data for the original researcher. It only becomes secondary data when it’s repurposed for a new task. As a result, a dataset can simultaneously be a primary data source for one researcher and a secondary data source for another. So don’t panic if you get confused! We explain exactly what secondary data is in this guide .
In reality, the statistical techniques used to carry out secondary data analysis are no different from those used to analyze other kinds of data. The main differences lie in collection and preparation. Once the data have been reviewed and prepared, the analytics process continues more or less as it usually does. For a recap on what the data analysis process involves, read this post .
In the following sections, we’ll focus specifically on the preparation of secondary data for analysis. Where appropriate, we’ll refer to primary data analysis for comparison.
2. How to carry out secondary data analysis
Step 1: Define a research topic
The first step in any data analytics project is defining your goal. This is true regardless of the data you’re working with, or the type of analysis you want to carry out. In data analytics lingo, this typically involves defining:
- A statement of purpose
- Research design
Defining a statement of purpose and a research approach are both fundamental building blocks for any project. However, for secondary data analysis, the process of defining these differs slightly. Let’s find out how.
Step 2: Establish your statement of purpose
Before beginning any data analytics project, you should always have a clearly defined intent. This is called a ‘statement of purpose.’ A healthcare analyst’s statement of purpose, for example, might be: ‘Reduce admissions for mental health issues relating to Covid-19.’ The more specific the statement of purpose, the easier it is to determine which data to collect, analyze, and draw insights from.
A statement of purpose is helpful for both primary and secondary data analysis. It’s especially relevant for secondary data analysis, though. This is because there are vast amounts of secondary data available. Having a clear direction will keep you focused on the task at hand, saving you from becoming overwhelmed. Being selective with your data sources is key.
Step 3: Design your research process
After defining your statement of purpose, the next step is to design the research process. For primary data, this involves determining the types of data you want to collect (e.g. quantitative, qualitative, or both ) and a methodology for gathering them.
For secondary data analysis, however, your research process will more likely be a step-by-step guide outlining the types of data you require and a list of potential sources for gathering them. It may also include (realistic) expectations of the output of the final analysis. This should be based on a preliminary review of the data sources and their quality.
Once you have both your statement of purpose and research design, you’re in a far better position to narrow down potential sources of secondary data. You can then start with the next step of the process: data collection.
Step 4: Locate and collect your secondary data
Collecting primary data involves devising and executing a complex strategy that can be very time-consuming to manage. The data you collect, though, will be highly relevant to your research problem.
Secondary data collection, meanwhile, avoids the complexity of defining a research methodology. However, it comes with additional challenges. One of these is identifying where to find the data. This is no small task because there are a great many repositories of secondary data available. Your job, then, is to narrow down potential sources. As already mentioned, it’s necessary to be selective, or else you risk becoming overloaded.
Some popular sources of secondary data include:
- Government statistics , e.g. demographic data, censuses, or surveys, collected by government agencies/departments (like the US Bureau of Labor Statistics).
- Technical reports summarizing completed or ongoing research from educational or public institutions (colleges or government).
- Scientific journals that outline research methodologies and data analysis by experts in fields like the sciences, medicine, etc.
- Literature reviews of research articles, books, and reports, for a given area of study (once again, carried out by experts in the field).
- Trade/industry publications , e.g. articles and data shared in trade publications, covering topics relating to specific industry sectors, such as tech or manufacturing.
- Online resources: Repositories, databases, and other reference libraries with public or paid access to secondary data sources.
Once you’ve identified appropriate sources, you can go about collecting the necessary data. This may involve contacting other researchers, paying a fee to an organization in exchange for a dataset, or simply downloading a dataset for free online .
Step 5: Evaluate your secondary data
Secondary data is usually well-structured, so you might assume that once you have your hands on a dataset, you’re ready to dive in with a detailed analysis. Unfortunately, that’s not the case!
First, you must carry out a careful review of the data. Why? To ensure that they’re appropriate for your needs. This involves two main tasks:
- Evaluating the secondary dataset’s relevance
- Assessing its broader credibility
Both these tasks require critical thinking skills. However, they aren’t heavily technical. This means anybody can learn to carry them out.
Let’s now take a look at each in a bit more detail.
Evaluating the secondary dataset’s relevance
The main point of evaluating a secondary dataset is to see if it is suitable for your needs. This involves asking some probing questions about the data, including:
What was the data’s original purpose?
Understanding why the data were originally collected will tell you a lot about their suitability for your current project. For instance, was the project carried out by a government agency or a private company for marketing purposes? The answer may provide useful information about the population sample, the data demographics, and even the wording of specific survey questions. All this can help you determine if the data are right for you, or if they are biased in any way.
When and where were the data collected?
Over time, populations and demographics change. Identifying when the data were first collected can provide invaluable insights. For instance, a dataset that initially seems suited to your needs may be out of date.
On the flip side, you might want past data so you can draw a comparison with a present dataset. In this case, you’ll need to ensure the data were collected during the appropriate time frame. It’s worth mentioning that secondary data are the sole source of past data. You cannot collect historical data using primary data collection techniques.
Similarly, you should ask where the data were collected. Do they represent the geographical region you require? Does geography even have an impact on the problem you are trying to solve?
What data were collected and how?
A final report for past data analytics is great for summarizing key characteristics or findings. However, if you’re planning to use those data for a new project, you’ll need the original documentation. At the very least, this should include access to the raw data and an outline of the methodology used to gather them. This can be helpful for many reasons. For instance, you may find raw data that wasn’t relevant to the original analysis, but which might benefit your current task.
What questions were participants asked?
We’ve already touched on this, but the wording of survey questions—especially for qualitative datasets—is significant. Questions may deliberately be phrased to preclude certain answers. A question’s context may also impact the findings in a way that’s not immediately obvious. Understanding these issues will shape how you perceive the data.
What is the form/shape/structure of the data?
Finally, to practical issues. Is the structure of the data suitable for your needs? Is it compatible with other sources or with your preferred analytics approach? This is purely a structural issue. For instance, if people’s ages are stored as categorical ranges rather than as continuous numerical values, this could affect the analyses you can run. In general, reviewing a dataset’s structure helps you understand how the data are categorized, allowing you to account for any discrepancies. You may also need to tidy the data to ensure they are consistent with any other sources you’re using.
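A minimal pandas sketch of this kind of structural review, assuming a hypothetical secondary dataset file with age and region columns that need tidying:

```python
import pandas as pd

df = pd.read_csv("secondary_dataset.csv")   # hypothetical file name

# Review structure: column types and missing values
print(df.dtypes)
print(df.isna().sum())

# Tidy the data, e.g. if age arrived as text ("34", "unknown", ...)
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # non-numeric values become NaN

# Harmonise category labels so they are consistent with your other sources
df["region"] = df["region"].str.strip().str.title()
```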
This is just a sample of the types of questions you need to consider when reviewing a secondary data source. The answers will have a clear impact on whether the dataset—no matter how well presented or structured it seems—is suitable for your needs.
Assessing secondary data’s credibility
After identifying a potentially suitable dataset, you must double-check the credibility of the data. Namely, are the data accurate and unbiased? To figure this out, here are some key questions you might want to ask:
What are the credentials of those who carried out the original research?
Do you have access to the details of the original researchers? What are their credentials? Where did they study? Are they an expert in the field or a newcomer? Data collection by an undergraduate student, for example, may not be as rigorous as that of a seasoned professor.
And did the original researcher work for a reputable organization? What other affiliations do they have? For instance, if a researcher who works for a tobacco company gathers data on the effects of vaping, this represents an obvious conflict of interest! Questions like this help determine how thorough or qualified the researchers are and if they have any potential biases.
Do you have access to the full methodology?
Does the dataset include a clear methodology, explaining in detail how the data were collected? This should be more than a simple overview; it must be a clear breakdown of the process, including justifications for the approach taken. This allows you to determine if the methodology was sound. If you find flaws (or no methodology at all) it throws the quality of the data into question.
How consistent are the data with other sources?
Do the secondary data match with any similar findings? If not, that doesn’t necessarily mean the data are wrong, but it does warrant closer inspection. Perhaps the collection methodology differed between sources, or maybe the data were analyzed using different statistical techniques. Or perhaps unaccounted-for outliers are skewing the analysis. Identifying all these potential problems is essential. A flawed or biased dataset can still be useful but only if you know where its shortcomings lie.
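One simple way to check consistency is to compare a summary statistic from the secondary dataset against a benchmark figure published elsewhere; the numbers and the 5% tolerance below are invented purely for illustration.

```python
# Hypothetical figures: mean household income from the secondary dataset
# versus the figure published in an official statistical report
secondary_mean_income = 48_200
benchmark_mean_income = 52_900
tolerance = 0.05  # flag differences larger than 5% (arbitrary choice)

relative_gap = abs(secondary_mean_income - benchmark_mean_income) / benchmark_mean_income
if relative_gap > tolerance:
    print(f"Warning: secondary data differs from benchmark by {relative_gap:.1%} - investigate.")
```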
Have the data been published in any credible research journals?
Finally, have the data been used in well-known studies or published in any journals? If so, how reputable are the journals? In general, you can judge a dataset’s quality based on where it has been published. If in doubt, check out the publication in question on the Directory of Open Access Journals . The directory has a rigorous vetting process, only permitting journals of the highest quality. Meanwhile, if you found the data via a blurry image on social media without cited sources, then you can justifiably question its quality!
Again, these are just a few of the questions you might ask when determining the quality of a secondary dataset. Consider them as scaffolding for cultivating a critical thinking mindset; a necessary trait for any data analyst!
Presuming your secondary data holds up to scrutiny, you should be ready to carry out your detailed statistical analysis. As we explained at the beginning of this post, the analytical techniques used for secondary data analysis are no different than those for any other kind of data. Rather than go into detail here, check out the different types of data analysis in this post.
3. Secondary data analysis: Key takeaways
In this post, we’ve looked at the nuances of secondary data analysis, including how to source, collect and review secondary data. As discussed, much of the process is the same as it is for primary data analysis. The main difference lies in how secondary data are prepared.
Carrying out a meaningful secondary data analysis involves spending time and effort exploring, collecting, and reviewing the original data. This will help you determine whether the data are suitable for your needs and if they are of good quality.
What is Secondary Research? + [Methods & Examples]
In some situations, the researcher may not be directly involved in the data gathering process and instead, would rely on already existing data in order to arrive at research outcomes. This approach to systematic investigation is known as secondary research.
There are many reasons a researcher may want to make use of already existing data instead of collecting data samples, first-hand. In this article, we will share some of these reasons with you and show you how to conduct secondary research with Formplus.
What is Secondary Research?
Secondary research is a common approach to a systematic investigation in which the researcher depends solely on existing data in the course of the research process. This research design involves organizing, collating and analyzing these data samples for valid research conclusions.
Secondary research is also known as desk research, since it involves synthesizing existing data that can be sourced from the internet, peer-reviewed journals, textbooks, government archives, and libraries. The secondary researcher studies patterns already established in previous research and applies this information to the specific research context.
Interestingly, secondary research often relies on data provided by primary research, which is why some studies combine both methods of investigation. In this sense, the researcher begins by evaluating and identifying gaps in existing knowledge before adopting primary research to gather new information that will serve his or her research.
What are Secondary Research Methods?
As already highlighted, secondary research involves gathering data from different existing sources, that is, using available research materials instead of creating a new pool of data with primary research methods. Common secondary research methods include data collection through the internet, libraries, archives, schools, and organizational reports.
- Online Data
Online data is data that is gathered via the internet. In recent times, this method has become popular because the internet provides a large pool of both free and paid research resources that can be easily accessed with the click of a button.
While this method simplifies the data gathering process, the researcher must take care to rely only on authentic sites when collecting information. In a sense, the internet is a virtual aggregator of all other sources of secondary research data.
- Data from Government and Non-government Archives
You can also gather useful research materials from government and non-government archives, which usually contain verifiable information that provides useful insights into varying research contexts. In many cases, you will need to pay a fee to gain access to these data.
The challenge, however, is that such data are not always readily available. For instance, some of these materials are classified, which makes it difficult for researchers to gain access to them.
- Data from Libraries
Research materials can also be accessed through public and private libraries. Think of a library as an information storehouse that contains an aggregation of important information that can serve as valid data in different research contexts.
Typically, researchers donate several copies of dissertations to public and private libraries; especially in cases of academic research. Also, business directories, newsletters, annual reports and other similar documents that can serve as research data, are gathered and stored in libraries, in both soft and hard copies.
- Data from Institutions of Learning
Educational facilities like schools, faculties, and colleges are also a great source of secondary data, especially in academic research. This is because more research is carried out in educational institutions than in most other sectors.
It is relatively easier to obtain research data from educational institutions because these institutions are committed to solving problems and expanding the body of knowledge. You can easily request research materials from educational facilities for the purpose of a literature review.
Secondary research methods can also be categorized into qualitative and quantitative data collection methods. Quantitative data gathering methods include online questionnaires and surveys, as well as reports on trends and statistics about different areas of a business or industry.
Qualitative research methods include relying on previous interviews and data gathered through focus groups, which help an organization understand the needs of its customers and plan how to fulfill them. They also help businesses measure the level of employee satisfaction with organizational policies.
When Do We Conduct Secondary Research?
Typically, secondary research is the first step in any systematic investigation. This is because it helps the researcher to understand what research efforts have been made so far and to utilize this knowledge in mapping out a novel direction for his or her investigation.
For instance, you may want to carry out research into the nature of a respiratory condition with the aim of developing a vaccine. The best place to start is to gather existing research material about the condition which would help to point your research in the right direction.
When sifting through these pieces of information, you will gain insights into the methods and findings of previous studies, which will help you define your own research process. Secondary research also helps you to identify knowledge gaps that can serve as the basis of your own research.
Questions to ask before conducting Secondary Research
Since secondary research relies on already existing data, the researcher must take extra care to ensure that he or she utilizes authentic data samples for the research. Falsified data can have a negative impact on the research outcomes; hence, it is important to always carry out resource evaluation by asking a number of questions as highlighted below:
- What is the purpose of the research? Again, it is important for every researcher to clearly define the purpose of the research before proceeding with it. Usually, the research purpose determines the approach that would be adopted.
- What is my research methodology? After identifying the purpose of the research, the next thing to do is outline the research methodology. This is the point where the researcher chooses to gather data using secondary research methods.
- What are my expected research outcomes?
- Who collected the data to be analyzed? Before going on to use secondary data for your research, it is necessary to ascertain the authenticity of the information. This usually affects the data reliability and determines if the researcher can trust the materials. For instance, data gathered from personal blogs and websites may not be as credible as information obtained from an organization’s website.
- When was the data collected? Data recency is another factor that must be considered since the recency of data can affect research outcomes. For instance, if you are carrying out research into the number of women who smoke in London, it would not be appropriate for you to make use of information that was gathered 5 years ago unless you plan to do some sort of data comparison.
- Is the data consistent with other data available from other sources? Always compare and contrast your data with other available research materials as this would help you to identify inconsistencies if any.
- What type of data was collected? Take care to determine if the secondary data aligns with your research goals and objectives.
- How was the data collected?
Advantages of Secondary Research
- Easily accessible: With secondary research, data can be accessed in no time, especially with the use of the internet. Apart from the internet, there are other data sources available in secondary research, such as public libraries and archives, which are also relatively easy to access.
- Secondary research is cost-effective and saves time. The researcher can cut down on both costs and time because he or she is not directly involved in the data collection process.
- Secondary research helps researchers to identify knowledge gaps which can serve as the basis of further systematic investigation.
- It is useful for mapping out the scope of research thereby setting the stage for field investigations. When carrying out secondary research, the researchers may find that the exact information they were looking for is already available, thus eliminating the need and expense incurred in carrying out primary research in these areas.
Disadvantages of Secondary Research
- Questionable Data: With secondary research, it is hard to determine the authenticity of the data because the researcher is not directly involved in the research process. Invalid data can affect research outcomes negatively hence, it is important for the researcher to take extra care by evaluating the data before making use of it.
- Generalization: Secondary data is unspecific in nature and may not directly cater to the needs of the researcher. There may not be correlations between the existing data and the research process.
- Common Data: Research materials in secondary research are not exclusive to an individual or group. This means that everyone has access to the data and there is little or no “information advantage” gained by those who obtain the research.
- Outdated materials: Secondary research carries the risk of relying on outdated research materials. Outdated information may offer little value, especially for organizations competing in fast-changing markets.
How to Conduct Online Surveys with Formplus
Follow these 5 steps to create and administer online surveys for secondary research:
- Sign into Formplus
In the Formplus builder, you can easily create an online survey for secondary research by dragging and dropping preferred fields into your form. To access the Formplus builder, you will need to create an account on Formplus.
Once you do this, sign in to your account and click on “Create Form” to begin.
- Edit Form Title
Click on the field provided to input your form title, for example, “Secondary Research Survey”.
- Click on the edit button to edit the form.
- Add Fields: Drag and drop preferred form fields into your form in the Formplus builder inputs column. There are several field input options for questionnaires in the Formplus builder.
- Edit fields
- Click on “Save”
- Preview form.
- Customize your Form
With the form customization options in the form builder, you can easily change the outlook of your form and make it more unique and personalized. Formplus allows you to change your form theme, add background images and even change the font according to your needs.
- Multiple Sharing Options
Formplus offers multiple form sharing options which enables you to easily share your questionnaire with respondents. You can use the direct social media sharing buttons to share your form link to your organization’s social media pages.
You can send out your survey form as email invitations to your research subjects too. If you wish, you can share your form’s QR code or embed it on your organization’s website for easy access.
Why Use Formplus as a Secondary Research Tool?
- Simple Form Builder Solution
The Formplus form builder is easy to use and does not require you to have any knowledge in computer programming, unlike other form builders. For instance, you can easily add form fields to your form by dragging and dropping them from the inputs section in the builder.
In the form builder, you can also modify your fields to be hidden or read-only and you can create smart forms with save and resume options, form lookup, and conditional logic. Formplus also allows you to customize your form by adding preferred background images and your organization’s logo.
- Over 25 Form Fields
With over 25 versatile form fields available in the form builder, you can easily collect data the way you like. You can receive payments directly in your form by adding payment fields, and you can also add file upload fields that allow you to receive files through your form.
- Offline Form feature
With Formplus, you can collect data from respondents even without internet connectivity . Formplus automatically detects when there is no or poor internet access and allows forms to be filled out and submitted in offline mode.
Offline form responses are automatically synced with the servers when the internet connection is restored. This feature is extremely useful for field research that may involve sourcing for data in remote and rural areas plus it allows you to scale up on your audience reach.
- Team and Collaboration
You can add important collaborators and team members to your shared account so that you all can work on forms and responses together. With the multiple users options, you can assign different roles to team members and you can also grant and limit access to forms and folders.
This feature works with an audit trail that enables you to track changes and suggestions made to your form as the administrator of the shared account. You can set up permissions to limit access to the account while organizing and monitoring your form(s) effectively.
- Embeddable Form
Formplus allows you to easily share your form with respondents at the click of a button. For instance, you can embed your form directly in your organization’s web pages by adding its unique shortcode to your site’s HTML.
You can also share your form to your social media pages using the social media direct sharing buttons available in the form builder. You can choose to embed the form as an iframe or web pop-up that is easy to fill.
With Formplus, you can share your form with numerous form respondents in no time. You can invite respondents to fill out your form via email invitation which allows you to also track responses and prevent multiple submissions in your form.
In addition, you can also share your form link as a QR code so that respondents only need to scan the code to access your form. Our forms have a unique QR code that you can add to your website or print in banners, business cards and the like.
While secondary research can be cost-effective and time-efficient, it requires the researcher to take extra care in ensuring that the data is authentic and valid. As highlighted earlier, data in secondary research can be sourced through the internet, archives, and libraries, amongst other methods.
Secondary research is usually the starting point of systematic investigation because it provides the researcher with a background of existing research efforts while identifying knowledge gaps to be filled. This type of research is typically used in science and education.
It is, however, important to note that secondary research relies on the outcomes of earlier primary research for its systematic investigation. Hence, the success of your research will depend, to a great extent, on the quality of the data that primary research has provided in relation to your research context.
Secondary Research Advantages, Limitations, and Sources
Summary: Secondary research should be a prerequisite to the collection of primary data, but it rarely provides all the answers you need. A thorough evaluation of the secondary data is needed to assess its relevance and accuracy.
5 minutes to read. By Michaela Mora on January 25, 2022. Topics: Relevant Methods & Tips, Business Strategy, Market Research
Secondary research is based on data already collected for purposes other than the specific problem you have. Secondary research is usually part of exploratory market research designs.
The connection to the specific purpose that originates the research is what differentiates primary research from secondary research. Primary research is designed to address specific problems. However, analysis of available secondary data should be a prerequisite to the collection of primary data.
Advantages of Secondary Research
Secondary data can be faster and cheaper to obtain, depending on the sources you use.
Secondary research can help to:
- Answer certain research questions and test some hypotheses.
- Formulate an appropriate research design (e.g., identify key variables).
- Interpret data from primary research as it can provide some insights into general trends in an industry or product category.
- Understand the competitive landscape.
Limitations of Secondary Research
The usefulness of secondary research tends to be limited often for two main reasons:
Lack of relevance
Secondary research rarely provides all the answers you need. The objectives and methodology used to collect the secondary data may not be appropriate for the problem at hand.
Given that it was designed to find answers to a different problem than yours, you will likely find gaps in answers to your problem. Furthermore, the data collection methods used may not provide the data type needed to support the business decisions you have to make (e.g., qualitative research methods are not appropriate for go/no-go decisions).
Lack of Accuracy
Secondary data may be incomplete and lack accuracy depending on;
- The research design (exploratory, descriptive, causal, primary vs. repackaged secondary data, the analytical plan, etc.)
- Sampling design and sources (target audiences, recruitment methods)
- Data collection method (qualitative and quantitative techniques)
- Analysis point of view (focus and omissions)
- Reporting stages (preliminary, final, peer-reviewed)
- Rate of change in the studied topic (slowly vs. rapidly evolving phenomenon, e.g., adoption of specific technologies).
- Lack of agreement between data sources.
Criteria for Evaluating Secondary Research Data
Before taking the information at face value, you should conduct a thorough evaluation of the secondary data you find using the following criteria:
- Purpose : Understanding why the data was collected and what questions it was trying to answer will tell us how relevant and useful it is since it may or may not be appropriate for your objectives.
- Methodology used to collect the data : Important to understand sources of bias.
- Accuracy of data: Sources of errors may include research design, sampling, data collection, analysis, and reporting.
- When the data was collected : Secondary data may not be current or updated frequently enough for the purpose that you need.
- Content of the data : Understanding the key variables, units of measurement, categories used and analyzed relationships may reveal how useful and relevant it is for your purposes.
- Source reputation : In the era of purposeful misinformation on the Internet, it is important to check the expertise, credibility, reputation, and trustworthiness of the data source.
Secondary Research Data Sources
Compared to primary research, the collection of secondary data can be faster and cheaper to obtain, depending on the sources you use.
Secondary data can come from internal or external sources.
Internal sources of secondary data include ready-to-use data or data that requires further processing available in internal management support systems your company may be using (e.g., invoices, sales transactions, Google Analytics for your website, etc.).
Prior qualitative and quantitative primary research conducted by the company is also a common source of secondary data. It often generates more questions and helps shape the new primary research that is needed.
However, if there are no internal data collection systems yet or prior research, you probably won’t have much usable secondary data at your disposal.
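As a sketch of how internal records become secondary data, the example below aggregates a hypothetical export of sales transactions into a monthly revenue view; the file name and column names (order_date, product_line, revenue) are assumptions, not a prescribed schema.

```python
import pandas as pd

# Hypothetical export of internal sales transactions; the file and column names
# are placeholders for whatever your internal systems actually produce.
sales = pd.read_csv("sales_transactions.csv", parse_dates=["order_date"])

# Reuse operational records as secondary data: monthly revenue by product line.
monthly_revenue = (
    sales.assign(month=sales["order_date"].dt.to_period("M"))
         .groupby(["month", "product_line"])["revenue"]
         .sum()
         .reset_index()
)
print(monthly_revenue.head())
```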
External sources of secondary data include:
- Published materials
- External databases
- Syndicated services.
Published Materials
Published materials can be classified as:
- General business sources: Guides, directories, indexes, and statistical data.
- Government sources: Census data and other government publications.
External Databases
In many industries and across a variety of topics, there are private and public databases that can be accessed online or downloaded for free, for a fixed fee, or by subscription.
These databases can include bibliographic, numeric, full-text, directory, and special-purpose databases. Some public institutions make data collected through various methods, including surveys, available for others to analyze.
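Access patterns vary by provider, but for databases that offer a direct file export, loading the data can be as simple as the sketch below; the URL is a placeholder, and some providers will instead require an account, an API key, or a paid subscription.

```python
import pandas as pd

# Placeholder URL for a public dataset export; substitute the download link
# (or local file path) provided by the database you are actually using.
URL = "https://example.org/open-data/industry_statistics.csv"

data = pd.read_csv(URL)

# Confirm what you actually received before building any analysis on it.
print(data.shape)    # rows x columns
print(data.dtypes)   # variable types
print(data.head())   # first few records
```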
Syndicated Services
These services are offered by companies that collect and sell pools of data with commercial value that meet needs shared by a number of clients, even though the data was not collected for any one client's specific purposes.
Syndicated services can be classified by the unit of measurement they track (e.g., consumers, households, organizations, etc.).
The data collection methods for these data may include:
- Surveys (Psychographic and Lifestyle, advertising evaluations, general topics)
- Household panels (Purchase and media use)
- Electronic scanner services (volume tracking data, scanner panels, scanner panels with Cable TV)
- Audits (retailers, wholesalers)
- Direct inquiries to institutions
- Clipping services tracking PR for institutions
- Corporate reports
You can spend hours searching Google for external sources, but this is likely to yield limited insights. Books, journal articles, reports, blog posts, and videos you find online are usually analyses and summaries of data from a particular perspective. They may be useful and give you an indication of the type of data used, but they are not the actual data. Whenever possible, look at the actual raw data to draw your own conclusions about its value for your research objectives (a quick profiling sketch follows the source list below), and favor professionally gathered secondary research.
Here are some external secondary data sources often used in market research that you may find useful as starting points in your research. Some are free, while others require payment.
- Pew Research Center : Reports about the issues, attitudes, and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis, and other empirical social science research.
- Data.Census.gov : Data dissemination platform to access demographic and economic data from the U.S. Census Bureau.
- Data.gov : The U.S. government's open data portal, with nearly 200,000 datasets on topics ranging from health, agriculture, climate, ecosystems, public safety, finance, energy, and manufacturing to education and business.
- Google Scholar : A web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.
- Google Public Data Explorer : Makes large, public-interest datasets easy to explore, visualize and communicate.
- Google News Archive : Allows users to search historical newspapers and retrieve scanned images of their pages.
- McKinsey & Company : Articles based on analyses of various industries.
- Statista : Business data platform with data across 170+ industries and 150+ countries.
- Claritas : Syndicated reports on various market segments.
- Mintel : Consumer reports combining exclusive consumer research with other market data and expert analysis.
- MarketResearch.com : Data aggregator with over 350 publishers covering every sector of the economy as well as emerging industries.
- Packaged Facts : Reports based on market research on consumer goods and services industries.
- Dun & Bradstreet : Company directory with business information.
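Once you have downloaded raw data from one of these portals, a quick profile helps you judge its coverage and completeness before leaning on someone else's summary of it. The sketch below assumes a hypothetical CSV file with a collection_date column; adjust the names to the dataset you actually obtain.

```python
import pandas as pd

# Hypothetical raw file downloaded from a portal such as Data.gov;
# the file name and column names are placeholders.
raw = pd.read_csv("downloaded_dataset.csv", parse_dates=["collection_date"])

# Coverage period: is the data recent enough for your decision?
print(raw["collection_date"].min(), raw["collection_date"].max())

# Completeness: which fields are mostly missing?
print(raw.isna().mean().sort_values(ascending=False).head(10))

# Granularity: which fields carry little variation?
print(raw.nunique().sort_values().head(10))
```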
Understanding the value of secondary research data (June 28, 2023)
“Reduce, reuse, recycle” isn’t just a good motto for preserving the environment; it’s also a smart scientific principle, thanks to the value of secondary research.
Secondary research uses existing data or specimens initially collected for purposes other than the planned (or primary) research. For example, the same specimens originally collected for a clinical trial could also be used in secondary genomic research. Secondary research maximizes the usefulness of data and unique specimens while minimizing risk to study volunteers since no new procedures are needed.
Through previous blogs, NIA provided updates and tips on the NIH Data Management and Sharing (DMS) Policy. That same policy also emphasizes the importance of sharing data gleaned from secondary research. It requires investigators, including those conducting secondary research, to describe the type of scientific data they plan to generate, and encourages good data sharing practices when performing secondary research. NIA is actively supporting secondary research through our recent Notice of Special Interest on the topic.
Advantages and challenges
Secondary research has several benefits:
- Enables use of large-scale data sets or large samples of human or model organism specimens
- Can be less expensive and time-consuming than primary data collection
- May be simpler (and expedited) if an Institutional Review Board waives the need for informed consent for a secondary research project
Potential downsides to consider might include:
- Original data may not be a perfect fit for your current research question or study design
- Details on previous data collection procedures may be scarce
- Data may potentially lack depth
- Often requires special techniques for statistical data analysis
Know the rules of the road
As you consider secondary research, be sure to get familiar with related regulations and rules. There may be requirements to access and use secondary data or specimens as stipulated by NIH-supported scientific data repositories or other sources of information. Generally, data repositories with controlled access, such as the NIA Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) or the Database of Genotypes and Phenotypes, require investigators to sign a Data Use Certification Agreement (PDF, 775K) to ensure protection of sensitive data.
Additional potential requirements can include:
- IRB approval to meet human subject protections (per regulation 45 CFR 46)
- NIH Institutional Certification (for large-scale genomic data generation)
- Data Distribution Agreement (for NIAGADS) (PDF, 673K)
- Attestation of Alzheimer’s Disease Genomics Sharing Plan (for Alzheimer’s and related dementias genomic research)
- Cloud Use Statement and Cloud Server Provider Information (as applicable)
- Possible participant consent
Reach out with questions!
With these guidelines in mind, secondary research can be quite valuable to your studies. If you have questions, please refer to the FAQs About Secondary Research. For specific questions related to preparing a DMS plan for the generation of secondary data for your research, contact your NIA Program Officer.
Pros and Cons of Secondary Data Analysis
A Review of the Advantages and Disadvantages in Social Science Research
Secondary data analysis is the analysis of data that was collected by someone else. Below, we’ll review the definition of secondary data, how it can be used by researchers, and the pros and cons of this type of research.
Key Takeaways: Secondary Data Analysis
- Primary data refers to data that researchers have collected themselves, while secondary data refers to data that was collected by someone else.
- Secondary data is available from a variety of sources, such as governments and research institutions.
- While using secondary data can be more economical, existing data sets may not answer all of a researcher’s questions.
Comparison of Primary and Secondary Data
In social science research, the terms primary data and secondary data are common parlance. Primary data is collected by a researcher or team of researchers for the specific purpose or analysis under consideration. Here, a research team conceives of and develops a research project, decides on a sampling technique, collects data designed to address specific questions, and performs their own analyses of the data they collected. In this case, the people involved in the data analysis are familiar with the research design and data collection process.
Secondary data analysis, on the other hand, is the use of data that was collected by someone else for some other purpose. In this case, the researcher poses questions that are addressed through the analysis of a data set that they were not involved in collecting. The data was not collected to answer the researcher’s specific research questions and was instead collected for another purpose. This means that the same data set can actually be a primary data set to one researcher and a secondary data set to a different one.
Using Secondary Data
There are some important things that must be done before using secondary data in an analysis. Since the researcher did not collect the data, it's important for them to become familiar with the data set: how the data was collected, what the response categories are for each question, whether or not weights need to be applied during the analysis, whether or not clusters or stratification need to be accounted for, who the population of study was, and more.
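Survey weights are a common example of why this familiarization matters. The sketch below uses a small, made-up extract to show how an estimate can shift once the weights supplied by the original data collectors are applied; the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up extract from a secondary survey data set; "weight" stands in for the
# sampling weight supplied by the original data collectors.
survey = pd.DataFrame({
    "satisfaction": [4, 5, 2, 3, 5, 1],
    "weight":       [1.2, 0.8, 2.5, 1.0, 0.6, 1.9],
})

unweighted = survey["satisfaction"].mean()
weighted = np.average(survey["satisfaction"], weights=survey["weight"])

# Ignoring documented weights can bias estimates toward overrepresented respondents.
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```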
A great deal of secondary data resources and data sets are available for sociological research, many of which are public and easily accessible. The United States Census, the General Social Survey, and the American Community Survey are some of the most commonly used secondary data sets available.
Advantages of Secondary Data Analysis
The biggest advantage of using secondary data is that it can be more economical. Someone else has already collected the data, so the researcher does not have to devote money, time, energy and resources to this phase of research. Sometimes the secondary data set must be purchased, but the cost is almost always lower than the expense of collecting a similar data set from scratch, which usually entails salaries, travel and transportation, office space, equipment, and other overhead costs. In addition, since the data is already collected and usually cleaned and stored in electronic format, the researcher can spend most of their time analyzing the data instead of getting the data ready for analysis.
A second major advantage of using secondary data is the breadth of data available. The federal government conducts numerous studies on a large, national scale that individual researchers would have a difficult time collecting. Many of these data sets are also longitudinal, meaning that the same data has been collected from the same population over several different time periods. This allows researchers to look at trends and changes in phenomena over time.
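With a longitudinal secondary data set, looking at change over time can be a one-line aggregation once the data is in long format. The sketch below assumes a hypothetical file with year and indicator columns; the names are placeholders.

```python
import pandas as pd

# Hypothetical long-format extract of repeated survey waves;
# the columns (year, indicator) are placeholders for the real variable names.
panel = pd.read_csv("national_survey_waves.csv")

trend = panel.groupby("year")["indicator"].mean()
print(trend)         # level of the indicator in each wave
print(trend.diff())  # year-over-year change
```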
A third important advantage of using secondary data is that the data collection process often maintains a level of expertise and professionalism that may not be present with individual researchers or small research projects. For example, data collection for many federal data sets is often performed by staff members who specialize in certain tasks and have many years of experience in that particular area and with that particular survey. Many smaller research projects do not have that level of expertise, as a lot of data is collected by students working part-time.
Disadvantages of Secondary Data Analysis
A major disadvantage of using secondary data is that it may not answer the researcher’s specific research questions or contain specific information that the researcher would like to have. It also may not have been collected in the geographic region or during the years desired, or with the specific population that the researcher is interested in studying. For example, a researcher who is interested in studying adolescents may find that the secondary data set only includes young adults.
Additionally, since the researcher did not collect the data, they have no control over what is contained in the data set. Oftentimes this can limit the analysis or alter the original questions the researcher sought to answer. For example, a researcher who is studying happiness and optimism might find that a secondary data set only includes one of these variables, but not both.
A related problem is that the variables may have been defined or categorized differently than the researcher would have chosen. For example, age may have been collected in categories rather than as a continuous variable, or race may be defined as “white” and “other” instead of containing categories for every major race.
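In practice this often means recoding the supplied categories into the coarser scheme the analysis can support, and documenting what cannot be recovered. The sketch below shows one hypothetical recoding; the category labels and mapping are illustrative, not a recommended coding scheme.

```python
import pandas as pd

# Hypothetical secondary data where age arrives pre-binned and race uses only two labels.
df = pd.DataFrame({
    "age_group": ["18-24", "25-34", "35-44", "65+", "25-34"],
    "race":      ["white", "other", "white", "other", "white"],
})

# Collapse the supplied bins into the broader categories the analysis can support.
age_map = {"18-24": "young adult", "25-34": "adult", "35-44": "adult", "65+": "older adult"}
df["age_recoded"] = df["age_group"].map(age_map)

# Finer distinctions (e.g., specific racial categories) are simply not recoverable here.
print(df)
```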
Another significant disadvantage of using secondary data is that the researcher doesn't know exactly how the data collection process was done or how well it was carried out. The researcher is not usually privy to information about how seriously the data is affected by problems such as low response rate or respondent misunderstanding of specific survey questions. Sometimes this information is readily available, as is the case with many federal data sets. However, many other secondary data sets are not accompanied by this type of information and the analyst must learn to read between the lines in order to uncover any potential limitations of the data.
BUS612: Data-Driven Communications
Research Design
Sociologists have used qualitative research methods to conduct research and obtain data to explain, predict or control an aspect of social reality. These research methods are increasingly being used in the business world to examine and explain consumer behavior and other social interactions that may impact a business. Read this article to explore the goals, sources, and primary methods used in qualitative research.
Methods of Data Collection for Qualitative Inquiry
Secondary Data, Archival Data, and Methods of Textual Analysis
While sociologists often engage in original research studies, they also contribute knowledge to the discipline through secondary data. Within qualitative inquiry, secondary data is frequently in the form of archival materials, which may be formally stored in an archive or exist in the files and records of public and private organizations as well as private persons. Secondary data do not result from firsthand research collected from primary sources, but are drawn from the already-completed work of other researchers, scholars, and writers (e.g., the texts of historians, economists, anthropologists, sociologists, teachers, journalists), records produced as a result of everyday activities within various organizational contexts (e.g., government offices, non-profit organizations, private businesses and corporations), and personal memoirs (e.g., correspondence, diaries, photographs). In the contemporary context of digital technologies and communications, the internet is a rich and expanding resource of written and image-based texts for sociological analysis (e.g., website pages, Facebook, Twitter, blogs), alongside more traditional sources such as periodicals, newspapers, or magazines from any period in history.
Using available information not only saves time and money, but it can add depth to a study. Sociologists often interpret findings in a new way, a way that was not part of an author's original purpose or intention. To study how women were encouraged to act and behave in the 1960s, for example, a researcher might watch movies, television shows, and situation comedies from that period. Or, to research changes in behaviour and attitudes due to the emergence of television in the late 1950s and early 1960s, a sociologist would rely on new interpretations of secondary data. Decades from now, researchers will most likely conduct similar studies on the advent of mobile phones, the internet, or Facebook. Within sociological inquiry, the potential sources of secondary data are limited only by the imagination of individual researchers. A particularly rich and longitudinal source of data for the study of everyday social reality is provided by the Mass Observation Project located in the UK.
One methodology that sociologists employ with secondary data is content analysis. Content analysis is a quantitative approach to textual research that selects an item of textual content (i.e., a variable) that can be reliably and consistently observed and coded, and surveys the prevalence of that item in a sample of textual output. For example, Gilens wanted to find out why survey research shows that the American public substantially exaggerates the percentage of African Americans among the poor. He examined whether media representations influence public perceptions and did a content analysis of photographs of poor people in American news magazines. He coded and then systematically recorded incidences of three variables: (1) race: white, black, indeterminate; (2) employment status: working, not working; and (3) age. Gilens discovered that not only were African Americans markedly overrepresented in news magazine photographs of poverty, but that the photos also tended to underrepresent "sympathetic" subgroups of the poor (the elderly and working poor) while overrepresenting less sympathetic groups (unemployed, working-age adults). Gilens concluded that by providing a distorted representation of poverty, U.S. news magazines "reinforce negative stereotypes of blacks as mired in poverty and contribute to the belief that poverty is primarily a 'black problem'".
Textual analysis is a qualitative methodology used to examine the structure, style, content, purpose, and symbolic meaning of various written, oral, and visual texts. The roots of textual analysis extend into the humanities and draw on the theory and methodologies of hermeneutic interpretation and linguistics (the science of language). Within the broader domain of textual analysis, narrative analysis draws on the strategies and techniques of literary scholars to analyze the stories people create and use to express meaning and experience within the context of everyday lived reality. Discourse analysis, another form of textual analysis, finds its roots in linguistics and is an interpretive approach to texts that focuses on the contextual meanings and social uses of larger chunks of communication. In addition to there being multiple sources of secondary data for the purpose of sociological analysis, there are a variety of analytical tools and techniques that can be drawn on from the domains of science and the humanities to enhance our capacity as sociologists to understand meaning and motivation within the context of everyday social reality.
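Because content analysis ultimately reduces to counting coded values across a sample, the core computation is simple. The sketch below tallies a handful of made-up records in the spirit of the Gilens example; the coded values are invented for illustration and do not reproduce his data.

```python
from collections import Counter

# Made-up coding sheet: each record is one coded photograph (values are illustrative).
coded_photos = [
    {"race": "black", "employed": "not working"},
    {"race": "black", "employed": "working"},
    {"race": "white", "employed": "not working"},
    {"race": "black", "employed": "not working"},
    {"race": "indeterminate", "employed": "working"},
]

# Prevalence of each coded value in the sample.
total = len(coded_photos)
race_share = {k: v / total for k, v in Counter(p["race"] for p in coded_photos).items()}
employment_share = {k: v / total for k, v in Counter(p["employed"] for p in coded_photos).items()}

print(race_share)
print(employment_share)
```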
Linguistics and Discourse Analysis
Social scientists also learn by analyzing the research of a variety of agencies. Governmental departments, public interest research groups, and global organizations like Statistics Canada, the Canadian Centre for Policy Alternatives, or the World Health Organization publish studies with findings that are useful to sociologists. A public statistic that measures inequality of incomes might be useful for studying who benefited and who lost as a result of the 2008 recession; a demographic profile of different immigrant groups might be compared with data on unemployment to examine the reasons why immigration settlement programs are more effective for some communities than for others.
One of the advantages of secondary data is that it is nonreactive (or unobtrusive) research, meaning that it does not include direct contact with subjects and will not alter or influence people's behaviours. Unlike studies requiring direct contact with people, using previously published data does not require entering a population and the investment and risks inherent in that research process.
Using available data does have its challenges. Public records are not always easy to access: a researcher needs to do some legwork to track them down and gain access. In some cases there is no way to verify the accuracy of existing data. It is easy, for example, to count how many drunk drivers are pulled over by the police, but how many are not? While it is possible to discover the percentage of teenage students who drop out of high school, it might be more challenging to determine the number who return to school or earn their diplomas later. Another problem arises when data are unavailable in the exact form needed or do not include the precise angle the researcher seeks. For example, the salaries paid to professors at universities are often published, but the separate figures do not necessarily reveal how long it took each professor to reach that salary range, what their educational backgrounds are, or how long they have been teaching.
In his research, sociologist Richard Sennett uses secondary data to shed light on current trends. In The Craftsman (2008), he studied the human desire to perform quality work, from carpentry to computer programming, and the line between craftsmanship and skilled manual labour. He also studied changes in attitudes toward craftsmanship that occurred not only during and after the Industrial Revolution but also in ancient times. Obviously, he could not have firsthand knowledge of periods of ancient history, so he had to rely on secondary data for part of his study.
When conducting secondary data or textual analysis, it is important to consider the date of publication of an existing source and to take into account the attitudes and common cultural ideals that may have influenced the research. For example, Robert and Helen Lynd gathered research for their book Middletown: A Study in Modern American Culture in the 1920s. Attitudes and cultural norms were vastly different then than they are now: beliefs about gender roles, race, education, and work have changed significantly since then. At the time, the study's purpose was to reveal the truth about small American communities; today, it is an illustration of 1920s attitudes and values. An important principle for sociological researchers is to exercise caution when presuming to impose today's values and attitudes on the practices and circumstances of the past.