
How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.



  • The Scientific Method
  • Hypothesis Format
  • Falsifiability
  • Operationalization
  • Hypothesis Types
  • Hypothesis Examples
  • Collecting Data

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question, which is then explored through background research. Only at that point do researchers begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis: "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that  if  something was false, then it is possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs, as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

Hypothesis Types

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable.

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a  correlational study  can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.
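A correlational study typically boils down to computing a correlation coefficient between the measured variables. As a rough illustration, the sketch below computes Pearson's r from scratch in plain Python; the data (hours slept vs. test scores) and all variable names are invented for the example, not drawn from any study cited here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours slept the night before vs. exam score
hours_slept = [4, 5, 6, 7, 8, 9]
test_score = [58, 64, 71, 70, 83, 88]

r = pearson_r(hours_slept, test_score)
print(f"r = {r:.2f}")  # a value near +1 suggests a strong positive association
```

An r near +1 or -1 suggests a strong linear association worth testing experimentally; remember that correlation alone cannot establish causation.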

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.


By Kendra Cherry, MSEd


The Craft of Writing a Strong Hypothesis

Deeptanshu D


Writing a hypothesis is one of the essential elements of a scientific research paper. It needs to be to the point, clearly communicating what your research is trying to accomplish. A blurry, drawn-out, or complexly-structured hypothesis can confuse your readers. Or worse, the editor and peer reviewers.

A captivating hypothesis is not too intricate. This blog will take you through the process so that, by the end of it, you have a better idea of how to convey your research paper's intent in just one sentence.

What is a Hypothesis?

The first step in your scientific endeavor, a hypothesis, is a strong, concise statement that forms the basis of your research. It is not the same as a thesis statement , which is a brief summary of your research paper .

The sole purpose of a hypothesis is to predict your paper's findings, data, and conclusion. It comes from a place of curiosity and intuition. When you write a hypothesis, you're essentially making an educated guess based on prior scientific knowledge and evidence, which is then proven or disproven through the scientific method.

The reason for undertaking research is to observe a specific phenomenon. A hypothesis, therefore, lays out what the said phenomenon is. And it does so through two variables, an independent and dependent variable.

The independent variable is the cause behind the observation, while the dependent variable is the effect of the cause. A good example of this is “mixing red and blue forms purple.” In this hypothesis, mixing red and blue is the independent variable as you're combining the two colors at your own will. The formation of purple is the dependent variable as, in this case, it is conditional to the independent variable.

Different Types of Hypotheses‌


Types of hypotheses

Some would stand by the notion that there are only two types of hypotheses: a null hypothesis and an alternative hypothesis. While that has some truth to it, it is better to distinguish the most common forms, since these terms come up often and it helps to know them in context.

Apart from null and alternative, there are complex, simple, directional, non-directional, statistical, and associative and causal hypotheses. They don't have to be mutually exclusive, as one hypothesis can tick many boxes, but knowing the distinctions between them will make it easier for you to construct your own.

1. Null hypothesis

A null hypothesis proposes no relationship between two variables. Denoted by H0, it is a negative statement like "Attending physiotherapy sessions does not affect athletes' on-field performance." Here, the author claims physiotherapy sessions have no effect on on-field performance; any apparent effect is mere coincidence.

2. Alternative hypothesis

Considered the opposite of a null hypothesis, an alternative hypothesis is denoted as H1 or Ha. It explicitly states that the independent variable affects the dependent variable. A good alternative hypothesis example is "Attending physiotherapy sessions improves athletes' on-field performance" or "Water evaporates at 100 °C." The alternative hypothesis further branches into directional and non-directional.

  • Directional hypothesis: A hypothesis that states whether the result will be positive or negative is called a directional hypothesis. It accompanies H1 with either the '<' or '>' sign.
  • Non-directional hypothesis: A non-directional hypothesis only claims an effect on the dependent variable. It does not clarify whether the result will be positive or negative. The sign for a non-directional hypothesis is '≠'.

3. Simple hypothesis

A simple hypothesis is a statement made to reflect the relation between exactly two variables. One independent and one dependent. Consider the example, “Smoking is a prominent cause of lung cancer." The dependent variable, lung cancer, is dependent on the independent variable, smoking.

4. Complex hypothesis

In contrast to a simple hypothesis, a complex hypothesis implies the relationship between multiple independent and dependent variables. For instance, "Individuals who eat more fruits tend to have higher immunity, lower cholesterol, and higher metabolism." The independent variable is eating more fruits, while the dependent variables are higher immunity, lower cholesterol, and higher metabolism.

5. Associative and causal hypothesis

Associative and causal hypotheses don't specify how many variables there will be; they define the relationship between the variables. In an associative hypothesis, changing any one variable, dependent or independent, affects the others. In a causal hypothesis, the independent variable directly affects the dependent variable.

6. Empirical hypothesis

Also referred to as the working hypothesis, an empirical hypothesis claims a theory's validation via experiments and observation. This way, the statement appears justifiable and different from a wild guess.

Say the hypothesis is "Women who take iron tablets face a lesser risk of anemia than those who take vitamin B12." This is an example of an empirical hypothesis, where the researcher validates the statement after assessing a group of women who take iron tablets and charting the findings.

7. Statistical hypothesis

The point of a statistical hypothesis is to test an already existing hypothesis by studying a population sample. Hypotheses like "44% of the Indian population belongs to the age group of 22-27" leverage evidence to prove or disprove a particular statement.
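To make the mechanics concrete, here is a minimal sketch of how a claim like the one above could be checked with a one-sample z-test for a proportion. The sample counts and the 0.05 significance level are invented for illustration, not drawn from real survey data.

```python
import math

def proportion_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion.

    Tests H0: the true population proportion equals p0, using the
    normal approximation to the binomial distribution.
    Returns (z statistic, two-sided p-value).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal CDF, written via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical survey: 470 of 1,000 respondents fall in the 22-27 age group.
# H0: the true population proportion is 0.44.
z, p = proportion_z_test(successes=470, n=1000, p0=0.44)
print(f"z = {z:.2f}, p = {p:.3f}")
if p < 0.05:
    print("Reject H0: the data are inconsistent with a 44% proportion.")
else:
    print("Fail to reject H0: the data are consistent with a 44% proportion.")
```

With these made-up numbers the p-value lands just above 0.05, so the 44% claim survives the test; a larger sample or a bigger discrepancy would be needed to reject it.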

Characteristics of a Good Hypothesis

Writing a hypothesis is essential as it can make or break your research for you. That includes your chances of getting published in a journal. So when you're designing one, keep an eye out for these pointers:

  • A research hypothesis has to be simple yet clear to look justifiable enough.
  • It has to be testable; your research would be rendered pointless if the hypothesis is too far-fetched or beyond the reach of current methods and technology.
  • It has to be precise about the results; what you are trying to do and achieve through it should come out in your hypothesis.
  • A research hypothesis should be self-explanatory, leaving no doubt in the reader's mind.
  • If you are developing a relational hypothesis, you need to include the variables and establish an appropriate relationship among them.
  • A hypothesis must keep and reflect the scope for further investigations and experiments.

Separating a Hypothesis from a Prediction

Outside of academia, hypothesis and prediction are often used interchangeably. In research writing, this is not only confusing but also incorrect. And although a hypothesis and prediction are guesses at their core, there are many differences between them.

A hypothesis is an educated guess or even a testable prediction validated through research. It aims to analyze the gathered evidence and facts to define a relationship between variables and put forth a logical explanation behind the nature of events.

Predictions are assumptions or expected outcomes made without any supporting evidence. They are more speculative, regardless of where they originate.

For this reason, a hypothesis holds much more weight than a prediction. It sticks to the scientific method rather than pure guesswork. "Planets revolve around the Sun" is an example of a hypothesis, as it is based on previous knowledge and observed trends. Additionally, we can test it through the scientific method.

Whereas "COVID-19 will be eradicated by 2030." is a prediction. Even though it results from past trends, we can't prove or disprove it. So, the only way this gets validated is to wait and watch if COVID-19 cases end by 2030.

Finally, How to Write a Hypothesis


Quick tips on writing a hypothesis

1. Be clear about your research question

A hypothesis should instantly address the research question or the problem statement. To do so, you need to ask a question. Understand the constraints of your undertaken research topic and then formulate a simple and topic-centric problem. Only after that can you develop a hypothesis and further test for evidence.

2. Carry out a recce

Once you have your research's foundation laid out, it would be best to conduct preliminary research. Go through previous theories, academic papers, data, and experiments before you start curating your research hypothesis. It will give you an idea of your hypothesis's viability or originality.

Making use of references from relevant research papers helps you draft a good research hypothesis. SciSpace Discover offers a repository of over 270 million research papers to browse through and gain a deeper understanding of related studies on a particular topic. Additionally, you can use SciSpace Copilot, your AI research assistant, to read lengthy research papers and get a summarized context of each. A hypothesis can be formed after evaluating many such summarized papers. Copilot also offers explanations for theories and equations, explains papers in a simplified form, lets you highlight any text or clip math equations and tables, and provides a deeper, clearer understanding of what is being said. This can improve your hypothesis by helping you identify potential research gaps.

3. Create a 3-dimensional hypothesis

Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a correlation between them. The ideal way to do this is to write the hypothetical assumption in the ‘if-then' form. If you use this form, make sure that you state the predefined relationship between the variables.

In another way, you can choose to present your hypothesis as a comparison between two variables. Here, you must specify the difference you expect to observe in the results.

4. Write the first draft

Now that everything is in place, it's time to write your hypothesis. For starters, create the first draft. In this version, write what you expect to find from your research.

Clearly separate your independent and dependent variables and the link between them. Don't fixate on syntax at this stage. The goal is to ensure your hypothesis addresses the issue.

5. Proofread your hypothesis

After preparing the first draft of your hypothesis, you need to inspect it thoroughly. It should tick all the boxes, like being concise, straightforward, relevant, and accurate. Your final hypothesis has to be well-structured as well.

Research projects are an exciting and crucial part of being a scholar. And once you have your research question, you need a great hypothesis to begin conducting research. Thus, knowing how to write a hypothesis is very important.

Now that you have a firmer grasp on what a good hypothesis constitutes, the different kinds there are, and what process to follow, you will find it much easier to write your hypothesis, which ultimately helps your research.

Now it's easier than ever to streamline your research workflow with SciSpace Discover . Its integrated, comprehensive end-to-end platform for research allows scholars to easily discover, write and publish their research and fosters collaboration.

It includes everything you need, including a repository of over 270 million research papers across disciplines, SEO-optimized summaries and public profiles to show your expertise and experience.

If you found these tips on writing a research hypothesis useful, head over to our blog on Statistical Hypothesis Testing to learn about the top researchers, papers, and institutions in this domain.

Frequently Asked Questions (FAQs)

1. What is the definition of a hypothesis?

According to the Oxford dictionary, a hypothesis is defined as “An idea or explanation of something that is based on a few known facts, but that has not yet been proved to be true or correct”.

2. What is an example of a hypothesis?

A hypothesis is a statement that proposes a relationship between two or more variables. An example: "If we increase the number of new users who join our platform by 25%, then we will see an increase in revenue."

3. What is an example of a null hypothesis?

A null hypothesis is a statement that there is no relationship between two variables; it is written as H0 and states that there is no effect. For example, if you're studying whether a particular type of exercise increases strength, your null hypothesis would be "there is no difference in strength between people who exercise and people who don't."

4. What are the types of research?

  • Fundamental research
  • Applied research
  • Qualitative research
  • Quantitative research
  • Mixed research
  • Exploratory research
  • Longitudinal research
  • Cross-sectional research
  • Field research
  • Laboratory research
  • Fixed research
  • Flexible research
  • Action research
  • Policy research
  • Classification research
  • Comparative research
  • Causal research
  • Inductive research
  • Deductive research

5. How to write a hypothesis?

  • Your hypothesis should be able to predict the relationship and outcome.
  • Avoid wordiness by keeping it simple and brief.
  • Your hypothesis should contain observable and testable outcomes.
  • Your hypothesis should be relevant to the research question.

6. What are the two types of hypotheses?

  • Null hypotheses are used to test the claim that "there is no difference between two groups of data".
  • Alternative hypotheses test the claim that "there is a difference between two data groups".

7. What is the difference between a research question and a research hypothesis?

A research question is a broad, open-ended question you will try to answer through your research. A hypothesis is a statement, based on prior research or theory, that you expect to be true as a result of your study. Example - Research question: What factors influence the adoption of the new technology? Research hypothesis: There is a positive relationship between age, education, and income level and the adoption of the new technology.

8. What is plural for hypothesis?

The plural of hypothesis is hypotheses. Here's an example of how it would be used in a statement, "Numerous well-considered hypotheses are presented in this part, and they are supported by tables and figures that are well-illustrated."

9. What is the Red Queen hypothesis?

The Red Queen hypothesis in evolutionary biology states that species must constantly evolve to avoid extinction; if they don't, they will be outcompeted by other species that are evolving. Leigh Van Valen first proposed it in 1973, and it has since been tested and substantiated many times.

10. Who is known as the father of null hypothesis?

The father of the null hypothesis is Sir Ronald Fisher. He published a paper in 1925 that introduced the concept of null hypothesis testing, and he was also the first to use the term itself.

11. When to reject null hypothesis?

You need to find a significant difference between your two populations to reject the null hypothesis. You can determine that by running statistical tests such as an independent-samples t-test or a dependent-samples t-test. By common convention, you reject the null hypothesis if the p-value is less than 0.05.
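As a rough illustration of this decision rule, the sketch below estimates a p-value with a simple permutation test rather than a t-test, since a permutation test needs no statistics library and no distributional assumptions. The exam-score data and the 0.05 threshold are made up for the example.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the group labels are exchangeable, so we
    repeatedly shuffle the pooled data and count how often a random
    relabeling produces a mean difference at least as extreme as the
    one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical exam scores: well-rested vs. sleep-deprived students
rested = [85, 88, 90, 86, 89, 91, 87, 84]
deprived = [70, 72, 75, 71, 74, 73, 69, 76]

p = permutation_test(rested, deprived)
print(f"estimated p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis of no difference between the groups.")
```

Because these invented groups barely overlap, almost no random relabeling matches the observed gap, so the estimated p-value comes out far below 0.05 and the null hypothesis is rejected.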



PrepScholar


What Is a Hypothesis and How Do I Write One?




Think about something strange and unexplainable in your life. Maybe you get a headache right before it rains, or maybe you think your favorite sports team wins when you wear a certain color. If you wanted to see whether these are just coincidences or scientific fact, you would form a hypothesis, then create an experiment to see whether that hypothesis is true or not.

But what is a hypothesis, anyway? If you're not sure what a hypothesis is, or how to test one, you're in the right place. This article will teach you everything you need to know about hypotheses, including:

  • Defining the term “hypothesis” 
  • Providing hypothesis examples 
  • Giving you tips for how to write your own hypothesis

So let’s get started!


What Is a Hypothesis?

Merriam-Webster defines a hypothesis as "an assumption or concession made for the sake of argument." In other words, a hypothesis is an educated guess. Scientists make a reasonable assumption, or a hypothesis, then design an experiment to test whether it's true or not. Keep in mind that in science, a hypothesis should be testable. You have to be able to design an experiment that tests your hypothesis in order for it to be valid.

As you could assume from that statement, it's easy to make a bad hypothesis. But when you're holding an experiment, it's even more important that your guesses be good. After all, you're spending time (and maybe money!) to figure out more about your observation. That's why we refer to a hypothesis as an educated guess: good hypotheses are based on existing data and research to make them as sound as possible.

Hypotheses are one part of what’s called the scientific method .  Every (good) experiment or study is based in the scientific method. The scientific method gives order and structure to experiments and ensures that interference from scientists or outside influences does not skew the results. It’s important that you understand the concepts of the scientific method before holding your own experiment. Though it may vary among scientists, the scientific method is generally made up of six steps (in order):

  • Making an observation
  • Asking questions
  • Forming a hypothesis
  • Conducting an experiment
  • Analyzing the data
  • Communicating your results

You’ll notice that the hypothesis comes pretty early on when conducting an experiment. That’s because experiments work best when they’re trying to answer one specific question. And you can’t conduct an experiment until you know what you’re trying to prove!

Independent and Dependent Variables 

After doing your research, you’re ready for another important step in forming your hypothesis: identifying variables. Variables are basically any factors that could influence the outcome of your experiment. Variables have to be measurable and related to the topic being studied.

There are two types of variables: independent variables and dependent variables. Independent variables remain constant; they are the factors the researcher controls or selects rather than measures. For example, age is an independent variable: it will stay the same, and researchers can compare different ages to see if age has an effect on the dependent variable. 

Speaking of dependent variables: dependent variables are subject to the influence of the independent variable, meaning that they are not constant. Let’s say you want to test whether a person’s age affects how much sleep they need. In that case, the independent variable is age (as we mentioned above), and the dependent variable is how much sleep a person gets. 

Variables will be crucial in writing your hypothesis. You need to be able to identify which variable is which, as both the independent and dependent variables will be written into your hypothesis. For instance, in a study about exercise, the independent variable might be the speed at which the respondents walk for thirty minutes, and the dependent variable would be their heart rate. In your study and in your hypothesis, you’re trying to understand the relationship between the two variables.

Elements of a Good Hypothesis

The best hypotheses start by asking the right questions . For instance, if you’ve observed that the grass is greener when it rains twice a week, you could ask what kind of grass it is, what elevation it’s at, and if the grass across the street responds to rain in the same way. Any of these questions could become the backbone of experiments to test why the grass gets greener when it rains fairly frequently.

As you’re asking more questions about your first observation, make sure you’re also making more observations . If it doesn’t rain for two weeks and the grass still looks green, that’s an important observation that could influence your hypothesis. You'll continue observing all throughout your experiment, but until the hypothesis is finalized, every observation should be noted.

Finally, you should consult secondary research before writing your hypothesis . Secondary research consists of results found and published by other people. You can usually find this information online or at your library. Additionally, make sure the research you find is credible and related to your topic. If you’re studying the correlation between rain and grass growth, it would help you to research rain patterns over the past twenty years for your county, published by a local agricultural association. You should also research the types of grass common in your area, the type of grass in your lawn, and whether anyone else has conducted experiments about your hypothesis. Also be sure to check the quality of your research . Research done by a middle school student about which minerals can be found in rainwater would be less useful than an article published by a local university.


Writing Your Hypothesis

Once you’ve considered all of the factors above, you’re ready to start writing your hypothesis. Hypotheses usually take a certain form when they’re written out in a research report.

When you boil your hypothesis statement down, you are writing your best guess, not the question at hand . This means that your statement should be written as if it were already fact, even though you are simply testing it.

The reason for this is that, after you have completed your study, you'll either accept or reject your if-then or your null hypothesis. All hypothesis testing examples should be measurable and able to be confirmed or denied. You cannot confirm a question, only a statement! 

In fact, you come up with hypothesis examples all the time! For instance, when you guess on the outcome of a basketball game, you don’t say, “Will the Miami Heat beat the Boston Celtics?” but instead, “I think the Miami Heat will beat the Boston Celtics.” You state it as if it is already true, even if it turns out you’re wrong. You do the same thing when writing your hypothesis.

Additionally, keep in mind that hypotheses can range from very specific to very broad.  If your hypothesis involves a narrow cause and effect, it can be specific; if it involves a broad range of causes and effects, it can be broad as well.  


The Two Types of Hypotheses

Now that you understand what goes into a hypothesis, it’s time to look more closely at the two most common types of hypotheses: the if-then hypothesis and the null hypothesis.

#1: If-Then Hypotheses

First of all, if-then hypotheses typically follow this formula:

If ____ happens, then ____ will happen.

The goal of this type of hypothesis is to test the causal relationship between the independent and dependent variable. It’s fairly simple, and each hypothesis can vary in how detailed it can be. We create if-then hypotheses all the time with our daily predictions. Here are some examples of hypotheses that use an if-then structure from daily life: 

  • If I get enough sleep, I’ll be able to get more work done tomorrow.
  • If the bus is on time, I can make it to my friend’s birthday party. 
  • If I study every night this week, I’ll get a better grade on my exam. 

In each of these situations, you’re making a guess on how an independent variable (sleep, time, or studying) will affect a dependent variable (the amount of work you can do, making it to a party on time, or getting better grades). 

You may still be asking, “What is an example of a hypothesis used in scientific research?” Take one of the hypothesis examples from a real-world study on whether using technology before bed affects children’s sleep patterns. The hypothesis reads:

“We hypothesized that increased hours of tablet- and phone-based screen time at bedtime would be inversely correlated with sleep quality and child attention.”

It might not look like it, but this is an if-then statement. The researchers basically said, “If children have more screen usage at bedtime, then their quality of sleep and attention will be worse.” The sleep quality and attention are the dependent variables and the screen usage is the independent variable. (Usually, the independent variable comes after the “if” and the dependent variable comes after the “then,” as it is the independent variable that affects the dependent variable.) This is an excellent example of how flexible hypothesis statements can be, as long as the general idea of “if-then” and the independent and dependent variables are present.

#2: Null Hypotheses

Your if-then hypothesis is not the only one needed to complete a successful experiment, however. You also need a null hypothesis to test it against. In its most basic form, the null hypothesis is the opposite of your if-then hypothesis . When you write your null hypothesis, you are writing a hypothesis that suggests that your guess is not true, and that the independent and dependent variables have no relationship .

One null hypothesis for the cell phone and sleep study from the last section might say: 

“If children have more screen usage at bedtime, their quality of sleep and attention will not be worse.” 

In this case, this is a null hypothesis because it states the opposite of the original hypothesis! 

Conversely, if your if-then hypothesis suggests that your two variables have no relationship, then your null hypothesis would suggest that there is one. So, pretend that there is a study asking the question, “Does the number of followers on Instagram influence how long people spend on the app?” The independent variable is the number of followers, and the dependent variable is the time spent. But if you, as the researcher, don’t think there is a relationship between the number of followers and time spent, you might write an if-then hypothesis that reads:

“If people have many followers on Instagram, they will not spend more time on the app than people who have fewer.”

In this case, the if-then suggests there isn’t a relationship between the variables. In that case, one of the null hypothesis examples might say:

“If people have many followers on Instagram, they will spend more time on the app than people who have fewer.”

You then test both the if-then and the null hypothesis to gauge whether there is a relationship between the variables, and if so, how strong it is. 
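To make the if-then versus null comparison concrete, here is a small sketch in Python of how the Instagram example might be analyzed. The follower and usage numbers are invented for illustration, and a real study would use a proper statistics package with a significance test; this just shows the logic of comparing two groups with a Welch t statistic from the standard library.

```python
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    std_err = (var_a / len(sample_a) + var_b / len(sample_b)) ** 0.5
    return (mean_a - mean_b) / std_err

# Hypothetical minutes spent on the app per day for two groups
many_followers = [62, 58, 71, 66, 60, 64]
few_followers = [35, 41, 38, 44, 36, 40]

t = welch_t(many_followers, few_followers)

# A large |t| is evidence against the null hypothesis
# ("followers and time spent have no relationship").
print(round(t, 2))
```

With these made-up numbers the statistic is large, so you would reject the null; with similar numbers in both groups it would be near zero, and you would fail to reject it.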


4 Tips to Write the Best Hypothesis

If you’re going to take the time to conduct an experiment, whether in school or on your own, you’ll also want to take the time to make sure your hypothesis is a good one. The best hypotheses have four major elements in common: plausibility, defined concepts, observability, and generalizability.

#1: Plausibility

At first glance, this quality of a hypothesis might seem obvious. When your hypothesis is plausible, that means it’s possible given what we know about science and general common sense. However, improbable hypotheses are more common than you might think. 

Imagine you’re studying weight gain and television watching habits. If you hypothesize that people who watch more than twenty hours of television a week will gain two hundred pounds or more over the course of a year, this is improbable (though it’s not impossible). Consequently, common sense can tell us the results of the study before the study even begins.

Improbable hypotheses generally go against science, as well. Take this hypothesis example: 

“If a person smokes one cigarette a day, then they will have lungs just as healthy as the average person’s.” 

This hypothesis is obviously untrue, as studies have shown again and again that cigarettes negatively affect lung health. You must be careful that your hypotheses do not reflect your own personal opinion more than they do scientifically supported findings. This need for plausibility points to the necessity of doing research before the hypothesis is written, to make sure that your hypothesis has not already been disproven.

#2: Defined Concepts

The more advanced you are in your studies, the more likely it is that the terms you’re using in your hypothesis are specific to a limited body of knowledge. One example might involve the readability of printed text in newspapers, where you might use words like “kerning” and “x-height.” Unless your readers have a background in graphic design, it’s likely that they won’t know what you mean by these terms. Thus, it’s important to define them either in the hypothesis itself or in the report before the hypothesis.

Here’s what we mean. Which of the following sentences makes more sense to the common person?

If the kerning is greater than average, more words will be read per minute.

If the space between letters is greater than average, more words will be read per minute.

For people reading your report who are not experts in typography, simply adding a few more words will be helpful in clarifying exactly what the experiment is all about. It’s always a good idea to make your research and findings as accessible as possible. 


Good hypotheses ensure that you can observe the results. 

#3: Observability

In order to measure the truth or falsity of your hypothesis, you must be able to see your variables and the way they interact. For instance, if your hypothesis is that the flight patterns of satellites affect the strength of certain television signals, yet you don’t have a telescope to view the satellites or a television to monitor the signal strength, you cannot properly observe your hypothesis and thus cannot continue your study.

Some variables may seem easy to observe, but if you do not have a system of measurement in place, you cannot observe your hypothesis properly. Here’s an example: if you’re experimenting on the effect of healthy food on overall happiness, but you don’t have a way to monitor and measure what “overall happiness” means, your results will not reflect the truth. Monitoring how often someone smiles for a whole day is not reasonably observable, but having the participants state how happy they feel on a scale of one to ten is more observable. 

In writing your hypothesis, always keep in mind how you'll execute the experiment.

#4: Generalizability 

Perhaps you’d like to study what color your best friend wears the most often by observing and documenting the colors she wears each day of the week. This might be fun information for her and you to know, but beyond you two, there aren’t many people who could benefit from this experiment. When you start an experiment, you should note how generalizable your findings may be if they are confirmed. Generalizability is basically how common a particular phenomenon is in other people’s everyday lives.

Let’s say you’re asking a question about the health benefits of eating an apple for one day only. You need to realize that the experiment may be too specific to be helpful: it does not help to explain a phenomenon that many people experience. If you find yourself with too specific of a hypothesis, go back to asking the big question: what is it that you want to know, and what do you think will happen between your two variables?


Hypothesis Testing Examples

We know it can be hard to write a good hypothesis unless you’ve seen some good hypothesis examples. We’ve included four hypothesis examples based on some made-up experiments. Use these as templates or launch pads for coming up with your own hypotheses.

Experiment #1: Students Studying Outside (Writing a Hypothesis)

You are a student at PrepScholar University. When you walk around campus, you notice that, when the temperature is above 60 degrees, more students study in the quad. You want to know when your fellow students are more likely to study outside. With this information, how do you make the best hypothesis possible?

You must remember to make additional observations and do secondary research before writing your hypothesis. In doing so, you notice that no one studies outside when it’s 75 degrees and raining, so this should be included in your experiment. Also, studies done on the topic beforehand suggested that students are more likely to study in temperatures less than 85 degrees. With this in mind, you feel confident that you can identify your variables and write your hypotheses:

If-then: “If the temperature in Fahrenheit is less than 60 degrees, significantly fewer students will study outside.”

Null: “If the temperature in Fahrenheit is less than 60 degrees, the same number of students will study outside as when it is more than 60 degrees.”

These hypotheses are plausible, as the temperatures are reasonably within the bounds of what is possible. The number of people in the quad is also easily observable. It is also not a phenomenon specific to only one person or at one time, but instead can explain a phenomenon for a broader group of people.

To complete this experiment, you pick the month of October to observe the quad. Every day (except on days when it’s raining), from 3 to 4 PM, when most classes have let out for the day, you observe how many people are on the quad. You measure how many people come and how many leave. You also write down the temperature on the hour. 

After writing down all of your observations and putting them on a graph, you find that the most students study on the quad when it is 70 degrees outside, and that the number of students drops sharply once the temperature reaches 60 degrees or below. In this case, your research report would state that you accept, or “fail to reject,” your first hypothesis based on your findings.
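As a hedged sketch of how the quad observations could be summarized, the snippet below groups invented daily records by the 60-degree threshold and compares average attendance. The temperature and headcount pairs are made up for illustration; they are not data from a real study.

```python
# Hypothetical (temperature_F, students_on_quad) records for October
observations = [
    (72, 48), (70, 55), (68, 50), (65, 44), (63, 40),
    (59, 18), (57, 15), (55, 12), (52, 10), (50, 8),
]

# Group the daily headcounts by the 60-degree threshold
below = [count for temp, count in observations if temp < 60]
at_or_above = [count for temp, count in observations if temp >= 60]

avg_below = sum(below) / len(below)
avg_above = sum(at_or_above) / len(at_or_above)

# If attendance is clearly lower below 60 degrees, the data support
# the if-then hypothesis; otherwise you would fail to reject the null.
print(avg_below, avg_above)
```

With these invented numbers, attendance below 60 degrees averages far fewer students than at or above 60, which is the pattern that would support the if-then hypothesis.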

Experiment #2: The Cupcake Store (Forming a Simple Experiment)

Let’s say that you work at a bakery. You specialize in cupcakes, and you make only two colors of frosting: yellow and purple. You want to know what kind of customers are more likely to buy what kind of cupcake, so you set up an experiment. Your independent variable is the customer’s gender, and the dependent variable is the color of the frosting. What is an example of a hypothesis that might answer the question of this study?

Here’s what your hypotheses might look like: 

If-then: “If customers’ gender is female, then they will buy more yellow cupcakes than purple cupcakes.”

Null: “If customers’ gender is female, then they will be just as likely to buy purple cupcakes as yellow cupcakes.”

This is a pretty simple experiment! It passes the test of plausibility (there could easily be a difference), defined concepts (there’s nothing complicated about cupcakes!), observability (both color and gender can be easily observed), and generalizability (this would potentially help you make better business decisions).


Experiment #3: Backyard Bird Feeders (Integrating Multiple Variables and Rejecting the If-Then Hypothesis)

While watching your backyard bird feeder, you realized that different birds come on the days when you change the types of seeds. You decide that you want to see more cardinals in your backyard, so you decide to see what type of food they like the best and set up an experiment. 

However, one morning, you notice that, while some cardinals are present, blue jays are eating out of your backyard feeder filled with millet. You decide that, of all of the other birds, you would like to see the blue jays the least. This means you'll have more than one variable in your hypothesis. Your new hypotheses might look like this: 

If-then: “If sunflower seeds are placed in the bird feeders, then more cardinals will come than blue jays. If millet is placed in the bird feeders, then more blue jays will come than cardinals.”

Null: “If either sunflower seeds or millet are placed in the bird feeders, equal numbers of cardinals and blue jays will come.”

Through simple observation, you actually find that cardinals come as often as blue jays when sunflower seeds or millet is in the bird feeder. In this case, you would reject your “if-then” hypothesis and “fail to reject” your null hypothesis . You cannot accept your first hypothesis, because it’s clearly not true. Instead, you found that there was actually no relationship between your variables. Consequently, you would need to run more experiments with different variables to see whether the new variables impact the results.
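The bird-feeder outcome can be mimicked with a toy chi-square check. The visit counts below are invented for illustration: when cardinals and blue jays show up in roughly equal numbers under each seed type, the statistic stays small and you fail to reject the null hypothesis. A real analysis would use a statistics package and report a p-value.

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical visit counts per seed type: [cardinals, blue jays]
sunflower = [23, 21]
millet = [18, 20]

# Null hypothesis: equal numbers of each bird for each seed type,
# so the expected count is the average of each pair.
stat = 0.0
for counts in (sunflower, millet):
    expected = [sum(counts) / 2] * 2
    stat += chi_square(counts, expected)

# A small statistic (well below ~5.99, the critical value for
# 2 degrees of freedom at the 0.05 level) means we fail to
# reject the null hypothesis.
print(round(stat, 3))
```

With counts this close together the statistic lands near zero, matching the scenario in the text where the if-then hypothesis is rejected and the null is not.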

Experiment #4: In-Class Survey (Including an Alternative Hypothesis)

You’re about to give a speech in one of your classes about the importance of paying attention. You want to take this opportunity to test a hypothesis you’ve had for a while: 

If-then: If students sit in the first two rows of the classroom, then they will listen better than students who do not.

Null: If students sit in the first two rows of the classroom, then they will not listen better or worse than students who do not.

You give your speech and then ask your teacher if you can hand out a short survey to the class. On the survey, you’ve included questions about some of the topics you talked about. When you get back the results, you’re surprised to see that not only do the students in the first two rows not pay better attention, but they also scored worse than students in other parts of the classroom! Here, both your if-then and your null hypotheses are not representative of your findings. What do you do?

This is when you reject both your if-then and null hypotheses and instead create an alternative hypothesis . This type of hypothesis is used in the rare circumstance that neither of your hypotheses is able to capture your findings . Now you can use what you’ve learned to draft new hypotheses and test again! 

Key Takeaways: Hypothesis Writing

The more comfortable you become with writing hypotheses, the better they will become. The structure of hypotheses is flexible and may need to be changed depending on what topic you are studying. The most important thing to remember is the purpose of your hypothesis and the difference between the if-then and the null . From there, in forming your hypothesis, you should constantly be asking questions, making observations, doing secondary research, and considering your variables. After you have written your hypothesis, be sure to edit it so that it is plausible, clearly defined, observable, and helpful in explaining a general phenomenon.

Writing a hypothesis is something that everyone, from elementary school children competing in a science fair to professional scientists in a lab, needs to know how to do. Hypotheses are vital in experiments and in properly executing the scientific method . When done correctly, hypotheses will set up your studies for success and help you to understand the world a little better, one experiment at a time.


What’s Next?

If you’re studying for the science portion of the ACT, there’s definitely a lot you need to know. We’ve got the tools to help, though! Start by checking out our ultimate study guide for the ACT Science subject test. Once you read through that, be sure to download our recommended ACT Science practice tests , since they’re one of the most foolproof ways to improve your score. (And don’t forget to check out our expert guide book , too.)

If you love science and want to major in a scientific field, you should start preparing in high school . Here are the science classes you should take to set yourself up for success.

If you’re trying to think of science experiments you can do for class (or for a science fair!), here’s a list of 37 awesome science experiments you can do at home.


Ashley Sufflé Robinson has a Ph.D. in 19th Century English Literature. As a content writer for PrepScholar, Ashley is passionate about giving college-bound students the in-depth information they need to get into the school of their dreams.



What is and How to Write a Good Hypothesis in Research?


One of the most important aspects of conducting research is constructing a strong hypothesis. But what makes a hypothesis in research effective? In this article, we’ll look at the difference between a hypothesis and a research question, as well as the elements of a good hypothesis in research. We’ll also include some examples of effective hypotheses, and what pitfalls to avoid.

What is a Hypothesis in Research?

Simply put, a hypothesis is a research question that also includes the predicted or expected result of the research. Without a hypothesis, there can be no basis for a scientific or research experiment. As such, it is critical that you carefully construct your hypothesis by being deliberate and thorough, even before you set pen to paper. Unless your hypothesis is clearly and carefully constructed, any flaw can have an adverse, and even grave, effect on the quality of your experiment and its subsequent results.

Research Question vs Hypothesis

It’s easy to confuse research questions with hypotheses, and vice versa. While they’re both critical to the Scientific Method, they have very specific differences. Primarily, a research question, just like a hypothesis, is focused and concise. But a hypothesis includes a prediction based on the proposed research, and is designed to forecast the relationship of and between two (or more) variables. Research questions are open-ended, and invite debate and discussion, while hypotheses are closed, e.g. “The relationship between A and B will be C.”

A hypothesis is generally used if your research topic is fairly well established, and you are relatively certain about the relationship between the variables that will be presented in your research. Since a hypothesis is ideally suited for experimental studies, it will, by its very existence, affect the design of your experiment. The research question is typically used for new topics that have not yet been researched extensively. Here, the relationship between different variables is less known. There is no prediction made, but there may be variables explored. The research question can be causal in nature, simply trying to understand if a relationship even exists, descriptive, or comparative.

How to Write Hypothesis in Research

Writing an effective hypothesis starts before you even begin to type. Like any task, preparation is key, so you start first by conducting research yourself, and reading all you can about the topic that you plan to research. From there, you’ll gain the knowledge you need to understand where your focus within the topic will lie.

Remember that a hypothesis is a prediction of the relationship that exists between two or more variables. Your job is to write a hypothesis, and design the research, to “prove” whether or not your prediction is correct. A common pitfall is to use judgments that are subjective and inappropriate for the construction of a hypothesis. It’s important to keep the focus and language of your hypothesis objective.

An effective hypothesis in research is clearly and concisely written, and any terms or definitions clarified and defined. Specific language must also be used to avoid any generalities or assumptions.

Use the following points as a checklist to evaluate the effectiveness of your research hypothesis:

  • Predicts the relationship and outcome
  • Simple and concise – avoid wordiness
  • Clear with no ambiguity or assumptions about the readers’ knowledge
  • Observable and testable results
  • Relevant and specific to the research question or problem

Research Hypothesis Example

Perhaps the best way to evaluate whether or not your hypothesis is effective is to compare it to those of your colleagues in the field. There is no need to reinvent the wheel when it comes to writing a powerful research hypothesis. As you’re reading and preparing your hypothesis, you’ll also read other hypotheses. These can help guide you on what works, and what doesn’t, when it comes to writing a strong research hypothesis.

Here are a few generic examples to get you started.

Eating an apple each day, after the age of 60, will result in a reduction of frequency of physician visits.

Budget airlines are more likely than full-service airlines to receive customer complaints. A budget airline is defined as an airline that offers lower fares and fewer amenities than a traditional full-service airline. (Note that the term “budget airline” is defined within the hypothesis.)

Workplaces that offer flexible working hours report higher levels of employee job satisfaction than workplaces with fixed hours.

Each of the above examples is specific, observable, and measurable, and each statement of prediction can be verified or shown to be false using standard experimental practices. It should be noted, however, that your hypothesis will often change as your research progresses.


Learn How To Write A Hypothesis For Your Next Research Project!


Undoubtedly, research plays a crucial role in substantiating or refuting our assumptions, which act as potential answers to our questions. Such assumptions, also known as hypotheses, are key aspects of research. In this blog, we delve into the significance of hypotheses and provide insights on how to write them effectively. So let’s dive in and explore the art of writing hypotheses together.


What is a Hypothesis?

A hypothesis is a crucial starting point in scientific research. It is an educated guess about the relationship between two or more variables. In other words, a hypothesis acts as a foundation for a researcher to build their study.

Here are some examples of well-crafted hypotheses:

  • Increased exposure to natural sunlight improves sleep quality in adults.

A positive relationship between natural sunlight exposure and sleep quality in adult individuals.

  • Playing puzzle games on a regular basis enhances problem-solving abilities in children.

Engaging in frequent puzzle gameplay leads to improved problem-solving skills in children.

  • Students and improved learning outcomes.

Students using online paper writing service platforms (as a learning tool for receiving personalized feedback and guidance) will demonstrate improved writing skills compared to those who do not utilize such platforms.

  • The use of APA format in research papers. 

Using the  APA format  helps students stay organized when writing research papers. Organized students can focus better on their topics and, as a result, produce better quality work.

The Building Blocks of a Hypothesis

To better understand the concept of a hypothesis, let’s break it down into its basic components:

  • Variables . A hypothesis involves at least two variables. An independent variable and a dependent variable. The independent variable is the one being changed or manipulated, while the dependent variable is the one being measured or observed.
  • Relationship : A hypothesis proposes a relationship or connection between the variables. This could be a cause-and-effect relationship or a correlation between them.
  • Testability : A hypothesis should be testable and falsifiable, meaning it can be proven right or wrong through experimentation or observation.

Types of Hypotheses

When learning how to write a hypothesis, it’s essential to understand its main types: alternative hypotheses and null hypotheses. In the following section, we explore both types of hypotheses with examples. 

Alternative Hypothesis (H1)

This kind of hypothesis suggests a relationship or effect between the variables. It is the main focus of the study, and the researcher aims to either support or refute it. Many researchers divide this hypothesis into two subtypes: 

  • Directional 

This type of H1 predicts a specific outcome, including the direction of the effect between the variables. 

  • Non-directional

As the name suggests, this type of H1 predicts that an effect exists without specifying its direction. 

Here are some examples for your better understanding of how to write a hypothesis.

  • Consuming caffeine improves cognitive performance.  (This hypothesis predicts that there is a positive relationship between caffeine consumption and cognitive performance.)
  • Aerobic exercise leads to reduced blood pressure.  (This hypothesis suggests that engaging in aerobic exercise results in lower blood pressure readings.)
  • Exposure to nature reduces stress levels among employees.  (Here, the hypothesis proposes that employees exposed to natural environments will experience decreased stress levels.)
  • Listening to classical music while studying increases memory retention.  (This hypothesis speculates that studying with classical music playing in the background boosts students’ ability to retain information.)
  • Early literacy intervention improves reading skills in children.  (This hypothesis claims that providing early literacy assistance to children results in enhanced reading abilities.)
  • Time management in nursing students.  (Students who use a nursing research paper writing service have more time to focus on their studies and can achieve better grades in other subjects.)

Null Hypothesis (H0)

A null hypothesis assumes no relationship or effect between the variables. It is retained when the data do not provide enough evidence to support the alternative hypothesis. Typically, a null hypothesis states that there is no direct correlation between the defined variables. 

Here are some examples:

  • The consumption of herbal tea has no effect on sleep quality.  (This hypothesis assumes that herbal tea consumption does not impact the quality of sleep.)
  • The number of hours spent playing video games is unrelated to academic performance.  (Here, the null hypothesis suggests that no relationship exists between video gameplay duration and academic achievement.)
  • Implementing flexible work schedules has no influence on employee job satisfaction.  (This hypothesis contends that providing flexible schedules does not affect how satisfied employees are with their jobs.)
  • The writing ability of a 7th grader is not affected by reading an editorial example.  (There is no relationship between reading an editorial example and improving a 7th grader’s writing abilities.) 
  • The type of lighting in a room does not affect people’s mood.  (In this null hypothesis, there is no connection between the kind of lighting in a room and the mood of those present.)
  • The use of social media during break time does not impact productivity at work.  (This hypothesis proposes that social media usage during breaks has no effect on work productivity.)

As you learn how to write a hypothesis, remember that aiming for clarity, testability, and relevance to your research question is vital. By mastering this skill, you’re well on your way to conducting impactful scientific research. Good luck!

Importance of a Hypothesis in Research

A well-structured hypothesis is a vital part of any research project for several reasons:

  • It provides clear direction for the study by setting its focus and purpose.
  • It outlines expectations of the research, making it easier to measure results.
  • It helps identify any potential limitations in the study, allowing researchers to refine their approach.

In conclusion, a hypothesis plays a fundamental role in the research process. By understanding its concept and constructing a well-thought-out hypothesis, researchers lay the groundwork for a successful, scientifically sound investigation.

How to Write a Hypothesis?

Here are five steps that you can follow to write an effective hypothesis. 

Step 1: Identify Your Research Question

The first step in learning how to compose a hypothesis is to clearly define your research question. This question is the central focus of your study and will help you determine the direction of your hypothesis.

Step 2: Determine the Variables

When exploring how to write a hypothesis, it’s crucial to identify the variables involved in your study. You’ll need at least two variables:

  • Independent variable : The factor you manipulate or change in your experiment.
  • Dependent variable : The outcome or result you observe or measure, which is influenced by the independent variable.

Step 3: Build the Hypothetical Relationship

In understanding how to compose a hypothesis, constructing the relationship between the variables is key. Based on your research question and variables, predict the expected outcome or connection. This prediction should be specific, testable, and, if possible, expressed in the “If…then” format.

Step 4: Write the Null Hypothesis

When mastering how to write a hypothesis, it’s important to create a null hypothesis as well. The null hypothesis assumes no relationship or effect between the variables, acting as a counterpoint to your primary hypothesis.

Step 5: Review Your Hypothesis

Finally, when learning how to compose a hypothesis, it’s essential to review your hypothesis for clarity, testability, and relevance to your research question. Make any necessary adjustments to ensure it provides a solid basis for your study.

In conclusion, understanding how to write a hypothesis is crucial for conducting successful scientific research. By focusing on your research question and carefully building relationships between variables, you will lay a strong foundation for advancing research and knowledge in your field.

Hypothesis vs. Prediction: What’s the Difference?

Understanding the differences between a hypothesis and a prediction is crucial in scientific research. Often, these terms are used interchangeably, but they have distinct meanings and functions. This segment aims to clarify these differences and explain how to compose a hypothesis correctly, helping you improve the quality of your research projects.

Hypothesis: The Foundation of Your Research

A hypothesis is an educated guess about the relationship between two or more variables. It provides the basis for your research question and is a starting point for an experiment or observational study.

The critical elements for a hypothesis include:

  • Specificity: A clear and concise statement that describes the relationship between variables.
  • Testability: The ability to test the hypothesis through experimentation or observation.

To learn how to write a hypothesis, it’s essential to identify your research question first and then predict the relationship between the variables.

Prediction: The Expected Outcome

A prediction is a statement about a specific outcome you expect to see in your experiment or observational study. It’s derived from the hypothesis and provides a measurable way to test the relationship between variables.

Here’s an example of how to write a hypothesis and a related prediction:

  • Hypothesis: Consuming a high-sugar diet leads to weight gain.
  • Prediction: People who consume a high-sugar diet for six weeks will gain more weight than those who maintain a low-sugar diet during the same period.

Key Differences Between a Hypothesis and a Prediction

While a hypothesis and prediction are both essential components of scientific research, there are some key differences to keep in mind:

  • A hypothesis is an educated guess that suggests a relationship between variables, while a prediction is a specific and measurable outcome based on that hypothesis.
  • A single hypothesis can give rise to multiple predictions for an experiment or observational study.

To conclude, understanding the differences between a hypothesis and a prediction, and learning how to write a hypothesis, are essential steps to form a robust foundation for your research. By creating clear, testable hypotheses along with specific, measurable predictions, you lay the groundwork for scientifically sound investigations.

Here’s a wrap-up for this guide on how to write a hypothesis. We hope this article was helpful. Many students struggle with writing their school research, and we hope to continue assisting you through our blog tutorials on writing different aspects of academic assignments.



How to write a research hypothesis

Last updated

19 January 2023

Reviewed by

Miroslav Damyanov

Start with a broad subject matter that excites you, so your curiosity will motivate your work. Conduct a literature search to determine the range of questions already addressed and spot any holes in the existing research.

Narrow the topics that interest you and determine your research question. Rather than focusing on a hole in the research, you might choose to challenge an existing assumption, a process called problematization. You may also find yourself with a short list of questions or related topics.

Use the FINER method to determine the single problem you'll address with your research. FINER stands for:

Feasible

Interesting

Novel

Ethical

Relevant

You need a feasible research question, meaning that there is a way to address the question. You should find it interesting, but so should a larger audience. Rather than repeating research that others have already conducted, your research hypothesis should test something novel or unique. 

The research must fall into accepted ethical parameters as defined by the government of your country and your university or college if you're an academic. You'll also need to come up with a relevant question since your research should provide a contribution to the existing research area.

This process typically narrows your shortlist down to a single problem you'd like to study and the variable you want to test. You're ready to write your hypothesis statements.


  • Types of research hypotheses

It is important to narrow your topic down to one idea before trying to write your research hypothesis. You'll only test one problem at a time. To do this, you'll write two hypotheses – a null hypothesis (H0) and an alternative hypothesis (Ha).

You'll come across many terms related to developing a research hypothesis or referring to a specific type of hypothesis. Let's take a quick look at these terms.

Null hypothesis

The term null hypothesis refers to a research hypothesis type that assumes no statistically significant relationship exists within a set of observations or data. It represents a claim that any observed relationship is due to chance. Represented as H0, the null hypothesis is the default position that the research sets out to challenge.

Alternative hypothesis

The alternative hypothesis accompanies the null hypothesis. It states that the situation presented in the null hypothesis is false or untrue, and claims an observed effect in your test. This is typically denoted by Ha or H(n), where “n” indexes the alternative hypotheses. You can have more than one alternative hypothesis. 

Simple hypothesis

The term simple hypothesis refers to a hypothesis or theory that predicts the relationship between two variables - the independent (predictor) and the dependent (predicted). 

Complex hypothesis

The term complex hypothesis refers to a model – either quantitative (mathematical) or qualitative . A complex hypothesis states the surmised relationship between two or more potentially related variables.

Directional hypothesis

When creating a statistical hypothesis, a directional hypothesis states an assumption about the direction of an effect on one parameter of a population. Some academics call this the “one-sided” hypothesis. The alternative hypothesis indicates whether the researcher tests for a positive or negative effect by including either the greater than (">") or less than ("<") sign.

Non-directional hypothesis

A non-directional hypothesis is an alternative hypothesis that includes the not equal ("≠") sign, showing that the research tests whether or not an effect exists without specifying the effect's direction (positive or negative).
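The practical difference between directional and non-directional hypotheses shows up when a p-value is computed from a test statistic. Here is a minimal Python sketch using only the standard library; the z value of 1.96 is an arbitrary illustration, not taken from the text:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z: float, direction: str) -> float:
    """p-value for a z statistic under each kind of alternative hypothesis.

    direction: "greater" (H1 uses >), "less" (H1 uses <),
    or "two-sided" (H1 uses the not-equal sign).
    """
    if direction == "greater":
        return 1.0 - normal_cdf(z)           # directional, positive effect
    if direction == "less":
        return normal_cdf(z)                 # directional, negative effect
    return 2.0 * (1.0 - normal_cdf(abs(z)))  # non-directional

z = 1.96  # illustrative test statistic
print(round(p_value(z, "greater"), 3))    # one-sided p-value
print(round(p_value(z, "two-sided"), 3))  # two-sided p-value, twice as large
```

For the same test statistic, the non-directional p-value is double the directional one, which is why the choice between them should be made before looking at the data.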

Associative hypothesis

The term associative hypothesis assumes a link between two variables but stops short of stating that one variable impacts the other. Academic statistical literature asserts in this sense that correlation does not imply causation. So, although the hypothesis notes the correlation between two variables – the independent and dependent - it does not predict how the two interact.

Logical hypothesis

Typically used in philosophy rather than science, researchers can't test a logical hypothesis because the technology or data set doesn't yet exist. A logical hypothesis uses logic as the basis of its assumptions. 

In some cases, a logical hypothesis can become an empirical hypothesis once technology provides an opportunity for testing. Until that time, the question remains too expensive or complex to address. Note that a logical hypothesis is not a statistical hypothesis.

Empirical hypothesis

When we consider the opposite of a logical hypothesis, we call this an empirical or working hypothesis. This type of hypothesis considers a scientifically measurable question. A researcher can consider and test an empirical hypothesis through replicable tests, observations, and measurements.

Statistical hypothesis

The term statistical hypothesis refers to a test of a theory that uses representative statistical models to test relationships between variables to draw conclusions regarding a large population. This requires an existing large data set, commonly referred to as big data, or implementing a survey to obtain original statistical information to form a data set for the study. 

Testing this type of hypothesis requires the use of random samples. Note that the null and alternative hypotheses are used in statistical hypothesis testing.

Causal hypothesis

The term causal hypothesis refers to a research hypothesis that tests a cause-and-effect relationship. A causal hypothesis is utilized when conducting experimental or quasi-experimental research.

Descriptive hypothesis

The term descriptive hypothesis refers to a research hypothesis used in non-experimental research, specifying an influence in the relationship between two variables.

  • What makes an effective research hypothesis?

An effective research hypothesis offers a clearly defined, specific statement, using simple wording that contains no assumptions or generalizations, and that you can test. A well-written hypothesis should predict the tested relationship and its outcome. It contains zero ambiguity and offers results you can observe and test. 

The research hypothesis should address a question relevant to a research area. Overall, your research hypothesis needs the following essentials:

Hypothesis Essential #1: Specificity & Clarity

Hypothesis Essential #2: Testability (Provability)

  • How to develop a good research hypothesis

In developing your hypothesis statements, you must pre-plan some of your statistical analysis. Once you decide on your problem to examine, determine three aspects:

the parameter you'll test

the test's direction (left-tailed, right-tailed, or non-directional)

the hypothesized parameter value

Any quantitative research includes a hypothesized parameter value of a mean, a proportion, or the difference between two proportions. Here's how to note each parameter:

Single mean (μ)

Paired means (μd)

Single proportion (p)

Difference between two independent means (μ1−μ2)

Difference between two proportions (p1−p2)

Simple linear regression slope (β)

Correlation (ρ)

Defining these parameters and determining whether you want to test the mean, proportion, or differences helps you determine the statistical tests you'll conduct to analyze your data. When writing your hypothesis, you only need to decide which parameter to test and in what overarching way.
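As one concrete instance of the parameters listed above, the test statistic for a difference between two proportions (p1−p2) can be sketched as follows. The counts are invented for illustration, and the pooled standard error used here is one common convention:

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Invented counts: 45/100 successes in group 1 vs 30/100 in group 2
print(round(two_proportion_z(45, 100, 30, 100), 2))
```

Which parameter you choose (a mean, a proportion, or a difference) determines both the formula for the statistic and the reference distribution used to evaluate it.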

State the null research hypothesis in everyday language, in a single sentence, describing the problem you want to solve. Write it as an if-then statement with defined variables, then write an alternative research hypothesis that states the opposite.

  • What is the correct format for writing a hypothesis?

The following example shows the proper format and textual content of a hypothesis. It follows commonly accepted academic standards.

Null hypothesis (H0): High school students who participate in varsity sports do not score higher on leadership tests than students who do not participate.

Alternative hypothesis (H1): High school students who play a varsity sport will score higher on leadership tests than students who do not participate in athletics.

The research question tests the correlation between varsity sports participation and leadership qualities expressed as a score on leadership tests. It compares the population of athletes to non-athletes.

  • What are the five steps of a hypothesis?

Once you decide on the specific problem or question you want to address, you can write your research hypothesis. Use this five-step system to hone your null hypothesis and generate your alternative hypothesis.

Step 1 : Create your research question. This topic should interest and excite you; answering it provides relevant information to an industry or academic area.

Step 2 : Conduct a literature review to gather essential existing research.

Step 3 : Write a clear, strong, simply worded sentence that explains your test parameter, test direction, and hypothesized parameter.

Step 4 : Read it a few times. Have others read it and ask them what they think it means. Refine your statement accordingly until it becomes understandable to everyone. While not everyone can or will comprehend every research study conducted, any person from the general population should be able to read your hypothesis and alternative hypothesis and understand the essential question you want to answer.

Step 5 : Re-write your null hypothesis until it reads simply and understandably. Write your alternative hypothesis.

What is the Red Queen hypothesis?

Some hypotheses are well-known, such as the Red Queen hypothesis. Choose your wording carefully, since you could become like the famed scientist Dr. Leigh Van Valen. In 1973, Dr. Van Valen proposed the Red Queen hypothesis to describe coevolutionary activity, specifically reciprocal evolutionary effects between species to explain extinction rates in the fossil record. 

Essentially, Van Valen theorized that to survive, each species remains in a constant state of adaptation, evolution, and proliferation, and constantly competes for survival alongside other species doing the same. Only by doing this can a species avoid extinction. Van Valen took the hypothesis title from the Lewis Carroll book, "Through the Looking Glass," which contains a key character named the Red Queen who explains to Alice that for all of her running, she's merely running in place.

  • Getting started with your research

In conclusion, once you write your null hypothesis (H0) and an alternative hypothesis (Ha), you’ve essentially authored the elevator pitch of your research. These two one-sentence statements describe your topic in simple, understandable terms that both professionals and laymen can understand. They provide the starting point of your research project.




Statistics LibreTexts

Hypothesis Testing


CO-6: Apply basic concepts of probability, random variation, and commonly used statistical probability distributions.

Learning Objectives

LO 6.26: Outline the logic and process of hypothesis testing.

LO 6.27: Explain what the p-value is and how it is used to draw conclusions.

Video: Hypothesis Testing (8:43)

Introduction

We are in the middle of the part of the course that has to do with inference for one variable.

So far, we talked about point estimation and learned how interval estimation enhances it by quantifying the magnitude of the estimation error (with a certain level of confidence) in the form of the margin of error. The result is the confidence interval — an interval that, with a certain confidence, we believe captures the unknown parameter.

We are now moving to the other kind of inference, hypothesis testing . We say that hypothesis testing is “the other kind” because, unlike the inferential methods we presented so far, where the goal was estimating the unknown parameter, the idea, logic and goal of hypothesis testing are quite different.

In the first two parts of this section we will discuss the idea behind hypothesis testing, explain how it works, and introduce new terminology that emerges in this form of inference. The final two parts will be more specific and will discuss hypothesis testing for the population proportion ( p ) and the population mean ( μ, mu).

If this is your first statistics course, you will need to spend considerable time on this topic as there are many new ideas. Many students find this process and its logic difficult to understand in the beginning.

In this section, we will use the hypothesis test for a population proportion to motivate our understanding of the process. We will conduct these tests manually. For all future hypothesis test procedures, including problems involving means, we will use software to obtain the results and focus on interpreting them in the context of our scenario.

General Idea and Logic of Hypothesis Testing

The purpose of this section is to gradually build your understanding about how statistical hypothesis testing works. We start by explaining the general logic behind the process of hypothesis testing. Once we are confident that you understand this logic, we will add some more details and terminology.

To start our discussion about the idea behind statistical hypothesis testing, consider the following example:

A case of suspected cheating on an exam is brought in front of the disciplinary committee at a certain university.

There are two opposing claims in this case:

  • The student’s claim: I did not cheat on the exam.
  • The instructor’s claim: The student did cheat on the exam.

Adhering to the principle “innocent until proven guilty,” the committee asks the instructor for evidence to support his claim. The instructor explains that the exam had two versions, and shows the committee members that on three separate exam questions, the student used in his solution numbers that were given in the other version of the exam.

The committee members all agree that it would be extremely unlikely to get evidence like that if the student’s claim of not cheating had been true. In other words, the committee members all agree that the instructor brought forward strong enough evidence to reject the student’s claim, and conclude that the student did cheat on the exam.

What does this example have to do with statistics?

While it is true that this story seems unrelated to statistics, it captures all the elements of hypothesis testing and the logic behind it. Before you read on to understand why, it would be useful to read the example again. Please do so now.

Statistical hypothesis testing is defined as:

  • Assessing evidence provided by the data against the null claim (the claim which is to be assumed true unless enough evidence exists to reject it).

Here is how the process of statistical hypothesis testing works:

  • We have two claims about what is going on in the population. Let’s call them claim 1 (this will be the null claim or hypothesis) and claim 2 (this will be the alternative) . Much like the story above, where the student’s claim is challenged by the instructor’s claim, the null claim 1 is challenged by the alternative claim 2. (For us, these claims are usually about the value of population parameter(s) or about the existence or nonexistence of a relationship between two variables in the population).
  • We choose a sample, collect relevant data and summarize them (this is similar to the instructor collecting evidence from the student’s exam). For statistical tests, this step will also involve checking any conditions or assumptions.
  • We figure out how likely it is to observe data like the data we obtained, if claim 1 is true. (Note that the wording “how likely …” implies that this step requires some kind of probability calculation). In the story, the committee members assessed how likely it is to observe evidence such as the instructor provided, had the student’s claim of not cheating been true.
  • If, after assuming claim 1 is true, we find that it would be extremely unlikely to observe data as strong as ours or stronger in favor of claim 2, then we have strong evidence against claim 1, and we reject it in favor of claim 2. Later we will see this corresponds to a small p-value.
  • If, after assuming claim 1 is true, we find that observing data as strong as ours or stronger in favor of claim 2 is NOT VERY UNLIKELY , then we do not have enough evidence against claim 1, and therefore we cannot reject it in favor of claim 2. Later we will see this corresponds to a p-value which is not small.

In our story, the committee decided that it would be extremely unlikely to find the evidence that the instructor provided had the student’s claim of not cheating been true. In other words, the members felt that it is extremely unlikely that it is just a coincidence (random chance) that the student used the numbers from the other version of the exam on three separate problems. The committee members therefore decided to reject the student’s claim and concluded that the student had, indeed, cheated on the exam. (Wouldn’t you conclude the same?)

Hopefully this example helped you understand the logic behind hypothesis testing.
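The same logic can also be sketched as a small simulation: assume claim 1 is true, generate many samples under that assumption, and count how often data at least as extreme as ours appear. The coin-tossing numbers below are illustrative, not from the text:

```python
import random

random.seed(0)  # reproducible illustration

def simulated_p_value(n: int, p_null: float, observed: int,
                      trials: int = 20_000) -> float:
    """Estimate P(count >= observed) assuming claim 1 (success rate p_null) is true."""
    extreme = 0
    for _ in range(trials):
        # one simulated sample of n observations under claim 1
        count = sum(random.random() < p_null for _ in range(n))
        if count >= observed:
            extreme += 1
    return extreme / trials

# Claim 1: the coin is fair (p = 0.5). Suppose we observed 60 heads in 100 tosses.
p = simulated_p_value(n=100, p_null=0.5, observed=60)
print(p)  # roughly 0.03: data this extreme are fairly unlikely under claim 1
```

A small estimate here plays the same role as the committee's judgment that the evidence would be "extremely unlikely" if the student's claim were true.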

Interactive Applet: Reasoning of a Statistical Test

To strengthen your understanding of the process of hypothesis testing and the logic behind it, let’s look at three statistical examples.

A recent study estimated that 20% of all college students in the United States smoke. The head of Health Services at Goodheart University (GU) suspects that the proportion of smokers may be lower at GU. In hopes of confirming her claim, the head of Health Services chooses a random sample of 400 Goodheart students, and finds that 70 of them are smokers.

Let’s analyze this example using the 4 steps outlined above:

  • claim 1: The proportion of smokers at Goodheart is 0.20.
  • claim 2: The proportion of smokers at Goodheart is less than 0.20.

Claim 1 basically says “nothing special goes on at Goodheart University; the proportion of smokers there is no different from the proportion in the entire country.” This claim is challenged by the head of Health Services, who suspects that the proportion of smokers at Goodheart is lower.

  • Choosing a sample and collecting data: A sample of n = 400 was chosen, and summarizing the data revealed that the sample proportion of smokers is p -hat = 70/400 = 0.175.While it is true that 0.175 is less than 0.20, it is not clear whether this is strong enough evidence against claim 1. We must account for sampling variation.
  • Assessment of evidence: In order to assess whether the data provide strong enough evidence against claim 1, we need to ask ourselves: How surprising is it to get a sample proportion as low as p-hat = 0.175 (or lower), assuming claim 1 is true? In other words, we need to find how likely it is that, in a random sample of size n = 400 taken from a population where the proportion of smokers is p = 0.20, we'll get a sample proportion as low as p-hat = 0.175 (or lower). It turns out that the probability of getting a sample proportion that low (or lower) in such a sample is roughly 0.106 (do not worry about how this was calculated at this point; the key is the sampling distribution of p-hat).
  • Conclusion: We found that if claim 1 were true, there is a probability of 0.106 of observing data like that observed or more extreme. Now you have to decide: Do you think a probability of 0.106 makes our data rare (surprising) enough under claim 1 that observing them is sufficient evidence to reject claim 1? Or do you feel that a probability of 0.106 means data like ours are not very likely when claim 1 is true, but not unlikely enough to justify rejecting claim 1? Basically, this is your decision. However, it would be nice to have some kind of guideline about what is generally considered surprising enough.
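The text promises that you don't need to know yet how the 0.106 was obtained, but for readers who want to peek ahead, here is a minimal sketch of the calculation using only the Python standard library. It relies on the normal approximation to the sampling distribution of p-hat discussed in earlier modules:

```python
from statistics import NormalDist

p0 = 0.20         # proportion of smokers under claim 1 (the nationwide rate)
n = 400           # sample size
p_hat = 70 / 400  # observed sample proportion = 0.175

# Under claim 1, p-hat is approximately normal with mean p0 and
# standard deviation sqrt(p0 * (1 - p0) / n).
se = (p0 * (1 - p0) / n) ** 0.5          # = 0.02
p_value = NormalDist(p0, se).cdf(p_hat)  # P(p-hat <= 0.175 | claim 1 true)

print(round(se, 3), round(p_value, 3))   # prints: 0.02 0.106
```

The observed proportion sits 1.25 standard errors below 0.20, and the probability of falling that low or lower is about 0.106, as stated above.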

A certain prescription allergy medicine is supposed to contain an average of 245 parts per million (ppm) of a certain chemical. If the concentration is higher than 245 ppm, the drug will likely cause unpleasant side effects, and if the concentration is below 245 ppm, the drug may be ineffective. The manufacturer wants to check whether the mean concentration in a large shipment is the required 245 ppm or not. To this end, a random sample of 64 portions from the large shipment is tested, and it is found that the sample mean concentration is 250 ppm with a sample standard deviation of 12 ppm.

  • Claim 1: The mean concentration in the shipment is the required 245 ppm.
  • Claim 2: The mean concentration in the shipment is not the required 245 ppm.

Note that again, claim 1 basically says: “There is nothing unusual about this shipment, the mean concentration is the required 245 ppm.” This claim is challenged by the manufacturer, who wants to check whether that is, indeed, the case or not.

  • Choosing a sample and collecting data: A sample of n = 64 portions is chosen, and after summarizing the data it is found that the sample mean concentration is x-bar = 250 and the sample standard deviation is s = 12. Is the fact that x-bar = 250 is different from 245 strong enough evidence to reject claim 1 and conclude that the mean concentration in the whole shipment is not the required 245? In other words, do the data provide strong enough evidence to reject claim 1?
  • Assessing the evidence: In order to assess whether the data provide strong enough evidence against claim 1, we need to ask ourselves the following question: If the mean concentration in the whole shipment were really the required 245 ppm (i.e., if claim 1 were true), how surprising would it be to observe a sample of 64 portions where the sample mean concentration is off by 5 ppm or more (as we did)? It turns out that it would be extremely unlikely to get such a result if the mean concentration were really the required 245. There is only a probability of 0.0007 (i.e., 7 in 10,000) of that happening. (Do not worry about how this was calculated at this point, but again, the key will be the sampling distribution.)
  • Making conclusions: Here, it is pretty clear that a sample like the one we observed or more extreme is VERY rare (or extremely unlikely) if the mean concentration in the shipment were really the required 245 ppm. The fact that we did observe such a sample therefore provides strong evidence against claim 1, so we reject it and conclude with very little doubt that the mean concentration in the shipment is not the required 245 ppm.
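Again, the details come later, but for the curious, here is a sketch of this calculation using the normal approximation to the sampling distribution of x-bar. Note that this simple normal-based version gives a p-value of roughly 0.0009, the same order of magnitude as the 0.0007 quoted above (small differences can arise from rounding or from the exact distribution used); either way, the conclusion is identical:

```python
from statistics import NormalDist

mu0 = 245    # required mean concentration under claim 1
n = 64
x_bar = 250  # observed sample mean
s = 12       # sample standard deviation

se = s / n ** 0.5       # standard error = 12/8 = 1.5
z = (x_bar - mu0) / se  # x-bar is about 3.33 standard errors above 245

# Two-sided p-value: probability of a sample mean at least this far
# from 245 in EITHER direction, if the true mean really were 245.
p_value = 2 * (1 - NormalDist().cdf(z))
print(round(z, 2), round(p_value, 4))  # z ≈ 3.33, p ≈ 0.0009
```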

Do you think that you’re getting it? Let’s make sure, and look at another example.

Is there a relationship between gender and combined scores (Math + Verbal) on the SAT exam?

Following a report on the College Board website, which showed that in 2003, males scored generally higher than females on the SAT exam, an educational researcher wanted to check whether this was also the case in her school district. The researcher chose random samples of 150 males and 150 females from her school district, collected data on their SAT performance and found the following:

Again, let’s see how the process of hypothesis testing works for this example:

  • Claim 1: Performance on the SAT is not related to gender (males and females score the same).
  • Claim 2: Performance on the SAT is related to gender – males score higher.

Note that again, claim 1 basically says: “There is nothing going on between the variables SAT and gender.” Claim 2 represents what the researcher wants to check, or suspects might actually be the case.

  • Choosing a sample and collecting data: Data were collected and summarized as given above. Is the fact that the sample mean score of males (1,025) is higher than the sample mean score of females (1,010) by 15 points strong enough evidence to reject claim 1 and conclude that in this researcher’s school district, males score higher on the SAT than females?
  • Assessment of evidence: In order to assess whether the data provide strong enough evidence against claim 1, we need to ask ourselves: If SAT scores are in fact not related to gender (claim 1 is true), how likely is it to get data like the data we observed, in which the difference between the males’ average and the females’ average score is as high as 15 points or higher? It turns out that the probability of observing such a sample result if SAT score is not related to gender is approximately 0.29 (again, do not worry about how this was calculated at this point).
  • Conclusion: Here, we have an example where observing a sample like the one we observed (or more extreme) is definitely not surprising (roughly a 30% chance) if claim 1 were true (i.e., if there really is no difference in SAT scores between males and females). We therefore conclude that our data do not provide enough evidence for rejecting claim 1.

In general, then, a hypothesis test ends with one of two types of conclusions:

  • “The data provide enough evidence to reject claim 1 and accept claim 2”; or
  • “The data do not provide enough evidence to reject claim 1.”
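For completeness, here is a sketch of how example 3’s one-sided p-value could be computed from the two sample means. The summary table was not reproduced above, so the sample standard deviations in this sketch are hypothetical stand-ins (around 200 points, a typical spread for combined SAT scores); with the actual standard deviations, the text reports a p-value of about 0.29:

```python
from statistics import NormalDist

# Sample sizes and means are from the text; the standard deviations
# (sd_m, sd_f) are HYPOTHETICAL values chosen for illustration only.
n_m, mean_m, sd_m = 150, 1025, 200
n_f, mean_f, sd_f = 150, 1010, 200

# Standard error of the difference between two independent sample means
se_diff = (sd_m**2 / n_m + sd_f**2 / n_f) ** 0.5  # ≈ 23.1

z = (mean_m - mean_f) / se_diff  # the 15-point gap, in standard errors

# One-sided p-value (claim 2 says males score HIGHER):
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 2))  # with these assumed SDs, p ≈ 0.26
```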

In particular, note that in the second type of conclusion we did not say: “ I accept claim 1 ,” but only “ I don’t have enough evidence to reject claim 1 .” We will come back to this issue later, but this is a good place to make you aware of this subtle difference.

Hopefully by now, you understand the logic behind the statistical hypothesis testing process. Here is a summary:

A flow chart describing the process. First, we state Claim 1 and Claim 2. Claim 1 says "nothing special is going on" and is challenged by claim 2. Second, we collect relevant data and summarize them. Third, we assess how surprising it would be to observe data like that observed if Claim 1 were true. Fourth, we draw conclusions in context.

Learn by Doing: Logic of Hypothesis Testing

Did I Get This?: Logic of Hypothesis Testing

Steps in Hypothesis Testing

Video: Steps in Hypothesis Testing (16:02)

Now that we understand the general idea of how statistical hypothesis testing works, let’s go back to each of the steps and delve slightly deeper, getting more details and learning some terminology.

Hypothesis Testing Step 1: State the Hypotheses

In all three examples, our aim is to decide between two opposing points of view, Claim 1 and Claim 2. In hypothesis testing, Claim 1 is called the null hypothesis (denoted “Ho”), and Claim 2 plays the role of the alternative hypothesis (denoted “Ha”). As we saw in the three examples, the null hypothesis suggests that nothing special is going on; in other words, there is no change from the status quo, no difference from the traditional state of affairs, no relationship. In contrast, the alternative hypothesis disagrees with this, stating that something is going on, or there is a change from the status quo, or there is a difference from the traditional state of affairs. The alternative hypothesis, Ha, usually represents what we want to check or what we suspect is really going on.

Let’s go back to our three examples and apply the new notation:

In example 1:

  • Ho: The proportion of smokers at GU is 0.20.
  • Ha: The proportion of smokers at GU is less than 0.20.

In example 2:

  • Ho: The mean concentration in the shipment is the required 245 ppm.
  • Ha: The mean concentration in the shipment is not the required 245 ppm.

In example 3:

  • Ho: Performance on the SAT is not related to gender (males and females score the same).
  • Ha: Performance on the SAT is related to gender – males score higher.

Learn by Doing: State the Hypotheses

Did I Get This?: State the Hypotheses

Hypothesis Testing Step 2: Collect Data, Check Conditions and Summarize Data

This step is pretty obvious. This is what inference is all about. You look at sampled data in order to draw conclusions about the entire population. In the case of hypothesis testing, based on the data, you draw conclusions about whether or not there is enough evidence to reject Ho.

There is, however, one detail that we would like to add here. In this step we collect data and summarize it. Go back and look at the second step in our three examples. Note that in order to summarize the data we used simple sample statistics such as the sample proportion (p-hat), sample mean (x-bar) and the sample standard deviation (s).

In practice, you go a step further and use these sample statistics to summarize the data with what’s called a test statistic . We are not going to go into any details right now, but we will discuss test statistics when we go through the specific tests.

This step will also involve checking any conditions or assumptions required to use the test.

Hypothesis Testing Step 3: Assess the Evidence

As we saw, this is the step where we calculate how likely is it to get data like that observed (or more extreme) when Ho is true. In a sense, this is the heart of the process, since we draw our conclusions based on this probability.

  • If this probability is very small (see example 2), then that means that it would be very surprising to get data like that observed (or more extreme) if Ho were true. The fact that we did observe such data is therefore evidence against Ho, and we should reject it.
  • On the other hand, if this probability is not very small (see example 3), this means that observing data like that observed (or more extreme) is not very surprising if Ho were true. The fact that we observed such data does not provide evidence against Ho.

This crucial probability has a special name: it is called the p-value of the test.

In our three examples, the p-values were given to you (and you were reassured that you didn’t need to worry about how these were derived yet):

  • Example 1: p-value = 0.106
  • Example 2: p-value = 0.0007
  • Example 3: p-value = 0.29

Obviously, the smaller the p-value, the more surprising it is to get data like ours (or more extreme) when Ho is true, and therefore, the stronger the evidence the data provide against Ho.

Looking at the three p-values of our three examples, we see that the data that we observed in example 2 provide the strongest evidence against the null hypothesis, followed by example 1, while the data in example 3 provides the least evidence against Ho.

  • Right now we will not go into specific details about p-value calculations, but just mention that since the p-value is the probability of getting data like those observed (or more extreme) when Ho is true, it would make sense that the calculation of the p-value will be based on the data summary, which, as we mentioned, is the test statistic. Indeed, this is the case. In practice, we will mostly use software to provide the p-value for us.

Hypothesis Testing Step 4: Making Conclusions

Since our statistical conclusion is based on how small the p-value is, or in other words, how surprising our data are when Ho is true, it would be nice to have some kind of guideline or cutoff that will help determine how small the p-value must be, or how “rare” (unlikely) our data must be when Ho is true, for us to conclude that we have enough evidence to reject Ho.

This cutoff exists, and because it is so important, it has a special name. It is called the significance level of the test and is usually denoted by the Greek letter α (alpha). The most commonly used significance level is α (alpha) = 0.05 (or 5%). This means that:

  • if the p-value < α (alpha) (usually 0.05), then the data we obtained are considered to be “rare (or surprising) enough” under the assumption that Ho is true, and we say that the data provide statistically significant evidence against Ho, so we reject Ho and thus accept Ha.
  • if the p-value > α (alpha) (usually 0.05), then our data are not considered to be “surprising enough” under the assumption that Ho is true, and we say that our data do not provide enough evidence to reject Ho (or, equivalently, that the data do not provide enough evidence to accept Ha).
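This decision rule can be captured in a few lines of code; a minimal sketch (the wording of the returned strings is ours, chosen to echo the bullets above):

```python
def decision(p_value, alpha=0.05):
    """Return the hypothesis-testing conclusion for a given p-value."""
    if p_value < alpha:
        return "reject Ho (statistically significant evidence against Ho)"
    return "fail to reject Ho (not enough evidence against Ho)"

# The three examples, using the p-values given earlier:
print(decision(0.106))   # example 1: fail to reject Ho
print(decision(0.0007))  # example 2: reject Ho
print(decision(0.29))    # example 3: fail to reject Ho
```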

Now that we have a cutoff to use, here are the appropriate conclusions for each of our examples based upon the p-values we were given.

In Example 1:

  • Using our cutoff of 0.05, we fail to reject Ho.
  • Conclusion: There IS NOT enough evidence that the proportion of smokers at GU is less than 0.20.
  • Still we should consider: Does the evidence seen in the data provide any practical evidence towards our alternative hypothesis?

In Example 2:

  • Using our cutoff of 0.05, we reject Ho.
  • Conclusion: There IS enough evidence that the mean concentration in the shipment is not the required 245 ppm.

In Example 3:

  • Conclusion: There IS NOT enough evidence that males score higher on average than females on the SAT.

Notice that all of the above conclusions are written in terms of the alternative hypothesis and are given in the context of the situation. In no situation have we claimed the null hypothesis is true. Be very careful of this and other issues discussed in the following comments.

  • Although the significance level provides a good guideline for drawing our conclusions, it should not be treated as an incontrovertible truth; there is some room for personal interpretation. What if your p-value is 0.052? You might want to stick to the rules and say “0.052 > 0.05, and therefore I don’t have enough evidence to reject Ho,” but you might instead decide that 0.052 is small enough for you to reject Ho. It should be noted that scientific journals do consider 0.05 to be the cutoff point: any p-value below the cutoff indicates enough evidence against Ho, and any p-value above it, or even equal to it, indicates there is not enough evidence against Ho (although a p-value between 0.05 and 0.10 is often reported as marginally statistically significant).
  • It is important to draw your conclusions in context. It is never enough to say: “p-value = …, and therefore I have enough evidence to reject Ho at the 0.05 significance level.” You should always word your conclusion in terms of the data. Although we will use the terminology of “rejecting Ho” or “failing to reject Ho,” this language is rarely used in practice; we use it here mostly for instruction. We also suggest writing your conclusion in terms of the alternative hypothesis: is there, or is there not, enough evidence that the alternative hypothesis is true?
  • Finally, let’s go back to the nature of the two types of conclusions we can make:
  • Either I reject Ho (when the p-value is smaller than the significance level)
  • or I cannot reject Ho (when the p-value is larger than the significance level).

As we mentioned earlier, note that the second conclusion does not imply that I accept Ho, but just that I don’t have enough evidence to reject it. Saying (by mistake) “I don’t have enough evidence to reject Ho so I accept it” indicates that the data provide evidence that Ho is true, which is not necessarily the case . Consider the following slightly artificial yet effective example:

An employer claims to subscribe to an “equal opportunity” policy, not hiring men any more often than women for managerial positions. Is this credible? You’re not sure, so you want to test the following two hypotheses:

  • Ho: The proportion of male managers hired is 0.5
  • Ha: The proportion of male managers hired is more than 0.5

Data: You choose at random three of the new managers who were hired in the last 5 years and find that all 3 are men.

Assessing Evidence: If the proportion of male managers hired is really 0.5 (Ho is true), then the probability that the random selection of three managers will yield three males is therefore 0.5 * 0.5 * 0.5 = 0.125. This is the p-value (using the multiplication rule for independent events).
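The arithmetic here is simple enough to verify directly:

```python
# P(all three randomly chosen managers are male | Ho: proportion male = 0.5),
# using the multiplication rule for independent events.
p_value = 0.5 * 0.5 * 0.5
print(p_value)  # prints: 0.125
```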

Conclusion: Using 0.05 as the significance level, you conclude that since the p-value = 0.125 > 0.05, the fact that the three randomly selected managers were all males is not enough evidence to reject the employer’s claim of subscribing to an equal opportunity policy (Ho).

However, the data (all three selected are males) definitely does NOT provide evidence to accept the employer’s claim (Ho).

Learn By Doing: Using p-values

Did I Get This?: Using p-values

Comment about wording: Another common wording in scientific journals is:

  • “The results are statistically significant” – when the p-value < α (alpha).
  • “The results are not statistically significant” – when the p-value > α (alpha).

Often you will see p-values reported with an additional description to indicate the degree of statistical significance. A general guideline (although not required in our course) is:

  • If 0.01 ≤ p-value < 0.05, then the results are (statistically) significant.
  • If 0.001 ≤ p-value < 0.01, then the results are highly statistically significant.
  • If p-value < 0.001, then the results are very highly statistically significant.
  • If p-value > 0.05, then the results are not statistically significant (NS).
  • If 0.05 ≤ p-value < 0.10, then the results are marginally statistically significant.
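This guideline can be codified as a small helper function. Note that the last two bullets overlap (a p-value of 0.07 is both “not statistically significant” and “marginally statistically significant”), so this sketch treats the “marginal” label as the more specific description:

```python
def significance_description(p_value):
    """Map a p-value to the guideline wording above (not required in our course)."""
    if p_value < 0.001:
        return "very highly statistically significant"
    if p_value < 0.01:
        return "highly statistically significant"
    if p_value < 0.05:
        return "statistically significant"
    if p_value < 0.10:
        return "marginally statistically significant"
    return "not statistically significant (NS)"

print(significance_description(0.0007))  # very highly statistically significant
print(significance_description(0.29))    # not statistically significant (NS)
```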

Let’s summarize

We learned quite a lot about hypothesis testing. We learned the logic behind it, what the key elements are, and what types of conclusions we can and cannot draw in hypothesis testing. Here is a quick recap:

Video: Hypothesis Testing Overview (2:20)

Here are a few more activities if you need some additional practice.

Did I Get This?: Hypothesis Testing Overview

  • Notice that the p-value is an example of a conditional probability. We calculate the probability of obtaining results like those of our data (or more extreme) GIVEN the null hypothesis is true. We could write P(Obtaining results like ours or more extreme | Ho is True).
  • We could write P(Obtaining a test statistic as or more extreme than ours | Ho is True).
  • In this case we are asking “Assuming the null hypothesis is true, how rare is it to observe something as or more extreme than what I have found in my data?”
  • If after assuming the null hypothesis is true, what we have found in our data is extremely rare (small p-value), this provides evidence to reject our assumption that Ho is true in favor of Ha.
  • The p-value can also be thought of as the probability, assuming the null hypothesis is true, that the result we have seen is solely due to random error (or random chance). We have already seen that statistics from samples collected from a population vary. There is random error or random chance involved when we sample from populations.

In this setting, if the p-value is very small, this implies, assuming the null hypothesis is true, that it is extremely unlikely that the results we have obtained would have happened due to random error alone, and thus our assumption (Ho) is rejected in favor of the alternative hypothesis (Ha).

  • It is EXTREMELY important that you find a definition of the p-value which makes sense to you. New students often need to contemplate this idea repeatedly through a variety of examples and explanations before becoming comfortable with this idea. It is one of the two most important concepts in statistics (the other being confidence intervals).
  • We infer that the alternative hypothesis is true ONLY by rejecting the null hypothesis.
  • A statistically significant result is one that has a very low probability of occurring if the null hypothesis is true.
  • Results which are statistically significant may or may not have practical significance and vice versa.

Error and Power

LO 6.28: Define a Type I and Type II error in general and in the context of specific scenarios.

LO 6.29: Explain the concept of the power of a statistical test including the relationship between power, sample size, and effect size.

Video: Errors and Power (12:03)

Type I and Type II Errors in Hypothesis Tests

We have not yet discussed the fact that we are not guaranteed to make the correct decision by this process of hypothesis testing. Maybe you are beginning to see that there is always some level of uncertainty in statistics.

Let’s think about what we know already and define the possible errors we can make in hypothesis testing. When we conduct a hypothesis test, we choose one of two possible conclusions based upon our data.

If the p-value is smaller than your pre-specified significance level (α, alpha), you reject the null hypothesis and either

  • You have made the correct decision since the null hypothesis is false
  • You have made an error ( Type I ) and rejected Ho when in fact Ho is true (your data happened to be a RARE EVENT under Ho)

If the p-value is greater than (or equal to) your chosen significance level (α, alpha), you fail to reject the null hypothesis and either

  • You have made the correct decision since the null hypothesis is true
  • You have made an error ( Type II ) and failed to reject Ho when in fact Ho is false (the alternative hypothesis, Ha, is true)

The following summarizes the four possible results which can be obtained from a hypothesis test. Notice the rows represent the decision made in the hypothesis test and the columns represent the (usually unknown) truth in reality.

                        Ho is True           Ho is False
  Reject Ho             Type I error         Correct decision
  Fail to reject Ho     Correct decision     Type II error

Although the truth is unknown in practice – or we would not be conducting the test – we know it must be the case that either the null hypothesis is true or the null hypothesis is false. It is also the case that either decision we make in a hypothesis test can result in an incorrect conclusion!

A TYPE I Error occurs when we Reject Ho when, in fact, Ho is True. In this case, we mistakenly reject a true null hypothesis.

  • P(TYPE I Error) = P(Reject Ho | Ho is True) = α = alpha = Significance Level

A TYPE II Error occurs when we fail to Reject Ho when, in fact, Ho is False. In this case we fail to reject a false null hypothesis.

  • P(TYPE II Error) = P(Fail to Reject Ho | Ho is False) = β = beta

When our significance level is 5%, we are saying that we will allow ourselves to make a Type I error less than 5% of the time. In the long run, if we repeat the process, 5% of the time we will find a p-value < 0.05 when in fact the null hypothesis was true.

In this case, our data represent a rare occurrence which is unlikely to happen but is still possible. For example, suppose we toss a coin 10 times and obtain 10 heads, this is unlikely for a fair coin but not impossible. We might conclude the coin is unfair when in fact we simply saw a very rare event for this fair coin.

Our testing procedure CONTROLS for the Type I error when we set a pre-determined value for the significance level.

Notice that these probabilities are conditional probabilities. This is one more reason why conditional probability is an important concept in statistics.

Unfortunately, calculating the probability of a Type II error requires us to know the truth about the population. In practice we can only calculate this probability using a series of “what if” calculations which depend upon the type of problem.

Comment: As you initially read through the examples below, focus on the broad concepts instead of the small details. It is not important to understand how to calculate these values yourself at this point.

  • Try to understand the pictures we present. Which pictures represent an assumed null hypothesis and which represent an alternative?
  • It may be useful to come back to this page (and the activities here) after you have reviewed the rest of the section on hypothesis testing and have worked a few problems yourself.

Interactive Applet: Statistical Significance

Here are two examples of using an older version of this applet. It looks slightly different but the same settings and options are available in the version above.

In both cases we will consider IQ scores.

Our null hypothesis is that the true mean is 100. Assume the standard deviation is 16 and we will specify a significance level of 5%.

In this example we will specify that the true mean is indeed 100 so that the null hypothesis is true. Most of the time (95%), when we generate a sample, we should fail to reject the null hypothesis since the null hypothesis is indeed true.

Here is one sample that results in a correct decision:

[Applet screenshot: a sample with x-bar = 105, correctly failing to reject Ho]

In the sample above, we obtain an x-bar of 105, which is drawn on the distribution which assumes μ (mu) = 100 (the null hypothesis is true). Notice the sample is shown as blue dots along the x-axis and the shaded region shows for which values of x-bar we would reject the null hypothesis. In other words, we would reject Ho whenever the x-bar falls in the shaded region.

Enter the same values and generate samples until you obtain a Type I error (you falsely reject the null hypothesis). You should see something like this:

[Applet screenshot: a sample whose x-bar falls in the shaded rejection region even though Ho is true, a Type I error]

If you were to generate 100 samples, you should have around 5% where you rejected Ho. These would be samples which would result in a Type I error.
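If you don’t have the applet handy, you can approximate this long-run behavior by simulation. The sketch below assumes a two-sided z-test with known σ (sigma) = 16 and samples of 10 individuals (the sample size used later in the IQ discussion); the applet’s actual rejection region may be one-sided, but the long-run Type I error rate is α (alpha) either way:

```python
import random
from statistics import NormalDist, mean

random.seed(1)
mu0, sigma, n, alpha = 100, 16, 10, 0.05

# Critical value for a two-sided z-test at the 5% level (an assumption;
# the applet's shaded region may instead be one-sided).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96

trials = 10_000
rejections = 0
for _ in range(trials):
    # Ho is TRUE here: we really sample from a population with mean 100.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1  # each rejection here is a Type I error

print(rejections / trials)  # close to alpha = 0.05
```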

The previous example illustrates a correct decision and a Type I error when the null hypothesis is true. The next example illustrates a correct decision and Type II error when the null hypothesis is false. In this case, we must specify the true population mean.

Let’s suppose we are sampling from an honors program and that the true mean IQ for this population is 110. We do not know the probability of a Type II error without more detailed calculations.

Let’s start with a sample which results in a correct decision.

[Applet screenshot: a sample with x-bar = 111, correctly rejecting Ho]

In the sample above, we obtain an x-bar of 111, which is drawn on the distribution which assumes μ (mu) = 100 (the null hypothesis is true).

Enter the same values and generate samples until you obtain a Type II error (you fail to reject the null hypothesis). You should see something like this:

[Applet screenshot: a sample whose x-bar falls outside the rejection region even though Ho is false, a Type II error]

You should notice that in this case (when Ho is false), it is easier to obtain an incorrect decision (a Type II error) than it was in the case where Ho is true. If you generate 100 samples, you can approximate the probability of a Type II error.

We can find the probability of a Type II error by visualizing both the assumed distribution and the true distribution together. The image below is adapted from an applet we will use when we discuss the power of a statistical test.

[Applet image: the assumed (null) distribution and the true distribution overlaid, showing a Type II error probability of 0.374]

There is a 37.4% chance that, in the long run, we will make a Type II error and fail to reject the null hypothesis when in fact the true mean IQ is 110 in the population from which we sample our 10 individuals.
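These probabilities can be computed directly once the true mean is specified. The sketch below assumes a one-sided z-test (Ha: μ > 100) with known σ (sigma) = 16 and n = 10, which reproduces the applet’s figures up to rounding:

```python
from statistics import NormalDist

mu0, mu_true, sigma, n = 100, 110, 16, 10
se = sigma / n ** 0.5  # standard error of x-bar ≈ 5.06

# At alpha = 0.05, reject Ho when x-bar exceeds this cutoff (one-sided test).
cutoff_05 = mu0 + NormalDist().inv_cdf(0.95) * se  # ≈ 108.3

# Type II error: x-bar falls BELOW the cutoff even though mu is really 110.
beta_05 = NormalDist(mu_true, se).cdf(cutoff_05)
print(round(beta_05, 2))  # ≈ 0.37, matching the 37.4% above up to rounding

# Smaller alpha (0.01) pushes the cutoff right and INCREASES beta:
cutoff_01 = mu0 + NormalDist().inv_cdf(0.99) * se
beta_01 = NormalDist(mu_true, se).cdf(cutoff_01)
print(round(beta_01, 2))  # ≈ 0.64, illustrating the trade-off discussed below
```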

Can you visualize what will happen if the true population mean is really 115 or 108? When will the Type II error increase? When will it decrease? We will look at this idea again when we discuss the concept of power in hypothesis tests.

  • It is important to note that there is a trade-off between the probability of a Type I and a Type II error. If we decrease the probability of one of these errors, the probability of the other will increase! The practical result of this is that if we require stronger evidence to reject the null hypothesis (smaller significance level = probability of a Type I error), we will increase the chance that we will be unable to reject the null hypothesis when in fact Ho is false (increases the probability of a Type II error).
  • When α (alpha) = 0.05 we obtained a Type II error probability of 0.374 = β = beta

[Applet screenshot: Type II error probability 0.374 when α (alpha) = 0.05]

  • When α (alpha) = 0.01 (smaller than before) we obtain a Type II error probability of 0.644 = β = beta (larger than before)

[Applet screenshot: Type II error probability 0.644 when α (alpha) = 0.01]

  • As the blue line in the picture moves farther right, the significance level (α, alpha) is decreasing and the Type II error probability is increasing.
  • As the blue line in the picture moves farther left, the significance level (α, alpha) is increasing and the Type II error probability is decreasing.

Let’s return to our very first example and define these two errors in context.

  • Ho = The student’s claim: I did not cheat on the exam.
  • Ha = The instructor’s claim: The student did cheat on the exam.

Adhering to the principle “innocent until proven guilty,” the committee asks the instructor for evidence to support his claim.

There are four possible outcomes of this process. There are two possible correct decisions:

  • The student did cheat on the exam and the instructor brings enough evidence to reject Ho and conclude the student did cheat on the exam. This is a CORRECT decision!
  • The student did not cheat on the exam and the instructor fails to provide enough evidence that the student did cheat on the exam. This is a CORRECT decision!

Both the correct decisions and the possible errors are fairly easy to understand but with the errors, you must be careful to identify and define the two types correctly.

TYPE I Error: Reject Ho when Ho is True

  • The student did not cheat on the exam but the instructor brings enough evidence to reject Ho and conclude the student cheated on the exam. This is a Type I Error.

TYPE II Error: Fail to Reject Ho when Ho is False

  • The student did cheat on the exam but the instructor fails to provide enough evidence that the student cheated on the exam. This is a Type II Error.

In most situations, including this one, it is more “acceptable” to have a Type II error than a Type I error. Although allowing a student who cheats to go unpunished might be considered a very bad problem, punishing a student for something he or she did not do is usually considered to be a more severe error. This is one reason we control for our Type I error in the process of hypothesis testing.

Did I Get This?: Type I and Type II Errors (in context)

  • The probabilities of Type I and Type II errors are closely related to the concepts of sensitivity and specificity that we discussed previously. Consider the following hypotheses:

Ho: The individual does not have diabetes (status quo, nothing special happening)

Ha: The individual does have diabetes (something is going on here)

In this setting:

When someone tests positive for diabetes we would reject the null hypothesis and conclude the person has diabetes (we may or may not be correct!).

When someone tests negative for diabetes we would fail to reject the null hypothesis so that we fail to conclude the person has diabetes (we may or may not be correct!)

Let’s take it one step further:

Sensitivity = P(Test + | Have Disease) which in this setting equals P(Reject Ho | Ho is False) = 1 – P(Fail to Reject Ho | Ho is False) = 1 – β = 1 – beta

Specificity = P(Test – | No Disease) which in this setting equals P(Fail to Reject Ho | Ho is True) = 1 – P(Reject Ho | Ho is True) = 1 – α = 1 – alpha

Notice that sensitivity and specificity relate to the probability of making a correct decision whereas α (alpha) and β (beta) relate to the probability of making an incorrect decision.

Usually α (alpha) = 0.05 so that the specificity listed above is 0.95 or 95%.

Next, we will see that the sensitivity listed above is the power of the hypothesis test!
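To make the mapping concrete, here is a minimal sketch in Python. The particular values of alpha and beta are illustrative assumptions, not results from any real test:

```python
# Hypothetical error rates (assumptions for illustration only)
alpha = 0.05  # P(Reject Ho | Ho is True)         -- Type I error rate
beta = 0.20   # P(Fail to Reject Ho | Ho is False) -- Type II error rate

# Correct-decision probabilities, restated in the diagnostic-testing language:
specificity = 1 - alpha  # P(Fail to Reject Ho | Ho is True) = P(Test - | No Disease)
sensitivity = 1 - beta   # P(Reject Ho | Ho is False) = P(Test + | Disease) = power
```

With alpha = 0.05 and beta = 0.20, this gives a specificity of 0.95 and a sensitivity (power) of 0.80.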

Reasons for a Type I Error in Practice

Assuming that you have obtained a quality sample:

  • The reason for a Type I error is random chance.
  • When a Type I error occurs, our observed data represented a rare event which indicated evidence in favor of the alternative hypothesis even though the null hypothesis was actually true.
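As a sketch of this idea (a hypothetical simulation, not part of the course materials): if we repeatedly sample from a population where Ho really is true and run a one-sided z-test at alpha = 0.05, random chance alone should lead us to reject Ho, wrongly, about 5% of the time. The population proportion and sample size below are arbitrary choices:

```python
import random
from math import sqrt

random.seed(1)
p0, n, reps = 0.20, 400, 2000
se = sqrt(p0 * (1 - p0) / n)   # standard error assuming Ho is true

rejections = 0
for _ in range(reps):
    # Draw a sample of size n from a population where Ho is TRUE (p = p0)
    count = sum(random.random() < p0 for _ in range(n))
    z = (count / n - p0) / se
    if z < -1.645:             # one-sided test (Ha: p < p0), alpha = 0.05
        rejections += 1        # every rejection here is a Type I error

type_i_rate = rejections / reps   # should be close to alpha = 0.05
```

Over many repetitions the observed rejection rate settles near the chosen alpha, which is exactly what "controlling the Type I error" means.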

Reasons for a Type II Error in Practice

Again, assuming that you have obtained a quality sample, now we have a few possibilities depending upon the true difference that exists.

  • The sample size is too small to detect an important difference. This is the worst case; you should have obtained a larger sample. In this situation, you may notice that the effect seen in the sample seems PRACTICALLY significant and yet the p-value is not small enough to reject the null hypothesis.
  • The sample size is reasonable for the important difference but the true difference (which might be somewhat meaningful or interesting) is smaller than your test was capable of detecting. This is tolerable as you were not interested in being able to detect this difference when you began your study. In this situation, you may notice that the effect seen in the sample seems to have some potential for practical significance.
  • The sample size is more than adequate and the difference that was not detected is meaningless in practice. This is not a problem at all and is in effect a “correct decision” since the difference you did not detect would have no practical meaning.
  • Note: We will discuss the idea of practical significance later in more detail.

Power of a Hypothesis Test

It is often the case that we truly wish to prove the alternative hypothesis. It is reasonable that we would be interested in the probability of correctly rejecting the null hypothesis. In other words, the probability of rejecting the null hypothesis, when in fact the null hypothesis is false. This can also be thought of as the probability of being able to detect a (pre-specified) difference of interest to the researcher.

Let’s begin with a realistic example of how power can be described in a study.

In a clinical trial to study two medications for weight loss, we have an 80% chance to detect a difference in the weight loss between the two medications of 10 pounds. In other words, the power of the hypothesis test we will conduct is 80%.

In other words, if one medication comes from a population with an average weight loss of 25 pounds and the other comes from a population with an average weight loss of 15 pounds, we will have an 80% chance to detect that difference using the sample we have in our trial.

If we were to repeat this trial many times, 80% of the time we will be able to reject the null hypothesis (that there is no difference between the medications) and 20% of the time we will fail to reject the null hypothesis (and make a Type II error!).

The difference of 10 pounds in the previous example is often called the effect size . The measure of the effect differs depending on the particular test you are conducting but is always some measure related to the true effect in the population. In this example, it is the difference between two population means.

Recall the definition of a Type II error:

Notice that P(Reject Ho | Ho is False) = 1 – P(Fail to Reject Ho | Ho is False) = 1 – β = 1 – beta.

The POWER of a hypothesis test is the probability of rejecting the null hypothesis when the null hypothesis is false . This can also be stated as the probability of correctly rejecting the null hypothesis .

POWER = P(Reject Ho | Ho is False) = 1 – β = 1 – beta

Power is the test’s ability to correctly reject the null hypothesis. A test with high power has a good chance of being able to detect the difference of interest to us, if it exists .

As we mentioned on the bottom of the previous page, this can be thought of as the sensitivity of the hypothesis test if you imagine Ho = No disease and Ha = Disease.

Factors Affecting the Power of a Hypothesis Test

The power of a hypothesis test is affected by numerous quantities (similar to the margin of error in a confidence interval).

Assume that the null hypothesis is false for a given hypothesis test. All else being equal, we have the following:

  • Larger samples result in a greater chance to reject the null hypothesis which means an increase in the power of the hypothesis test.
  • If the effect size is larger, it will become easier for us to detect. This results in a greater chance to reject the null hypothesis which means an increase in the power of the hypothesis test. The effect size varies for each test and is usually closely related to the difference between the hypothesized value and the true value of the parameter under study.
  • From the relationship between the probability of a Type I and a Type II error (as α (alpha) decreases, β (beta) increases), we can see that as α (alpha) decreases, Power = 1 – β = 1 – beta also decreases.
  • There are other mathematical ways to change the power of a hypothesis test, such as changing the population standard deviation; however, these are not quantities that we can usually control so we will not discuss them here.
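The first two factors can be seen numerically in a short sketch. This uses the standard normal-approximation power formula for a one-sided z-test of a proportion (Ha: p < p0); the function name and the specific proportions and sample sizes are our own illustrative choices, and the critical value 1.645 hard-codes alpha = 0.05:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_lower(p0, p_true, n, z_alpha=1.645):
    """Approximate power of the z-test of Ho: p = p0 vs Ha: p < p0
    when the true proportion is p_true (z_alpha = 1.645 for alpha = 0.05)."""
    cutoff = p0 - z_alpha * sqrt(p0 * (1 - p0) / n)  # reject when p-hat < cutoff
    return norm_cdf((cutoff - p_true) / sqrt(p_true * (1 - p_true) / n))

small_n = power_lower(0.20, 0.15, 100)   # smaller sample
large_n = power_lower(0.20, 0.15, 400)   # larger sample  -> more power
big_gap = power_lower(0.20, 0.10, 400)   # larger effect  -> more power
```

Comparing the three values shows power rising with both sample size and effect size, all else being equal.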

In practice, we specify a significance level and a desired power to detect a difference which will have practical meaning to us and this determines the sample size required for the experiment or study.

For most grants involving statistical analysis, power calculations must be completed to illustrate that the study will have a reasonable chance to detect an important effect. Otherwise, the money spent on the study could be wasted. The goal is usually to have a power close to 80%.

For example, if there is only a 5% chance to detect an important difference between two treatments in a clinical trial, this would result in a waste of time, effort, and money on the study since, when the alternative hypothesis is true, the chance a treatment effect can be found is very small.

  • In order to calculate the power of a hypothesis test, we must specify the “truth.” As we mentioned previously when discussing Type II errors, in practice we can only calculate this probability using a series of “what if” calculations which depend upon the type of problem.

The following activity involves working with an interactive applet to study power more carefully.

Learn by Doing: Power of Hypothesis Tests

The following reading is an excellent discussion about Type I and Type II errors.

(Optional) Outside Reading: A Good Discussion of Power (≈ 2500 words)

We will not be asking you to perform power calculations manually. You may be asked to use online calculators and applets. Most statistical software packages offer some ability to complete power calculations. There are also many online calculators for power and sample size on the internet, for example, Russ Lenth’s power and sample-size page .
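As an illustration of what such calculators do behind the scenes, here is a sketch using the standard normal-approximation sample-size formula for a one-sided z-test of a proportion. The choices alpha = 0.05 (z = 1.645), power = 0.80 (z = 0.84), null value 0.20, and assumed true proportion 0.15 are all hypothetical inputs:

```python
from math import sqrt, ceil

def sample_size(p0, p_true, z_alpha=1.645, z_beta=0.84):
    """Approximate smallest n giving about 80% power (z_beta = 0.84)
    at alpha = 0.05 (z_alpha = 1.645), one-sided z-test for a proportion."""
    num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p_true * (1 - p_true))
    return ceil((num / abs(p_true - p0)) ** 2)

# n needed to detect a drop in the proportion from 0.20 to 0.15
n_needed = sample_size(0.20, 0.15)
```

The answer is several hundred subjects, which is why power calculations are done before data collection: small effects require large samples.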

Proportions (Introduction & Step 1)

CO-4: Distinguish among different measurement scales, choose the appropriate descriptive and inferential statistical methods based on these distinctions, and interpret the results.

LO 4.33: In a given context, distinguish between situations involving a population proportion and a population mean and specify the correct null and alternative hypothesis for the scenario.

LO 4.34: Carry out a complete hypothesis test for a population proportion by hand.

Video: Proportions (Introduction & Step 1) (7:18)

Now that we understand the process of hypothesis testing and the logic behind it, we are ready to start learning about specific statistical tests (also known as significance tests).

The first test we are going to learn is the test about the population proportion (p).

This test is widely known as the “z-test for the population proportion (p).”

We will understand later where the “z-test” part is coming from.

This will be the only type of problem you will complete entirely “by-hand” in this course. Our goal is to use this example to give you the tools you need to understand how this process works. After working a few problems, you should review the earlier material again. You will likely need to review the terminology and concepts a few times before you fully understand the process.

In reality, you will often be conducting more complex statistical tests and allowing software to provide the p-value. In these settings it will be important to know what test to apply for a given situation and to be able to explain the results in context.

Review: Types of Variables

When we conduct a test about a population proportion, we are working with a categorical variable. Later in the course, after we have learned a variety of hypothesis tests, we will need to be able to identify which test is appropriate for which situation. Identifying the variable as categorical or quantitative is an important component of choosing an appropriate hypothesis test.

Learn by Doing: Review Types of Variables

One Sample Z-Test for a Population Proportion

In this part of our discussion on hypothesis testing, we will go into details that we did not go into before. More specifically, we will use this test to introduce the idea of a test statistic , and details about how p-values are calculated .

Let’s start by introducing the three examples, which will be the leading examples in our discussion. Each example is followed by a figure illustrating the information provided, as well as the question of interest.

A machine is known to produce 20% defective products, and is therefore sent for repair. After the machine is repaired, 400 products produced by the machine are chosen at random and 64 of them are found to be defective. Do the data provide enough evidence that the proportion of defective products produced by the machine (p) has been reduced as a result of the repair?

The following figure displays the information, as well as the question of interest:

The question of interest helps us formulate the null and alternative hypotheses in terms of p, the proportion of defective products produced by the machine following the repair:

  • Ho: p = 0.20 (No change; the repair did not help).
  • Ha: p < 0.20 (The repair was effective at reducing the proportion of defective parts).

There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that in a simple random sample of 100 students from the college, 19 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is higher than the national proportion, which is 0.157? (This number is reported by the Harvard School of Public Health.)

Again, the following figure displays the information as well as the question of interest:

As before, we can formulate the null and alternative hypotheses in terms of p, the proportion of students in the college who use marijuana:

  • Ho: p = 0.157 (same as among all college students in the country).
  • Ha: p > 0.157 (higher than the national figure).

Polls on certain topics are conducted routinely in order to monitor changes in the public’s opinions over time. One such topic is the death penalty. In 2003 a poll estimated that 64% of U.S. adults support the death penalty for a person convicted of murder. In a more recent poll, 675 out of 1,000 U.S. adults chosen at random were in favor of the death penalty for convicted murderers. Do the results of this poll provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers (p) changed between 2003 and the later poll?

Here is a figure that displays the information, as well as the question of interest:

Again, we can formulate the null and alternative hypotheses in terms of p, the proportion of U.S. adults who support the death penalty for convicted murderers.

  • Ho: p = 0.64 (No change from 2003).
  • Ha: p ≠ 0.64 (Some change since 2003).

Learn by Doing: Proportions (Overview)

Did I Get This?: Proportions (Overview)

Recall that there are basically 4 steps in the process of hypothesis testing:

  • STEP 1: State the appropriate null and alternative hypotheses, Ho and Ha.
  • STEP 2: Obtain a random sample, collect relevant data, and check whether the data meet the conditions under which the test can be used . If the conditions are met, summarize the data using a test statistic.
  • STEP 3: Find the p-value of the test.
  • STEP 4: Based on the p-value, decide whether or not the results are statistically significant and draw your conclusions in context.
  • Note: In practice, we should always consider the practical significance of the results as well as the statistical significance.

We are now going to go through these steps as they apply to the hypothesis testing for the population proportion p. It should be noted that even though the details will be specific to this particular test, some of the ideas that we will add apply to hypothesis testing in general.

Step 1. Stating the Hypotheses

Here again are the three sets of hypotheses that are being tested in each of our three examples:

Has the proportion of defective products been reduced as a result of the repair?

Is the proportion of marijuana users in the college higher than the national figure?

Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?

The null hypothesis always takes the form:

  • Ho: p = some value

and the alternative hypothesis takes one of the following three forms:

  • Ha: p < that value (like in example 1) or
  • Ha: p > that value (like in example 2) or
  • Ha: p ≠ that value (like in example 3).

Note that it was quite clear from the context which form of the alternative hypothesis would be appropriate. The value that is specified in the null hypothesis is called the null value , and is generally denoted by p 0 . We can say, therefore, that in general the null hypothesis about the population proportion (p) would take the form:

  • Ho: p = p 0

We write Ho: p = p 0 to say that we are making the hypothesis that the population proportion has the value of p 0 . In other words, p is the unknown population proportion and p 0 is the number we think p might be for the given situation.

The alternative hypothesis takes one of the following three forms (depending on the context):

Ha: p < p 0 (one-sided)

Ha: p > p 0 (one-sided)

Ha: p ≠ p 0 (two-sided)

The first two possible forms of the alternatives (where the = sign in Ho is challenged by < or >) are called one-sided alternatives , and the third form of alternative (where the = sign in Ho is challenged by ≠) is called a two-sided alternative. To understand the intuition behind these names let’s go back to our examples.

Example 3 (death penalty) is a case where we have a two-sided alternative:

In this case, in order to reject Ho and accept Ha we will need to get a sample proportion of death penalty supporters which is very different from 0.64 in either direction, either much larger or much smaller than 0.64.

In example 2 (marijuana use) we have a one-sided alternative:

Here, in order to reject Ho and accept Ha we will need to get a sample proportion of marijuana users which is much higher than 0.157.

Similarly, in example 1 (defective products), where we are testing Ho: p = 0.20 versus Ha: p < 0.20:

in order to reject Ho and accept Ha, we will need to get a sample proportion of defective products which is much smaller than 0.20.

Learn by Doing: State Hypotheses (Proportions)

Did I Get This?: State Hypotheses (Proportions)

Proportions (Step 2)

Video: Proportions (Step 2) (12:38)

Step 2. Collect Data, Check Conditions, and Summarize Data

After the hypotheses have been stated, the next step is to obtain a sample (on which the inference will be based), collect relevant data , and summarize them.

It is extremely important that our sample is representative of the population about which we want to draw conclusions. This is ensured when the sample is chosen at random. Beyond the practical issue of ensuring representativeness, choosing a random sample has theoretical importance that we will mention later.

In the case of hypothesis testing for the population proportion (p), we will collect data on the relevant categorical variable from the individuals in the sample and start by calculating the sample proportion p-hat (the natural quantity to calculate when the parameter of interest is p).

Let’s go back to our three examples and add this step to our figures.

As we mentioned earlier without going into details, when we summarize the data in hypothesis testing, we go a step beyond calculating the sample statistic and summarize the data with a test statistic . Every test has a test statistic, which to some degree captures the essence of the test. In fact, the p-value, which so far we have looked upon as “the king” (in the sense that everything is determined by it), is actually determined by (or derived from) the test statistic. We will now introduce the test statistic.

The test statistic is a measure of how far the sample proportion p-hat is from the null value p 0 , the value that the null hypothesis claims is the value of p. In other words, since p-hat is what the data estimates p to be, the test statistic can be viewed as a measure of the “distance” between what the data tells us about p and what the null hypothesis claims p to be.

Let’s use our examples to understand this:

The parameter of interest is p, the proportion of defective products following the repair.

The data estimate p to be p-hat = 0.16

The null hypothesis claims that p = 0.20

The data are therefore 0.04 (or 4 percentage points) below the null hypothesis value.

It is hard to evaluate whether this difference of 4% in defective products is enough evidence to say that the repair was effective at reducing the proportion of defective products, but clearly, the larger the difference, the more evidence it is against the null hypothesis. So if, for example, our sample proportion of defective products had been, say, 0.10 instead of 0.16, then I think you would all agree that cutting the proportion of defective products in half (from 20% to 10%) would be extremely strong evidence that the repair was effective at reducing the proportion of defective products.

The parameter of interest is p, the proportion of students in a college who use marijuana.

The data estimate p to be p-hat = 0.19

The null hypothesis claims that p = 0.157

The data are therefore 0.033 (or 3.3 percentage points) above the null hypothesis value.

The parameter of interest is p, the proportion of U.S. adults who support the death penalty for convicted murderers.

The data estimate p to be p-hat = 0.675

The null hypothesis claims that p = 0.64

There is a difference of 0.035 (or 3.5 percentage points) between the data and the null hypothesis value.

The problem with looking only at the difference between the sample proportion, p-hat, and the null value, p 0 , is that we have not taken into account the variability of our estimator p-hat, which, as we know from our study of sampling distributions, depends on the sample size.

For this reason, the test statistic cannot simply be the difference between p-hat and p 0 , but must be some form of that formula that accounts for the sample size. In other words, we need to somehow standardize the difference so that comparison between different situations will be possible. We are very close to revealing the test statistic, but before we construct it, let’s be reminded of the following two facts from probability:

Fact 1: When we take a random sample of size n from a population with population proportion p, then

the sample proportion p-hat has approximately a normal distribution with mean p and standard deviation

\(\sqrt{\dfrac{p(1-p)}{n}}\)

(when certain conditions are met).

Fact 2: The z-score of any normal value (a value that comes from a normal distribution) is calculated by finding the difference between the value and the mean and then dividing that difference by the standard deviation (of the normal distribution associated with the value). The z-score represents how many standard deviations below or above the mean the value is.

Thus, our test statistic should be a measure of how far the sample proportion p-hat is from the null value p 0 relative to the variation of p-hat (as measured by the standard error of p-hat).

Recall that the standard error is the standard deviation of the sampling distribution for a given statistic. For p-hat, we know the following:

  • Center: the mean of the sampling distribution of p-hat is p.
  • Spread: the standard deviation (standard error) of p-hat is \(\sqrt{\dfrac{p(1-p)}{n}}\).
  • Shape: approximately normal, provided that \(np \geq 10\) and \(n(1-p) \geq 10\).

To find the p-value, we will need to determine how surprising our value is assuming the null hypothesis is true. We already have the tools needed for this process from our study of sampling distributions as represented in the table above.

If we assume the null hypothesis is true, we can specify that the center of the distribution of all possible values of p-hat from samples of size 400 would be 0.20 (our null value).

We can calculate the standard error, assuming p = 0.20 as

\(\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}=\sqrt{\dfrac{0.2(1-0.2)}{400}}=0.02\)

The following picture represents the sampling distribution of all possible values of p-hat of samples of size 400, assuming the true proportion p is 0.20 and our other requirements for the sampling distribution to be normal are met (we will review these during the next step).

A normal curve representing the sampling distribution of p-hat assuming that p = p 0 . Marked on the horizontal axis are p 0 and a particular value of p-hat. z is the difference between p-hat and p 0 measured in standard deviations (with the sign of z indicating whether p-hat is below or above p 0 ).

In order to calculate probabilities for the picture above, we would need to find the z-score associated with our result.

This z-score is the test statistic ! In this example, the numerator of our z-score is the difference between p-hat (0.16) and null value (0.20) which we found earlier to be -0.04. The denominator of our z-score is the standard error calculated above (0.02) and thus quickly we find the z-score, our test statistic, to be -2.

The sample proportion based upon this data is 2 standard errors below the null value.
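That arithmetic can be checked directly with a quick sketch using the numbers from example 1:

```python
from math import sqrt

p_hat, p0, n = 0.16, 0.20, 400   # example 1: defective products after repair
se = sqrt(p0 * (1 - p0) / n)     # standard error assuming Ho (p = 0.20) is true
z = (p_hat - p0) / se            # the test statistic
# se = 0.02 and z = -2: p-hat is 2 standard errors below the null value
```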

Hopefully you now understand more about the reasons we need probability in statistics!!

Now we will formalize the definition and look at our remaining examples before moving on to the next step, which will be to determine if a normal distribution applies and calculate the p-value.

Test Statistic for Hypothesis Tests for One Proportion is:

\(z=\dfrac{\hat{p}-p_{0}}{\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}}\)

It represents the difference between the sample proportion and the null value, measured in standard deviations (standard error of p-hat).

The picture above is a representation of the sampling distribution of p-hat assuming p = p 0 . In other words, this is a model of how p-hat behaves if we are drawing random samples from a population for which Ho is true.

Notice the center of the sampling distribution is at p 0 , which is the hypothesized proportion given in the null hypothesis (Ho: p = p 0 .) We could also mark the axis in standard error units,

\(\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}\)

For example, if our null hypothesis claims that the proportion of U.S. adults supporting the death penalty is 0.64, then the sampling distribution is drawn as if the null is true. We draw a normal distribution centered at 0.64 (p 0 ) with a standard error dependent on sample size,

\(\sqrt{\dfrac{0.64(1-0.64)}{n}}\).

Important Comment:

  • Note that under the assumption that Ho is true (and if the conditions for the sampling distribution to be normal are satisfied) the test statistic follows a N(0,1) (standard normal) distribution. Another way to say the same thing which is quite common is: “The null distribution of the test statistic is N(0,1).”

By “null distribution,” we mean the distribution under the assumption that Ho is true. As we’ll see and stress again later, the null distribution of the test statistic is what the calculation of the p-value is based on.

Let’s go back to our remaining two examples and find the test statistic in each case:

Since the null hypothesis is Ho: p = 0.157, the standardized (z) score of p-hat = 0.19 is

\(z=\dfrac{0.19-0.157}{\sqrt{\dfrac{0.157(1-0.157)}{100}}} \approx 0.91\)

This is the value of the test statistic for this example.

We interpret this to mean that, assuming that Ho is true, the sample proportion p-hat = 0.19 is 0.91 standard errors above the null value (0.157).

Since the null hypothesis is Ho: p = 0.64, the standardized (z) score of p-hat = 0.675 is

\(z=\dfrac{0.675-0.64}{\sqrt{\dfrac{0.64(1-0.64)}{1000}}} \approx 2.31\)

We interpret this to mean that, assuming that Ho is true, the sample proportion p-hat = 0.675 is 2.31 standard errors above the null value (0.64).
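Both computations follow the same pattern, which a small helper function makes explicit (a sketch; the function name is our own, not from the course):

```python
from math import sqrt

def z_test_statistic(p_hat, p0, n):
    """z-test statistic for one proportion: the distance of p-hat
    from the null value p0, measured in standard-error units."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_marijuana = z_test_statistic(0.19, 0.157, 100)       # example 2
z_death_penalty = z_test_statistic(0.675, 0.64, 1000)  # example 3
```

Rounding to two decimal places reproduces the values 0.91 and 2.31 found above.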

Learn by Doing: Proportions (Step 2)

Comments about the Test Statistic:

  • We mentioned earlier that to some degree, the test statistic captures the essence of the test. In this case, the test statistic measures the difference between p-hat and p 0 in standard errors. This is exactly what this test is about. Get data, and look at the discrepancy between what the data estimates p to be (represented by p-hat) and what Ho claims about p (represented by p 0 ).
  • You can think about this test statistic as a measure of evidence in the data against Ho. The larger the test statistic, the “further the data are from Ho” and therefore the more evidence the data provide against Ho.

Learn by Doing: Proportions (Step 2) Understanding the Test Statistic

Did I Get This?: Proportions (Step 2)

  • It should now be clear why this test is commonly known as the z-test for the population proportion . The name comes from the fact that it is based on a test statistic that is a z-score.
  • Recall fact 1 that we used for constructing the z-test statistic. Here is part of it again:

When we take a random sample of size n from a population with population proportion p 0 , the possible values of the sample proportion p-hat ( when certain conditions are met ) have approximately a normal distribution with a mean of p 0 … and a standard deviation of

\(\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}\)

This result provides the theoretical justification for constructing the test statistic the way we did, and therefore the assumptions under which this result holds (in bold, above) are the conditions that our data need to satisfy so that we can use this test. These two conditions are:

i. The sample has to be random.

ii. The conditions under which the sampling distribution of p-hat is normal are met. In other words:

\(n p_{0} \geq 10\) and \(n\left(1-p_{0}\right) \geq 10\)

  • Here we will pause to say more about condition (i.) above, the need for a random sample. In the Probability Unit we discussed sampling plans based on probability (such as a simple random sample, cluster, or stratified sampling) that produce a non-biased sample, which can be safely used in order to make inferences about a population. We noted in the Probability Unit that, in practice, other (non-random) sampling techniques are sometimes used when random sampling is not feasible. It is important though, when these techniques are used, to be aware of the type of bias that they introduce, and thus the limitations of the conclusions that can be drawn from them. For our purpose here, we will focus on one such practice, the situation in which a sample is not really chosen randomly, but in the context of the categorical variable that is being studied, the sample is regarded as random. For example, say that you are interested in the proportion of students at a certain college who suffer from seasonal allergies. For that purpose, the students in a large engineering class could be considered as a random sample, since there is nothing about being in an engineering class that makes you more or less likely to suffer from seasonal allergies. Technically, the engineering class is a convenience sample, but it is treated as a random sample in the context of this categorical variable. On the other hand, if you are interested in the proportion of students in the college who have math anxiety, then the class of engineering students clearly could not possibly be viewed as a random sample, since engineering students probably have a much lower incidence of math anxiety than the college population overall.

Learn by Doing: Proportions (Step 2) Valid or Invalid Sampling?

Let’s check the conditions in our three examples.

i. The 400 products were chosen at random.

ii. n = 400, p 0 = 0.2 and therefore:

\(n p_{0}=400(0.2)=80 \geq 10\)

\(n\left(1-p_{0}\right)=400(1-0.2)=320 \geq 10\)

i. The 100 students were chosen at random.

ii. n = 100, p 0 = 0.157 and therefore:

\begin{gathered} n p_{0}=100(0.157)=15.7 \geq 10 \\ n\left(1-p_{0}\right)=100(1-0.157)=84.3 \geq 10 \end{gathered}

i. The 1000 adults were chosen at random.

ii. n = 1000, p 0 = 0.64 and therefore:

\begin{gathered} n p_{0}=1000(0.64)=640 \geq 10 \\ n\left(1-p_{0}\right)=1000(1-0.64)=360 \geq 10 \end{gathered}
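These checks are mechanical, so they are easy to script. Here is a minimal sketch (the function name is our own) that verifies condition (ii.) for all three examples:

```python
def conditions_met(n, p0):
    """Sample-size conditions for the z-test for one proportion:
    both n*p0 and n*(1 - p0) must be at least 10."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

checks = [conditions_met(400, 0.20),    # example 1: defective products
          conditions_met(100, 0.157),   # example 2: marijuana use
          conditions_met(1000, 0.64)]   # example 3: death penalty
```

All three examples pass, so the normal approximation (and hence the z-test) can be used in each.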

Learn by Doing: Proportions (Step 2) Verify Conditions

Checking that our data satisfy the conditions under which the test can be reliably used is a very important part of the hypothesis testing process. Be sure to consider this for every hypothesis test you conduct in this course and certainly in practice.

The Four Steps in Hypothesis Testing

With respect to the z-test for the population proportion that we are currently discussing, we have:

Step 1: Completed

Step 2: Completed

Step 3: This is what we will work on next.

Proportions (Step 3)

Video: Proportions (Step 3) (14:46)

Calculators and Tables

Step 3. Finding the P-value of the Test

So far we’ve talked about the p-value at the intuitive level: understanding what it is (or what it measures) and how we use it to draw conclusions about the statistical significance of our results. We will now go more deeply into how the p-value is calculated.

It should be mentioned that eventually we will rely on technology to calculate the p-value for us (as well as the test statistic), but in order to make intelligent use of the output, it is important to first understand the details, and only then let the computer do the calculations for us. Again, our goal is to use this simple example to give you the tools you need to understand the process entirely. Let’s start.

Recall that so far we have said that the p-value is the probability of obtaining data like those observed assuming that Ho is true. Like the test statistic, the p-value is, therefore, a measure of the evidence against Ho. In the case of the test statistic, the larger it is in magnitude (positive or negative), the further p-hat is from p 0 , and the more evidence we have against Ho. In the case of the p-value , it is the opposite; the smaller it is, the more unlikely it is to get data like those observed when Ho is true, and the more evidence it is against Ho .

One can actually draw conclusions in hypothesis testing just using the test statistic, and as we’ll see, the p-value is, in a sense, just another way of looking at the test statistic. The reason that we actually take the extra step in this course and derive the p-value from the test statistic is that even though in this case (the test about the population proportion) and some other tests the value of the test statistic has a very clear and intuitive interpretation, there are some tests where its value is not as easy to interpret. The p-value, on the other hand, keeps its intuitive appeal across all statistical tests.

How is the p-value calculated?

Intuitively, the p-value is the probability of observing data like those observed assuming that Ho is true. Let’s be a bit more formal:

  • Since this is a probability question about the data , it makes sense that the calculation will involve the data summary, the test statistic.
  • What do we mean by “like” those observed? By “like” we mean “as extreme or even more extreme.”

Putting it all together, we get that in general:

The p-value is the probability of observing a test statistic as extreme as that observed (or even more extreme) assuming that the null hypothesis is true.

By “extreme” we mean extreme in the direction(s) of the alternative hypothesis.

Specifically , for the z-test for the population proportion:

  • If the alternative hypothesis is Ha: p < p 0 (less than) , then “extreme” means small or less than , and the p-value is: the probability of observing a test statistic as small as that observed or smaller if the null hypothesis is true.
  • If the alternative hypothesis is Ha: p > p 0 (greater than) , then “extreme” means large or greater than , and the p-value is: the probability of observing a test statistic as large as that observed or larger if the null hypothesis is true.
  • If the alternative is Ha: p ≠ p 0 (different from) , then “extreme” means extreme in either direction, either small or large (i.e., large in magnitude), and the p-value is: the probability of observing a test statistic as large in magnitude as that observed or larger if the null hypothesis is true. (Examples: If z = -2.5, the p-value is the probability of observing a test statistic as small as -2.5 or smaller, or as large as 2.5 or larger. If z = 1.5, the p-value is the probability of observing a test statistic as large as 1.5 or larger, or as small as -1.5 or smaller.)
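Since (as shown below) the test statistic follows a standard normal distribution when Ho is true, these three definitions can be sketched in Python using only the standard library; the function name and argument labels here are illustrative choices, not part of the course materials.

```python
from statistics import NormalDist

Z = NormalDist()  # the standard normal distribution, N(0, 1)

def p_value(z, alternative):
    """p-value of the z-test, given the observed test statistic z.

    alternative: "less" (Ha: p < p0), "greater" (Ha: p > p0),
    or "two-sided" (Ha: p != p0).
    """
    if alternative == "less":        # left tail: P(Z <= z)
        return Z.cdf(z)
    if alternative == "greater":     # right tail: P(Z >= z)
        return 1 - Z.cdf(z)
    # both tails: P(Z <= -|z|) + P(Z >= |z|) = 2 * P(Z >= |z|)
    return 2 * (1 - Z.cdf(abs(z)))

print(round(p_value(-2.5, "two-sided"), 4))  # prints 0.0124
```

For example, with z = -2.5 and a two-sided alternative, both tails beyond 2.5 in magnitude are included, matching the "different from" case above.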

OK, hopefully that makes (some) sense. But how do we actually calculate it?

Recall the important comment from our discussion about our test statistic,

\(z=\dfrac{\hat{p}-p_{0}}{\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}}\)

which said that when the null hypothesis is true (i.e., when p = p 0 ), the possible values of our test statistic follow a standard normal (N(0,1), denoted by Z) distribution. Therefore, the p-value calculations (which assume that Ho is true) are simply standard normal distribution calculations for the 3 possible alternative hypotheses.

Alternative Hypothesis is “Less Than”

The probability of observing a test statistic as small as that observed or smaller, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and also using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a left-tailed test. We shaded to the left of the test statistic, since less than is to the left.

Alternative Hypothesis is “Greater Than”

The probability of observing a test statistic as large as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a right-tailed test. We shaded to the right of the test statistic, since greater than is to the right.

Alternative Hypothesis is “Not Equal To”

The probability of observing a test statistic which is as large in magnitude as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution.

This is often referred to as a two-tailed test, since we shaded in both directions.

Next, we will apply this to our three examples. But first, work through the following activities, which should help your understanding.

Learn by Doing: Proportions (Step 3)

Did I Get This?: Proportions (Step 3)

The p-value in this case is:

  • The probability of observing a test statistic as small as -2 or smaller, assuming that Ho is true.

OR (recalling what the test statistic actually means in this case),

  • The probability of observing a sample proportion that is 2 standard deviations or more below the null value (p 0 = 0.20), assuming that p 0 is the true population proportion.

OR, more specifically,

  • The probability of observing a sample proportion of 0.16 or lower in a random sample of size 400, when the true population proportion is p 0 =0.20

In either case, the p-value is found as shown in the following figure:

To find P(Z ≤ -2) we can use either the calculator or the table that we learned to use in the probability unit for normal random variables. Eventually, after we understand the details, we will use software to run the test for us, and the output will give us all the information we need. The p-value that the statistical software provides for this specific example is 0.023. The p-value tells us that it is pretty unlikely (probability of 0.023) to get data like those observed (test statistic of -2 or less) assuming that Ho is true.

  • The probability of observing a test statistic as large as 0.91 or larger, assuming that Ho is true.
  • The probability of observing a sample proportion that is 0.91 standard deviations or more above the null value (p 0 = 0.157), assuming that p 0 is the true population proportion.
  • The probability of observing a sample proportion of 0.19 or higher in a random sample of size 100, when the true population proportion is p 0 =0.157

Again, at this point we can use either the calculator or the table to find that the p-value is 0.182; this is P(Z ≥ 0.91).

The p-value tells us that it is not very surprising (probability of 0.182) to get data like those observed (which yield a test statistic of 0.91 or higher) assuming that the null hypothesis is true.

  • The probability of observing a test statistic as large as 2.31 (or larger) or as small as -2.31 (or smaller), assuming that Ho is true.
  • The probability of observing a sample proportion that is 2.31 standard deviations or more away from the null value (p 0 = 0.64), assuming that p 0 is the true population proportion.
  • The probability of observing a sample proportion as different as 0.675 is from 0.64, or even more different (i.e. as high as 0.675 or higher or as low as 0.605 or lower) in a random sample of size 1,000, when the true population proportion is p 0 = 0.64

Again, at this point we can use either the calculator or the table to find that the p-value is 0.021; this is P(Z ≤ -2.31) + P(Z ≥ 2.31) = 2 * P(Z ≥ 2.31).

The p-value tells us that it is pretty unlikely (probability of 0.021) to get data like those observed (test statistic as high as 2.31 or higher or as low as -2.31 or lower) assuming that Ho is true.
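Putting the three examples together, the whole computation, from p-hat and p 0 to the p-value, can be sketched in Python using the standard library; the numbers are the ones from the examples, and the helper function is an illustrative construction, not course-prescribed code.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1), the null distribution

def z_and_p(p_hat, p0, n, alternative):
    """Test statistic and p-value of the z-test for a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "less":
        p = Z.cdf(z)
    elif alternative == "greater":
        p = 1 - Z.cdf(z)
    else:  # two-sided
        p = 2 * (1 - Z.cdf(abs(z)))
    return z, p

# Example 1: defective products, Ha: p < 0.20
print(z_and_p(0.16, 0.20, 400, "less"))         # z = -2.0, p ~ 0.023
# Example 2: marijuana use, Ha: p > 0.157
print(z_and_p(0.19, 0.157, 100, "greater"))     # z ~ 0.91, p ~ 0.18
# Example 3: death penalty, Ha: p != 0.64
print(z_and_p(0.675, 0.64, 1000, "two-sided"))  # z ~ 2.31, p ~ 0.021
```

The printed values reproduce (up to rounding) the p-values of 0.023, 0.182, and 0.021 found above.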

  • We’ve just seen that finding p-values involves probability calculations about the value of the test statistic assuming that Ho is true. In this case, when Ho is true, the values of the test statistic follow a standard normal distribution (i.e., the sampling distribution of the test statistic when the null hypothesis is true is N(0,1)). Therefore, p-values correspond to areas (probabilities) under the standard normal curve.

Similarly, in any test , p-values are found using the sampling distribution of the test statistic when the null hypothesis is true (also known as the “null distribution” of the test statistic). In this case, it was relatively easy to argue that the null distribution of our test statistic is N(0,1). As we’ll see, in other tests, other distributions come up (like the t-distribution and the F-distribution), which we will just mention briefly, and rely heavily on the output of our statistical package for obtaining the p-values.

We’ve just completed our discussion about the p-value, and how it is calculated both in general and more specifically for the z-test for the population proportion. Let’s go back to the four-step process of hypothesis testing and see what we’ve covered and what still needs to be discussed.

With respect to the z-test for the population proportion:

Step 3: Completed

Step 4. This is what we will work on next.

Learn by Doing: Proportions (Step 3) Understanding P-values

Proportions (Step 4 & Summary)

Video: Proportions (Step 4 & Summary) (4:30)

Step 4. Drawing Conclusions Based on the P-Value

This last part of the four-step process of hypothesis testing is the same across all statistical tests, and actually, we’ve already said basically everything there is to say about it, but it can’t hurt to say it again.

The p-value is a measure of how much evidence the data present against Ho. The smaller the p-value, the more evidence the data present against Ho.

We already mentioned that what determines what constitutes enough evidence against Ho is the significance level (α, alpha), a cutoff point below which the p-value is considered small enough to reject Ho in favor of Ha. The most commonly used significance level is 0.05.

  • If the p-value ≤ α (usually 0.05), we reject Ho. Conclusion: There IS enough evidence that Ha is True
  • If the p-value > α, we fail to reject Ho. Conclusion: There IS NOT enough evidence that Ha is True

where instead of “Ha is True,” we write what this means in the words of the problem, in other words, in the context of the current scenario.

It is important to mention again that this step has essentially two sub-steps:

(i) Based on the p-value, determine whether or not the results are statistically significant (i.e., the data present enough evidence to reject Ho).

(ii) State your conclusions in the context of the problem.

Note: We must always also consider whether the results have any practical significance, particularly if they are statistically significant, as a statistically significant result that has no practical use is essentially meaningless!

Let’s go back to our three examples and draw conclusions.

We found that the p-value for this test was 0.023.

Since 0.023 is small (in particular, 0.023 < 0.05), the data provide enough evidence to reject Ho.

Conclusion:

  • There IS enough evidence that the proportion of defective products is less than 20% after the repair .

The following figure is the complete story of this example, and includes all the steps we went through, starting from stating the hypotheses and ending with our conclusions:

We found that the p-value for this test was 0.182.

Since 0.182 is not small (in particular, 0.182 > 0.05), the data do not provide enough evidence to reject Ho.

  • There IS NOT enough evidence that the proportion of students at the college who use marijuana is higher than the national figure.

Here is the complete story of this example:

Learn by Doing: Proportions (Step 4)

We found that the p-value for this test was 0.021.

Since 0.021 is small (in particular, 0.021 < 0.05), the data provide enough evidence to reject Ho.

  • There IS enough evidence that the proportion of adults who support the death penalty for convicted murderers has changed since 2003.

Did I Get This?: Proportions (Step 4)

Many Students Wonder: Hypothesis Testing for the Population Proportion

Many students wonder why 5% is so often selected as the significance level in hypothesis testing, and why 1% is the next most typical level. This is largely a matter of convenience and tradition.

When Ronald Fisher (one of the founders of modern statistics) published one of his tables, he used a mathematically convenient scale that included 5% and 1%. Later, these same 5% and 1% levels were used by other people, in part just because Fisher was so highly esteemed. But mostly these are arbitrary levels.

The idea of selecting some relatively small cutoff was historically important in the development of statistics; but it’s important to remember that there is really a continuous range of increasing confidence toward the alternative hypothesis, not a single all-or-nothing value. There isn’t much meaningful difference, for instance, between a p-value of 0.049 and one of 0.051, and it would be foolish to declare one case definitely a “real” effect and the other definitely a “random” effect. In either case, the study results were roughly 5% likely to occur by chance if there is no actual effect.

Whether such a p-value is sufficient for us to reject a particular null hypothesis ultimately depends on the risk of making the wrong decision, and the extent to which the hypothesized effect might contradict our prior experience or previous studies.

Let’s Summarize!!

We have now completed going through the four steps of hypothesis testing, and in particular we learned how they are applied to the z-test for the population proportion. Here is a brief summary:

Step 1: State the hypotheses

State the null hypothesis:

Ho: p = p 0

State the alternative hypothesis (one of the following three):

Ha: p < p 0 , Ha: p > p 0 , or Ha: p ≠ p 0

where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem. If you feel it is not clear, it is most likely a two-sided problem. Students are usually good at recognizing the “more than” and “less than” terminology, but differences can sometimes be more difficult to spot; sometimes this is because you have preconceived ideas about how you think the result should come out! Use only the information given in the problem.

Step 2: Obtain data, check conditions, and summarize data

Obtain data from a sample and:

(i) Check whether the data satisfy the conditions which allow you to use this test.

  • random sample (or at least a sample that can be considered random in context)
  • the conditions under which the sampling distribution of p-hat is normal are met:

\(n p_{0} \geq 10 \quad \text{and} \quad n\left(1-p_{0}\right) \geq 10\)

(ii) Calculate the sample proportion p-hat, and summarize the data using the test statistic:

\(z=\dfrac{\hat{p}-p_{0}}{\sqrt{\dfrac{p_{0}\left(1-p_{0}\right)}{n}}}\)

( Recall: This standardized test statistic represents how many standard deviations above or below p 0 our sample proportion p-hat is.)

Step 3: Find the p-value of the test by using the test statistic as follows

IMPORTANT FACT: In all future tests, we will rely on software to obtain the p-value.

When the alternative hypothesis is “less than,” the p-value is the probability of observing a test statistic as small as that observed or smaller , assuming that the values of the test statistic follow a standard normal distribution: P(Z ≤ z).

When the alternative hypothesis is “greater than,” the p-value is the probability of observing a test statistic as large as that observed or larger : P(Z ≥ z).

When the alternative hypothesis is “not equal to,” the p-value is the probability of observing a test statistic as large in magnitude as that observed or larger: P(Z ≤ -|z|) + P(Z ≥ |z|) = 2 * P(Z ≥ |z|).

Step 4: Conclusion

Reach a conclusion first regarding the statistical significance of the results, and then determine what it means in the context of the problem.

If p-value ≤ 0.05, then WE REJECT Ho. Conclusion: There IS enough evidence that Ha is True.

If p-value > 0.05, then WE FAIL TO REJECT Ho. Conclusion: There IS NOT enough evidence that Ha is True.

Recall that: If the p-value is small (in particular, smaller than the significance level, which is usually 0.05), the results are statistically significant (in the sense that there is a statistically significant difference between what was observed in the sample and what was claimed in Ho), and so we reject Ho.

If the p-value is not small, we do not have enough statistical evidence to reject Ho, and so we continue to believe that Ho may be true. ( Remember: In hypothesis testing we never “accept” Ho ).

Finally, in practice, we should always consider the practical significance of the results as well as the statistical significance.
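The four steps above can be collected into one sketch in Python (standard library only). The function name, the condition check, and the 0.05 default are illustrative choices; the count of 64 defectives in the usage line is inferred from example 1's p-hat = 0.16 and n = 400.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(count, n, p0, alternative, alpha=0.05):
    """Four-step z-test for a population proportion (a teaching sketch)."""
    # Step 2: check conditions and compute the test statistic.
    assert n * p0 >= 10 and n * (1 - p0) >= 10, "normality conditions fail"
    p_hat = count / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Step 3: p-value from the null distribution, N(0, 1).
    Z = NormalDist()
    if alternative == "less":
        p_value = Z.cdf(z)
    elif alternative == "greater":
        p_value = 1 - Z.cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - Z.cdf(abs(z)))
    # Step 4: statistical significance at level alpha.
    return z, p_value, p_value <= alpha

# Example 1: 64 defective out of 400, Ho: p = 0.20 vs. Ha: p < 0.20
print(z_test_proportion(64, 400, 0.20, "less"))  # z = -2.0, p ~ 0.023, reject
```

Remember that the final boolean only addresses statistical significance; stating the conclusion in context, and weighing practical significance, remains a human step.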

Learn by Doing: Z-Test for a Population Proportion

What’s next?

Before we move on to the next test, we are going to use the z-test for proportions to bring up and illustrate a few more very important issues regarding hypothesis testing. This might also be a good time to review the concepts of Type I error, Type II error, and Power before continuing on.

More about Hypothesis Testing

CO-1: Describe the roles biostatistics serves in the discipline of public health.

LO 1.11: Recognize the distinction between statistical significance and practical significance.

LO 6.30: Use a confidence interval to determine the correct conclusion to the associated two-sided hypothesis test.

Video: More about Hypothesis Testing (18:25)

The issues regarding hypothesis testing that we will discuss are:

  • The effect of sample size on hypothesis testing.
  • Statistical significance vs. practical importance.
  • Hypothesis testing and confidence intervals—how are they related?

Let’s begin.

1. The Effect of Sample Size on Hypothesis Testing

We have already seen the effect that the sample size has on inference, when we discussed point and interval estimation for the population mean (μ, mu) and population proportion (p). Intuitively …

Larger sample sizes give us more information to pin down the true nature of the population. We can therefore expect the sample mean and sample proportion obtained from a larger sample to be closer to the population mean and proportion, respectively. As a result, for the same level of confidence, we can report a smaller margin of error, and get a narrower confidence interval. What we’ve seen, then, is that larger sample size gives a boost to how much we trust our sample results.

In hypothesis testing, larger sample sizes have a similar effect. We have also discussed that the power of our test increases when the sample size increases, all else remaining the same. This means, we have a better chance to detect the difference between the true value and the null value for larger samples.

The following two examples will illustrate that a larger sample size provides more convincing evidence (the test has greater power), and how the evidence manifests itself in hypothesis testing. Let’s go back to our example 2 (marijuana use at a certain liberal arts college).

We do not have enough evidence to conclude that the proportion of students at the college who use marijuana is higher than the national figure.

Now, let’s increase the sample size.

There are rumors that students in a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that in a simple random sample of 400 students from the college, 76 admitted to marijuana use . Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is higher than the national proportion, which is 0.157? (Reported by the Harvard School of Public Health).

Our results here are statistically significant . In other words, in example 2* the data provide enough evidence to reject Ho.

  • Conclusion: There is enough evidence that the proportion of marijuana users at the college is higher than among all U.S. students.

What do we learn from this?

We see that sample results that are based on a larger sample carry more weight (have greater power).

In example 2, we saw that a sample proportion of 0.19 based on a sample of size of 100 was not enough evidence that the proportion of marijuana users in the college is higher than 0.157. Recall, from our general overview of hypothesis testing, that this conclusion (not having enough evidence to reject the null hypothesis) doesn’t mean the null hypothesis is necessarily true (so, we never “accept” the null); it only means that the particular study didn’t yield sufficient evidence to reject the null. It might be that the sample size was simply too small to detect a statistically significant difference.

However, in example 2*, we saw that when the sample proportion of 0.19 is obtained from a sample of size 400, it carries much more weight, and in particular, provides enough evidence that the proportion of marijuana users in the college is higher than 0.157 (the national figure). In this case, the sample size of 400 was large enough to detect a statistically significant difference.
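The sample-size effect in examples 2 and 2* can be checked directly: the same sample proportion, 0.19, gives very different p-values for n = 100 and n = 400. This is a sketch using the standard library; the computed values may differ slightly from the table-based 0.182 quoted earlier.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def p_value_greater(p_hat, p0, n):
    """One-sided (Ha: p > p0) p-value of the z-test for a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - Z.cdf(z)

print(round(p_value_greater(0.19, 0.157, 100), 3))  # n = 100: ~0.18, not significant
print(round(p_value_greater(0.19, 0.157, 400), 3))  # n = 400: ~0.035, significant
```

Quadrupling the sample size shrinks the standard deviation of p-hat by half, pushing the same observed difference (0.19 vs. 0.157) from about 0.91 to about 1.81 standard deviations above the null value.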

The following activity will allow you to practice the ideas and terminology used in hypothesis testing when a result is not statistically significant.

Learn by Doing: Interpreting Non-significant Results

2. Statistical significance vs. practical importance.

Now, we will address the issue of statistical significance versus practical importance (which also involves issues of sample size).

The following activity will let you explore the effect of the sample size on the statistical significance of the results yourself, and more importantly will discuss issue 2: Statistical significance vs. practical importance.

Important Fact: In general, with a sufficiently large sample size you can make any result that has very little practical importance statistically significant! A large sample size alone does NOT make a “good” study!!

This suggests that when interpreting the results of a test, you should always think not only about the statistical significance of the results but also about their practical importance.

Learn by Doing: Statistical vs. Practical Significance

3. Hypothesis Testing and Confidence Intervals

The last topic we want to discuss is the relationship between hypothesis testing and confidence intervals. Even though the flavor of these two forms of inference is different (confidence intervals estimate a parameter, and hypothesis testing assesses the evidence in the data against one claim and in favor of another), there is a strong link between them.

We will explain this link (using the z-test and confidence interval for the population proportion), and then explain how confidence intervals can be used after a test has been carried out.

Recall that a confidence interval gives us a set of plausible values for the unknown population parameter. We may therefore examine a confidence interval to informally decide if a proposed value of population proportion seems plausible.

For example, if a 95% confidence interval for p, the proportion of all U.S. adults already familiar with Viagra in May 1998, was (0.61, 0.67), then it seems clear that we should be able to reject a claim that only 50% of all U.S. adults were familiar with the drug, since based on the confidence interval, 0.50 is not one of the plausible values for p.

In fact, the information provided by a confidence interval can be formally related to the information provided by a hypothesis test. ( Comment: The relationship is more straightforward for two-sided alternatives, and so we will not present results for the one-sided cases.)

Suppose we want to carry out the two-sided test:

  • Ho: p = p 0
  • Ha: p ≠ p 0

using a significance level of 0.05.

An alternative way to perform this test is to find a 95% confidence interval for p and check:

  • If p 0 falls outside the confidence interval, reject Ho.
  • If p 0 falls inside the confidence interval, do not reject Ho.

In other words,

  • If p 0 is not one of the plausible values for p, we reject Ho.
  • If p 0 is a plausible value for p, we cannot reject Ho.

( Comment: Similarly, the results of a test using a significance level of 0.01 can be related to the 99% confidence interval.)
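This confidence-interval shortcut for a two-sided test can be sketched as follows (the function name is illustrative, and z* = 1.96 corresponds to 95% confidence, i.e., the 0.05 significance level):

```python
from math import sqrt

def two_sided_test_via_ci(p_hat, n, p0, z_star=1.96):
    """Reject Ho: p = p0 at the 0.05 level iff p0 is outside the 95% CI."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    low, high = p_hat - margin, p_hat + margin
    reject = not (low <= p0 <= high)
    return (low, high), reject

# Example 3: p-hat = 0.675, n = 1000, null value p0 = 0.64
print(two_sided_test_via_ci(0.675, 1000, 0.64))  # 0.64 outside (0.646, 0.704): reject
```

Note that the interval uses p-hat in the standard error (as confidence intervals do), while the test statistic uses p 0 ; the equivalence between the two procedures is therefore approximate, but it holds in the examples of this section.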

Let’s look at an example:

Recall example 3, where we wanted to know whether the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was 0.64.

We are testing:

  • Ho: p = 0.64
  • Ha: p ≠ 0.64

and as the figure reminds us, we took a sample of 1,000 U.S. adults, and the data told us that 675 supported the death penalty for convicted murderers (p-hat = 0.675).

A 95% confidence interval for p, the proportion of all U.S. adults who support the death penalty, is:

\(0.675 \pm 1.96 \sqrt{\dfrac{0.675(1-0.675)}{1000}} \approx 0.675 \pm 0.029=(0.646,0.704)\)

Since the 95% confidence interval for p does not include 0.64 as a plausible value for p, we can reject Ho and conclude (as we did before) that there is enough evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003.

You and your roommate are arguing about whose turn it is to clean the apartment. Your roommate suggests that you settle this by tossing a coin and takes one out of a locked box he has on the shelf. Suspecting that the coin might not be fair, you decide to test it first. You toss the coin 80 times, thinking to yourself that if, indeed, the coin is fair, you should get around 40 heads. Instead you get 48 heads. You are puzzled. You are not sure whether getting 48 heads out of 80 is enough evidence to conclude that the coin is unbalanced, or whether this is a result that could have happened just by chance when the coin is fair.

Statistics can help you answer this question.

Let p be the true proportion (probability) of heads. We want to test whether the coin is fair or not.

  • Ho: p = 0.5 (the coin is fair).
  • Ha: p ≠ 0.5 (the coin is not fair).

The data we have are that out of n = 80 tosses, we got 48 heads, or that the sample proportion of heads is p-hat = 48/80 = 0.6.

A 95% confidence interval for p, the true proportion of heads for this coin, is:

\(0.6 \pm 1.96 \sqrt{\dfrac{0.6(1-0.6)}{80}} \approx 0.6 \pm 0.11=(0.49,0.71)\)

Since in this case 0.5 is one of the plausible values for p, we cannot reject Ho. In other words, the data do not provide enough evidence to conclude that the coin is not fair.

The context of the last example is a good opportunity to bring up an important point that was discussed earlier.

Even though we use 0.05 as a cutoff to guide our decision about whether the results are statistically significant, we should not treat it as inviolable and we should always add our own judgment. Let’s look at the last example again.

It turns out that the p-value of this test is 0.0734. In other words, it is maybe not extremely unlikely, but it is quite unlikely (probability of 0.0734) that when you toss a fair coin 80 times you’ll get a sample proportion of heads of 48/80 = 0.6 (or even more extreme). It is true that using the 0.05 significance level (cutoff), 0.0734 is not considered small enough to conclude that the coin is not fair. However, if you really don’t want to clean the apartment, the p-value might be small enough for you to ask your roommate to use a different coin, or to provide one yourself!
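The p-value for the coin example can be reproduced (up to rounding; the standard-library calculation gives about 0.0736 versus the 0.0734 quoted from software) directly from the data:

```python
from math import sqrt
from statistics import NormalDist

# Coin example: 48 heads in n = 80 tosses, Ho: p = 0.5 vs. Ha: p != 0.5
p_hat, p0, n = 48 / 80, 0.5, 80
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 4))  # z ~ 1.79, p-value ~ 0.0736
```

So the observed 0.6 proportion of heads sits about 1.79 standard deviations above the null value, which is suggestive but not significant at the 0.05 level.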

Did I Get This?: Connection between Confidence Intervals and Hypothesis Tests

Did I Get This?: Hypothesis Tests for Proportions (Extra Practice)

Here is our final point on this subject:

When the data provide enough evidence to reject Ho, we can conclude (depending on the alternative hypothesis) that the population proportion is either less than, greater than, or not equal to the null value p 0 . However, we do not get a more informative statement about its actual value. It might be of interest, then, to follow the test with a 95% confidence interval that will give us more insight into the actual value of p.

In our example 3,

we concluded that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was 0.64. It is probably of interest not only to know that the proportion has changed, but also to estimate what it has changed to. We’ve calculated the 95% confidence interval for p on the previous page and found that it is (0.646, 0.704).

We can combine our conclusions from the test and the confidence interval and say:

Data provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, and we are 95% confident that it is now between 0.646 and 0.704. (i.e. between 64.6% and 70.4%).

Let’s look at our example 1 to see how a confidence interval following a test might be insightful in a different way.

Here is a summary of example 1:

We conclude that as a result of the repair, the proportion of defective products has been reduced to below 0.20 (which was the proportion prior to the repair). It is probably of great interest to the company not only to know that the proportion of defective has been reduced, but also estimate what it has been reduced to, to get a better sense of how effective the repair was. A 95% confidence interval for p in this case is:

\(0.16 \pm 1.96 \sqrt{\dfrac{0.16(1-0.16)}{400}} \approx 0.16 \pm 0.036=(0.124,0.196)\)

We can therefore say that the data provide evidence that the proportion of defective products has been reduced, and we are 95% confident that it has been reduced to somewhere between 12.4% and 19.6%. This is very useful information, since it tells us that even though the results were significant (i.e., the repair reduced the number of defective products), the repair might not have been effective enough, if it managed to reduce the number of defective products only to the range provided by the confidence interval. This, of course, ties back in to the idea of statistical significance vs. practical importance that we discussed earlier. Even though the results are statistically significant (Ho was rejected), practically speaking, the repair might still be considered ineffective.

Learn by Doing: Hypothesis Tests and Confidence Intervals

Even though this portion of the current section is about the z-test for population proportion, it is loaded with very important ideas that apply to hypothesis testing in general. We’ve already summarized the details that are specific to the z-test for proportions, so the purpose of this summary is to highlight the general ideas.

The process of hypothesis testing has four steps :

I. Stating the null and alternative hypotheses (Ho and Ha).

II. Obtaining a random sample (or at least one that can be considered random) and collecting data. Using the data:

Check that the conditions under which the test can be reliably used are met.

Summarize the data using a test statistic.

  • The test statistic is a measure of the evidence in the data against Ho. The larger the test statistic is in magnitude, the more evidence the data present against Ho.

III. Finding the p-value of the test. The p-value is the probability of getting data like those observed (or even more extreme) assuming that the null hypothesis is true, and is calculated using the null distribution of the test statistic. The p-value is a measure of the evidence against Ho. The smaller the p-value, the more evidence the data present against Ho.

IV. Making conclusions.

Conclusions about the statistical significance of the results:

If the p-value is small, the data present enough evidence to reject Ho (and accept Ha).

If the p-value is not small, the data do not provide enough evidence to reject Ho.

To help guide our decision, we use the significance level as a cutoff for what is considered a small p-value. The significance cutoff is usually set at 0.05.

Conclusions should then be provided in the context of the problem.

Additional Important Ideas about Hypothesis Testing

  • Results that are based on a larger sample carry more weight; for the same observed effect, a larger sample size yields a more statistically significant result.
  • Even a very small and practically unimportant effect becomes statistically significant with a large enough sample size. The distinction between statistical significance and practical importance should therefore always be considered.
  • Confidence intervals can be used in order to carry out two-sided tests (95% confidence for the 0.05 significance level). If the null value is not included in the confidence interval (i.e., is not one of the plausible values for the parameter), we have enough evidence to reject Ho. Otherwise, we cannot reject Ho.
  • If the results are statistically significant, it might be of interest to follow up the tests with a confidence interval in order to get insight into the actual value of the parameter of interest.
  • It is important to be aware that there are two types of errors in hypothesis testing ( Type I and Type II ) and that the power of a statistical test is an important measure of how likely we are to be able to detect a difference of interest to us in a particular problem.

Means (All Steps)


Tests About μ (mu) When σ (sigma) is Unknown – The t-test for a Population Mean

The t-distribution.

Video: Means (All Steps) (13:11)

So far we have talked about the logic behind hypothesis testing and then illustrated how this process proceeds in practice, using the z-test for the population proportion (p).

We are now moving on to discuss testing for the population mean (μ, mu), which is the parameter of interest when the variable of interest is quantitative.

A few comments about the structure of this section:

  • The basic groundwork for carrying out hypothesis tests has already been laid in our general discussion and in our presentation of tests about proportions.

Therefore we can easily modify the four steps to carry out tests about means instead, without going into all of the details again.

We will use this approach for all future tests, so be sure to go back to the general discussion and the discussion of proportions to review the concepts in more detail.

  • In our discussion about confidence intervals for the population mean, we made the distinction between whether the population standard deviation, σ (sigma) was known or if we needed to estimate this value using the sample standard deviation, s .

In this section, we will only discuss the second case as in most realistic settings we do not know the population standard deviation .

In this case we need to use the t- distribution instead of the standard normal distribution for the probability aspects of confidence intervals (choosing table values) and hypothesis tests (finding p-values).

  • Although we will discuss some theoretical or conceptual details for some of the analyses we will learn, from this point on we will rely on software to conduct tests and calculate confidence intervals for us , while we focus on understanding which methods are used for which situations and what the results say in context.

If you are interested in more information about the z-test, where we assume the population standard deviation σ (sigma) is known, you can review the Carnegie Mellon Open Learning Statistics Course (you will need to click “ENTER COURSE”).

Like any other tests, the t- test for the population mean follows the four-step process:

  • STEP 1: Stating the hypotheses H o and H a .
  • STEP 2: Collecting relevant data, checking that the data satisfy the conditions which allow us to use this test, and summarizing the data using a test statistic.
  • STEP 3: Finding the p-value of the test, the probability of obtaining data as extreme as those collected (or even more extreme, in the direction of the alternative hypothesis), assuming that the null hypothesis is true. In other words, how likely is it that the only reason for getting data like those observed is sampling variability (and not because H o is not true)?
  • STEP 4: Drawing conclusions, assessing the statistical significance of the results based on the p-value, and stating our conclusions in context. (Do we or don’t we have evidence to reject H o and accept H a ?)
  • Note: In practice, we should also always consider the practical significance of the results as well as the statistical significance.

We will now go through the four steps specifically for the t- test for the population mean and apply them to our two examples.

Only in a few cases is it reasonable to assume that the population standard deviation, σ (sigma), is known and so we will not cover hypothesis tests in this case. We discussed both cases for confidence intervals so that we could still calculate some confidence intervals by hand.

For this and all future tests we will rely on software to obtain our summary statistics, test statistics, and p-values for us.

The case where σ (sigma) is unknown is much more common in practice. What can we use to replace σ (sigma)? If you don’t know the population standard deviation, the best you can do is find the sample standard deviation, s, and use it instead of σ (sigma). (Note that this is exactly what we did when we discussed confidence intervals).

Is that it? Can we just use s instead of σ (sigma), and the rest is the same as the previous case? Unfortunately, it’s not that simple, but not very complicated either.

Here, when we use the sample standard deviation, s, as our estimate of σ (sigma) we can no longer use a normal distribution to find the cutoff for confidence intervals or the p-values for hypothesis tests.

Instead we must use the t- distribution (with n-1 degrees of freedom) to obtain the p-value for this test.

We discussed this issue for confidence intervals. We will talk more about the t- distribution after we discuss the details of this test for those who are interested in learning more.

It isn’t really necessary for us to understand this distribution but it is important that we use the correct distributions in practice via our software.

We will wait until UNIT 4B to look at how to accomplish this test in the software. For now focus on understanding the process and drawing the correct conclusions from the p-values given.

Now let’s go through the four steps in conducting the t- test for the population mean.

The null and alternative hypotheses for the t- test for the population mean (μ, mu) have exactly the same structure as the hypotheses for z-test for the population proportion (p):

The null hypothesis has the form:

  • Ho: μ = μ 0 (mu = mu_zero)

(where μ 0 (mu_zero) is often called the null value)

  • Ha: μ < μ 0 (mu < mu_zero) (one-sided)
  • Ha: μ > μ 0 (mu > mu_zero) (one-sided)
  • Ha: μ ≠ μ 0 (mu ≠ mu_zero) (two-sided)

where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem.

If you feel it is not clear, it is most likely a two-sided problem. Students are usually good at recognizing the “more than” and “less than” terminology, but a two-sided “difference” can be more difficult to spot, sometimes because you have preconceived ideas of which direction the effect should go. You also cannot use information from the sample to determine the hypotheses; the hypotheses must be stated before we see our data.

Now try it yourself. Here are a few exercises on stating the hypotheses for tests for a population mean.

Learn by Doing: State the Hypotheses for a test for a population mean

Here are a few more activities for practice.

Did I Get This?: State the Hypotheses for a test for a population mean

When setting up hypotheses, be sure to use only the information in the research question. We cannot use our sample data to help us set up our hypotheses.

For this test, it is still important to correctly choose the alternative hypothesis as “less than”, “greater than”, or “different”, although in practice two-sided tests are generally used.

Obtain data from a sample:

  • In this step we would obtain data from a sample. This is not something we do much of in courses but it is done very often in practice!

Check the conditions:

  • Then we check the conditions under which this test (the t- test for one population mean) can be safely carried out – which are:
  • The sample is random (or at least can be considered random in context).
  • We are in one of the three situations marked with a green check mark in the following table (which ensure that x-bar is at least approximately normal and the test statistic using the sample standard deviation, s, is therefore a t- distribution with n-1 degrees of freedom – proving this is beyond the scope of this course):
  • For large samples, we don’t need to check for normality in the population . We can rely on the sample size as the basis for the validity of using this test.
  • For small samples , we need to have data from a normal population in order for the p-values and confidence intervals to be valid.

In practice, for small samples, it can be very difficult to determine if the population is normal. Here is a simulation to give you a better understanding of the difficulties.

Video: Simulations – Are Samples from a Normal Population? (4:58)

Now try it yourself with a few activities.

Learn by Doing: Checking Conditions for Hypothesis Testing for the Population Mean

  • It is always a good idea to look at the data and get a sense of their pattern regardless of whether you actually need to do it in order to assess whether the conditions are met.
  • This idea of looking at the data is relevant to all tests in general. In the next module—inference for relationships—conducting exploratory data analysis before inference will be an integral part of the process.

Here are a few more problems for extra practice.

Did I Get This?: Checking Conditions for Hypothesis Testing for the Population Mean


Calculate Test Statistic

Assuming that the conditions are met, we calculate the sample mean x-bar and the sample standard deviation, s (which estimates σ (sigma)), and summarize the data with a test statistic.

The test statistic for the t -test for the population mean is:

\(t=\dfrac{\bar{x} - \mu_0}{s/ \sqrt{n}}\)

Recall that such a standardized test statistic represents how many standard deviations above or below μ 0 (mu_zero) our sample mean x-bar is.

Therefore our test statistic is a measure of how different our data are from what is claimed in the null hypothesis. This is an idea that we mentioned in the previous test as well.

Again we will rely on the p-value to determine how unusual our data would be if the null hypothesis is true.

As we mentioned, the test statistic in the t -test for a population mean does not follow a standard normal distribution. Rather, it follows another bell-shaped distribution called the t- distribution.

We will present the details of this distribution at the end for those interested but for now we will work on the process of the test.

Here are a few important facts.

  • In statistical language we say that the null distribution of our test statistic is the t- distribution with (n-1) degrees of freedom. In other words, when Ho is true (i.e., when μ = μ 0 (mu = mu_zero)), our test statistic has a t- distribution with (n-1) d.f., and this is the distribution under which we find p-values.
  • For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t (n – 1) or Z to calculate the p-values does not make a big difference. However, software will use the t -distribution regardless of the sample size and so will we.

Although we will not calculate p-values by hand for this test, we can still easily calculate the test statistic.
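For example, the test statistic can be computed directly from the summary statistics. Here is a minimal Python sketch (the numbers are invented for illustration):

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """t = (x-bar - mu_0) / (s / sqrt(n)), the one-sample t test statistic."""
    return (xbar - mu0) / (s / sqrt(n))

# Illustrative summary statistics: n = 25, sample mean 52, s = 5, null value 50.
t = t_statistic(xbar=52, mu0=50, s=5, n=25)
print(t)  # 2.0: the sample mean is 2 estimated standard errors above the null value
```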

Try it yourself:

Learn by Doing: Calculate the Test Statistic for a Test for a Population Mean

From this point in this course and certainly in practice we will allow the software to calculate our test statistics and we will use the p-values provided to draw our conclusions.

We will use software to obtain the p-value for this (and all future) tests but here are the images illustrating how the p-value is calculated in each of the three cases corresponding to the three choices for our alternative hypothesis.

Note that due to the symmetry of the t distribution, for a given value of the test statistic t, the p-value for the two-sided test is twice as large as the p-value of either of the one-sided tests. This is the same relationship we saw when p-values were calculated under the Z distribution.
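This symmetry is easy to verify with software. Here is a short scipy sketch (the test statistic and degrees of freedom are invented for illustration):

```python
from scipy import stats

t_stat, df = -1.8, 24  # an illustrative test statistic and its degrees of freedom

p_less = stats.t.cdf(t_stat, df)               # Ha: mu < mu_0 (left tail)
p_greater = stats.t.sf(t_stat, df)             # Ha: mu > mu_0 (right tail)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)  # Ha: mu != mu_0 (both tails)

# By symmetry, the two-sided p-value is exactly twice the smaller one-sided p-value.
print(p_less, p_greater, p_two_sided)
```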

We will show some examples of p-values obtained from software in our examples. For now let’s continue our summary of the steps.

As usual, based on the p-value (and some significance level of choice) we assess the statistical significance of results, and draw our conclusions in context.

To review what we have said before:

If p-value ≤ 0.05 then WE REJECT Ho

If p-value > 0.05 then WE FAIL TO REJECT Ho

This step has essentially two sub-steps: deciding whether the results are statistically significant (by comparing the p-value to the significance level) and stating our conclusions in the context of the problem.

We are now ready to look at two examples.

A certain prescription medicine is supposed to contain an average of 250 parts per million (ppm) of a certain chemical. If the concentration is higher than this, the drug may cause harmful side effects; if it is lower, the drug may be ineffective.

The manufacturer runs a check to see if the mean concentration in a large shipment conforms to the target level of 250 ppm or not.

A simple random sample of 100 portions is tested, and the sample mean concentration is found to be 247 ppm with a sample standard deviation of 12 ppm.

Here is a figure that represents this example:

A large circle represents the population, which is the shipment. μ represents the concentration of the chemical. The question we want to answer is "is the mean concentration the required 250ppm or not? (Assume: SD = 12)." Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247.

1. The hypotheses being tested are:

  • Ho: μ = 250
  • Ha: μ ≠ 250
  • Where μ = population mean parts per million of the chemical in the entire shipment

2. The conditions that allow us to use the t-test are met since:

  • The sample is random
  • The sample size is large enough for the Central Limit Theorem to apply and ensure the normality of x-bar. We do not need normality of the population in order to be able to conduct this test for the population mean. We are in the 2nd column in the table below.
  • The test statistic is:

\(t=\dfrac{\bar{x}-\mu_{0}}{s / \sqrt{n}}=\dfrac{247-250}{12 / \sqrt{100}}=-2.5\)

  • The data (represented by the sample mean) are 2.5 standard errors below the null value.

3. Finding the p-value.

  • To find the p-value we use statistical software, and we calculate a p-value of 0.014.

4. Conclusions:

  • The p-value is small (0.014), indicating that at the 5% significance level, the results are significant.
  • We reject the null hypothesis.
  • There is enough evidence to conclude that the mean concentration in the entire shipment is not the required 250 ppm.
  • It is difficult to comment on the practical significance of this result without more understanding of the practical considerations of this problem.

Here is a summary:

  • The 95% confidence interval for μ (mu) can be used here in the same way as for proportions to conduct the two-sided test (checking whether the null value falls inside or outside the confidence interval) or following a t- test where Ho was rejected to get insight into the value of μ (mu).
  • We find the 95% confidence interval to be (244.619, 249.381) . Since 250 is not in the interval we know we would reject our null hypothesis that μ (mu) = 250. The confidence interval gives additional information. By accounting for estimation error, it estimates that the population mean is likely to be between 244.62 and 249.38. This is lower than the target concentration and that information might help determine the seriousness and appropriate course of action in this situation.
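Although in this course we read these values from software output, the calculations for this example can be reproduced from the summary statistics alone. Here is a sketch using scipy:

```python
from math import sqrt

from scipy import stats

n, xbar, s, mu0 = 100, 247, 12, 250
se = s / sqrt(n)                      # estimated standard error: 12/10 = 1.2
t = (xbar - mu0) / se                 # (247 - 250) / 1.2 = -2.5
p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value under t(99), ~0.014

# 95% confidence interval: x-bar +/- t* x se, with t* taken from the t(99) distribution
t_star = stats.t.ppf(0.975, df=n - 1)
ci = (xbar - t_star * se, xbar + t_star * se)  # ~(244.62, 249.38)
print(t, p, ci)
```

Since the null value 250 falls outside the interval, the interval agrees with the decision to reject Ho.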

In most situations in practice we use TWO-SIDED HYPOTHESIS TESTS, followed by confidence intervals to gain more insight.

For completeness in covering one sample t-tests for a population mean, we still cover all three possible alternative hypotheses here HOWEVER, this will be the last test for which we will do so.

A research study measured the pulse rates of 57 college men and found a mean pulse rate of 70 beats per minute with a standard deviation of 9.85 beats per minute.

Researchers want to know if the mean pulse rate for all college men is different from the current standard of 72 beats per minute.

  • The hypotheses being tested are:
  • Ho: μ = 72
  • Ha: μ ≠ 72
  • Where μ = population mean heart rate among college men
  • The conditions that allow us to use the t- test are met since:
  • The sample is random.
  • The sample size is large (n = 57) so we do not need normality of the population in order to be able to conduct this test for the population mean. We are in the 2nd column in the table below.

  • The test statistic is:

\(t=\dfrac{\bar{x}-\mu_{0}}{s / \sqrt{n}}=\dfrac{70-72}{9.85 / \sqrt{57}}=-1.53\)

  • The data (represented by the sample mean) are 1.53 estimated standard errors below the null value.
  • Recall that in general the p-value is calculated under the null distribution of the test statistic, which, in the t- test case, is t (n-1). In our case, in which n = 57, the p-value is calculated under the t (56) distribution. Using statistical software, we find that the p-value is 0.132 .
  • Here is how we calculated the p-value. http://homepage.stat.uiowa.edu/~mbognar/applets/t.html .

A t(56) curve, for which the horizontal axis has been labeled with t-scores of -1.53 and 1.53. The area under the curve to the left of -1.53 and to the right of 1.53 is the p-value.

4. Making conclusions.

  • The p-value (0.132) is not small, indicating that the results are not significant.
  • We fail to reject the null hypothesis.
  • There is not enough evidence to conclude that the mean pulse rate for all college men is different from the current standard of 72 beats per minute.
  • The results from this sample do not appear to have any practical significance either: a mean pulse rate of 70 is very similar to the hypothesized value of 72, relative to the variation expected in pulse rates.
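As in the previous example, the test statistic and p-value here can be reproduced from the summary statistics with a short scipy sketch:

```python
from math import sqrt

from scipy import stats

n, xbar, s, mu0 = 57, 70, 9.85, 72
t = (xbar - mu0) / (s / sqrt(n))      # ~ -1.53
p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value under t(56), ~0.13
print(round(t, 2), p)                 # not significant at the 0.05 level
```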

Now try a few yourself.

Learn by Doing: Hypothesis Testing for the Population Mean


That concludes our discussion of hypothesis tests in Unit 4A.

In the next unit we will continue to use both confidence intervals and hypothesis tests to investigate the relationship between two variables in the cases we covered in Unit 1 on exploratory data analysis – we will look at Case CQ, Case CC, and Case QQ.

Before moving on, we will discuss the details about the t- distribution as a general object.

We have seen that variables can be visually modeled by many different sorts of shapes, and we call these shapes distributions. Several distributions arise so frequently that they have been given special names, and they have been studied mathematically.

So far in the course, the only one we’ve named, for continuous quantitative variables, is the normal distribution, but there are others. One of them is called the t- distribution.

The t- distribution is another bell-shaped (unimodal and symmetric) distribution, like the normal distribution; and the center of the t- distribution is standardized at zero, like the center of the standard normal distribution.

Like all distributions that are used as probability models, the normal and the t- distribution are both scaled, so the total area under each of them is 1.

So how is the t-distribution fundamentally different from the normal distribution?

  • The spread .

The following picture illustrates the fundamental difference between the normal distribution and the t-distribution:


You can see in the picture that the t- distribution has slightly less area near the expected central value than the normal distribution does, and you can see that the t distribution has correspondingly more area in the “tails” than the normal distribution does. (It’s often said that the t- distribution has “fatter tails” or “heavier tails” than the normal distribution.)

This reflects the fact that the t- distribution has a larger spread than the normal distribution. The same total area of 1 is spread out over a slightly wider range on the t- distribution, making it a bit lower near the center compared to the normal distribution, and giving the t- distribution slightly more probability in the ‘tails’ compared to the normal distribution.

Therefore, the t- distribution ends up being the appropriate model in certain cases where there is more variability than would be predicted by the normal distribution. One of these cases is stock values, which have more variability (or “volatility,” to use the economic term) than would be predicted by the normal distribution.

There’s actually an entire family of t- distributions. They all have similar formulas (but the math is beyond the scope of this introductory course in statistics), and they all have slightly “fatter tails” than the normal distribution. But some are closer to normal than others.

The t- distributions that have higher “degrees of freedom” are closer to normal (degrees of freedom is a mathematical concept that we won’t study in this course, beyond merely mentioning it here). So, there’s a t- distribution “with one degree of freedom,” another t- distribution “with 2 degrees of freedom” which is slightly closer to normal, another t- distribution “with 3 degrees of freedom” which is a bit closer to normal than the previous ones, and so on.

The following picture illustrates this idea with just a couple of t- distributions (note that “degrees of freedom” is abbreviated “d.f.” on the picture):

The test statistic for our t-test for one population mean is a t -score which follows a t- distribution with (n – 1) degrees of freedom. Recall that each t- distribution is indexed according to “degrees of freedom.” Notice that, in the context of a test for a mean, the degrees of freedom depend on the sample size in the study.

Remember that we said that higher degrees of freedom indicate that the t- distribution is closer to normal. So in the context of a test for the mean, the larger the sample size , the higher the degrees of freedom, and the closer the t- distribution is to a normal z distribution .

As a result, in the context of a test for a mean, the effect of the t- distribution is most important for a study with a relatively small sample size .
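This convergence toward the normal distribution is easy to check numerically. Here is a quick scipy sketch comparing the area beyond 2 under several t-distributions with the corresponding normal tail:

```python
from scipy import stats

normal_tail = stats.norm.sf(2)  # P(Z > 2) under the standard normal, ~0.023
t_tails = {df: stats.t.sf(2, df) for df in (1, 5, 30, 1000)}

# Every t-distribution puts more probability beyond 2 than the normal does,
# and the excess shrinks as the degrees of freedom grow.
for df, tail in t_tails.items():
    print(df, tail)
print(normal_tail)
```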

We are now done introducing the t-distribution. What are the implications of all of this?

  • The null distribution of our t-test statistic is the t-distribution with (n-1) d.f. In other words, when Ho is true (i.e., when μ = μ 0 (mu = mu_zero)), our test statistic has a t-distribution with (n-1) d.f., and this is the distribution under which we find p-values.
  • For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n – 1) or Z to calculate the p-values does not make a big difference.

Module 8: Inference for One Proportion

Introduction to Hypothesis Testing

What you’ll learn to do: Given a claim about a population, construct an appropriate set of hypotheses to test and properly interpret P-values and Type I / Type II errors.

Hypothesis testing is part of inference. Given a claim about a population, we will learn to determine the null and alternative hypotheses. We will recognize the logic behind a hypothesis test and how it relates to the P-value, as well as recognize Type I and Type II errors. These are powerful tools for exploring and understanding data in real life.


  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution
  • Inferential Statistics Decision Making Table. Provided by : Wikimedia Commons: Adapted by Lumen Learning. Located at : https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Inferential_Statistics_Decision_Making_Table.png/120px-Inferential_Statistics_Decision_Making_Table.png . License : CC BY: Attribution



How to Write a Hypothesis: Types and Tips to Remember

Many people might not know what a hypothesis is, the purpose of a hypothesis or where a hypothesis is needed. A hypothesis is a statement that explains the research’s predictions and the reasons behind the research. It is an “educated guess” of the final result of the research problem and is written for an academic research paper. A good hypothesis is carefully stated as a key aspect of the scientific method, yet even the simplest ones can be difficult to explain.

If you are unaware of the process of writing a hypothesis, we are here to help you with all your queries. Read the article and learn how to write a hypothesis for your academic paper/thesis.

Table of Contents

What is a Hypothesis?

  • Simple Hypothesis
  • Complex Hypothesis
  • Null Hypothesis
  • Alternative Hypothesis
  • Logical Hypothesis
  • Empirical Hypothesis
  • Statistical Hypothesis

Writing a Good Hypothesis – Points to Remember

  • How to Write a Hypothesis

Frequently Asked Questions on How to Write a Hypothesis

A hypothesis is prepared in the early stages of a research project. Based on the preliminary research observations, a hypothesis is framed. It is the prediction of the end result of the research problem. For example, suppose you have observed that plants grow better with regular watering. In that case, your hypothesis can be “Plants grow better with regular watering”. Once you have your hypothesis, you can begin the experiments required to support and prove it.

A hypothesis must include variables. It can be some events, objects or concepts which are to be observed and tested for your research experiments. There are two kinds of variables – dependent variables and independent variables. The independent variables are the ones which can be modified in the experiment, and the dependent variables are the ones which can only be observed.

Hypotheses are a crucial part of the research paper since they influence the direction and arrangement of the research methods. The readers will want to know if the hypothesis was proven right or wrong, and therefore it must be mentioned clearly in the introduction or the abstract of the paper.

Types of Hypotheses

Depending on the nature of the research and the findings, the hypothesis can be categorised into one or more of the seven major categories.

1. Simple Hypothesis

A simple hypothesis states the relationship between the two variables (dependent and independent variables).

2. Complex Hypothesis

A complex hypothesis entails the existence of a relationship between two or more variables. It can be two dependent variables and one independent variable or vice versa.

3. Null Hypothesis

A null hypothesis is a statement that states that the variables have no relationship.

4. Alternative Hypothesis

The alternative hypothesis is the polar opposite of the null hypothesis. It states that the two variables under study are linked (one variable has an effect on the other).

5. Logical Hypothesis

In the absence of verifiable proof, a logical hypothesis indicates a relationship between variables. Assertions are based on inference or logic rather than evidence.

6. Empirical Hypothesis

An empirical hypothesis, often known as a “working hypothesis,” is one that is being researched right now. Empirical hypotheses, unlike logical assumptions, are supported by evidence.

7. Statistical Hypothesis

When you test a sample of a population and then use the collected statistical evidence to draw conclusions about the full population, you’ve generated a statistical hypothesis. You test a section of it and then make an educated guess about the rest based on the results.

A good hypothesis follows a consistent format and set of guidelines. To write one, keep the following points in mind.

Causes and Effects: A hypothesis always includes a cause-and-effect relationship where one variable causes another to change or not change, depending on the type of hypothesis.

Measurable Prediction: Other than logical hypotheses, most hypotheses are designed to be tested. Before you commit to any hypothesis, make sure it can be tested through experiment. Select a testable hypothesis involving an independent variable over which you have complete control.

Dependent and Independent Variables: You can define the type of variables of your research for the readers.

Language used in a Hypothesis: Make sure to write the hypothesis in simple and clear language.

Adhere to Ethics: Before conducting your research, consider what you are experimenting with. Hypotheses that are objectionable, questionable or taboo should be avoided unless they are absolutely necessary.

How to Write a Hypothesis?

A good hypothesis can be written in the following six steps.

Asking a Question

Arousing curiosity in the minds of the readers can be a good way to start a hypothesis. It would make the readers think about the topic critically.

Conducting Preliminary Research

Before writing the hypothesis, it is essential to get background information regarding the topic. The preliminary research can be done through various web searches, reading books, etc.

Defining the Variables

After you have decided on your hypothesis, you can now decide on your variables. Keep in mind that the independent variables are the ones over which you have complete control and accordingly decide the limits of your hypothesis.

Writing the Hypothesis in the “if-then” Statement

While writing a hypothesis, keep in mind that it must be written in an “if-then” format statement which is a reliable method of expressing the causes and effects. A simple example would be, “If we water the plants daily, then they might grow really well.”

Collection of Adequate Data to Back the Hypothesis

A hypothesis is written to reach the conclusion of the research. After writing the hypothesis, the experiments can be conducted. Make sure you collect adequate data to support the hypothesis.

Writing with Confidence

After you have collected enough data, you can start writing the hypothesis. Make sure you write confidently, without any errors. It would be good to get your writing counter-checked by an expert if you are not confident about it.

What is a hypothesis?

A hypothesis is a statement that explains the research’s predictions and the reasons behind the research. It is written based on various observations.

Why is a hypothesis important?

A hypothesis is important in an academic paper because it explains the result of the research problem. It will help the researcher, as well as the audience, to stay focused and not deviate from the main idea.




The Structure of Scientific Theories

Scientific inquiry has led to immense explanatory and technological successes, partly as a result of the pervasiveness of scientific theories. Relativity theory, evolutionary theory, and plate tectonics were, and continue to be, wildly successful families of theories within physics, biology, and geology. Other powerful theory clusters inhabit comparatively recent disciplines such as cognitive science, climate science, molecular biology, microeconomics, and Geographic Information Science (GIS). Effective scientific theories magnify understanding, help supply legitimate explanations, and assist in formulating predictions. Moving from their knowledge-producing representational functions to their interventional roles (Hacking 1983), theories are integral to building technologies used within consumer, industrial, and scientific milieus.

This entry explores the structure of scientific theories from the perspective of the Syntactic, Semantic, and Pragmatic Views. Each of these answers questions such as the following in unique ways. What is the best characterization of the composition and function of scientific theory? How is theory linked with world? Which philosophical tools can and should be employed in describing and reconstructing scientific theory? Is an understanding of practice and application necessary for a comprehension of the core structure of a scientific theory? Finally, and most generally, how are these three views ultimately related?

  • 1.1 Syntactic, Semantic, and Pragmatic Views: The Basics
  • 1.2 Two Examples: Newtonian Mechanics and Population Genetics
  • 2.1 Theory Structure per the Syntactic View
  • 2.2 A Running Example: Newtonian Mechanics
  • 2.3 Interpreting Theory Structure per the Syntactic View
  • 2.4 Taking Stock: Syntactic View
  • 3.1 Theory Structure per the Semantic View
  • 3.2 A Running Example: Newtonian Mechanics
  • 3.3 Interpreting Theory Structure per the Semantic View
  • 3.4 Taking Stock: Semantic View
  • 4.1 Theory Structure per the Pragmatic View
  • 4.2 A Running Example: Newtonian Mechanics
  • 4.3 Interpreting Theory Structure per the Pragmatic View
  • 4.4 Taking Stock: Pragmatic View
  • 5. Population Genetics
  • 6. Conclusion
  • Other Internet Resources
  • Related Entries

1. Introduction

In philosophy, three families of perspectives on scientific theory are operative: the Syntactic View, the Semantic View, and the Pragmatic View. Savage distills these philosophical perspectives thus:

The syntactic view that a theory is an axiomatized collection of sentences has been challenged by the semantic view that a theory is a collection of nonlinguistic models, and both are challenged by the view that a theory is an amorphous entity consisting perhaps of sentences and models, but just as importantly of exemplars, problems, standards, skills, practices and tendencies. (Savage 1990, vii–viii)

Mormann (2007) characterizes the Syntactic and Semantic Views in similar terms, and is among the first to use the term “Pragmatic View” to capture the third view (137). The three views are baptized via a trichotomy from linguistics deriving from the work of Charles Morris, following Charles S. Peirce. In a classic exposition, the logical positivist Carnap writes:

If in an investigation explicit reference is made to the speaker, or, to put it in more general terms, to the user of a language, then we assign it to the field of pragmatics. (Whether in this case reference to designata is made or not makes no difference for this classification.) If we abstract from the user of the language and analyze only the expressions and their designata, we are in the field of semantics. And if, finally, we abstract from the designata also and analyze only the relations between the expressions, we are in (logical) syntax. The whole science of language, consisting of the three parts mentioned, is called semiotic. (1942, 9; see also Carnap 1939, 3–5, 16)

To summarize, syntax concerns grammar and abstract structures; semantics investigates meaning and representation; and pragmatics explores use. Importantly, while no view is oblivious to the syntax, semantics, or pragmatics of theory, the baptism of each is a product of how one of the three aspects of language is perceived to be dominant: theory as syntactic logical reconstruction (Syntactic View); theory as semantically meaningful mathematical modeling (Semantic View); or theory structure as complex and as closely tied to theory pragmatics, i.e., function and context (Pragmatic View). Each of these philosophical perspectives on scientific theory will be reviewed in this entry. Their relations will be briefly considered in the Conclusion.

1.1 Syntactic, Semantic, and Pragmatic Views: The Basics

It will be helpful to pare each perspective down to its essence. Each endorses a substantive thesis about the structure of scientific theories.

For the Syntactic View, the structure of a scientific theory is its reconstruction in terms of sentences cast in a metamathematical language. Metamathematics is the axiomatic machinery for building clear foundations of mathematics, and includes predicate logic, set theory, and model theory (e.g., Zach 2009; Hacking 2014). A central question of the Syntactic View is: in which logical language should we recast scientific theory?

Some defenders of the Semantic View keep important aspects of this reconstructive agenda, moving the metamathematical apparatus from predicate logic to set theory. Other advocates of the Semantic View insist that the structure of scientific theory is solely mathematical. They argue that we should remain at the mathematical level, rather than move up (or down) a level, into foundations of mathematics. A central question for the Semantic View is: which mathematical models are actually used in science?

Finally, for the Pragmatic View, scientific theory is internally and externally complex. Mathematical components, while often present, are neither necessary nor sufficient for characterizing the core structure of scientific theories. Theory also consists of a rich variety of nonformal components (e.g., analogies and natural kinds). Thus, the Pragmatic View argues, a proper analysis of the grammar (syntax) and meaning (semantics) of theory must pay heed to scientific theory complexity, as well as to the multifarious assumptions, purposes, values, and practices informing theory. A central question the Pragmatic View poses is: which theory components and which modes of theorizing are present in scientific theories found across a variety of disciplines?

In adopting a descriptive perspective on the structure of scientific theories, each view also deploys, at least implicitly, a prescriptive characterization of our central topic. In other words, postulating that scientific theory is \(X\) (e.g., \(X\) = a set-theoretic structure, as per Suppes 1960, 1962, 1967, 1968, 2002) also implies that what is not \(X\) (or could not be recast as \(X\)) is not (or could not possibly be) a scientific theory, and would not help us in providing scientific understanding, explanation, prediction, and intervention. For the Syntactic View, what is not (or cannot be) reconstructed axiomatically is not theoretical, while for the Semantic View, what is not (or cannot be) modeled mathematically is not theoretical. In contrast, in part due to its pluralism about what a scientific theory actually (and possibly) is, and because it interprets theory structure as distributed in practices, the Pragmatic View resists the definitional and normative terms set by the other two views. As a result, the Pragmatic View ultimately reforms the very concepts of “theory” and “theory structure.”

This encyclopedia entry will be organized as follows. After presenting this piece’s two sustained examples, immediately below, the three views are reviewed in as many substantive sections. Each section starts with a brief overview before characterizing that perspective’s account of theory structure. Newtonian mechanics is used as a running example within each section. The interpretation of theory structure—viz., how theory “hooks up” with phenomena, experiment, and the world—is also reviewed in each section. In the final section of this entry, we turn to population genetics and an analysis of the Hardy-Weinberg Principle (HWP) to compare and contrast each view. The Conclusion suggests, and remains non-committal about, three kinds of relations among the views: identity, combat, and complementarity. Theory is not a single, static entity that we are seeing from three different perspectives, as we might represent the Earth using three distinct mathematical map projections. Rather, theory itself changes as a consequence of perspective adopted.

1.2 Two Examples: Newtonian Mechanics and Population Genetics

Two examples will be used to illustrate differences between the three views: Newtonian mechanics and population genetics. While relativity theory is the preferred theory of the Syntactic View, Newtonian mechanics is more straightforward. Somewhat permissively construed, the theory of Newtonian mechanics employs the basic conceptual machinery of inertial reference frames, centers of mass, Newton’s laws of motion, etc., to describe the dynamics and kinematics of, among other phenomena, point masses acting vis-à-vis gravitational forces (e.g., the solar system) or with respect to forces involved in collisions (e.g., pool balls on a pool table; a closed container filled with gas). Newtonian mechanics is explored in each section.

Population genetics investigates the genetic composition of populations of natural and domesticated species, including the dynamics and causes of changes in gene frequencies in such populations (for overviews, see Lloyd 1994 [1988]; Gould 2002; Pigliucci and Müller 2010; Okasha 2012). Population genetics emerged as a discipline with the early 20th-century work of R.A. Fisher, Sewall Wright, and J.B.S. Haldane, who synthesized Darwinian evolutionary theory and Mendelian genetics. One important part of population genetic theory is the Hardy-Weinberg Principle. HWP is a null model mathematically stating that gene frequencies remain unchanged across generations when there is no selection, migration, random genetic drift, or other evolutionary forces acting in a given population. HWP peppers early chapters of many introductory textbooks (e.g., Crow and Kimura 1970; Hartl and Clark 1989; Bergstrom and Dugatkin 2012). We return to HWP in Section 5 and here merely state questions each view might ask about population genetics.
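
The null-model character of HWP can be sketched numerically for a single biallelic locus. The following Python sketch is illustrative only; the function names are not drawn from the texts cited above, and the model assumes random mating with none of the evolutionary forces listed.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for a
    biallelic locus with allele frequencies p and q = 1 - p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

def next_allele_frequency(p):
    """Allele frequency after one generation of random mating with
    no selection, migration, mutation, or drift."""
    aa, ab, bb = genotype_frequencies(p)
    return aa + 0.5 * ab  # each heterozygote carries one A allele

p = 0.3
aa, ab, bb = genotype_frequencies(p)
assert abs(aa + ab + bb - 1.0) < 1e-12          # frequencies sum to one
assert abs(next_allele_frequency(p) - p) < 1e-12  # HWP: p is unchanged
```

Under these assumptions the allele frequency is a fixed point: iterating the model for any number of generations returns the same p, which is exactly why HWP serves as a null model against which evolutionary forces are measured.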

The Syntactic View focuses on questions regarding the highest axiomatic level of population genetics (e.g., Williams 1970, 1973; Van Valen 1976; Lewis 1980; Tuomi 1981, 1992). Examples of such queries are:

  • What would be the most convenient metamathematical axiomatization of evolutionary processes (e.g., natural selection, drift, migration, speciation, competition)? In which formal language(s) would and could such axiomatizations be articulated (e.g., first-order predicate logic, set theory, and category theory)?
  • Which single grammars could contain a variety of deep evolutionary principles and concepts, such as HWP, “heritability,” and “competitive exclusion”?
  • Which formal and methodological tools would permit a smooth flow from the metamathematical axiomatization to the mathematical theory of population genetics?

Investigations of the axiomatized rational reconstruction of theory shed light on the power and promises, and weaknesses and incompleteness, of the highest-level theoretical edifice of population genetics.

Secondly, the Semantic View primarily examines questions regarding the mathematical structure of population genetics (Lewontin 1974, Beatty 1981; López Beltrán 1987; Thompson 1989, 2007; Lloyd 1994 [1988]). Very generally, this exploration involves the following questions:

  • What is the form and content of the directly presented class of mathematical models of evolutionary theory (e.g., HWP)? How could and should we organize the cluster of mathematical models (sensu Levins 1966) of population genetics?
  • Which additional models (e.g., diagrammatic, narrative, scale) might be used to enrich our understanding of evolutionary theory?
  • What are the relations among theoretical mathematical models, data models, and experimental models? How does theory explain and shape data? How do the data constrain and confirm theory?

The main subject of investigation is mathematical structure, rather than metamathematics or even alternative model types or modeling methods.

Finally, the Pragmatic View asks about the internal complexity of population genetic theory, as well as about the development and context of population genetics. In so doing, it inquires into how purposes and values have influenced the theoretical structure of evolutionary theory, selecting and shaping current population genetics from a wide variety of possible alternative theoretical articulations. The following questions about the structure of population genetic theory might be here addressed:

  • What role did R.A. Fisher’s interest in animal husbandry, and his tenure at Rothamsted Experimental Station, play in shaping his influential methodologies of Analysis of Variance (ANOVA) and experimental design involving randomization, blocking, and factorial designs?
  • How did the development of computers and computational practices, statistical techniques, and the molecularization of genetics, shape theory and theorizing in population genetics, especially from the 1980s to today?
  • How might normative context surrounding the concept of “race” impact the way concepts such as “heritability” and “lineage,” or principles such as HWP, are deployed in population genetics?

As when studying an organism, the structure of theory cannot be understood independently of its history and function.

2. The Syntactic View

According to the Syntactic View, which emerged mainly out of work of the Vienna Circle and Logical Empiricism (see Coffa 1991; Friedman 1999; Creath 2014; Uebel 2014), philosophy of science, most generally practiced, is, and should be, the study of the logic of natural science, or Wissenschaftslogik (Carnap 1937, 1966; Hempel 1966). Robust and clear logical languages allow us to axiomatically reconstruct theories, which—by the Syntacticists’ definition—are sets of sentences in a given logical domain language (e.g., Campbell 1920, 122; Hempel 1958, 46; cf. Carnap 1967 [1928], §156, “Theses about the Constructional System”). Domain languages include “the language of physics, the language of anthropology” (Carnap 1966, 58).

This view has been variously baptized as the Received View (Putnam 1962; Hempel 1970), the Syntactic Approach (van Fraassen 1970, 1989), the Syntactic View (Wessels 1976), the Standard Conception (Hempel 1970), the Orthodox View (Feigl 1970), the Statement View (Moulines 1976, 2002; Stegmüller 1976), the Axiomatic Approach (van Fraassen 1989), and the Once Received View (Craver 2002). For historical reasons, and because of the linguistic trichotomy discussed above, the “Syntactic View” shall be the name of choice in this entry.

2.1 Theory Structure per the Syntactic View

Some conceptual taxonomy is required in order to understand the logical framework of the structure of scientific theories for the Syntactic View. We shall distinguish terms, sentences, and languages (see Table 1).

2.1.1 Terms

Building upwards from the bottom, let us start with the three kinds of terms or vocabularies contained in a scientific language: theoretical, logical, and observational. Examples of theoretical terms are “molecule,” “atom,” “proton,” and “protein,” and perhaps even macro-level objects and properties such as “proletariat” and “aggregate demand.” Theoretical terms or concepts can be classificatory (e.g., “cat” or “proton”), comparative (e.g., “warmer”), or quantitative (e.g., “temperature”) (Hempel 1952; Carnap 1966, Chapter 5). Moreover, theoretical terms are “theoretical constructs” introduced “jointly” as a “theoretical system” (Hempel 1952, 32). Logical terms include quantifiers (e.g., \(\forall, \exists\)) and connectives (e.g., \(\wedge, \rightarrow\)). Predicates such as “hard,” “blue,” and “hot,” and relations such as “to the left of” and “smoother than,” are observational terms.

2.1.2 Sentences

Terms can be strung together into three kinds of sentences: theoretical, correspondence, and observational. \(T_S\) is the set of theoretical sentences that are the axioms, theorems, and laws of the theory. Theoretical sentences include the laws of Newtonian mechanics and of the Kinetic Theory of Gases, all suitably axiomatized (e.g., Carnap 1966; Hempel 1966). Primitive theoretical sentences (e.g., axioms) can be distinguished from derivative theoretical sentences (e.g., theorems; see Reichenbach 1969 [1924]; Hempel 1958; Feigl 1970). \(C_S\) is the set of correspondence sentences tying theoretical sentences to observable phenomena or “to a ‘piece of reality’” (Reichenbach 1969 [1924], 8; cf. Einstein 1934, 1936 [1936], 351). To simplify, they provide the theoretical syntax with an interpretation and an application, i.e., a semantics. Suitably axiomatized versions of the following sentences provide semantics to the ideal gas law, \(PV = nRT\): “\(V\) in the ideal gas law is equivalent to the measurable volume \(xyz\) of a physical container such as a glass cube that is \(x\), \(y\), and \(z\) centimeters in length, width, and height, and in which the gas measured is contained” and “\(T\) in the ideal gas law is equivalent to the temperature indicated on a reliable thermometer or other relevant measuring device properly calibrated, attached to the physical system, and read.” Carnap (1987 [1932], 466) presents two examples of observational sentences, \(O_S\): “Here (in a laboratory on the surface of the earth) is a pendulum of such and such a kind,” and “the length of the pendulum is 245.3 cm.” Importantly, theoretical sentences can only contain theoretical and logical terms; correspondence sentences involve all three kinds of terms; and observational sentences comprise only logical and observational terms.
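
The interpretive division of labor among theoretical and correspondence sentences can be mimicked in a toy computation. In the sketch below, one function plays the role of a correspondence sentence (identifying \(V\) with measured container dimensions) and another plays the role of a theoretical sentence (the gas law \(PV = nRT\) solved for \(P\)); all names and numbers are illustrative assumptions.

```python
R = 8.314  # J/(mol*K), molar gas constant

def volume_from_container(x_cm, y_cm, z_cm):
    """Correspondence-sentence analogue: identify the theoretical V
    with the measured dimensions of a physical container, in m^3."""
    return (x_cm / 100.0) * (y_cm / 100.0) * (z_cm / 100.0)

def pressure(n_mol, t_kelvin, v_m3):
    """Theoretical-sentence analogue: PV = nRT, solved for P (pascals)."""
    return n_mol * R * t_kelvin / v_m3

V = volume_from_container(10.0, 10.0, 10.0)  # a 10 cm glass cube
P = pressure(1.0, 300.0, V)                  # 1 mol of gas at 300 K
```

Only the first function "touches" measurement; the second is purely theoretical syntax. Keeping the two cleanly separate is precisely the division the Syntactic View insists on.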

2.1.3 Languages

The total domain language of science consists of two languages: the theoretical language, \(L_T\), and the observational language, \(L_O\) (e.g., Hempel 1966, Chapter 6; Carnap 1966, Chapter 23; the index entry for “Language,” of Feigl, Scriven, and Maxwell 1958, 548 has three subheadings: “observation,” “theoretical,” and “ordinary”). The theoretical language includes theoretical vocabulary, while the observational language involves observational terms. Both languages contain logical terms. Finally, the theoretical language includes, and is constrained by, the logical calculus, Calc , of the axiomatic system adopted (e.g., Hempel 1958, 46; Suppe 1977, 50-53). This calculus specifies sentence grammaticality as well as appropriate deductive and non-ampliative inference rules (e.g., modus ponens) pertinent to, especially, theoretical sentences. Calc can itself be written in theoretical sentences.

2.1.4 Theory Structure, in General

Table 1 summarizes the Syntactic View’s account of theory structure:

The salient divide is between theory and observation. Building on Table 1, there are three different levels of scientific knowledge, according to the Syntactic View:

  • \(\{T_S\} =\) The uninterpreted syntactic system of the scientific theory.
  • \(\{T_S, C_S\} =\) The scientific theory structure of a particular domain (e.g., physics, anthropology).
  • \(\{T_S, C_S, O_S\} =\) All of the science of a particular domain.

Scientific theory is thus taken to be a syntactically formulated set of theoretical sentences (axioms, theorems, and laws) together with their interpretation via correspondence sentences. As we have seen, theoretical sentences and correspondence sentences are cleanly distinct, even if both are included in the structure of a scientific theory.

Open questions remain. Is the observation language a sub-language of the theoretical language, or are they both parts of a fuller language including all the vocabulary? Can the theoretical vocabulary or language be eliminated in favor of a purely observational vocabulary or language? Are there other ways of carving up kinds of languages? First, a “dialectical opposition” between “logic and experience,” “form and content,” “constitutive principles and empirical laws,” and “‘from above’… [and] ‘from below’” pervades the work of the syntacticists (Friedman 1999, 34, 63). Whether syntacticists believe that a synthesis or unification of this general opposition between the theoretical (i.e., logic, form) and the observational (i.e., experience, content) is desirable remains a topic of ongoing discussion. Regarding the second question, Hempel 1958 deflates what he calls “the theoretician’s dilemma”—i.e., the putative reduction without remainder of theoretical concepts and sentences to observational concepts and sentences. Finally, other language divisions are possible, as Carnap 1937 argues (see Friedman 1999, Chapter 7). Returning to the main thread of this section, the distinction toolkit of theoretical and observational terms, sentences, and languages (Table 1) permits the syntacticists to render theoretical structure sharply, thereby aiming at the reconstructive “logic of science” (Wissenschaftslogik) that they so desire.

2.2 A Running Example: Newtonian Mechanics

Reichenbach 1969 [1924] stands as a canonical attempt, by a central developer of the Syntactic View, at axiomatizing a physical theory, viz., relativity theory (cf. Friedman 1983, 1999; see also Reichenbach 1965 [1920]). For the purposes of this encyclopedia entry, it is preferable to turn to another syntactic axiomatization effort. In axiomatizing Newtonian mechanics, the mid-20th century mathematical logician Hans Hermes spent significant energy defining the concept of mass (Hermes 1938, 1959; Jammer 1961). More precisely, he defines the theoretical concept of “mass ratio” of two particles colliding inelastically in an inertial reference frame \(S\), giving his full definition of mass ratio at (1959, 287).

One paraphrase of this definition is, “‘the mass of \(x\) is α times that of \(x_0\)’ is equivalent to ‘there exists a system \(S\), an instant \(t\), momentary mass points \(y\) and \(y_0\), and initial velocities \(v\) and \(v_0\), such that \(y\) and \(y_0\) are genidentical, respectively, with \(x\) and \(x_0\); the joined mass points move with a velocity of 0 with respect to frame \(S\) immediately upon colliding at time \(t\); and \(y\) and \(y_0\) have determinate velocities \(v\) and \(v_0\) before the collision in the ratio α, which could also be 1 if \(x\) and \(x_0\) are themselves genidentical.’” Hermes employs the notion of “genidentical” to describe the relation between two temporal sections of a given particle’s world line (Jammer 1961, 113). Set aside the worry that two distinct particles cannot be genidentical per Hermes’ definition, though they can have identical properties. In short, this definition is syntactically complete and is written in first-order predicate logic, as are the other axioms and definitions in Hermes (1938, 1959). Correspondence rules connecting a postulated mass \(x\) with an actual mass were not articulated by Hermes.
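
Hermes' operational definition can be glossed computationally. For a perfectly inelastic collision whose joined mass points are at rest in \(S\) afterwards, conservation of momentum gives \(mv + m_0 v_0 = 0\), so the mass ratio is recoverable from the initial velocities alone. The function below is an illustrative gloss of this idea, not Hermes' own first-order formalism.

```python
def mass_ratio(v, v0):
    """Mass ratio alpha = m(x)/m(x0) for two particles that collide
    perfectly inelastically and are jointly at rest afterwards in S.
    Momentum conservation: m*v + m0*v0 = 0, hence m/m0 = -v0/v."""
    if v == 0:
        raise ValueError("particle x must be moving before the collision")
    return -v0 / v

# Equal and opposite initial velocities give alpha = 1, the ratio
# Hermes allows when x and x0 are genidentical.
assert mass_ratio(2.0, -2.0) == 1.0
# If x0 approaches twice as fast as x, x must be twice as massive.
assert mass_ratio(1.0, -2.0) == 2.0
```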

2.3 Interpreting Theory Structure per the Syntactic View

The link between theory structure and the world, under the Syntactic View, is contained in the theory itself: \(C_S\), the set of correspondence rules. The term “correspondence rules” (Margenau 1950; Nagel 1961, 97–105; Carnap 1966, Chapter 24) has a variety of near-synonyms:

  • Dictionary (Campbell 1920)
  • Operational rules (Bridgman 1927)
  • Coordinative definitions (Reichenbach 1969 [1924], 1938)
  • Reduction sentences (Carnap 1936/1937; Hempel 1952)
  • Correspondence postulates (Carnap 1963)
  • Bridge principles (Hempel 1966; Kitcher 1984)
  • Reduction functions (Schaffner 1969, 1976)
  • Bridge laws (Sarkar 1998)

Important differences among these terms cannot be mapped out here. However, in order to better understand correspondence rules, two of their functions will be considered: (i) theory interpretation (Carnap, Hempel) and (ii) theory reduction (Nagel, Schaffner). The dominant perspective on correspondence rules is that they interpret theoretical terms. Unlike “mathematical theories,” the axiomatic system of physics “cannot have… a splendid isolation from the world” (Carnap 1966, 237). Instead, scientific theories require observational interpretation through correspondence rules. Even so, surplus meaning always remains in the theoretical structure (Hempel 1958, 87; Carnap 1966). Second, correspondence rules are seen as necessary for inter-theoretic reduction (van Riel and Van Gulick 2014). For instance, they connect observation terms such as “temperature” in phenomenological thermodynamics (the reduced theory) to theoretical concepts such as “mean kinetic energy” in statistical mechanics (the reducing theory). Correspondence rules unleash the reducing theory’s epistemic power. Notably, Nagel (1961, Chapter 11; 1979) and Schaffner (1969, 1976, 1993) allow for multiple kinds of correspondence rules, between terms of either vocabulary, in the reducing and the reduced theory (cf. Callender 1999; Winther 2009; Dizadji-Bahmani, Frigg, and Hartmann 2010). Correspondence rules are a core part of the structure of scientific theories and serve as glue between theory and observation.
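
The reductive use of correspondence rules can be sketched as a pair of conversion functions for the temperature case just mentioned. The bridge assumed here is the textbook relation for a monatomic ideal gas, mean kinetic energy per particle \(= \tfrac{3}{2}kT\); the function names are illustrative, not drawn from Nagel or Schaffner.

```python
K_BOLTZMANN = 1.380649e-23  # J/K

def mean_ke_from_temperature(t_kelvin):
    """Bridge, reducing direction: <E_kin> = (3/2) k T."""
    return 1.5 * K_BOLTZMANN * t_kelvin

def temperature_from_mean_ke(mean_ke_joules):
    """Bridge, inverse direction: T = 2 <E_kin> / (3 k)."""
    return 2.0 * mean_ke_joules / (3.0 * K_BOLTZMANN)

# The two directions of the bridge agree, up to floating-point error.
assert abs(temperature_from_mean_ke(mean_ke_from_temperature(300.0)) - 300.0) < 1e-9
```

A correspondence rule of this kind is what lets the reducing theory (statistical mechanics) do epistemic work for the reduced theory (phenomenological thermodynamics).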

Finally, while they are not part of the theory structure, and although we saw some examples above, observation sentences are worth briefly reviewing. Correspondence rules attach to the content of observational sentences. Observational sentences were analyzed as (i) protocol sentences or Protokollsätze (e.g., Schlick 1934; Carnap 1987 [1932], 1937, cf. 1963; Neurath 1983 [1932]), and as (ii) experimental laws (e.g., Campbell 1920; Nagel 1961; Carnap 1966; cf. Duhem 1954 [1906]). Although constrained by Calc , the grammar of these sentences is determined primarily by the order of nature, as it were. In general, syntacticists do not consider methods of data acquisition, experiment, and measurement to be philosophically interesting. In contrast, the confirmation relation between (collected) data and theory, especially as developed in inductive logic (e.g., Reichenbach 1938, 1978; Carnap 1962 [1950], 1952), as well as questions about the conventionality, grammaticality, foundationalism, atomism, and content of sense-data and synthetic statements, are considered philosophically important (e.g., Carnap 1987 [1932], 1937, 1966; Neurath 1983 [1932]; Reichenbach 1951; Schlick 1925 [1918], 1934; for contemporary commentary, see, e.g., Creath 1987, 2014; Rutte 1991; Friedman 1999).

2.4 Taking Stock: Syntactic View

To summarize, the Syntactic View holds that there are three kinds of terms or vocabularies: logical, theoretical, and observational; three kinds of sentences: \(T_S\), \(C_S\), and \(O_S\); and two languages: \(L_T\) and \(L_O\). Moreover, the structure of scientific theories can be analyzed using the logical tools of metamathematics. The goal is to reconstruct the logic of science, viz. to articulate an axiomatic system.

Interestingly, this perspective has able and active defenders today, who discuss constitutive and axiomatized principles of the historical “relativized a priori” (Friedman 2001, cf. 2013), argue that “the semantic view, if plausible, is syntactic” (Halvorson 2013), and explore “logicism” for, and in, the philosophy of science (Demopoulos 2003, 2013; van Benthem 2012). Furthermore, for purposes of the syntactic reconstruction of scientific theories, some continue espousing, or perhaps pleading for the resurrection of, predicate logic (e.g., Lutz 2012, 2014), while other contemporary syntacticists (e.g., Halvorson 2012, 2013, 2019) endorse more recently developed metamathematical and mathematical equipment, such as category theory, which “turns out to be a kind of universal mathematical language like set theory” (Awodey 2006, 2; see Eilenberg and MacLane 1945). Importantly, Halvorson (2019) urges that interlocutors adopt “structured” rather than “flat” views of theories. For the case of the syntactic view this would mean that rather than accept the usual formulation that a theory is a set of sentences, “… [we] might say that a theory consists of both sentences and inferential relations between those sentences” (Halvorson 2019, 277–8). Classical syntacticists such as Rudolf Carnap (Friedman 1999, 2011; Carus 2007; Blatti and Lapointe 2016; Koellner ms. in Other Internet Resources) and Joseph Henry Woodger (Nicholson and Gawne 2014) have recently received increasing attention.

3. The Semantic View

An overarching theme of the Semantic View is that analyzing theory structure requires employing mathematical tools rather than predicate logic. After all, defining scientific concepts within a specific formal language makes any axiomatizing effort dependent on the choice, nature, and idiosyncrasies of that narrowly-defined language. For instance, Suppes understands first-order predicate logic, with its “linguistic” rather than “set-theoretical” entities, as “utterly impractical” for the formalization of “theories with more complicated structures like probability theory” (Suppes 1957, 232, 248–9; cf. Suppes 2002). Van Fraassen, another influential defender of the Semantic View, believes that the logical apparatus of the Syntactic View “had moved us mille milles de toute habitation scientifique [a thousand miles from any scientific habitation], isolated in our own abstract dreams” (van Fraassen 1989, 225). Indeed, what would the appropriate logical language for specific mathematical structures be, especially when such structures could be reconstructed in a variety of formal languages? Why should we imprison mathematics and mathematical scientific theory in syntactically defined language(s) when we could, instead, directly investigate the mathematical objects, relations, and functions of scientific theory?

Consistent with the combat strategy (discussed in the Conclusion), here is a list of grievances against the Syntactic View discussed at length in the work of some semanticists.

  • First-Order Predicate Logic Objection . Theoretical structure is intrinsically and invariably tied to the specific choice of a language, \(L_T\), expressed in first-order predicate logic. This places heavy explanatory and representational responsibility on relatively inflexible and limited languages.
  • Theory Individuation Objection . Since theories are individuated by their linguistic formulations, every change in high-level syntactic formulations will bring forth a distinct theory. This produces a reductio: if \(T_1 = p \rightarrow q\) and \(T_2 = \neg p \vee q\) then \(T_1\) and \(T_2\), though logically equivalent, have different syntactic formulations and would be distinct theories.
  • Theoretical/Observational Languages Objection . Drawing the theoretical/observational distinction in terms of language is inappropriate, as observability pertains to entities rather than to concepts.
  • Unintended Models Objection . There is no clear way of distinguishing between intended and unintended models for syntactically characterized theories (e.g., the Löwenheim-Skolem theorem, Bays 2014).
  • Confused Correspondence Rules Objection . Correspondence rules are a confused medley of direct meaning relationships between terms and world, means of inter-theoretic reduction, causal relationship claims, and manners of theoretical concept testing.
  • Trivially True yet Non-Useful Objection . Presenting scientific theory in a limited axiomatic system, while clearly syntactically correct, is neither useful nor honest, since scientific theories are mathematical structures.
  • Practice and History Ignored Objection . Syntactic approaches do not pay sufficient attention to the actual practice and history of scientific theorizing and experimenting.
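
The Theory Individuation Objection above turns on the fact that syntactically distinct sentences can be logically equivalent. A brute-force truth-table check makes the point for the \(T_1\)/\(T_2\) pair; the helper names are illustrative.

```python
from itertools import product

def t1(p, q):
    """T1: the sentence 'p -> q', encoded as 'not (p and not q)'."""
    return not (p and not q)

def t2(p, q):
    """T2: the syntactically distinct sentence '(not p) or q'."""
    return (not p) or q

# The two formulations agree on every valuation, yet their linguistic
# formulations differ; individuating theories by formulation would
# therefore count them as two distinct theories.
assert all(t1(p, q) == t2(p, q) for p, q in product([True, False], repeat=2))
```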

What, then, does the Semantic View propose to put in the Syntactic View’s place?

3.1 Theory Structure per the Semantic View

Even a minimal description of the Semantic View must acknowledge two distinct strategies of characterizing and comprehending theory structure: the state-space and the set-/model-theoretic approaches.

3.1.1 The State-Space Approach

The state-space approach emphasizes the mathematical models of actual science, and draws a clear line between mathematics and metamathematics. The structure of a scientific theory is identified with the “class,” “family” or “cluster” of mathematical models constituting it, rather than with any metamathematical axioms “yoked to a particular syntax” (van Fraassen 1989, 366). Under this analysis, “the correct tool for philosophy of science is mathematics, not metamathematics”—this is Suppes’ slogan, per van Fraassen (1989, 221; 1980, 65). In particular, a state space or phase space is an \(N\)-dimensional space, where each of the relevant variables of a theory corresponds to a single dimension and each point in that space represents a possible state of a real system. An actual, real system can take on, and change, states according to different kinds of laws, viz., laws of succession determining possible trajectories through that space (e.g., Newtonian kinematic laws); laws of co-existence specifying the permitted regions of the total space (e.g., Boyle’s law); and laws of interaction combining multiple laws of succession or co-existence, or both (e.g., population genetic models combining laws of succession for selection and genetic drift, Wright 1969; Lloyd 1994 [1988]; Rice 2004; Clatterbuck, Sober, and Lewontin 2013). Different models of a given theory will share some dimensions of their state space while differing in others. Such models will also partially overlap in laws (for further discussion of state spaces, laws, and models pertinent to the Semantic View, see Suppe 1977, 224–8; Lloyd 1994, Chapter 2; Nolte 2010; Weisberg 2013, 26–9).

Historically, the state-space approach emerged from work by Evert Beth, John von Neumann, and Hermann Weyl, and has important parallels with Przełęcki (1969) and Dalla Chiara Scabia and Toraldo di Francia (1973) (on the history of the approach see: Suppe 1977; van Fraassen 1980, 65–67; Lorenzano 2013; advocates of the approach include: Beatty 1981; Giere 1988, 2004; Giere, Bickle, and Mauldin 2006; Lloyd 1983, 1994 [1988], 2013 In Press; Suppe 1977, 1989; Thompson, 1989, 2007; van Fraassen 1980, 1989, 2008; for alternative early analyses of models see, e.g., Braithwaite 1962; Hesse 1966, 1967). Interestingly, van Fraassen (1967, 1970) provides a potential reconstruction of state spaces via an analysis of “semi-interpreted languages.” Weisberg (2013), building on many insights from Giere’s work, presents a broad view of modeling that includes mathematical structures that are “trajectories in state spaces” (29), but also permits concrete objects and computational structures such as algorithms to be deemed models. Lorenzano (2013) calls Giere’s (and, by extension, Weisberg’s and even Godfrey-Smith’s 2006) approach “model-based,” separating it out from the state-space approach. A more fine-grained classification of the state-space approach is desirable, particularly if we wish to understand important lessons stemming from the Pragmatic View of Theories, as we shall see below.

As an example of a state-space analysis of modeling, consider a capsule traveling in outer space. An empirically and dynamically adequate mathematical model of the capsule’s behavior would capture the position of the capsule (i.e., three dimensions of the formal state space), as well as the velocity and acceleration vectors for each of the three standard spatial dimensions (i.e., six more dimensions in the formal state space). If the mass were unknown or permitted to vary, we would have to add one more dimension. Possible and actual trajectories of our capsule, with known mass, within this abstract 9-dimensional state space could be inferred via Newtonian dynamical laws of motion (example in Lewontin 1974, 6–8; consult Suppe 1989, 4). Importantly, under the state-space approach, the interesting philosophical work of characterizing theory structure (e.g., as classes of models), theory meaning (e.g., data models mapped to theoretical models), and theory function (e.g., explaining and predicting) happens at the level of mathematical models.
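The capsule example can be given a minimal computational sketch. The Python fragment below represents one point of the 9-dimensional state space and one law of succession; all numerical values, and the simple Euler stepping rule, are illustrative assumptions rather than anything in the original example.

```python
# A point in the capsule's 9-dimensional state space: position, velocity,
# and acceleration vectors, one dimension per component (3 + 3 + 3 = 9).
# All numerical values are made up for illustration.
state = [0.0, 0.0, 0.0,    # position (x, y, z)
         1.0, 0.0, 0.0,    # velocity
         0.0, -9.8, 0.0]   # acceleration

def step(state, dt):
    """A law of succession (here, a simple Euler step of the kinematic
    relations) carrying the system to the next point of its trajectory."""
    pos, vel, acc = state[0:3], state[3:6], state[6:9]
    new_pos = [p + v * dt for p, v in zip(pos, vel)]
    new_vel = [v + a * dt for v, a in zip(vel, acc)]
    return new_pos + new_vel + list(acc)

# A possible trajectory: a sequence of states in the 9-dimensional space.
trajectory = [state]
for _ in range(3):
    trajectory.append(step(trajectory[-1], dt=0.1))
```

Laws of co-existence would instead constrain which of the 9-dimensional points are admissible at all, rather than how the system moves between them.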

3.1.2 The Set-/Model-Theoretic Approach

Lurking in the background of the state-space conception is the fact that mathematics actually includes set theory and model theory—i.e., mathematical logic. Indeed, according to some interlocutors, “metamathematics is part of mathematics” (Halvorson 2012, 204). Historically, a set-/model-theoretic approach emerged from Tarski’s work and was extensively articulated by Suppes and his associates (van Fraassen 1980, 67). Set theory is a general language for formalizing mathematical structures as collections—i.e., sets—of abstract objects (which can themselves be relations or functions; see Krivine 2013 [1971]). Model theory investigates the relations between, on the one hand, the formal axioms, theorems, and laws of a particular theory and, on the other hand, the mathematical structures—the models—that provide an interpretation of that theory, or put differently, that make the theory’s axioms, theorems, and laws true (Hodges 1997, Chapter 2; Jones 2005). Interestingly, model theory often uses set theory (e.g., Marker 2002); set theory can, in turn, be extended to link axiomatic theories and semantic models via “set-theoretical predicates” (e.g., Suppes 1957, 2002). Finally, there are certain hybrids of these two branches of mathematical logic, including “partial structures” (e.g., da Costa and French 1990, 2003; Bueno 1997; French 2017; French and Ladyman 1999, 2003; Vickers 2009; Bueno, French, and Ladyman 2012). Lorenzano (2013) provides a more complex taxonomy of the intellectual landscape of the Semantic View, including a discussion of Structuralism, a kind of set-/model-theoretic perspective. Structuralism involves theses about “theory-nets,” theory-relative theoretical vs. non-theoretical terms, a diversity of intra- and inter-theoretic laws with different degrees of generality, a typology of inter-theoretic relations, and a rich account of correspondence rules in scientific practice (see Moulines 2002; Pereda 2013; Schmidt 2014; Ladyman 2014). 
On the whole, the set-/model-theoretic approach of the Semantic View insists on the inseparability of metamathematics and mathematics. In preferring to characterize a theory axiomatically in terms of its intension rather than its extension, it shares the Syntactic View’s aims of reconstructive axiomatization (e.g., Sneed 1979; Stegmüller 1979; Frigg and Votsis 2011; Halvorson 2013, 2019; Lutz 2012, 2014, 2017).

An example will help motivate the relation between theory and model. Two qualifications are required: (i) we return to a more standard set-/model-theoretic illustration below, viz., McKinsey, Sugar, and Suppes’ (1953) axiomatization of particle mechanics, and (ii) this motivational example is not from the heartland of model theory (see Hodges 2013). Following van Fraassen’s intuitive case of “seven-point geometry” (1980, 41–44; 1989, 218–220), also known as “the Fano plane,” we see how a particular geometric figure, the model, interprets and makes true a set of axioms and theorems, the theory. In topology and geometry there is rich background theory regarding how to close Euclidean planes and spaces to make finite geometries by, for instance, eliminating parallel lines. Consider the axioms of a projective plane:

  • For any two points, exactly one line lies on both.
  • For any two lines, exactly one point lies on both.
  • There exists a set of four points such that no line has more than two of them.

A figure of a geometric model that makes this theory true is:

Figure 1: A triangle ACE with interior circle BDF and center point G. Point B lies on segment AC, D on CE, and F on AE; G lies on segments AD, BE, and CF.

This is the smallest geometrical model satisfying the three axioms of the projective plane theory. Indeed, this example fits van Fraassen’s succinct characterization of the theory-model relation:

A model is called a model of a theory exactly if the theory is entirely true if considered with respect to this model alone. (Figuratively: the theory would be true if this model was the whole world.) (1989, 218)

That is, if the entire universe consisted solely of these seven points and seven lines, the projective plane theory would be true. Of course, our universe is bigger. Because Euclidean geometry includes parallel lines, the Fano plane is not a model of Euclidean geometry. Even so, by drawing the plane, we have shown it to be isomorphic to parts of the Euclidean plane. In other words, the Fano plane has been embedded in a Euclidean plane. Below we return to the concepts of embedding and isomorphism, but this example shall suffice for now to indicate how a geometric model can provide a semantics for the axioms of a theory.
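The three axioms can also be verified mechanically against the seven-point model. The Python sketch below labels the points 1–7 and uses a standard line set for the Fano plane; the labeling is a conventional choice, not taken from the text.

```python
from itertools import combinations

# The Fano plane: seven points (labeled 1-7) and seven three-point lines.
POINTS = set(range(1, 8))
LINES = [frozenset(l) for l in
         ({1, 2, 3}, {1, 4, 5}, {1, 6, 7},
          {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6})]

# Axiom 1: for any two points, exactly one line lies on both.
axiom1 = all(sum(1 for l in LINES if {p, q} <= l) == 1
             for p, q in combinations(POINTS, 2))

# Axiom 2: for any two lines, exactly one point lies on both.
axiom2 = all(len(l1 & l2) == 1 for l1, l2 in combinations(LINES, 2))

# Axiom 3: some four points are such that no line has more than two of them.
axiom3 = any(all(len(l & set(quad)) <= 2 for l in LINES)
             for quad in combinations(POINTS, 4))

print(axiom1, axiom2, axiom3)  # → True True True
```

Since all three checks succeed, this seven-point structure makes the projective plane theory true, in exactly van Fraassen’s sense of a model of a theory.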

In short, for the Semantic View the structure of a scientific theory is its class of mathematical models. According to some advocates of this view, the family of models can itself be axiomatized, with those very models (or other models) serving as axiom truth-makers.

Returning to our running example, consider Suppes’ 1957 model-theoretic articulation of particle mechanics, which builds on his 1953 article with J.C.C. McKinsey and A.C. Sugar. Under this analysis, there is a domain of set-theoretic objects of the form \(\{ P, T, s, m, f, g \}\), where \(P\) and \(T\) are themselves sets, \(s\) and \(g\) are binary functions, \(m\) is a unary and \(f\) a ternary function. \(P\) is the set of particles; \(T\) is a set of real numbers measuring elapsed times; \(s(p, t)\) is the position of particle \(p\) at time \(t\); \(m(p)\) is the mass of particle \(p\); \(f(p, q, t)\) is the force particle \(q\) exerts on \(p\) at time \(t\); and \(g(p, t)\) is the total resultant force (by all other particles) on \(p\) at time \(t\). Suppes and his collaborators defined seven axioms—three kinematical and four dynamical—characterizing Newtonian particle mechanics (see also Simon 1954, 1970). Such axioms include Newton’s third law reconstructed in set-theoretic formulation thus (Suppes 1957, 294):

For all \(p, q \in P\) and \(t \in T\): \(f(p, q, t) = -f(q, p, t)\).

Importantly, the set-theoretic objects are found in more than one of the axioms of the theory, and Newton’s calculus is reconstructed in a novel, set-theoretic form. Set-theoretic predicates such as “is a binary relation” and “is a function” are also involved in axiomatizing particle mechanics (Suppes 1957, 249). Once these axioms are made explicit, their models can be specified and these can, in turn, be applied to actual systems, thereby providing a semantics for the axioms (e.g., as described in Section 3.3.1 below). A particular system satisfying these seven axioms is a particle mechanics system. (For an example of Newtonian mechanics from the state-space approach, recall the space capsule of Section 3.1.1.)
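To make the set-/model-theoretic idea concrete, here is a hedged Python sketch of a toy structure \(\{P, T, s, m, f, g\}\): the two particles, the instants, and the interaction forces are all made up, and the check is confined to the third-law axiom mentioned above, not the full seven axioms.

```python
# A toy instance of a Suppes-style structure {P, T, s, m, f, g} for two
# particles over three instants (all values are invented for illustration).
P = {"p1", "p2"}
T = {0.0, 1.0, 2.0}

s = lambda p, t: (t, 0.0, 0.0) if p == "p1" else (-t, 0.0, 0.0)  # positions
m = lambda p: 1.0 if p == "p1" else 2.0                          # masses

def f(p, q, t):
    """Force that particle q exerts on p at time t (a made-up interaction)."""
    if p == q:
        return (0.0, 0.0, 0.0)
    sign = 1.0 if p == "p1" else -1.0
    return (sign * 3.0, 0.0, 0.0)

def g(p, t):
    """Total resultant force on p at t: summed over all other particles."""
    return tuple(sum(c) for c in zip(*[f(p, q, t) for q in P if q != p]))

# Newton's third law as an axiom on the structure: f(p,q,t) = -f(q,p,t).
third_law = all(f(p, q, t) == tuple(-c for c in f(q, p, t))
                for p in P for q in P for t in T)
print(third_law)  # → True
```

A structure of this form satisfying all seven of Suppes’ axioms would count as a particle mechanics system; one violating the check above would not.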

3.3 Theory Interpretation

How is the theory structure, described in Section 3.1, applied to empirical phenomena? How do we connect theory and data via observation and experimental and measuring techniques? The Semantic View distinguishes theory individuation from both theory-phenomena and theory-world relations. Three types of analysis of theory interpretation are worth investigating: (i) a hierarchy of models (e.g., Suppes; Suppe), (ii) similarity (e.g., Giere; Weisberg), and (iii) isomorphism (e.g., van Fraassen; French and Ladyman).

3.3.1 A Hierarchy of Models

One way of analyzing theory structure interpretation is through a series of models falling under the highest-level axiomatizations. This series has been called “a hierarchy of models,” though it need not be considered a nested hierarchy. These models include models of theory, models of experiment, and models of data (Suppes 1962, 2002). Here is a summary of important parts of the hierarchy (Suppes 1962, Table 1, 259; cf. Giere 2010, Figure 1, 270):

  • Axioms of Theory. Axioms define set-theoretic predicates and constitute the core structure of scientific theories, as reviewed in Section 3.1.2.
  • Models of Theory. “Representation Theorems” permit us “to discover if an interesting subset of models for the theory may be found such that any model for the theory is isomorphic to some member of this subset” (Suppes 1957, 263). Representation theorem methodology can be extended (i) down the hierarchy, both to models of experiment and models of data, and (ii) from isomorphism to homomorphism (Suppes 2002, p. 57 ff.; Suppe 2000; Cartwright 2008).
  • Models of Experiment. Criteria of experimental design motivate choices for how to set up and analyze experiments. There are complex mappings between models of experiment thus specified and (i) models of theory, (ii) theories of measurement, and (iii) models of data.
  • Models of Data. In building models of data, phenomena are organized with respect to statistical goodness-of-fit tests and parameter estimation, in the context of models of theory. Choices must be made about which parameters to represent.
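As a toy illustration of the models-of-data level, consider estimating a single parameter of a theoretical model from measurements. The free-fall model, the data values, and the least-squares estimator below are all hypothetical choices, not Suppes’ own example.

```python
# Hypothetical data-model illustration: estimate a parameter g for the
# theoretical model d = g * t**2 / 2 from made-up measurements.
times = [1.0, 2.0, 3.0, 4.0]          # measurement times (made up)
distances = [4.9, 19.5, 44.2, 78.1]   # measured distances (made up)

# Closed-form least-squares estimate for a one-parameter linear model:
# minimize sum((d_i - g * x_i)**2) with x_i = t_i**2 / 2, which gives
# g_hat = sum(d_i * x_i) / sum(x_i**2).
xs = [t**2 / 2 for t in times]
g_hat = sum(d * x for d, x in zip(distances, xs)) / sum(x**2 for x in xs)
print(round(g_hat, 2))  # → 9.78
```

The fitted parameter is what the data model hands upward in the hierarchy: it is the point at which statistical machinery, rather than the theory’s axioms, organizes the phenomena.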

The temptation to place phenomena at the bottom of the hierarchy must be resisted because phenomena permeate all levels. Indeed, the “class of phenomena” pertinent to a scientific theory is its “intended scope” (Suppe 1977, 223; Weisberg 2013, 40). Furthermore, this temptation raises fundamental questions about scientific representation: “there is the more profound issue of the relationship between the lower most representation in the hierarchy—the data model perhaps—and reality itself, but of course this is hardly something that the semantic approach alone can be expected to address” (French and Ladyman 1999, 113; cf. van Fraassen 2008, 257–258, “The ‘link’ to reality”). Borrowing from David Chalmers, the “hard problem” of philosophy of science remains connecting abstract structures to concrete phenomena, data, and world.

3.3.2 Similarity

The similarity analysis of theory interpretation combines semantic and pragmatic dimensions (Giere 1988, 2004, 2010; Giere, Bickle, and Mauldin 2006; Weisberg 2013). According to Giere, interpretation is mediated by theoretical hypotheses positing representational relations between a model and relevant parts of the world. Such relations may be stated as follows:

\(S\) uses \(X\) to represent \(W\) for purposes \(P\).

Here \(S\) is a scientist, research group or community, \(W\) is a part of the world, and \(X\) is, broadly speaking, any one of a variety of models (Giere 2004, 743, 747, 2010). Model-world similarity judgments are conventional and intentional:

Note that I am not saying that the model itself represents an aspect of the world because it is similar to that aspect. …Anything is similar to anything else in countless respects, but not anything represents anything else. It is not the model that is doing the representing; it is the scientist using the model who is doing the representing. (2004, 747)

Relatedly, Weisberg (2013) draws upon Tversky (1977) to develop a similarity metric for model interpretation (equation 8.10, 148). This metric combines (i) model-target semantics (90–97), and (ii) the pragmatics of “context, conceptualization of the target, and the theoretical goals of the scientist” (149). Giere and Weisberg thus endorse an abundance of adequate mapping relations between a given model and the world. From this diversity, scientists and scientific communities must select particularly useful similarity relationships for contextual modeling purposes. Because of semantic pluralism and irreducible intentionality, this similarity analysis of theory interpretation cannot be accommodated within a hierarchy of models approach, interpreted as a neat model nesting based on pre-given semantic relations among models at different levels.
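A simplified, hypothetical sketch of a Tversky-style contrast measure (not Weisberg’s actual equation 8.10) shows how shared and unshared features, weighted by pragmatic emphases, yield a similarity score; the feature sets and weights below are invented.

```python
def tversky_similarity(model_feats, target_feats,
                       theta=1.0, alpha=0.5, beta=0.5):
    """Tversky-style contrast measure: similarity rises with shared
    features and falls with features unique to either side; the weights
    encode pragmatic emphasis (context, goals of the scientist)."""
    shared = len(model_feats & target_feats)
    model_only = len(model_feats - target_feats)
    target_only = len(target_feats - model_feats)
    return theta * shared - alpha * model_only - beta * target_only

# Hypothetical feature sets for a model and its target system.
model = {"point_mass", "frictionless", "constant_g"}
target = {"extended_body", "air_resistance", "constant_g"}
print(tversky_similarity(model, target))  # → -1.0 (1 shared, 2 unique each)
```

Changing the weights changes which model-target mappings count as good, which is precisely the pragmatic, purpose-relative selection Giere and Weisberg emphasize.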

3.3.3 Isomorphism

The term “isomorphism” is a composite of the Greek words for “equal” and “shape” or “form.” Indeed, in mathematics, isomorphism is a perfect one-to-one, bijective mapping between two structures or sets. Figure (2) literally and figuratively captures the term:

Figure 2: The word “isomorphism” written in script, with its mirror image underneath.

Especially in set theory, category theory, algebra, and topology, there are various kinds of “-morphisms,” viz., of mapping relations between two structures or models. Figure (3) indicates five different kinds of homomorphism, arranged in a Venn diagram.

Figure 3: Venn diagram with an outer circle, Hom, and three intersecting interior circles: Mon, Epi, and End. The intersection of all three is Aut; the intersection of Mon and Epi is Iso.

Although philosophers have focused on isomorphism, other morphisms such as monomorphism (i.e., an injective homomorphism, mapping distinct elements of the domain to distinct elements of the co-domain) might also be interesting to investigate, especially for embedding data (i.e., the domain) into rich theoretical structures (i.e., the co-domain). To complete the visualization above, an epimorphism is a surjective homomorphism, and an endomorphism is a mapping from a structure to itself, although it need not be an invertible mapping; an invertible endomorphism is an automorphism.
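The distinctions among these morphisms can be illustrated with hypothetical finite sets and functions (structure-preservation is trivial here, since the toy sets carry no operations; the sets and maps are invented for illustration).

```python
def is_injective(f, domain):
    """Monomorphism-style condition: distinct inputs give distinct outputs."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """Epimorphism-style condition: every codomain element is hit."""
    return {f(x) for x in domain} == set(codomain)

# Embedding data into a richer theoretical structure: an injective but
# non-surjective map from a 3-element domain into a 5-element codomain.
data, theory = [0, 1, 2], [0, 1, 2, 3, 4]
embed = lambda x: x + 1
mono = is_injective(embed, data) and not is_surjective(embed, data, theory)

# An isomorphism is injective and surjective; an invertible map from a
# structure to itself is an automorphism.
auto = lambda x: (x + 1) % 3
iso = is_injective(auto, data) and is_surjective(auto, data, data)
print(mono, iso)  # → True True
```

The first map models the embedding case discussed above: the data are mapped faithfully into the theoretical structure, but parts of that structure remain unmapped.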

Perhaps the most avid supporter of isomorphism and embedding as the way to understand theory interpretation is van Fraassen. In a nutshell, if we distinguish (i) theoretical models, (ii) “empirical substructures” (van Fraassen 1980, 64, 1989, 227; alternatively: “surface models” 2008, 168), and (iii) “observable phenomena” (1989, 227, 2008, 168), then, van Fraassen argues, theory interpretation is a relation of isomorphism between observable phenomena and empirical substructures, which are themselves isomorphic with one or more theoretical models. Moreover, if a relation of isomorphism holds between \(X\) and a richer \(Y\), we say that we have embedded \(X\) in \(Y\). For instance, with respect to the seven-point geometry above (Figure 1), van Fraassen contends that isomorphism gives embeddability, and that the relation of isomorphism “is important because it is also the exact relation a phenomenon bears to some model or theory, if that theory is empirically adequate” (1989, 219–20; this kind of statement seems to be simultaneously descriptive and prescriptive about scientific representation, see Section 1.1 above). In The Scientific Image he is even clearer about fleshing out the empirical adequacy of a theory (with its theoretical models) in terms of isomorphism between “appearances” (i.e., “the structures which can be described in experimental and measurement reports,” 1980, 64, italics removed) and empirical substructures. Speaking metaphorically,

the phenomena are, from a theoretical point of view, small, arbitrary, and chaotic—even nasty, brutish, and short…—but can be understood as embeddable in beautifully simple but much larger mathematical models. (2008, 247; see also van Fraassen 1981, 666 and 1989, 230)

Interestingly, and as a defender of an identity strategy (see Conclusion), Friedman also appeals to embedding and subsumption relations between theory and phenomena in his analyses of theory interpretation (Friedman 1981, 1983). Bueno, da Costa, French, and Ladyman also employ embedding and (partial) isomorphism in the empirical interpretation of partial structures (Bueno 1997; Bueno, French, and Ladyman 2012; da Costa and French 1990, 2003; French 2017; French and Ladyman 1997, 1999, 2003; Ladyman 2004). Suárez discusses complexities in van Fraassen’s analyses of scientific representation and theory interpretation (Suárez 1999, 2011). On the one hand, representation is structural identity between the theoretical and the empirical. On the other hand, “There is no representation except in the sense that some things are used, made, or taken, to represent some things as thus or so” (van Fraassen 2008, 23, italics removed). The reader interested in learning how van Fraassen simultaneously endorses acontextually structural and contextually pragmatic aspects of representation and interpretation should refer to van Fraassen’s (2008) investigations of maps and “the essential indexical.” [To complement the structure vs. function distinction, see van Fraassen 2008, 309–311 for a structure (“structural relations”) vs. history (“the intellectual processes that lead to those models”) distinction; cf. Ladyman et al. 2011.] In all of this, embedding via isomorphism is a clear contender for theory interpretation under the Semantic View.

In short, committing to either a state-space or a set-/model-theoretic view on theory structure does not imply any particular perspective on theory interpretation (e.g., hierarchy of models, similarity, embedding). Instead, commitments to the former are logically and actually separable from positions on the latter (e.g., Suppes and Suppe endorse different accounts of theory structure, but share an understanding of theory interpretation in terms of a hierarchy of models). The Semantic View is alive and well as a family of analyses of theory structure, and continues to be developed in interesting ways both in its state-space and set-/model-theoretic approaches.

4. The Pragmatic View

The Pragmatic View recognizes that a number of assumptions about scientific theory seem to be shared by the Syntactic and Semantic Views. Both perspectives agree, very roughly, that theory is (1) explicit, (2) mathematical, (3) abstract, (4) systematic, (5) readily individualizable, (6) distinct from data and experiment, and (7) highly explanatory and predictive (see Flyvbjerg 2001, 38–39; cf. Dreyfus 1986). The Pragmatic View imagines the structure of scientific theories rather differently, arguing for a variety of theses:

  • Limitations. Idealized theory structure might be too weak to ground the predictive and explanatory work syntacticists and semanticists expect of it (e.g., Cartwright 1983, 1999a, b, 2019; Morgan and Morrison 1999; Suárez and Cartwright 2008).
  • Pluralism. Theory structure is plural and complex, both in the sense of internal variegation and of existing in many types. In other words, there is an internal pluralism of theory (and model) components (e.g., mathematical concepts, metaphors, analogies, ontological assumptions, values, natural kinds and classifications, distinctions, and policy views; e.g., Kuhn 1970; Boumans 1999), as well as a broad external pluralism of different types of theory (and model) operative in science (e.g., mechanistic, historical, and mathematical models; e.g., Hacking 2009; Longino 2013). Indeed, it may be better to speak of the structures of scientific theories, in the double plural.
  • Nonformal aspects. The internal pluralism of theory structure (thesis #2) includes many nonformal aspects deserving attention. That is, many components of theory structure, such as metaphors, analogies, values, and policy views, have a non-mathematical and “informal” nature, and they lie implicit or hidden (e.g., Bailer-Jones 2002; Craver 2002; Contessa 2006; Morgan 2012). Interestingly, the common understanding of “formal,” which identifies formalization with mathematization, may itself be a conceptual straitjacket; the term could be broadened to include “diagram abstraction” and “principle extraction” (e.g., Griesemer 2013, who explicitly endorses what he also calls a “Pragmatic View of Theories”).
  • Function. Characterizations of the nature and dynamics of theory structure should pay attention to the user as well as to purposes and values (e.g., Apostel 1960; Minsky 1965; Morrison 2007; Winther 2012a).
  • Practice. Theory structure is continuous with practice and “the experimental life,” making it difficult to neatly dichotomize theory and practice (e.g., Hacking 1983, 2009; Shapin and Schaffer 1985; Galison 1987, 1988, 1997; Suárez and Cartwright 2008; Cartwright 2019).

These are core commitments of the Pragmatic View.

It is important to note at the outset that the Pragmatic View takes its name from the linguistic trichotomy discussed above, in the Introduction. This perspective need not imply commitment to, or association with, American Pragmatism (e.g. the work of Charles S. Peirce, William James, or John Dewey; cf. Hookway 2013; Richardson 2002). For instance, Hacking (2007a) distinguishes his pragmatic attitudes from the school of Pragmatism. He maps out alternative historical routes of influence, in general and on him, vis-à-vis fallibilism (via Imre Lakatos, Karl Popper; Hacking 2007a, §1), historically conditioned truthfulness (via Bernard Williams; Hacking 2007a, §3), and realism as intervening (via Francis Everitt, Melissa Franklin; Hacking 2007a, §4). To borrow a term from phylogenetics, the Pragmatic View is “polyphyletic.” The components of its analytical framework have multiple, independent origins, some of which circumnavigate American Pragmatism.

With this qualification and the five theses above in mind, let us now turn to the Pragmatic View’s analysis of theory structure and theory interpretation.

4.1 Theory Structure

We should distinguish two strands of the Pragmatic View: the Pragmatic View of Models and a proper Pragmatic View of Theories.

4.1.1 The Pragmatic View of Models

Nancy Cartwright’s How the Laws of Physics Lie crystallized the Pragmatic View of Models. Under Cartwright’s analysis, models are the appropriate level of investigation for philosophers trying to understand science. She argues for significant limitations of theory (thesis #1), claiming that laws of nature are rarely true, and are epistemically weak. Theory as a collection of laws cannot, therefore, support the many kinds of inferences and explanations that we have come to expect it to license. Cartwright urges us to turn to models and modeling, which are central to scientific practice. Moreover, models “lie”—figuratively and literally—between theory and the world (cf. Derman 2011). That is, “to explain a phenomenon is to find a model that fits it into the basic framework of the theory and that thus allows us to derive analogues for the messy and complicated phenomenological laws which are true of it.” A plurality of models exist, and models “serve a variety of purposes” (Cartwright 1983, 152; cf. Suppes 1978). Cartwright is interested in the practices and purposes of scientific models, and asks us to focus on models rather than theories.

Cartwright’s insights into model pluralism and model practices stand as a significant contribution of “The Stanford School” (cf. Cat 2014), and were further developed by the “models as mediators” group, with participants at LSE, University of Amsterdam, and University of Toronto (Morgan and Morrison 1999; Chang 2011; cf. Martínez 2003). This group insisted on the internal pluralism of model components (thesis #2). According to Morgan and Morrison, building a model involves “fitting together… bits which come from disparate sources,” including “stories” (Morgan and Morrison 1999, 15). Boumans (1999) writes:

model building is like baking a cake without a recipe. The ingredients are theoretical ideas, policy views, mathematisations of the cycle, metaphors and empirical facts. (67)

Mathematical moulding is shaping the ingredients in such a mathematical form that integration is possible… (90)

In an instructive diagram, Boumans suggests that a variety of factors besides theory and data feed into a model: metaphors, analogies, policy views, stylised facts, mathematical techniques, and mathematical concepts (93). The full range of components involved in a model will likely vary according to discipline, and with respect to explanations and interventions sought (e.g., analogies but not policy views will be important in theoretical physics). In short, model building involves a complex variety of internal nonformal aspects, some of which are implicit (theses #2 and #3).

As one example of a nonformal component of model construction and model structure, consider metaphors and analogies (e.g., Bailer-Jones 2002). Geary (2011) states the “simplest equation” of metaphor thus: “\(X = Y\)” (8, following Aristotle: “Metaphor consists in giving the thing a name that belongs to something else… ,” Poetics , 1457b). The line between metaphor and analogy in science is blurry. Some interlocutors synonymize them (e.g., Hoffman 1980; Brown 2003), others reduce one to the other (analogy is a form of metaphor, Geary 2011; metaphor is a kind of analogy, Gentner 1982, 2003), and yet others bracket one to focus on the other (e.g., Oppenheimer 1956 sets aside metaphor). One way to distinguish them is to reserve “analogy” for concrete comparisons, with clearly identifiable and demarcated source and target domains, and with specific histories, and use “metaphor” for much broader and indeterminate comparisons, with diffuse trajectories across discourses. Analogies include the “lines of force” of electricity and magnetism (Maxwell and Faraday), the atom as a planetary system (Rutherford and Bohr), the benzene ring as a snake biting its own tail (Kekulé), Darwin’s “natural selection” and “entangled bank,” and behavioral “drives” (Tinbergen) (e.g., Hesse 1966, 1967; Bartha 2010). Examples of metaphor are genetic information, superorganism, and networks (e.g., Keller 1995). More could be said about other informal model components, but this discussion of metaphors and analogies shall suffice to hint at how models do not merely lie between theory and world. Models express a rich internal pluralism (see also de Chadarevian and Hopwood 2004; Morgan 2012).

Model complexity can also be seen in the external plurality of models (thesis #2). Not all models are mathematical, or even ideally recast as mathematical. Non-formalized (i.e., non–state-space, non-set-/model-theoretic) models such as physical, diagrammatic, material, historical, “remnant,” and fictional models are ubiquitous across the sciences (e.g., Frigg and Hartmann 2012; for the biological sciences, see Hull 1975; Beatty 1980; Griesemer 1990, 1991 a, b, 2013; Downes 1992; Richards 1992; Winther 2006a; Leonelli 2008; Weisberg 2013). Moreover, computer simulations differ in important respects from more standard analytical mathematical models (e.g., Smith 1996; Winsberg 2010; Weisberg 2013). According to some (e.g., Griesemer 2013; Downes 1992; Godfrey-Smith 2006; Thomson-Jones 2012), this diversity belies claims by semanticists that models can always be cast “into set theoretic terms” (Lloyd 2013 In Press), are “always a mathematical structure” (van Fraassen 1970, 327), or that “formalisation of a theory is an abstract representation of the theory expressed in a formal deductive framework… in first-order predicate logic with identity, in set theory, in matrix algebra and indeed, any branch of mathematics...” (Thompson 2007, 485–6). Even so, internal pluralism has been interpreted as supporting a “deflationary semantic view,” which is minimally committed to the perspective that “model construction is an important part of scientific theorizing” (Downes 1992, 151). Given the formal and mathematical framework of the Semantic View (see above), however, the broad plurality of kinds of models seems to properly belong under a Pragmatic View of Models.

4.1.2 The Pragmatic View of Theories

Interestingly, while critiquing the Syntactic and Semantic Views on most matters, the Pragmatic View of Models construed theory, the process of theorizing, and the structure of scientific theories according to terms set by the two earlier views. For instance, Cartwright tends to conceive of theory as explicit, mathematical, abstract, and so forth (see the first paragraph of Section 4). She always resisted “the traditional syntactic/semantic view of theory” for its “vending machine” view, in which a theory is a deductive and automated machine that upon receiving empirical input “gurgitates” and then “drops out the sought-for representation” (1999a, 184–5). Rather than reform Syntactic and Semantic accounts of theory and theory structure, however, she invites us, as we just saw, to think of science as modeling, “with theory as one small component” (Cartwright, Shomar, and Suárez 1995, 138; Suárez and Cartwright 2008). Many have followed her.

Kitcher’s predilection is also to accept the terms of the Syntactic and Semantic Views. For instance, he defines theories as “axiomatic deductive systems” (1993, 93). In a strategy complementary to Cartwright’s modeling turn, Kitcher encourages us to focus on practice, including practices of modeling and even practices of theorizing. In The Advancement of Science, practice is analyzed as a 7-tuple, with the following highly abbreviated components: (i) a language; (ii) questions; (iii) statements (pictures, diagrams); (iv) explanatory patterns; (v) standard examples; (vi) paradigms of experimentation and observation, plus instruments and tools; and (vii) methodology (Kitcher 1993, 74).
Scientific practice is also center stage for those singing the praises of “the experimental life” (e.g., Hacking 1983; Shapin and Schaffer 1985; Galison 1987), and those highlighting the cognitive grounds of science (e.g., Giere 1988; Martínez 2014) and science’s social and normative context (e.g., Kitcher 1993, 2001; Longino 1995, 2002; Ziman 2000; cf. Simon 1957). Indeed, the modeling and practice turns in the philosophy of science were reasonable reactions to the power of axiomatic reconstructive and mathematical modeling analyses of the structure of scientific theories.

Yet, a Pragmatic View of Theories is also afoot, one resisting orthodox characterizations of theory often embraced, at least early on, by Pragmatic View philosophers such as Cartwright, Hacking, Kitcher, and Longino. For instance, Craver (2002) accepts both the Syntactic and Semantic Views, which he humorously and not inaccurately calls “the Once Received View” and the “Model Model View.” But he also observes:

While these analyses have advanced our understanding of some formal aspects of theories and their uses, they have neglected or obscured those aspects dependent upon nonformal patterns in theories. Progress can be made in understanding scientific theories by attending to their diverse nonformal patterns and by identifying the axes along which such patterns might differ from one another. (55)

Craver then turns to mechanistic theory as a third theory type (and a third philosophical analysis of theory structure) that highlights nonformal patterns:

Different types of mechanisms can be distinguished on the basis of recurrent patterns in their organization. Mechanisms may be organized in series, in parallel, or in cycles. They may contain branches and joins, and they often include feedback and feedforward subcomponents. (71)

Consistent with theses #2 and #3 of the Pragmatic View, we must recognize the internal pluralism of theories as including nonformal components. Some of these are used to represent organizational and compositional relations of complex systems (Craver 2007; Wimsatt 2007; Winther 2011; Walsh 2015). While mechanistic analyses such as Craver’s may not wish to follow every aspect of the Pragmatic View of Theories, there are important and deep resonances between the two.

In a review of da Costa and French (2003), Contessa (2006) writes:

Philosophers of science are increasingly realizing that the differences between the syntactic and the semantic view are less significant than semanticists would have it and that, ultimately, neither is a suitable framework within which to think about scientific theories and models. The crucial divide in philosophy of science, I think, is not the one between advocates of the syntactic view and advocates of the semantic view, but the one between those who think that philosophy of science needs a formal framework or other and those who think otherwise. (376)

Again, we are invited to develop a non-formal framework of science and presumably also of scientific theory. (Halvorson 2012, 203 takes Contessa 2006 to task for advocating “informal philosophy of science.”) Moreover, in asking “what should the content of a given theory be taken to be on a given occasion?”, Vickers (2009) answers:

It seems clear that, in addition to theories being vague objects in the way that ‘heaps’ of sand are, there will be fundamentally different ways to put together theoretical assumptions depending on the particular investigation one is undertaking. For example, sometimes it will be more appropriate to focus on the assumptions which were used by scientists, rather than the ones that were believed to be true. (247, footnote suppressed)

A Pragmatic View of Theories helps make explicit nonformal internal components of theory structure.

Key early defenders of the modeling and practice turns have also recently begun to envision theory in a way distinct from the terms set by the Syntactic and Semantic Views. Suárez and Cartwright (2008) extend and distribute theory by arguing that “What we know ‘theoretically’ is recorded in a vast number of places in a vast number of different ways—not just in words and formulae but in machines, techniques, experiments and applications as well” (79). And while her influence lies primarily in the modeling turn, even in characterizing the “vending machine” view, Cartwright calls for a “reasonable philosophical account of theories” that is “much more textured, and… much more laborious” than that adopted by the Syntactic and Semantic Views (1999a, 185). The theory-data and theory-world axes need to be rethought. In her 2019 book on “artful modeling”, Cartwright emphasizes the importance of know-how and creativity in scientific practice, and “praise[s] engineers and cooks and inventors, as well as experimental physicists like Millikan and Melissa Franklin” (Cartwright 2019, 76). Kitcher wishes to transform talk of theories into discussion of “significance graphs” (2001, 78 ff.). These are network diagrams illustrating which (and how) questions are considered significant in the context of particular scientific communities and norms (cf. Brown 2010). Consistently with a Pragmatic View of Theories, Morrison (2007) reconsiders and reforms canonical conceptualizations of “theory.” Finally, Longino (2013) proposes an archaeology of assumptions behind and under different research programs and theories of human behavior such as neurobiological, molecular behavioral genetic, and social-environmental approaches (e.g., Oyama 2000). For instance, two shared or recurring assumptions across programs and theories are:

(1) that the approach in question has methods of measuring both the behavioral outcome that is the object of investigation and the factors whose association with it are the topic of investigation and (2) that the resulting measurements are exportable beyond the confines of the approach within which they are made. (Longino 2013, 117)

A Pragmatic View of Theories expands the notion of theory to include nonformal aspects, which surely must include elements from Boumans’ list above (e.g., metaphors, analogies, policy views), as well as more standard components such as ontological assumptions (e.g., Kuhn 1970; Levins and Lewontin 1985; Winther 2006b), natural kinds (e.g., Hacking 2007b), and conditions of application or scope (e.g., Longino 2013).

In addition to exploring internal theory diversity and in parallel with plurality of modeling, a Pragmatic View of Theories could also explore pluralism of modes of theorizing, and of philosophically analyzing theoretical structure (thesis #2). Craver (2002) provides a start in this direction in that he accepts three kinds of scientific theory and of philosophical analysis of scientific theory. A more synoptic view of the broader pragmatic context in which theories are embedded can be found in the literature on different “styles” of scientific reasoning and theorizing (e.g., Crombie 1994, 1996; Vicedo 1995; Pickstone 2000; Davidson 2001; Hacking 2002, 2009; Winther 2012b; Elwick 2007; Mancosu 2010). While there is no univocal or dominant classification of styles, two lessons are important. First, a rough consensus exists that theoretical investigations of especially historical, mechanistic, and mathematical structures and relations will involve different styles. Second, each style integrates theoretical products and theorizing processes in unique ways, thus inviting an irreducible pragmatic methodological pluralism in our philosophical analysis of the structure of scientific theories. For instance, the structure of theories of mechanisms in molecular biology or neuroscience involves flow charts, and is distinct from the structure of theories of historical processes and patterns as found in systematics and phylogenetics, which involves phylogenetic trees. As Crombie suggests, we need a “comparative historical anthropology of thinking.” (1996, 71; see Hacking 2009) Mathematical theory hardly remains regnant. It gives way to a pluralism of theory forms and theory processes. Indeed, even mathematical theorizing is a pluralistic motley, as Hacking (2014) argues. 
Although a “deflationary” Semantic View could account for pluralism of theory forms, the Pragmatic View of Theories, drawing on styles, is required to do justice to the immense variety of theorizing processes, and of philosophical accounts of theory and theory structure.

Finally, outstanding work remains in sorting out the philosophical utility of a variety of proposed units in addition to styles, such as Kuhn’s (1970) paradigms, Lakatos’ (1980) research programmes, Laudan’s (1977) research traditions, and Holton’s (1988) themata. A rational comparative historical anthropology of both theorizing and philosophical analyses of theorizing remains mostly unmapped (cf. Matheson and Dallmann 2014). Such a comparative meta-philosophical analysis should also address Davidson’s (1974) worries about “conceptual schemes” and Popper’s (1996 [1976]) critique of “the myth of the framework” (see Hacking 2002; Godfrey-Smith 2003).

Cartwright has done much to develop a Pragmatic View. Start by considering Newton’s second law:

\[F = ma\]

Here \(F\) is the resultant force on a mass \(m\), and \(a\) is the net acceleration of \(m\); both \(F\) and \(a\) are vectors. This law is considered a “general” (Cartwright 1999a, 187) law expressed with “abstract quantities” (Cartwright 1999b, 249). Newton’s second law can be complemented with other laws, such as (i) Hooke’s law for an ideal spring:

\[F = -kx\]

Here \(k\) is the force constant of the spring, and \(x\) the distance along the x-axis from the equilibrium position, and (ii) Coulomb’s law modeling the force between two charged particles:

\[F = K\frac{qq'}{r^2}\]

Here \(K\) is Coulomb’s electrical constant, \(q\) and \(q'\) are the charges of the two objects, and \(r\) the distance between the two objects. The picture Cartwright draws for us is that Newton’s, Hooke’s, and Coulomb’s laws are abstract, leaving out many details. They can be used to derive mathematical models of concrete systems. For instance, by combining (1) with the law of gravitation (a “fundamental” law, Cartwright 1983, 58–59), other source laws, and various simplifying assumptions, we might create a model for the orbit of Mars, treating the Sun and Mars as a 2-body system, ignoring the other planets, asteroids, and Mars’ moons. Indeed, the Solar System is a powerful “nomological machine” (Cartwright 1999a, 50–53), which “is a fixed (enough) arrangement of components, or factors, with stable (enough) capacities that in the right sort of stable (enough) environment will, with repeated operation, give rise to the kind of regular behaviour that we represent in our scientific laws” (Cartwright 1999a, 50). Importantly, most natural systems are complex and irregular, and cannot be neatly characterized as nomological machines. For these cases, abstract laws “run out” (Cartwright 1983) and are rarely smoothly “deidealised” (Suárez 1999). In general, abstract laws predict and explain only within a given domain of application, and only under ideal conditions. More concrete laws or models are not directly deduced from them (e.g., Suárez 1999, Suárez and Cartwright 2008), and they can rarely be combined to form effective “super-laws” (Cartwright 1983, 70–73). In short, the move from (1) and (2) or from (1) and (3) to appropriate phenomenological models is not fully specified by either abstract law pairing.
Indeed, Cartwright developed her notion of “capacities” to discuss how “the principles of physics” “are far better rendered as claims about capacities, capacities that can be assembled and reassembled in different nomological machines, unending in their variety, to give rise to different laws” (1999a, 52). Articulating concrete models requires integrating a mix of mathematical and nonformal components. Laws (1), (2), and (3) remain only one component, among many, of the models useful for, e.g., exploring the behavior of the Solar System, balls on a pool table, or the behavior of charges in electrical fields.
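The move Cartwright describes, from abstract laws to a concrete phenomenological model, can be sketched for the simplest textbook case. Combining (1) and (2) for an idealized point mass on a frictionless, perfectly linear spring yields the harmonic oscillator model:

\[ma = -kx \quad\Longrightarrow\quad m\frac{d^{2}x}{dt^{2}} + kx = 0, \qquad x(t) = A\cos(\omega t + \phi), \quad \omega = \sqrt{k/m}.\]

Even this elementary derivation imports assumptions that the abstract laws do not themselves supply (motion in one dimension, no friction, an ideal spring, and initial conditions fixing the amplitude \(A\) and phase \(\phi\)), which illustrates why concrete models are not simply deduced from abstract law pairings.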

Shifting examples but not philosophical research program, Suárez and Cartwright (2008) explains how analogies such as superconductors as diamagnets (as opposed to ferromagnets) were an integral part of the mathematical model of superconductivity developed by Fritz and Heinz London in the 1930s (63; cf. London and London 1935). Suárez and Cartwright gladly accept that this model “is uncontroversially grounded in classic electromagnetic theory” (64). However, contra Semantic View Structuralists such as Bueno, da Costa, French, and Ladyman, they view nonformal aspects as essential to practices of scientific modeling and theorizing: “The analogy [of diamagnets] helps us to understand how the Londons work with their model… which assumptions they add and which not… a formal reconstruction of the model on its own cannot help us to understand that” (69). In short, the running example of Newtonian mechanics, in conjunction with a glimpse into the use of analogies in mathematical modeling, illustrates the Pragmatic View’s account of theory syntax: theory is constituted by a plurality of formal and informal components.

As we have explored throughout this section, models and theories have informal internal components, and there are distinct modes of modeling and theorizing. Because of the Pragmatic View’s attention to practice, function, and application, distinguishing structure from interpretation is more difficult here than under the Syntactic and Semantic Views. Any synchronic analysis of the structure of models and theories must respect intentional diachronic processes of interpreting and using, as we shall now see.

Regarding the import of function in models and theories (thesis #4), already the Belgian philosopher of science Apostel defined modeling thus: “Let then \(R(S,P,M,T)\) indicate the main variables of the modelling relationship. The subject \(S\) takes, in view of the purpose \(P\), the entity \(M\) as a model for the prototype \(T\)” (1960, 128, see also Apostel 1970). Purposes took center-stage in his article title: “Towards the Formal Study of Models in the Non-Formal Sciences.” MIT Artificial Intelligence trailblazer Minsky also provided a pragmatic analysis:

We use the term “model” in the following sense: To an observer \(B\), an object \(A^*\) is a model of an object \(A\) to the extent that \(B\) can use \(A^*\) to answer questions that interest him about \(A\). The model relation is inherently ternary. Any attempt to suppress the role of the intentions of the investigator \(B\) leads to circular definitions or to ambiguities about “essential features” and the like. (1965, 45)

This account is thoroughly intentionalist and anti-essentialist. That is, mapping relations between model and world are left open and overdetermined. Specifying the relevant relations depends on contextual factors such as questions asked, and the kinds of similarities and isomorphisms deemed to be of interest. The appropriate relations are selected from an infinite (or, at least, near-infinite) variety of possible relations (e.g., Rosenblueth and Wiener 1945; Lowry 1965).

Regarding practice (thesis #5), in addition to ample work on the experimental life mentioned above, consider a small example. A full understanding of the content and structure of the London brothers’ model of superconductivity requires attention to informal aspects such as analogies. Even London and London (1935) state in the summary of their paper that “the current [”in a supraconductor“] is characterized as a kind of diamagnetic volume current” (88). They too saw the diamagnetic analogy as central to their theoretical practices. Criteria and practices of theory confirmation also differ from the ones typical of the Syntactic and Semantic Views. While predictive and explanatory power as well as empirical adequacy remain important, the Pragmatic View also insists on a variety of other justificatory criteria, including pragmatic virtues (sensu Kuhn 1977; Longino 1995) such as fruitfulness and utility. In a nutshell, the Pragmatic View argues that scientific theory structure is deeply shaped and constrained by functions and practices, and that theory can be interpreted and applied validly according to many different criteria.

The analytical framework of the Pragmatic View remains under construction. The emphasis is on internal diversity, and on the external pluralism of models and theories, of modeling and theorizing, and of philosophical analyses of scientific theories. The Pragmatic View acknowledges that scientists use and need different kinds of theories for a variety of purposes. There is no one-size-fits-all structure of scientific theories. Notably, although the Pragmatic View does not necessarily endorse the views of the tradition of American Pragmatism, it has important resonances with the latter school’s emphasis on truth and knowledge as processual, purposive, pluralist, and context-dependent, and on the social and cognitive structure of scientific inquiry.

A further qualification in addition to the one above regarding American Pragmatism is in order. The Pragmatic View has important precursors in the historicist or “world view” perspectives of Feyerabend, Hanson, Kuhn, and Toulmin, which were an influential set of critiques of the Syntactic View utterly distinct from the Semantic View. This philosophical tradition focused on themes such as meaning change and incommensurability of terms across world views (e.g., paradigms), scientific change (e.g., revolutionary: Kuhn 1970; evolutionary: Toulmin 1972), the interweaving of context of discovery and context of justification, and scientific rationality (Preston 2012; Bird 2013; Swoyer 2014). The historicists also opposed the idea that theories can secure meaning and empirical support from a theory-neutral and purely observational source, as the Syntactic View had insisted on with its strong distinction between theoretical and observational vocabularies (cf. Galison 1988). Kuhn’s paradigms or, more precisely, “disciplinary matrices” even had an internal anatomy with four components: (i) laws or symbolic generalizations, (ii) ontological assumptions, (iii) values, and (iv) exemplars (Kuhn 1970, postscript; Godfrey-Smith 2003; Hacking 2012). This work was concerned more with theory change than with theory structure and had fewer conceptual resources from sociology of science and history of science than contemporary Pragmatic View work. Moreover, paradigms never quite caught on the way analyses of models and modeling have. Even so, this work did much to convince later scholars, including many of the Pragmatic View, of certain weaknesses in understanding theories as deductive axiomatic structures.

As a final way to contrast the three views, we return to population genetics and, especially, to the Hardy-Weinberg Principle (HWP). Both Woodger (1937, 1959) and Williams (1970, 1973) provide detailed axiomatizations of certain parts of biology, especially genetics, developmental biology, and phylogenetics. For instance, Woodger (1937) constructs an axiomatic system based on ten logical predicates or relations, including \(\bP\) (part of), \(\bT\) (before in time), \(\bU\) (reproduced by cell division or cell fusion), \(\bm\) (male gamete), \(\bff\) (female gamete), and \(\bgenet\) (genetic property) (cf. Nicholson and Gawne 2014). Woodger (1959) elaborates these logical predicates or relations to produce a careful reconstruction of Mendelian genetics. Here are two axioms in his system (which are rewritten in contemporary notation, since Woodger used Russell and Whitehead’s Principia Mathematica notation):

\[\neg \exists x\, (\bm x \wedge \bff x)\]

\[\forall z\, \exists! x\, \exists! y\, DLZxyz\]

The first axiom should be read thus: “no gamete is both male and female” (1959, 416). In the second axiom, given that \(DLZxyz\) is a primitive relation defined as “\(x\) is a zygote which develops in the environment \(y\) into the life \(z\)” (1959, 415), the translation is “every life develops in one and only one environment from one and only one zygote” (416). Woodger claims that “the whole of Mendel’s work can be expressed…” via this axiomatic system. Woodger briefly mentions that if one assumes that the entire system or population is random with respect to gamete fusions, “then the Pearson-Hardy law is derivable” (1959, 427). This was a reference to HWP. In her explorations of various axiomatizations of Darwinian lineages and “subclans,” and the process of the “expansion of the fitter,” Williams (1970, 1973) also carefully defines concepts, and axiomatizes basic biological principles of reproduction, natural selection, fitness, and so forth. However, she does not address HWP. Of interest is the lack of axiomatization of HWP or other mathematical principles of population genetics in Woodger’s and Williams’ work. Were such principles considered secondary or uninteresting by Woodger and Williams? Might Woodger’s and Williams’ respective axiomatic systems simply lack the power and conceptual resources to axiomatically reconstruct a mathematical edifice actually cast in terms of probability theory? Finally, other friends of the Syntactic View, such as the early Michael Ruse, do not provide an axiomatization of HWP (Ruse 1975, 241).

Proponents of the Semantic View claim that their perspective on scientific theory accurately portrays the theoretical structure of population genetics. Thompson (2007) provides both set-theoretical and state-space renditions of Mendelian genetics. The first involves defining a set-theoretic predicate for the system, viz., \(\{P, A, f, g\}\), where \(P\) and \(A\) are sets representing, respectively, the total collection of alleles and loci in the population, while \(f\) and \(g\) are functions assigning an allele to a specific location in, respectively, the diploid cells of an individual or the haploid gametic cells. Axioms in this set-theoretic formalization include “The sets \(P\) and \(A\) are finite and non empty” (2007, 498). In contrast, the state-space approach of the Semantic View articulates a phase space with each dimension representing allelic (or genotypic) frequencies (e.g., cover and Chapter 3 of Lloyd 1994 [1988]). As an example, “for population genetic theory, a central law of succession is the Hardy-Weinberg law” (Thompson 2007, 499). Mathematically, the diploid version of HWP is written thus:

\[(p_{\text{par}} + q_{\text{par}})^2 = p_{\text{off}}^2 + 2p_{\text{off}}q_{\text{off}} + q_{\text{off}}^2 = 1\]

Here \(p\) and \(q\) are the frequencies of two distinct alleles at a biallelic locus. The left-hand side represents the allele frequencies in the parental generation and a random mating pattern, while the right-hand side captures genotype frequencies in the offspring generation, as predicted from the parental generation. This is a null theoretical model—actual genotypic and allelic frequencies of the offspring generation often deviate from predicted frequencies (e.g., a lethal homozygote recessive would make the \(q^2_{\text{off}}\) term = 0). Indeed, HWP holds strictly only in abstracted and idealized populations with very specific properties (e.g., infinitely large, individuals reproduce randomly) and only when there are no evolutionary forces operating in the population (e.g., no selection, mutation, migration, or drift) (e.g., Hartl and Clark 1989; Winther et al. 2015). HWP is useful also in the way it interacts with laws of succession for selection, mutation, and so forth (e.g., Okasha 2012). This powerful population genetic principle is central to Semantic View analyses of the mathematical articulation of the theoretical structure of population genetics (see also Lorenzano 2014, Ginnobili 2016).
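The arithmetic of HWP can be checked numerically. The following minimal sketch (the function name is ours, purely for illustration, not from any population-genetics library) computes the expected offspring genotype frequencies from a parental allele frequency and confirms that the three genotype classes exhaust the population:

```python
# Minimal numerical sketch of the diploid Hardy-Weinberg Principle (HWP).
# Given the frequency p of one allele at a biallelic locus (q = 1 - p),
# HWP predicts offspring genotype frequencies p^2, 2pq, q^2.
# The function name is illustrative, not from any standard library.

def hwp_genotype_frequencies(p: float) -> tuple[float, float, float]:
    """Expected genotype frequencies (AA, Aa, aa) under HWP."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

freqs = hwp_genotype_frequencies(0.7)
print(freqs)  # approximately (0.49, 0.42, 0.09)

# The three genotype frequencies always sum to 1, since (p + q)^2 = 1.
assert abs(sum(freqs) - 1.0) < 1e-12
```

A lethal homozygote recessive, as mentioned above, would zero out the \(q^2\) term in the observed offspring generation, which is precisely a departure from this null model.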

Recall that the Pragmatic View highlights the internal and external pluralism—as well as the purposiveness—of model and theory structure. Consider recent uses of population genetic theory to specify the kinds and amounts of population structure existing in Homo sapiens . In particular, different measures and mathematical modeling methodologies are employed in investigating human genomic diversity (e.g., Jobling et al. 2004; Barbujani et al. 2013; Kaplan and Winther 2013). It is possible to distinguish at least two different research projects, each of which has a unique pragmatic content (e.g., aims, values, and methods). Diversity partitioning assesses genetic variation within and among pre-determined groups using Analysis of Variance (also crucial to estimating heritability, Downes 2014). Clustering analysis uses Bayesian modeling techniques to simultaneously produce clusters and assign individuals to these “unsupervised” cluster classifications. The robust result of the first modeling project is that (approximately) 85% of all genetic variance is found within human subpopulations (e.g., Han Chinese or Sami), 10% across subpopulations within a continental region, and only 5% is found across continents (i.e., “African,” “Asian,” and “European” – Lewontin 1972, 1974). (Recall also that we are all already identical at, on average, 999 out of 1000 nucleotides.) To calculate diversity partitions at these three nested levels, Lewontin (1972) used a Shannon information-theoretic measure closely related to Sewall Wright’s \(F\)-statistic:

\[F_{ST} = \frac{H_T - \bar{H}_S}{H_T}\]

Here \(H_T\) is the total heterozygosity of the population assessed, and \(\bar{H}_S\) is the heterozygosity of each subpopulation (group) of the relevant population, averaged across all the subpopulations. \(F_{ST}\) is bounded by 0 and 1, and is a measure of population structure, with higher \(F_{ST}\) values suggesting more structure, viz., more group differentiation. HWP appears implicitly in both \(H_T\) and \(\bar{H}_S\), which take heterozygosity (\(2pq\)) to be equal to the expected proportion of heterozygotes under HWP rather than the actual frequency of heterozygotes. \(H_T\) is computed by using the grand population average of \(p\) and \(q\), whereas calculating \(\bar{H}_S\) involves averaging across the expected heterozygosities of each subpopulation. If random mating occurs—and thus HWP applies—across the entire population without respecting subpopulation borders, then \(H_T\) and \(\bar{H}_S\) will be equal (i.e., \(p\) of the total population and of each individual subpopulation will be the same; likewise for \(q\)). If, instead, HWP applies only within subpopulations but not across the population as a whole, then \(\bar{H}_S\) will be smaller than \(H_T\), and \(F_{ST}\) will be positive (i.e., there will be “excess homozygosity” across subpopulations, which is known as the “Wahlund Principle” in population genetics). This is one way among many to deploy the population-genetic principle of HWP. Thus, the Lewontin-style diversity partitioning result that only roughly 5% of the total genetic variance is among races is equivalent to saying that \(F_{ST}\) across the big three continental populations in Lewontin’s three-level model is 0.05 (e.g., Barbujani et al. 1997). The basic philosophical tendency is to associate the diversity partitioning research project’s (approximately) 85%-10%-5% result with an anti-realist interpretation of biological race.
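The \(F_{ST}\) calculation just described can be reproduced in a few lines. In this sketch (the function name and input allele frequencies are invented for illustration), heterozygosities are the HWP expectations \(2pq\); identical subpopulations yield \(F_{ST} = 0\), while divergent ones exhibit the excess homozygosity of the Wahlund Principle:

```python
# Toy computation of Wright's F_ST for equally sized subpopulations,
# using HWP expected heterozygosity 2pq rather than observed heterozygote
# counts. Function name and input frequencies are invented for illustration.

def fst(subpop_p: list[float]) -> float:
    """F_ST = (H_T - mean(H_S)) / H_T from subpopulation allele frequencies."""
    p_bar = sum(subpop_p) / len(subpop_p)  # grand-average allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)      # total expected heterozygosity H_T
    h_s = sum(2.0 * p * (1.0 - p) for p in subpop_p) / len(subpop_p)  # mean within-group H_S
    return (h_t - h_s) / h_t

print(fst([0.5, 0.5]))  # identical subpopulations, no structure: 0.0
print(fst([0.2, 0.8]))  # divergent subpopulations: approximately 0.36
```

Lewontin’s three-level result then amounts to the claim that \(F_{ST}\) computed across the continental groupings is roughly 0.05.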

In contrast, clustering analysis (e.g., Pritchard et al. 2000; Rosenberg et al. 2002; cf. Edwards 2003) can be readily performed even with the small amount of among-continent genetic variance in Homo sapiens . For instance, when the Bayesian modeling computer program STRUCTURE is asked to produce 5 clusters, continental “races” appear—African, Amerindian, Asian, European, and Pacific Islanders. Interestingly, this modeling technique is also intimately linked to HWP: “Our main modeling assumptions are Hardy-Weinberg equilibrium within populations and complete linkage equilibrium between loci within populations” (Pritchard et al. 2000, 946). That is, for a cluster to eventually be robust in the modeling runs, it should meet HWP expectations. Clustering analysis has sometimes been interpreted as a justification for a realist stance towards biological race (see discussions in Hochman 2013; Winther and Kaplan 2013; Edge and Rosenberg 2015; Spencer 2015).
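The role HWP plays in model-based clustering can be seen in a deliberately minimal toy. This is emphatically not the STRUCTURE algorithm (which is Bayesian and handles many loci, admixture, and uncertainty); the cluster names and allele frequencies below are invented. The sketch only illustrates the underlying idea: each candidate cluster is summarized by an allele frequency, HWP converts that frequency into expected genotype frequencies, and an individual is assigned to the cluster under which its genotype is most likely.

```python
# Toy illustration of HWP inside model-based clustering (NOT the STRUCTURE
# algorithm; cluster names and allele frequencies are invented).
# Each cluster is summarized by the frequency p of allele A at one locus;
# HWP turns p into expected genotype frequencies, which serve as likelihoods.

def hwp_likelihood(genotype: int, p: float) -> float:
    """Likelihood of carrying 0, 1, or 2 copies of allele A under HWP."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[genotype]

def assign(genotype: int, clusters: dict[str, float]) -> str:
    """Assign an individual to the cluster maximizing its HWP likelihood."""
    return max(clusters, key=lambda name: hwp_likelihood(genotype, clusters[name]))

clusters = {"cluster_1": 0.9, "cluster_2": 0.1}  # invented frequencies of allele A
print(assign(2, clusters))  # AA homozygote is most likely under cluster_1
print(assign(0, clusters))  # aa homozygote is most likely under cluster_2
```

In the real model, likelihoods multiply across many loci (the quoted linkage equilibrium assumption), and the clusters’ allele frequencies are themselves inferred rather than given.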

This example of the mathematical modeling of human genomic diversity teaches that basic and simple formal components can be used in different ways to develop and apply theory, both inside and outside of science. In contrast to the Syntactic and Semantic Views, the Pragmatic View foregrounds tensions vis-à-vis ontological assumptions and political consequences regarding the existence (or not) of biological race between diversity partitioning (Lewontin 1972) and clustering analysis (Pritchard et al. 2000) research packages. These ontological ruptures can be identified despite the fact that both research projects assess population structure by examining departures from HWP (i.e., they measure excess homozygosity), and are completely consistent (e.g., Winther 2014; Ludwig 2015; Edge and Rosenberg 2015).

This exploration of how the three views on the structure of scientific theory address population genetics, and in particular HWP, invites a certain meta-pluralism. That is, the Syntactic View carefully breaks down fundamental concepts and principles in genetics and population genetics, articulating definitions and relations among terms. The Semantic View insightfully decomposes and interweaves the complex mathematical edifice of population genetics. The Pragmatic View sheds light on modeling choices and on distinct interpretations and applications of the same theory or model, both within and without science. The three perspectives are hardly mutually exclusive. (N.B., the two running examples concern theory structure in Newtonian mechanics and population genetics, independently considered. While interesting, debates about “evolutionary forces” are beyond the scope of the current entry; see, e.g., Hitchcock and Velasco 2014.)

The structure of scientific theories is a rich topic. Theorizing and modeling are core activities across the sciences, whether old (e.g., relativity theory, evolutionary theory) or new (e.g., climate modeling, cognitive science, and systems biology). Furthermore, theory remains essential to developing multipurpose tools such as statistical models and procedures (e.g., Bayesian models for data analysis, agent-based models for simulation, network theory for systems analysis). Given the strength and relevance of theory and theorizing to the natural sciences, and even to the social sciences (e.g., microeconomics, physical, if not cultural, anthropology), philosophical attention to the structure of scientific theories could and should increase. This piece has focused on a comparison of three major perspectives: Syntactic View, Semantic View, and Pragmatic View. In order to handle these complex debates effectively, we have sidestepped certain key philosophical questions, including questions about scientific realism; scientific explanation and prediction; theoretical and ontological reductionism; knowledge-production and epistemic inference; the distinction between science and technology; and the relationship between science and society. Each of these topics bears further philosophical investigation in light of the three perspectives here explored.

A table helps summarize general aspects of the three views’ analyses of the structure of scientific theories:

Table 2. General aspects of each view’s analysis of the structure of scientific theories.

The Syntactic, Semantic, and Pragmatic views are often taken to be mutually exclusive and, thus, to be in competition with one another. They indeed make distinct claims about the anatomy of scientific theories. But one can also imagine them to be complementary, focusing on different aspects and questions of the structure of scientific theories and the process of scientific theorizing. For instance, in exploring nonformal and implicit components of theory, the Pragmatic View accepts that scientific theories often include mathematical parts, but tends to be less interested in these components. Moreover, there is overlap in questions—e.g., Syntactic and Semantic Views share an interest in formalizing theory; the Semantic and Pragmatic Views both exhibit concern for scientific practice.

How are these three views ultimately related? A standard philosophical move is to generalize and abstract, understanding a situation from a higher level. One “meta” hypothesis is that a given philosophical analysis of theory structure tends to be associated with a perceived relationship among the three views here discussed. The Syntactic View is inclined to interpret the Semantic View’s formal machinery as continuous with its own generalizing axiomatic strategy, and hence diagnoses many standard Semantic View critiques (Section 3) as missing their mark (the strategy of identity ; e.g., Friedman 1982; Worrall 1984; Halvorson 2012, 2013, 2019; Lutz 2012, 2017; cf. Chakravartty 2001). The Semantic View explicitly contrasts its characterization of theory structure with the “linguistic” or “metamathematical” apparatus of the Syntactic View (the strategy of combat ; e.g., Suppe 1977; van Fraassen 1980, 1989; Lloyd 1994 [1988]). Finally, the Pragmatic View, which did not exist as a perspective until relatively recently, imagines theory as pluralistic and can thus ground a holistic philosophical investigation. It envisions a meta-pluralism in which reconstructive axiomatization and mathematical modeling remain important, though not necessary for all theories. This third view endorses a panoply of theoretical structures and theorizing styles, negotiating continuity both between theorizing and “the experimental life,” and among philosophical analyses of the structure of scientific theories (the strategy of complementarity ; e.g., Hacking 1983, 2009; Galison 1988, 1997; Craver 2002; Suárez and Cartwright 2008; Griesemer 2013). 
Interestingly, Suárez and Pero (2019) explicitly concur with the Pragmatic View as described in this article, but believe that “the semantic conception in its bare minimal expression” is compatible with, if not sufficient for, capturing “pragmatic elements and themes involved in a more flexible and open-ended approach to scientific theory” (Suárez and Pero 2019, 348). By design, the ecumenical meta-pluralism sanctioned by the Pragmatic View does not completely offset identity and combat strategies. Moreover, only “partial acceptance” of the respective views may ultimately be possible. Even so, the complementarity strategy might be worth developing further. Compared to identity and combat meta-perspectives, it provides broader—or at least different—insights into the structure of scientific theories. More generally, exploring the relations among these views is itself a rich topic for future philosophical work, as is investigating their role in, and interpretation of, active scientific fields ripe for further philosophical analysis such as climate change (e.g., Winsberg 2018), model organisms (e.g., Ankeny and Leonelli 2020), and cartography and GIS (e.g., Winther 2020).

  • Ankeny, R. and S. Leonelli, 2020, Model Organisms , Cambridge: Cambridge University Press.
  • Apostel, L., 1960, “Towards the Formal Study of Models in the Non-Formal Sciences,” Synthese , 12 (23): 125–161.
  • –––, 1970, “The Justification of Formalisation,” Quality and Quantity , 4 (1): 3–38.
  • Awodey, S., 2006, Category Theory , Oxford: Oxford University Press.
  • Bailer-Jones, D.M., 2002, “Models, Metaphors and Analogies,” in Blackwell Guide to the Philosophy of Science , P.K. Machamer and M. Silberstein (eds.), Oxford: Blackwell, pp. 108–127.
  • Barbujani, G., S. Ghirotto, and F. Tassi, 2013, “Nine Things to Remember about Human Genome Diversity,” Tissue Antigens , 82 (3): 155–164.
  • Barbujani, G., A. Magagni, E. Minch, and L.L. Cavalli-Sforza, 1997, “An Apportionment of Human DNA Diversity,” Proceedings of the National Academy of Sciences , 94 (9): 4516–4519.
  • Bartha, P.F.A., 2010, By Parallel Reasoning: The Construction and Evaluation of Analogical Arguments , New York: Oxford University Press.
  • Bays, T., 2014, “Skolem’s Paradox”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/paradox-skolem/ >.
  • Beatty, J., 1981, “What’s Wrong with the Received View of Evolutionary Theory?” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980 , (2): 397–426.
  • Bergstrom, C. and L. Dugatkin, 2012, Evolution , New York: Norton.
  • Bird, A., 2013, “Thomas Kuhn”, The Stanford Encyclopedia of Philosophy (Fall 2013 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2013/entries/thomas-kuhn/ >.
  • Blatti, S. and S. Lapointe (eds.), 2016, Ontology After Carnap , Oxford: Oxford University Press.
  • Boumans, M., 1999, “Built-In Justification,” in Models as Mediators: Perspectives on Natural and Social Science , M.S. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 66–96.
  • Braithwaite, R., 1962, “Models in the Empirical Sciences,” in Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress , E. Nagel, P. Suppes, and A. Tarski (eds.), Stanford, CA: Stanford University Press, pp. 224–231.
  • Bridgman, P.W., 1927, The Logic of Modern Physics , New York: Macmillan.
  • Bueno, O., 1997, “Empirical Adequacy: A Partial Structures Approach,” Studies in History and Philosophy of Science (Part A) , 28 (4): 585–610.
  • Bueno, O., S. French, and J. Ladyman, 2012, “Models and Structures: Phenomenological and Partial,” Studies in History and Philosophy of Science (Part B) , 43 (1): 43–46.
  • Brown, T., 2003, Making Truth: Metaphor in Science , Urbana: University of Illinois Press.
  • Brown, M.J., 2010, “Genuine Problems and the Significance of Science,” Contemporary Pragmatism , 7 (2): 131–153.
  • Callender, C., 1999, “Reducing Thermodynamics to Statistical Mechanics: The Case of Entropy,” The Journal of Philosophy , 96 (7): 348–373.
  • Campbell, N.R., 1920, Physics: The Elements , Cambridge: Cambridge University Press.
  • Carnap, R., 1967 [1928], The Logical Structure of the World , translated by R.A. George, Berkeley, CA: University of California Press. Original: Der logische Aufbau der Welt , Leipzig: Felix Meiner.
  • –––, 1932, “Über Protokollsätze”, Erkenntnis , 3: 215–228; transl. by R. Creath and R. Nollan, “On Protocol Sentences,” Noûs , 21 (4) (1987): 457–470.
  • –––, 1936/1937, “Testability and Meaning,” Philosophy of Science , 1936, 3 (4): 419–471; 1937, 4 (1): 1–40.
  • –––, 1937, The Logical Syntax of Language , London: Kegan Paul, Trench, & Trübner.
  • –––, 1939, Foundations of Logic and Mathematics (International Encyclopedia of Unified Science, Volume 1, Number 3), Chicago: University of Chicago Press.
  • –––, 1942, Introduction to Semantics , Cambridge, MA: Harvard University Press.
  • –––, 1952, The Continuum of Inductive Methods , Chicago: University of Chicago Press.
  • –––, 1962 [1950], Logical Foundations of Probability , Chicago: University of Chicago Press, 2nd edition.
  • –––, 1963, “Philosopher Replies,” in The Philosophy of Rudolf Carnap (Library of Living Philosophers, Volume 11), P. Schilpp (ed.), La Salle: Open Court, pp. 889–999.
  • –––, 1966, Philosophical Foundations of Science , New York: Basic Books; repr. as An Introduction to the Philosophy of Science , 1972; repr. New York: Dover, 1996.
  • Cartwright, N., 1983, How the Laws of Physics Lie , New York: Oxford University Press.
  • –––, 1989, Nature’s Capacities and Their Measurement , New York: Oxford University Press.
  • –––, 1999a, The Dappled World: A Study of the Boundaries of Science , Cambridge: Cambridge University Press.
  • –––, 1999b, “Models and the Limits of Theories: Quantum Hamiltonians and the BCS Model of Superconductivity,” in Models as Mediators: Perspectives on Natural and Social Science , M. Morgan and M. Morrison (eds.), (Perspectives on Natural and Social Sciences), Cambridge: Cambridge University Press, pp. 241–281.
  • –––, 2008, “In Praise of the Representation Theorem,” in Representation, Evidence, and Justification: Themes from Suppes , W.K. Essler and M. Frauchiger (eds.), Ontos Verlag, pp. 83–90.
  • –––, 2019, Nature, the Artful Modeler: Lectures on Laws, Science, How Nature Arranges the World and How We Can Arrange It Better , Chicago, IL: Open Court.
  • Cartwright, N., T. Shomar, and M. Suárez, 1995, “The Tool Box of Science: Tools for the Building of Models with a Superconductivity Example,” in Theories and Models in Scientific Processes (Poznan Studies in the Philosophy of the Sciences and the Humanities, Volume 44), W. Herfel, W. Krajewski, I. Niiniluoto, and R. Wojcicki (eds.), Amsterdam: Rodopi, pp. 137–149.
  • Carus, A.W., 2007, Carnap and Twentieth-Century Thought: Explication as Enlightenment , Cambridge: Cambridge University Press.
  • Cat, J., 2014, “The Unity of Science”, The Stanford Encyclopedia of Philosophy (Winter 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/win2014/entries/scientific-unity/ >.
  • Chakravartty, A., 2001, “The Semantic or Model-Theoretic View of Theories and Scientific Realism,” Synthese , 127 (3): 325–345.
  • Chang, H., 2011, “The Philosophical Grammar of Scientific Practice” in International Studies in the Philosophy of Science , 25 (3): 205–221.
  • Clatterbuck, H., E. Sober, and R. Lewontin, 2013, “Selection Never Dominates Drift (Nor Vice Versa),” Biology & Philosophy , 28 (4): 577–592.
  • Coffa, A. J., 1991, The Semantic Tradition From Kant to Carnap: To the Vienna Station , Cambridge: Cambridge University Press.
  • Contessa, G., 2006, “Scientific Models, Partial Structures and the New Received View of Theories,” Studies in History and Philosophy of Science (Part A) , 37 (2): 370–377.
  • Craver, C.F., 2002, “Structures of Scientific Theories,” in Blackwell Guide to the Philosophy of Science , P.K. Machamer and M. Silberstein (eds.), Oxford: Blackwell, pp. 55–79.
  • –––, 2007, Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience , New York: Oxford University Press.
  • Creath, R., 1987, “The Initial Reception of Carnap’s Doctrine of Analyticity,” Noûs , 21 (4): 477–499.
  • –––, 2014, “Logical Empiricism”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/logical-empiricism/ >.
  • Crombie, A.C., 1994, Styles of Scientific Thinking in the European Tradition (Volumes 1–3), London: Duckworth.
  • –––, 1996, “Commitments and Styles of European Scientific Thinking,” Theoria , 11 (25): 65–76.
  • Crow, J. and M. Kimura, 1970, An Introduction to Population Genetics Theory , Edina, MN: Burgess International Group Incorporated.
  • da Costa, N.C.A. and S. French, 1990, “The Model-Theoretic Approach in the Philosophy of Science,” Philosophy of Science , 57 (2): 248–65.
  • –––, 2003, Science and Partial Truth: A Unitary Approach to Models and Scientific Reasoning , Oxford: Oxford University Press.
  • Dalla Chiara Scabia, M.L. and G. Toraldo di Francia, 1973, “A Logical Analysis of Physical Theories,” La Rivista del Nuovo Cimento , 3 (1): 1–20.
  • Davidson, A., 2001, The Emergence of Sexuality: Historical Epistemology and the Formation of Concepts , Cambridge, MA: Harvard University Press.
  • Davidson, D., 1974, “On the Very Idea of a Conceptual Scheme,” Proceedings and Addresses of the American Philosophical Association , 47: 5–20.
  • de Chadarevian, S. and N. Hopwood, 2004, Models: The Third Dimension of Science , Stanford, CA: Stanford University Press.
  • Demopoulos, W., 2003, “On the Rational Reconstruction of our Theoretical Knowledge,” The British Journal for the Philosophy of Science , 54 (3): 371–403.
  • –––, 2013, Logicism and Its Philosophical Legacy , Cambridge: Cambridge University Press.
  • Derman, E., 2011, Models Behaving Badly: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life , New York: Free Press.
  • Dizadji-Bahmani, F., R. Frigg, and S. Hartmann, 2010, “Who’s Afraid of Nagelian Reduction?,” Erkenntnis , 73 (3): 393–412.
  • Döring, A. and R.G. Winther, forthcoming, “The Human Condition is an Ocean: Philosophy and the Mediterranean Sea,” in Words and Worlds: Use and Abuse of Analogies and Metaphors within Sciences and Humanities , S. Wuppuluri and A.C. Grayling (eds.), Synthese Library Series.
  • Downes, S., 1992, “The Importance of Models in Theorizing: A Deflationary Semantic View,” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1992 , (1): 142–153.
  • –––, 2014, “Heritability”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/heredity/ >.
  • Dreyfus, H., 1986, “Why Studies of Human Capacities Modeled on Ideal Natural Science Can Never Achieve their Goal,” in Rationality, Relativism, and the Human Sciences , J. Margolis, M. Krausz, and R. Burian (eds.), Dordrecht: Martinus Nijhoff, pp. 3–22.
  • Duhem, P., 1906, La théorie physique: Son objet et sa structure , Paris: Chevalier et Rivière; transl. by P.W. Wiener, The Aim and Structure of Physical Theory , Princeton, NJ: Princeton University Press (1954).
  • Edge, M.D. and N. Rosenberg, 2015, “Implications of the Apportionment of Human Genetic Diversity for the Apportionment of Human Phenotypic Diversity,” Studies in History and Philosophy of Biological and Biomedical Sciences , 52: 32–45.
  • Edwards, A.W.F., 2003, “Human Genetic Diversity: Lewontin’s Fallacy,” BioEssays , 25 (8): 798–801.
  • Eilenberg, S. and S. MacLane, 1945, “General Theory of Natural Equivalences,” Transactions of the American Mathematical Society , 58 (2): 231–294.
  • Einstein, A., 1934, “On the Method of Theoretical Physics,” Philosophy of Science , 1 (2): 163–169.
  • –––, 1936, “Physik und Realität,” Journal of The Franklin Institute , 221 (3): 313–347; transl. by J. Piccard, “Physics and Reality,” Journal of the Franklin Institute , 221 (3) (1936): 349–382.
  • Elwick, J., 2007, Styles of Reasoning in British Life Sciences: Shared Assumptions, 1820–1858 , London: Pickering & Chatto.
  • Feigl, H., 1970, “The ‘Orthodox’ View of Theories: Remarks in Defense as Well as Critique,” in Analyses of Theories and Methods of Physics and Psychology (Minnesota Studies in the Philosophy of Science, Volume 4), M. Radner and S. Winokur (eds.), Minneapolis: University of Minnesota Press, pp. 3–16.
  • Feigl, H., M. Scriven, and G. Maxwell (eds.), 1958, Minnesota Studies in the Philosophy of Science (Volume 2), Minneapolis: University of Minnesota Press.
  • Flyvbjerg, B., 2001, Making Social Science Matter: Why Social Inquiry Fails and How it Can Succeed Again , Cambridge: Cambridge University Press.
  • French, S., 2017, “Identity Conditions, Idealisations and Isomorphisms: a Defence of the Semantic Approach,” first online 19 September 2017, Synthese . doi:10.1007/s11229-017-1564-z
  • French, S. and J. Ladyman, 1997, “Superconductivity and Structures: Revisiting the London Account,” Studies in History and Philosophy of Modern Physics , 28 (3): 363–393.
  • –––, 1999, “Reinflating the Semantic Approach,” International Studies in the Philosophy of Science , 13 (2): 103–121.
  • –––, 2003, “Remodelling Structural Realism: Quantum Physics and the Metaphysics of Structure,” Synthese , 136 (1): 31–56.
  • Friedman, M., 1981, “Theoretical Explanation,” in Reduction, Time, and Reality: Studies in the Philosophy of the Natural Sciences , R. Healey (ed.), New York: Cambridge University Press, pp. 1–16.
  • –––, 1982, “ The Scientific Image , by B. van Fraassen,” The Journal of Philosophy , 79 (5): 274–283.
  • –––, 1983, Foundations of Space-Time Theories: Relativistic Physics and Philosophy of Science , Princeton: Princeton University Press.
  • –––, 1999, Reconsidering Logical Positivism , New York: Cambridge University Press.
  • –––, 2001, Dynamics of Reason , Stanford, CA: CSLI Publications.
  • –––, 2011, “Carnap on Theoretical Terms: Structuralism without Metaphysics,” Synthese , 180 (2): 249–263.
  • –––, 2013, Kant’s Construction of Nature: A Reading of the Metaphysical Foundations of Natural Science , Cambridge: Cambridge University Press.
  • Frigg, R. and S. Hartmann, 2012, “Models in Science”, The Stanford Encyclopedia of Philosophy (Fall 2012 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2012/entries/models-science/ >.
  • Frigg, R. and I. Votsis, 2011, “Everything You Always Wanted to Know about Structural Realism but Were Afraid to Ask,” European Journal for Philosophy of Science , 1 (2): 227–276.
  • Galison, P., 1987, How Experiments End , Chicago: University of Chicago Press.
  • –––, 1988, “History, Philosophy, and the Central Metaphor,” Science in Context , 2 (1): 197–212.
  • –––, 1997, Image and Logic: A Material Culture of Microphysics , Chicago: University of Chicago Press.
  • Geary, J., 2011, I Is an Other: The Secret Life of Metaphor and How It Shapes the Way We See The World , New York: Harper Perennial.
  • Gentner, D., 1982, “Are Scientific Analogies Metaphors?” in Metaphor: Problems and Perspectives , D. Miall (ed.), Brighton: Harvester Press, pp. 106–132.
  • –––, 2003, “Analogical Reasoning, Psychology of,” in Encyclopedia of Cognitive Science , L. Nadel (ed.), London: Nature Publishing Group, pp. 106–112.
  • Giere, R., 1988, Explaining Science: A Cognitive Approach , Chicago: University of Chicago Press.
  • –––, 2004, “How Models Are Used to Represent Reality,” Philosophy of Science , 71 (5): 742–752.
  • –––, 2010, “An Agent-based Conception of Models and Scientific Representation,” Synthese , 172 (2): 269–281.
  • Giere, R., B. Bickle, and R. Mauldin, 2006, Understanding Scientific Reasoning , Belmont, CA: Thomson/Wadsworth, 5th edition.
  • Ginnobili, S., 2016, “Missing Concepts in Natural Selection Theory Reconstructions,” History and Philosophy of the Life Sciences , 38 (Article 8). doi:10.1007/s40656-016-0109-y
  • Godfrey-Smith, P., 2003, Theory and Reality: An Introduction to the Philosophy of Science , Chicago: University of Chicago Press.
  • –––, 2006, “The Strategy of Model-Based Science,” Biology and Philosophy , 21 (5): 725–740.
  • Gould, S.J., 2002, The Structure of Evolutionary Theory , Cambridge, MA: Harvard University Press.
  • Griesemer, J., 1990, “Modeling in the Museum: On the Role of Remnant Models in the Work of Joseph Grinnell,” Biology and Philosophy , 5 (1): 3–36.
  • –––, 1991a, “Material Models in Biology,” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1990 , (2): 79–94.
  • –––, 1991b, “Must Scientific Diagrams Be Eliminable? The Case of Path Analysis,” Biology and Philosophy , 6 (2): 155–180.
  • –––, 2013, “Formalization and the Meaning of Theory in the Inexact Biological Sciences,” Biological Theory , 7 (4): 298–310.
  • Hacking, I., 1983, Representing and Intervening: Introductory Topics in the Philosophy of Natural Science , Cambridge: Cambridge University Press.
  • –––, 2002, Historical Ontology , Cambridge, MA: Harvard University Press.
  • –––, 2007a, “On Not Being a Pragmatist: Eight Reasons and a Cause,” in New Pragmatists , C. Misak (ed.), New York: Oxford University Press, pp. 32–49.
  • –––, 2007b, “Natural Kinds: Rosy Dawn, Scholastic Twilight,” Royal Institute of Philosophy Supplements , 61: 203–240.
  • –––, 2009, Scientific Reason , Taipei: National Taiwan University Press.
  • –––, 2012, “Introduction,” in T.S. Kuhn, The Structure of Scientific Revolutions , 50th Anniversary ed. (4th ed.), Chicago: University of Chicago Press, pp. vii–xxxvii.
  • –––, 2014, Why Is There Philosophy of Mathematics At All? , Cambridge: Cambridge University Press.
  • Halvorson, H., 2012, “What Scientific Theories Could Not Be,” Philosophy of Science , 79 (2): 183–206.
  • –––, 2013, “The Semantic View, if Plausible, is Syntactic,” Philosophy of Science , 80 (3): 475–478.
  • –––, 2019, The Logic in Philosophy of Science , Cambridge: Cambridge University Press.
  • Hartl, D. and A. Clark, 1989, Principles of Population Genetics , Sunderland, MA: Sinauer Associates.
  • Hempel, C., 1952, Fundamentals of Concept Formation in Empirical Science , Chicago: University of Chicago Press.
  • –––, 1958, “The Theoretician’s Dilemma,” in Minnesota Studies in the Philosophy of Science (Volume 2), H. Feigl, M. Scriven, and G. Maxwell (eds.), Minneapolis: University of Minnesota Press, pp. 37–98.
  • –––, 1966, Philosophy of Natural Science , Englewood Cliffs, N.J.: Prentice-Hall.
  • –––, 1970, “On the ‘Standard Conception’ of Scientific Theories,” in Minnesota Studies in the Philosophy of Science (Volume 4), M. Radner and S. Winokur (eds.), Minneapolis: University of Minnesota Press, pp. 142–163.
  • Hermes, H., 1938, Eine Axiomatisierung der allgemeinen Mechanik (Forschungen zur Logik und zur Grundlegung der exacten Wissenschaften, Heft 3), Leipzig: S. Hirzel.
  • –––, 1959, “Zur Axiomatisierung der Mechanik,” in The Axiomatic Method with Special Reference to Geometry and Physics: Proceedings of an International Symposium Held at the University of California, Berkeley, December 26, 1957–January 4, 1958 , L. Henkin, P. Suppes, and A. Tarski (eds.), Amsterdam: North Holland, pp. 282–290.
  • Hesse, M., 1966, Models and Analogies in Science , Notre Dame: University of Notre Dame Press.
  • –––, 1967, “Models and Analogy in Science,” in The Encyclopedia of Philosophy (Volume 5), P. Edwards (ed.), New York: Macmillan, pp. 354–359.
  • Hitchcock, C. and J.D. Velasco, 2014, “Newtonian and Evolutionary Forces,” Ergo , 1 (2): 39–77.
  • Hochman, A., 2013, “Against the New Racial Naturalism,” The Journal of Philosophy , 110 (6): 331–351.
  • Hodges, W., 1997, A Shorter Model Theory , New York: Cambridge University Press.
  • –––, 2013, “Model Theory”, The Stanford Encyclopedia of Philosophy (Fall 2013 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2013/entries/model-theory/ >.
  • Hoffman, R., 1980, “Metaphor in Science,” in Cognition and Figurative Language , R. Honeck (ed.), Hillsdale: Lawrence Erlbaum Associates, pp. 393–423.
  • Holton, G., 1988, Thematic Origins of Scientific Thought: Kepler to Einstein , Cambridge, MA: Harvard University Press, 2nd edition.
  • Hookway, C., 2013, “Pragmatism”, The Stanford Encyclopedia of Philosophy (Winter 2013 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/win2013/entries/pragmatism/ >.
  • Hull, D., 1975, “Central Subjects and Historical Narratives,” History & Theory , 14 (3): 253–274.
  • Jammer, M., 1961, Concepts of Mass in Classical and Modern Physics , Cambridge, MA: Harvard University Press; reprinted unabridged by Dover in 1997.
  • Jobling, M.A., M. Hurles, and C. Tyler-Smith, 2004, Human Evolutionary Genetics. Origins, Peoples and Diseases , New York: Garland Science.
  • Jones, M., 2005, “Idealization and Abstraction: A Framework,” in Idealization XII: Correcting the Model – Idealization and Abstraction in the Sciences (Poznan Studies in the Philosophy of the Sciences and the Humanities, Volume 86), M. Jones and N. Cartwright (eds.), Amsterdam: Rodopi, pp. 173–217. (Same individual as Thomson-Jones 2012.)
  • Kaplan, J.M. and R.G. Winther, 2013, “Prisoners of Abstraction? The Theory and Measure of Genetic Variation, and the Very Concept of ‘Race’,” Biological Theory , 7 (4): 401–412.
  • Keller, E.F., 1995, Reconfiguring Life: Metaphors of Twentieth-Century Biology , New York: Columbia University Press.
  • Kitcher, P., 1984, “1953 and All That. A Tale of Two Sciences,” Philosophical Review , 93 (3): 335–373.
  • –––, 1993, The Advancement of Science: Science Without Legend, Objectivity Without Illusion , New York: Oxford University Press.
  • –––, 2001, Science, Truth, and Democracy , New York: Oxford University Press.
  • Krivine, J., 2013 [1971], Introduction to Axiomatic Set Theory (Synthese Library, Volume 34), Dordrecht: D. Reidel.
  • Kuhn, T.S., 1970, The Structure of Scientific Revolutions , Chicago: University of Chicago Press, 2nd edition.
  • –––, 1977, “Objectivity, Value Judgment, and Theory Choice,” in The Essential Tension: Selected Studies in Scientific Tradition and Change , T.S. Kuhn (ed.), Chicago: University of Chicago Press, pp. 320–339.
  • Ladyman, J., 2014, “Structural Realism”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/structural-realism/ >.
  • Ladyman, J., O. Bueno, M. Suárez, and B. van Fraassen, 2011, “Scientific Representation: A Long Journey from Pragmatics to Pragmatics,” Metascience , 20 (3): 417–442.
  • Lakatos, I., 1980, The Methodology of Scientific Research Programmes (Philosophical Papers: Volume 1), Cambridge: Cambridge University Press.
  • Laudan, L., 1977, Progress and Its Problems: Towards a Theory of Scientific Growth , Berkeley, CA: University of California Press.
  • Leonelli, S., 2008, “Performing Abstraction: Two Ways of Modelling Arabidopsis thaliana ,” Biology and Philosophy , 23 (4): 509–528.
  • Levins, R., 1966, “The Strategy of Model Building in Population Biology,” American Scientist , 54 (4): 421–431.
  • Levins, R. and R. Lewontin, 1985, The Dialectical Biologist , Cambridge, MA: Harvard University Press.
  • Lewis, R.W., 1980, “Evolution: A System of Theories,” Perspectives in Biology and Medicine , 23 (4): 551–572.
  • Lewontin, R.C., 1972, “The Apportionment of Human Diversity,” Evolutionary Biology , 6: 381–398.
  • –––, 1974, The Genetic Basis of Evolutionary Change , New York: Columbia University Press.
  • Lloyd, E., 1983, “The Nature of Darwin’s Support for the Theory of Natural Selection,” Philosophy of Science , 50 (1): 112–129.
  • –––, 1994 [1988], The Structure and Confirmation of Evolutionary Theory , Princeton: Princeton University Press.
  • –––, 2013 In Press, “Structure of Evolutionary Theory,” in International Encyclopedia of Social and Behavioral Sciences , W. Durham (ed.), 2nd edition, Amsterdam: Elsevier.
  • London, F. and H. London, 1935, “The Electromagnetic Equations of the Supraconductor,” Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences , 149 (866): 71–88.
  • Longino, H.E., 1995, “Gender, Politics, and the Theoretical Virtues,” Synthese , 104 (3): 383–397.
  • –––, 2002, The Fate of Knowledge , Princeton: Princeton University Press.
  • –––, 2013, Studying Human Behavior: How Scientists Investigate Aggression & Sexuality , Chicago: University of Chicago Press.
  • López Beltrán, C., 1987, “La Explicación Evolucionista y el Uso de Modelos,” Masters Thesis, Posgrado en Filosofía de la Ciencia, Universidad Autónoma Metropolitana (Iztapalapa).
  • Lorenzano, P., 2013, “The Semantic Conception and the Structuralist View of Theories: A Critique of Suppe’s Criticisms,” Studies in History and Philosophy of Science (Part A) , 44: 600–607.
  • –––, 2014, “What is the Status of the Hardy-Weinberg Law within Population Genetics?,” in European Philosophy of Science: Philosophy of Science in Europe and the Viennese Heritage (Vienna Circle Institute Yearbook: Volume 17), M.C. Galavotti, E. Nemeth, and F. Stadler (eds.), Cham, Switzerland: Springer, pp. 159–172.
  • Lowry, I., 1965, “A Short Course in Model Design,” Journal of the American Institute of Planners , 31 (2): 158–166.
  • Ludwig, D., 2015, “Against the New Metaphysics of Race,” Philosophy of Science , 82: 1–21.
  • Lutz, S., 2012, “On a Straw Man in the Philosophy of Science: A Defense of the Received View,” HOPOS: The Journal of the International Society for the History of Philosophy of Science , 2 (1): 77–120.
  • –––, 2014, “What’s Right with a Syntactic Approach to Theories and Models?” Erkenntnis , 79 (8 supplement): 1475–1492.
  • –––, 2017, “What Was the Syntax-Semantics Debate in the Philosophy of Science About?,” Philosophy and Phenomenological Research , 95 (2): 319–352.
  • Mancosu, P., 2010, “Mathematical Style”, The Stanford Encyclopedia of Philosophy (Spring 2010 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2010/entries/mathematical-style/ >.
  • Margenau, H., 1950, The Nature of Physical Reality: A Philosophy of Modern Physics , New York: McGraw-Hill.
  • Marker, D., 2002, Model Theory: An Introduction , New York: Springer.
  • Martínez, S., 2003, Geografía de las prácticas científicas: Racionalidad, heurística y normatividad , Mexico City: UNAM Press.
  • –––, 2014, “Technological Scaffolds for Culture and Cognition,” in Developing Scaffolds in Evolution, Culture and Cognition , L. Caporael, J. Griesemer, and W. Wimsatt (eds.), Cambridge, MA: MIT Press, pp. 249–264.
  • Matheson, C. and J. Dallmann, 2014, “Historicist Theories of Scientific Rationality”, The Stanford Encyclopedia of Philosophy (Fall 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2014/entries/rationality-historicist/ >.
  • McKinsey, J.C.C., A.C. Sugar, and P. Suppes, 1953, “Axiomatic Foundations of Classical Particle Mechanics,” Journal of Rational Mechanics and Analysis , 2 (2): 253–272.
  • Minsky, M., 1965, “Matter, Mind, and Models,” in Proceedings of the International Federation for Information Processing Congress (Volume 1), W. Kalenich (ed.), Washington D.C.: Spartan Books, pp. 45–49.
  • Morgan, M., 2012, The World in the Model: How Economists Work and Think , New York: Cambridge University Press.
  • Morgan, M.S. and M. Morrison (eds.), 1999, Models as Mediators: Perspectives on Natural and Social Science , Cambridge: Cambridge University Press.
  • Mormann, T., 2007, “The Structure of Scientific Theories in Logical Empiricism,” The Cambridge Companion to Logical Empiricism , in A. Richardson and T. Uebel (eds.), Cambridge: Cambridge University Press, pp. 136–162.
  • Morrison, M., 2007, “Where Have All the Theories Gone?,” Philosophy of Science , 74 (2): 195–228.
  • Moulines, C., 1976, “Approximate Application of Empirical Theories: A General Explication,” Erkenntnis , 10 (2): 201–227.
  • –––, 2002, “Introduction: Structuralism as a Program for Modelling Theoretical Science,” Synthese , 130 (1): 1–11.
  • Nagel, E., 1961, The Structure of Science: Problems in the Logic of Scientific Explanation , New York: Harcourt, Brace & World.
  • –––, 1979, “Issues in the Logic of Reductive Explanations,” in Teleology Revisited and Other Essays in the Philosophy and History of Science , New York: Columbia University Press, pp. 95–117.
  • Neurath, O., 1932, “Protokollsätze”, Erkenntnis , 3: 204–214; “Protocol Statements,” in Philosophical Papers 1913-1946 , R.S. Cohen and M. Neurath (eds.), Dordrecht: Reidel (1983), pp. 91–99.
  • Nicholson, D. and R. Gawne, 2014, “Rethinking Woodger’s Legacy in the Philosophy of Biology,” Journal of the History of Biology , 47 (2): 243–292.
  • Nolte, D.D., 2010, “The Tangled Tale of Phase Space,” Physics Today , April: 33–38.
  • Okasha, S., 2012, “Population Genetics”, The Stanford Encyclopedia of Philosophy (Fall 2012 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2012/entries/population-genetics/ >.
  • Oppenheimer, J.R., 1956, “Analogy in Science,” American Psychologist , 11 (3): 127–135.
  • Oyama, S., 2000, The Ontogeny of Information: Developmental Systems and Evolution , 2nd ed., Durham: Duke University Press.
  • Pereda, C., 2013, “Ulises Moulines y la concepción estructural de las teorías científicas,” in La filosofía en México en el siglo XX: Apuntes de un participante , C. Pereda, Mexico City: CONACULTA (Consejo Nacional para la Cultura y las Artes), pp. 200–212.
  • Pickstone, J.V., 2000, Ways of Knowing: A New History of Science, Technology and Medicine , Chicago: University of Chicago Press.
  • Pigliucci, M. and G.B. Müller, 2010, Evolution: The Extended Synthesis , Cambridge, MA: MIT Press.
  • Popper, K., 1996 [1976], “The Myth of the Framework,” in The Myth of the Framework: In Defence of Science and Rationality , M. A. Notturno (ed.), Abingdon: Routledge, pp. 33–64.
  • Preston, J., 2012, “Paul Feyerabend”, The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/win2012/entries/feyerabend/ >.
  • Pritchard, J.K., M. Stephens, and P. Donnelly, 2000, “Inference of Population Structure Using Multilocus Genotype Data,” Genetics , 155 (2): 945–959.
  • Przełęcki, M., 1969, The Logic of Empirical Theories , London: Routledge & Kegan Paul.
  • Putnam, H., 1962, “What Theories Are Not,” in Logic, Methodology, and Philosophy of Science: Proceedings of the 1960 International Congress , E. Nagel, P. Suppes, and A. Tarski (eds.), Stanford, CA: Stanford University Press, pp. 240–251.
  • Reichenbach, H., 1938, Experience and Prediction: An Analysis of the Foundations and the Structure of Knowledge , Chicago: University of Chicago Press.
  • –––, 1965 [1920], The Theory of Relativity and A Priori Knowledge , with an introduction by M. Reichenbach, Berkeley: University of California Press. Original: Relativitätstheorie und Erkenntnis apriori , Berlin: Springer.
  • –––, 1969 [1924], The Axiomatization of the Theory of Relativity , with an introduction by W.C. Salmon. Berkeley-Los Angeles: University of California Press. Original: Axiomatik der relativistischen Raum-Zeit-Lehre , Braunschweig: F. Vieweg & Sohn.
  • –––, 1978, Selected Writings, 1909–1953: With a Selection of Biographical and Autobiographical Sketches (Volumes 1–2), Dordrecht: Reidel.
  • Rice, S., 2004, Evolutionary Theory: Mathematical and Conceptual Foundations , Sunderland, MA: Sinauer Associates.
  • Richards, R., 1992, “The Structure of Narrative Explanation in History and Biology,” in History and Evolution , M. Nitecki and D. Nitecki (eds.), Albany: State University of New York Press, pp. 19–53.
  • Richardson, A., 2002, “Engineering Philosophy of Science: American Pragmatism and Logical Empiricism in the 1930s,” Philosophy of Science , 69 (S3): S36–S47.
  • Rosenberg N.A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, and M.A. Feldman, 2002, “Genetic Structure of Human Populations,” Science , 298 (5602): 2381–2385.
  • Rosenblueth, A. and N. Wiener, 1945, “The Role of Models in Science,” Philosophy of Science , 12 (4): 316–321.
  • Ruse, M., 1975, “Charles Darwin’s Theory of Evolution: An Analysis,” Journal of the History of Biology , 8 (2): 219–241.
  • Rutte, H., 1991, “Neurath contra Schlick. On the Discussion of Truth in the Vienna Circle,” in Rediscovering the Forgotten Vienna Circle: Austrian studies on Otto Neurath and the Vienna Circle , T. Uebel (ed.), Dordrecht: Kluwer, pp. 169–174.
  • Sarkar, S., 1998, Genetics and Reductionism , Cambridge: Cambridge University Press.
  • Savage, C.W., 1990, “Preface,” in Scientific Theories. Minnesota Studies in the Philosophy of Science. Volume 14, C.W. Savage (ed.), Minneapolis: University of Minnesota Press, pp. vii–ix.
  • Schaffner K., 1969, “Correspondence Rules,” Philosophy of Science , 36 (3): 280–290.
  • –––, 1976, “Reductionism in Biology: Prospects and Problems,” in PSA : Proceedings of the Biennial Meeting of the Philosophy of Science Association 1974 : 613–632.
  • –––, 1993, Discovery and Explanation in Biology and Medicine , Chicago: University of Chicago Press.
  • Schlick, M., 1925 [1918], General Theory of Knowledge , LaSalle, IL: Open Court.
  • –––, 1934, “Über das Fundament der Erkenntnis,” Erkenntnis , 4 (1): 79–99.
  • Schmidt, H.-J., 2014, “Structuralism in Physics”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/physics-structuralism/ >.
  • Shapin, S. and S. Schaffer, 1985, Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life , Princeton: Princeton University Press.
  • Simon, H., 1954, “The Axiomatization of Classical Mechanics,” Philosophy of Science , 21 (4): 340–343.
  • –––, 1957, Models of Man , New York: Wiley.
  • –––, 1970, “The Axiomatization of Physical Theories,” Philosophy of Science , 37 (1): 16–26.
  • Smith, B.C., 1996, On the Origin of Objects , Cambridge, MA: MIT Press.
  • Sneed, J., 1979, The Logical Structure of Mathematical Physics , Dordrecht: D. Reidel, 2 nd edition.
  • Spencer, Q., 2015, “Philosophy of Race Meets Population Genetics,” Studies in History and Philosophy of Biological and Biomedical Sciences 52: 46–55.
  • Stegmüller, W., 1976, The Structure and Dynamics of Theories , New York: Springer.
  • –––, 1979, “The Structuralist View: Survey, Recent Developments and Answers to Some Criticisms”, in The Logic and Epistemology of Scientific Change , I. Niiniluoto and R. Tuomela (eds.), Amsterdam: North Holland.
  • Suárez, M., 1999, “The Role of Models in the Application of Scientific Theories; Epistemological Implications,” in Models as Mediators. Perspectives on Natural and Social Science , M.S. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 168–196.
  • –––, 2011, Comment on van Fraassen Scientific Representation: Paradoxes of Perspective , in Ladyman, J., O. Bueno, M. Suárez, and B. van Fraassen, “Scientific Representation: A Long Journey from Pragmatics to Pragmatics,” Metascience , 20 (3): 428–433.
  • Suárez, M. and N. Cartwright, 2008, “Theories: Tools versus Models,” Studies in History and Philosophy of Modern Physics , 39 (1): 62–81.
  • Suárez, M. and F. Pero, 2019, “The Representational Semantic Conception,” Philosophy of Science , 86 (2): 344–365.
  • Suppe, F., 1977, The Structure of Scientific Theories , Urbana, IL: University of Illinois Press.
  • –––, 1989, The Semantic Conception of Theories and Scientific Realism , Chicago: University of Illinois Press.
  • –––, 2000, “Understanding Scientific Theories: An Assessment of Developments,” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1998 , (2): S102–S115.
  • Suppes, P., 1957, Introduction to Logic , Princeton: D. Van Nostrand Co.
  • –––, 1960, “A Comparison of the Meaning and Uses of Models in Mathematics and the Empirical Sciences,” Synthese , 12 (2-3): 287–301.
  • –––, 1962, “Models of Data,” in Logic, Methodology, and Philosophy of Science: Proceedings of the 1960 International Congress , E. Nagel, P. Suppes, and A. Tarski (eds.), Stanford, CA: Stanford University Press, pp. 252–261.
  • –––, 1967, “What is a Scientific Theory?,” In Philosophy of Science Today , S. Morgenbesser (ed.), New York: Basic Books, pp. 55–67.
  • –––, 1968, “The Desirability of Formalization in Science,” The Journal of Philosophy , 65 (20): 651–664.
  • –––, 1978, “The Plurality of Science,” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1978 , (2): 3–16.
  • –––, 2002, Representation and Invariance of Scientific Structures , Stanford, CA: CSLI Publications.
  • Swoyer, C., 2014, “Relativism”, The Stanford Encyclopedia of Philosophy (Winter 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/win2014/entries/relativism/ >.
  • Thompson, P., 1989, The Structure of Biological Theories , Albany: SUNY Press.
  • –––, 2007, “Formalisations of Evolutionary Biology,” in Philosophy of Biology , M. Matthen and C. Stephens (eds.), Elsevier, Amsterdam, pp. 485–523
  • Thomson-Jones, M., 2012, “Modelling without Mathematics,” Philosophy of Science , 79 (5): 761–772. (Same individual as Jones 2005.)
  • Toulmin, S., 1972, Human Understanding: The Collective Use and Evolution of Concepts , Princeton: Princeton University Press.
  • Tuomi, J., 1981, “Structure and Dynamics of Darwinian Evolutionary Theory,” Systematic Zoology , 30 (1): 22–31.
  • –––, 1992, “Evolutionary Synthesis: A Search for the Strategy,” Philosophy of Science , 59 (3): 429–438.
  • Tversky, A., 1977, “Features of Similarity,” Psychological Review , 84 (4): 327–352.
  • Uebel, T., 2014, “Vienna Circle”, The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2014/entries/vienna-circle/ >.
  • van Benthem J., 2012, “The Logic of Empirical Theories Revisited,” Synthese , 186 (3): 775–792.
  • van Fraassen, B., 1967, “Meaning Relations among Predicates,” Noûs , 1 (2): 161–179.
  • –––, 1970, “On the Extension of Beth’s Semantics of Physical Theories,” Philosophy of Science , 37 (3): 325–339.
  • –––, 1980, The Scientific Image , Oxford: Oxford University Press.
  • –––, 1981, “Theory Construction and Experiment: An Empiricist View,” PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1980 , (2): 663–678.
  • –––, 1989, Laws and Symmetry , New York: Oxford University Press.
  • –––, 2008, Scientific Representation: Paradoxes of Perspective , New York: Oxford University Press.
  • van Riel, R. and R. Van Gulick, 2014, “Scientific Reduction”, The Stanford Encyclopedia of Philosophy (Summer 2014 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/sum2014/entries/scientific-reduction/ >.
  • Van Valen, L., 1976, “Domains, Deduction, the Predictive Method, and Darwin,” Evolutionary Theory , 1: 231–245.
  • Vicedo, M., 1995, “Scientific Styles: Toward Some Common Ground in the History, Philosophy, and Sociology of Science,” Perspectives on Science , 3: 231–254.
  • Vickers, P., 2009, “Can Partial Structures Accommodate Inconsistent Science?” Principia , 13 (2): 233–250.
  • Walsh, D., 2015, Organisms, Agency, and Evolution, Cambridge: Cambridge University Press.
  • Weisberg, M., 2013, Simulation and Similarity: Using Models to Understand the World , New York: Oxford University Press.
  • Wessels, L., 1976, “Laws and Meaning Postulates in van Fraassen’s View of Theories,” in PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1974 : 215–234.
  • Williams, M., 1970, “Deducing the Consequences of Selection: A Mathematical Model,” Journal of Theoretical Biology , 48: 343–385.
  • –––, 1973, “The Logical Status of Natural Selection and other Evolutionary Controversies: Resolution by Axiomatization,” in M. Bunge (ed.), The Methodological Unity of Science , Dordrecht: D. Reidel, pp. 84–102.
  • Wimsatt, W.C., 2007, Re-Engineering Philosophy for Limited Beings: Piecewise Approximations to Reality , Cambridge, MA: Harvard University Press.
  • Winsberg, E., 2010, Science in the Age of Computer Simulation , Chicago: University of Chicago Press.
  • –––, 2018, Philosophy and Climate Science , Cambridge: Cambridge University Press.
  • Winther, R.G., 2006a, “Parts and Theories in Compositional Biology,” Biology and Philosophy , 21 (4): 471–499.
  • –––, 2006b, “Fisherian and Wrightian Perspectives in Evolutionary Genetics and Model-Mediated Imposition of Theoretical Assumptions,” Journal of Theoretical Biology , 240 (2): 218–232.
  • –––, 2009, “Schaffner’s Model of Theory Reduction: Critique and Reconstruction,” Philosophy of Science , 76 (2): 119–142.
  • –––, 2011, “Part-Whole Science,” Synthese , 178 (3): 397–427.
  • –––, 2012a, “Mathematical Modeling in Biology: Philosophy and Pragmatics,” Frontiers in Plant Evolution and Development , 3: 102, doi:10.3389/fpls.2012.00102
  • –––, 2012b, “Interweaving Categories: Styles, Paradigms, and Models,” Studies in History and Philosophy of Science (Part A) , 43 (4): 628–639.
  • –––, 2014, “The Genetic Reification of ‘Race’? A Story of Two Mathematical Methods,” Critical Philosophy of Race , 2 (2): 204–223.
  • –––, 2020, When Maps Become the World , Chicago, IL: University of Chicago Press.
  • Winther, R.G., R. Giordano, M.D. Edge, and R. Nielsen, 2015, “The Mind, the Lab, and the Field: Three Kinds of Populations in Scientific Practice,” Studies in History and Philosophy of Biological and Biomedical Sciences , 52: 12–21.
  • Winther, R.G. and J.M. Kaplan, 2013, “Ontologies and Politics of Biogenomic ‘Race’,” Theoria. A Journal of Social and Political Theory (South Africa) , 60 (3): 54–80.
  • Woodger J.H., 1937, The Axiomatic Method in Biology , Cambridge: Cambridge University Press.
  • –––, 1959, “Studies in the Foundations of Genetics,” in The Axiomatic Method with Special Reference to Geometry and Physics: Proceedings of an International Symposium Held at the University of California, Berkeley, December 26, 1957 – January 4, 1958 , L. Henkin, P. Suppes, and A. Tarski (eds.), Amsterdam: North Holland, pp. 408–428.
  • Worrall, J., 1984, “An Unreal Image,” The British Journal for the Philosophy of Science , 35 (1): 65–80.
  • Wright, S., 1969, Evolution and the Genetics of Populations: A Treatise in Four Volumes (Volume 2: The Theory of Gene Frequencies), Chicago: University of Chicago Press.
  • Zach, R., 2009, “Hilbert’s Program”, The Stanford Encyclopedia of Philosophy (Spring 2009 Edition), E. N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/spr2009/entries/hilbert-program/ >.
  • Ziman, J., 2000, Real Science: What It Is, and What It Means , Cambridge: Cambridge University Press.
How to cite this entry . Preview the PDF version of this entry at the Friends of the SEP Society . Look up topics and thinkers related to this entry at the Internet Philosophy Ontology Project (InPhO). Enhanced bibliography for this entry at PhilPapers , with links to its database.
  • Koellner, P., ms., “ Carnap on the Foundations of Logic and Mathematics ,” unpublished.
  • Browse Philpapers on The Nature of Theories
  • Browse Philpapers on Theoretical Virtues
  • Browser Philpapers on Models and Idealization
  • Evolution Resources from the National Academies
  • Definitions of Fact, Theory, and Law in Scientific Work , National Center for Science Education (NCSE).

Carnap, Rudolf | cognitive science | confirmation | Darwinism | empiricism: logical | feminist philosophy, interventions: epistemology and philosophy of science | Feyerabend, Paul | genetics: population | incommensurability: of scientific theories | Kuhn, Thomas | models in science | model theory | paradox: Skolem’s | physics: structuralism in | pragmatism | rationality: historicist theories of | reduction, scientific | science: theory and observation in | scientific explanation | scientific realism | scientific representation | simulations in science | statistical physics: philosophy of statistical mechanics | structural realism | style: in mathematics | theoretical terms in science | underdetermination, of scientific theories | Vienna Circle



1.2 - The 7-Step Process of Statistical Hypothesis Testing

We will cover the seven steps one by one.

Step 1: State the Null Hypothesis

The null hypothesis can be thought of as the opposite of the "guess" the researchers made. In the example presented in the previous section, the biologist "guesses" plant height will be different for the various fertilizers. So the null hypothesis would be that there will be no difference among the groups of plants. Specifically, in more statistical language the null for an ANOVA is that the means are the same. We state the null hypothesis as:

\(H_0 \colon \mu_1 = \mu_2 = \cdots = \mu_T\)

for \(T\) levels of an experimental treatment.

Step 2: State the Alternative Hypothesis

\(H_A \colon \text{ treatment level means not all equal}\)

The alternative hypothesis is stated in this way so that if the null is rejected, there are many alternative possibilities.

For example, \(\mu_1 \ne \mu_2 = \cdots = \mu_T\) is one possibility, as is \(\mu_1 = \mu_2 \ne \mu_3 = \cdots = \mu_T\). Many people make the mistake of stating the alternative hypothesis as \(\mu_1 \ne \mu_2 \ne \cdots \ne \mu_T\), which says that every mean differs from every other mean. This is a possibility, but only one of many. A simple way of thinking about the alternative is that at least one mean differs from at least one other mean. To cover all alternative outcomes, we resort to a verbal statement of "not all equal" and then follow up with mean comparisons to find out where differences among means exist. In our example, a possible outcome would be that fertilizer 1 results in plants that are exceptionally tall, but fertilizers 2, 3, and the control group may not differ from one another.
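To see concretely how many outcomes "not all equal" covers, the sketch below enumerates the ways \(T\) means can be grouped into equality classes; every pattern except "all equal" is consistent with \(H_A\). (This is an illustration, not part of the course material.)

```python
# Enumerate the ways T treatment means can be grouped into equality
# classes. Every partition except "all means equal" is an outcome
# consistent with H_A: "not all equal."
def partitions(items):
    """Yield all set partitions of a list (each partition is a list of blocks)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into each existing block in turn
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # or into a new block of its own
        yield [[first]] + part

T = 4
all_parts = list(partitions(list(range(1, T + 1))))
print(f"{len(all_parts)} equality patterns for T = {T} means; "
      f"{len(all_parts) - 1} of them fall under H_A")
```

For \(T = 4\) treatment levels there are 15 equality patterns in total, and all but the single "all equal" pattern belong to the alternative, which is why the alternative is stated verbally rather than as one specific inequality chain.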

Step 3: Set \(\alpha\)

If we look at what can happen in a hypothesis test, we can summarize the possible outcomes as follows:

  • Reject \(H_0\) when \(H_0\) is true: Type I error (probability \(\alpha\))
  • Reject \(H_0\) when \(H_0\) is false: correct decision
  • Fail to reject \(H_0\) when \(H_0\) is true: correct decision
  • Fail to reject \(H_0\) when \(H_0\) is false: Type II error (probability \(\beta\))

You should be familiar with Type I and Type II errors from your introductory courses. It is important to note that we want to set \(\alpha\) before the experiment (a priori) because the Type I error is the more grievous error to make. The typical value of \(\alpha\) is 0.05, establishing a 95% confidence level. For this course, we will assume \(\alpha = 0.05\), unless stated otherwise.

Step 4: Collect Data

Remember the importance of recognizing whether data is collected through an experimental design or observational study.

Step 5: Calculate a Test Statistic

For categorical treatment level means, we use an F-statistic, named after R.A. Fisher. We will explore the mechanics of computing the F-statistic beginning in Lesson 2. The F-value we get from the data is labeled \(F_{\text{calculated}}\).

Step 6: Construct Acceptance/Rejection Regions

As with all other test statistics, a threshold (critical) value of F is established. This F-value can be obtained from statistical tables or software and is referred to as \(F_{\text{critical}}\) or \(F_\alpha\). As a reminder, this critical value is the minimum value of the test statistic (in this case \(F_{\text{calculated}}\)) needed to reject the null.

The F-distribution, \(F_\alpha\), and the location of the acceptance/rejection regions are shown in the graph below:

Step 7: Based on Steps 5 and 6, draw a conclusion about \(H_0\)

If \(F_{\text{calculated}}\) is larger than \(F_\alpha\), then you are in the rejection region and you can reject the null hypothesis with \(\left(1-\alpha \right)\) level of confidence.

Note that modern statistical software condenses Steps 6 and 7 by providing a p-value. The p-value here is the probability of getting an \(F_{\text{calculated}}\) even greater than what you observe, assuming the null hypothesis is true. If, by chance, \(F_{\text{calculated}} = F_\alpha\), then the p-value would be exactly equal to \(\alpha\). With larger \(F_{\text{calculated}}\) values, we move further into the rejection region and the p-value becomes less than \(\alpha\). So, the decision rule is as follows:

If the p-value obtained from the ANOVA is less than \(\alpha\), then reject \(H_0\) in favor of \(H_A\).
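The whole procedure can be sketched in code. The helper below computes \(F_{\text{calculated}}\) from scratch using only the standard library; the plant-height numbers are invented for illustration, and in practice \(F_{\text{critical}}\) for the resulting degrees of freedom would still come from a table or statistical software.

```python
# One-way ANOVA F-statistic computed from scratch (Steps 4-7).
# The plant-height data below are invented for illustration.
from statistics import mean

def f_statistic(*groups):
    """Return (F_calculated, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                                # T treatment levels
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of group means from the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_calc = (ssb / df_between) / (ssw / df_within)
    return f_calc, df_between, df_within

# Step 4: heights (cm) under two fertilizers and a control (made-up data)
fert_1 = [21.3, 22.1, 20.8, 23.0, 21.9]
fert_2 = [18.4, 19.0, 17.8, 18.9, 19.2]
control = [18.1, 18.6, 17.9, 18.4, 18.8]

f_calc, df_b, df_w = f_statistic(fert_1, fert_2, control)
# Steps 6-7: compare f_calc against F_critical for (df_b, df_w) at alpha = 0.05,
# looked up from a table or software, and reject H0 if f_calc exceeds it.
print(f"F_calculated = {f_calc:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

Libraries such as SciPy condense Steps 5 through 7 into a single call that returns both the F-value and the p-value; the hand computation above is just to make the mechanics visible.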

What Is a Testable Hypothesis?


A hypothesis is a tentative answer to a scientific question. A testable hypothesis is a  hypothesis that can be proved or disproved as a result of testing, data collection, or experience. Only testable hypotheses can be used to conceive and perform an experiment using the scientific method .

Requirements for a Testable Hypothesis

To be considered testable, a hypothesis must meet three criteria:

  • It must be possible to prove that the hypothesis is true.
  • It must be possible to prove that the hypothesis is false.
  • It must be possible to reproduce the results of the hypothesis.

Examples of a Testable Hypothesis

All the following hypotheses are testable. It's important, however, to note that while it's possible to say that the hypothesis is correct, much more research would be required to answer the question "why is this hypothesis correct?"

  • Students who attend class have higher grades than students who skip class.  This is testable because it is possible to compare the grades of students who do and do not skip class and then analyze the resulting data. Another person could conduct the same research and come up with the same results.
  • People exposed to high levels of ultraviolet light have a higher incidence of cancer than the norm.  This is testable because it is possible to find a group of people who have been exposed to high levels of ultraviolet light and compare their cancer rates to the average.
  • If you put people in a dark room, then they will be unable to tell when an infrared light turns on.  This hypothesis is testable because it is possible to put a group of people into a dark room, turn on an infrared light, and ask the people in the room whether or not an infrared light has been turned on.

Examples of a Hypothesis Not Written in a Testable Form

  • It doesn't matter whether or not you skip class.  This hypothesis can't be tested because it doesn't make any actual claim regarding the outcome of skipping class. "It doesn't matter" doesn't have any specific meaning, so it can't be tested.
  • Ultraviolet light could cause cancer.  The word "could" makes a hypothesis extremely difficult to test because it is very vague. There "could," for example, be UFOs watching us at every moment, even though it's impossible to prove that they are there!
  • Goldfish make better pets than guinea pigs.  This is not a hypothesis; it's a matter of opinion. There is no agreed-upon definition of what a "better" pet is, so while it is possible to argue the point, there is no way to prove it.

How to Propose a Testable Hypothesis

Now that you know what a testable hypothesis is, here are tips for proposing one.

  • Try to write the hypothesis as an if-then statement. If you take an action, then a certain outcome is expected.
  • Identify the independent and dependent variable in the hypothesis. The independent variable is what you are controlling or changing. You measure the effect this has on the dependent variable.
  • Write the hypothesis in such a way that you can prove or disprove it. For example, if a person has skin cancer, you can't prove they got it from being out in the sun. However, you can demonstrate a relationship between exposure to ultraviolet light and an increased risk of skin cancer.
  • Make sure you are proposing a hypothesis you can test with reproducible results. If your face breaks out, you can't prove the breakout was caused by the french fries you had for dinner last night. However, you can measure whether or not eating french fries is associated with breaking out. It's a matter of gathering enough data to be able to reproduce results and draw a conclusion.
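As a sketch of how the first example hypothesis above ("students who attend class have higher grades") could actually be tested, the following permutation test compares two groups of grades; all the numbers are invented for illustration.

```python
# A sketch of testing "students who attend class have higher grades"
# with a permutation test. All grade data here are invented.
import random
from statistics import mean

attended = [88, 92, 79, 85, 91, 83]
skipped = [72, 80, 75, 68, 77, 74]

observed = mean(attended) - mean(skipped)  # the gap we actually see

# Under the null hypothesis attendance makes no difference, so group
# labels are interchangeable: shuffle them many times and count how
# often a gap at least as large as the observed one arises by chance.
pooled = attended + skipped
random.seed(0)  # fixed seed so the sketch is reproducible
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:len(attended)]) - mean(pooled[len(attended):])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed gap = {observed:.1f} points, p = {p_value:.4f}")
```

A small p-value means a gap this large rarely appears when the labels are meaningless, which supports (but does not prove) the hypothesis; and because the procedure is explicit, another person can rerun it on new data, satisfying the reproducibility criterion.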


The Scientific Method by Science Made Simple

Understanding and using the scientific method.

The Scientific Method is a process used to design and perform experiments. It helps minimize experimental errors and bias, and increases confidence in the accuracy of your results.


In the previous sections, we talked about how to pick a good topic and specific question to investigate. Now we will discuss how to carry out your investigation.

Steps of the Scientific Method

  • Observation/Research
  • Hypothesis
  • Prediction
  • Experimentation
  • Conclusion

Now that you have settled on the question you want to ask, it's time to use the Scientific Method to design an experiment to answer that question.

If your experiment isn't designed well, you may not get the correct answer. You may not even get any definitive answer at all!

The Scientific Method is a logical and rational order of steps by which scientists come to conclusions about the world around them. The Scientific Method helps to organize thoughts and procedures so that scientists can be confident in the answers they find.

OBSERVATION is the first step, so that you know how you want to go about your research.

HYPOTHESIS is the answer you think you'll find.

PREDICTION is your specific belief about the scientific idea: If my hypothesis is true, then I predict we will discover this.

EXPERIMENT is the tool that you invent to answer the question, and

CONCLUSION is the answer that the experiment gives.

Don't worry, it isn't that complicated. Let's take a closer look at each one of these steps. Then you can understand the tools scientists use for their science experiments, and use them for your own.

OBSERVATION


This step could also be called "research." It is the first stage in understanding the problem.

After you decide on a topic, and narrow it down to a specific question, you will need to research everything that you can find about it. You can collect information from your own experiences, books, the internet, or even smaller "unofficial" experiments.

Let's continue the example of a science fair idea about tomatoes in the garden. You like to garden, and notice that some tomatoes are bigger than others and wonder why.

Because of this personal experience and an interest in the problem, you decide to learn more about what makes plants grow.

For this stage of the Scientific Method, it's important to use as many sources as you can find. The more information you have on your science fair topic, the better the design of your experiment is going to be, and the better your science fair project is going to be overall.

Also try to get information from your teachers or librarians, or professionals who know something about your science fair project. They can help to guide you to a solid experimental setup.


HYPOTHESIS

The next stage of the Scientific Method is known as the "hypothesis." This word basically means "a possible solution to a problem, based on knowledge and research."

The hypothesis is a simple statement that defines what you think the outcome of your experiment will be.

All of the first stage of the Scientific Method -- the observation, or research stage -- is designed to help you express a problem in a single question ("Does the amount of sunlight in a garden affect tomato size?") and propose an answer to the question based on what you know. The experiment that you will design is done to test the hypothesis.

Using the example of the tomato experiment, here is an example of a hypothesis:

TOPIC: "Does the amount of sunlight a tomato plant receives affect the size of the tomatoes?"

HYPOTHESIS: "I believe that the more sunlight a tomato plant receives, the larger the tomatoes will grow."

This hypothesis is based on:

(1) Tomato plants need sunshine to make food through photosynthesis, and logically, more sun means more food; and

(2) Through informal, exploratory observations of plants in a garden, those with more sunlight appear to grow bigger.


PREDICTION

The hypothesis is your general statement of how you think the scientific phenomenon in question works.

Your prediction lets you get specific -- how will you demonstrate that your hypothesis is true? The experiment that you will design is done to test the prediction.

An important thing to remember during this stage of the scientific method is that once you develop a hypothesis and a prediction, you shouldn't change them, even if the results of your experiment show that you were wrong.

An incorrect prediction does NOT mean that you "failed." It just means that the experiment brought some new facts to light that maybe you hadn't thought about before.

Continuing our tomato plant example, a good prediction would be: Increasing the amount of sunlight tomato plants in my experiment receive will cause an increase in their size compared to identical plants that received the same care but less light.

EXPERIMENT

This is the part of the scientific method that tests your hypothesis. An experiment is a tool that you design to find out if your ideas about your topic are right or wrong.

It is absolutely necessary to design a science fair experiment that will accurately test your hypothesis. The experiment is the most important part of the scientific method. It's the logical process that lets scientists learn about the world.

On the next page, we'll discuss the ways that you can go about designing a science fair experiment idea.

CONCLUSION

The final step in the scientific method is the conclusion. This is a summary of the experiment's results, and how those results match up to your hypothesis.

You have two options for your conclusions: based on your results, either:

(1) YOU CAN REJECT the hypothesis, or

(2) YOU CAN NOT REJECT the hypothesis.

This is an important point!

You can not PROVE the hypothesis with a single experiment, because there is a chance that you made an error somewhere along the way.

What you can say is that your results SUPPORT the original hypothesis.

If your original hypothesis didn't match up with the final results of your experiment, don't change the hypothesis.

Instead, try to explain what might have been wrong with your original hypothesis. What information were you missing when you made your prediction? What are the possible reasons the hypothesis and experimental results didn't match up?

Remember, a science fair experiment isn't a failure simply because it does not agree with your hypothesis. No one will take points off if your prediction wasn't accurate. Many important scientific discoveries were made as a result of experiments gone wrong!

A science fair experiment is only a failure if its design is flawed. A flawed experiment is one that (1) doesn't keep its variables under control, and (2) doesn't sufficiently answer the question that you asked of it.


Copyright © 2006 - 2023, Science Made Simple, Inc. All Rights Reserved.



The Social Brain Hypothesis and Human Evolution

  • Robin I. M. Dunbar, Department of Experimental Psychology, University of Oxford
  • https://doi.org/10.1093/acrefore/9780190236557.013.44
  • Published online: 03 March 2016

Primate societies are unusually complex compared to those of other animals, and the need to manage such complexity is the main explanation for the fact that primates have unusually large brains. Primate sociality is based on bonded relationships that underpin coalitions, which in turn are designed to buffer individuals against the social stresses of living in large, stable groups. This is reflected in a correlation between social group size and neocortex size in primates (but not other species of animals), commonly known as the social brain hypothesis, although this relationship itself is the outcome of an underlying relationship between brain size and behavioral complexity. The relationship between brain size and group size is mediated, in humans at least, by mentalizing skills. Neuropsychologically, these are all associated with the size of units within the theory of mind network (linking prefrontal cortex and temporal lobe units). In addition, primate sociality involves a dual-process mechanism whereby the endorphin system provides a psychopharmacological platform off which the cognitive component is then built. This article considers the implications of these findings for the evolution of human cognition over the course of hominin evolution.

  • social brain
  • social neuroscience
  • brain evolution
  • mentalizing
  • theory of mind

Introduction

Primates have unusually large brains for body size compared to all other vertebrates. The conventional explanation for this is known as the “social brain hypothesis,” which argues that primates need large brains because their form of sociality is much more complex than that of other species (Byrne & Whiten, 1988 ). This does not mean that they live in larger social groups than other species of animals (in fact, they don’t), but rather that their groups have a more complex structure. In exploring the nature of this unique kind of primate sociality, this article shall argue that, so far, social neuroscience has barely scratched the surface of what is actually involved in what it means to be social. To borrow an analogy, social neuroscience has devoted its time to examining the bricks and mortar in great detail but has so far overlooked the complexity of the building that lies at the real heart of primate (and human) sociality.

The original idea for the social brain dates back to the 1970s, when a number of primatologists suggested that primate intelligence might be related to the demands of their more complex social world (Jolly, 1969 ; Humphrey, 1976 ; Kummer, 1982 ), and the name itself was later coined by the neuroscientist Lesley Brothers ( 1990 ). The primary evidence in support of the social brain hypothesis comes from the fact that, across primates, there is a correlation between mean social group size and more or less any measure of brain size one cares to use (Fig. 1 ) (Dunbar, 1992 , 1998 ; Barton, 1996 ; Barton & Dunbar, 1997 ; Dunbar & Shultz, 2007 ; Dunbar, 2011a ), although the relationship improves as the measure of brain size is focused more toward the frontal lobes (Joffe, 1997 ; Dunbar, 2011a ). In this respect, primates differ from almost all other mammals and birds: in most birds and nonprimate mammals, large brains are associated not with social group size but with a monogamous mating system (Shultz & Dunbar, 2007 , 2010a , b ; Pérez-Barbería et al., 2007 ). Note that in Figure 1 there appears to be an obvious grade difference between the apes and the monkeys. This suggests that apes require a proportionately larger brain than monkeys to deal with groups of the same size, implying that their form of sociality requires more computing power to handle. More careful analysis has since revealed that there are similar grade differences among the monkeys, with a clear distinction between more and less intensely social species. As indicated in Figure 1 , extrapolating from the relationship between group size and neocortex size in apes predicts a natural social group size for humans of around 150 (Dunbar, 1993 ). There is considerable evidence for the existence of such a group size in terms of both natural human groupings (e.g., community sizes in small scale societies) and personal social networks (Dunbar, 2008 , 2011b ).

Figure 1. Mean social group size plotted against relative neocortex volume (indexed as the ratio of neocortex volume to the volume of the subcortical brain) in anthropoid primates. Filled circles: apes (including humans); unfilled circles: monkeys. The regression lines indicate grades of increasing socio-cognitive complexity (indexed by the increasing density of the line). (Redrawn from Dunbar, 2014 .)

Secondary support for the social brain hypothesis comes from neuroimaging studies, which have recently shown that the size of an individual’s living group (macaques: Sallet et al., 2011 ) or personal social network (humans: Lewis et al., 2011 ; Powell et al., 2012 , 2014 ; Kanai et al., 2012 ) correlates with the size of core regions of the brain, mostly in the temporal and, especially, the frontal lobes. These regions turn out to be essentially those involved in the mentalizing, or theory of mind, neural network. This is an important finding because it demonstrates that the social brain hypothesis applies not just at the level of the species but also at the level of the individual. Individuals with more processing capacity in core brain units have proportionately larger social networks.

Historically, a number of alternative ecological and developmental hypotheses have been proposed for why primates have such large brains (for an overview, see Dunbar, 2012b ). Among these, the importance of foraging skills, and especially the role of social learning of foraging skills, has attracted a great deal of interest (e.g., Reader et al., 2011 ). This is not the place to discuss the ensuing debates in detail, but some points of clarification are desirable. It is important to note at the outset that everyone agrees that foraging skills have played an important role in primate evolution; the critical question is whether these have been the main, or only, driver of increases in brain size or whether they are an evolutionary by-product of large brains evolving for other (perhaps mainly social) reasons because the same cognitive skills (causal reasoning, predictive reasoning, planning, etc.) underpin both kinds of behavioral outcomes.

In fact, ecology lies at the heart of all explanations for brain evolution, including the social brain hypothesis: the core differences between them are (1) whether animals solve their ecological problems by individual trial-and-error learning or do so socially and (2) which particular ecological problem (foraging or avoidance of predation) is the more fundamental selection pressure (i.e., the evolutionary driver). What makes the social brain hypothesis intrinsically social is that it claims that animals solve their ecological problems by first solving the problem of group cohesion and coordination. One reason for thinking this is that the primary ecological problem faced by primates (and probably most other animals) is the risk of predation (either by predator species or by conspecific raiders) rather than how to find food (as important as this is in the life of any animal). Primates, like most other animals, solve the predation risk problem by living in groups (van Schaik, 1983 ; Shultz et al., 2004 ; Shultz & Finlayson, 2010 ) and have opted to do so by evolving an unusual form of bonded sociality to maintain group coherence through time (Dunbar & Shultz 2010 ). In effect, primates solve the predation problem indirectly by first solving the problem of creating coherent, stable, coordinated social groups. The issue thus comes down to the task demands of foraging versus social coordination.

A second issue we need to clarify is that the social brain hypothesis has sometimes been seen as simply the quantitative relationship between social group size and brain size shown in Figure 1 . In fact, it should properly be seen in systemic terms as a set of causally related functional behavioral relationships. Animals need to solve a variety of ecological problems in order to be able to survive and reproduce successfully, and primates solve these problems communally in a way that requires them to solve a number of social and physiological problems first. In effect, primates establish the means to solve the ecological problem (an alliance or coalition) ahead of its need, and the capacity to form coalitions in anticipation of their future need seems to be a unique feature of monkey and ape behavior (Harcourt, 1992 ). It is this that gives rise to the unique form of primate sociality that we refer to as “bonded sociality”, in contrast to the more casual groupings found in most other species of birds and mammals where social groups (herds) can fragment and come together relatively easily (Dunbar & Shultz, 2010 ).

Maintaining the coherence and cohesion of bonded social groups through time is very demanding because animals have to be able to override the natural tendency for the stresses of social life to drive them apart (Dunbar 2010a , 2012a ), and the social brain hypothesis argues that this comes down to resolving various tensions and stresses in both dyadic relationships and the collective set of relationships formed within a social group. To do this, monkeys and apes require novel cognitive skills, and these cognitive skills in turn require appropriate hardware (or wetware in this case) to underpin them. Hence, the relationship between brain size and group size is indirect, and the real functional relationship in the social brain hypothesis is that between brain (or brain region) size and/or wiring and social cognitive abilities or competences that allow primates to manage relationships (Dunbar, 1998 , 2011a , 2012a ). In effect, group size is an emergent property of how well the animals solve the problems associated with living in close proximity.

In other words, in contrast to all the alternative ecological hypotheses that have been proposed (for overviews, see Dunbar 2011a , 2012b ), the social brain hypothesis is a two-step explanation for the evolution of large brains in primates. It explicitly claims that primates are doing something radically different from all other species of animals. The ultimate evolutionary driver is not simply the capacity to engage socially or live in large groups but the extent to which this allows the animals to solve the problems associated with successful survival and reproduction. The proximate mechanism involves solving the coordination problem that lies at the heart of maintaining cohesive social groups. To the extent that primates solve this second problem (group coordination), they also solve the first (predation risk).

What’s So Social About Primate Sociality?

All mammals and birds are, of course, social in some generic sense. The central premise of the social brain hypothesis is that sociality in anthropoid primates (and perhaps a very small number of other mammalian families, including elephants, the dolphin family, and maybe the camel family, that also live in complex, multi-level social systems: Hill et al., 2008 , Shultz & Dunbar, 2010a ) is a step up from this: it involves a more bonded form of sociality built around intense dyadic relationships (friendships) (Silk, 2002 ; Dunbar & Shultz, 2010 ; Massen et al., 2010 ). This form of bonded sociality is a response to the need to handle the stresses that arise when animals live in close proximity and cannot escape these pressures simply by leaving (i.e., by group fission). Living in groups creates significant stresses (mainly due to harassment from conspecifics) that radically affect female fitness (Dunbar, 1980 , 1988 ; Hill et al., 2000 ; Smuts & Nicholson, 1989 ; Roberts & Cords, 2013 ) via an endocrinological mechanism that is now relatively well understood. Among other effects, social stress destabilizes the female menstrual endocrine system and results in amenorrhea (temporary infertility) (Bowman et al., 1978 ; Abbott et al., 1986 ). Unless animals are able to find solutions that buffer them against these and other costs, group fission is inevitable because the cumulative costs for low-ranking females in terms of lost reproduction can become intense. These stresses are a linear function of group size: the more animals there are in the group, the more individuals one can be harassed by.
Moreover, sociality itself is costly: for both primates (Dunbar, 1991 ; Lehmann et al., 2007 ) and humans (Roberts & Dunbar, 2011 ; Miritello et al., 2013 ), relationships require the investment of considerable quantities of time for their maintenance, and this time cost is more or less proportional to the number of individuals involved multiplied by relationship quality (Sutcliffe et al., 2012 ). This is partly because creating and servicing relationships depends on the endorphin system: the more frequently it is activated, the stronger the relationship. We will return to this mechanism below.
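The time-budget logic here (total maintenance cost is roughly the number of partners multiplied by the time each relationship demands) can be sketched with a toy calculation. All figures below are hypothetical illustrations, not empirical estimates from the studies cited:

```python
# Toy model of the time-budget constraint on network size:
# a fixed weekly budget of social time caps how many ties of a
# given quality can be maintained. All numbers are hypothetical.

def max_partners(budget_minutes, minutes_per_tie):
    """Number of relationships a fixed time budget can service."""
    return budget_minutes // minutes_per_tie

budget = 1200                      # a hypothetical 20 hours/week of social time
print(max_partners(budget, 240))   # strong ties (4 h each per week) -> 5
print(max_partners(budget, 24))    # weak ties (24 min each per week) -> 50
```

The point of the sketch is only the linearity: doubling either the number of partners or the per-tie investment doubles the total cost, so stronger ties necessarily come at the expense of network breadth.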

The endogenous stresses that the animals face from living in groups act as a constraint on group size because they create centrifugal forces that, if not defused, will eventually cause the group to break up. In species that do not have bonded social systems (most non-monogamous birds and mammals), these stresses are resolved by individuals simply leaving one group to join a smaller one on an ad hoc basis (the bees-around-a-honeypot model of sociality). This solution is not available to species that live in bonded social systems because of the resistance to individuals transferring between groups created by bonded relationships: members of a group do not tolerate strangers.

Anthropoid primates deal with these stresses by forming defensive alliances mediated by social grooming (Dunbar, 1980 ; Silk et al., 2003 , 2009 ; Wittig et al., 2008 ), and this in turn gives rise to highly structured social networks (Dunbar, 2008 , 2012b ; Lehmann & Dunbar, 2009 ). It is this “decision” to use coalitions as a buffer for the stresses of group living that seems to create the complexity that is widely recognized as characteristic of primate societies. This social world is more complex to handle than the physical world, partly because it is dynamic and in a constant state of flux, and partly because it involves phenomena (other individuals’ mind states) that cannot be perceived directly but instead have to be inferred (Dunbar, 2010a , 2011a , 2012b ). In effect, social systems of this kind are implicit social contracts. For a group to be stable through time, its members have to be willing to allow each other to have a fair (though not necessarily equal!) share of the benefits of sociality. Failure to hold back on prepotent actions that would offer immediate benefits to oneself (such as stealing someone else’s newly discovered food item or displacing someone from a safe roosting site) risks driving others away and destabilizing group cohesion.

One explanation for the grade structure observed in Figure 1 is that this reflects a step-change in the complexity of primate social relationships and the behaviors that underpin them as neocortex volume increases. Indeed, across primates, neocortex size correlates with increasing use of sophisticated mating strategies, larger grooming cliques, higher frequencies of tactical deception, and the formation of coalitions (Pawłowski et al., 1998 ; Kudo & Dunbar, 2001 ; Byrne & Corp, 2004 ; Dunbar & Shultz, 2007 ), as well as increasing complexity of both visual (Dobson, 2009 ) and vocal (McComb & Semple, 2005 ) communication repertoires. One example of this is that cognitively more advanced species like macaques are aware of third-party relationships and refrain from attacking or exploiting another animal when they know that individual has powerful allies, even when those allies are not physically present (Datta, 1983 ). Computational models suggest that managing third-party relationships is more demanding in terms of information processing time than managing simple dyadic relationships (Dávid-Barrett & Dunbar, 2013 ). Similarly, playback experiments have demonstrated that baboons (another cercopithecine) can integrate at least two different relationship dimensions (kinship and dominance) simultaneously, an ability that may be beyond cognitively less well-endowed species (Bergman et al., 2003 ).

There has been a near-universal tendency to assume that the social groups of all animals are “of a kind.” However, in anthropoid primates, grooming networks become increasingly substructured as the number of individuals in the group increases, especially so in species that have larger neocortices (Kudo & Dunbar, 2001 ; Hill et al., 2008 ; Lehmann et al., 2009 ; Lehmann & Dunbar, 2009b ). In effect, these species are able to maintain two qualitatively distinct kinds of relationship simultaneously: intimate relationships with principal grooming partners (allies) and weaker ones with other group members. In this respect, monkey and ape relationships resemble the two-tier structure of human social relationships, where parallel distinctions are drawn between weak and strong “ties” (Granovetter, 1973 ; Sutcliffe et al., 2012 ) and, cutting across the weak/strong divide, between family and friends (Curry et al., 2013 ; Roberts & Dunbar, 2011 ; Roberts et al., 2014 ). This gives the social systems of anthropoid primates (and those of a small number of other mammals) a layered structure (Hill et al., 2008 ) similar to that found in humans (Zhou et al., 2005 ; Hamilton et al., 2007 ; Dunbar et al., 2015 ). While in both humans and primates an individual’s relationships with the other members of their social group can be ranked on a simple continuum based on frequency of interaction (or emotional closeness: Sutcliffe et al., 2012 ; Roberts et al., 2014 ), these nonetheless cluster into quite discrete layers of very distinctive size, as shown in Figure 2 . The numerical sizes of these grouping layers seem to be common to both human social networks and the structure of primate social groups (Hill et al., 2008 ), and one explanation for the differences in social complexity between species may be the number of layers that can be maintained as a coherent, stable system.

Figure 2. The circles of acquaintanceship for normal human adults. Ego indicates the subject of the network. Normal adult humans are embedded in a series of hierarchically inclusive layers of friendship, with each successive layer enclosing a larger number of individuals at a progressively lower level of emotional closeness. The layers have very distinct sizes, with a scaling ratio that approximates three (each layer is three times the size of the layer immediately inside it). The average sizes of each layer are indicated by the numbers against each circle, although there is considerable individual variation. The circle of ~150 corresponds to the number of individuals with whom one has reciprocated relationships of trust, obligation, and reciprocity. Beyond the 150 layer there are at least two further layers: the layer of acquaintances (totaling ~500 individuals) and the number of faces one can put names to (~1500 individuals). While the two innermost layers (at 5 and 15) tend to be densely interconnected and constitute a single subnetwork, the remaining layers typically consist of more isolated sets of subnetworks (work colleagues, different sets of hobby club friends, church friends, distant family, etc.) for whom the only connection is via Ego. Each of the four innermost layers is typically split between extended family members and unrelated friends, with an overall ratio of about 50:50 (Sutcliffe et al., 2012 ).
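The scaling pattern described in the caption can be made concrete with a short sketch. The observed layer sizes are those quoted in the text (5, 15, 50, 150, 500, 1500); the geometric series with ratio 3 is only an idealization, which is why it drifts from the observed values in the outer layers:

```python
# Sketch of the layered network structure: each layer is roughly
# three times the size of the one immediately inside it.

def geometric_layers(innermost=5, ratio=3, n_layers=6):
    """Idealized layer sizes under a constant scaling ratio."""
    return [innermost * ratio**k for k in range(n_layers)]

observed = [5, 15, 50, 150, 500, 1500]   # layer sizes quoted in the text
model = geometric_layers()
print(model)  # [5, 15, 45, 135, 405, 1215]

# Ratios between successive observed layers all cluster around 3:
ratios = [outer / inner for inner, outer in zip(observed, observed[1:])]
print(ratios)  # each value is approximately 3
```

The approximate constancy of the successive ratios, rather than the exact values, is the empirical regularity the caption refers to.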

In humans at least, there is evidence suggesting that the size of an individual’s personal social network correlates with their mentalizing competences, indexed as the ability to solve multiple-individual false belief tasks (Stiller & Dunbar, 2007 ; Lewis et al., 2011 ; Powell et al., 2012 ). Mentalizing, perhaps the archetypal form of social cognition, is the ability to handle other individuals’ mind states simultaneously and forms a naturally recursive sequence from first order intentionality (I know my own mind state) through second order (I know that A knows something—otherwise known as formal theory of mind) to a maximum of around fifth order (I know that A knows that B knows that C knows that D knows something) in most normal adult humans (Stiller & Dunbar, 2007 ). Since mentalizing competences (the number of different mind states one can have in mind at the same time) correlate with the volume of core areas in the frontal lobes (Lewis et al., 2011 ; Powell et al., 2012 , 2014 ), it follows that maintaining larger social groups is more demanding in terms of the need to allocate neural resources to those regions of the brain implicated in this task.
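The recursive character of orders of intentionality described above can be illustrated with a small sketch. The agent names and proposition are hypothetical; the point is only that each additional mind state adds one level of embedding, with fifth order being the typical adult human limit:

```python
# Orders of intentionality as recursive embedding of mind states:
# each agent added wraps the proposition in one more belief clause,
# so the order equals the number of minds being modeled.

def embed(agents, proposition):
    """Build an nth-order intentionality statement (n = len(agents) + 1)."""
    statement = proposition
    for agent in reversed(agents):
        statement = f"{agent} believes that {statement}"
    return "I believe that " + statement  # first order: Ego's own mind state

# Fifth-order intentionality: Ego plus four other minds (names hypothetical).
s = embed(["Anne", "Bob", "Carol", "Dave"], "it will rain")
print(s)
print(s.count("that"))  # 5 belief clauses = fifth-order intentionality
```

Each extra level requires holding one more mind state in play simultaneously, which is why performance degrades sharply beyond fifth order in the studies cited.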

Further evidence that social cognition is likely to impose limits on social group size comes from an agent-based model that used processor time to assess the cognitive demands of different levels of information processing associated with managing relationships: this demonstrated not only that more complex information processing is more demanding but, more importantly, that this in turn sets limits on the size of group that can be maintained (Dávid-Barrett & Dunbar, 2013 ; see also McNally et al., 2012 , Moreira et al., 2013 ). It may be no coincidence, then, that the social brain graph in fact consists of a series of socio-cognitive grades (Dunbar, 1993 , 2011a ; Lehmann et al., 2007 ).

There is evidence that social cognition is itself significantly more demanding than more conventional forms of cognition. We have shown, using both reaction time experiments and fMRI in humans, that mentalizing tasks (those that involve modeling the mental states of other individuals [for more details, see below]) are cognitively more demanding than equivalent non-mentalizing (i.e., purely factual memory) tasks and involve the recruitment of more neural circuitry, and that the magnitude of this difference increases with the complexity of the proposition being processed (Lewis et al., forthcoming ). One reflection of the fact that social cognition may be very costly is that it seems to develop much more slowly than more conventional instrumental cognition. In humans, emotional cue recognition (Deeley et al., 2008 ) and aspects of social cognition such as theory of mind (Blakemore & Choudhury, 2006 ; Henzi et al., 2007 ) can take as long as two decades to mature: their developmental progress seems to map onto the slow process of myelinization in the frontal lobes, which in humans is not completed until well into the third decade (Sowell et al., 2001 , 2003 ; Gogtay et al., 2004 ). Socialization seems to play an important role in this: Joffe ( 1997 ) showed that, across primates, the best predictor of the non-V1 neocortex volume is the length of the period of socialization (the period between weaning and puberty), suggesting that a considerable amount of practice over a lengthy period is required to develop the skills that underpin the social brain. These findings suggest that social skills require conscious thought in frontal lobe units before they eventually become automated and localized elsewhere in the cortex or subcortical regions (in humans, as late as the mid-20s). In other words, merely having a big computer (i.e., brain) is not enough: the hardware requires programming, and this is in large part dependent on extensive social experience. 
This is social learning on a dramatic scale and may explain why social learning appears to be so important in primates (Reader et al., 2011 ). A useful by-product of this is that the cognition that underpins social learning in this context then becomes available for the exchange of factual information about foraging among adults. Although this has sometimes been interpreted as the driver of brain evolution on the basis of correlational evidence (Reader & Laland, 2002 ; Reader et al., 2011 ; Pasquaretta et al., 2015 ), it could, in fact, just as easily be a consequence rather than the cause of brain evolution—a possibility that, surprisingly perhaps, never seems to have been considered.

Neuropsychology and the Social Brain

In primates, the neocortex accounts for a very large proportion of total brain volume (50–80%, compared to 10–40% in all other mammals) (Finlay et al., 2001 ). This probably explains why even total brain volume on its own gives a reasonable correlation with group size and other social variables in primates—subject to some error variance introduced by species like the gorilla and orangutan that have unusually large cerebella and relatively small neocortices and for whom neocortex size gives a significantly better prediction of community size than does total brain size (Dunbar, 1992 , 2011a ). The fit is improved by excluding striate cortex (the primary visual area, V1, in the occipital lobe: see Fig. 3 ) (Joffe & Dunbar, 1997 ), and it is improved still further by narrowing the focus down to the frontal lobes (Dunbar, 2011a ), implying that the automated processing of incoming perceptual stimuli is not itself a major component of the social brain processes—and why would it be, given that it is the meaning attached to these percepts rather than the percepts themselves that lies at the heart of complex sociality? Since the successive visual processing areas (V1 through V5/MT) scale isometrically with each other up through the occipital and parietal lobes (Dougherty et al., 2003 ; Yan et al., 2009 ), it is likely that the fit would be improved still further by excluding these and other basic perceptual processing regions in the brain (i.e., by focusing mainly on the social cognition circuits in the frontal and temporal lobes). Nonetheless, the fact that the brain acts as a distributed processing network may explain why many of the comparative analyses reveal respectable correlations between social behavior and relatively large brain regions like the neocortex.

Figure 3. The main brain regions involved in mentalizing (the “theory of mind network”). PFC, prefrontal cortex; ACC, anterior cingulate cortex (buried within the cortex); TPJ, temporoparietal junction; STS, superior temporal sulcus; V1, primary visual cortex (striate cortex) in occipital lobe. Dashed arrows indicate the principal connections of the “theory of mind” network.

A number of analyses have shown that executive function skills also increase with brain (or brain region) volume (Dunbar et al., 2005 ; Deaner et al., 2006 ; Shultz & Dunbar, 2010b ; Reader et al., 2011 ). Inevitably, these analyses rely on extremely coarse anatomical resolutions and so have not allowed us to narrow down the cortical circuits involved in any detail (although the availability of more sophisticated imaging techniques may offer new opportunities in this respect; see Mars et al., 2014 ). In the only serious attempt to address this issue to date, Passingham and Wise ( 2012 ) concluded that some brain regions (notably the dorsal prefrontal cortex and the frontal pole [Brodmann area 10 at the very center of the forehead]; Fig. 3 ) are crucial for causal evaluation and strategic planning in anthropoid primates. However, their analysis was inevitably based on a very small sample of species. That said, the question as to what function(s) these competences subserve remains open: they may well be generic skills required for all forms of decision-making. All the experimental tests on which these studies are based (“odd-one-out” problems, mapping tasks, analogical reasoning, causal reasoning) involve tasks that are essentially instrumental (mainly foraging tasks) rather than social ones. The problem for comparative psychology has always been that genuinely social tasks are not easy to devise: they tend to have long time delays to their outcomes (sometimes on the scale of a lifetime; see Silk et al., 2003 , 2009 ), and experimentalists require an immediately measurable outcome. This has been compounded by a long-held and widespread assumption that, in the wild, animals do very little other than sleep and search for food. Historically, there has been no incentive to devise more complex tasks.

If the different social and ecological uses to which primates put their brains depend on essentially the same cognitive mechanism (and, in particular, the same second-order cognitive processes such as causal reasoning, one-trial learning, analogical reasoning, comparison between two or more alternative projections into the future; Passingham & Wise, 2012 ), it may not be too surprising that there is evidence to support both the instrumental and the social hypotheses. However, a task analysis suggests that, while certain kinds of cognition are likely to be common to all the functional hypotheses for primate brain evolution, there is a natural asymmetry among the hypotheses. The kinds of cognition required to support bonded relationships may allow social (i.e., cultural) transmission of information or novel foraging behaviors, but the reverse is probably not the case; similarly, the kinds of cognition required to support social transmission of foraging skills would likely allow individual trial-and-error learning of foraging behavior, but the reverse is not the case. This is especially likely to be true to the extent that the real complexity of social relationships depends on the need to model other individuals’ minds and behavior in a virtual mental state space, something that seems to be cognitively very demanding even for humans (Lewis et al., forthcoming ). Some evidence to support this suggestion is provided by one of the few experimental studies to compare social and instrumental cognitive skills across primate species directly: Herrmann et al. ( 2007 ) found striking differences between humans and great apes in performance on social tasks but much less so on instrumental tasks.

This suggests (1) that the cognitive demands of instrumental tasks are significantly less than those of social tasks and (2) that the ability to manage social tasks depends crucially on frontal lobe volume (in particular). It would seem that only the social hypotheses would naturally provide for the other hypotheses as emergent by-products. This is not to say that cognitive evolution did not begin with solving simple ecological problems like food-finding (it almost certainly did), but rather to suggest that the demands of social cognition have resulted in additional more sophisticated cognitive competences being added to this mix and that these have, in turn, then allowed more sophisticated food-finding strategies.

In the previous section, it was suggested that mentalizing may be central to complex sociality in humans because it allows individuals to work with virtual representations of other individuals in a mental state space. Meta-analyses of a large number of neuroimaging studies of theory of mind in humans have identified the medial and/or orbitofrontal prefrontal cortex (PFC) as being differentially activated during mentalizing tasks in more than 90% of studies, the temporoparietal junction in 58%, the anterior cingulate cortex in 55%, and superior temporal sulcus (STS) in 50%; other regions that were less commonly activated included the amygdala and the insula (13% of studies in both cases) (Carrington & Bailey, 2009 ; see also Gallagher & Frith, 2003 ; van Overwalle, 2009 ; Apperly, 2012 ). Figure 3 shows the relative locations of these regions in the brain. It is well known that lesions in the prefrontal cortex specifically disrupt social skills, whereas those elsewhere typically do not (Kolb & Whishaw, 1996 ), while the role of the prefrontal cortex and the temporoparietal areas in managing false belief tasks (the benchmark for theory of mind) has been confirmed experimentally using transcranial magnetic stimulation to knock these regions out during experimental tasks (Costa et al., 2008 ). Recently, Makinodan et al. ( 2012 ) reported that mice that had been socially isolated immediately after weaning exhibited irrecoverable functional deficits in both the prefrontal cortex and its myelination, indicating that there may be a critical period that is vital for neurotypical development in a region that is crucial for normal adult social behavior.

This network also appears to be present in at least the catarrhine primates (Rushworth et al., 2013 ), although it is unlikely that it is capable of producing fully functional theory of mind sensu stricto in these species. What it probably does allow is perspective-taking, and that may be an important evolutionary and developmental precursor for full-blown theory of mind as well as being functionally essential for much of what is involved in the social interactions of nonhuman primates. There is considerable evidence that great apes, at least, are able to take others’ perspective into account (Hare et al., 2000 , 2001 ), and perspective-taking is probably crucial to managing monogamous pair-bonded relationships, since monogamy requires close coordination between the pair in a way that is not as necessary in the more fluid social systems that characterize most birds and mammals. Perspective-taking may thus have provided the initial step that started the evolutionary process that eventually gave rise to the evolution of full-blown mentalizing (Dunbar, 2011b ). This would explain why large brains are associated with monogamous mating systems rather than with group size in birds and non-primate mammals (Shultz & Dunbar, 2007 ; Pérez-Barbería et al., 2007 ).

In humans, damage to these prefrontal regions is associated with dramatic (and usually catastrophic) changes in personality and empathy, commonly resulting in socially inappropriate behavior (Adolphs, 1999 ) as well as more directly utilitarian responses on emotionally salient moral dilemmas such as the “trolley task” (Koenigs et al., 2007 ). More broadly, there is evidence from clinical studies that lesions in the prefrontal cortex tend to disrupt the processing (manipulation) of knowledge as well as social skills, whereas lesions in the temporal cortex tend to disrupt factual knowledge but leave the processing of social knowledge unaffected (Roca et al., 2010 ; Woolgar et al., 2010 ). Low densities of gray matter in the prefrontal cortex have also been linked to socially dysfunctional conditions such as schizophrenia (Lee et al., 2004 ; Yamada et al., 2007 ). More importantly for present purposes, individual differences in mentalizing competences in normal human adults correlate with the volume of neural matter in the key regions of the theory of mind network, especially those in the frontal lobes (Lewis et al., 2011 ; Powell et al., 2010 , 2014 ).

Seeley et al. ( 2007 ) have suggested that the regions associated with mentalizing constitute two distinct functional networks: an “executive control” network (involving mainly the dorsolateral prefrontal cortex and parietal areas) and an “emotional salience” network (involving mainly the anterior insular cortex and the anterior cingulate cortex, the amygdala and the hypothalamus), although the former may be specifically associated with rational thinking (“fluid IQ”) rather than social cognition per se (Woolgar et al., 2010 ). Nonetheless, emotion and cognition are not entirely independent of each other: the anterior insula and the medial prefrontal cortex are included in both networks, suggesting some level of interaction between the two networks (Craig, 2009 ).

The prefrontal cortex seems to be crucially involved in the management of social relationships in both humans (Powell et al., 2010 , 2012 , 2014 ; Lewis et al., 2011 ; Kanai et al., 2012 ) and macaques (Sallet et al., 2013 ). More importantly, perhaps, Powell et al. ( 2012 ) have shown, using path analysis, that there is a clear causal sequence here: individual differences in orbitofrontal cortex volume determine mentalizing competences (how well individuals do on multi-level/multi-individual false belief tasks), and mentalizing competences in turn determine the individual’s social network size. In humans, the medial and mid-prefrontal cortex is also associated with moral judgment, critical assessment, and core executive functions related to self-control, deception, and lying (MacDonald et al., 2000 ; Karton & Bachmann, 2011 ), all of which are associated with both social skills in general and theory of mind in particular.

This relationship between mentalizing competences and the volume of the frontal lobe in humans seems to be mirrored in the comparative evidence from primates. It is generally accepted that monkeys do not have theory of mind (second order intentionality) and are thus effectively first order intentional (they are aware of their own mental states, but not those of other individuals). In contrast, there is some evidence to suggest that great apes do understand others’ mind states (chimpanzees: O’Connell & Dunbar, 2003 ; Hare et al., 2000 , 2001 ; Crockford et al., 2012 ; orangutans: Cartmill & Byrne, 2007 ): although they are certainly not as good at formal theory of mind (the ability to pass false belief tests) as 6-year-old children (almost all of whom are fully expert on the task), they are about as good as 4-year-olds (most of whom are on the verge of acquiring this skill). By contrast, normal adult humans have been repeatedly shown to cope with fifth order intentionality (Kinderman et al., 1998 ; Stiller & Dunbar, 2007 ; Powell et al., 2010 ). For the limited data available, these competence levels turn out to map linearly onto frontal lobe volume (Fig. 4 ). Figure 4 also plots the putative position of other monkey and great ape species for whom frontal lobe volume data are available, on the assumption that their mentalizing competences are the same as those of the other members of their respective taxa. Notice how all these points cluster very tightly around the regression line: no monkey has a frontal lobe volume large enough to move it up to second order, and no great ape has one small enough to move it down to first order or large enough to move it up to third order. These data seem to tell us that mentalizing competences (whatever they actually are) are a function of the absolute volume of the frontal lobes (and most likely of gray matter regions within the prefrontal cortex).
It is important to appreciate that we still do not really understand what theory of mind (or mentalizing, more generally) actually involves cognitively (Roth & Leslie, 1998 ). Nonetheless, it seems to provide us with a convenient and reliable natural scale of social cognitive abilities, whatever the actual cognitive mechanisms involved may be.

Figure 4. Mentalizing competences (indexed as the maximum achievable order of intentionality) of six Old World monkey and four great ape species, plotted against frontal lobe volume. Monkeys are generally assumed to be first order intentional; experimental evidence suggests that chimpanzees and orangutans are just about second order intentional, whereas adult humans are fifth order intentional. Species for whom mentalizing competences have been estimated experimentally (left to right: chimpanzees, orangutans, and humans) are indicated by solid symbols; species for whom mentalizing competences are not known but who are assumed to have the same mentalizing competences as other members of their taxonomic family are indicated by open symbols. (Redrawn from Dunbar, 2009 . Frontal lobe volume data from Bush & Allman, 2004 .)
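The reading-off procedure behind Figure 4 can be sketched numerically: fit a line to the species whose competences have been estimated experimentally, then place the volume-only species on that line. A minimal sketch in Python, with hypothetical frontal lobe volumes and competence levels standing in for the actual data of Bush & Allman (2004) and Dunbar (2009):

```python
# Sketch of the Figure 4 logic: fit intentionality level against frontal
# lobe volume for species with measured competences, then read off species
# known only by volume. All numbers are illustrative placeholders, not the
# published values.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# (frontal lobe volume in cc, maximum intentionality level) -- hypothetical
measured = [(75.0, 2), (79.0, 2), (346.0, 5)]   # orangutan, chimpanzee, human
a, b = fit_line([v for v, _ in measured], [lvl for _, lvl in measured])

def predicted_level(frontal_cc):
    """Place a species on the fitted line from its frontal lobe volume."""
    return a + b * frontal_cc
```

On these placeholder values the fitted line returns roughly level 2 for ape-sized frontal lobes and level 5 for human-sized ones, which is the tight clustering pattern the figure describes.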

Indeed, the conventional mentalizing (or intentionality) scale essentially treats all competences below full theory of mind (i.e., second order intentionality) as a homogeneous set. This is almost certainly a radical oversimplification. Sperber & Wilson ( 1986 ) argued that there is a series of finer scale gradations at the lower end of this scale (see also Gärdenfors, 2012 ). This makes sense in the light of the fact that, behaviorally, some species of animals (baboons, macaques, spider monkeys) seem to be socially and cognitively more complex than other species (e.g., colobines, howlers, antelope) (Deaner et al., 2006 , 2007 ; Shultz & Dunbar, 2010b ), despite the fact that on the conventional scale all would be regarded as first order intentional. Unpacking the lower end of the scale may allow us to evaluate better the cognitive differences between the different nonhuman species.

Passingham and Wise ( 2012 ) have pointed out that anthropoid primates are characterized by the evolution of entirely new regions in the prefrontal cortex (in particular Brodmann area 10, the frontal pole; Fig. 3 ) that are not present in prosimians or other mammals (see also Sallet et al., 2013 ). They argue that these new regions allowed monkeys and apes to engage in cognitive strategies that other mammals (including prosimian primates) are unable to master. These include one-trial learning (as opposed to more laborious forms of association learning), propositional reasoning, and the capacity to compare the future consequences of two or more alternative behavioral strategies (Passingham & Wise, 2012 ). Among the anthropoid primates, it seems that only the callitrichids (marmosets and tamarins) lack area 10—which might account for this taxon’s unusually labile social system, which can flip rapidly between monogamy, polygamy, polygynandry, and polyandry (Dunbar, 1995a ,b; Opie et al., 2013 ), and the fact that their neocortex:group size ratio is completely out of line with those of all obligately monogamous primates (Dunbar, 2010b ).

So far in this section, we have focused in a rather conventional way on the neuroanatomy of sociality. There is, however, an important aspect of the neurobiology of primate sociality that we need to consider, and this has to do with the role played by neuroendocrines. Much fuss has been made of the role of oxytocin in social relationships (Insel & Young, 2001 ); this mechanism is certainly widely distributed among mammals and has been shown to correlate with some aspects of social behavior in both chimpanzees (Crockford et al., 2013 , 2014 ) and humans (Kosfeld et al., 2005 ) (for an overview, see Dunbar 2010c ). However, the oxytocin system habituates very quickly (Dunbar, 2010c ). More importantly, it is an endogenous response that appears to be insensitive to relationship quality or quantity: it causes individuals to act more or less affiliatively depending on the expression of the relevant gene, but it does not allow them to influence the responses of the individuals with whom they interact. It has been argued that the very unusual kind of bonded social relationships that are found in anthropoid primates (Silk, 2002 ; Shultz & Dunbar, 2010a ; Massen et al., 2010 ) necessitated a more robust bonding mechanism, and this involved exploiting the endorphin system (van Wimersma Greidanus et al., 1988 ; Panksepp et al., 1997 ; Depue & Morrone-Strupinsky, 2005 ; Curley & Keverne, 2005 ; Broad et al., 2006 ; Barr et al., 2008 ; Dunbar, 2010b ; Machin & Dunbar, 2011 ; Resendez et al., 2013 ).

In primates, endorphin activation is triggered by social grooming (Keverne et al., 1989 ), and we have been able to show, using positron emission tomography (PET), that light stroking of precisely the kind that so characteristically defines social grooming in primates also triggers endorphin activation in the human brain, and the frontal lobe in particular (Nummenmaa et al., under review). It seems likely that this mechanism is mediated by the afferent c-tactile neurons, a unique set of unmyelinated (hence slow) neurons that respond only to slow stroking and which are not associated with a return motor loop from the brain (Olausson et al., 2010 ; Morrison, 2012 ; Vrontou et al., 2013 ). The significance of this is that the endorphin system responds exogenously (i.e., it is triggered in the recipients of grooming by their social partners, rather than merely endogenously in the groomer, as is the case for oxytocin) and so is more responsive to both the quantity of time invested in a relationship and the number of social partners. An endorphin agonist, such as morphine, increases the attractiveness ratings of faces as well as the motivation for continuing to view them, whereas antagonists like naltrexone decrease both (Chelnokova et al., 2014 ). Similarly, PET studies reveal that the density of μ-receptors (the opioid receptors that have a particular affinity for β-endorphins) in core areas of the brain correlates with both personal social network size (Nummenmaa et al., under review) and an individual’s attachment style (Nummenmaa et al., in press). These findings suggest a central role for endorphins in the processes that underpin social relationships.

Building a close relationship with someone requires time, and there is a strong correlation between time devoted to socializing with an individual and willingness to support or offer help to that individual in both monkeys (Dunbar, 1980 , 2012a ) and humans (Roberts & Dunbar, 2011 ; Curry & Dunbar, 2013 ; Sutcliffe et al., 2012 ; Curry et al., 2013 ). By triggering endorphin activation, time spent interacting—grooming in the case of primates; laughter (Dunbar et al., 2012b ), affective touch (Nummenmaa et al., in press), and perhaps other activities in the case of humans—probably sets up an emotional attachment that allows a very rapid response based on a quantitative index of the quality of the relationship.

In sum, primate social bonding seems to involve a two-process mechanism. In effect, the endorphin system is used to create an internal psychopharmacological platform that enables the individuals to develop a more cognitive long-term relationship that involves reciprocity, obligation, and trust (Sutcliffe et al., 2012 ). The latter, of course, is where the social brain comes in, but it is important to appreciate that beneath the simple group–brain size correlation there is a more complex neurobiological story as well as a more complex behavioral superstructure that is supported by these neurological mechanisms.

Neuropsychological research offers considerable potential for understanding both the processing demands of different kinds of cognition and how these relate to neurological pathways in the brain, and hence to the volumetric demands on different brain units and their interconnections (see also Mars et al., 2014 ). Although there has been considerable interest in social cognition in the recent neuroimaging literature, much of it has typically been concerned with judgments of trustworthiness or with reward and punishment in simple dyadic contexts (e.g., Knoch et al., 2006 ; Behrens et al., 2008 ; Lebreton et al., 2009 ). While this clearly provides valuable insight into how such judgments are made, it does not really capture the richness of the social world in which humans and other primates live. Nor does it engage with the question of just how and why humans differ from other primates, or why anthropoid primates differ from other mammals not just in cognitive abilities but also in their social style. It is these issues that need to be addressed, and so far they have been conspicuous by their absence from the literature on brain evolution.

Social Cognition and Human Evolution

Human evolution has always been viewed through the lens of anatomy and archaeology, with a clear focus on the “stones and bones” of the archaeological record. While this has spawned an interest in the cognitive aspects of human evolution (sometimes referred to as cognitive archaeology; Renfrew & Zubrow, 1994 ), in practice the focus has been on task analyses of the demands of tool-making (e.g., Gowlett, 2006 ). More recently, attempts have been made to relate these to mentalizing abilities (Barham, 2010 ; Cole, 2012 ). However, Gamble et al. ( 2011 ) and Gowlett et al. ( 2012 ) remind us that the processes of evolution, and human evolution in particular, do not proceed through material culture as such but through the behavior and minds of the people who made the material culture. Here, social cognition is likely to play an especially important role, and, difficult as this may be to study, it needs to be given much more attention.

Although archaeologists have shied away from grappling with social and cognitive evolution, our growing knowledge of the finer details of the cognitive differences between humans and other primates and, at the level of individual differences, within humans offers the possibility of a more principled approach. Given the explicit quantitative relationships between social and cognitive traits and brain (or brain region) volumes, it may, for example, be possible to make more informed inferences about human cognitive evolution. We do not, of course, have access to soft tissue morphology from fossil species, but there has been a long tradition within paleoanthropology of making inferences about brain composition from the impressions created on the inside of the skull by the brain (Bruner, 2010 ; Bruner et al., 2003 ). More importantly, perhaps, the tight allometric scaling between brain regions in living primates allows us to make inferences about the sizes of these units in fossil specimens, given observed cranial volumes. It is, of course, necessary to be cautious in interpreting individual cases, given that there are well-known exceptions to these allometric relationships in living primates (e.g., the large cerebella and small neocortices of the gorilla and orangutan that we noted earlier). Other exceptions include the impact that latitude has on the size of the visual system in both modern humans (Pearce & Dunbar, 2012 ; Pearce & Bridge, 2013 ) and Neanderthals (Pearce et al., 2013 ), which in the latter case at least results in a smaller neocortex than would be predicted on the basis of cranial volume. Nonetheless, such extrapolations from general equations can tell us something about the overall pattern of evolution.
What is important here is that these trajectories are not open-ended: we know roughly where the trajectory started (essentially, the brain composition and cognition of great apes) and where it ended (those of modern humans); our problem is to infer how the changes that must have occurred are strung out between these two endpoints. This will be illustrated here with just two contrasting examples.

The easiest and most secure extrapolation is that for social group size, since the social brain relationship is robust and empirically well substantiated. Using standard allometric equations to interpolate from cranial volume to neocortex volume, we can estimate the community sizes for individual fossil specimens of the main hominin species (Fig. 5 ). The community sizes for living chimpanzees are shown on the left side of the graph for comparison. Two things may be noted. First, for most of early human evolution (the australopithecine phase, represented by the genus Australopithecus and its allies) predicted community sizes do not differ from those observed in living chimpanzees. In effect, early hominins were just ordinary great apes. Second, community size undergoes a rapid increase with the appearance of the genus Homo at around 2 million years ago, stabilizes for about a million and a half years, and then increases rapidly and exponentially through archaic humans ( Homo heidelbergensis and allies) into modern humans. To the extent that community size represents the outcome of the cognitive processes that underpin the social brain, these data reflect the pattern of change in cognition over time.

Figure 5. Median (±50% and 95% ranges) social group size for the main hominin species, in temporal order of appearance. Group size is estimated by interpolating through a series of equations from cranial volume, via brain size and neocortex size, to group size (using the relationship shown for apes in Fig. 1 ). The equations are given in Aiello & Dunbar ( 1993 ) and Gowlett et al. ( 2012 ). The values are for individual fossil specimens. The equivalent values for individual chimpanzee populations, based on actual community sizes, are shown on the left. (After Gowlett et al., 2012 .)
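The interpolation chain described in the caption can be sketched as a pipeline of log-log (allometric) steps. The sketch below is illustrative only: the final step uses the form of Dunbar's (1992) ape regression of group size on neocortex ratio, but the intermediate coefficients are hypothetical placeholders, not the published equations of Aiello & Dunbar (1993).

```python
import math

def brain_from_cranial(cranial_cc):
    # Hypothetical allometric step: brain volume from cranial capacity.
    return cranial_cc ** 0.995

def neocortex_ratio(brain_cc):
    # Hypothetical step: neocortex volume as an allometric function of brain
    # volume, expressed as the neocortex : rest-of-brain ratio.
    neocortex = 0.565 * brain_cc ** 1.05
    return neocortex / (brain_cc - neocortex)

def community_size(cranial_cc):
    # Final step: group size from neocortex ratio, in the log-log form of
    # Dunbar's (1992) ape equation (coefficients indicative only).
    cr = neocortex_ratio(brain_from_cranial(cranial_cc))
    return 10 ** (0.093 + 3.389 * math.log10(cr))
```

Feeding chimpanzee-sized crania (~400 cc) through this chain yields communities of a few dozen, while modern-human-sized crania (~1,450 cc) yield values in the region of 150–200, reproducing the overall shape of Figure 5 without claiming its exact numbers.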

We can, however, go one step further by considering cognition directly in the form of mentalizing competences, bearing in mind that these are almost certainly simply an emergent index of more conventional forms of cognition. Given that these appear to correlate linearly with the size of the frontal lobe, and, in general, brain units all correlate with total brain volume, it is in principle a simple matter of interpolating through a series of equations from cranial volume to mentalizing abilities. These are plotted in the same way for all major hominin species in Figure 6 . The values for Neanderthals are corrected to take account of their larger occipital lobes and smaller frontal lobes, reflecting their relatively larger visual system (Pearce et al., 2013 ). Once again, our benchmarks are provided by great apes at level 2 intentionality and modern humans at level 5, and our problem is simply to decide the pattern of change between these two fixed points.

Figure 6. Median (±50% and 95% ranges) mentalizing competences, indexed as the maximum achievable level of intentionality, for the main hominin species, in temporal order of appearance. Mentalizing competences are estimated by interpolating through a series of equations from cranial volume, via brain size and frontal lobe volume, to intentionality level (using the relationship for Fig. 5 , and the equation for mentalizing competences from Dunbar 2010a ). The values are for individual fossil specimens. (After Dunbar, 2015 .)
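The mentalizing chain works the same way, with the frontal lobe substituted for the neocortex and the linear competence mapping of Fig. 4 as the final step. Again a hedged sketch with placeholder coefficients (the published equations are in Dunbar, 2010a), anchored so that great apes come out near level 2 and modern humans near level 5; a simple scaling factor stands in for the Neanderthal visual-system correction.

```python
def frontal_lobe_cc(cranial_cc):
    # Hypothetical allometric step: frontal lobe volume from cranial capacity.
    return 0.08 * cranial_cc ** 1.15

def intentionality_level(cranial_cc, frontal_scaling=1.0):
    # Linear mapping of competence on frontal lobe volume (cf. Fig. 4);
    # frontal_scaling < 1 crudely models the relatively smaller Neanderthal
    # frontal lobes (Pearce et al., 2013). Coefficients are placeholders.
    return 1.12 + 0.0112 * frontal_scaling * frontal_lobe_cc(cranial_cc)
```

With these placeholders, an ape-sized cranium (~400 cc) comes out near level 2 and a modern-human cranium (~1,450 cc) near level 5, while applying a sub-unity frontal scaling to a Neanderthal-sized cranium pulls the estimate below level 5, as in Figure 6.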

Two points may be noted from this graph. First, once again, australopithecines were simply jobbing great apes, with no particular pretensions to advanced cognition. Second, all fossil anatomically modern humans (i.e., members of our own species) typically achieve level 5 intentionality, but no archaic humans (including Neanderthals) were likely to have done so. To be sure, all of these would have made level 4 intentionality, which, in the grand scheme of things, is itself pretty impressive: they would not have been intellectual slouches by any means. In cognitive terms, they would have been in the same bracket as the lower end of the normal distribution for modern human adults, and at about the same intellectual level as young teenagers. However, this key difference between archaic and modern humans would have had crucial implications in respect to their capacities for both language and culture.

In normal adult humans, individual differences in the ability to manage complex multi-clause sentences correlate one-to-one with mentalizing competences (Oesch & Dunbar, under review). In other words, mentalizing competences seem to determine how complex our language can be, and this would have had inevitable consequences both for the length of the propositional chains that Neanderthals could have managed and, hence, for the complexity of the stories they told. It may also have had implications for the complexity of the culture that these species would have been able to produce, and this at least seems to be borne out by the archaeological evidence. Attempts to claim that Neanderthal culture was as complex as that of contemporary anatomically modern humans (e.g., Zilhão et al., 2010 ) notwithstanding, the fact is that neither the Neanderthals nor the other archaic humans produced cultural artefacts that were nearly as sophisticated as those of contemporary anatomically modern humans (Klein, 1999 ). Neanderthal tools lacked both the technical sophistication of those developed by modern humans (multi-component tools like bows and arrows or spear-throwers) and the capacity to miniaturize (fine bone and flint points that functioned as arrowheads, buttons, awls, needles), and there is no evidence at all to suggest that they ever produced the kinds of “frivolous” material culture (Venus figurines, toys, cave paintings) that modern humans began to produce in abundance around the time the Neanderthals went extinct (Dunbar, 2015 ). This may be associated with the fact that several genes associated with both brain enlargement and neural efficiency in humans show evidence for strong recent selection (Burki & Kaessmann, 2004 ; Evans et al., 2005 ; Mekel-Bobrov et al., 2005 ; Uddin et al., 2008 ; Wang et al., 2008 ).
This does not, of course, mean that Neanderthals were, as a result, in any sense intellectually primitive: it simply means they were not yet quite in the same league as modern humans, and this necessarily has consequences for what they could accomplish in social, cultural, and ecological terms.

On a more general note, human evolution provides a framework within which modern human behavior and cognition can be understood. It can tell us why we ended up the way we are, and so provide insights into the design, and perhaps flexibility, of the human mind. The importance of this historical framework is frequently overlooked in psychology, with its emphasis on the mechanisms and development of behavior in the here and now. Asking how and why we got to be the way we are can tell us a great deal about those mechanisms, especially when seen against a background of primate cognitive and social evolution. And it should remind us, above all, that human social evolution, like that of all primates, is not simply about individual traits but about how these traits enable us to live in an extensive, complex, highly dynamic social world.

  • Abbott, D. H. , Keverne, E. B. , Moore, G. F. , & Yodyinguad, U. (1986). Social suppression of reproduction in subordinate talapoin monkeys, Miopithecus talapoin . In J. Else & P. C. Lee (Eds.), Primate ontogeny (pp. 329–341). Cambridge, U.K.: Cambridge University Press.
  • Adolphs, R. (1999). Social cognition and the human brain. Trends in Cognitive Science , 3 , 469–479.
  • Aiello, L. C. , & Dunbar, R. I. M. (1993). Neocortex size, group size and the evolution of language. Current Anthropology , 34 , 184–193.
  • Apperly, I. A. (2012). What is “theory of mind”? Concepts, cognitive processes and individual differences. Quarterly Journal of Experimental Psychology , 65 , 825–839.
  • Barham, L. (2010). A technological fix for “Dunbar’s dilemma”? In R. I. M. Dunbar , C. Gamble , & J. A. J. Gowlett (Eds.), Social brain, distributed mind (pp. 371–394). Oxford: Oxford University Press.
  • Barr, C. S. , Schwandt, M. L. , Lindell, S. G. , Higley, J. D. , Maestripieri, D. , Goldman, D. , et al. (2008). Variation at the mu-opioid receptor gene (OPRM1) influences attachment behavior in infant primates. Proceedings of the National Academy of Sciences, USA , 105 , 5277–5281.
  • Barton, R. A. (1996). Neocortex size and behavioural ecology in primates. Proceedings of the Royal Society, London , 263B , 173–177.
  • Barton, R. A. , & Dunbar, R. I. M. (1997). Evolution of the social brain. In A. Whiten & R. Byrne (Eds.), Machiavellian intelligence II (pp. 240–263). Cambridge, U.K.: Cambridge University Press.
  • Behrens, T. E. J. , Hunt, L. T. , Woolrich, M. W. , & Rushworth, M. F. S. (2008). Associative learning of social value. Nature , 456 , 245–250.
  • Bergman, T. J. , Beehner, J. C. , Cheney, D. L. , & Seyfarth, R. M. (2003). Hierarchical classification by rank and kinship in baboons. Science , 302 , 1234–1236.
  • Blakemore, S.-J. , & Choudhury, S. (2006). Development of the adolescent brain: Implications for executive function and social cognition. Journal of Child Psychology and Psychiatry , 47 , 296–312.
  • Bowman, L. A. , Dilley, S. , & Keverne, E. B. (1978). Suppression of oestrogen-induced LH surges by social subordination in talapoin monkeys. Nature , 275 , 56–58.
  • Broad, K. D. , Curley, J. P. , & Keverne, E. B. (2006). Mother-infant bonding and the evolution of mammalian social relationships. Philosophical Transactions of the Royal Society , London , 361B, 2199–2214.
  • Brothers, L. (1990). The social brain: A project for integrating primate behaviour and neurophysiology in a new domain. Concepts in Neuroscience , 1 , 27–51.
  • Bruner, E. (2010). Morphological differences in the parietal lobes within the human genus: A neurofunctional perspective. Current Anthropology , 51 , S77–S88.
  • Bruner, E. , Manzi, G. , & Arsuaga, J. L. (2003). Encephalization and allometric trajectories in the genus Homo : Evidence from the Neandertal and modern lineages. Proceedings of the National Academy of Sciences , USA , 100 , 15335–15340.
  • Burki, F. & Kaessmann, H. (2004). Birth and adaptive evolution of a hominoid gene that supports high neurotransmitter flux. Nature Genetics , 36 , 1061–1063.
  • Bush, E. C. , & Allman, J. M. (2004). The scaling of frontal cortex in primates and carnivores. Proceedings of the National Academy of Sciences, USA , 101 , 3962–3966.
  • Byrne, R. W. , & Corp, N. (2004). Neocortex size predicts deception rate in primates. Proceedings of the Royal Society, London , 271B , 1693–1699.
  • Byrne, R. W. , & Whiten, A . (Eds.). (1988). Machiavellian intelligence . Oxford: Oxford University Press.
  • Byrne, R. W. , & Whiten, A. (1992). Cognitive evolution in primates: Evidence from tactical deception. Man , 27 , 609–627.
  • Carrington, S. J. , & Bailey, A. J. (2009). Are there Theory of Mind regions in the brain? A review of the neuroimaging literature. Human Brain Mapping , 30 , 2313–2335.
  • Cartmill, E. A. , & Byrne, R. B. (2007). Orangutans modify their gestural signaling according to their audience’s comprehension. Current Biology , 17 , 1–4.
  • Chelnokova, O. , Laeng, B. , Eikemo, M. , Riegels, J. , Løseth, G. , Maurud, H. , et al. (2014). Rewards of beauty: The opioid system mediates social motivation in humans. Molecular Psychiatry , 19 , 746–747.
  • Cohen, E. , Ejsmond-Frey, R. , Knight, N. , & Dunbar, R. I. M. (2010). Rowers’ high: Behavioural synchrony is correlated with elevated pain thresholds. Biology Letters , 6 , 106–108.
  • Cole, J. N. (2012). The Identity Model: A theory to access visual display and hominin cognition within the Palaeolithic. Human Origins , 1 , 24–40.
  • Costa, A. , Torriero, S. , Olivieri, M. & Caltagirone, C. (2008). Prefrontal and temporo-parietal involvement in taking others’ perspective: TMS evidence. Behavioral Neurology , 19 , 71–74.
  • Craig, A. D. (2009). How do you feel—now? The anterior insula and human awareness. Nature Reviews Neuroscience , 10 , 59–70.
  • Crockford, C. , Deschner, T. , Ziegler, T. E. , & Wittig, R. M. (2014). Endogenous peripheral oxytocin measures can give insight into the dynamics of social relationships: A review. Frontiers in Behavioral Neuroscience , 8 , 68.
  • Crockford, C. , Wittig, R. M. , Langergraber, K. , Ziegler, T. , Zuberbühler, K. , & Deschner, T. (2013). Urinary oxytocin and social bonding in related and unrelated chimpanzees. Proceedings of the Royal Society, London , 280B , 20122765.
  • Crockford, C. , Wittig, R. M. , Mundry, R. , & Zuberbühler, K. (2012). Wild chimpanzees inform ignorant group members of danger. Current Biology , 22 , 142–146.
  • Curley, J. P. & Keverne, E. B. (2005). Genes, brains and mammalian social bonds. Trends in Ecology and Evolution , 20 , 561–567.
  • Curry, O. , & Dunbar, R. I. M. (2013). Do birds of a feather flock together? The relationship between similarity and altruism in social networks. Human Nature , 24 , 336–347.
  • Curry, O. , Roberts, S. B. G. , & Dunbar, R. I. M. (2013). Altruism in social networks: Evidence for a “kinship premium.” Brit. J. Psychol. , 104 , 283–295.
  • Datta, S. (1983). Relative power and the acquisition of rank. In R. A. Hinde (Ed.) Primate social relationships (pp. 103–112). Oxford: Blackwells.
  • Dávid-Barrett, T. , & Dunbar, R. I. M. (2013). Processing power limits social group size: Computational evidence for the cognitive costs of sociality. Proceedings of the Royal Society, London , 280B , 20131151.
  • Deaner, R. O. , Isler, K. , Burkart, J. , & van Schaik, C. P. (2007). Overall brain size, and not encephalisation quotient, best predicts cognitive ability across non-human primates. Brain, Behavior and Evolution , 70, 115–124.
  • Deaner, R. O. , van Schaik, C. P. , & Johnson, V. E. (2006). Do some taxa have better domain-general cognition than others? A meta-analysis of nonhuman primate studies. Evolutionary Psychology , 4 , 149–196.
  • Deeley, Q. , Daly, E. , Asuma, R. , Surguladze, S. , Giampietro, V. , Brammer, M. , et al. (2008). Changes in male brain responses to emotional faces from adolescence to middle age. NeuroImage , 40 , 389–397.
  • Depue, R. A. , & Morrone-Strupinsky, J. V. (2005). A neurobehavioral model of affiliative bonding: implications for conceptualizing a human trait of affiliation. Behavioral and Brain Sciences , 28 , 313–395.
  • Dobson, S. D. (2009). Socioecological correlates of facial mobility in nonhuman anthropoids. American Journal of Physical Anthropology , 139 , 413–420.
  • Dougherty, R. F. , Koch, V. M. , Brewer, A. A. , Fischer, B. , Modersitzki, J. , & Wandell, B. A. (2003). Visual field representations and locations of visual areas V1/2/3 in human visual cortex. Journal of Vision , 3 , 586–598.
  • Dunbar, R. I. M. (1980). Determinants and evolutionary consequences of dominance among female gelada baboons. Behavioral Ecology and Sociobiology , 7 , 253–265.
  • Dunbar, R. I. M. (1988). Primate social systems . London: Chapman & Hall.
  • Dunbar, R. I. M. (1991). Functional significance of social grooming in primates. Folia Primatologica , 57 , 121–131.
  • Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution , 22 , 469–493.
  • Dunbar, R. I. M. (1993). Coevolution of neocortex size, group size and language in humans. Behavioral and Brain Sciences , 16 , 681–735.
  • Dunbar, R. I. M. (1995a). The mating system of Callitrichid primates. I. Conditions for the coevolution of pairbonding and twinning. Animal Behaviour , 50 , 1057–1070.
  • Dunbar, R. I. M. (1995b). The mating system of Callitrichid primates. II. The impact of helpers. Animal Behaviour , 50 , 1071–1089.
  • Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology , 6 , 178–190.
  • Dunbar, R. I. M. (2008). Mind the gap: Or why humans aren’t just great apes. Proceedings of the. British Academy , 154 , 403–423.
  • Dunbar, R. I. M. (2009). Why only humans have language. In R. Botha & C. Knight (Eds.) The prehistory of language (pp. 12–35). Oxford: Oxford University Press.
  • Dunbar, R. I. M. (2010a). Brain and behaviour in primate evolution. In P. H. Kappeler & J. Silk (Eds.), Mind the gap: Tracing the origins of human universals (pp. 315–330). Berlin: Springer.
  • Dunbar, R. I. M. (2010b). Deacon’s dilemma: The problem of pairbonding in human evolution. In R. I. M. Dunbar , C. Gamble , & J. A. J. Gowlett (Eds.), Social brain, distributed mind (pp. 159–179). Oxford: Oxford University Press.
  • Dunbar, R. I. M. (2010c). The social role of touch in humans and primates: Behavioural function and neurobiological mechanisms. Neuroscience and Biobehavioral Reviews , 34 , 260–268.
  • Dunbar, R. I. M. (2011a). Evolutionary basis of the social brain. In J. Decety & J. Cacioppo (Eds.), Oxford handbook of social neuroscience (pp. 28–38). Oxford: Oxford University Press.
  • Dunbar, R. I. M. (2011b). Constraints on the evolution of social institutions and their implications for information flow. Journal of Institutional Economics , 7 , 345–371.
  • Dunbar, R. I. M. (2012a). Bridging the bonding gap: The transition from primates to humans. Philosophical Transactions of the Royal Society, London , 367B , 1837–1846.
  • Dunbar, R. I. M. (2012b). Social cognition on the internet: testing constraints on social network size. Philosophical Transactions of the Royal Society, London , 367B , 2192–2201.
  • Dunbar, R. I. M. (2014). The social brain: psychological underpinnings and implications for the structure of organizations. Current Directions in Psychological Science , 24 , 109–114.
  • Dunbar, R. I. M. (2015). Human evolution . London: Pelican.
  • Dunbar, R. I. M. , & Shultz, S. (2007). Understanding primate brain evolution. Philosophical Transactions of the Royal Society, London , 362B , 649–658.
  • Dunbar, R. I. M. , & Shultz, S. (2010). Bondedness and sociality. Behaviour , 147 , 775–803.
  • Dunbar, R. I. M. , Arnaboldi, V. , Conti, M. , & Passarella, A. (2015). The structure of online social networks mirror those in the offline world. Social Networks , 43, 39–47.
  • Dunbar, R. I. M. , Kaskatis, K. , MacDonald, I. , & Barra, V. (2012a). Performance of music elevates pain threshold and positive affect. Evolutionary Psychology , 10 , 688–702.
  • Dunbar, R. I. M. , Baron, R. , Frangou, A. , Pearce, E. , van Leeuwen, E. J. C. , Stow, J. , et al. (2012b). Social laughter is correlated with an elevated pain threshold. Proceedings of the Royal Society, London , 279B , 1161–1167.
  • Dunbar, R. I. M. , McAdam, M. , & O’Connell, S. (2005). Mental rehearsal in great apes and humans. Behavioural Processes , 69 , 323–330.
  • Evans, P. D. , Gilbert, S. L. , Mekel-Bobrov, N. , Vallender, E. J. , Anderson, J. R. , Vaez-Azizi, L. M. , et al. (2005). Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science , 309 , 1717–1720.
  • Fehr, E. , Bernhard, H. , & Rockenbach, B. (2008). Egalitarianism in young children. Nature , 454 , 1079–1083.
  • Finlay, B. L. , Darlington, R. B. , & Nicastro, N. (2001). Developmental structure in brain evolution. Behavioral and Brain Sciences , 24 , 263–308.
  • Gallagher, H. L. , & Frith, C. D. (2003). Functional imaging of “theory of mind.” Trends in Cognitive Sciences , 7 , 77–83.
  • Gamble, C. , Gowlett, J. A. J. , & Dunbar, R. I. M. (2011). The social brain and the shape of the Palaeolithic. Cambridge Archaeological Journal , 21 , 115–135.
  • Gärdenfors, P. (2012). The cognitive and communicative demands of cooperation. In J. van Eijck , M. van Hees , & L. Verbrugge (Eds.), Games, Actions, and Social Software (pp. 164–183). Berlin: Springer.
  • Gogtay, N. , Giedd, J. N. , Lusk, L. , Hayashi, K. M. , Greenstein, D. , Vaituzis, A. C. , et al. (2004). Dynamic mapping of human cortical development during childhood through early adulthood. Proceedings of the National Academy of Sciences , USA , 101 , 8174–8179.
  • Gowlett, J. A. J. (2006). The elements of design form in Acheulean bifaces: modes, modalities, rules and language. In N. Goren-Inbar and G. Sharon (Eds.), Axe age: Acheulian tool-making from quarry to discard (pp. 203–221). London: Equinox.
  • Gowlett, J. A. J. , Gamble, C. , & Dunbar, R. I. M. (2012). Human evolution and the archaeology of the social brain. Current Anthropology , 53 , 693–722.
  • Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology , 78 , 1360–1380.
  • Hamilton, M. J. , Milne, B. T. , Walker, R. S. , Burger, O. , & Brown, J. H. (2007). The complex structure of hunter-gatherer social networks. Proceedings of the Royal Society, London , 274B, 2195–2202.
  • Harcourt, A. H. (1992). Coalitions and alliances: Are primates more complex than non-primates? In A. H. Harcourt & F. B. M. de Waal (Eds.), Coalitions and Alliances in Humans and Other Animals (pp. 445–472). Oxford: Oxford University Press.
  • Hare, B. , Call, J. , Agnetta, B. , & Tomasello, M. (2000). Chimpanzees know what conspecifics do and do not see. Animal Behaviour , 59 , 771–785.
  • Hare, B. , Call, J. , & Tomasello, M. (2001). Do chimpanzees know what conspecifics know? Animal Behaviour , 61 , 139–151.
  • Henzi, S. P. , de Sousa Pereira, L. , Hawker-Bond, D. , Stiller, J. , Dunbar, R. I. M. , & Barrett, L. (2007). Look who’s talking: Developmental trends in the size of conversational cliques. Evolution and Human Behavior , 28 , 66–74.
  • Herrmann, E. , Call, J. , Hernandez-Lloreda, M. V. , Hare, B. , & Tomasello, M. (2007). Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis. Science , 317 , 1360–1366.
  • Hill, R. A. , Bentley, A. , & Dunbar, R. I. M. (2008). Network scaling reveals consistent fractal pattern in hierarchical mammalian societies. Biology Letters , 4 , 748–751.
  • Hill, R. A. , Lycett, J. , & Dunbar, R. I. M. (2000). Ecological determinants of birth intervals in baboons. Behavioral Ecology , 11 , 560–564.
  • Humphrey, N. K. (1976). The social function of intellect. In P. P. G. Bateson & R. A. Hinde (Eds.), Growing Points in Ethology (pp. 303–317). Cambridge, U.K.: Cambridge University Press.
  • Insel, T. R. , & Young, L. J. (2001). The neurobiology of attachment. Nature Reviews Neuroscience , 2 , 129–136.
  • Joffe, T. H. (1997). Social pressures have selected for an extended juvenile period in primates. Journal of Human Evolution , 32 , 593–605.
  • Joffe, T. , & Dunbar, R. I. M. (1997). Visual and socio-cognitive information processing in primate brain evolution. Proceedings of the Royal Society, London , 264B , 1303–1307.
  • Jolly, A. (1969). Lemur social behaviour and primate intelligence. Science , 153 , 501–506.
  • Kanai, R. , Bahrami, B. , Roylance, R. , & Rees, G. (2012). Online social network size is reflected in human brain structure. Proceedings of the Royal Society, London , 279B , 1327–1334.
  • Karton, I. , & Bachmann, T. (2011). Effect of prefrontal transcranial magnetic stimulation on spontaneous truth-telling. Behavioural Brain Research , 225 , 209–214.
  • Keverne, E. B. , Martensz, N. D. , & Tuite, B. (1989). Beta-endorphin concentrations in cerebrospinal fluid of monkeys are influenced by grooming relationships. Psychoneuroendocrinology , 14 , 155–161.
  • Kinderman, P. , Dunbar, R. I. M. , & Bentall, R. P. (1998). Theory-of-mind deficits and causal attributions. British Journal of Psychology , 89 , 191–204.
  • Klein, R. G. (1999). The human career: Human behavior and cultural origins . Chicago: University of Chicago Press.
  • Knoch, D. , Pascual-Leone, A. , Meyer, K. , Treyer, V. , & Fehr, E. (2006). Diminishing reciprocal fairness by disrupting the right prefrontal cortex, Science , 314 , 829–832.
  • Koenigs, M. , Young, L. , Adolphs, R. , Tranel, D. , Cushman, F. , Hauser, M. , & Damasio, A. (2007). Damage to the prefrontal cortex increases utilitarian moral judgements. Nature , 446 , 908–911.
  • Kolb, B. , & Wishaw, I. Q. (1996). Fundamentals of human neuropsychology . San Francisco: W. H. Freeman.
  • Kosfeld, M. , Heinrichs, M. , Zak, P. J. , Fischbacher, U. , & Fehr, E. (2005). Oxytocin increases trust in humans. Nature , 435 , 673–676.
  • Krause, J. , & Ruxton, G. (2002). Living in groups . Oxford: Oxford University Press.
  • Kudo, H. , & Dunbar, R. I. M. (2001). Neocortex size and social network size in primates. Animal Behaviour , 62 , 711–722.
  • Kummer, H. (1982). Social knowledge in free-ranging primates. In D. Griffin (Ed.), Animal mind—human mind (pp. 113–130). Berlin: Springer.
  • Lebreton, M. , Barnes, A. , Miettunen, J. , Peltonen, L. , Ridler, K. , Viola, J. , et al. (2009). The brain structural disposition to social interaction. European Journal of Neuroscience , 29 , 2247–2252.
  • Lee, K.-H. , Farrow, T. F. D. , Spence, S. A. , & Woodruff, P. W. R. (2004). Social cognition, brain networks and schizophrenia. Psychological Medicine , 34 , 391–400.
  • Lehmann, J. , Andrews, K. , & Dunbar, R. I. M. (2009). Social networks and social complexity in female-bonded primates. In R. I. M. Dunbar , C. Gamble , & J. A. J. Gowlett (Eds.), Social brain, distributed mind (pp. 57–83). Oxford: Oxford University Press.
  • Lehmann, J. , & Dunbar, R. I. M. (2009). Network cohesion, group size and neocortex size in female-bonded Old World primates. Proceedings of the Royal Society, London , 276B , 4417–4422.
  • Lehmann, J. , Korstjens, A. H. , & Dunbar, R. I. M. (2007). Group size, grooming and social cohesion in primates. Animal Behaviour , 74 , 1617–1629.
  • Lewis, P. A. , Birch, A. , Hall, A. , & Dunbar, R. I. M. (forthcoming). Higher order intentionality tasks are cognitively more demanding.
  • Lewis, P. A. , Rezaie, R. , Browne, R. , Roberts, N. , & Dunbar, R. I. M. (2011). Ventromedial prefrontal volume predicts understanding of others and social network size. NeuroImage , 57 , 1624–1629.
  • MacDonald, A. W. , Cohen, J. D. , Stenger, V. A. , & Carter, C. S. (2000). Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science , 288 , 1835–1838.
  • Machin, A. , & Dunbar, R. I. M. (2011). The brain opioid theory of social attachment: A review of the evidence. Behaviour , 148 , 985–1025.
  • Mackinnon, J. (1974). The behaviour and ecology of wild orang-utans ( Pongo pygmaeus ). Animal Behaviour , 22 , 3–74.
  • Makinodan, M. , Rosen, K. M. , Ito, S. , & Corfas, G. (2012). A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science , 337 , 1357–1360.
  • Mars, R. , Neubert, F.-X. , Verhagen, L. , Sallet, J. , Miller, K. , Dunbar, R. I. M. , & Barton, R. (2014). Primate comparative neuroscience using magnetic resonance imaging: Promises and challenges. Frontiers in Neuroscience , 8 , 289.
  • Massen, J. J. M. , Sterck, E. H. M. , & de Vos, H. (2010). Close social associations in animals and humans: functions and mechanisms of friendship. Behaviour , 147 , 1379–1412.
  • McComb, K. , & Semple, S. (2005). Coevolution of vocal communication and sociality in primates. Biology Letters , 1 , 381–385.
  • McNally, L. , Brown, S. P. , & Jackson, A. L. (2012). Cooperation and the evolution of intelligence. Proceedings of the Royal Society, London , 279B , 3027–3034.
  • Mekel-Bobrov, N. , Gilbert, S. L. , Evans, P. D. , Vallender, E. J. , Anderson, J. R. , Hudson, R. R. , et al. (2005). Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens. Science , 309 , 17201722.
  • Miritello, G. , Moro, E. , Lara, R. , Martínez-López, R. , Belchamber, J. , Roberts, S. B. G. , & Dunbar, R. I. M. (2013). Time as a limited resource: Communication strategy in mobile phone networks. Social Networks , 35 , 89–95.
  • Moreira, J. A. , Pachero, J. M. & Santos, F. C. (2013). Evolution of collective action in adaptive social systems. Scientific Reports 3:1521.
  • Morrison, I. (2012). CT afferents. Current Biology , 22 , R77–R78.
  • Nithianantharajah, J. , Komiyama, N. H. , McKechanie, A. , Johnstone, M. , Blackwood, D. H. , St Clair, D. , et al. (2012). Synaptic scaffold evolution generated components of vertebrate cognitive complexity. Nature Neuroscience , 16 , 16–24.
  • Nummenmaa, L. , Manninen, S. , Tuominen, L. , Hirvonen, J. , Kalliokoski, K. K. , Nuutila, P. , et al. (in press). Adult attachment style is associated with cerebral μ ‎-opioid receptor availability in humans. Human Brain Mapping .
  • Nummenmaa, L. , Tuominen, L. , Dunbar, R. I. M. , Hirvonen, J. , Manninen, S. , Arponen, E. , et al. (forthcoming). Reinforcing social bonds by touching modulates endogenous µ-opioid system activity in humans.
  • O’Connell, S. , & Dunbar, R. I. M. (2003). A test for comprehension of false belief in chimpanzees. Evolution and Cognition , 9 , 131–139.
  • Olausson, H. , Wessberg, J. , Morrison, I. , McGlone, F. , & Vallbo, A .(2010). The neurophysiology of unmyelinated tactile afferents. Neuroscience and Biobehavioral Reviews , 34 , 185–191.
  • O’Malley, A. J. , Arbesman, S. , Miller Steiger, D. , Fowler, J. H. , & Christakis, N. A. (2014). Egocentric social network structure, health, and pro-social behaviors in a national panel study of Americans. PLoS-One, 7 , e36250.
  • Opie, C. , Atkinson, Q. , Dunbar, R. I. M. , & Shultz, S. (2013). Male infanticide leads to social monogamy in primates. Proceedings of the National Academy of Sciences, USA , 110 , 13328–13332.
  • van Overwalle, F. (2009). Social cognition and the brain a meta-analysis. Human Brain Mapping , 30 , 829–858.
  • Panksepp, J. , Nelson, E. , & Bekkedal, M. (1997). Brain systems for the mediation of social separation-distress and social-reward: Evolutionary antecedents and neuropeptide intermediaries. Annals of the New York Academy of Sciences , 807 , 78–100.
  • Pasquaretta, C. , Levé, M. , Claidière, N. , van de Waal, E. , Whiten, A. , MacIntosh, A. J. J. , et al. (2015). Social networks in primates: smart and tolerant species have more efficient networks. Scientific Reports , 4 , 7600.
  • Passingham, R. E. , & Wise, S. P. (2012). The neurobiology of the prefrontal cortex: Anatomy, evolution and the origin of insight . Oxford: Oxford University Press.
  • Pawłowski, B. P. , Lowen, C. B. , & Dunbar, R. I. M. (1998). Neocortex size, social skills and mating success in primates. Behaviour , 135 , 357–368.
  • Pearce, E. , & Bridge, H. (2013). Does orbital volume index eyeball and visual cortical volumes in humans? Annals of Human Biology , 40, 531–540.
  • Pearce, E. , & Dunbar, R. I. M. (2012). Latitudinal variation in light levels drives human visual system size. Biology Letters , 8 , 90–93.
  • Pearce, E. , Stringer, C. , & Dunbar, R. I. M. (2013). New insights into differences in brain organisation between Neanderthals and anatomically modern humans. Proceedings of the Royal Society, London , 280B , 1471–1481.
  • Pérez-Barbería, J. , Shultz, S. , & Dunbar, R. I. M. (2007). Evidence for intense coevolution of sociality and brain size in three orders of mammals. Evolution , 61 , 2811–2821.
  • Powell, J. L. , Kemp, G. J. , Dunbar, R. I. M. , Roberts, N. , Sluming, V. , & García-Fiñana, M. (2014). Different association between intentionality competence and prefrontal volume in left- and right-handers. Cortex , 54 , 63–76.
  • Powell, J. , Lewis, P. , Dunbar, R. I. M. , García-Fiñana, M. , & Roberts, N. (2010). Orbital prefrontal cortex volume correlates with social cognitive competence. Neuropsychologia , 48 , 3554–3562.
  • Powell, J. , Lewis, P. A. , Roberts, N. , García-Fiñana, M. , & Dunbar, R. I. M. (2012). Orbital prefrontal cortex volume predicts social network size: an imaging study of individual differences in humans. Proceedings of the Royal Society, London , 279B , 2157–2162.
  • Reader, S. M. , Hager, Y. , & Laland, K. N. (2011). The evolution of primate general and cultural intelligence. Philosophical Transactions of the Royal Society, London , 366B , 1017–1027.
  • Reader, S. M. , & Laland, K. N. (2002). Social intelligence, innovation, and enhanced brain size in primates. Proceedings of the National Academy of Sciences, USA , 99 , 4436–4441.
  • Renfrew, C. , & Zubrow, E. B . (Eds.). (1994). The ancient mind: Elements of cognitive archaeology . Cambridge, U.K.: Cambridge University Press.
  • Resendez, S. L. , Dome, M. , Gormley, G. , Franco, D. , Nevárez, N. , Hamid, A. A. , & Aragona, B. J. (2013). μ ‎-Opioid receptors within subregions of the striatum mediate pair bond formation through parallel yet distinct reward mechanisms. Journal of Neuroscience , 33 , 9140–9149.
  • Roberts, S. B. G. , & Dunbar, R. I. M. (2011). The costs of family and friends: An 18-month longitudinal study of relationship maintenance and decay. Evolution and Human Behavior , 32 , 186–197.
  • Roberts, S. B. G. , Arrow, H. , Lehmann, J. , & Dunbar, R. I. M. (2014). Close social relationships: an evolutionary perspective. In: R. I. M. Dunbar , C. Gamble & J. A. J. Gowlett (Eds.) Lucy to language: The benchmark papers (pp. 151–180). Oxford: Oxford University Press.
  • Roberts, S.-J. , & Cords, M. (2013). Group size but not dominance rank predicts the probability of conception in a frugivorous primate. Behavioral Ecology and Sociobiology , 67 , 1995–2009.
  • Roca, M. , Parr, A. , Thompson, R. , Woolgar, A. , Torralva, T. , Antoun, N. , et al. (2010). Executive function and fluid intelligence after frontal lobe lesions. Brain, 133 , 234–247.
  • Rosenquist, J. N. , Fowler, J. H. , & Christakis, N. A. (2010). Social network determinants of depression. Molecular Psychiatry , 15 , 1197–1197.
  • Roth, D. , & Leslie, A. M. (1998). Solving belief problems: Toward a task analysis. Cognition , 66 , 1–31.
  • Rushworth, M. F. , Mars, R. B. , & Sallet, J. (2013). Are there specialized circuits for social cognition and are they unique to humans? Current Opinion in Neurobiology , 23 , 436–442.
  • Sallet, J. , Mars, R. B. , Noonan, M. P. , Andersson, J. L. , O’Reilly, J. X. , Jbabdi, S. , et al. (2011). Social network size affects neural circuits in macaques. Science , 334 , 697–700.
  • Sallet, J. , Mars, R. B. , Noonan, M. P. , Neubert, F. X. , Jbabdi, S. , O’Reilly, J. X. , et al. (2013). The organization of dorsal prefrontal cortex in humans and macaques. Journal of Neuroscience , 33 , 12255–12274.
  • Samson, D. , Apperly, I. A. , Chiavarino, C. , & Humphreys, G. W. (2004). Left temporoparietal junction is necessary for representing someone else’s belief. Nature Neuroscience , 7 , 499–500.
  • van Schaik, C. P. (1983). Why are diurnal primates living in groups. Behaviour , 87 , 120–144.
  • Seeley, W. W. , Menon, V. , Schatzberg, A. F. , Keller, J. , Glover, G. H. , et al. (2007). Dissociable intrinsic connectivity networks for salience processing and executive control. Journal of Neuroscience , 27 , 2349–2356.
  • Shultz, S. , & Dunbar, R. I. M. (2007). The evolution of the social brain: Anthropoid primates contrast with other vertebrates. Proceedings of the Royal Society, London , 274B , 2429–2436.
  • Shultz, S. , & Dunbar, R. I. M. (2010a). Encephalisation is not a universal macroevolutionary phenomenon in mammals but is associated with sociality. Proceedings of the National Academy of Sciences, USA , 107 , 21582–21586.
  • Shultz, S. , & Dunbar, R. I. M. (2010b). Species differences in executive function correlate with hippocampus volume and neocortex ratio across non-human primates. Journal of Comparative Psychology , 124 , 252–260.
  • Shultz, S. , & Dunbar, R. I. M. (2010c). Social bonds in birds are associated with brain size and contingent on the correlated evolution of life-history and increased parental investment. Biological Journal of the Linnaean Society , 100 , 111–123.
  • Shultz, S. , & Finlayson, L. V. (2010). Large body and small brain and group sizes are associated with predator preferences for mammalian prey. Behavioral Ecology , 21 , 1073–1079.
  • Shultz, S. , Noe, R. , McGraw, S. , & Dunbar, R. I. M. (2004). A community-level evaluation of the impact of prey behavioural and ecological characteristics on predator diet composition. Proceedings of the Royal Society, London , 271B , 725–732.
  • Shultz, S. , Opie, C. , & Atkinson, Q. D. (2011). Stepwise evolution of stable sociality in primates. Nature , 479 , 219–222.
  • Silk, J. B. (2002). Using the “F”-word in primatology. Behaviour , 139 , 421–446.
  • Silk, J. B. , Alberts, S. C. , & Altmann, J. (2003). Social bonds of female baboons enhance infant survival. Science , 302 , 1232–1234.
  • Silk, J. B. , Beehner, J. C. , Bergman, T. J. , Crockford, C. , Engh, A. L. , Moscovice, L. R. , et al. (2009). The benefits of social capital: Close social bonds among female baboons enhance offspring survival. Proceedings of the Royal Society, London , 276B , 3099–3104.
  • Smuts, B. B. , & Nicholson, N. (1989). Dominance rank and reproduction in female baboons. American Journal of Primatology , 19 , 229–246.
  • Sowell, E. R. , Peterson, B. A. , Thompson. P. M. , Welcome, S. E. , Henkenius, A. L. , & Toga, A. W. (2003). Mapping cortical change across the human life span. Nature Neuroscience , 6 , 309–315.
  • Sowell, E. R. , Thompson, P. M. , Tessner, K. D. , & Toga, A. W. (2001). Mapping continued brain growth and gray matter density reduction in dorsal frontal cortex: Inverse relationships during postadolescent brain maturation. Journal of Neuroscience , 21 , 8819–8829.
  • Sperber, D. , & Wilson, D. (1986) R elevance: Communication and cognition . Oxford: Blackwell.
  • Stephan, H. , Frahm, H. , & Baron, G. (1981). New and revised data on volumes of brain structures in insectivores and primates. Folia Primatologica , 35 , l–29.
  • Stiller, J. , & Dunbar, R. I. M. (2007). Perspective-taking and memory capacity predict social network size. Social Networks, 29 , 93–104.
  • Sutcliffe, A. J. , Dunbar, R. I. M. , Binder, J. , & Arrow, H. (2012). Relationships and the social brain: Integrating psychological and evolutionary perspectives. British Journal of Psychology , 103 , 149–168.
  • Uddin, M. , Goodman, M. , Erez, O. , Romero, R. , Liu, G. , Islam, M. , et al. (2008). Distinct genomic signatures of adaptation in pre- and postnatal environments during human evolution. Proceedings of the National Academy of Sciences, USA , 105 , 3215–3220.
  • Vogeley, K. , Bussfeld, P. , Newen, A. , Herrmann, S. , Happé, F. , Falkai, P. , et al. (2001). Mind reading: Neural mechanisms of theory of mind and self-perspective. NeuroImage , 14 , 170–181.
  • Vrontou, S. , Wong, A. M. , Rau, K. K. , Koerber, H. R. , & Anderson, D. J. (2013). Genetic identification of C fibres that detect massage-like stroking of hairy skin in vivo. Nature , 493 , 669–673.
  • Wang, J.-K. , Li, Y. , & Su, B. (2008). A common SNP of MCPH1 is associated with cranial volume variation in Chinese population. Human Molecular Genetics , 17 , 1329–1335.
  • van Wimersma Greidanus, B. , van de Brug, F. , de Bruijckere, L. M. , Pabst, P. H. , Ruesink, R. W. , Hulshof, R. L. E. , et al. (1988). Comparison of bombesin-, ACTH-, and P-endorphin-induced grooming antagonism by haloperidol, naloxone, and neurotensin. Annals of the New York Academy of Sciences , 525 , 219–227.
  • Wittig, R. M. , Crockford, C. , Lehmann, J. , Whitten, P. L. , Seyfarth, R. M. , & Cheney, D. L. (2008). Focused grooming networks and stress alleviation in wild female baboons. Hormones and Behavior , 54 , 170–177.
  • Woolgar, A. , Parra, A. , Cusack, R. , Thompson, R. , Nimmo-Smith, I. , Torralva, T. , et al. (2010). Fluid intelligence loss linked to restricted regions of damage within frontal and parietal cortex. Proceedings of the National Academy of Sciences, USA , 107 , 14899–14902.
  • Yamada, M. , Hirao, K. , Namiki, C. , Hanakawa, T. , Fukuyama, H. , Hayashi, T. , & Murai, T. (2007). Social cognition and frontal lobe pathology in schizophrenia: A voxel-based morphometric study. NeuroImage , 35 , 292–298.
  • Yan, T. , Jin, F. , & Wu, J. (2009). Correlated size variations measured in human visual cortex V1/V2/V3 with functional MRI. Brain Informatics , 5819 , 36–44.
  • Zhou, W.-X. , Sornette, D. , Hill, R. A. , & Dunbar, R. I. M. (2005). Discrete hierarchical organization of social group sizes. Proceedings of the Royal Society, London, 272B , 439–444.
  • Zilhão, J. , Angelucci, D. E. , Badal-García, E. , d’Errico, F. , Daniel, F. , Dayet, L. , et al. (2010). Symbolic use of marine shells and mineral pigments by Iberian Neandertals. Proceedings of the National Academy of Sciences , USA , 107 , 102


COMMENTS

  1. Hypothesis: Definition, Examples, and Types

    The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another.

  2. Research Hypothesis: Definition, Types, Examples and Quick Tips

    The sign for a non-directional hypothesis is '≠.' 3. Simple hypothesis. A simple hypothesis is a statement made to reflect the relation between exactly two variables: one independent and one dependent. ... Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a ...

  3. What Is a Hypothesis and How Do I Write One?

    Hypotheses are one part of what's called the scientific method. Every (good) experiment or study is based on the scientific method. The scientific method gives order and structure to experiments and ensures that interference from scientists or outside influences does not skew the results.

  4. What Are the Elements of a Good Hypothesis?

    A hypothesis is an educated guess or prediction of what will happen. In science, a hypothesis proposes a relationship between factors called variables. A good hypothesis relates an independent variable and a dependent variable. The effect on the dependent variable depends on or is determined by what happens when you change the independent variable.

  5. On the scope of scientific hypotheses

    2. The scientific hypothesis. In this section, we will describe a functional and descriptive role regarding how scientists use hypotheses. Jeong & Kwon [] investigated and summarized the different uses the concept of 'hypothesis' had in philosophical and scientific texts. They identified five meanings: assumption, tentative explanation, tentative cause, tentative law, and prediction.

  6. Hypothesis

    hypothesis, something supposed or taken for granted, with the object of following out its consequences (Greek hypothesis, "a putting under"; the Latin equivalent being suppositio). Discussion with Kara Rogers of how the scientific model is used to test a hypothesis or represent a theory. Kara Rogers, senior biomedical sciences editor of ...

  7. What is and How to Write a Good Hypothesis in Research?

    An effective hypothesis in research is clearly and concisely written, with any key terms or definitions clarified. Specific language must also be used to avoid generalities or assumptions. Use the following points as a checklist to evaluate the effectiveness of your research hypothesis: Predicts the relationship and outcome.

  8. How to Write a Hypothesis 101: A Step-by-Step Guide

    Step 3: Build the Hypothetical Relationship. In understanding how to compose a hypothesis, constructing the relationship between the variables is key. Based on your research question and variables, predict the expected outcome or connection.

  9. How to Write a Research Hypothesis

    A well-written hypothesis should predict the tested relationship and its outcome. It contains zero ambiguity and offers results you can observe and test. The research hypothesis should address a question relevant to a research area. Overall, your research hypothesis needs the following essentials: Hypothesis Essential #1: Specificity & Clarity

  10. What is a hypothesis?

    A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question. A hypothesis is not just a guess — it should be based on ...

  11. A Practical Guide to Writing Quantitative and Qualitative Research

    INTRODUCTION. Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses.1,2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results.3,4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the ...

  12. Hypothesis Testing

    Hypothesis Testing Step 1: State the Hypotheses. In all three examples, our aim is to decide between two opposing points of view, Claim 1 and Claim 2. In hypothesis testing, Claim 1 is called the null hypothesis (denoted "Ho"), and Claim 2 plays the role of the alternative hypothesis (denoted "Ha").

  13. Introduction to Hypothesis Testing

    Hypothesis testing is part of inference. Given a claim about a population, we will learn to determine the null and alternative hypotheses. We will recognize the logic behind a hypothesis test and how it relates to the P-value, as well as recognizing type I and type II errors. These are powerful tools for exploring and understanding data in real life.

  14. What is Hypothesis

    Functions of Hypothesis. Following are the functions performed by the hypothesis: A hypothesis helps in making observations and experiments possible. It becomes the starting point for the investigation. A hypothesis helps in verifying observations. It helps in directing the inquiries in the right direction.

  15. How to Write a Hypothesis: Types and Tips to Remember

    2. Complex Hypothesis. A complex hypothesis entails the existence of a relationship between two or more variables. It can be two dependent variables and one independent variable or vice versa. 3. Null Hypothesis. A null hypothesis states that the variables have no relationship. ...

  16. The Structure of Scientific Theories

    The whole science of language, consisting of the three parts mentioned, is called semiotic. (1942, 9; see also Carnap 1939, 3-5, 16) ... One "meta" hypothesis is that a given philosophical analysis of theory structure tends to be associated with a perceived relationship among the three views here discussed.

  17. Null & Alternative Hypotheses

    When the research question asks "Does the independent variable affect the dependent variable?": The null hypothesis (H0) answers "No, there's no effect in the population." The alternative hypothesis (Ha) answers "Yes, there is an effect in the population." The null and alternative are always claims about the population.
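
    The null-versus-alternative logic described above can be sketched with a small permutation test. This is a minimal illustration, not code from any of the sources listed here; the group names and test scores are made up, echoing the sleep-deprivation example from the article's introduction:

    ```python
    import random

    def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
        """Two-sided permutation test for a difference in means.

        H0: the independent variable has no effect (both groups come
            from the same population, so group labels are arbitrary).
        Ha: there is an effect (the observed mean difference is real).
        """
        rng = random.Random(seed)
        observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        pooled = group_a + group_b
        n_a = len(group_a)
        extreme = 0
        for _ in range(n_permutations):
            rng.shuffle(pooled)  # relabel under H0: labels don't matter
            diff = abs(sum(pooled[:n_a]) / n_a
                       - sum(pooled[n_a:]) / (len(pooled) - n_a))
            if diff >= observed:
                extreme += 1
        return extreme / n_permutations

    # Hypothetical test scores: sleep-deprived vs. rested participants
    deprived = [62, 68, 55, 70, 58, 66]
    rested = [75, 80, 72, 78, 85, 74]
    p = permutation_test(deprived, rested)
    ```

    Under H0 ("no effect in the population"), shuffling the group labels should often produce mean differences as large as the observed one; a small p-value (here well below 0.05, since the two made-up groups barely overlap) is evidence against the null.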

  18. The Basics of an Experiment

    An experiment is a procedure designed to test a hypothesis as part of the scientific method. The two key variables in any experiment are the independent and dependent variables. The independent variable is controlled or changed to test its effects on the dependent variable. Three key types of experiments are controlled experiments, field ...

  19. 1.2

    Step 7: Based on Steps 5 and 6, draw a conclusion about H0. If the calculated F statistic is larger than F_α, then you are in the rejection region and you can reject the null hypothesis with a (1 − α) level of confidence. Note that modern statistical software condenses Steps 6 and 7 by providing a p-value. The p-value here is the probability of getting ...
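
    The condensed decision rule from Steps 6 and 7 above, where software reports a p-value directly, can be written as a one-line check. A minimal sketch: the function name `decide` and the default α = 0.05 are illustrative choices, not anything prescribed by the source:

    ```python
    def decide(p_value, alpha=0.05):
        """Reject H0 exactly when the p-value is at or below alpha.

        This is equivalent to checking whether the test statistic
        (e.g., the calculated F) falls in the rejection region, and it
        yields a conclusion at the (1 - alpha) level of confidence.
        """
        return "reject H0" if p_value <= alpha else "fail to reject H0"

    print(decide(0.003))  # -> "reject H0"
    print(decide(0.20))   # -> "fail to reject H0"
    ```

    Note that "fail to reject H0" is the standard phrasing: a large p-value does not prove the null hypothesis, it only means the data are compatible with it.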

  20. What Is a Testable Hypothesis?

    Updated on January 12, 2019. A hypothesis is a tentative answer to a scientific question. A testable hypothesis is a hypothesis that can be proved or disproved as a result of testing, data collection, or experience. Only testable hypotheses can be used to conceive and perform an experiment using the scientific method .

  21. The Scientific Method

    This is the part of the scientific method that tests your hypothesis. An experiment is a tool that you design to find out if your ideas about your topic are right or wrong. It is absolutely necessary to design a science fair experiment that will accurately test your hypothesis. The experiment is the most important part of the scientific method.

  22. On the role of hypotheses in science

    Euclid's twenty‐three definitions start with sentences such as "1. A point is that which has no part; 2. A line is breadthless length; 3. The extremities of a line are points"; and continues with the definition of angles ("8. ... Poincaré (1854-1912b) also dealt with physics in Science and Hypothesis. "Experiment is the sole source ...

  23. Social Brain Hypothesis and Human Evolution

    Introduction. Primates have unusually large brains for body size compared to all other vertebrates. The conventional explanation for this is known as the "social brain hypothesis," which argues that primates need large brains because their form of sociality is much more complex than that of other species (Byrne & Whiten, 1988). This does not mean that they live in larger social groups than ...