A Survey on Privacy in Social Media: Identification, Mitigation, and Applications


The increasing popularity of social media has attracted a huge number of people who participate in numerous activities on a daily basis, generating tremendous amounts of rich user-generated data. These data provide opportunities for researchers and service providers to study and better understand users’ behaviors and further improve the quality of personalized services. Publishing user-generated data, however, risks exposing individuals’ privacy. User privacy in social media is an emerging research area and has attracted increasing attention recently. Existing works study privacy issues in social media from two points of view: identification of vulnerabilities and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data against two general types of attacks, identity disclosure and attribute disclosure. These privacy issues mandate social media data publishers to protect users’ privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is vast literature on the privacy of users in social media from many perspectives. In this survey, we review the key achievements of user privacy in social media. In particular, we review and compare the state-of-the-art algorithms in terms of privacy leakage attacks and anonymization algorithms. We overview privacy risks from different aspects of social media and categorize the relevant works into five groups: (1) social graphs and privacy, (2) authors in social media and privacy, (3) profile attributes and privacy, (4) location and privacy, and (5) recommendation systems and privacy. We also discuss open problems and future research directions regarding user privacy issues in social media.

ACM Reference format: Ghazaleh Beigi and Huan Liu. 2019. A Survey on Privacy in Social Media: Identification, Mitigation, and Applications. ACM Trans. Data Sci. 1, 1, Article 7 (January 2020), 38 pages. https://doi.org/10.1145/3343038

1 INTRODUCTION

The explosive Web growth in the past decade has drastically changed the way billions of people all around the globe conduct numerous activities such as surfing the web, creating online profiles in social media platforms, interacting with other people, and sharing posts and various personal information in a rich environment. This results in tremendous amounts of user-generated data. The massive amounts of user information and the availability of up-to-date data make social media platforms an attractive target for organizations seeking to collect and aggregate this information, either for legitimate purposes or nefarious goals [ 35 ]. For example, the user-generated data provide opportunities for researchers and business partners to study and understand individuals at unprecedented scales [ 19 , 28 ]. This information is also crucial for online vendors to provide personalized services, and a lack of it would result in a deteriorating quality of online personalization services [ 23 ].

On the other hand, tremendous amounts of user-generated data risk exposing individuals’ privacy due to their rich content, including users’ relationships and other private information [ 22 , 26 , 85 , 140 ]. These data also make online users traceable, and, accordingly, users become severely vulnerable to potential risks ranging from persecution by governments to targeted fraud. For example, users may share their vacation plans publicly on Twitter without knowing that this information could be used by adversaries for break-ins and thefts in the future [ 124 , 191 ]. Moreover, sensitive information that users do not usually disclose explicitly can be easily inferred from their activities in social media, such as location [ 109 , 123 ], age [ 178 ], and trust/distrust relationships [ 27 , 29 , 30 ].

Privacy issues become prominent when the data get published by a data publisher or service provider. In general, two types of information disclosure have been identified in the literature: identity disclosure and attribute disclosure attacks [ 51 , 103 , 107 ]. Identity disclosure occurs when an individual is mapped to an instance in a released dataset. Attribute disclosure happens when the adversary can infer some new information regarding an individual based on the released data. Attribute disclosure becomes more probable when identities are accurately disclosed. Similarly, privacy leakage attacks in social media can also be categorized into either identity disclosure or attribute disclosure. These user privacy issues mandate social media data publishers to protect users’ privacy by sanitizing user-generated data before publishing it publicly.

Data anonymization is a complex problem, and its goal is to remove or perturb data to prevent adversaries from inferring sensitive information while ensuring the utility of the published data. One straightforward anonymization technique is to remove “Personally Identifiable Information” (a.k.a. PII) such as names, user IDs, age, and location information. This solution has been shown to be far from sufficient in preserving privacy [ 19 , 139 ]. An example of this insufficient approach is the anonymized dataset published for the Netflix prize challenge. As a part of the Netflix prize contest, Netflix publicly released a dataset containing the movie ratings of 500,000 subscribers. The data were supposed to be anonymized, and all PII was removed. Narayanan et al. [ 139 ] propose a de-anonymization attack that maps users’ records in the anonymized dataset to corresponding profiles on IMDB. In particular, the results of this work show that the structure of the data carries enough information for a potential breach of privacy to re-identify anonymized users.

Consequently, various protection techniques have been proposed to anonymize user-generated social media data. In general, the ultimate goal of an anonymization approach is to preserve social media users’ privacy while ensuring the utility of the published data. As a counterpart to this research direction, another group of works investigates potential privacy breaches from social media user data by introducing new attacks. These works expose the gaps in anonymizing user-generated data and thereby drive further improvements in anonymization techniques.

There is vast literature on the privacy of users in social media from many perspectives. Existing works cover three applications in social media, i.e., making connections with people, sharing contextual information, and receiving personalized services. Besides, users generate various types of data, including graph data, textual data, spatiotemporal data, and profile attribute data. This results in 12 application and data type combinations. We categorize existing works into five distinct categories to cover these combinations: (1) social graphs and privacy, (2) authors in social media and privacy, (3) profile attributes and privacy, (4) location and privacy, and (5) recommendation systems and privacy. Table 1 shows how each category covers different combinations of applications and data types. The goal of this article is to provide a comprehensive review of existing works on user privacy issues and solutions in social media and to give guidance on future research directions. The contributions of this survey are summarized as follows:

  • We give an overview of the traditional privacy models for structured data and discuss how these models are adopted for privacy issues in social media. We formally define two types of privacy leakage disclosures that cover most of the existing definitions in the literature.
  • We categorize privacy issues and solutions on social media into different groups: (1) social graphs and privacy, (2) authors in social media and privacy, (3) profile attributes and privacy, (4) location and privacy, and (5) recommendation systems and privacy. We overview existing works in each group with a principled way to group representative methods into different categories.
  • We discuss several open issues and provide future directions for privacy in social media.

The remainder of this survey is organized as follows. In Section 2 , we present an overview of traditional methods and formally define two types of privacy disclosures. In Section 3 , we review the state-of-the-art methods for privacy of social media graphs. More specifically, Section 3.1 covers de-anonymization attacks in social graphs, and Section 3.2 covers anonymization techniques that have been proposed for preserving the privacy of graph data against de-anonymization attacks. We review author identification works in Section 4 . In Sections 5 and 6 , we overview state-of-the-art de-anonymization techniques for inferring users’ profile attributes and location information. In Section 7 , privacy issues and solutions in recommendation systems are reviewed. Finally, we conclude this article in Section 8 by discussing open issues and future directions.

2 TRADITIONAL PRIVACY MODELS

Privacy-preserving techniques were first introduced for tabular and micro data. With the emergence of social media, the issue of online user privacy was raised, and researchers have since focused on studying privacy leakage issues as well as anonymization and privacy-preserving techniques specialized for social media data. There are two types of information disclosure in the literature: identity disclosure and attribute disclosure attacks [ 51 , 103 , 107 ]. We can formally define these attacks as follows:

Definition 2.1 (Identity Disclosure Attack). Given $T = (\mathbf {G}, \mathbf {A}, \mathbf {B})$ , which is a snapshot of a social media platform with a social graph $\mathbf {G} =(V,E)$ , where $V$ is the set of users and $E$ represents the social relations between them, user behavior information $\mathbf {A}$ , and attribute information $\mathbf {B}$ , the identity disclosure attack maps all users in the list of target users $V_t$ to their known identities. For each $v \in V_t$ , the attacker has the information of her social friends and behavior.

Definition 2.2 (Attribute Disclosure Attack). Given $T = (\mathbf {G}, \mathbf {A}, \mathbf {B})$ , which is a snapshot of a social media platform with a social graph $\mathbf {G} =(V,E)$ , where $V$ is the set of users and $E$ represents the social relations between them, user behavior information $\mathbf {A}$ , and attribute information $\mathbf {B}$ , the attribute disclosure attack infers the attributes $a_v$ for all $v \in V_t$ , where $V_t$ is a list of targeted users. For each $v \in V_t$ , the attacker has the information of her social friends and behavior.

Network graph de-anonymization and author identification are examples of identity disclosure attacks that exist in social media. Examples of attribute disclosure attacks include the disclosure of users’ profile attributes, location, and preference information in recommendation systems.

Before we discuss privacy leakage in social media, we overview the traditional privacy models for structured data, such as $k$ -anonymity [ 171 ], $l$ -diversity [ 119 ], $t$ -closeness [ 107 ], and differential privacy [ 52 ]. These models are defined over structured databases and cannot be directly applied to unstructured user-generated data, because quasi-identifiers and sensitive attributes are not clear in the context of social media data. These techniques have been further adapted for social media data, which we will discuss more in the next sections. Finally, we discuss related work and highlight the differences between this work and other surveys in the existing literature.

2.1 k-anonymity, l-diversity, and t-closeness

$k$ -anonymity was one of the first techniques introduced for protecting data privacy [ 171 ]. The aim of $k$ -anonymity is to anonymize each instance in the dataset so that it is indistinguishable from at least $k-1$ other instances with respect to certain identifying attributes. $k$ -anonymity can be achieved through suppression or generalization of the data instances. The goal is to anonymize the data such that $k$ -anonymity is preserved for all instances in the dataset with a minimum number of generalizations and suppressions while maximizing the utility of the resulting data. It has been shown that this problem is NP-hard [ 4 ]. $k$ -anonymity was initially defined for tabular data, but researchers later adopted it for solving privacy issues in social media data. In social media related problems, $k$ -anonymity ensures that a user cannot be identified because there are at least $k-1$ other users with the same set of features, which makes these $k$ users indistinguishable. These features may include users’ attributes and structural properties.
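For concreteness, the following minimal sketch (in Python, with hypothetical column names and toy records) checks whether a table satisfies $k$ -anonymity with respect to a chosen set of quasi-identifiers:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every equivalence class (records sharing the same
    quasi-identifier values) contains at least k records."""
    classes = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in classes.values())

# Hypothetical toy data: generalized ZIP code and age range act as
# quasi-identifiers; "disease" is the sensitive attribute.
records = [
    {"zip": "850**", "age": "20-30", "disease": "flu"},
    {"zip": "850**", "age": "20-30", "disease": "cold"},
    {"zip": "851**", "age": "30-40", "disease": "flu"},
]
print(is_k_anonymous(records, ["zip", "age"], k=2))  # False: one class is a singleton
```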

Although $k$ -anonymity is among the first techniques proposed for protecting the privacy of datasets, it is still vulnerable to specific types of privacy leakage. Machanavajjhala et al. [ 119 ] introduce two simple attacks that defeat $k$ -anonymity. The first is the homogeneity attack, in which the adversary can infer an instance’s (in this case, a social media user’s) sensitive attributes when the sensitive values in an equivalence class lack diversity. In the second attack, the adversary can infer an instance’s sensitive attributes when he or she has access to background knowledge, even when the data are $k$ -anonymized. This second attack is known as the background knowledge attack. Variations of background knowledge attacks have been proposed and used for inferring social media users’ attributes, where the background knowledge could be users’ friends’ or behavioral information. We will discuss different types of attribute inference attacks in Sections 6 and 7 .

To protect data against homogeneity and background knowledge attacks, Machanavajjhala et al. [ 119 ] introduce the concept of $l$ -diversity. It ensures that the sensitive attribute values in each equivalence class are diverse. More formally, a set of records in an equivalence class is $l$ -diverse if the class contains at least $l$ well-represented values for the sensitive attribute. The dataset is then $l$ -diverse if every class is $l$ -diverse. Two instantiations of the $l$ -diversity concept are introduced: entropy $l$ -diversity and recursive $(c,l)$ -diversity. With entropy $l$ -diversity, each equivalence class must not only have enough distinct sensitive values, but these values must also be distributed evenly enough. More formally, the entropy of the distribution of sensitive values in each equivalence class must be at least $\log (l)$ . For recursive $(c,l)$ -diversity, the most frequent sensitive value must not appear too frequently relative to the less frequent values. Interested readers could refer to the work of Reference [ 119 ] for more details.
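As a concrete illustration, the following sketch (with toy sensitive values) checks entropy $l$ -diversity class by class:

```python
import math
from collections import Counter

def is_entropy_l_diverse(equivalence_classes, l):
    """Each equivalence class passes if the entropy of its
    sensitive-value distribution is at least log(l)."""
    for sensitive_values in equivalence_classes:
        counts = Counter(sensitive_values)
        total = len(sensitive_values)
        entropy = -sum((c / total) * math.log(c / total)
                       for c in counts.values())
        if entropy < math.log(l):
            return False
    return True

# A class dominated by one value fails even with 3 distinct values.
print(is_entropy_l_diverse([["flu", "flu", "flu", "cold", "hiv"]], l=3))  # False
print(is_entropy_l_diverse([["flu", "cold", "hiv"]], l=3))                # True
```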

After $l$ -diversity, Li et al. [ 107 ] study the vulnerabilities of $l$ -diversity and introduce a new privacy concept, $t$ -closeness. They show that $l$ -diversity cannot protect the privacy of data when the distribution of sensitive attributes in an equivalence class differs from the distribution in the whole dataset. If the distribution of sensitive attributes is skewed, then $l$ -diversity presents a serious privacy risk. This attack is known as the skewness attack. $l$ -diversity is also vulnerable to similarity attacks, which can happen when the sensitive attributes in an equivalence class are distinct but semantically similar [ 107 ]. Li et al. [ 107 ] thus introduce $t$ -closeness, which ensures that the distribution of a sensitive attribute in any equivalence class is close to its distribution in the overall table. More formally, an equivalence class satisfies $t$ -closeness if the distance between the distribution of a sensitive attribute in this class and the distribution in the whole dataset is no more than a threshold $t$ . The whole dataset satisfies $t$ -closeness if all equivalence classes do. It is worth mentioning that $t$ -closeness protects the data against attribute disclosure but not identity disclosure.
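The sketch below checks a simplified form of $t$ -closeness for a categorical sensitive attribute, using total variation distance in place of the Earth Mover’s Distance used by Li et al. (the two coincide when the ground distance between any two distinct values is 1):

```python
from collections import Counter

def distribution(values):
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def satisfies_t_closeness(equivalence_classes, t):
    """Check t-closeness via total variation distance between each
    class's sensitive-value distribution and the overall one."""
    all_values = [v for cls in equivalence_classes for v in cls]
    overall = distribution(all_values)
    for cls in equivalence_classes:
        local = distribution(cls)
        tv = 0.5 * sum(abs(local.get(v, 0.0) - overall.get(v, 0.0))
                       for v in set(overall) | set(local))
        if tv > t:
            return False
    return True

classes = [["flu", "flu", "cold"], ["cold", "hiv", "flu"]]
print(satisfies_t_closeness(classes, t=0.25))  # True: both classes are within 1/6
```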

$k$ -anonymity, $l$ -diversity, and $t$ -closeness are further adopted for unstructured social media data. Table 2 summarizes different approaches that leverage adopted versions of these techniques for privacy problems in social media. These works are discussed more in the following sections.

Technique | Type of Information | Paper
$k$-degree anonymity | graph structure | [ 115 ]
$k$-neighborhood anonymity | graph structure | [ 196 ]
$k$-automorphism | graph structure | [ 199 ]
$k$-isomorphism | graph structure | [ 43 ]
$k$-anonymity | graph structure and attribute information | [ 189 ]
$(\theta ,k)$-matching anonymity | graph structure and attribute information | [ ]
$(k,d)$-anonymity | graph structure and attribute information | [ ]
$l$-diversity | attribute information | [ ]
$t$-closeness | attribute information | [ ]

2.2 Differential Privacy

Differential privacy is a powerful technique that protects a user’s privacy during statistical queries over a database by minimizing the chance of privacy leakage while maximizing the accuracy of queries. It was introduced by Dwork et al. [ 52 , 53 ] and provides a strong privacy guarantee. The intuition behind differential privacy is that the risk to a user’s privacy should not increase as a result of participating in a database [ 52 ]. In particular, it imposes a guarantee on the data release mechanism rather than the dataset itself. The privacy risk is evaluated according to the existence or absence of an instance in the database. Differential privacy assumes that data instances are independent from each other and guarantees that the existence of an instance in the database does not pose a threat to its privacy, as the statistical information of the data would not change significantly in comparison to the case where the instance is absent [ 52 , 53 ]. This way, the adversary cannot infer whether an instance is in the database or which record is associated with it [ 92 ].

Definition 2.3 (Differential Privacy). Given a query function $f(.)$ , a mechanism $K(.)$ with an output range $\mathcal {R}$ satisfies $\epsilon$ -differential privacy iff, for all datasets $\mathcal {D}_1$ and $\mathcal {D}_2$ differing in at most one element and all $S \subseteq \mathcal {R}$ :

$$\Pr [K(\mathcal {D}_1) \in S] \le e^{\epsilon } \cdot \Pr [K(\mathcal {D}_2) \in S]. \qquad (1)$$

Here, $\epsilon$ is called the privacy budget. Large values of $\epsilon$ (e.g., 10) result in a large $e^{\epsilon }$ and indicate that a large output difference can be tolerated, and hence we have a large privacy loss; the adversary can infer the change in the database from the large change of the query function $f(.)$ . On the other hand, small values of $\epsilon$ (e.g., 0.1) indicate that only a small privacy loss is tolerated. The query function $f(.)$ can be thought of as a request for the value of a random variable, and the mechanism $K(.)$ is a randomized function that can be considered as an algorithm returning the result of the query function, possibly with some noise. To make this concrete, assume that we have a dataset containing patient records. An example of the query function $f(.)$ is the question, How many people have the disease $x$ ? The mechanism $K(.)$ could be any algorithm that finds the answer to this question. The output range $\mathcal {R}$ for the mechanism $K(.)$ in this example is $\mathcal {R} = \lbrace 0,1,\ldots,n\rbrace$ , where $n$ is the total number of patients in the dataset.

Differential privacy models can be either interactive or non-interactive. Assume that the data consumer executes a number of statistical queries on the same dataset. In the interactive model, the data publisher responds to the consumer with $K(f(\mathcal {D}))$ , where $K(.)$ perturbs the query results to achieve the privacy guarantees. In the non-interactive model, the data publisher designs a mechanism $K(.)$ that transforms the original data $\mathcal {D}$ into a new anonymized dataset $\mathcal {D}^{\prime } = K(\mathcal {D})$ . The perturbed data $\mathcal {D}^{\prime }$ are then returned to the consumer, ready for arbitrary statistical queries.

A common way of achieving differential privacy is by adding random noise, e.g., Laplacian or Exponential, to the query answers [ 52 ]. The Laplacian mechanism is a popular technique for providing $\epsilon$ -differential privacy that adds noise drawn from the Laplace distribution. Since $\epsilon$ -differential privacy is defined over the query function and holds for all datasets according to Equation ( 1 ), the amount of added noise only depends on the sensitivity of the query function, which is defined as:

$$\Delta (f) = \max _{\mathcal {D}_1, \mathcal {D}_2} \Vert f(\mathcal {D}_1) - f(\mathcal {D}_2)\Vert _1, \qquad (2)$$

where the maximum is taken over all pairs of datasets $\mathcal {D}_1$ and $\mathcal {D}_2$ differing in at most one element.

The added Laplacian noise is then drawn from $Lap(\Delta (f)/\epsilon)$ , whose density is proportional to $e^{-|x| \epsilon /\Delta (f)}$ , and the output under the differential privacy constraint is $K(f(\mathcal {D})) = f(\mathcal {D}) + Y$ , where $Y\sim Lap(\Delta (f)/\epsilon)$ . The mechanism $K(.)$ works best when $\Delta (f)$ is small, as it then introduces the least noise. The larger the sensitivity of a query, the more noise must be added, since removing any single instance from the dataset can change the output of the query more. Note that the sensitivity essentially captures how large a difference (between the values of $f(.)$ on two datasets differing in a single element) must be hidden by the additive noise generated by the data publisher. Note also that recent studies show that dependency between instances in the dataset hurts the differential privacy guarantees [ 92 , 113 ].
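Putting the pieces together, a minimal sketch of the Laplace mechanism for the patient-count query from the earlier example (a counting query, so $\Delta (f) = 1$ ) could look like this:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return a differentially private answer by adding Laplace noise
    with scale sensitivity/epsilon to the true query result."""
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# "How many patients have disease x?" Adding or removing one patient
# changes the count by at most 1, so the sensitivity is 1.
true_count = 42
for epsilon in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")  # smaller epsilon, more noise
```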

There also exists a relaxed version of $\epsilon$ -differential privacy, known as $(\epsilon , \delta)$ -differential privacy, which was developed to deal with very unlikely outputs of $K(.)$ [ 52 , 53 ]. It could be defined as:

Definition 2.4 (( $\epsilon , \delta$ )-differential privacy). Given a query function $f(.)$ , a mechanism $K(.)$ with an output range $\mathcal {R}$ satisfies $(\epsilon , \delta)$ -differential privacy iff, for all datasets $\mathcal {D}_1$ and $\mathcal {D}_2$ differing in at most one element and all $S \subseteq \mathcal {R}$ :

$$\Pr [K(\mathcal {D}_1) \in S] \le e^{\epsilon } \cdot \Pr [K(\mathcal {D}_2) \in S] + \delta . \qquad (3)$$
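Although mechanisms for $(\epsilon , \delta)$ -differential privacy are not detailed in this survey, a standard way to achieve it is the Gaussian mechanism, which calibrates Gaussian noise to the $L_2$ sensitivity of the query. The sketch below uses the classic calibration, which is only valid for $\epsilon < 1$ :

```python
import math
import numpy as np

def gaussian_mechanism(true_answer, l2_sensitivity, epsilon, delta, rng=None):
    """Classic Gaussian mechanism for (epsilon, delta)-differential
    privacy: adds N(0, sigma^2) noise with sigma calibrated to the L2
    sensitivity. This calibration is valid only for 0 < epsilon < 1."""
    assert 0 < epsilon < 1, "classic calibration requires epsilon < 1"
    rng = rng or np.random.default_rng()
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
    return true_answer + rng.normal(loc=0.0, scale=sigma)

print(gaussian_mechanism(42.0, l2_sensitivity=1.0, epsilon=0.5, delta=1e-5))
```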

Table 3 summarizes different works that utilize differential privacy in social media data. All these works are discussed more later.

Type of Information | Reference
graph structure | [ 113 , 152 , 162 , 179 , 182 ]
recommender systems | [ , , , , , , , , ]
textual data | [ ]

2.3 Related Work

There are multiple relevant surveys related to data privacy and privacy-preserving approaches [ 1 , 5 , 54 , 59 , 82 , 86 , 159 , 165 , 176 , 193 ]. Fung et al. [ 59 ] review privacy-preserving data publishing methods for relational data, such as $k$ -anonymity, $l$ -diversity, $t$ -closeness, and their variations. These methods are compared in terms of privacy models, anonymization algorithms, and information metrics. Zheleva et al. [ 193 ] review the concepts of privacy issues in tabular data and introduce new privacy risks in graph data. Multiple surveys focus on reviewing graph data privacy risks [ 1 , 82 , 86 , 165 ]. Sharma et al. [ 165 ] is among the first works that review $k$ -anonymity and randomization-based techniques for anonymizing graph data. Another overview, by Abawajy et al. [ 1 ], presents the threat model for graph data and classifies the background knowledge used by adversaries to breach the privacy of users. They also review and classify state-of-the-art approaches for anonymizing graph data. Ji et al. [ 82 , 86 ] conduct a survey on graph data anonymization, de-anonymization attacks, and de-anonymizability quantification. Another way of sanitizing data is by providing algorithms that are provably privacy-preserving and ensure no sensitive information leaks from the data [ 193 ]. There is a thorough survey [ 176 ] on privacy-preserving data mining, which studies different privacy-preserving data mining approaches. Another work, from Agrawal et al. [ 5 ], proposes algorithms that perturb data values by adding random noise to them. Another set of works focuses on developing privacy-preserving association rule mining to minimize privacy loss [ 54 , 159 ].

In this work, we go one step further and review all aspects of social media data that could lead to privacy leakage. Social media data are highly unstructured and noisy and inherently different from relational and tabular data. Therefore, other approaches are designed specifically to study privacy risks in the context of user-generated data in social media platforms. Different from previous works, we not only review state-of-the-art and recent approaches on social graph anonymization and de-anonymization, but also survey other attribute and identity disclosure attacks that could be performed on other aspects of user-generated social media data. In addition, we overview and summarize approaches that leak users’ profile attribute and location information by utilizing their other online activities. We also survey author identification techniques that incorporate various pieces of user-generated information, such as user profiles and textual posts, to re-identify users. Besides, we cover more recent works related to privacy leakage in social media that are not covered in the work of Zheleva et al. [ 193 ]. Furthermore, we include many new techniques related to the privacy of social graphs that are not included in previous surveys [ 1 , 82 , 86 , 165 ].

In summary, to the best of our knowledge, this is the first and most comprehensive work that systematically surveys and analyzes the advances of research on privacy issues in social media.

3 SOCIAL GRAPHS AND PRIVACY

A large amount of the data generated by users in social media platforms has graph structure. Friendship and following/followee relations, mobility traces (e.g., WiFi contacts, Instant Message contacts), and spatio-temporal data (latitude, longitude, and timestamps) can all be modeled as graphs. This mandates paying attention to the privacy issues of graph data. We will first overview graph de-anonymization works and then survey the proposed solutions for anonymizing graph data.

3.1 Graph De-anonymization

The work of Backstrom et al. [ 19 ] was among the first to study the privacy breach problem related to the social network’s graph structure. De-anonymization attacks can be categorized as either seed-based or seed-free, according to whether pre-annotated seed users exist. Seed users are those whose identities are known to the attacker. Backstrom et al. [ 19 ] is among the first seed-based approaches. This work introduces both active and passive attacks on anonymized social networks. In active attacks, the adversary creates $k$ new user accounts (a.k.a. Sybils) and links them to a set of predefined target nodes before the anonymized graph is produced. It then links these new accounts together to create a subgraph $H$ . After the anonymized graph is published, the attacker looks for the subgraph $H$ and then locates and re-identifies the targeted nodes. The main challenge here is that the subgraph $H$ should be unique enough to be found efficiently. In passive attacks, the adversary is an internal user of the system, and no new account is created. The attacker then de-anonymizes the users connected to him after the graph data are released. The active attack is susceptible to Sybil defense approaches [ 8 ] and wrongly assumes that attackers can always change the network before its release.

Another work, from Narayanan et al. [ 140 ], introduces an improved attack that does not need compromised accounts or Sybil users. This work assumes that the attacker has access to a different network whose membership overlaps with that of the original anonymized network. This auxiliary graph is also known as the background knowledge graph. It also assumes that the attacker knows a small set of users, i.e., seed users, who are present in both networks. Narayanan et al. [ 140 ] discuss different ways of collecting background knowledge. For example, if the attacker is a friend of a portion of the targeted users, then he or she knows all the details about them [ 98 , 170 ]. Another approach is paying a set of users to reveal information about themselves and their friends [ 106 ]. Crawling data via social media APIs or using compromised accounts, as discussed for the active attack, are other approaches for gathering background knowledge. The social graph de-anonymization attack in social media can then be formally defined as:

Definition 3.1 (Social Graph De-anonymization Attack [ 57 , 140 ]). Given an auxiliary/background graph $G_1 = (V_1, E_1)$ and a target anonymized graph $G_2 = (V_2, E_2)$ , the goal of de-anonymization is to find as many accurate identity disclosures, in the form of $1-1$ mappings, as possible. An identity disclosure indicates that two nodes $i \in V_1$ and $j \in V_2$ actually correspond to the same user.

3.1.1 Seed-based De-anonymization. Seed-based de-anonymization approaches have two main steps. In the first step, a set of seed users is mapped from the anonymized graph to the background/auxiliary knowledge graph and thus re-identified. In the second step, the mapping and de-anonymization are propagated from the seed users to the remaining unidentified users. Accordingly, the work of Narayanan et al. [ 140 ] starts by re-identifying seed users in the anonymized and auxiliary graphs. Then, other users are re-identified by propagating mappings based on seed user pairs. Structural information such as a user’s degree, a user’s eccentricity, and edge directionality is used to heuristically measure the strength of a match between users. A straightforward application of this de-anonymization attack with fewer heuristics is predicting links between users [ 138 ].

Yartseva et al. [ 185 ] propose a percolation-based de-anonymization approach that maps every pair of users in the two graphs (the background knowledge and anonymized graphs) that have more than $k$ neighboring mapped pairs. The only parameter of this approach is $k$ , a predefined mapping threshold, and it does not require a minimum number of users in the seed set. Another similar work, from Korula et al. [ 99 ], proposes a parallelizable percolation-based attack with provable guarantees. It again starts with a set of seed users who are previously mapped and then propagates the mapping to the remaining network. Two users are mapped if they have a specific number of mapped neighbors. Their approach is robust to malicious users and fake social relationships in the network.
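To illustrate the core percolation rule shared by these works, the following sketch matches two toy graphs from a pair of seeds. It is a bare-bones reading of the rule (map an unmatched pair once it accumulates at least $k$ mapped neighbor pairs), not a reimplementation of either paper:

```python
def percolation_match(g1, g2, seeds, k=2):
    """Percolation graph matching: starting from seed pairs, repeatedly
    map an unmatched pair (u, v) once at least k of u's neighbors are
    already mapped to neighbors of v. g1 and g2 are dicts mapping each
    node to the set of its neighbors."""
    mapping = dict(seeds)
    mapped_targets = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for u in g1:
            if u in mapping:
                continue
            for v in g2:
                if v in mapped_targets:
                    continue
                # Count u's mapped neighbors whose images neighbor v.
                marks = sum(1 for n in g1[u]
                            if n in mapping and mapping[n] in g2[v])
                if marks >= k:
                    mapping[u] = v
                    mapped_targets.add(v)
                    changed = True
                    break
    return mapping

# Toy example: g2 is g1 with nodes relabeled (1->'a', 2->'b', ...).
g1 = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
g2 = {'a': {'b', 'c'}, 'b': {'a', 'c', 'd'},
      'c': {'a', 'b', 'd'}, 'd': {'b', 'c'}}
print(percolation_match(g1, g2, seeds={1: 'a', 2: 'b'}))  # {1:'a', 2:'b', 3:'c', 4:'d'}
```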

In another work, Nilizadeh et al. [ 142 ] propose a community-based de-anonymization attack using the idea of divide-and-conquer. Community detection has been extensively studied in the literature of social network analysis [ 12 , 184 ] and has been used in a variety of tasks such as trust prediction [ 24 ] and guild membership prediction [ 13 , 69 ]. In this work, the attacker first partitions both graphs (i.e., the anonymized and knowledge graphs) into multiple communities. It then maps communities to each other by creating a network of communities in both graphs. Users within mapped communities are then re-identified and matched together. Finally, mappings are propagated to re-identify the remaining users. This attack uses heuristics similar to those in Reference [ 140 ] to measure the mapping strength between users.

Ji et al. [ 80 , 81 ] study the de-anonymizability of social media graph data under seed-based approaches for both the Erdos-Renyi model and a statistical model. Similarly to Reference [ 83 ], they specify the structural conditions for both perfect and partial de-anonymization. Chiasserini et al. [ 45 , 55 ] also study the problem of user de-anonymization based on structural information under a scale-free user relation model. This assumption is more realistic, since users’ degree distribution in social media follows a power law, a.k.a. is scale-free. Their results show that the information of a large portion of users in the seed set is useless in re-identifying users, because of the large inhomogeneities in users’ degrees. This suggests that, for a network with $n$ users, on the order of $n^{\frac{1}{2}+\epsilon }$ (for any arbitrarily small $\epsilon$ ) seeds are needed to successfully de-anonymize all users when seeds are uniformly distributed among the vertices. Chiasserini et al. [ 45 , 46 ] also propose a two-phased percolation graph matching-based attack similar to that in Reference [ 185 ].

Bringmann et al. [ 38 ] also propose an approach that uses $n^\epsilon$ seed nodes (for an arbitrarily small $\epsilon$ ) for a graph with $n$ nodes. This is an improvement over the state-of-the-art structure-based de-anonymization techniques that need $\Theta (n)$ seeds [ 99 ]. This approach finds a signature set for each node as the intersection of its neighbors and the previously re-identified nodes. It then defines a criterion to decide whether two signatures originate from the same node with high probability, i.e., if the similarity of two nodes’ signatures is more than $n^c$ ( $c\gt 0$ is a constant), then the two nodes are mapped together. Locality-sensitive hashing [ 78 ] is also used to reduce the number of comparisons needed for the de-anonymization attack. Theoretical and empirical analyses of their work show that the attack runs in quasilinear time.

Manasa et al. [ 150 ] propose another seed-based attack against anonymized social graphs that has two steps. In the first step, it identifies a seed sub-graph of users with known identities. As discussed earlier for Reference [ 19 ], this sub-graph could be injected by an attacker, or it could even be a small group of users that the attacker is able to re-identify. In the second step, it extends the seed set based on the users’ social relations and re-identifies the remaining users. In each mapping iteration, the algorithm re-examines previous mapping decisions given new evidence regarding re-identified nodes. This attack does not have any limitation on the size of the initial seed set or the number of links between seeds. Another recent work, by Chiasserini et al. [ 46 ], incorporates clustering into de-anonymization attacks. Their attack uses various levels of clustering, and their theoretical results highlight that clustering can potentially reduce the number of seeds in percolation-based de-anonymization attacks due to its wave-like propagation effect. This attack is a modified version of that in Reference [ 185 ]: it starts from a small set of seed users, expands the seed set to the closest neighbors of the users in the seed set, and repeats the re-identification procedure. In this version, two users are mapped if they have a sufficiently large number of neighbors among the mapped pairs.

To sum up, seed-based graph de-anonymization techniques can be categorized into three groups: percolation-based, clustering-based, and seed-extension-based works. Table 4 summarizes existing works according to the utilized technique and their properties.

3.1.2 Seed-free De-anonymization. The efficiency of most seed-based approaches depends on the size of the seed set. Seed-free de-anonymization attacks have been developed to address this issue. Pedarsani et al. [ 149 ] present a Bayesian model that starts from the users with the highest degrees and iteratively solves a maximum weighted bipartite graph matching problem, updating the fingerprints of all users in each iteration. The goal in the maximum weighted bipartite graph matching problem is to find a matching of maximum total weight between the two sides such that each vertex is the endpoint of at most one chosen edge.

Moreover, Ji et al. [ 83 , 84 ] propose optimization-based methods that minimize an error function iteratively. More specifically, in each iteration of this attack, two candidate sets of users are selected from the anonymized and background graphs. Then users in the set from the anonymized graph are mapped (de-anonymized) to users in the background graph by minimizing an error function defined by the edge difference caused by a mapping scheme. In particular, Ji et al. [ 83 ] quantify structure-based de-anonymization under the Configuration model [ 141 ] and derive structural conditions for perfect and partial de-anonymization. The Configuration model generates a random graph given a degree sequence by randomly assigning edges to match the given degree sequence [ 141 ].

Another recently developed group of techniques leverages additional sources of information besides the structural network to re-identify social media users in anonymized data. This information includes user interactions (e.g., commenting, tweeting) or non-personally identifiable information that is associated with users and shared publicly, such as gender, education, country, and interests [ 64 ]. This combination of structural and exogenous sources of information can increase the risk to user privacy. Zhang et al. [ 190 ] study the privacy breach problem in anonymized heterogeneous networks. They first introduce a privacy risk measure based on the user’s potential loss and the number of users who share the same attribute value. They then propose a de-anonymization algorithm that incorporates the defined privacy risk measure. For each target user, this framework first finds a set of candidates based on entity attribute matches in the heterogeneous network and then narrows down this candidate set by comparing the neighbors (found via heterogeneous links) of the target user and each candidate.

Fu et al. [ 56 , 57 ] propose to use both structural and descriptive information, where descriptive information is defined as attribute information such as name, gender, and birth year. This work first proposes a new definition of user similarity: two users are similar if their neighbors match each other as well. However, the similarity of neighbors in turn depends on the similarity of users. Therefore, Fu et al. model similarity as a recursive problem and solve it iteratively. They then reduce the de-anonymization problem to a complete weighted bipartite graph matching problem, which is solved with the Hungarian algorithm [ 101 ]. The weights are calculated based on the user similarities.
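A simplified sketch of this pipeline is shown below: an iterated neighbor-based similarity (a rough stand-in for Fu et al.’s recursive definition) followed by maximum weighted bipartite matching via SciPy’s Hungarian-algorithm implementation. The toy graphs, the degree-based attribute similarity, and the fixed number of iterations are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iterate_similarity(g1, g2, attr_sim, alpha=0.5, rounds=10):
    """Recursive node similarity: two nodes are similar when their
    attributes match and their neighbors are similar. The recursion is
    iterated a fixed number of rounds instead of being solved exactly."""
    n1, n2 = list(g1), list(g2)
    ix1 = {u: i for i, u in enumerate(n1)}
    ix2 = {v: j for j, v in enumerate(n2)}
    sim = np.array([[attr_sim(u, v) for v in n2] for u in n1])
    for _ in range(rounds):
        new = np.zeros_like(sim)
        for u in n1:
            for v in n2:
                # For each neighbor of u, take its best match among v's
                # neighbors (a greedy proxy for neighbor matching).
                best = [max((sim[ix1[a], ix2[b]] for b in g2[v]), default=0.0)
                        for a in g1[u]]
                neigh = sum(best) / len(best) if best else 0.0
                new[ix1[u], ix2[v]] = (1 - alpha) * attr_sim(u, v) + alpha * neigh
        sim = new
    return n1, n2, sim

# Toy graphs (g2 is g1 relabeled); attribute similarity is a degree
# match, standing in for name/gender/birth-year similarity.
g1 = {0: [1], 1: [0, 2], 2: [1]}
g2 = {'x': ['y'], 'y': ['x', 'z'], 'z': ['y']}
n1, n2, sim = iterate_similarity(
    g1, g2, attr_sim=lambda u, v: 1.0 if len(g1[u]) == len(g2[v]) else 0.5)
rows, cols = linear_sum_assignment(-sim)  # Hungarian algorithm, maximizing
print({n1[r]: n2[c] for r, c in zip(rows, cols)})  # e.g., {0: 'x', 1: 'y', 2: 'z'}
```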

In another work, the effect of user attribute information as an exogenous source of information on de-anonymizing social networks is studied [ 154 ]. In particular, this work incorporates the semantic background knowledge of the adversary into the de-anonymization process and models it using knowledge graphs [ 79 ]. This approach simultaneously de-anonymizes users and infers their attributes (we will discuss the user profile attribute inference attack later in Section 5 ). The adversary first models both the anonymized dataset and the background knowledge as two knowledge graphs. Then, she constructs a complete weighted bipartite graph, where each weight indicates the structural and attribute similarity between corresponding nodes in the anonymized and knowledge graphs. The de-anonymization problem is then reduced to a maximum weighted bipartite matching problem, which can be further reduced to a minimum cost maximum flow problem. The attacker’s prior semantic knowledge could be obtained in different ways, such as common sense, statistical information, personal information, and network structural information.

Ji et al. [ 87 ] also study the same problem and show, theoretically and empirically, that using attribute information alongside structural information can result in a greater privacy loss, even in an anonymized dataset, compared to the case where the data consist only of structural information. They further propose the De-SAG de-anonymization framework, which incorporates both attribute and structural information by first augmenting both types of information into a structure-attribute graph. De-SAG has two variants, user based and set based. In user-based De-SAG, the de-anonymization approach first selects the candidates most similar to the target user from the background/auxiliary knowledge graph based on the similarity of their attributes. Next, the target user is mapped to one of the selected candidates based on their structural similarity. In set-based De-SAG, in each iteration, two sets of users are selected from the anonymized graph and the knowledge graph, respectively. Then the de-anonymization problem reduces to a maximum weighted bipartite graph matching problem, and users in these two sets are mapped to each other using the Hungarian algorithm [ 101 ]. Note that the similarity of users is again calculated according to their attribute and structural information.

In another work, by Lee et al. [ 105 ], a blind de-anonymization technique is proposed in which the adversary does not need any background information. Inspired by the idea of $dK$ -series for characterizing the structural properties of a graph, they propose the $nK$ -series to describe the structural features of each user by exploiting his multi-hop neighbors’ information. In particular, $nKi$ captures the degree histogram of the user’s $i$ -hop neighbors. Then, a structure score is calculated for each user (in both the anonymized graph and the background knowledge graph) based on his diversity score (calculated from the $nK$ -series scores) and his relationships with all other non-re-identified users in the network. This information is then used to re-identify all users in the anonymized social graph by leveraging pseudo-relevance-feedback support vector machines. Backes et al. [ 18 ] develop an attack that infers social links between users based on their mobility profiles without using any additional information about existing relations between users. Their approach first constructs a mobility profile for each user by obtaining random walk traces from the user-location bipartite graph and using skip-gram [ 131 ] to obtain features in a continuous vector space. It then infers the links based on the similarity of the users’ mobility profiles.
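As one concrete reading of the $nK$ -series idea (covering only the feature extraction, not the diversity scores or the classification step), the sketch below computes the degree histogram of a node’s exact $i$ -hop neighborhood:

```python
from collections import Counter

def i_hop_neighbors(graph, node, i):
    """Return the set of nodes exactly i hops away from `node` (BFS layers)."""
    visited = {node}
    frontier = {node}
    for _ in range(i):
        frontier = {n for u in frontier for n in graph[u]} - visited
        visited |= frontier
    return frontier

def nk_feature(graph, node, i):
    """Degree histogram of the node's i-hop neighborhood: one candidate
    reading of the nK_i feature of Lee et al."""
    return Counter(len(graph[n]) for n in i_hop_neighbors(graph, node, i))

graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(nk_feature(graph, 1, 1))  # degrees of node 1's direct neighbors
print(nk_feature(graph, 1, 2))  # degrees of nodes exactly 2 hops away
```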

Beigi et al. [ 26 , 28 ] introduce a new adversarial attack that does not need any background information before initiating the attack. This attack is designed for heterogeneous social media data, which consist of different aspects (i.e., textual and structural), and shows that anonymizing all aspects of the data is not sufficient. The attack first extracts the most revealing information for each user in the anonymized dataset and then accordingly finds a set of candidate users. Each user is finally mapped to the most probable candidate. Sharad et al. [ 164 ] propose to formulate the problem of graph de-anonymization in social networks as a learning task. They use one-hop and two-hop neighborhood degree distributions to represent each user, with the intuition that two nodes refer to the same user if their neighborhoods also match. These features are further used to train a classifier to learn the degree deviation for identical and non-identical user pairs. In another work, Sharad et al. [ 163 ] go even further and propose a new generation of de-anonymization attacks that is heuristic-free and seedless and casts de-anonymization as a learning problem. They use the same set of structural features as proposed in Reference [ 164 ] and then de-anonymize the sanitized graph by re-identifying high-degree users first and using them to attack low-degree nodes. Mappings are then frozen and propagated to the remaining nodes to discover new mappings.

Table 5 categorizes the reviewed works based on the technique used and whether they are applicable to heterogeneous or homogeneous graph networks.

3.1.3 Theoretical Analysis and De-anonymization. Another set of works studies de-anonymization attacks from a theoretical perspective. For example, Liu et al. [ 113 ] theoretically study the vulnerability of differential privacy mechanisms against de-anonymization attacks. Differential privacy provides protection against even the strongest attacks, in which the adversary knows the entire dataset except one entry. However, differential privacy assumes independence between dataset entries, which does not hold in most real-world applications. This work introduces a new attack in which the probabilistic dependence between dataset entries is calculated and then leveraged to infer users’ sensitive information from differentially private queries. The attack is also tested on graph data in which users’ degree distributions are published under differential privacy.

Lee et al. [ 104 ] also study the theoretical quantification of anonymized graph data’s vulnerability to de-anonymization attacks. In particular, they study the relation between application-specific anonymized data utility (i.e., quality of data) and the capability of de-anonymization attacks. They define a local neighborhood utility and a global structure utility and theoretically show that, under certain conditions on each of the defined utilities, the probability of successful de-anonymization approaches one as the number of users in the data increases. Their foundations can be used to evaluate the effectiveness of de-anonymization/anonymization techniques.

Recent research by Fu et al. [ 58 ] studies the conditions under which the adversary can perfectly de-anonymize user identities in social graphs. In particular, they theoretically study the cost of quantifying the quality of the mappings. Community structures are also parameterized and leveraged as side information for de-anonymization. They study two different cases, in which the community information is available either for both the background knowledge and anonymized graphs or for only one of them. They show that perfectly de-anonymizing graph data with community information in polynomial time is NP-hard. They further propose two algorithms with approximation guarantees and lower time complexity by relaxing the original optimization problem. The main drawback of this study is the assumption of disjoint communities, which fails to reflect real-world situations. Wu et al. [ 181 ] extend Fu et al.’s study by considering overlapping communities. In contrast to Fu et al.’s work [ 58 ], which uses Maximum a Posteriori estimation to find the correct mappings, Wu et al. introduce a new cost function, Minimum Mean Square Error, which minimizes the expected number of mismatched users by incorporating all possible true mappings.

There are different surveys [ 1 , 82 , 86 , 104 ] on the quantification and analysis of graph de-anonymization techniques that study a portion of the works covered here in terms of scalability, robustness, and practicability. Interested readers can refer to these surveys for further reading.

3.2 Graph Anonymization

Another research direction in protecting the privacy of users in graph data is studying graph anonymization techniques. Existing anonymization approaches use different techniques and mechanisms and can be categorized mainly into five categories: $k$ -anonymity-based approaches [ 43 , 115 , 189 , 196 , 199 ], edge manipulation techniques [ 188 ], cluster-based techniques [ 31 , 70 , 114 , 134 , 174 ], random walk-based techniques [ 116 , 134 ], and differential privacy-based techniques [ 152 , 162 , 179 , 182 ]. We discuss each of these categories below.

3.2.1 K-anonymity-based Approaches. The aim of $k$ -anonymity methods is to anonymize each user/node in the graph so that it is indistinguishable from at least $k-1$ other users [ 171 ]. Liu et al. [ 115 ] propose an anonymization framework for $k$ -degree anonymization, in which for each user there are at least $k-1$ other users with the same degree. The goal of this approach is to add/delete the minimum number of edges needed to preserve $k$ -degree anonymity. The algorithm has two steps. In the first step, given the degree sequence of the original graph, a $k$ -degree anonymized version of the degree sequence is constructed; in the second step, the anonymized graph is built based on the anonymized degree sequence. In another work [ 196 ], Zhou et al. aim to achieve $k$ -neighborhood anonymity under the assumption that the adversary knows the subgraph constructed by the immediate neighbors of a target node. In the first step of the anonymization, the one-hop neighborhoods of all users are extracted and encoded so that isomorphic neighborhoods can be easily identified. In the second step, users with similar/isomorphic neighborhoods are grouped together until the size of each group is at least $k$ . Then, each group is anonymized to satisfy $k$ -neighborhood anonymity, as each neighborhood has at least $k-1$ isomorphic neighborhoods in the same group. This approach thus anonymizes the graph against neighborhood attacks.
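The degree-sequence step of Liu et al.’s method admits a simple greedy approximation, sketched below (the paper itself solves this step optimally with dynamic programming and then constructs a graph realizing the anonymized sequence; both refinements are omitted here):

```python
def k_anonymize_degree_sequence(degrees, k):
    """Greedy k-degree anonymization of a degree sequence: cut the
    sorted sequence into runs of at least k and raise each run to its
    maximum, so every degree value occurs at least k times."""
    d = sorted(degrees, reverse=True)
    anonymized = []
    i = 0
    while i < len(d):
        # If fewer than 2k degrees remain, close out with one final group.
        j = len(d) if len(d) - i < 2 * k else i + k
        anonymized.extend([d[i]] * (j - i))  # d[i] is the group maximum
        i = j
    return anonymized

print(k_anonymize_degree_sequence([5, 4, 4, 2, 2, 1], k=2))  # [5, 5, 4, 4, 2, 2]
```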

Zou et al. [ 199 ] propose a $k$ -automorphism-based framework that protects the graph against multiple attacks, including the neighborhood attack [ 196 ], degree-based attack [ 115 ], hub-fingerprint attack [ 70 ], and subgraph attack [ 70 ]. A graph is $k$ -automorphic if there exist $k-1$ automorphic functions in the graph, so that the attacker cannot distinguish any user from her $k-1$ symmetric vertices. The proposed approach first partitions the graph into $n$ blocks and then clusters the blocks into $m$ groups (graph partitioning step). In the second step, alignments of blocks are obtained, and the original blocks are replaced with alignment blocks (block alignment step). In the last step, edge copy is performed to obtain the anonymized graph: $k-1$ edges are added between the $k-1$ pairs $(F_a(u), F_a(v))$ $(a = 1,2,\ldots k-1)$ , where $F_a(.)$ is the automorphic function and $u$ and $v$ are users in the social graph. The authors also propose the use of generalized vertex IDs for handling dynamic data releases. Another similar work, by Cheng et al. [ 43 ], proposes a $k$ -isomorphism anonymization approach. A graph is $k$ -isomorphic if it consists of $k$ disjoint subgraphs and all pairs of these subgraphs are isomorphic. In the first step, the graph is partitioned into $k$ subgraphs with the same number of vertices. Then edges are added or deleted so that these subgraphs are isomorphic. This approach protects the published graph against neighborhood attacks [ 196 ].

Yuan et al. [ 189 ] incorporate semantic and graph information together to achieve personalized privacy anonymization. In particular, they consider three different levels for the attacker’s knowledge regarding the target user: (1) only attribute information, (2) both attribute and degree information, and (3) a combination of attribute, node degree, and neighborhood information. They accordingly propose three levels of protection to achieve $k$ -anonymity. For level 1 protection, their approach uses label generalization. For level 2, it additionally uses node/edge adding. For level 3, it uses edge label generalization as well.

3.2.2 Edge Manipulation-based Approaches. Edge manipulation and randomization algorithms for social graphs usually utilize edge-based randomization strategies to anonymize data, such as random edge adding/deleting and random edge switching [ 188 ]. Ying et al. [ 188 ] propose spectrum-preserving edge editing that either adds $k$ random edges to the graph and removes another $k$ edges randomly, or alternatively switches $k$ edges. In the switching technique, two random edges, $(i_1, j_1)$ and $(i_2, j_2)$ , are selected from the original graph edge set $E$ such that $\lbrace (i_1,j_2) \notin E \wedge (i_2,j_1) \notin E \rbrace$ . Then edges $(i_1,j_1)$ and $(i_2,j_2)$ are removed, and new edges $(i_1,j_2)$ and $(i_2,j_1)$ are added instead. This method protects the graph against the edge inference attack. Backes et al. [ 18 ] also propose a randomization-based approach to preserve the privacy of social links between users in graph data and counteract link inference attacks. In this specific type of attack, the adversary exploits users’ mobility traces to infer social links between users, with the intuition that friends have more similar mobility profiles than two strangers do [ 18 ]. They utilize three privacy-preserving techniques: hiding, replacement, and generalization of user mobility information. Results show that data publishers need to hide 80% of the location points or replace 50% of them to prevent leakage of information about users’ social links.
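A minimal sketch of the switching strategy, which preserves every node’s degree by construction, might look as follows (edges are stored as ordered tuples for simplicity; an undirected implementation would canonicalize the pairs):

```python
import random

def switch_k_edges(edges, k, seed=0):
    """Randomly switch k pairs of edges: (i1, j1), (i2, j2) become
    (i1, j2), (i2, j1), provided the new edges do not already exist and
    no self-loops arise. Node degrees are preserved by construction.
    Note: loops forever if no valid switch remains; fine for a sketch."""
    rng = random.Random(seed)
    edge_set = set(edges)
    switched = 0
    while switched < k:
        (i1, j1), (i2, j2) = rng.sample(sorted(edge_set), 2)
        if len({i1, j1, i2, j2}) < 4:          # avoid self-loops/overlaps
            continue
        if (i1, j2) in edge_set or (i2, j1) in edge_set:
            continue
        edge_set -= {(i1, j1), (i2, j2)}
        edge_set |= {(i1, j2), (i2, j1)}
        switched += 1
    return edge_set

print(switch_k_edges([(1, 2), (3, 4), (1, 5), (2, 6)], k=2))
```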

3.2.3 Clustering-based Techniques. Clustering-based approaches group users and edges into clusters and only reveal the density and size of each cluster so that individual attributes are protected. Hay et al. [ 70 ] propose an aggregation-based method for graph data anonymization that is robust against three types of attacks: neighborhood, subgraph, and hub fingerprint attacks. It models the aggregate network structure by partitioning the original graph and describing it at the level of partitions. Partitions are treated as nodes, and the connections between them make up the edges of the generalized graph. A graph can then be randomly sampled from this generalized structure and published as the anonymized graph data.

Another cluster-based work [ 31 ] proposes two approaches, label list and partitioning, which consider user attributes (i.e., labels) in addition to structural information. In the label list approach, a list of labels, including her true label, is allocated to each user. This approach first clusters nodes into $m$ classes, and then a set of symmetric lists is built deterministically for each class from the set of nodes in the corresponding class. In the partitioning approach, nodes are divided into classes, and instead of releasing the full edge information, only the number of edges between and within each class is released. This is similar to the generalization approach of Hay et al. [ 70 ]. Bhagat et al. also use a set of safety conditions to ensure that the released data do not leak information. The proposed partitioning approach is more robust than the label list technique when facing attacks with richer background knowledge. However, the partitioning approach has lower utility than the label list, as less information is revealed about the graph structure.

Thompson et al.’s approach [ 174 ] protects the graph information against the $i$ -hop degree-based attack. They present two clustering algorithms, bounded $t$ -means clustering and union-split clustering, which group users with similar social roles into clusters under a minimum size constraint. They then utilize the proposed inter-cluster matching anonymization method, which anonymizes the social graph by removing/adding edges according to the users’ inter-cluster connectivity. The numbers of nodes and edges between and within clusters are then released, similarly to Hay et al.’s approach [ 70 ]. Mittal et al. [ 114 ] also propose another clustering-based anonymization technique that considers the evolutionary dynamics of social graphs, such as node/edge addition/deletion, and consistently anonymizes the graph. It first dynamically clusters nodes and then perturbs the intra-cluster and inter-cluster links of changed clusters in a way that preserves the structural properties of the social media graph. They leverage the static perturbation method of Reference [ 134 ] to modify intra-cluster links and randomly connect marginal nodes to create fake inter-cluster links according to their degrees. The obfuscated graph is robust against the edge inference attack and has higher indistinguishability, which is defined from an information-theoretic perspective.

3.2.4 Random Walk-based Approaches. Another group of works utilizes the random walk idea to anonymize graph data. Random walks have previously been used in many security applications, such as Sybil defense [ 8 ]. Recent works also use this idea for anonymizing social graphs. The work of Mittal et al. [ 134 ] introduces a random-walk-based edge perturbation algorithm. According to this approach, for each node $u$ , a random walk of length $t$ is performed starting from one of $u$ ’s contacts, $v$ , and an edge $(u,z)$ between $u$ and the destination node $z$ is added with an assigned probability, while the edge $(u,v)$ is removed accordingly. This probability decreases as more random walks are performed from $u$ ’s contacts. Later, Liu et al. [ 116 ] improve this approach such that, instead of having a fixed random walk length $t$ , they utilize an adaptive random walk whose length is learned from the local structural characteristics. Their method first predicts the local mixing time for each node, which is the minimum random walk length for a walk starting at that node to come within a given distance of the stationary distribution. This mixing time is predicted based on the local structure and limited global knowledge of the graph and is further used to adjust the length of the random walk for social graph anonymization.
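One simplified reading of this random-walk perturbation, which replaces each edge $(u,v)$ with an edge from $u$ to the endpoint of a $t$ -step walk started at $v$ and omits the degree-dependent acceptance probability, is sketched below:

```python
import random

def random_walk_perturb(graph, t=2, seed=0):
    """Simplified random-walk edge perturbation: each edge (u, v) is
    replaced by (u, z), where z is the endpoint of a t-step random walk
    starting at v. (Mittal et al. additionally keep or drop edges with
    a probability that decreases over successive walks; omitted here.)"""
    rng = random.Random(seed)
    perturbed = {u: set() for u in graph}
    for u in graph:
        for v in graph[u]:
            z = v
            for _ in range(t):
                z = rng.choice(sorted(graph[z]))
            if z != u:                  # avoid self-loops
                perturbed[u].add(z)
    return perturbed

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(random_walk_perturb(graph))
```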

3.2.5 Differential Privacy-based Approaches. Recently, many works extend differential privacy [52] to social graph data. Sala et al. [162] first use the $dK$-series to capture sufficient graph structure at multiple granularities. The $dK$-series is the degree distribution of connected components of size $K$ within a target graph [50, 122]. Then, they partition the statistical representation of the graph captured by the $dK$-series into clusters and use an $\epsilon$-differential privacy mechanism to add noise to the representation in each cluster. Another differential privacy-based approach [152] scales down the magnitude of added noise by reducing the contributions of challenging records.
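
As a minimal illustration of the differential privacy step (independent of the clustering used in [162]), the sketch below computes a $dK$-2 series and adds Laplace noise to each count; the sensitivity value is a placeholder assumption, as [162] derives it from how many $dK$-2 entries a single edge change can affect.

```python
from collections import Counter
import numpy as np
import networkx as nx

def dk2_series(G):
    """dK-2 series: counts of edges by (min degree, max degree) pair."""
    deg = dict(G.degree())
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in G.edges())

def laplace_perturb(counts, epsilon=1.0, sensitivity=4.0, seed=0):
    """Add Laplace(sensitivity / epsilon) noise to each dK-2 count.

    The sensitivity here is a placeholder; [162] derives it from the
    number of dK-2 entries a single edge change can affect.
    """
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon
    return {k: max(0.0, c + rng.laplace(0.0, scale)) for k, c in counts.items()}

G = nx.karate_club_graph()
noisy = laplace_perturb(dk2_series(G), epsilon=0.5)
print(list(noisy.items())[:5])
```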

In another work, Wang et al. [179] use $dK$-graph generation models to generate sanitized graphs. In particular, their approach first extracts various information from the original social graph, such as degree correlations, then enforces differential privacy on the learned information, and, finally, uses the perturbed pieces of information to generate an anonymized graph with $dK$-graph models. Different from the approach in Sala et al. [162], in the specific case of $d=2$, noise is generated based on the smooth sensitivity rather than the global sensitivity; this choice reduces the magnitude of the added noise. Smooth sensitivity is a smooth upper bound on the local sensitivity when deciding the noise magnitude [143]. Another work, Reference [182], proposes an anonymization approach that satisfies edge $\epsilon$-differential privacy to hide each user's connections to other users. It learns how to transform edges into connection probabilities via statistical Hierarchical Random Graphs (HRG) under differential privacy. In particular, the approach infers the HRG by exploring the entire HRG model space, sampling an HRG via a Markov Chain Monte Carlo method, and generating the sanitized graph according to the sampled HRG while satisfying differential privacy. Their results show that using edge probabilities can significantly reduce the noise scale in comparison to the case where the edges are used directly.

In another work, Liu et al. [113] show that differential privacy is not robust to de-anonymization attacks if there is dependence among dataset entries. They propose a stronger privacy notion, dependent differential privacy, which incorporates the probabilistic dependence between the tuples in a statistical database. They then propose an effective perturbation framework that provides privacy guarantees. Their results show that more noise should be added when there is dependency between tuples; the added noise depends on the sensitivity of two tuples as well as the dependence relationship between them. They evaluate the proposed framework on graph data by sanitizing the degree distribution of a given graph.

Ji et al. [82, 86] and Abawajy et al. [1] study the defense and attack performance of a portion of the existing social graph anonymization and de-anonymization techniques. Ji et al. [82, 86] have also performed a thorough theoretical and empirical analysis of a portion of the existing related works. The results demonstrate that anonymized social graphs are vulnerable to de-anonymization attacks.

To sum up, Table 6 categorizes the reviewed works with respect to the utilized technique, i.e., $k$-anonymity, edge manipulation, cluster based, random walk based, and differential privacy based. Each column in Table 6 refers to a type of graph de-anonymization attack and, correspondingly, the works that are robust against that attack.

4 AUTHORS IN SOCIAL MEDIA AND PRIVACY

People have the right to anonymous free speech on different topics such as politics. An author's identity can be unmasked by adversaries if her real name or IP address is provided to a service provider; authors can use tools such as Tor to protect their identity at the network level. However, manually generated content will always reflect some characteristics of the person who authored it. For example, an anonymous online author may be prone to several specific spelling errors or have other recognizable idiosyncrasies [137]. These characteristics can be enough to determine whether the authors of two pieces of content are the same. Therefore, given material authored under the author's true identity, the adversary can discover the identity behind content posted online anonymously by the same author. Identifying the author of a text according to her writing style, a.k.a. stylometry, has been studied for a long time [135, 169]. With the advent of machine learning techniques, researchers started to extract textual features and discriminate between 100 and 300 authors [2]. The applications of author identification include identifying authors of terroristic threats and harassing messages [42], detecting fraud [3], and extracting demographic information [95].

Privacy implications of stylometry have been studied recently. For example, Rao et al. [156] investigate whether people posting under different pseudonyms to USENET newsgroups can be linked based on their writing style. They use a dataset of 117 people with 185 different pseudonyms and exploit function words and Principal Component Analysis (PCA) to match newsgroup postings with email domains. Another work, from Koppel et al. [96, 97], studies author identification at the scale of over 10,000 blog authors. They use character 4-grams, which are context-specific features; the problem with this work is that it is not clear whether the approach solves author recognition or context recognition. In another work, Koppel et al. [95] use both content-based and stylistic features to identify 10,000 authors in the blog corpus dataset. There are also several works on identifying the authors of academic papers under blind review based on the citations of the paper [37, 73] or other sources such as the non-blind texts of potential authors [136].

Narayanan et al. [137] propose another author identification attack that exploits 1,188 real-valued features from each post, such as frequency of characters, capitalization of words, syntactic structure (extracted by the Stanford Parser [93], e.g., noun phrases containing a personal pronoun or a singular proper noun), and distribution of word lengths. These features capture the writing style of the author regardless of the topic at hand and can re-identify a large number of authors. However, this approach does not work when authors anonymize their writing style. Almishari et al. [10] propose a linkage attack that investigates the linkability of the prolific reviews that users post on social media platforms. More specifically, given a subset of information on reviews made by an anonymous user, this approach seeks to map it to a known identified record. The approach first extracts four types of tokens: (i) unigrams, (ii) digrams, (iii) ratings, and (iv) the category of the reviewed entity. Then, it uses Naive Bayes and Kullback–Leibler (KL) divergence models to re-identify the anonymized information. This approach could also be used for identity disclosure attacks across multiple platforms using people's posts and reviews.
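
For illustration, the following sketch captures the flavor of the Naive Bayes linkage step in [10] using character unigrams and digrams; the toy reviews and account names are invented, and the original work additionally models ratings, review categories, and a KL divergence-based matcher.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpora: reviews from identified accounts (labels = account ids)
train_texts = ["great food, would come back", "terrible service, never again",
               "lovely ambiance and great staff", "awful, awful experience"]
train_authors = ["alice", "bob", "alice", "bob"]

# Character unigrams and digrams as stylistic tokens (cf. [10])
vec = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vec.fit_transform(train_texts)
clf = MultinomialNB().fit(X, train_authors)

# Link an anonymous review to the most probable known account
anon = ["the staff was great and food lovely"]
print(clf.predict(vec.transform(anon))[0])
```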

Bowers et al. [36] propose an anonymization approach that uses iterative language translation (ILT) to conceal one's writing style. This approach translates English text into a foreign language (e.g., Spanish or Chinese) and then back into English, repeating for three iterations. Another work, from Nathan et al. [121], evaluates Bowers et al.'s work by introducing a feature selection approach, namely Generative and Evolutionary Feature Selection (GEFES), over the set of predefined features, which masks out non-salient previously extracted features. Both Reference [36] and Reference [121] are tested on a set of users' blog posts, and the results show the efficiency of ILT-based anonymization. A recent work by Zhang et al. [191] anonymizes users' textual information before publishing user-generated data. This approach first introduces a variant of differential privacy tailored to textual data, namely $\epsilon$-Text Indistinguishability, to overcome the curse of dimensionality when the original differential privacy notion is deployed on high-dimensional textual data. It then proposes a framework that perturbs the user-keyword matrix by adding Laplacian noise to satisfy $\epsilon$-Text Indistinguishability. The results confirm both the utility and privacy of the data.

5 SOCIAL MEDIA PROFILE ATTRIBUTES AND PRIVACY

A user's profile includes her self-disclosed demographic attributes such as age, gender, majors, cities she has lived in, and so on. To address users' privacy, social networks usually offer options for users to limit access to their attributes, e.g., making them visible only to friends or friends of friends. A user could also create a profile without explicitly disclosing any attribute information. A social network is thus a mixture of both private and public user information. However, there exists a privacy attack that focuses on inferring users' attributes. This attack is known as the attribute inference attack, and it leverages the publicly available information of users in social networks to infer missing or incomplete attribute information [63].

The attacker could be any party interested in this information, such as social network service providers, cyber criminals, data brokers, and advertisers. Data brokers benefit from selling individuals' information to other parties such as banks, advertisers, and insurance companies. Social network providers and advertisers leverage users' attribute information to provide more targeted services and advertisements. Cyber criminals exploit attribute information to perform targeted social engineering, spear phishing, and backup authentication attacks [68]. This attribute information could also be used for linking users across multiple sites [62] and records (e.g., voter registration records) [132, 171]. Existing attacks can be categorized into three groups: friend based, behavior based, and friend and behavior based.

5.1 Friend-based Profile Attribute Inference

Friend-based approaches build on homophily theory [127], which states that two friends are more likely to share similar attributes than two strangers. Following this intuition, if most of a user's friends study at Arizona State University, then she is likely studying at the same university. He et al. [71] first construct a Bayesian network from a user's social neighbors and then use it to model the causal relations among people in the network, thus obtaining the probability that the user has a specific attribute. The main challenge in this approach is scalability, as Bayesian inference does not scale to the millions of users in social networks. Another work, by Lindamood et al. [111], uses the Naive Bayes classification algorithm to infer a user's attributes by exploiting features from her node traits (i.e., other available attribute information) and link structure (i.e., friends). However, this approach is not usable for a user who does not share any attributes. In another work, Reference [173], the authors propose an approach that leverages friends' activities and information to infer a user's attributes; the features from friends and wall posts are fed into a multi-label classifier. The authors then propose a multi-party privacy approach that defends against attribute inference attacks by enforcing mutual privacy requirements for all users to prevent the disclosure of users' attributes and sensitive information.
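
The homophily intuition behind these friend-based attacks can be captured in a few lines; the following is a generic majority-vote sketch rather than the implementation of any particular cited work.

```python
from collections import Counter

def infer_attribute(user, friends_of, attribute_of):
    """Predict a hidden attribute as the most common value among
    the user's friends with public profiles (homophily)."""
    observed = [attribute_of[f] for f in friends_of[user]
                if f in attribute_of]
    if not observed:
        return None
    return Counter(observed).most_common(1)[0][0]

friends_of = {"u1": ["u2", "u3", "u4"]}
attribute_of = {"u2": "ASU", "u3": "ASU", "u4": "MIT"}  # public profiles
print(infer_attribute("u1", friends_of, attribute_of))  # -> 'ASU'
```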

Zheleva et al. [192] study how users' sensitive attribute information can be leaked through their social relations and group memberships. This friend-based attribute inference attack exploits social links and group information to infer sensitive attributes for each user. The authors propose various algorithms, among which LINK was found to be the best among those using only link information. This method models each user $u$ as a binary vector whose length is the size of the network (i.e., the number of users in the network), where the value of element $v$ is one if $u$ is connected to $v$. Different classifiers are then trained over the users with public profiles, and the attributes of users with private profiles can be inferred. The GROUP algorithm was the best among the methods that incorporate group information. This method first selects the groups that are relevant to the attribute inference problem, using either a feature selection approach (i.e., entropy) or manual selection. Next, relevant groups are considered as features for each node and a classification model is trained. In the last step, the attributes of targeted users are predicted using the classification model. Mislove et al. introduce a similar approach that leverages users' social links and community information [133]. Their approach takes some seed users with known attributes as input and then finds the local communities around this seed set using the available link information. Using the fact that users in the same community share similar attributes, it then infers the remaining users' attributes based on the communities they are members of. The limitation is that this approach cannot infer attributes for users who are not assigned to any local community.
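
A minimal sketch of the LINK idea follows: each user is represented by her binary adjacency row, and a classifier trained on users with public profiles labels the private ones. The choice of logistic regression here is ours; Zheleva et al. [192] experiment with several classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Adjacency matrix: row u is u's binary connection vector (LINK [192])
A = np.array([[0, 1, 1, 0, 1],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0]])

public = [0, 1, 2, 3]              # users with public attribute values
labels = ["A", "B", "A", "B"]      # their sensitive attribute
private = [4]                      # target user with a private profile

clf = LogisticRegression().fit(A[public], labels)
print(clf.predict(A[private]))     # inferred attribute for user 4
```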

Avello et al. [61] propose a semi-supervised profiling approach named McC-Splat. They treat attribute inference as a multiclass classification problem: the approach learns attribute weights according to the attributes of the user's friends, where the weights indicate the user's likelihood of belonging to a given attribute value class. Finally, McC-Splat assigns the class with the highest percentile to the target user, with the percentile calculated according to the labeled individuals' information. In another work, from Dey et al. [49], the authors focus on predicting Facebook users' ages from their friendship network information. Although a user's friends list is not fully available for all users, this work uses a reverse lookup approach to obtain a partial friends list for each user. The authors then design an iterative algorithm that estimates users' ages based on friends' ages, friends of friends' ages, and so on. They also incorporate other public information in each user's profile, such as high school graduation year, to estimate the birth year. Another work, Reference [77], seeks to find a targeted user based on her social network connections and the similarity of attributes between friends. It starts from a source user and continues crawling until it reaches the target user; the navigation is based on the set of the target user's known attributes as well as the friendship links between users and their attributes. Similarly, Labitzke et al. [102] study whether the profile information of Facebook users can still be leaked through their social relations. A recent work by Li et al. [110] uses a convolutional neural network (CNN) to infer multi-valued attributes for a target user according to her ego network. A user's ego network is the subgraph of the original social network induced by the user's friends and the social relations among them; the CNN can capture the latent relationship between users' attributes and social links.

Another set of works in this category focuses on jointly predicting network structure (i.e., links) and inferring missing user attribute information [65, 186, 187]. The reason for solving these two problems simultaneously is that users with similar attributes tend to link to one another, and individuals who are friends are likely to adopt similar attributes. The work of Yin et al. [186, 187] first creates a social-attribute network graph from the original social graph and user-attribute information, i.e., nodes in the graph are either users or attributes, and edges show the friendship between a pair of users or the relation between a user and an attribute. Then, the authors use the random walk with restart algorithm [175] to calculate link relevance and attribute relevance with regard to a given user. Similarly, Gong et al. [65] transform the attribute inference attack problem into a link prediction problem in the social-attribute network graph. They generalize several supervised and unsupervised link prediction algorithms to predict user-user and user-attribute links.
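
The random walk with restart computation [175] at the core of these methods can be sketched as a simple power iteration; the toy social-attribute adjacency matrix and the restart probability below are assumptions for illustration.

```python
import numpy as np

def random_walk_with_restart(A, seed_idx, alpha=0.15, iters=100):
    """Stationary distribution of a walk that, at each step, restarts
    at the seed node with probability alpha (cf. [175])."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    e = np.zeros(n); e[seed_idx] = 1.0     # restart vector
    p = e.copy()
    for _ in range(iters):
        p = (1 - alpha) * P.T @ p + alpha * e
    return p                               # relevance of every node to the seed

# Tiny social-attribute network: nodes 0-2 are users, 3-4 are attributes
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)
scores = random_walk_with_restart(A, seed_idx=0)
print(scores[3:])   # relevance of the two attribute nodes to user 0
```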

5.2 Behavior-based Profile Attribute Inference

Unlike friend-based approaches, behavior-based inference attacks infer a user's attributes based on publicly available information about her behaviors and the public attributes of other users similar to her. Weinsberg et al. [180] propose an approach that infers users' attributes (i.e., gender) according to their behavior toward movies. In particular, each user is modeled as a vector whose size is the number of items; a non-zero value for a vector element indicates that the user has rated the item, and a zero value means that she has not. Then, they use different classifiers, such as logistic regression, SVM, and Naive Bayes, to infer users' gender. Accordingly, the authors propose a gender obfuscation method that adds movies and corresponding ratings to a given user's profile such that inferring the gender of the user becomes hard while the quality of the recommendations the user receives is minimally impacted. They use three different strategies for movie selection: random, sampled, and greedy. The sampled strategy picks a movie based on the rating distribution associated with the movies of the opposite gender, while the greedy strategy selects the movie with the highest score in the list of movies for the opposite gender. Ratings are added for each movie based on either the average movie rating or the rating predicted by recommendation approaches such as matrix factorization. The greedy movie selection strategy with predicted ratings achieves the best user profile obfuscation. Kosinski et al. [100] follow an approach similar to Reference [180] and construct a feature vector for each user based on Facebook likes; they then use a logistic regression classifier to infer various attributes for each user.

Another work, from Bhagat et al. [32], proposes an active learning-based attack that infers users' attributes via interactive questions. In particular, their approach finds a set of movies and asks users to rate them, where each selection maximizes the attacker's confidence in inferring the users' attributes. The work of Reference [41] seeks to infer users' attributes based on the different types of music they like. This approach first extracts a user's interests and finds the semantic similarity among them: it uses an ontologized version of Wikipedia related to each type of music, exploits topic modeling techniques (i.e., Latent Dirichlet Allocation, LDA [34]), and learns semantic interest topics for each user. Then, a user is predicted to have attributes similar to those of users who like the same types of music. In another work, from Luo et al. [117], the authors infer the household structures of Internet Protocol Television (IPTV) users based on their watching behavior. Their approach first extracts related features from log data, including TV program topics and viewing behavior, using LDA and a low-rank model, respectively. Then, it combines graph-based semi-supervised learning with non-parametric regression and uses it to learn a classifier for inferring the household structure.

5.3 Friend and Behavior–based Profile Attribute Inference

Another category of approaches exploits both social link and user behavior information for inferring users' attributes. Gong et al. [63, 64] first build a social-behavior-attribute network (SBA) in which social structures, user behaviors, and user attributes are integrated into a unified framework. Nodes of this graph are users, behaviors, or attributes, and edges represent the relationships among them. Then, they infer a target user's attributes through a vote distribution attack (VIAL) model. VIAL performs a customized random walk from the target user to all other users in the augmented SBA network and assigns probabilities to users such that a user receives a higher probability if it is structurally more similar to the target node in the SBA network. The stationary probabilities of attribute nodes are then used to infer the attributes of the target user, i.e., the attribute with the maximum probability is assigned to the target user. Unlike most existing approaches, which only use the information of users who have an attribute, a recent work from Ji et al. [88] also incorporates information from users who do not have the attribute into the training process, i.e., negative training samples. This work associates a binary random variable with each user characterizing whether the user has an attribute or not. It then learns the prior probability of each user having a specified attribute by incorporating the user's behavior information. Next, it models the joint probability of users as a pairwise Markov Random Field according to their social relationships and uses this model to infer the posterior probability of the attributes of each target user.

5.4 Exploiting Other Sources of Information for Profile Attribute Inference

These approaches leverage sources of information other than social structures and behaviors, such as writing style [144], posted tweets [9], liked pages [68], purchasing behavior [178], and checked-in locations [195]. A recent work combines identity and attribute disclosure across multiple social network platforms [16]. It defines the concept of $(\theta ,k)$-matching anonymity as a measure of identity disclosure risk. Given a user and her identity in a source social network, the matching anonymity set is defined as the set of identities in the target social network with a matching probability of more than $\theta$; the user is $(\theta ,k)$-anonymous if the size of this matching set is $k$. Another work, by Backes et al. [17], introduces a relative linkability measure that ranks identities within a social media site. In particular, it incorporates the idea of $k$-anonymity to define $(k,d)$-anonymity for each user $u$ in social media, which captures the largest subset of $k$ identities (including $u$) who are within a similarity (or dissimilarity) threshold $d$ from $u$ considering their attributes. A recent work from Liu et al. [113] also studies the vulnerability of the differential privacy mechanism to the inference attack problem. As stated earlier, differential privacy provides protection against an adversary who knows the entire dataset except one entry; however, it assumes independence between dataset entries. Liu et al. introduce a new inference attack in which the probabilistic dependence between dataset entries is calculated and then leveraged to infer a user's location information from differentially private queries.

Different from the works focusing on profile attribute inference, a recent work, Reference [11], brings evasion and poisoning attacks into this problem. It introduces five variants of evasion and poisoning attacks that interfere with the results of profile attribute inference:

  • Good/Bad Feature Attack (Evasion): The adversary adds good features from one class to another while removing bad features from each class to introduce false signals for the predictor.
  • Mimicry Attack (Evasion): The adversary samples a set of users from one class and then finds the most similar users in the other class. Good (bad) features are added (removed) for users in the found subsets.
  • Class Altering Attack (Poisoning): The adversary randomly chooses users from one class and then flips their class labels, resulting in a higher misclassification rate.
  • Feature Altering Attack (Poisoning): The goal is to increase the misclassification rate. The adversary poisons the training data by randomly adding good feature values of one class to another class.
  • Fake Users Addition Attack (Poisoning): The attacker poisons the data by removing a set of real users and then injecting fake users into the training dataset.

Table 7 summarizes the existing works based on the technique used and the type of information leveraged for attribute inference attacks. The utilized techniques can be categorized into different groups: community and clustering based, random walk based, graphical model based, iterative based, active learning based, semi-supervised based, and traditional supervised methods.

6 SOCIAL MEDIA USERS LOCATION AND PRIVACY

The location disclosure attack is a specific version of the attribute inference attack in which the adversary focuses on inferring the geo-location information of a given user. The attack takes some geolocated data as input and produces additional knowledge about target users. More precisely, the objective of this attack may be to (1) predict the movement patterns of an individual, (2) learn the semantics of the target user's mobility behavior, (3) link records of the same individual, and (4) identify points of interest [60]. Existing works incorporate the known geo-location information of a given user's friends [20, 47, 90, 91, 94, 125, 126, 160]. The work of Reference [20] introduces a probabilistic model representing the likelihood of the target user's location based on her friends' locations and the geographic distances between them. Reference [94] and Reference [126] extend Backstrom et al.'s work [20] and find the target user's friends that are strong predictors of her location.

In another work, McGee et al. [125] integrate social tie strength information to capture the uncertainty across multiple location granularities. The reason is that not all relationships in social media are the same, and the locations of friends with strong ties are more revealing of a user's location. Rout et al. [160] deploy an SVM classifier on a given set of features to predict the target user's location. These features include the cities of the target user's friends, the number of friends in the same city as the target user, and the number of reciprocal relationships the target user has per city. Jurgens et al. [90] infer locations by proposing an iterative multi-pass label propagation approach. This approach calculates each target user's location as the geometric median of her friends' locations and seeks to overcome the sparsity problem when the ground truth data is sparse. The work of Reference [47] extends Reference [90] and limits the propagation of noisy locations by weighting different locations using information such as the number of times the users have interacted.
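
For concreteness, the geometric median used by Jurgens et al. [90] can be approximated with Weiszfeld's algorithm, as in the sketch below; treating latitude/longitude as planar coordinates is a simplification, since the original operates on geodesic distances.

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-9):
    """Weiszfeld's algorithm: the point minimizing the sum of
    distances to the given points (cf. the propagation in [90])."""
    pts = np.asarray(points, dtype=float)
    x = pts.mean(axis=0)                  # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(pts - x, axis=1)
        w = 1.0 / np.maximum(d, eps)      # guard against division by zero
        x = (pts * w[:, None]).sum(axis=0) / w.sum()
    return x

friend_locations = [(33.42, -111.93), (33.45, -112.07), (40.71, -74.00)]
print(geometric_median(friend_locations))  # pulled toward the Arizona pair
```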

Another work, from Cheng et al. [44], proposes a probabilistic framework that infers Twitter users' city-level locations based on the content of their tweets. The idea is that users' tweets include either implicit or explicit location-specific content, e.g., place names, or words and phrases associated with certain locations (e.g., "howdy" for Texas). It uses a lattice-based neighborhood smoothing technique to even out the word probabilities and overcome the tweet sparsity challenge. Hecht et al. [72] found that 34% of Twitter users do not provide their real location information, instead sharing fake locations or sarcastic comments to fool location inference approaches; they show that a user's location can nevertheless be inferred with machine learning techniques from the implicit user behavior reflected in her tweets. In another work, Ryoo et al. [161] refine Cheng et al.'s city-level location inference approach [44] to 500-m distance bins. Having GPS-tagged tweets for a set of users, their approach builds geographic distributions of words and computes a user's location as a weighted center of mass of the user's words. It then uses a probabilistic model and computes the foci and dispersions by binning the distances between GPS coordinates and each word's center at 500 m for computational scalability.

Li et al. [109] introduce a unified discriminative influence model that considers both users' social networks and user-centric data (e.g., tweets) to address the scarce and noisy data challenge in location inference. It first combines the social network and user data in a probabilistic framework, viewed as a heterogeneous graph with users and tweets as nodes and social and tweeting relations as edges. Every node in this graph is then associated with a location, and the proposed probabilistic influence model measures how likely an edge is to be generated between two nodes given their locations. Another similar work, from Li et al. [108], exploits a user's tweets and social relations to build a complete location profile that infers a set of multiple long-term geographic location scopes related to her, including not only her home location but also other related ones, e.g., her workplace. Their approach captures the locations of the user's friends as well.

Srivatsa et al. [168] propose a de-anonymization attack that exploits users' friendship information in social media to de-anonymize their mobility traces. The idea behind this approach is that people tend to meet those with whom they have relationships, and thus they can be identified by their social relationships. This approach models mobility traces as contact graphs and identifies a set of seed users in both graphs, i.e., the contact graph and the social network friendship graph. In the second step, it propagates the mapping from the seed users to the remaining users in the graphs. This approach uses Distance Vector, Randomized Spanning Trees, and Recursive Subgraph Matching heuristics to measure the mapping strength and propagate the measured strength through the network.

Another work, from Ji et al. [85], improves the work of Srivatsa et al. [168] in terms of accuracy and computational complexity. This work focuses on mapping anonymized users' mobility traces to social media accounts. In addition to users' local features, their approach incorporates users' global characteristics as well. Ji et al. define three similarity metrics, structural similarity, relative distance similarity, and inheritance similarity, which are then combined into a unified similarity. Structural similarity considers features such as degree centrality, closeness centrality, and betweenness centrality, while relative distance similarity captures the distance between users and seed users. Inheritance similarity considers the number of common neighbors that have already been mapped as well as the degree similarity between users in the mobility trace and social network graphs. Next, Ji et al. [85] propose an adaptive de-anonymization framework that starts de-anonymizing from a core matching set consisting of a number of mapped users and their $k$-hop mapping spanning set.

In another work, Reference [123], the locations of Twitter users are inferred at different granularities (e.g., city, state, time zone, geographical region) based on their tweeting behavior (frequency of tweets per time unit) and the content of their tweets. This approach exploits external location knowledge (e.g., a dictionary containing names of cities and states, and location-based services such as Foursquare) and finds explicit references to locations in tweets. All features are then fed into a dynamically weighted ensemble of statistical and heuristic classifiers.

Another work, from Wang et al. [177], links users' identities across multiple services/social media platforms (even of different types) according to the spatial-temporal locality of their activities, i.e., users' mobility traces. This work also assumes that individuals can have multiple IDs/accounts. The motivation behind their algorithm is that IDs corresponding to the same person are online at the same time in the same location, and users' daily movements are predictable with repeated patterns. Wang et al. model user information as a contact graph where nodes are IDs (regardless of the service) and an edge connects IDs that have visited the same location, with the edge weight denoting the number of co-locations of the two nodes. Then, a Bayesian matching algorithm is proposed to find the most probable matching candidates for a given target ID, and a Bayesian inference method is used to generate confidence scores for ranking the candidates.

The work of Reference [91] compares different location inference attacks in social networks. There are also other surveys discussing location inference techniques specifically in Twitter [7, 194], to which the reader can refer. Note that a large portion of research is dedicated to inference attacks on geolocated data, which is out of the scope of this survey [60, 112, 167]. A thorough survey is also available on geolocation data privacy, to which interested readers can refer [112]; the scope of that survey differs from ours in that we cover the location privacy issues of users based on their activities in social media.

In conclusion, location inference attacks use three types of information: (1) a user's network information, (2) a user's contextual information, and (3) both a user's network and contextual information. A summary of the existing works is presented in Table 8 based on the type of leveraged information and the technique used.

7 RECOMMENDATION SYSTEMS AND PRIVACY

Recommendation systems help individuals find information matching their interests by building user-interest profiles and recommending items to users based on those profiles. These profiles can be extracted from users' interactions as they express their preferences and interests, e.g., clicks, likes/dislikes, ratings, purchases, and so on [25]. While user profiles help recommender systems improve the quality of the services a user receives (a.k.a. utility), they also raise privacy concerns by reflecting the preferences of users [155]. Many works have studied the relationship between privacy and utility and have proposed solutions to handle the tradeoff. In general, these works focus on obfuscating users' interactions to hide their actual intentions and prevent accurate profiling [153, 157]. Following this strategy, users do not need to trust any third parties or external entities to preserve their privacy. Existing approaches use different techniques and mechanisms and can be categorized mainly into three groups: cryptographic techniques [6, 21, 40, 74, 172], differential privacy-based approaches [66, 76, 89, 120, 128, 130, 166, 197, 198], and perturbation-based techniques [75, 118, 146, 147, 148, 151, 153, 158, 183].

A group of works focuses on providing cryptographic solutions to the problem of secure recommender systems. These approaches do not let a single trusted party have access to everyone's data [6, 21, 40, 74, 172]. Instead, users' ratings are stored as encrypted vectors, and aggregates of the data are provided in the public domain. These approaches do not prevent privacy leaks through the output of recommendation systems (i.e., the recommendations themselves). Such techniques are not within the scope of this survey; interested readers can refer to the mentioned papers for more details.

7.1 Differential Privacy-based Solutions

Works in this group utilize a differential privacy strategy either to anonymize user data before sending it to the recommendation system or to perturb the recommendation outputs. McSherry et al. [128] were the first to modify leading recommendation algorithms (i.e., SVD and $k$-nearest neighbor) so that drawing inferences about the original ratings is difficult. They utilize differential privacy to construct private covariance matrices and thereby make the collaborative filtering algorithms that use them private without a significant loss in accuracy.
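
A minimal sketch of this idea, releasing an item-item covariance matrix with Laplace noise so that downstream collaborative filtering never touches the raw ratings, is shown below; the sensitivity constant is a placeholder, whereas [128] carefully bounds each user's contribution.

```python
import numpy as np

def private_item_covariance(R, epsilon=1.0, sensitivity=1.0, seed=0):
    """Item-item covariance with Laplace noise (in the spirit of [128]).

    R: user-by-item rating matrix. The sensitivity value is a
    placeholder; [128] derives it by bounding per-user contributions.
    """
    rng = np.random.default_rng(seed)
    C = np.cov(R, rowvar=False)                  # item-item covariance
    noise = rng.laplace(0.0, sensitivity / epsilon, C.shape)
    noise = (noise + noise.T) / 2                # keep the matrix symmetric
    return C + noise

R = np.array([[5, 3, 0], [4, 0, 1], [1, 5, 4]], dtype=float)
print(private_item_covariance(R, epsilon=0.5))
```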

In another work, Calandrino et al. [39] propose a new passive attack on recommender systems that infers a target user's transactions (i.e., item ratings). Their attack first monitors changes in the public outputs of a recommender system over a period of time; public outputs may include related-item lists or an item-item covariance matrix. It then combines this information with a moderate amount of auxiliary information about the target user's transactions to infer many of the target user's unknown transactions. Calandrino et al. further introduce an active inference attack on $k$-NN recommender systems. In this attack, $k$ sybil user accounts are created such that the $k$ nearest neighbors of each sybil consist of $k-1$ other sybils and the target user. The attack can then infer the target user's transaction history from the items recommended to any of the sybils. The results confirm the existence of privacy risks in the public outputs of recommender systems. The work of McSherry et al. [128] is not effective in protecting users against this attack, as it does not consider updates to the covariance matrices and cannot provide a privacy guarantee in dynamic settings. Machanavajjhala et al. [120] quantify the accuracy-privacy tradeoff: they prove lower bounds on the minimum loss in accuracy for recommendation systems that utilize differential privacy. Moreover, they adapt two differentially private algorithms, Laplace [53] and Exponential [129], to prevent the disclosure of users' private attributes.

Previous works [120, 128] are vulnerable to the $k$-nearest neighbor attack, as they fail to hide similar neighbors [39]. Zhu et al. [197] propose a private neighborhood-based collaborative filtering method that protects the information of both neighbors and user ratings. The proposed work assumes that the recommender system is trusted and introduces two operations: private neighbor selection and recommendation-aware sensitivity. The first operation protects neighbors' identities by privately selecting $k$ neighbors from a list of candidates, adopting the exponential mechanism [129] to assign a probability to each candidate. The second operation enhances utility by reducing the magnitude of the added noise: after the $k$ neighbors are selected, the neighbor similarities are perturbed with Laplace noise to mask the ratings given by any particular neighbor. Finally, neighborhood-based collaborative filtering is performed on the private data. In another work, Jorgensen et al. [89] assume that all users' item-rating attributes are sensitive but, different from Machanavajjhala et al. [120], that users' social relations are non-sensitive. They propose a differentially private recommendation approach that incorporates social relations besides user-item ratings. To address the utility loss, this work first clusters users according to their social relations; then, noisy averages of the user-item preferences are computed for each cluster using the differential privacy mechanism.
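
The private neighbor selection step can be sketched with the exponential mechanism [129] as below; splitting the privacy budget evenly across the $k$ draws is a simplification of the analysis in [197].

```python
import numpy as np

def private_neighbor_selection(similarities, k, epsilon, sensitivity=1.0, seed=0):
    """Select k neighbors via the exponential mechanism [129], as a
    simplified stand-in for the private neighbor selection of [197]:
    the privacy budget epsilon is split evenly over the k draws."""
    rng = np.random.default_rng(seed)
    candidates = list(similarities)
    chosen = []
    for _ in range(k):
        scores = np.array([similarities[c] for c in candidates])
        logits = (epsilon / (2 * k * sensitivity)) * scores
        probs = np.exp(logits - logits.max())    # numerically stable softmax
        probs /= probs.sum()
        pick = rng.choice(len(candidates), p=probs)
        chosen.append(candidates.pop(pick))      # sample without replacement
    return chosen

sims = {"u2": 0.9, "u3": 0.8, "u4": 0.2, "u5": 0.1}
print(private_neighbor_selection(sims, k=2, epsilon=1.0))
```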

Shen et al. [166] assume that the recommender system is untrusted. They propose a user perturbation framework that anonymizes user data under a novel differential privacy mechanism, the relaxed admissible mechanism; the users' perturbed data is then used for recommendation. They provide mathematical bounds on the privacy and utility of the anonymized data. Hua et al. [76] also propose a differentially private matrix factorization-based recommender system. In particular, they solve this problem for two scenarios: a trusted and an untrusted recommender. In the first scenario, user and item profile vectors are learned via regular and private versions of matrix factorization, respectively, where the private version adds noise to the item vectors to make them differentially private. In the second scenario, item profile vectors are first learned with a differentially private matrix factorization, and a user's differentially private profile vector is then derived from the private item profiles. A novel and strong form of differential privacy, namely distance-based differential privacy, is introduced by Guerraoui et al. [66]. Distance-based differential privacy ensures privacy for all the items rated by a user as well as those within a distance $\lambda$ of them. The distance parameter $\lambda$ controls the level of privacy and aids in tuning the recommendation privacy-utility tradeoff. The proposed protocol first finds a group of similar items for each given item; then, it creates a manipulated user profile that preserves $(\epsilon , \lambda)$-differential privacy by selecting an item and replacing it with another one.

Another differential privacy-based recommendation work by Zhu et al. [198] proposes two approaches to the privacy problem in recommendation systems: item-based and user-based recommendation algorithms. In the item-based algorithm, the exponential mechanism [129] is applied to the selection of related items to guarantee differential privacy, and the resulting differentially private item list is used to find recommendations for a given user; an analogous private procedure is followed in the user-based algorithm. Another work differentiates sensitive and non-sensitive ratings to further improve the quality of recommendation systems in the long run [130]. Meng et al. [130] propose a personalized privacy-preserving recommender system: given sets of sensitive and non-sensitive ratings for each user, their approach utilizes differential privacy [52] to perturb users' ratings, with smaller and larger privacy budgets for sensitive and non-sensitive ratings, respectively. This protects users' privacy while retaining recommendation effectiveness.

7.2 Perturbation-based Solutions

Perturbation-based techniques usually obfuscate users' item ratings by adding random noise to the user data. Rebollo et al. [158] propose an approach that first measures a user's privacy risk as the KL divergence [48] between the user's apparent profile and the average profile of the population. The idea is that the more a user's profile diverges from the general population's, the more information an attacker can learn about her. The approach then seeks the obfuscation rate for generating forged user profiles that minimizes this privacy risk; a closed-form solution is provided for perturbing users' interactions to optimize the privacy risk function.
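
This privacy-risk measure is straightforward to compute; the sketch below evaluates the KL divergence between a user's normalized tag histogram and the population average, using invented toy counts.

```python
import numpy as np

def kl_privacy_risk(user_profile, population_profile, eps=1e-12):
    """D_KL(user || population): how far the user's apparent profile
    deviates from the average population profile (cf. [158])."""
    p = np.asarray(user_profile, dtype=float); p /= p.sum()
    q = np.asarray(population_profile, dtype=float); q /= q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

user = [10, 0, 1, 1]          # tag counts per interest category
population = [5, 5, 5, 5]
print(kl_privacy_risk(user, population))   # larger -> more identifiable
```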

Puglisi et al. [153] extend Rebollo et al.'s work [158] to investigate the impact of this technique on content-based recommendation. This work measures a user's privacy risk similarly to Reference [158], while utility is measured by the prediction accuracy of the recommender system. It evaluates three different strategies, namely optimized tag forgery [157], uniform tag forgery, and TrackMeNot (TMN) [75]. The uniform tag forgery method assigns forged tags according to a uniform distribution across all categories of the user profile; TMN constructs eleven categories from the Open Directory Project (ODP) classification scheme and selects tags uniformly from this set. According to this work, users' profiles tend toward the population distribution when larger obfuscation rates are used, which results in less privacy risk but a lower utility rate. Moreover, the authors found that for a small forgery rate, it is possible to obtain an increase in privacy at the cost of only a small degradation in utility.

Polat et al. [151] use a randomized perturbation technique [5] to obfuscate user-generated data. Each user generates a disguised z-score for every item she has rated; the z-score for each user-item pair is based on the original item rating, the user's average rating, and the total number of items she has rated. The proposed approach then passes the perturbed private data to the collaborative filtering-based recommender system. Another work, Reference [146], obfuscates user rating information and then passes the disguised information to the collaborative filtering system for recommendation. The proposed Nearest Neighbor Data Substitution (NeNDS) obfuscation method substitutes a user's data elements with those of one of her neighbors in the metric space [145]. However, one drawback of NeNDS is that the perturbed values may remain close enough to the original values to leave the data vulnerable. A hybrid version of NeNDS is therefore proposed that provides stronger privacy by geometrically transforming the data before passing it to NeNDS.
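
A minimal sketch of the disguised z-score computation follows; the uniform noise here stands in for the randomization scheme of [5], and the noise range is an assumed parameter.

```python
import numpy as np

def disguised_zscores(ratings, noise_range=1.0, seed=0):
    """Per-user disguised z-scores (in the spirit of [151]).

    ratings: dict item -> rating for one user. Each z-score is
    perturbed with uniform noise before being sent to the server.
    """
    rng = np.random.default_rng(seed)
    vals = np.array(list(ratings.values()), dtype=float)
    mu, sigma = vals.mean(), vals.std() or 1.0   # fall back if std is zero
    return {item: (r - mu) / sigma + rng.uniform(-noise_range, noise_range)
            for item, r in ratings.items()}

print(disguised_zscores({"movie_a": 5, "movie_b": 3, "movie_c": 1}))
```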

In contrast to McSherry et al. [128], Xin et al. assume that the recommender is not trusted and that the onus is on the users to protect their privacy [183]. Their approach separates the computations that can be done locally and privately by the users from those that must be done by the recommender system. In particular, item features are learned by the system, while user features are obtained locally by the users and further used for recommendation. Their approach also divides users into two groups: users who publicly share their information and those who keep their preferences private. It uses the information of users in the first group to estimate item features. Xin et al. show theoretically and empirically that the public information of a moderate number of users with many ratings is enough for an accurate estimation. Moreover, they propose a new privacy mechanism that privately releases the second-order information needed for estimating item features; this information is extracted from the users who keep their preferences private. The main assumption behind this work is not realistic, though, as in a real-world scenario it is not easy to collect the ratings of a moderate number of people with many ratings.

Luo et al. [118] propose a perturbation-based group recommendation method that assumes similar users are grouped together and are unwilling to expose their preferences to anybody other than the group members. Items are then recommended to the users within the same group. In the first step, users exchange their rating data with users in the same group given a secret key, which varies across users; the output of this step is a fake preference vector for each user. The rating values are then obfuscated in the second step by a chaos-based scrambling method. Similarly to Polat et al. [151], randomness is added to the output of the previous step to make sure no sensitive information remains in the published data. This information is then sent to the recommender system, which iteratively extracts information about the aggregated ratings of the users; the extracted information is used to estimate a group preference vector for collaborative filtering-based recommendation.

Parra-Arnau et al. [148] propose a privacy-enhancing technology framework, PET, which perturbs users' preference information by combining two techniques, namely the forgery and the suppression of ratings. In this scenario, users may avoid rating items they like and instead rate items that do not reflect their actual preferences; therefore, the apparent profiles of users differ from their actual profiles. Similarly to Reference [158], the privacy risk of each user is measured as the KL divergence [48] between the user's apparent profile and the average population distribution, while utility is controlled via the forgery and suppression rates. The tradeoff among privacy, forgery rate, and suppression rate is then modeled as an optimization problem that infers which of each user's ratings should be forged and which should be suppressed to achieve the minimum privacy risk while keeping the utility of the data as high as possible. Similarly, Parra-Arnau et al. [147] propose a system that perturbs a user's rating profile according to her privacy preferences. The system has two components: (1) a profile-density model that makes the user's profile more similar to the crowd's and (2) a classification model that prevents the user from being identified as a member of a given group of users. The proposed model optimizes the tradeoff between privacy and utility and decides whether each service provider can have access to the user's profile or not.

Recently, Biega et al. [33] proposed a framework that scrambles users' rating histories to preserve both their privacy and utility. The main assumption of this work is that recommender systems do not need complete and accurate user profiles. Therefore, it splits users' profiles (i.e., pairs of user-item interactions) across Mediator Accounts (MAs) in a way that keeps coherent pieces of different users' profiles intact within the MAs. The recommender then deals with the MAs rather than real user profiles. This preserves users' privacy by scrambling the user data across various proxy accounts while keeping the user utility as high as possible. Another work, from Guerraoui et al. [67], introduces metrics for measuring the utility and privacy effects of a user's behavior, such as clicks and likes/dislikes, and shows that there is not always a tradeoff between utility and privacy. This work also proposes a click-advisor platform that warns users about the privacy and utility status of their clicks. Here, utility is defined as the difference in the commonality of a user profile before and after a click, and the privacy risk of a click is defined as the difference in the disclosure degree before and after the click.

Last, we summarize the reviewed state-of-the-art works in Table 9, whose columns show the properties of the proposed models. These works protect user-item data (1) before sharing, (2) while processing the data, or (3) after recommending items to the user.

8 SUMMARY AND FUTURE RESEARCH DIRECTIONS

The explosive growth of the Web has not only drastically changed the way people conduct activities and acquire information but has also raised security [8, 14, 15] and privacy [26, 139] issues for them. Users increasingly share their personal information on social media platforms. These platforms publish and share user-generated data with third parties, which risks exposing individuals' privacy. There are two general types of attacks: identity disclosure and attribute disclosure. Sanitizing user-generated social media data is more challenging than sanitizing structured data, as it is heterogeneous, highly unstructured, noisy, and inherently different from relational and tabular data. In this survey, we review the recent developments in the field of social media data privacy. We first review traditional privacy models for structured data. Then, we review, categorize, and compare existing methods in terms of privacy models, privacy leakage attacks, and anonymization algorithms. We also review the privacy risks that exist in different aspects of social media, such as users' graph information, profile attributes, textual information, and preferences. We categorize relevant works into five groups: (1) social graphs and privacy, (2) authors in social media and privacy, (3) profile attributes and privacy, (4) location and privacy, and (5) recommendation systems and privacy. For each category, we discuss existing attacks and solutions (if any were proposed) and classify them based on the type of data and the technique used. We outline the privacy attacks/solutions in Figure 1; Figure 2 also depicts the relevant privacy issues w.r.t. the type of social media data.

Fig. 1. An outline of the reviewed privacy attacks and solutions in social media.

Detecting privacy issues and proposing techniques to protect users' privacy in social media is challenging. Most of the existing works focus on introducing new attacks, and thus the gap between protection and detection grows larger. Although a large body of work has emerged in recent years investigating privacy issues in social media data, the development of the tasks in each category is highly imbalanced: some are well studied, whereas others need further investigation. We highlight these tasks in red in Figure 1 and Figure 2, organized by privacy issue and by user-generated data type, respectively. Below, we present some potential research directions:

  • Protecting privacy of textual information: Textual information is noisy, high-dimensional, and unstructured. It is rich in content and can reveal much sensitive information that the user does not explicitly expose, such as demographic information and location. This makes textual data a very important source of information for adversaries, exploitable in many attacks. We thus need more research on anonymizing users' textual information to preserve users' privacy against attacks such as author identification and profile attribute disclosure.
  • Protecting privacy of profile attribute information: We reviewed many works that introduce privacy risks w.r.t. profile attributes. To the best of our knowledge, there is no work introducing defense mechanisms against these attacks. One research direction is a privacy-preserving tool that warns users about their activities and the possibility of privacy leakage. Another direction is a privacy protection technique that anonymizes users' data before publishing to protect them against private attribute leakage.
  • Privacy of spatiotemporal social media data: Social media platforms support space-time-indexed data, and users have created a large volume of time-stamped, geo-located data. Such spatiotemporal data has immense value for better understanding user behavior. In this survey, we review the state-of-the-art re-identification attacks that incorporate this data to breach users' privacy. This information may be used to infer users' locations as well as their preferences and interests in the case of recommendation systems. One future research direction is to investigate the role of temporal information in the privacy of online users; more research should also be done on anonymization frameworks that protect users' temporal information.
  • Privacy of heterogeneous social media data: User-generated social media data is heterogeneous and consists of different aspects. Existing anonymization techniques assume that it is enough to anonymize each aspect of heterogeneous social media data independently. Beigi et al. [28] show that this assumption does not hold in practice due to the hidden relations between different aspects of the heterogeneous data. One potential research direction is to examine how different combinations of heterogeneous data (e.g., a combination of location and textual information) are vulnerable to de-anonymization attacks. Another is to improve anonymization techniques by considering the hidden relations between different aspects of the data.

ACKNOWLEDGMENTS

The authors thank Alexander Nou for his help throughout the article.

  • Jemal H. Abawajy, Mohd Izuan Hafez Ninggal, and Tutut Herawan. 2016. Privacy preserving social network data publication. IEEE Commun. Surv. Tutor. 18, 3 (2016), 1974–1997.
  • Ahmed Abbasi and Hsinchun Chen. 2008. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inf. Syst. 26, 2 (2008), 7.
  • Sadia Afroz, Michael Brennan, and Rachel Greenstadt. 2012. Detecting hoaxes, frauds, and deception in writing style online. In Proceedings of the 2012 IEEE Symposium on Security and Privacy (SP’12). IEEE, 461–475.
  • Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, and An Zhu. 2005. Approximation algorithms for k-anonymity. In Proceedings of the International Conference on Database Theory (ICDT'05).
  • Rakesh Agrawal and Ramakrishnan Srikant. 2000. Privacy-preserving data mining. In ACM SIGMOD Record, Vol. 29.
  • Esma Aimeur, Gilles Brassard, Jose M. Fernandez, Flavien Serge Mani Onana, and Zbigniew Rakowski. 2008. Experimental demonstration of a hybrid privacy-preserving recommender system. In Availability, Reliability and Security.
  • Oluwaseun Ajao, Jun Hong, and Weiru Liu. 2015. A survey of location inference techniques on Twitter. J. Inf. Sci. 41, 6 (2015), 855–864.
  • Muhammad Al-Qurishi, Mabrook Al-Rakhami, Atif Alamri, Majed Alrubaian, Sk Md Mizanur Rahman, and M Shamim Hossain. 2017. Sybil defense techniques in online social networks: A survey. IEEE Access 5 (2017), 1200–1219.
  • Faiyaz Al Zamal, Wendy Liu, and Derek Ruths. 2012. Homophily and latent attribute inference: Inferring latent attributes of twitter users from neighbors. In Sixth International AAAI Conference on Weblogs and Social Media (ICWSM'12) .
  • Mishari Almishari and Gene Tsudik. 2012. Exploring linkability of user reviews. In Proceedings of the European Symposium on Research in Computer Security . Springer, 307–324.
  • Yasmeen Alufaisan, Yan Zhou, Murat Kantarcioglu, and Bhavani Thuraisingham. 2017. Hacking social network data mining. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI’17) . IEEE, 54–59.
  • Hamidreza Alvari, Alireza Hajibagheri, Gita Sukthankar, and Kiran Lakkaraju. 2016. Identifying community structures in dynamic networks. Soc. Netw. Anal. Min. 6, 1 (2016), 77.
  • Hamidreza Alvari, Kiran Lakkaraju, Gita Sukthankar, and Jon Whetzel. 2014. Predicting guild membership in massively multiplayer online games. In Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction . Springer, 215–222.
  • Hamidreza Alvari, Elham Shaabani, and Paulo Shakarian. 2018. Early identification of pathogenic social media accounts. In Proceedings of the IEEE Intelligence and Security Informatics (ISI’18) . IEEE.
  • Hamidreza Alvari, Paulo Shakarian, and J. E. Kelly Snyder. 2017. Semi-supervised learning for detecting human trafficking. Secur. Inf. 6, 1 (2017), 1.
  • Athanasios Andreou, Oana Goga, and Patrick Loiseau. 2017. Identity vs. attribute disclosure risks for users with multiple social profiles. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’17) . ACM, 163–170.
  • Michael Backes, Pascal Berrang, Oana Goga, Krishna P. Gummadi, and Praveen Manoharan. 2016. On profile linkability despite anonymity in social media systems. In Proceedings of the ACM on Workshop on Privacy in the Electronic Society .
  • Michael Backes, Mathias Humbert, Jun Pang, and Yang Zhang. 2017. walk2friends: Inferring social links from mobility profiles. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security .
  • Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. 2007. Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th International Conference on the World Wide Web (WWW’07) .
  • Lars Backstrom, Eric Sun, and Cameron Marlow. 2010. Find me if you can: Improving geographical prediction with social and spatial proximity. In Proceedings of the 19th International Conference on the World Wide Web (WWW’10) .
  • Shahriar Badsha, Xun Yi, Ibrahim Khalil, and Elisa Bertino. 2017. Privacy preserving user-based recommender system. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS’17) . IEEE, 1074–1083.
  • Ghazaleh Beigi. 2018. Social media and user privacy. arXiv preprint arXiv:1806.09786 (2018).
  • Ghazaleh Beigi, Ruocheng Guo, Alexander Nou, Yanchao Zhang, and Huan Liu. 2019. Protecting user privacy: An approach for untraceable web browsing history and unambiguous user profiles. In Proceedings of the 12th ACM International Conference on Web Search and Data Mining . ACM, 213–221.
  • Ghazaleh Beigi, Mahdi Jalili, Hamidreza Alvari, and Gita Sukthankar. 2014. Leveraging community detection for accurate trust prediction. In Proceedings of the ASE International Conference on Social Computing .
  • Ghazaleh Beigi and Huan Liu. 2018. Similar but different: Exploiting users’ congruity for recommendation systems. In Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction . Springer.
  • Ghazaleh Beigi and Huan Liu. 2019. Identifying novel privacy issues of online users on social media platforms. ACM SIGWEB Newslett. Article 4 (Winter 2019), 7 pages. http://doi.acm.org/10.1145/3293874.3293878
  • Ghazaleh Beigi, Suhas Ranganath, and Huan Liu. 2019. Signed link prediction with sparse data: The role of personality information. In Companion Proceedings of the Web Conference 2019 . International World Wide Web Conferences Steering Committee.
  • Ghazaleh Beigi, Kai Shu, Yanchao Zhang, and Huan Liu. 2018. Securing social media user data: An adversarial approach. In Proceedings of the 29th Conference on Hypertext and Social Media . ACM, 165–173.
  • Ghazaleh Beigi, Jiliang Tang, and Huan Liu. 2016. Signed link analysis in social media networks. In Proceedings of the 10th International Conference on Web and Social Media (ICWSM’16) . AAAI Press.
  • Ghazaleh Beigi, Jiliang Tang, Suhang Wang, and Huan Liu. 2016. Exploiting emotional information for trust/distrust prediction. In Proceedings of the 2016 SIAM International Conference on Data Mining . SIAM, 81–89.
  • Smriti Bhagat, Graham Cormode, Balachander Krishnamurthy, and Divesh Srivastava. 2009. Class-based graph anonymization for social network data. Proc. VLDB Endow. 2, 1 (2009), 766–777.
  • Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis, and Nina Taft. 2014. Recommending with an agenda: Active learning of private attributes using matrix factorization. In Proceedings of the Recommender Systems Conference (RecSys’14) . ACM.
  • Asia J. Biega, Rishiraj Saha Roy, and Gerhard Weikum. 2017. Privacy through solidarity: A user-utility-preserving framework to counter profiling. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval . ACM, 665–674.
  • David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003), 993–1022.
  • Joseph Bonneau, Jonathan Anderson, and George Danezis. 2009. Prying data out of a social network. In Proceedings of the International Conference on Advances in Social Network Analysis and Mining 2009 (ASONAM’09) . IEEE, 249–254.
  • Jasmine Bowers, Henry Williams, Gerry Dozier, and R. Williams. 2015. Mitigating deanonymization attacks via language translation for anonymous social networks. In Proceedings of the International Conference on Machine Learning (ICML’15).
  • Joseph K. Bradley, Patrick Gage Kelley, and Aaron Roth. [n.d.]. Author identification from citations. ([n. d.]).
  • Karl Bringmann, Tobias Friedrich, and Anton Krohmer. 2014. De-anonymization of heterogeneous random graphs in quasilinear time. In Proceedings of the European Symposium on Algorithms . Springer, 197–208.
  • Joseph A. Calandrino, Ann Kilzer, Arvind Narayanan, Edward W. Felten, and Vitaly Shmatikov. 2011. "You might also like:" Privacy risks of collaborative filtering. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP’11). IEEE.
  • John Canny. 2002. Collaborative filtering with privacy via factor analysis. In Proceedings of the SIGIR Conference on Research and Development in Information Retrieval . ACM, 238–245.
  • Abdelberi Chaabane, Gergely Acs, Mohamed Ali Kaafar, et al. 2012. You are what you like! information leakage through users’ interests. In Proceedings of the 19th Annual Network & Distributed System Security Symposium (NDSS) .
  • Carole E. Chaski. 2005. Who is at the keyboard? Authorship attribution in digital evidence investigations. Int. J. Digit. Evid. 4, 1 (2005), 1–13.
  • James Cheng, Ada Wai-chee Fu, and Jia Liu. 2010. K-isomorphism: Privacy preserving network publication against structural attacks. In Proceedings of the ACM SIGMOD International Conference on Management of Data .
  • Zhiyuan Cheng, James Caverlee, and Kyumin Lee. 2010. You are where you tweet: A content-based approach to geo-locating twitter users. In Proceedings of the Conference on Information and Knowledge Management (CIKM’10) . ACM, 759–768.
  • Carla-Fabiana Chiasserini, Michele Garetto, and Emilio Leonardi. 2016. Social network de-anonymization under scale-free user relations. IEEE/ACM Trans. Netw. 24, 6 (2016), 3756–3769.
  • Carla-Fabiana Chiasserini, Michele Garetto, and Emilio Leonardi. 2018. De-anonymizing clustered social networks by percolation graph matching. ACM Trans. Knowl. Discov. Data 12, 2 (2018), 21.
  • Ryan Compton, David Jurgens, and David Allen. 2014. Geotagging one hundred million twitter accounts with total variation minimization. In Proceedings of the 2014 IEEE International Conference on Big Data (Big Data’14) . IEEE, 393–401.
  • Thomas M. Cover and Joy A. Thomas. 2012. Elements of Information Theory . John Wiley & Sons.
  • Ratan Dey, Cong Tang, Keith Ross, and Nitesh Saxena. 2012. Estimating age privacy leakage in online social networks. In Proceedings of the 2012 Proceedings IEEE International Conference on Computer Communications (INFOCOM’12) . IEEE, 2836–2840.
  • Xenofontas Dimitropoulos, Dmitri Krioukov, Amin Vahdat, and George Riley. 2009. Graph annotations in modeling complex network topologies. ACM Trans. Model. Comput. Simul. 19, 4 (2009), 17.
  • George T. Duncan and Diane Lambert. 1986. Disclosure-limited data dissemination. J. Am. Stat. Assoc. 81, 393 (1986), 10–18.
  • Cynthia Dwork. 2008. Differential privacy: A survey of results. In Proceedings of the International Conference on Theory and Applications of Models of Computation . Springer, 1–19.
  • Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference . Springer, 265–284.
  • Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. 2004. Privacy preserving mining of association rules. Inf. Syst. 29, 4 (2004), 343–364.
  • Carla-Fabiana Chiasserini, Michele Garetto, and Emilio Leonardi. 2015. De-anonymizing scale-free social networks by percolation graph matching. In Proceedings of the 2015 IEEE International Conference on Computer Communications (INFOCOM’15). IEEE, 1571–1579.
  • Hao Fu, Aston Zhang, and Xing Xie. 2014. De-anonymizing social graphs via node similarity. In Proceedings of the Annual Conference on the World Wide Web (WWW’14) .
  • Hao Fu, Aston Zhang, and Xing Xie. 2015. Effective social graph deanonymization based on graph structure and descriptive information. ACM Trans. Intell. Syst. Technol. 6, 4 (2015), 49.
  • Xinzhe Fu, Zhongzhao Hu, Zhiying Xu, Luoyi Fu, and Xinbing Wang. 2017. De-anonymization of networks with communities: When quantifications meet algorithms. In Proceedings of the IEEE Global Communications Conference .
  • Benjamin C. M. Fung, K. Wang, R. Chen, and S. Yu Philip. 2010. Privacy-preserving data publishing: A survey on recent developments. ACM Comput. Surv. 42, 4 (2010), 1–53.
  • Sébastien Gambs, Marc-Olivier Killijian, and Miguel Núñez del Prado Cortez. 2010. Show me how you move and i will tell you who you are. In Proceedings of the SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS .
  • Daniel Gayo-Avello. 2011. All liaisons are dangerous when all your friends are known to us. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia. ACM, 171–180.
  • Oana Goga, Howard Lei, Sree Hari Krishnan Parthasarathi, Gerald Friedland, Robin Sommer, and Renata Teixeira. 2013. Exploiting innocuous activity for correlating users across sites. In Proceedings of the Annual Conference on the World Wide Web (WWW’13) .
  • Neil Zhenqiang Gong and Bin Liu. 2016. You are who you know and how you behave: Attribute inference attacks via users’ social friends and behaviors. In Proceedings of the USENIX Security Symposium . 979–995.
  • Neil Zhenqiang Gong and Bin Liu. 2018. Attribute inference attacks in online social networks. ACM Trans. Priv. Secur. 21, 1 (2018), 3.
  • Neil Zhenqiang Gong, Ameet Talwalkar, Lester Mackey, Ling Huang, Eui Chul Richard Shin, Emil Stefanov, Elaine Runting Shi, and Dawn Song. 2014. Joint link prediction and attribute inference using a social-attribute network. ACM Trans. Intell. Syst. Technol. 5, 2 (2014), 27.
  • Rachid Guerraoui, Anne-Marie Kermarrec, Rhicheek Patra, and Mahsa Taziki. 2015. D2P: Distance-based differential privacy in recommenders. Proc. VLDB Endow. 8, 8 (2015), 862–873.
  • Rachid Guerraoui, Anne-Marie Kermarrec, and Mahsa Taziki. 2017. The utility and privacy effects of a click. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval . ACM.
  • Payas Gupta, Swapna Gottipati, Jing Jiang, and Debin Gao. 2013. Your love is public now: Questioning the use of personal information in authentication. In Proceedings of the ACM Special Interest Group on Security, Audit and Control Conference (SIGSAC’13) . ACM.
  • Alireza Hajibagheri, Gita Sukthankar, Kiran Lakkaraju, Hamidreza Alvari, Rolf T. Wigand, and Nitin Agarwal. 2018. Using massively multiplayer online game data to analyze the dynamics of social interactions. Social Interactions in Virtual Worlds: An Interdisciplinary Perspective (2018).
  • Michael Hay, Gerome Miklau, David Jensen, Don Towsley, and Philipp Weis. 2008. Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1, 1 (2008), 102–114.
  • Jianming He, Wesley W. Chu, and Zhenyu Victor Liu. 2006. Inferring privacy information from social networks. In Proceedings of the International Conference on Intelligence and Security Informatics . Springer, 154–165.
  • Brent Hecht, Lichan Hong, Bongwon Suh, and Ed H. Chi. 2011. Tweets from Justin Bieber's heart: The dynamics of the location field in user profiles. In Proceedings of the Conference of the Special Interest Group on Computer-Human Interaction (SIGCHI’11). ACM, 237–246.
  • Shawndra Hill and Foster Provost. 2003. The myth of the double-blind review?: Author identification using only citations. ACM SIGKDD Explor. Newslett. 5, 2 (2003), 179–184.
  • T. Ryan Hoens, Marina Blanton, and Nitesh V. Chawla. 2010. A private and reliable recommendation system for social networks. In Proceedings of the 2010 IEEE Second International Conference on Social Computing (SocialCom’10) . IEEE, 816–825.
  • D. C. Howe and H. Nissenbaum. 2009. TrackMeNot: Resisting surveillance in web search. In Lessons from the Identity Trail: Privacy, Anonymity and Identity in a Networked Society . (Oxford University Press, New York, 2009), 417–436.
  • Jingyu Hua, Chang Xia, and Sheng Zhong. 2015. Differentially private matrix factorization. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’15) .
  • Mathias Humbert, Théophile Studer, Matthias Grossglauser, and Jean-Pierre Hubaux. 2013. Nowhere to hide: Navigating around privacy in online social networks. In Proceedings of the European Symposium on Research in Computer Security .
  • Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing . ACM, 604–613.
  • P. James. 1992. Knowledge graphs. Linguistic Instruments in Knowledge Engineering (1992), 97–117.
  • Shouling Ji, Weiqing Li, Neil Zhenqiang Gong, Prateek Mittal, and Raheem A. Beyah. 2015. On your social network de-anonymizablity: Quantification and large scale evaluation with seed knowledge. In Proceedings of the Network and Distributed System Security Symposium (NDSS’15) .
  • Shouling Ji, Weiqing Li, Neil Zhenqiang Gong, Prateek Mittal, and Raheem A. Beyah. 2016. Seed based deanonymizability quantification of social networks. IEEE Trans. Inf. Forens. Secur. 11, 7 (2016), 1398–1411.
  • Shouling Ji, Weiqing Li, Prateek Mittal, and Raheem Beyah. 2015. SecGraph: A uniform and open-source evaluation system for graph data anonymization and de-anonymization. In Proceedings of the USENIX Security Symposium . 303–318.
  • Shouling Ji, Weiqing Li, Mudhakar Srivatsa, and Raheem Beyah. 2014. Structural data de-anonymization: Quantification, practice, and implications. In Proceedings of the 2014 ACM Special Interest Group on Security, Audit and Control Conference (SIGSAC’14) . ACM, 1040–1053.
  • Shouling Ji, Weiqing Li, Mudhakar Srivatsa, and Raheem Beyah. 2016. Structural data de-anonymization: Theory and practice. IEEE/ACM Trans. Netw. 24, 6 (2016), 3523–3536.
  • Shouling Ji, Weiqing Li, Mudhakar Srivatsa, Jing Selena He, and Raheem Beyah. 2016. General graph data de-anonymization: From mobility traces to social networks. ACM Trans. Intell. Syst. Technol. 18, 4 (2016).
  • Shouling Ji, Prateek Mittal, and Raheem Beyah. 2016. Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: A survey. IEEE Commun. Surv. Tutor. 19, 2 (2016), 1305–1326.
  • Shouling Ji, Ting Wang, Jianhai Chen, Weiqing Li, Prateek Mittal, and Raheem Beyah. 2017. De-SAG: On the de-anonymization of structure-attribute graph data. IEEE Trans. Depend. Sec. Comput. 16, 4 (2017), 594–607.
  • Jinyuan Jia, Binghui Wang, Le Zhang, and Neil Zhenqiang Gong. 2017. AttriInfer: Inferring user attributes in online social networks using Markov random fields. In Proceedings of the Annual Conference on the World Wide Web (WWW’17). 1561–1569.
  • Zach Jorgensen and Ting Yu. 2014. A privacy-preserving framework for personalized, social recommendations. In Proceedings of the Extended Database Technology Conference (EDBT’14) . 582.
  • David Jurgens. 2013. That's what friends are for: Inferring location in online social media platforms based on social relationships. In Seventh International AAAI Conference on Weblogs and Social Media .
  • David Jurgens, Tyler Finethy, James McCorriston, Yi Tian Xu, and Derek Ruths. 2015. Geolocation prediction in twitter using social networks: A critical analysis and review of current practice. In Ninth International AAAI Conference on Web and Social Media .
  • Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data . ACM, 193–204.
  • Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics .
  • Longbo Kong, Zhi Liu, and Yan Huang. 2014. Spot: Locating social media users based on social network context. Proc. VLDB Endow. 7, 13 (2014), 1681–1684.
  • Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2009. Computational methods in authorship attribution. J. Assoc. Inf. Sci. Technol. 60, 1 (2009), 9–26.
  • Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2011. Authorship attribution in the wild. Lang. Resourc. Eval. 45, 1 (2011), 83–94.
  • Moshe Koppel, Jonathan Schler, Shlomo Argamon, and Eran Messeri. 2006. Authorship attribution with thousands of candidate authors. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval . ACM, 659–660.
  • Aleksandra Korolova, Rajeev Motwani, Shubha U Nabar, and Ying Xu. 2008. Link privacy in social networks. In Proceedings of the 17th ACM Conference on Information and Knowledge Management . ACM, 289–298.
  • Nitish Korula and Silvio Lattanzi. 2014. An efficient reconciliation algorithm for social networks. Proc. VLDB Endow. 7, 5 (2014), 377–388.
  • Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. U.S.A. 110, 15 (2013), 5802–5805.
  • Harold W. Kuhn. 2010. The Hungarian method for the assignment problem. In 50 Years of Integer Programming 1958–2008. Springer, 29–47.
  • Sebastian Labitzke, Florian Werling, Jens Mittag, and Hannes Hartenstein. 2013. Do online social network friends still threaten my privacy? In Proceedings of the ACM Conference on Data and Application Security and Privacy .
  • Diane Lambert. 1993. Measures of disclosure risk and harm. J. Off. Stat. 9, 2 (1993), 313.
  • Wei-Han Lee, Changchang Liu, Shouling Ji, Prateek Mittal, and Ruby B. Lee. 2017. How to quantify graph de-anonymization risks. In International Conference on Information Systems Security and Privacy. Springer, 84–104.
  • Wei-Han Lee, Changchang Liu, Shouling Ji, Prateek Mittal, and Ruby B. Lee. 2017. Blind de-anonymization attacks using social networks. In Proceedings of the 2017 on Workshop on Privacy in the Electronic Society . ACM, 1–4.
  • Kevin Lewis, Jason Kaufman, Marco Gonzalez, Andreas Wimmer, and Nicholas Christakis. 2008. Tastes, ties, and time: A new social network dataset using Facebook.com. Soc. Netw. 30, 4 (2008), 330–342.
  • Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the IEEE 23rd International Conference on Data Engineering 2007 (ICDE’07) . IEEE, 106–115.
  • Rui Li, Shengjie Wang, and Kevin Chen-Chuan Chang. 2012. Multiple location profiling for users and relationships from social network and content. Proc. VLDB Endow. 5, 11 (2012), 1603–1614.
  • Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, and Kevin Chen-Chuan Chang. 2012. Towards social user profiling: Unified and discriminative influence model for inferring home locations. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’12) .
  • Xiaoxue Li, Yanan Cao, Yanmin Shang, Yanbing Liu, Jianlong Tan, and Li Guo. 2017. Inferring user profiles in online social networks based on convolutional neural network. In Proceedings of the International Conference on Knowledge Science, Engineering and Management . Springer.
  • Jack Lindamood, Raymond Heatherly, Murat Kantarcioglu, and Bhavani Thuraisingham. 2009. Inferring private information using social network data. In Proceedings of the Annual Conference of the World Wide Web (WWW’09) . ACM, 1145–1146.
  • Bo Liu, Wanlei Zhou, Tianqing Zhu, Longxiang Gao, and Yong Xiang. 2018. Location privacy and its applications: A systematic study. IEEE Access 6 (2018), 17606–17624.
  • Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. 2016. Dependence makes you vulnerable: Differential privacy under dependent tuples. In Proceedings of the Network and Distributed System Security Symposium (NDSS’16), Vol. 16. 21–24.
  • Changchang Liu and Prateek Mittal. 2016. LinkMirage: Enabling privacy-preserving analytics on social relationships. In Proceedings of the Network and Distributed System Security Symposium (NDSS’16) .
  • Kun Liu and Evimaria Terzi. 2008. Towards identity anonymization on graphs. In Proceedings of the ACM Special Interest Group on Management of Data Conference (SIGMOD’08) .
  • Yushan Liu, Shouling Ji, and Prateek Mittal. 2016. SmartWalk: Enhancing social network security via adaptive random walks. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security . ACM, 492–503.
  • Dixin Luo, Hongteng Xu, Hongyuan Zha, Jun Du, Rong Xie, Xiaokang Yang, and Wenjun Zhang. 2014. You are what you watch and when you watch: Inferring household structures from iptv viewing data. IEEE Trans. Broadcast. 60, 1 (2014), 61–72.
  • Zhifeng Luo and Zhanli Chen. 2014. A privacy preserving group recommender based on cooperative perturbation. In Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery . IEEE.
  • Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam. 2006. l-diversity: Privacy beyond k-anonymity. In Proceedings of the IEEE International Conference on Data Engineering (ICDE’06) . IEEE, 24–24.
  • Ashwin Machanavajjhala, Aleksandra Korolova, and Atish Das Sarma. 2011. Personalized social recommendations: Accurate or private. Proc. VLDB Endow. 4, 7 (2011), 440–450.
  • Nathan Mack, Jasmine Bowers, Henry Williams, Gerry Dozier, and Joseph Shelton. 2015. The best way to a strong defense is a strong offense: Mitigating deanonymization attacks via iterative language translation. Int. J. Mach. Learn. Comput. 5, 5 (2015), 409.
  • Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. 2006. Systematic topology analysis and generation using degree correlations. In Proceedings of the ACM SIGCOMM Computer Communication Review , Vol. 36. ACM, 135–146.
  • Jalal Mahmud, Jeffrey Nichols, and Clemens Drews. 2014. Home location identification of twitter users. ACM Trans. Intell. Syst. Technol. 5, 3 (2014), 47.
  • Huina Mao, Xin Shuai, and Apu Kapadia. 2011. Loose tweets: An analysis of privacy leaks on twitter. In Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society . ACM, 1–12.
  • Jeffrey McGee, James Caverlee, and Zhiyuan Cheng. 2013. Location prediction in social media based on tie strength. In Proceedings of the Conference on Information and Knowledge Management (CIKM’13) . ACM.
  • Jeffrey McGee, James A. Caverlee, and Zhiyuan Cheng. 2011. A geographic study of tie strength in social media. In Proceedings of the Conference on Information and Knowledge Management (CIKM’11) . ACM, 2333–2336.
  • Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annu. Rev. Sociol. 27, 1 (2001), 415–444.
  • Frank McSherry and Ilya Mironov. 2009. Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’09). ACM.
  • Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science 2007 (FOCS’07) . IEEE, 94–103.
  • Xuying Meng, Suhang Wang, Kai Shu, Jundong Li, Bo Chen, Huan Liu, and Yujun Zhang. 2018. Personalized privacy-preserving social recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’18) .
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems . 3111–3119.
  • Tehila Minkus, Yuan Ding, Ratan Dey, and Keith W. Ross. 2015. The city privacy attack: Combining social media and public records for detailed profiles of adults and children. In Proceedings of the ACM Conference on Online Social Networks .
  • Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. 2010. You are who you know: Inferring user profiles in online social networks. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM’10) . ACM, 251–260.
  • Prateek Mittal, Charalampos Papamanthou, and Dawn Song. 2013. Preserving link privacy in social network based systems. In Proceedings of the Network and Distributed System Security Symposium (NDSS’13).
  • Frederick Mosteller and David Wallace. 1964. Inference and Disputed Authorship: The Federalist . Addison-Wesley, Reading, Mass.
  • Mihir Nanavati, Nathan Taylor, William Aiello, and Andrew Warfield. 2011. Herbert West: Deanonymizer. In Proceedings of the 6th USENIX Conference on Hot Topics in Security (HotSec’11). USENIX Association, San Francisco, CA, 6.
  • Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn Song. 2012. On the feasibility of internet-scale author identification. In Proceedings of the Conference on Security and Privacy (SP’12) . IEEE.
  • Arvind Narayanan, Elaine Shi, and Benjamin IP Rubinstein. 2011. Link prediction by de-anonymization: How we won the kaggle social network challenge. In Proceedings of the International Joint Conference on Neural Networks . IEEE.
  • Arvind Narayanan and Vitaly Shmatikov. 2008. Robust de-anonymization of large sparse datasets. In Proceedings of the Conference on Security and Privacy . IEEE.
  • Arvind Narayanan and Vitaly Shmatikov. 2009. De-anonymizing social networks. In Proceedings of the Conference on Security and Privacy . IEEE.
  • M. E. J. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45 (2003), 167–256.
  • Shirin Nilizadeh, Apu Kapadia, and Yong-Yeol Ahn. 2014. Community-enhanced de-anonymization of online social networks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security . ACM, 537–548.
  • Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2007. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing . ACM, 75–84.
  • Jahna Otterbacher. 2010. Inferring gender of movie reviewers: Exploiting writing style, content and metadata. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management . ACM, 369–378.
  • Rupa Parameswaran and D. Blough. 2005. A robust data obfuscation approach for privacy preservation of clustered data. In Proceedings of the Workshop on Privacy and Security Aspects of Data Mining . 18–25.
  • Rupa Parameswaran and Douglas M. Blough. 2007. Privacy preserving collaborative filtering using data obfuscation. In Proceedings of the IEEE International Conference on Granular Computing .
  • Javier Parra-Arnau. 2017. Pay-per-tracking: A collaborative masking model for web browsing. Inf. Sci. 385–386 (2017), 96–124.
  • Javier Parra-Arnau, David Rebollo-Monedero, and Jordi Forné. 2014. Optimal forgery and suppression of ratings for privacy enhancement in recommendation systems. Entropy 16, 3 (2014), 1586–1631.
  • Pedram Pedarsani and Matthias Grossglauser. 2011. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 1235–1243.
  • Wei Peng, Feng Li, Xukai Zou, and Jie Wu. 2014. A two-stage deanonymization attack against anonymized social networks. IEEE Trans. Comput. 63, 2 (2014), 290–303.
  • Huseyin Polat and Wenliang Du. 2003. Privacy-preserving collaborative filtering using randomized perturbation techniques. In Proceedings of the 3rd IEEE International Conference on Data Mining 2003 (ICDM’03) . IEEE, 625–628.
  • Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2014. Calibrating data to sensitivity in private data analysis: A platform for differentially-private analysis of weighted datasets. Proc. VLDB Endow. 7, 8 (2014).
  • Silvia Puglisi, Javier Parra-Arnau, Jordi Forné, and David Rebollo-Monedero. 2015. On content-based recommendation and user privacy in social-tagging systems. Comput. Stand. Interfaces 41 (2015), 17–27.
  • Jianwei Qian, Xiang-Yang Li, Chunhong Zhang, and Linlin Chen. 2016. De-anonymizing social networks and inferring private attributes using knowledge graphs. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM’16) .
  • Naren Ramakrishnan, Benjamin J. Keller, Batul J. Mirza, Ananth Y. Grama, and George Karypis. 2001. Privacy risks in recommender systems. IEEE Internet Comput. 5, 6 (2001), 54.
  • Josyula R. Rao, Pankaj Rohatgi, et al. 2000. Can pseudonymity really guarantee privacy? In Proceedings of the USENIX Conference on Security .
  • David Rebollo-Monedero and Jordi Forné. 2010. Optimized query forgery for private information retrieval. IEEE Trans. Inf. Theory 56, 9 (2010), 4631–4642.
  • David Rebollo-Monedero, Javier Parra-Arnau, and Jordi Forné. 2011. An information-theoretic privacy criterion for query forgery in information retrieval. In Proceedings of the International Conference on Security Technology . Springer, 146–154.
  • Shariq J. Rizvi and Jayant R. Haritsa. 2002. Maintaining data privacy in association rule mining. In Proceedings of the 28th International Conference on Very Large Databases (VLDB’02). Elsevier, 682–693.
  • Dominic Rout, Kalina Bontcheva, Daniel Preoţiuc-Pietro, and Trevor Cohn. 2013. Where's @wally?: A classification approach to geolocating users based on their social ties. In Proceedings of the Annual Conference on Hypertext and Social Media. ACM.
  • KyoungMin Ryoo and Sue Moon. 2014. Inferring twitter user locations with 10 km accuracy. In Proceedings of the Annual Conference on the World Wide Web (WWW’14) .
  • Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y Zhao. 2011. Sharing graphs using differentially private graph models. In Proceedings of the ACM SIGCOMM on Internet Measurement Conference .
  • Kumar Sharad. 2016. Change of guard: The next generation of social graph de-anonymization attacks. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security . ACM, 105–116.
  • Kumar Sharad and George Danezis. 2014. An automated social graph de-anonymization technique. In Proceedings of the 13th Workshop on Privacy in the Electronic Society . ACM, 47–58.
  • Sanur Sharma, Preeti Gupta, and Vishal Bhatnagar. 2012. Anonymisation in social network: A literature survey and classification. Int. J. Soc. Netw. Min. 1, 1 (2012), 51–66.
  • Yilin Shen and Hongxia Jin. 2014. Privacy-preserving personalized recommendation: An instance-based approach via differential privacy. In Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM’14) . IEEE, 540–549.
  • Reza Shokri, George Theodorakopoulos, Jean-Yves Le Boudec, and Jean-Pierre Hubaux. 2011. Quantifying location privacy. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP’11) . IEEE, 247–262.
  • Mudhakar Srivatsa and Mike Hicks. 2012. Deanonymizing mobility traces: Using social network as a side-channel. In Proceedings of the 2012 ACM Conference on Computer and Communications Security . ACM, 628–637.
  • Efstathios Stamatatos. 2009. A survey of modern authorship attribution methods. J. Assoc. Inf. Sci. Technol. 60, 3 (2009), 538–556.
  • Zak Stone, Todd Zickler, and Trevor Darrell. 2008. Autotagging facebook: Social network context improves photo annotation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops .
  • Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy. Int. J. Uncert. Fuzz. Knowl.-Based Syst. 10, 05 (2002), 557–570.
  • Qiang Tang and Jun Wang. 2018. Privacy-preserving friendship-based recommender systems. IEEE Trans. Depend. Sec. Comput. 15, 5 (2018), 784–796.
  • Kurt Thomas, Chris Grier, and David M. Nicol. 2010. unFriendly: Multi-party privacy risks in social networks. In Proceedings of the International Symposium on Privacy Enhancing Technologies Symposium. Springer, 236–252.
  • Brian Thompson and Danfeng Yao. 2009. The union-split algorithm and cluster-based anonymization of social networks. In Proceedings of the Symposium on Information, Computer, and Communications Security .
  • Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06). IEEE Computer Society, 613–622.
  • Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino, Loredana Parasiliti Provenza, Yucel Saygin, and Yannis Theodoridis. 2004. State-of-the-art in privacy preserving data mining. ACM SIGMOD Rec. 33, 1 (2004), 50–57.
  • Huandong Wang, Yong Li, Gang Wang, and Depeng Jin. 2018. You are how you move: Linking multiple user identities from massive mobility traces. In Proceedings of the SIAM International Conference on Data Mining (SDM’18) . Society for Industrial and Applied Mathematics.
  • Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, and Xueqi Cheng. 2016. Your cart tells you: Inferring demographic attributes from purchase data. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM’16) . ACM.
  • Yue Wang and Xintao Wu. 2013. Preserving differential privacy in degree-correlation based graph generation. Trans. Data Priv. 6, 2 (2013), 127.
  • Udi Weinsberg, Smriti Bhagat, Stratis Ioannidis, and Nina Taft. 2012. BlurMe: Inferring and obfuscating user gender based on ratings. In Proceedings of the 6th ACM Conference on Recommender Systems . ACM, 195–202.
  • Xinyu Wu, Zhongzhao Hu, Xinzhe Fu, Luoyi Fu, Xinbing Wang, and Songwu Lu. 2018. Social network de-anonymization with overlapping communities: Analysis, algorithm and experiments. In Proceeding of the International Conference on Computer Communications (INFOCOM’18) .
  • Qian Xiao, Rui Chen, and Kian-Lee Tan. 2014. Differentially private network data release via structural inference. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 911–920.
  • Yu Xin and Tommi Jaakkola. 2014. Controlling privacy in recommender systems. In Proceedings of the Conference on Neural Information Processing Systems (NIPS’14) .
  • Jaewon Yang and Jure Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining . ACM, 587–596.
  • Lyudmila Yartseva and Matthias Grossglauser. 2013. On the performance of percolation graph matching. In Proceedings of the 1st ACM Conference on Online Social Networks . ACM, 119–130.
  • Zhijun Yin, Manish Gupta, Tim Weninger, and Jiawei Han. 2010. Linkrec: A unified framework for link recommendation with user attributes and graph structure. In Proceedings of the Annual Conference of the World Wide Web (WWW’10) . ACM, 1211–1212.
  • Zhijun Yin, Manish Gupta, Tim Weninger, and Jiawei Han. 2010. A unified framework for link recommendation using random walks. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM’10) . IEEE, 152–159.
  • Xiaowei Ying and Xintao Wu. 2009. Graph generation with prescribed feature constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM’09) .
  • Mingxuan Yuan, Lei Chen, and Philip S. Yu. 2010. Personalized privacy protection in social networks. Proc. VLDB Endow. 4, 2 (2010), 141–150.
  • Aston Zhang, Xing Xie, Carl A. Gunter, Jiawei Han, and XiaoFeng Wang. 2014. Privacy risk in anonymized heterogeneous information networks. In Proceedings of the Extended Database Technology Conference (EDBT’14) .
  • Jinxue Zhang, Jingchao Sun, Rui Zhang, and Yanchao Zhang. 2018. Privacy-preserving social media data outsourcing. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM’18) .
  • Elena Zheleva and Lise Getoor. 2009. To join or not to join: The illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web . ACM, 531–540.
  • Elena Zheleva, Evimaria Terzi, and Lise Getoor. 2012. Privacy in social networks. Synth. Lect. Data Min. Knowl. Discov. 3, 1 (2012), 1–85.
  • X. Zheng, J. Han, and A. Sun. 2018. A survey of location prediction on twitter. IEEE Trans. Knowl. Data Eng. 30, 9 (2018), 1652–1671.
  • Yuan Zhong, Nicholas Jing Yuan, Wen Zhong, Fuzheng Zhang, and Xing Xie. 2015. You are where you go: Inferring demographic attributes from location check-ins. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM’15) . ACM, 295–304.
  • Bin Zhou and Jian Pei. 2008. Preserving privacy in social networks against neighborhood attacks. In Proceedings of the IEEE International Conference on Data Engineering (ICDE’08) .
  • Tianqing Zhu, Gang Li, Yongli Ren, Wanlei Zhou, and Ping Xiong. 2013. Differential privacy for neighborhood-based collaborative filtering. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM’13) . ACM, 752–759.
  • Xue Zhu and Yuqing Sun. 2016. Differential privacy for collaborative filtering recommender algorithm. In Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics . ACM, 9–16.
  • Lei Zou, Lei Chen, and M. Tamer Özsu. 2009. K-automorphism: A general framework for privacy preserving network publication. Proc. VLDB Endow. 2, 1 (2009), 946–957.
  • 1. https://bit.ly/1AwePQE
  • 2. http://www.microsoft.com/protect/yourself/phishing/spear.mspx
  • 3. http://www.dmoz.com

This material is based upon the work supported in part by Army Research Office (ARO) under grant number W911NF-15-1-0328 and Office of Naval Research (ONR) under grant number N00014-17-1-2605.

Authors’ address: G. Beigi and H. Liu, School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA; emails: [email protected], [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected] .

©2020 Association for Computing Machinery. 2577-3224/2020/01-ART7 $15.00 DOI: https://doi.org/10.1145/3343038

Publication History: Received July 2018; revised January 2019; accepted April 2019

Impact of Social Media Behavior on Privacy Information Security Based on Analytic Hierarchy Process


1. Introduction

2. Research Framework

2.1. Literature Review

2.1.1. Types of User's Behavior

2.1.2. Types of Privacy

2.1.3. Facebook Case Study

2.2. Theoretical Underpinnings

2.3. Research Model

2.4. Hypotheses Building

  • H2a. Privacy concern: H1a. Against excessive information collection; H1b. Pay attention to app request permission; H1c. Care about information security.
  • H2b. Privacy protection: H1d. App privacy settings; H1e. Clean up the traces; H1f. Change password.
  • H2c. Active disclosure: H1g. Input real information; H1h. Share your personal life; H1i. Real-time social interaction; H1j. Express personal feelings and value.
  • H2d. Passive participation: H1k. News and information disclosure; H1l. Traces of associated third-party websites.
  • H3a. Defensive privacy: mainly describes the following aspects of privacy: virtual territory/accessibility.
  • H3b. Identity authentication privacy: mainly describes the following aspects of privacy: factual/personal/bodily/biological.
  • H3c. Interactional privacy: mainly describes the following aspects of privacy: communication/comment/share.
  • H3d. Psychological privacy: mainly describes the following aspects of privacy: emotion/decision/value/knowledge.
  • H3e. Integrated informational privacy: mainly describes the following aspects of privacy: proprietary/preference/commercial history and traces.

3.1. Descriptive Statistical Analysis

3.2. Analytic Hierarchy Process

3.2.1. Determine the Measurement Table

3.2.2. Construct Judgment Matrix

3.2.3. Consistency Test

3.2.4. Model Analysis Result

3.3. User Behavior Empirical Analysis

3.3.1. User Behavior Evaluation Model

3.3.2. Survey Result Analysis

4. Discussion

5. Conclusions

6. Limitations and Improvements

Author Contributions, Institutional Review Board Statement, Informed Consent Statement, Data Availability Statement, Conflicts of Interest

  • Kenton, W.; Mansa, J. Understanding Social Networking. Available online: https://www.investopedia.com/terms/s/social-networking.asp (accessed on 10 November 2020).
  • Ryan, T.; Xenos, S. Who uses Facebook? An investigation into the relationship between the Big Five, shyness, narcissism, loneliness, and Facebook usage. Comput. Hum. Behav. 2011 , 27 , 1658–1664. [ Google Scholar ] [ CrossRef ]
  • Wang, X.; Sun, X. Evaluation of personal Privacy Disclosure Prevention Ability based on Mobile Social Network. Mod. Inf. Technol. 2019 , 3 , 144–147. [ Google Scholar ]
  • Fuchs, C. An Alternative View of Privacy on Facebook. Information 2011 , 2 , 140–165. Available online: https://www.mdpi.com/2078-2489/2/1/140/htm (accessed on 7 November 2020). [ CrossRef ] [ Green Version ]
  • Tavani, H.T. Informational privacy: Concepts, theories, and controversies. In The Handbook of Information and Computer Ethics ; Fuchs, C., Ed.; Rivier University: Nashua, NH, USA, 2008; pp. 131–164. [ Google Scholar ]
  • Gu, L. Integrated privacy: A new type of privacy in the era of big data. Nanjing Soc. Sci. 2020 , 4 , 106–111+122. [ Google Scholar ] [ CrossRef ]
  • Clement, J. Facebook: Number of Users Worldwide. Available online: https://www.statista.com/statistics/490424/number-of-worldwide-facebook-users/ (accessed on 7 November 2020).
  • Smith, K. 53 Incredible Facebook Statistics and Facts. Available online: https://www.brandwatch.com/blog/facebook-statistics/ (accessed on 7 November 2020).
  • Nyoni, P.; Velempini, M. Privacy and user awareness on Facebook. South Afr. J. Sci. 2018 , 114 , 1–5. Available online: https://www.sajs.co.za/article/view/5165 (accessed on 7 November 2020). [ CrossRef ]
  • Wang, N.; Xu, H.; Grossklags, J. Third-party apps on Facebook: Privacy and the illusion of control. In Proceedings of the 5th ACM Symposium on Computer Human Interaction for Management of Information Technology, New York, NY, USA, 4 December 2011; Available online: http://docplayer.net/17187683-Third-party-apps-on-facebook-privacy-and-the-illusion-of-control.html (accessed on 7 November 2020).
  • Liu, Y.; Gummadi, K.P.; Krishnamurthy, B.; Mislove, A. Analyzing Facebook Privacy Settings: User Expectations vs. Reality. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, New York, NY, USA, 2 November 2011; Available online: https://scinapse.io/papers/2118994807 (accessed on 7 November 2020).
  • Bumgarner, B.A. You Have Been Poked: Exploring the Uses and Gratifications of Facebook among Emerging Adults. Available online: https://firstmonday.org/article/view/2026/1897 (accessed on 7 November 2020).
  • Wang, A.; Zhang, A.; Xu, Y. Privacy in Online Social Networks. In Proceedings of the Thirty Second International Conference on Information Systems, Shanghai, China, 4–7 December 2011. [ Google Scholar ]
  • Callahan, M. Big Brother 2.0: 160,000 Facebook Pages Are Hacked a Day. Available online: https://nypost.com/2015/03/01/big-brother-2-0-160000-facebook-pages-are-hacked-a-day/ (accessed on 13 November 2020).
  • Facebook Faces $ 5 Billion Fine Over Privacy Violations. 2019. Available online: https://www.dw.com/en/facebook-faces-5-billion-fine-over-privacy-violations/a-49575702 (accessed on 13 November 2020).
| Types of User Behavior | Variables in Twelve User Behaviors | Mean | SD | Description |
| --- | --- | --- | --- | --- |
| Privacy Concern | Against excessive information collection | 5.67 | 1.43 | The degree of over-collection of information on social media |
| Privacy Concern | Pay attention to app request permission | 2.23 | 1.39 | The degree to which users pay attention to app request permissions and read the privacy policy statement |
| Privacy Concern | Care about information security | 6.56 | 0.86 | The importance of information security |
| Privacy Protection | App privacy settings | 4.04 | 1.52 | The degree to which app privacy settings can help reduce privacy leaks |
| Privacy Protection | Clean up the traces | 3.49 | 1.63 | The extent to which cleaning up traces on social media can help reduce privacy disclosures |
| Privacy Protection | Change password | 3.32 | 1.71 | The extent to which regularly changing social media passwords can help reduce privacy disclosures |
| Active Disclosure | Input real information | 5.74 | 1.19 | The extent to which using real information increases privacy disclosures |
| Active Disclosure | Share your personal life | 5.47 | 1.18 | The extent to which sharing personal life increases privacy disclosures |
| Active Disclosure | Real-time social interaction | 4.72 | 1.55 | The extent to which real-time social interaction carries privacy disclosure risk |
| Active Disclosure | Express personal feelings and value | 4.51 | 1.56 | The extent to which expressing personal feelings and values increases privacy disclosures |
| Passive Participation | Be involved in news/topic/recommend | 5.10 | 1.69 | The degree to which users are involved in news/topics/recommendations |
| Passive Participation | Traces of associated third-party websites | 5.14 | 1.44 | Due to traces of associated third-party websites, users receive recommended information from social media |
| Four Types of User Behavior | Mean | Description |
| --- | --- | --- |
| Lack of Privacy Concern | 4.82 | The extent to which privacy concern can help reduce privacy disclosures |
| Lack of Privacy Protection | 3.62 | The extent to which privacy protection can help reduce privacy disclosures |
| Active Disclosure | 5.11 | The extent to which active disclosure increases privacy disclosures |
| Passive Participation | 5.12 | The degree to which users are involved in passive participation, which increases privacy disclosures |
| Score | Meaning | Score | Meaning |
| --- | --- | --- | --- |
| 1 | The same | / | / |
| 3 | A bit more important | Reciprocal 1/3 | A bit less important |
| 5 | More important | Reciprocal 1/5 | Less important |
| 7 | Extremely important | Reciprocal 1/7 | Extremely unimportant |
| 2, 4, 6, 8 | Median on both sides | Reciprocal 1/2, 1/4 | Median on both sides |
| Grade | 0–10 | 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70 | 70–80 | 80–90 | 90–100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Interval | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
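As a concrete illustration of Sections 3.2.2 and 3.2.3 (constructing the judgment matrix and the consistency test), the following is a minimal sketch of the standard AHP eigenvector method. The judgment matrix below is invented for illustration, not the paper's survey data; only the pairwise comparison scale above and Saaty's standard consistency procedure are assumed.

```python
# AHP sketch: derive priority weights for the four behavior types from a
# pairwise judgment matrix (scored on the scale above) and run the
# consistency test. The matrix is a made-up example, not the study's data.
import numpy as np

A = np.array([
    [1,   3,   1/5, 1/3],   # Privacy Concern      vs. each behavior type
    [1/3, 1,   1/7, 1/5],   # Privacy Protection
    [5,   7,   1,   3  ],   # Active Disclosure
    [3,   5,   1/3, 1  ],   # Passive Participation
])

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))            # index of the principal eigenvalue
lambda_max = eigvals.real[k]
w = np.abs(eigvecs[:, k].real)
weights = w / w.sum()                       # normalized priority weights

n = A.shape[0]
CI = (lambda_max - n) / (n - 1)             # consistency index
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]         # Saaty's random index for n = 4
CR = CI / RI                                # consistency ratio; CR < 0.1 is acceptable
print("weights:", np.round(weights, 3))
print(f"lambda_max = {lambda_max:.3f}, CI = {CI:.3f}, CR = {CR:.3f}")
```

In a standard AHP workflow, the resulting weight vector is what a model analysis step (cf. Section 3.2.4) would feed into the user behavior evaluation model, with judgments revised whenever CR exceeds 0.1.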

Liu, Y.; Tse, W.K.; Kwok, P.Y.; Chiu, Y.H. Impact of Social Media Behavior on Privacy Information Security Based on Analytic Hierarchy Process. Information 2022, 13, 280. https://doi.org/10.3390/info13060280


Data Privacy in Social Media Platform: Issues and Challenges

Sakshi Rewaria, Galgotias University, School of Law

27 pages. Posted: 22 Mar 2021. Date written: February 26, 2021.

Even though society as a whole is making ever more personal information publicly available, there is still an expectation of privacy. People believe, sometimes falsely, that they can control the personal information they hold out to the public by determining who can access it and how it will be used. It is extremely challenging to define a fluid concept like privacy because it touches almost every aspect of a person and of society to one degree or another. With over 1 billion users connected through online social media, user confidentiality is becoming ever more important; it is widely argued over in the media and researched in academia. Social networking sites are a powerful and fun way to communicate with the world, but the Internet is safe only for those who are aware of the risks and can take steps to protect themselves, so the best solution is to learn. Social media is a useful service because it lets you share what you actually want to share, but it can also be used for negative purposes, and in both cases you are responsible for your own security. Protective and preventative techniques are not very difficult, but you need to be careful while you are on the Internet. This paper provides a brief overview of threats to users' privacy, classified as: users' limitations, design pitfalls and limitations, implicit flows of information, and clash of incentives. It also describes the privacy and security issues associated with social network systems.

Keywords: social media, privacy, data protection



Research on the influence mechanism of privacy invasion experiences with privacy protection intentions in social media contexts: Regulatory focus as the moderator

1 School of Journalism and Communication, Xiamen University, Xiamen, China

2 Research Center for Intelligent Society and Social Governance, Interdisciplinary Research Institute, Zhejiang Lab, Hangzhou, China

Associated Data

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Introduction

In recent years, there have been numerous online privacy violation incidents caused by the leakage of personal information of social media users, yet there seems to be a tendency for users to burn out when it comes to privacy protection, which leads to more privacy invasions and forms a vicious circle. Few studies have examined the impact of social media users' privacy invasion experiences on their privacy protection intention. Protection motivation theory has often been applied to privacy protection research. However, it has been suggested that the theory could be improved by introducing individual emotional factors, and empirical research in this area is lacking.

Methods

To fill these gaps, the current study constructs a moderated chain mediation model based on protection motivation theory and regulatory focus theory, and introduces privacy fatigue as an emotional variable.

Results and discussion

An analysis of a sample of 4,800 social media users in China finds that: (1) Social media users' previous privacy invasion experiences can increase their privacy protection intention; this process is mediated by response costs and privacy fatigue. (2) Privacy fatigue plays a masking effect: increased privacy invasion experiences and response costs raise individuals' privacy fatigue, and the feeling of privacy fatigue significantly reduces individuals' willingness to protect their privacy. (3) Promotion-focused individuals are less likely to experience privacy fatigue than prevention-focused individuals. In summary, the "lie flat" tendency in social media users' privacy protection is driven by the key factor of privacy fatigue, and the psychological trait of regulatory focus can be used to interfere with the development of privacy fatigue. This study extends the scope of research on privacy protection and regulatory focus theory, refines protection motivation theory, and expands the empirical study of privacy fatigue; the findings also inform the practical governance of social network privacy.
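As a schematic illustration of how such a chained indirect (masking) effect can be estimated, here is a minimal sketch using ordinary least squares with a bootstrap confidence interval on synthetic data. The variable names, effect sizes, and estimation recipe are our assumptions for illustration; this is not the study's code or data.

```python
# Schematic sketch of the chain mediation X -> M1 -> M2 -> Y
# (privacy invasion experience -> response cost -> privacy fatigue ->
# protection intention), with a bootstrap CI for the indirect effect
# a*b*c. All data below are synthetic; this is not the study's sample.
import numpy as np

rng = np.random.default_rng(42)
n = 4800
x = rng.standard_normal(n)                          # invasion experience
m1 = 0.4 * x + rng.standard_normal(n)               # response cost
m2 = 0.3 * x + 0.5 * m1 + rng.standard_normal(n)    # privacy fatigue
y = 0.5 * x - 0.4 * m2 + rng.standard_normal(n)     # protection intention
                                                    # (fatigue suppresses intention)

def ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def chain_indirect(x, m1, m2, y):
    a = ols([x], m1)[1]                             # x  -> m1
    b = ols([x, m1], m2)[2]                         # m1 -> m2, controlling for x
    c = ols([x, m1, m2], y)[3]                      # m2 -> y, controlling for x, m1
    return a * b * c

idx = np.arange(n)
boot = np.array([chain_indirect(x[s], m1[s], m2[s], y[s])
                 for s in (rng.choice(idx, n) for _ in range(1000))])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"chained indirect effect: {chain_indirect(x, m1, m2, y):.3f}, "
      f"95% bootstrap CI [{lo:.3f}, {hi:.3f}]")     # CI excluding 0 => mediation
```

With these synthetic coefficients the indirect path a·b·c is negative (about −0.08) while the direct effect of experience on intention is positive, which is exactly the masking pattern described above.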

1. Introduction

Nowadays, people communicate and share information through social networking services (SNS), which have become an integral part of the daily lives of network users worldwide (Hsu et al., 2013 ). SNS make people's lives highly convenient; however, they also pose increasingly serious privacy issues. For instance, British media reported that the profiles of 87 million Facebook users were illegally leaked to a political consulting firm, Cambridge Analytica (Revell, 2019 ). In addition, Equifax, one of the three major US credit bureaus, reported a large-scale data leak in 2017 involving 146 million pieces of personal information (Zhou and Schaub, 2018 ). Such incidents have provoked a wave of discussion on personal privacy and information security issues.

Individuals' proactive behavior in protecting their online privacy information is an effective method for reducing the occurrence of privacy violations; scholars have therefore explored how to enhance individuals' willingness to protect privacy. In terms of applied theoretical models, the Health Belief Model (HBM) (Kisekka and Giboney, 2018 ), the Technology Threat Avoidance Theory (TTAT) (McLeod and Dolezel, 2022 ), the Technology Acceptance Model (TAM) (Baby and Kannammal, 2020 ), and the Theory of Planned Behavior (TPB) (Xu et al., 2013 ) have all been applied to the issue of online privacy protection behavior. By contrast, Protection Motivation Theory (PMT) is particularly applicable to studying privacy protection behavior in SNS because it focuses on threat assessment and coping mechanisms for privacy issues. However, a recognized limitation of existing applications of PMT is that they ignore the influence of individual emotions on protective behavior (Mousavi et al., 2020 ). Therefore, this study introduced privacy fatigue as a variable to extend PMT in the context of social media privacy protection research. Moreover, in terms of the antecedents of privacy protection, existing research suggests that factors such as perceived benefits, perceived risks (Price et al., 2005 ), privacy concerns (Youn and Kim, 2019 ), self-efficacy (Baruh et al., 2017 ), and trust (Wang et al., 2017 ) can affect individuals' privacy-protective behaviors.

Along with the increased frequency of data breaches on the Internet, people find that they have less control over their data and feel overwhelmed by having to protect their privacy alone. Moreover, the complexity of the measures required to protect personal information aggravates users' sense of futility, leading to exhaustion among online users. This phenomenon, defined as "privacy fatigue," is regarded as a factor leading to the avoidance of privacy issues. Privacy fatigue has recently become prevalent among network users, yet empirical studies of this phenomenon are still insufficient (Choi et al., 2018 ). Therefore, this study attempted to explore the role privacy fatigue plays in users' privacy protection behaviors. Previous studies discovered that the impact of varying degrees of privacy invasion on privacy protection differs across individuals and can be moderated by psychological differences (Lai and Hui, 2006 ). Clarifying the role of psychological traits is beneficial to the hierarchical governance of privacy protection. Regulatory focus is a psychological trait based on different regulatory orientations, which can effectively affect social media users' behavioral preferences and decisions on privacy protection (Cho et al., 2019 ); however, to date, the relationship between regulatory focus, privacy fatigue, and privacy protection intentions has not been sufficiently examined. For this reason, it is necessary to explore this question empirically.

Based on the PMT framework, this study built a moderated mediation model to examine the influence mechanism of privacy-invasive experiences on privacy protection intentions by introducing three factors: response costs, privacy fatigue, and regulatory focus. Data from an online survey of 4,800 network users demonstrated that, first, social media users' experiences of privacy invasion increase their willingness to protect privacy. Second, privacy fatigue has a masking effect: the more privacy-invasive experiences and response costs there are, the greater the privacy fatigue, which in turn reduces users' privacy protection intentions. Third, promotion-focused individuals are less likely to experience fatigue from having to protect personal information alone. The significance of this study lies in the fact that it addresses the underexplored link between privacy violation experiences and individuals' protective willingness.

Meanwhile, this study verified the practicality of combining PMT with emotion-related variables. Additionally, it complemented the study of privacy fatigue and expanded the scope of regulatory focus theory in privacy research. From a practical perspective, this study offered a reference for the hierarchical governance of privacy in social networks. Finally, this study reveals a vicious cycle mechanism (negative experiences → privacy fatigue → low willingness to protect → new negative experiences) and provides a theoretical reference for breaking this cycle.

2. Theoretical framework

2.1. Privacy invasion experiences, response costs, and privacy protection intentions

Protection motivation theory (PMT) is commonly used in online privacy studies (Chen et al., 2015 ). According to Rogers ( 1975 ), individuals cognitively evaluate a risk before adopting behaviors, develop protection motivation, and eventually modify their behaviors to avoid the risk. People's response assessments draw on two sources: environmental and interpersonal sources of information, and prior experience. Combing through the past literature, we found that many scholars have verified the influence of environmental (Wu et al., 2019 ) and interpersonal (Hsu et al., 2013 ) factors on individual privacy protection; however, only a few have explored the effect of privacy violation experiences on privacy protection intentions. Some studies proved that individuals' prior privacy violation experiences are an antecedent to their information privacy concerns, including in the mobile context and the online marketplace (Pavlou and Gefen, 2005 ; Belanger and Crossler, 2019 ). Privacy concerns, in turn, have been widely demonstrated to be a significant antecedent of privacy protection intentions and protective behaviors. In addition, a meta-analysis found that users who worried about privacy were less likely to use internet services and more likely to adopt privacy-protective actions (Baruh et al., 2017 ).

People make sense of the world based on their prior experiences (Floyd et al., 2000 ), and network users who have had privacy-invasive experiences tend to believe that privacy risks are closely related to themselves (Li, 2008 ). They tend to be more aware of the seriousness and vulnerability of privacy issues (Mohamed and Ahmad, 2012 ). The effect of previous negative experiences on perceived vulnerability can also be explained by the availability heuristic, which assumes that the easier it is to retrieve experienced cases from memory, the higher the perceived frequency of the event. In contrast, when fewer cases are retrieved, people may estimate that the event is less likely to occur than it objectively is. Therefore, people's accumulated experiences of negative events might influence their perception of future vulnerability to risk (Tversky and Kahneman, 1974 ). Moreover, in accordance with PMT, perceived seriousness and vulnerability affect protective behavior in the context of social media privacy issues. We can therefore assume that the more memories of privacy violations people have, the more likely they are to believe that their privacy will be violated again, thereby increasing their motivation to protect privacy, that is, their willingness to protect privacy. Accordingly, this study proposed the following hypothesis:

  • H1: Privacy invasion experiences positively affect privacy protection intentions.

PMT suggests that cognitive evaluation includes an assessment of response costs (Rogers, 1975 ), where response costs refer to any costs of the protective action, such as money, time, and effort (Floyd et al., 2000 ). According to findings from health psychology, when faced with the threat of skin cancer, people prefer to use sunscreen rather than avoid the sun (Jones and Leary, 1994 ; Wichstrom, 1994 ), presumably because of the lower response costs of using sunscreen. These findings suggest that individuals calculate the response cost before they take protective actions. Privacy protection studies also indicate that prior experiences of personal information violation may significantly increase consumers' concerns about both offline and online privacy, and that privacy concerns are related to perceived risks (Okazaki et al., 2009 ; Bansal et al., 2010 ). It has also been shown that individuals who have experienced privacy invasion perceive a greater severity of risk (Petronio, 2002 ). Since individuals' perceptions of risk affect their assessment of costs as part of the trade-off between risks and benefits, a stronger risk perception implies that higher response costs must be paid. Thus, this study assumed that people with more privacy violation experiences might perceive higher response costs and tend to take protective actions to avoid paying more. Consequently, this study made the following hypotheses:

  • H2a: A higher level of privacy-invasive experiences results in a higher perception of response costs.
  • H2b: A higher level of perception of response costs will result in higher privacy protection intentions.
  • H2c: Response cost mediates the effect of privacy-invasive experiences on privacy protection intentions.

2.2. Privacy invasion experiences, privacy fatigue, and privacy protection intentions

The medical community first introduced the concept of fatigue, referring to it as a subjective, unpleasant feeling of tiredness (Piper et al., 1987 ). The concept has since been used in many research fields, such as clinical medicine (Mao et al., 2018 ) and psychology (Ong et al., 2006 ). In recent years, scholars have also used the concept of "fatigue" in the study of social media and regarded it as an important antecedent of individual behaviors (Ravindran et al., 2014 ). Choi et al. ( 2018 ) defined "privacy fatigue" as a psychological state of fatigue caused by privacy issues. Specifically, privacy fatigue manifests itself as an unwillingness to actively manage and protect one's personal information and privacy (Hargittai and Marwick, 2016 ).

With the increasing severity of social network and personal information issues, research on privacy fatigue, especially on its antecedents and effects, has developed considerably. Regarding antecedents, scholars found that privacy concerns, self-disclosure, learning about privacy statements and information security, and the complexity of privacy protection practices can influence individuals' levels of privacy fatigue (Dhir et al., 2019 ; Oh et al., 2019 ). In terms of effects, privacy fatigue can not only cause people to reduce the frequency of their social media use or even withdraw from the Internet (Ravindran et al., 2014 ), but can also motivate individuals to resist disclosing personal information (Keith et al., 2014 ). However, only a few studies have examined privacy invasion experiences, privacy fatigue, and privacy protection intentions under one theoretical framework.

Furnell and Thomson ( 2009 ) pointed out that such fatigue is triggered by an individual's experience with privacy problems. Additionally, privacy fatigue has a boundary: when this boundary is crossed, social network users become bored with privacy management, leading them to abandon social network services. It has also been suggested that privacy data breaches can cause individuals to feel "disappointed." In a study of medical data protection, breaches of patients' medical data had a cumulative effect on patients' behavioral decisions by causing them to perceive that their requests for privacy protection were being ignored (Juhee and Eric, 2018 ). The relationship between privacy invasion experiences and privacy fatigue has been widely demonstrated: social media characteristics such as internet privacy threat experience and privacy invasion can lead to users' emotional exhaustion and privacy cynicism, which are further associated with social media privacy fatigue (Xiao and Mou, 2019 ; Sheng et al., 2022 ). In terms of outcomes, studies of the privacy paradox found that emotional exhaustion and powerlessness (the same concept as exhaustion) weaken the positive relationship between privacy concerns and the willingness to protect personal information (Tian et al., 2022 ). Based on the above review, it is reasonable to infer that an individual's privacy invasion experiences in the context of social media use can exacerbate that individual's sense of privacy fatigue. In turn, privacy fatigue may lead network users to abandon privacy protection behaviors and create opportunities for further privacy invasion. Based on the above discussion, we proposed the following hypotheses:

  • H3a: Privacy invasion experiences positively affect privacy fatigue.
  • H3b: Privacy fatigue negatively affects privacy protection intentions.
  • H3c: Privacy fatigue has a masking (a form of mediating effect) role in the effects of individual social media privacy invasion experiences on privacy protection intentions.

As discussed above, we hypothesized that both response costs and privacy fatigue mediate the effect of social media users' privacy invasion experiences on their privacy protection intentions. Given these two mediators, what is the association between response costs and privacy fatigue? It has been argued that a common shortcoming of current research applying PMT is that it ignores the role emotions play in this mechanism (Mousavi et al., 2020 ). This view is supported by Li's research, which argues that most research on privacy topics is conducted from a risk assessment perspective and tends to ignore the impact of emotions on privacy protection behaviors (Li et al., 2016 ). Emotions are believed to change an individual's attention and beliefs (Friestad and Thorson, 1985 ), both of which are related to behavioral intentions.

It has also been suggested that emotions play a mediating role in behavioral decision-making (Tanner et al., 1991 ), yet few studies have explored this mechanism to date. Zhang et al. ( 2022 ) found a positive influence of response costs on privacy fatigue. Their research, based on the Stressor-Strain-Outcome (S-S-O) framework, explored which factors (stressors) could cause privacy fatigue intentions (strain) and related behaviors (outcome), and found that time cost and several other stressors significantly and positively affect social media fatigue intention. As Floyd et al. ( 2000 ) note, "response costs" refer to any costs, time costs included. Although these results provide an important reference for the present study, time cost is only one component of response costs; this research therefore focuses on general response costs to better understand the mechanism. Based on this, we proposed the following hypotheses:

  • H4a: Privacy response costs are positively associated with privacy fatigue.
  • H4b: Response costs and privacy fatigue play chain mediating roles in the effect of privacy invasion experiences on privacy protection intentions.

2.3. Regulatory focus as the moderator

Differences in psychological traits can lead to significant differences in individuals' cognition and behaviors (Benbasat and Dexter, 1982 ), and personal psychological traits have been shown to influence individuals' perceptions of fatigue (Dhir et al., 2019 ). A recent study also found that neuroticism has positive effects on privacy fatigue, whereas traits like agreeableness and extraversion have negative effects (Tang et al., 2021 ). However, previous research on social media privacy fatigue is relatively limited. Given the critical role of privacy fatigue in research models, it is necessary to explore differences in perceived fatigue among individuals with different psychological traits. This study introduced individuals' level of regulatory focus as a moderator and examined its effect on the relationship between privacy invasion experiences and privacy fatigue. Regulatory focus is a psychological trait that has been applied to explain social media users' privacy management and privacy protection problems (Wirtz and Lwin, 2009 ; Li et al., 2019 ).

Regulatory Focus Theory (RFT) classifies individuals into two types based on psychological traits: promotion focus, which attends more to benefits and ignores potential risks, and prevention focus, which tends to avoid risks and ignore benefits when making decisions (Higgins, 1997 ). Research has demonstrated that perceived benefits tend to reduce fatigue, while perceived risks can exacerbate it (Boksem and Tops, 2008 ). By analogy, promotion-focused individuals are more inclined to notice the benefits of using social media (Jin, 2012 ) and thus may experience less fatigue and lower response costs when experiencing privacy violations; in contrast, prevention-focused individuals are more aware of the risks associated with privacy invasion and thus have more concerns about privacy issues, which can lead to greater fatigue and higher perceived response costs. Combined with H4, we can reason that the path from social media privacy invasion experiences to privacy protection intentions may be affected by an individual's level of regulatory focus: the effect of privacy invasion experiences on privacy fatigue and response costs should be stronger for prevention-focused than for promotion-focused individuals, and the mediating effects of privacy fatigue and response costs should be correspondingly stronger. In summary, this study proposed the following hypotheses:

  • H5a: Compared to promotion-focused users, the effect of privacy invasion experiences on privacy fatigue is greater for prevention-focused users.
  • H5b: Compared to promotion-focused users, the effect of privacy invasion experiences on response costs is greater for prevention-focused users.

2.4. Current study

In summary, the current study concluded that, in the social media context, users' experiences of privacy invasion would increase their perception of response costs and thus result in privacy fatigue. Privacy fatigue decreases individuals' privacy protection intentions. However, this process differed for individuals with different regulatory focuses. In detail, individuals with a promotion focus are less likely to experience privacy fatigue than individuals with a prevention focus. Based on the above logic, the conceptual model constructed in this study is shown in Figure 1 .

Figure 1. Conceptual model.

3. Materials and methods

3.1. Participants and procedures

This survey was conducted in December 2021, and Zhejiang Lab collected the data. The questionnaire was pretested with a small group of participants to ensure the questions were clearly phrased. Participants were informed of their right to withdraw and were assured of confidentiality and anonymity before participating in the survey, which was cross-sectional and could be completed on computers, tablets, or mobile phones. After giving their consent, participants completed the scales described below. After screening, 4,800 valid questionnaires were retained. Invalid questionnaires were removed mainly for failing the screening questions rather than for careless answering (e.g., giving identical answers to the questions of several consecutive variables, or repeating the same option for more than 70% of items).

To guarantee data quality and reduce possible interference from gender and geographical factors, the survey used quota sampling, as shown in Table 1, with a 1:1 gender ratio and samples from 16 cities in China (300 valid samples per city). Because privacy invasion experience is likely related to years of Internet usage, participants' Internet experience matters for this study: 34.5% of the final sample had used the Internet for 5-10 years and 57.3% for more than 10 years, which met the requirements of the study. In terms of education level, college and bachelor's degrees accounted for the largest proportion (62.0%), followed by high school/junior high school and vocational high school (27.3%). In terms of age, the ratio of those younger than 46 years old to those 46 and older was 59.7:40.3, with a balanced distribution across age groups. The basic demographic variables are tabulated in Table 1.

Table 1. Basic information on the valid samples (N = 4,800).

Variable                  Category             N       %
Gender                    Men                  2,400   50.0
                          Women                2,400   50.0
Age                       18~25                357     7.4
                          26~35                1,573   32.8
                          36~45                936     19.4
                          Over 46              1,934   40.3
Educational background    Under high school    356     7.4
                          High school          1,308   27.3
                          Undergraduate        2,975   62.0
                          Master and doctor    161     3.4
Internet life time        Less than 3 years    34      0.7
                          3~5 years            356     7.4
                          5~10 years           1,658   34.5
                          Over 10 years        2,752   57.3

3.2. Measurements

Based on the model and hypotheses of this study, the instruments included measures of privacy invasion experiences, response costs, privacy fatigue, privacy protection intentions, and regulatory focus (promotion focus and prevention focus). The questionnaire was built from pre-validated scales, all adapted to social media contexts, and all responses were graded on a Likert scale ranging from 0 (strongly disagree) to 6 (strongly agree), with higher scores indicating stronger endorsement of the measured construct. Sub-items within each scale were averaged to form composite scores.

The privacy invasion experiences scale was referenced from Su's study (Su et al., 2018 ). The scale is a 3-item self-reported scale (e.g., “My personal information, such as my phone number, shopping history, and more, is used to be shared by intelligent media with third-party platforms.”). The response cost scale was developed from the scale in the study by Yoon et al. ( 2012 ), which included three measurement questions (e.g., “When personal information security is at risk on social media, I consider that taking practical action will take too much time and effort.”). The privacy fatigue scale was derived from a related study by Choi et al. ( 2018 ), and the current study applied this 4-item scale to measure privacy fatigue on social media (e.g., “Dealing with personal information protection issues on social media makes me tired.”). The privacy protection intention scale was based on the scale developed by Liang and Xue ( 2010 ), which contains three measurement items (e.g., “When my personal information security is threatened on social media, I am willing to make efforts to protect it.”). The regulatory focus scale was derived from the original scale developed by Higgins ( 2002 ) and later adapted by Chinese scholars for use with Chinese samples (Cui et al., 2014 ). The scale contains six items on measures for promotion focus (e.g., “For what I want to do, I can do it all well”) and four items on measures for prevention focus (e.g., “While growing up, I often did things that my parents didn't agree were right”). The regulatory focus was measured by subtracting the average prevention score from the average promotion score, with higher differences indicating a greater tendency toward promotion focus and lower differences indicating a greater tendency toward prevention focus (Cui et al., 2014 ).
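As a concrete illustration of this scoring procedure, the following Python sketch computes the composite scores and the regulatory focus difference score. The item column names (pie1, rc1, and so on) are hypothetical placeholders, since the published materials do not include the actual item labels.

```python
import pandas as pd

# One row per respondent; items scored 0-6. The file name and all item
# column names are hypothetical placeholders.
df = pd.read_csv("survey.csv")

scales = {
    "PIE": ["pie1", "pie2", "pie3"],          # privacy invasion experiences (3 items)
    "RC":  ["rc1", "rc2", "rc3"],             # response costs (3 items)
    "PF":  ["pf1", "pf2", "pf3", "pf4"],      # privacy fatigue (4 items)
    "PPI": ["ppi1", "ppi2", "ppi3"],          # privacy protection intentions (3 items)
    "PRO": [f"pro{i}" for i in range(1, 7)],  # promotion focus (6 items)
    "PRE": [f"pre{i}" for i in range(1, 5)],  # prevention focus (4 items)
}

# Composite score for each construct = mean of its items.
for name, items in scales.items():
    df[name] = df[items].mean(axis=1)

# Regulatory focus = promotion mean minus prevention mean (Cui et al., 2014);
# higher values indicate a stronger promotion-focus tendency.
df["RF"] = df["PRO"] - df["PRE"]
```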

3.3. Data analysis

The validity and reliability of the questionnaire were tested using Mplus8. The PROCESS macro for SPSS was used to evaluate the moderated chain mediation model with the bootstrapping method (95% CI, 5,000 bootstrap samples). Gender (1 = men, 0 = women), age, highest degree obtained, and Internet lifetime were included as covariates in the model.
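The PROCESS macro runs inside SPSS; for readers without SPSS, the following Python sketch reproduces the core logic of a percentile-bootstrap test of the three indirect effects in a chain mediation model such as PROCESS Model 6. It continues from the scoring sketch above, and covariates are omitted for brevity, so it is an illustrative approximation rather than the authors' exact analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def chain_indirect_effects(d: pd.DataFrame) -> dict:
    """Products of path coefficients for PIE -> RC -> PF -> PPI."""
    a1 = smf.ols("RC ~ PIE", d).fit().params            # PIE -> RC
    a2 = smf.ols("PF ~ PIE + RC", d).fit().params       # PIE, RC -> PF
    b  = smf.ols("PPI ~ PIE + RC + PF", d).fit().params # full outcome model
    return {
        "PIE->RC->PPI":     a1["PIE"] * b["RC"],
        "PIE->PF->PPI":     a2["PIE"] * b["PF"],
        "PIE->RC->PF->PPI": a1["PIE"] * a2["RC"] * b["PF"],
    }

# Percentile bootstrap (5,000 resamples) for 95% CIs, mirroring PROCESS defaults.
rng = np.random.default_rng(42)
draws = [
    chain_indirect_effects(df.sample(len(df), replace=True, random_state=int(s)))
    for s in rng.integers(0, 2**31, size=5000)
]
print(pd.DataFrame(draws).quantile([0.025, 0.975]))
```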

4. Results

4.1. Measurement of the model

As shown in Table 2, for privacy invasion experiences, response costs, privacy fatigue, and privacy protection intentions, Cronbach's α and the composite reliability of the scales are higher than the acceptable value (>0.70). Although the Cronbach's α values for promotion and prevention focus were slightly below 0.70, they were above 0.60 and close to 0.70, which is considered permissible given the large sample size of this study; the measurement model therefore passed the reliability test (Hair et al., 2019 ).

Table 2. Results of the validity and reliability tests.

Construct       1        2        3        4        5        6        CR      Cronbach's α
1. PIE          0.724                                                 0.773   0.767
2. RC           0.468    0.594                                        0.862   0.862
3. PF           0.457    0.538    0.784                               0.857   0.856
4. PPI          0.106    0.075    −0.153   0.518                      0.751   0.750
5. Pro Focus    0.051    0.020    −0.093   0.451    0.420             0.683   0.693
6. Pre Focus    0.338    0.287    0.449    −0.030   −0.002   0.442    0.703   0.697

PIE, privacy invasion experiences; RC, response costs; PF, privacy fatigue; PPI, privacy protection intentions. Values on the diagonal are the square root of the AVE; off-diagonal values are correlations.

Because the measurement instruments were derived from validated scales, the average variance extracted (AVE) should ideally be higher than 0.5, but values above 0.4 can be accepted: according to Fornell and Larcker ( 1981 ), if the AVE is below 0.5 but the composite reliability is higher than 0.6, the construct's convergent validity is still acceptable. Lam ( 2012 ) further explained and confirmed this view. Discriminant validity was tested by comparing the square root of the AVE with the correlations between the research variables; the square root of the AVE was higher than the correlations, indicating good discriminant validity.
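For readers who want to replicate these reliability and validity computations outside of Mplus, a minimal Python sketch follows. The formulas are the standard ones for Cronbach's α, composite reliability (CR), and AVE; the loadings argument is assumed to come from a fitted CFA with standardized loadings.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha from a respondents-by-items data frame."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def cr_and_ave(loadings: np.ndarray) -> tuple[float, float]:
    """Composite reliability and AVE from standardized CFA loadings."""
    lam2 = loadings ** 2
    theta = 1 - lam2                      # error variances under standardization
    cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + theta.sum())
    ave = lam2.mean()
    return cr, ave

# Fornell-Larcker discriminant validity check: sqrt(AVE) of each construct
# should exceed its correlations with every other construct.
# e.g., a construct with AVE = 0.594 gives sqrt(0.594) ≈ 0.771.
```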

Then, we tested the goodness-of-fit indices. Confirmatory factor analysis (CFA) of the questionnaire produced acceptable fit values for the factor structure (RMSEA = 0.048 < 0.15, SRMR = 0.042 < 0.05, GFI = 0.955 > 0.9, CFI = 0.947 > 0.9, NFI = 0.943 > 0.9, and TLI = 0.945 > 0.9) after introducing the error covariances into the model. In summary, the current study passed the reliability and validity tests.
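The paper fit its CFA in Mplus8. As a rough open-source analog, a hedged sketch with the semopy package is shown below; the item names are again hypothetical placeholders, and semopy's fit statistics will not exactly match Mplus output.

```python
# Minimal CFA sketch with semopy (not the authors' actual Mplus analysis).
from semopy import Model, calc_stats

desc = """
PIE =~ pie1 + pie2 + pie3
RC  =~ rc1 + rc2 + rc3
PF  =~ pf1 + pf2 + pf3 + pf4
PPI =~ ppi1 + ppi2 + ppi3
"""

model = Model(desc)
model.fit(df)               # df holds the raw item responses
print(calc_stats(model).T)  # chi2, RMSEA, CFI, GFI, NFI, TLI, among others
```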

4.2. Descriptive statistics

Table 3 shows the descriptive statistics and correlation analysis results. Response costs, privacy fatigue, and privacy protection intentions were all positively correlated with privacy invasion experiences. Privacy fatigue and privacy protection intentions were both positively correlated with response costs. Privacy fatigue was negatively related to privacy protection intentions.

Table 3. Means, standard deviations, and correlations among research variables.

Variable   M       SD      1          2          3          4         5
1. PIE     3.525   1.304   1
2. RC      3.797   1.441   0.468**    1
3. PF      2.807   1.477   0.457**    0.538**    1
4. PPI     4.636   0.882   0.106**    0.075**    −0.153**   1
5. RF      1.637   1.476   −0.265**   −0.239**   −0.440**   0.271**   1

PIE, privacy invasion experiences; RC, response costs; PF, privacy fatigue; PPI, privacy protection intentions; RF, regulatory focus. ** p < 0.01.

4.3. Relationship between privacy invasion experience and privacy protection intentions

Table 4 shows the results of the multiple regression analysis. Privacy invasion experiences significantly influenced response costs (β = 0.466, SE = 0.023, t = 11.936, p = 0.000), privacy fatigue (β = 0.297, SE = 0.022, t = 13.722, p = 0.000), and privacy protection intentions (β = 0.133, SE = 0.011, t = 12.382, p = 0.000) after controlling for gender, highest degree obtained, age, and Internet lifetime. Response costs positively predicted privacy fatigue (β = 0.382, SE = 0.013, t = 29.793, p = 0.000) and privacy protection intentions (β = 0.098, SE = 0.010, t = 9.495, p = 0.000). However, privacy fatigue was significantly negatively related to privacy protection intentions (β = −0.130, SE = 0.011, t = −12.303, p = 0.000) in this model. In conclusion, H1, H2a, H2b, H3a, H3b, and H4a were supported.

Table 4. Multiple regression results of the moderated mediation model.

Outcome         Predictor    β        SE      t         p        R²      F
PPI             PIE          0.133    0.011   12.382    0.000    0.134   106.295
                PF           −0.130   0.011   −12.303   0.000
                RC           0.098    0.010   9.495     0.000
PF              PIE          0.297    0.022   13.722    0.000    0.427   446.246
                RC           0.382    0.013   29.793    0.000
                RF           −0.101   0.040   −2.510    0.0121
                PIE × RF     −0.031   0.008   −4.103    0.000
RC              PIE          0.466    0.023   11.936    0.000    0.234   209.354
                RF           −0.143   0.046   −3.138    0.0017
                PIE × RF     0.007    0.009   0.840     0.401

PIE, privacy invasion experiences; RC, response costs; PF, privacy fatigue; PPI, privacy protection intentions; RF, regulatory focus; * p < 0.05; ** p < 0.01; *** p < 0.001; β, unstandardized regression weight; SE, standard error for the unstandardized regression weight; t, t-test statistic; F, F-test statistic.

Then, we used Model 6 of PROCESS to test the mediating effects in our model. As the results in Table 5 show, H2c, H3c, and H4b were supported.

Table 5. Results of the mediating effect test.

Path                                       Effect    Boot LLCI   Boot ULCI
Indirect effect    PIE → RC → PPI          0.053     0.041       0.065
                   PIE → PF → PPI          −0.057    −0.065      −0.049
                   PIE → RC → PF → PPI     −0.042    −0.048      −0.037
Total indirect effect                      −0.047    −0.059      −0.035

PIE, privacy invasion experiences; RC, response costs; PF, privacy fatigue; PPI, privacy protection intentions; Boot LLCI/ULCI, lower/upper limits of the 95% bootstrap confidence interval.

Model 84 in the SPSS PROCESS macro was applied to carry out the bootstrapping test of the moderation effect of regulatory focus. Privacy invasion experiences, response costs, privacy fatigue, and regulatory focus were mean-centered before constructing the interaction term. The results showed that regulatory focus significantly moderated the effect of privacy invasion experiences on privacy fatigue [95% Boot CI = (0.002, 0.006)], and H5a was supported. In addition, the mediating effect was significant at a low level of regulatory focus [−1 SD; Effect = −0.038; 95% Boot CI = (−0.046, −0.030)], a medium level [Effect = −0.032; 95% Boot CI = (−0.039, −0.026)], and a high level [+1 SD; Effect = −0.026; 95% Boot CI = (−0.032, −0.020)]. Specifically, the mediating effect of privacy fatigue decreased as individuals increasingly tended to be promotion focused. However, regulatory focus did not significantly moderate the effect of privacy invasion experiences on response costs [95% Boot CI = (−0.001, 0.003)], and H5b was rejected.

Meanwhile, the privacy invasion experiences × regulatory focus interaction significantly predicted privacy fatigue (β = −0.046, SE = 0.008, t = −3.694, p = 0.000; see Figure 2 ). The influence of privacy invasion experiences on privacy fatigue was significant when the level of regulatory focus was high (β = 0.385, SE = 0.016, t = 23.981, p = 0.000), medium (β = 0.430, SE = 0.015, t = 29.415, p = 0.000), and low (β = 0.475, SE = 0.022, t = 22.061, p = 0.000). Specifically, the more individuals tended to be promotion focused (high regulatory focus scores), the less fatigue was caused by privacy invasion; the more individuals tended to be prevention focused (low regulatory focus scores), the more fatigue was caused by privacy invasion.

Figure 2. Simple slope test of the interaction between PIE and RF on PF.
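A simple-slopes probe of this kind can be reproduced with ordinary regression tooling. The sketch below, using the same hypothetical variable names as earlier, mean-centers the predictor and moderator and evaluates the slope of PIE on PF at ±1 SD of regulatory focus; it mirrors the analysis described above rather than reproducing the authors' exact PROCESS output.

```python
import statsmodels.formula.api as smf

# Mean-center predictor and moderator before forming the interaction term.
df["PIE_c"] = df["PIE"] - df["PIE"].mean()
df["RF_c"] = df["RF"] - df["RF"].mean()

# PF regressed on the interaction, with RC as an illustrative covariate.
m = smf.ols("PF ~ PIE_c * RF_c + RC", df).fit()

# Simple slope of PIE on PF at low / medium / high regulatory focus (±1 SD).
b_pie, b_int = m.params["PIE_c"], m.params["PIE_c:RF_c"]
sd = df["RF_c"].std()
for label, rf in [("low (-1 SD)", -sd), ("medium", 0.0), ("high (+1 SD)", sd)]:
    print(f"{label:>12}: slope = {b_pie + b_int * rf:.3f}")
```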

5. Discussion

The purpose of the present study was to explore the relationship among privacy invasion experiences, response costs, privacy fatigue, privacy protection intentions, and regulatory focus. This study showed that response costs and privacy fatigue play mediating roles, whereas regulatory focus plays a moderating role in this process (as shown in Figure 3 ). These findings help clarify how and under which circumstances social media users' privacy invasion experiences affect their privacy protection intentions, thereby providing a means to improve people's privacy situation on social media platforms.

Figure 3. The moderated chain mediation model. Dashed lines represent nonsignificant relations. *** p < 0.001.

5.1. A chain mediation of response costs and privacy fatigue

The current study found that social media users' privacy invasion experiences have a significant positive effect on their response costs, and that an increase in response costs in turn increases individuals' privacy protection intentions. This finding is consistent with previous health psychology literature, which found that individuals calculate response costs for different actions before making decisions: the higher the perceived response costs, the greater the possibility of strengthening protective intentions (Jones and Leary, 1994 ; Wichstrom, 1994 ). Compared with users who experienced less privacy invasion on social media, people who experienced more privacy violations perceived a higher level of response costs, which further increased their protective intention to avoid dealing with the negative outcomes that follow privacy invasion.

The study also found that social media users' privacy invasion experiences had a significant positive effect on privacy fatigue, which is consistent with prior research on social media use (Xiao and Mou, 2019 ; Sheng et al., 2022 ). At the same time, response costs also positively affected privacy fatigue, a mechanism indicated by past research on social media fatigue behaviors (Zhang et al., 2022 ). This study additionally found that response costs partially mediate the effect of privacy invasion experiences on privacy fatigue. Although both increased privacy invasion experiences and increased response costs improve social media users' privacy protection intentions, privacy fatigue can mask this process; that is, increased privacy fatigue reduces individuals' privacy protection intentions.

Moreover, this study revealed that response costs and privacy fatigue play chain-mediated roles in the effect of social media privacy invasion experiences on privacy protection intentions and further explained the mechanism. In addition, the masking effect of privacy fatigue also explains why privacy invasion experiences do not have a strong effect on privacy protection intentions. In other words, this privacy fatigue is an important reason that people currently “lie flat” (adopt passive protection) in the face of privacy-invasive issues online.

5.2. Regulatory focus as moderator

The relationship between social media privacy invasion experiences and privacy fatigue was moderated by regulatory focus. To be more specific, the more promotion focused individuals were, the less privacy fatigue they felt; the more prevention focused they were, the more privacy fatigue they felt. In other words, promotion focus has a buffering effect in this process. To some extent, this result verifies that individuals with different regulatory orientations sense different levels of fatigue because they pursue benefits or avoid risks when making decisions (Boksem and Tops, 2008 ; Jin, 2012 ). On the other hand, regulatory focus did not moderate the relationship between privacy invasion experiences and response costs. One possible explanation is that, compared with privacy fatigue, response costs to privacy violations are based on concrete experiences in users' memories: individuals who have had more privacy invasions have more experience dealing with the negative consequences of privacy violations. Thus, regardless of psychological traits, the effect of privacy-invasive experiences on response costs would be neither strengthened nor weakened.

Meanwhile, this study supported a moderated mediation model investigating the moderating role of regulatory focus in the mediation chain "privacy invasion experiences → privacy fatigue → privacy protection intentions." The results indicated that, as individuals tend toward prevention focus, privacy invasion experiences affect privacy protection intentions through the mediating role of privacy fatigue; specifically, the more they tend to be prevention focused, the stronger their privacy fatigue and the weaker their privacy protection intentions. Therefore, interventions for privacy fatigue (e.g., improving media literacy, creating a better online environment, and more) can be used to enhance social media users' privacy protection intentions (Bucher et al., 2013 ; Agozie and Kaya, 2021 ). Focusing on prevention-focused social media users is particularly crucial.

5.3. Implication

From a theoretical perspective, our study identified a mechanism influencing privacy-protective behavior based on an extension of protection motivation theory. PMT is a fear-based theory, and we treated users' experiences of social media privacy invasion as a source of fear. On this basis, we found that these experiences were associated with individuals' privacy protection intentions, and we explained the mechanism through the mediating variable of response costs, which is consistent with previous findings (Chen et al., 2016 ).

More importantly, however, in response to what previous researchers have argued is an emotional factor that traditional protection motivation theory ignores (Mousavi et al., 2020 ), our study extended traditional protection motivation theory to include privacy fatigue as a factor and verified that fatigue significantly reduces social media users' privacy protection intentions. The introduction of “privacy fatigue” can better explain why occasional privacy invasion experiences do not cause privacy-protective behaviors, which is another possible explanation for the privacy paradox in addition to the traditional privacy calculus theory. The introduction of “privacy fatigue” has also inspired researchers to pay attention to individual emotions in privacy research. This study also compared differences in privacy protection intentions among social media users of different regulatory focus types, which are mainly caused by fatigue rather than response costs. By combining privacy fatigue and regulatory focus, it was found that not all subjects felt the same level of privacy fatigue after experiencing privacy invasion. This study also expanded the application of both privacy fatigue and regulatory focus theories and built a bridge between online privacy research and regulatory focus theory.

In addition to the aforementioned implications for research and theory, the findings have some useful practical implications. First of all, they call for measures to reduce privacy invasion on social media. (a) Reducing the incidence of privacy violations at the root requires improving the current online privacy environment on social media platforms. We call on the government to strengthen the regulation of online privacy and on social media platforms to reinforce the protection of users' privacy so that users' personal information is not misused. (b) From the social media agent perspective, relevant studies mention that the content relevance perceived by online users can mitigate the negative relationship between privacy invasion and continuous use intention (Zhu and Chang, 2016 ). Social media agents should improve their efficiency in using qualified personal information, giving users a smoother experience on online platforms.

Second, the results show that privacy fatigue affects users' privacy protection intentions. (c) According to Choi et al. ( 2018 ), users have a tolerance threshold for privacy fatigue; policymakers should therefore define an acceptable baseline level of privacy protection. Other scholars have suggested that online service providers should avoid excessively or unnecessarily collecting personal information and strictly forbid sharing or selling users' personal information to any third party without their permission (Tang et al., 2021 ). (d) Another effective approach is to reduce response costs, i.e., the costs of protecting one's privacy. For example, social media platforms can optimize privacy interfaces and management tools or provide more effective feedback mechanisms for users. (e) In addition, improving users' privacy literacy (especially for prevention-focused individuals) can also be effective in reducing privacy fatigue (Bucher et al., 2013 ).

Finally, different measures should be applied to users with different regulatory focuses. (f) Social media managers could classify users into groups based on their psychological characteristics and manage them in accordance with their required level of privacy protection, giving social media users a wider range of choices. Specifically, because prevention-focused individuals tend to feel more privacy fatigue after privacy-invasive experiences, additional privacy protection features should be provided for them. For example, social media platforms could offer specific explanations of privacy protection technologies to increase prevention-focused individuals' trust in those technologies.

5.4. Limitations and future directions

There are still some limitations to this article. First, this study selected only response costs as the cognitive appraisal component, whereas threat appraisal is also part of the cognitive process in protection motivation theory, covering the potential outcomes of risky behaviors, including perceived vulnerability, perceived severity of the risk, and rewards associated with risky behavior (Prentice-Dunn et al., 2009 ). Future studies could systematically consider the association between these factors and privacy protection intentions. Second, users' perceptions of privacy invasion differ across social media platforms (e.g., Instagram and Facebook), and this study only applies to a generalized social media context. Future research could pay more attention to the differences among users on different social media platforms (with different functions). Finally, this study did not focus on specific privacy invasion experiences, although studies have pointed out that different types of privacy invasion affect people differently. Moreover, people with different demographic backgrounds, such as cultural backgrounds and gender, react differently when faced with the same situation (Klein and Helweg-Larsen, 2002 ). Future research can investigate this in more depth through experiments.

6. Conclusion

In conclusion, our findings suggest that social media privacy invasion experiences increase individuals' privacy protection intentions by increasing their response costs, but the increase in privacy fatigue masks this effect. Privacy fatigue is a barrier to increasing social media users' willingness to protect their privacy, which explains why users do not seem to show a stronger willingness to protect their privacy even as privacy invasion becomes a growing problem in social networks. Our study also revealed the different levels of fatigue that individuals with different levels of regulatory focus exhibit when faced with the same level of privacy invasion experience. In particular, prevention-focused social media users are more likely to become fatigued; social media agents should pay special attention to these individuals because they may be particularly vulnerable to privacy violations. Furthermore, the current research on privacy fatigue has yet to be expanded, and future researchers can add to it.

Our theoretical analysis and empirical results further emphasize the distinction between individuals, a differentiation that allows researchers to align their analyses with theoretical hypotheses more tightly. This applies not only to research on the effects of privacy invasion experiences on privacy behavior but also to exploring other privacy topics. Therefore, we recommend that future privacy research be more human-oriented, which will also benefit the current “hierarchical governance” of the Internet privacy issue.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

This study was approved by the Academic Committee of the School of Journalism and Communication at Xiamen University, and we carefully verified that we complied strictly with the ethical guidelines.

Author contributions

CG is responsible for the overall research design, thesis writing, collation of the questionnaire, and data analysis. SC and ML are responsible for the guidance. JW is responsible for the proofreading and article touch-up. All authors contributed to the article and approved the submitted version.

Acknowledgments

The authors thank all the participants of this study. The participants were all informed about the purpose and content of the study and voluntarily agreed to participate. The participants were able to stop participating at any time without penalty.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.1031592/full#supplementary-material

  • Agozie D. Q., Kaya T. (2021). Discerning the effect of privacy information transparency on privacy fatigue in e-government. Govern. Inf. Q. 38, 101601. 10.1016/j.giq.2021.101601
  • Baby A., Kannammal A. (2020). Network Path Analysis for developing an enhanced TAM model: a user-centric e-learning perspective. Comput. Hum. Behav. 107, 24. 10.1016/j.chb.2019.07.024
  • Bansal G., Zahedi F. M., Gefen D. (2010). The impact of personal dispositions on information sensitivity, privacy concern and trust in disclosing health information online. Decision Support Syst. 49, 138–150. 10.1016/j.dss.2010.01.010
  • Baruh L., Secinti E., Cemalcilar Z. (2017). Online privacy concerns and privacy management: a meta-analytical review. J. Commun. 67, 26–53. 10.1111/jcom.12276
  • Belanger F., Crossler R. E. (2019). Dealing with digital traces: understanding protective behaviors on mobile devices. J. Strat. Inf. Syst. 28, 34–49. 10.1016/j.jsis.2018.11.002
  • Benbasat I., Dexter A. S. (1982). Individual differences in the use of decision support aids. J. Account. Res. 20, 1–11. 10.2307/2490759
  • Boksem M. A. S., Tops M. (2008). Mental fatigue: costs and benefits. Brain Res. Rev. 59, 125–139. 10.1016/j.brainresrev.2008.07.001
  • Bucher E., Fieseler C., Suphan A. (2013). The stress potential of social media in the workplace. Inf. Commun. Soc. 16, 1639–1667. 10.1080/1369118X.2012.710245
  • Chen H., Beaudoin C. E., Hong T. (2015). Teen online information disclosure: empirical testing of a protection motivation and social capital model. J. Assoc. Inf. Sci. Technol. 67, 2871–2881. 10.1002/asi.23567
  • Chen H., Beaudoin C. E., Hong T. (2016). Protecting oneself online: the effects of negative privacy experiences on privacy protective behaviors. J. Mass Commun. Q. 93, 409–429. 10.1177/1077699016640224
  • Cho H., Roh S., Park B. (2019). Of promoting networking and protecting privacy: effects of defaults and regulatory focus on social media users' preference settings. Comput. Hum. Behav. 101, 1–13. 10.1016/j.chb.2019.07.001
  • Choi H., Park J., Jung Y. (2018). The role of privacy fatigue in online privacy behavior. Comput. Hum. Behav. 81, 42–51. 10.1016/j.chb.2017.12.001
  • Cui Q., Yin C. Y., Lu H. L. (2014). The reaction of consumers to others' assessments under different social distance. Chin. J. Manage. 11, 1396–1402.
  • Dhir A., Kaur P., Chen S., Pallesen S. (2019). Antecedents and consequences of social media fatigue. Int. J. Inf. Manage. 8, 193–202. 10.1016/j.ijinfomgt.2019.05.021
  • Floyd D. L., Prentice-Dunn S., Rogers R. W. (2000). A meta-analysis of research on protection motivation theory. J. Appl. Soc. Psychol. 30, 407–429. 10.1111/j.1559-1816.2000.tb02323.x
  • Fornell C., Larcker D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. J. Market. Res. 18, 39–50. 10.1177/002224378101800104
  • Friestad M., Thorson E. (1985). The Role of Emotion in Memory for Television Commercials. Washington, DC: Educational Resources Information Center.
  • Furnell S., Thomson K. L. (2009). Recognizing and addressing "security fatigue". Comput. Fraud Secur. 11, 7–11. 10.1016/S1361-3723(09)70139-3
  • Hair J. F., Ringle C. M., Gudergan S. P. (2019). Partial least squares structural equation modeling-based discrete choice modeling: an illustration in modeling retailer choice. Bus. Res. 12, 115–142. 10.1007/s40685-018-0072-4
  • Hargittai E., Marwick A. (2016). "What can I really do?" Explaining the privacy paradox with online apathy. Int. J. Commun. 10, 21.
  • Higgins E. T. (1997). Beyond pleasure and pain. Am. Psychol. 52, 1280–1300. 10.1037/0003-066X.52.12.1280
  • Higgins E. T. (2002). How self-regulation creates distinct values: the case of promotion and prevention decision making. J. Consum. Psychol. 12, 177–191. 10.1207/S15327663JCP1203_01
  • Hsu C. L., Park S. J., Park H. W. (2013). Political discourse among key Twitter users: the case of Sejong city in South Korea. J. Contemp. Eastern Asia 12, 65–79. 10.17477/jcea.2013.12.1.065
  • Jin S. A. A. (2012). To disclose or not to disclose, that is the question: a structural equation modeling approach to communication privacy management in e-health. Comput. Hum. Behav. 28, 69–77. 10.1016/j.chb.2011.08.012
  • Jones J. L., Leary M. R. (1994). Effects of appearance-based admonitions against sun exposure on tanning intentions in young adults. Health Psychol. 13, 86–90. 10.1037/0278-6133.13.1.86
  • Juhee K., Eric J. (2018). The market effect of healthcare security: do patients care about data breaches? Available online at: https://www.econinfosec.org/archive/weis2015/papers/WEIS_2015_kwon.pdf (accessed October 30, 2018).
  • Keith M. J., Maynes C., Lowry P. B., Babb J. (2014). "Privacy fatigue: the effect of privacy control complexity on consumer electronic information disclosure," in International Conference on Information Systems (ICIS 2014), Auckland, 14–17.
  • Kisekka V., Giboney J. S. (2018). The effectiveness of health care information technologies: evaluation of trust, security beliefs, and privacy as determinants of health care outcomes. J. Med. Int. Res. 20, 9014. 10.2196/jmir.9014
  • Klein C. T., Helweg-Larsen M. (2002). Perceived control and the optimistic bias: a meta-analytic review. Psychol. Health 17, 437–446. 10.1080/0887044022000004920
  • Lai Y. L., Hui K. L. (2006). "Internet opt-in and opt-out: investigating the roles of frames, defaults and privacy concerns," in Proceedings of the 2006 ACM SIGMIS CPR Conference on Computer Personnel Research. New York, NY: ACM, 253–263.
  • Lam L. W. (2012). Impact of competitiveness on salespeople's commitment and performance. J. Bus. Res. 65, 1328–1334. 10.1016/j.jbusres.2011.10.026
  • Li H., Wu J., Gao Y., Shi Y. (2016). Examining individuals' adoption of healthcare wearable devices: an empirical study from privacy calculus perspective. Int. J. Med. Inf. 88, 8–17. 10.1016/j.ijmedinf.2015.12.010
  • Li P., Cho H., Goh Z. H. (2019). Unpacking the process of privacy management and self-disclosure from the perspectives of regulatory focus and privacy calculus. Telematic. Inf. 41, 114–125. 10.1016/j.tele.2019.04.006
  • Li X. (2008). Third-person effect, optimistic bias, and sufficiency resource in Internet use. J. Commun. 58, 568–587. 10.1111/j.1460-2466.2008.00400.x
  • Liang H., Xue Y. L. (2010). Understanding security behaviors in personal computer usage: a threat avoidance perspective. J. Assoc. Inf. Syst. 11, 394–413. 10.17705/1jais.00232
  • Mao H., Bao T., Shen X., Li Q., Seluzicki C., Im E. O., et al. (2018). Prevalence and risk factors for fatigue among breast cancer survivors on aromatase inhibitors. Eur. J. Cancer 101, 47–54. 10.1016/j.ejca.2018.06.009
  • McLeod A., Dolezel D. (2022). Information security policy non-compliance: can capitulation theory explain user behaviors? Comput. Secur. 112, 102526. 10.1016/j.cose.2021.102526
  • Mohamed N., Ahmad I. H. (2012). Information privacy concerns, antecedents and privacy measure use in social networking sites: evidence from Malaysia. Comput. Hum. Behav. 28, 2366–2375. 10.1016/j.chb.2012.07.008
  • Mousavi R., Chen R., Kim D. J., Chen K. (2020). Effectiveness of privacy assurance mechanisms in users' privacy protection on social networking sites from the perspective of protection motivation theory. Decision Supp. Syst. 135, 113323. 10.1016/j.dss.2020.113323
  • Oh J., Lee U., Lee K. (2019). Privacy fatigue in the internet of things (IoT) environment. INPRA 6, 21–34.
  • Okazaki S., Li H., Hirose M. (2009). Consumer privacy concerns and preference for degree of regulatory control. J. Adv. 38, 63–77. 10.2753/JOA0091-3367380405
  • Ong A. D., Bergeman C. S., Bisconti T. L., Wallace K. A. (2006). Psychological resilience, positive emotions, and successful adaptation to stress in later life. J. Pers. Soc. Psychol. 91, 730. 10.1037/0022-3514.91.4.730
  • Pavlou P. A., Gefen D. (2005). Psychological contract violation in online marketplaces: antecedents, consequences, and moderating role. Inf. Syst. Res. 16, 372–399. 10.1287/isre.1050.0065
  • Petronio S. (2002). Boundaries of Privacy: Dialectics of Disclosure. Albany, NY: State University of New York Press.
  • Piper B. F., Lindsey A. M., Dodd M. J. (1987). Fatigue mechanisms in cancer patients: developing nursing theory. Oncol. Nurs. Forum 14, 17.
  • Prentice-Dunn S., Mcmath B. F., Cramer R. J. (2009). Protection motivation theory and stages of change in sun protective behavior. J. Health Psychol. 14, 297–305. 10.1177/1359105308100214
  • Price B. A., Adam K., Nuseibeh B. (2005). Keeping ubiquitous computing to yourself: a practical model for user control of privacy. Int. J. Hum. Comput. Stu. 63, 228–253. 10.1016/j.ijhcs.2005.04.008
  • Ravindran T., Yeow Kuan A. C., Hoe Lian D. G. (2014). Antecedents and effects of social network fatigue. J. Assoc. Inf. Sci. Technol. 65, 2306–2320. 10.1002/asi.23122
  • Revell T. (2019). Facebook must come clean and hand over election campaign data. New Scientist. Available online at: https://www.newscientist.com/article/mg24332472-300-face-book-must-come-clean-and-hand-over-election-campaign-data/ (accessed September 11, 2019).
  • Rogers R. W. (1975). A protection motivation theory of fear appeals and attitude change. J. Psychol. 91, 93–114. 10.1080/00223980.1975.9915803
  • Sheng N., Yang C., Han L., Jou M. (2022). Too much overload and concerns: antecedents of social media fatigue and the mediating role of emotional exhaustion. Comput. Hum. Behav. 139, 107500. 10.1016/j.chb.2022.107500
  • Su P., Wang L., Yan J. (2018). How users' internet experience affects the adoption of mobile payment: a mediation model. Technol. Anal. Strat. Manage. 30, 186–197. 10.1080/09537325.2017.1297788
  • Tang J., Akram U., Shi W. (2021). Why people need privacy? The role of privacy fatigue in app users' intention to disclose privacy: based on personality traits. J. Ent. Inf. Manage. 34, 1097–1120. 10.1108/JEIM-03-2020-0088
  • Tanner J. F., Hunt J. B., Eppright D. R. (1991). The protection motivation model: a normative model of fear appeals. J. Market. 55, 36–45. 10.1177/002224299105500304
  • Tian X., Chen L., Zhang X. (2022). The role of privacy fatigue in privacy paradox: a PSM and heterogeneity analysis. Appl. Sci. 12, 9702. 10.3390/app12199702
  • Tversky A., Kahneman D. (1974). Judgement under uncertainty: heuristics and biases. Science 185, 1124–1131. 10.1126/science.185.4157.1124
  • Wang L., Yan J., Lin J., Cui W. (2017). Let the users tell the truth: self-disclosure intention and self-disclosure honesty in mobile social networking. Int. J. Inf. Manage. 37, 1428–1440. 10.1016/j.ijinfomgt.2016.10.006
  • Wichstrom L. (1994). Predictors of Norwegian adolescents' sunbathing and use of sunscreen. Health Psychol. 13, 412–420. 10.1037/0278-6133.13.5.412
  • Wirtz J., Lwin M. O. (2009). Regulatory focus theory, trust, and privacy concern. J. Serv. Res. 12, 190–207. 10.1177/1094670509335772
  • Wu Z., Xie J., Lian X., Pan J. (2019). A privacy protection approach for XML-based archives management in a cloud environment. Electr. Lib. 37, 970–983. 10.1108/EL-05-2019-0127
  • Xiao L., Mou J. (2019). Social media fatigue – technological antecedents and the moderating roles of personality traits: the case of WeChat. Comput. Hum. Behav. 101, 297–310. 10.1016/j.chb.2019.08.001
  • Xu F., Michael K., Chen X. (2013). Factors affecting privacy disclosure on social network sites: an integrated model. Electr. Comm. Res. 13, 151–168. 10.1007/s10660-013-9111-6
  • Yoon C., Hwang J. W., Kim R. (2012). Exploring factors that influence students' behaviors in information security. J. Inf. Syst. Educ. 23, 407–415.
  • Youn S., Kim S. (2019). Newsfeed native advertising on Facebook: young millennials' knowledge, pet peeves, reactance and ad avoidance. Int. J. Adv. 38, 651–683. 10.1080/02650487.2019.1575109
  • Zhang Y., He W., Peng L. (2022). How perceived pressure affects users' social media fatigue behavior: a case on WeChat. J. Comput. Inf. Syst. 62, 337–348. 10.1080/08874417.2020.1824596
  • Zhou Y., Schaub F. (2018). “Concern but no action: consumers, reactions to the equifax data breach,” in Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems , Montreal, QC , 22–26. [ Google Scholar ]
  • Zhu Y.Q., Chang J. H. (2016). The key role of relevance in personalized advertisement: examining its impact on perceptions of privacy invasion, self-awareness, and continuous use intentions . Comput. Hum. Behav . 65 , 442–447. 10.1016/j.chb.2016.08.048 [ CrossRef ] [ Google Scholar ]


Americans’ complicated feelings about social media in an era of privacy concerns


Amid public concerns over Cambridge Analytica’s use of Facebook data and a subsequent movement to encourage users to abandon Facebook, there is a renewed focus on how social media companies collect personal information and make it available to marketers.

Pew Research Center has studied the spread and impact of social media since 2005, when just 5% of American adults used the platforms. The trends tracked by our data tell a complex story that is full of conflicting pressures. On one hand, the rapid growth of the platforms is testimony to their appeal to online Americans. On the other, this widespread use has been accompanied by rising user concerns about privacy and social media firms’ capacity to protect their data.

All this adds up to a mixed picture about how Americans feel about social media. Here are some of the dynamics.

People like and use social media for several reasons


About seven-in-ten American adults (69%) now report they use some kind of social media platform (not including YouTube) – a nearly fourteenfold increase since Pew Research Center first started asking about the phenomenon. The growth has come across all demographic groups and includes 37% of those ages 65 and older.

The Center’s polls have found over the years that people use social media for important social interactions like staying in touch with friends and family and reconnecting with old acquaintances. Teenagers are especially likely to report that social media are important to their friendships and, at times, their romantic relationships.

Beyond that, we have documented how social media play a role in the way people participate in civic and political activities, launch and sustain protests, get and share health information, gather scientific information, engage in family matters, perform job-related activities and get news. Indeed, social media is now just as common a pathway to news for people as going directly to a news organization website or app.

Our research has not established a causal relationship between people’s use of social media and their well-being. But in a 2011 report, we noted modest associations between people’s social media use and higher levels of trust, larger numbers of close friends, greater amounts of social support and higher levels of civic participation.

People worry about privacy and the use of their personal information

While there is evidence that social media works in some important ways for people, Pew Research Center studies have shown that people are anxious about all the personal information that is collected and shared and the security of their data.

Overall, a 2014 survey found that 91% of Americans “agree” or “strongly agree” that people have lost control over how personal information is collected and used by all kinds of entities. Some 80% of social media users said they were concerned about advertisers and businesses accessing the data they share on social media platforms, and 64% said the government should do more to regulate advertisers.


Another survey last year found that just 9% of social media users were “very confident” that social media companies would protect their data. About half of users were not at all or not too confident their data were in safe hands.

Moreover, people struggle to understand the nature and scope of the data collected about them. Just 9% believe they have “a lot of control” over the information that is collected about them, even as the vast majority (74%) say it is very important to them to be in control of who can get information about them.

Six-in-ten Americans (61%) have said they would like to do more to protect their privacy. Additionally, two-thirds have said current laws are not good enough in protecting people’s privacy, and 64% support more regulation of advertisers.

Some privacy advocates hope that the European Union’s General Data Protection Regulation, which goes into effect on May 25, will give users – even Americans – greater protections about what data tech firms can collect, how the data can be used, and how consumers can be given more opportunities to see what is happening with their information.

People’s issues with the social media experience go beyond privacy

In addition to the concerns about privacy and social media platforms uncovered in our surveys, related research shows that just 5% of social media users trust the information that comes to them via the platforms “a lot.”


Moreover, social media users can be turned off by what happens on social media. For instance, social media sites are frequently cited as places where people are harassed. Near the end of the 2016 election campaign, 37% of social media users said they were worn out by the political content they encountered, and large shares said social media interactions with those opposed to their views were stressful and frustrating. Large shares also said that social media interactions related to politics were less respectful, less conclusive, less civil and less informative than offline interactions.

A considerable number of social media users said they simply ignored political arguments when they broke out in their feeds. Others went steps further by blocking or unfriending those who offended or bugged them.

Why do people leave or stay on social media platforms?

The paradox is that people use social media platforms even as they express great concern about the privacy implications of doing so – and the social woes they encounter. The Center’s most recent survey about social media found that 59% of users said it would not be difficult to give up these sites, yet the share saying these sites would be hard to give up grew 12 percentage points from early 2014.

Some of the answers about why people stay on social media could tie to our findings about how people adjust their behavior on the sites and online, depending on personal and political circumstances. For instance, in a 2012 report we found that 61% of Facebook users said they had taken a break from using the platform. Among the reasons people cited were that they were too busy to use the platform, they lost interest, they thought it was a waste of time and that it was filled with too much drama, gossip or conflict.

In other words, participation on the sites for many people is not an all-or-nothing proposition.

People pursue strategies to try to avoid problems on social media and the internet overall. Fully 86% of internet users said in 2012 they had taken steps to try to be anonymous online. “Hiding from advertisers” was relatively high on the list of those they wanted to avoid.

Many social media users fine-tune their behavior to try to make things less challenging or unsettling on the sites, including changing their privacy settings and restricting access to their profiles. Still, 48% of social media users reported in a 2012 survey they have difficulty managing their privacy controls.

After National Security Agency contractor Edward Snowden disclosed details about government surveillance programs starting in 2013, 30% of adults said they took steps to hide or shield their information and 22% reported they had changed their online behavior in order to minimize detection.

One other argument that some experts make in Pew Research Center canvassings about the future is that people often find it hard to disconnect because so much of modern life takes place on social media. These experts believe that unplugging is hard because social media and other technology affordances make life convenient and because the platforms offer a very efficient, compelling way for users to stay connected to the people and organizations that matter to them.

Note: See the topline results for overall social media user data (PDF).


Lee Rainie is director of internet and technology research at Pew Research Center.


Social media needs science-based guidelines

Nature Reviews Psychology 3, 367 (2024). Published 12 June 2024. https://doi.org/10.1038/s44159-024-00327-8

The debate about the negative impact of social media use is heated. Psychology research must avoid the noise and remain focused on improving adolescent mental health.

Social media is a part of adolescents’ daily reality, and as such it necessarily influences their behaviour and development. There is heated academic debate about the consequences of this influence and how to deal with them. However, academics are not the only ones interested in this debate. Some of the most profitable private companies in the world, along with citizens and governments, are concerned about the outcome of this discussion.

The debate has financial implications for social media companies, which are constantly working to find new users and increase the time and money that users spend on their platforms. Through this lens, young people are an attractive market that can potentially increase revenue for products that already generate billions of dollars annually. Revenue objectives are linked to technological advances (for example, apps specifically designed to nudge users to engage with the platforms) 1, leading social media technology to develop fast. Within this logic of constant technological advances and ever-growing profits, limiting the use of social media to respond to potential safety concerns is simply not part of the business model. Even if it were, the speed of advancements means that by the time psychological science reaches conclusions about the impact of a specific platform, social media technology has already changed.

In a Comment published in this issue, Montag et al. argue that governments must step up to ensure the safety of adolescents in the digital age. Policy makers should design legislation to proactively prevent potential harm. The authors suggest that such legislative measures should be informed by well-funded and structured research conducted by independent academics from multidisciplinary backgrounds. In particular, Montag et al. propose that research input is needed on the appropriateness of entry age barriers for social media platforms and that researchers should be granted access to scrutinize the design features and digital infrastructure that companies use to engage users. Research insights about safety concerns could enable the development of healthier social media platforms.

In the absence of a clear evidence base, guidelines and recommendations that play on parents’ fears are put forward. Writing in Nature 2, Candice Odgers notes that such fear-based parental guidelines are profitable for the publishers because they create alarm and are therefore shared widely. However, those guidelines might promote policies that are untested and might therefore be ineffective or create new problems. Moreover, focusing only on social media reduces the complexity of one of the most challenging problems in modern society: the rise in the prevalence of mental health conditions and suicide among adolescents over the past decade 3, 4.

Because this rise in the prevalence of mental health conditions in adolescents coincides with the increase in digital technologies and social media use, some speculate that these trends are related 5. However, the onset and development of mental health disorders cannot be reduced to a single cause. In a Review in this issue, Orben et al. side-step the question of whether social media use is or is not associated with adolescent mental health, and instead consider how social media acts as an amplifier of the socioemotional and neurobiological changes that increase mental health vulnerability during adolescence. Indeed, most mental health conditions have an onset before the age of 25.

Drawing on the concept of ‘affordances’ (the perceived and flexible action possibilities of digital environments, such as broad visibility of content and the persistence of online content over time), Orben et al. consider both potentially harmful effects of social media (such as promotion of risky behaviour) and potentially beneficial effects (such as exploring self-identity without ‘real-world’ consequences). They argue that focusing on social media features and considering the mechanisms by which these features interact with developmental changes provides a productive framework for researchers to study the effects of technology despite an ever-changing social media landscape. Ultimately, such work could lead to specific science-based interventions to protect and promote adolescent mental health.

“We join the call to redirect the focus of the conversation […] towards the important problem of how to improve the mental health of the world’s youth”

Psychologists are increasingly asked to contribute — and sometimes lead — many of the most relevant conversations of our times. The rise of mental health problems in adolescents is one of these conversations, and how the community of psychological scientists responds to this challenge will shape the future of a whole generation. We join the call to redirect the focus of the conversation away from the ‘moral panic’ of new technology and towards the important problem of how to improve the mental health of the world’s youth.

References

1. Flayelle, M. et al. A taxonomy of technology design features that promote potentially addictive online behaviours. Nat. Rev. Psychol. 2, 136–150 (2023).

2. Odgers, C. L. The great rewiring: is social media really behind an epidemic of teenage mental illness? Nature https://doi.org/10.1038/d41586-024-00902-2 (2024).

3. Plana‐Ripoll, O. et al. Temporal changes in sex‐ and age‐specific incidence profiles of mental disorders — a nationwide study from 1970 to 2016. Acta Psychiatr. Scand. 145, 604–614 (2022).

4. Mojtabai, R. & Olfson, M. National trends in mental health care for US adolescents. JAMA Psychiatry 77, 703 (2020).

5. Valkenburg, P. M., Meier, A. & Beyens, I. Social media use and its impact on adolescent mental health: an umbrella review of the evidence. Curr. Opin. Psychol. https://doi.org/10.1016/j.copsyc.2021.08.017 (2022).


Potential risks of content, features, and functions: The science of how social media affects youth


Almost a year after APA issued its health advisory on social media use in adolescence, society continues to wrestle with ways to maximize the benefits of these platforms while protecting youth from the potential harms associated with them. 1

By early 2024, few meaningful changes to social media platforms had been enacted by industry, and no federal policies had been adopted. There remains a need for social media companies to make fundamental changes to their platforms.

Psychological science continues to reveal benefits from social media use, as well as risks and opportunities that certain content, features, and functions present to young social media users. The science discussed below highlights the need to enact new, responsible safety standards to mitigate harm. 2


Elaboration of science on social media content, features, and functions

Platforms built for adults are not inherently suitable for youth. i Youth require special protection due to areas of competence or vulnerability as they progress through the childhood, teenage, and late adolescent years. ii This is especially true for youth experiencing psychological, physical, intellectual, mental health, or other developmental challenges; chronological age is not directly associated with social media readiness. iii

Hypersensitivity to social feedback

Brain development starting at ages 10–13 (i.e., the outset of puberty) until approximately the mid-twenties is linked with hypersensitivity to social feedback/stimuli. iv In other words, youth become especially invested in behaviors that will help them get personalized feedback, praise, or attention from peers.

  • AI-recommended content has the potential to be especially influential and hard to resist within this age range. v It is critical that AI-recommended content be designed to prioritize youth safety and welfare over engagement. This suggests potentially restricting the use of personalized recommendations using youth data, design features that may prioritize content evoking extreme emotions, or content that may depict illegal or harmful behavior.
  • Likes and follower counts activate neural regions that trigger repetitive behavior, and thus may exert greater influence on youths’ attitudes and behavior than among adults. vi Youth are especially sensitive to both positive social feedback and rejection from others. Using these metrics to maintain platform engagement capitalizes on youths’ vulnerabilities and likely leads to problematic use.
  • The use of youth data for tailored ad content similarly is influential for youth who are biologically predisposed toward peer influence at this stage and sensitive to personalized content. vii


Need for relationship skill building

Adolescence is a critical period for the development of more complex relationship skills, characterized by the ability to form emotionally intimate relationships. viii The adolescent years should provide opportunities to practice these skills through one-on-one or small group interactions.

  • The emphasis on metrics of followers, likes, and views focuses adolescents’ attention on unilateral, depersonalized interactions and may discourage them from building healthier and psychologically beneficial relationship skills. ix

Susceptibility to harmful content

Adolescence is a period of heightened susceptibility to peer influence, impressionability, and sensitivity to social rejection. x Harmful content, including cyberhate, the depiction of illegal behavior, and encouragement to engage in self-harm (e.g., cutting or eating-disordered behavior) is associated with increased mental health difficulties among both the targets and witnesses of such content. xi

  • The absence of clear and transparent processes for addressing reports of harmful content makes it harder for youth to feel protected or able to get help in the face of harmful content.

Underdeveloped impulse control

Youths’ developing cortical system (particularly in the brain’s inhibitory control network) makes them less capable of resisting impulses or stopping themselves from behavior that may lead to temporary benefit despite negative longer-term consequences. xii This can lead to adolescents making decisions based on short-term gain, lower appreciation of long-term risks, and interference with focus on tasks that require concentration.

  • Infinite scroll is particularly risky for youth since their ability to monitor and stop engagement on social media is more limited than among adults. xiii This contributes to youths’ difficulty disengaging from social media and may contribute to high rates of youth reporting symptoms of clinical dependency on social media. xiv
  • The lack of time limits on social media use similarly is challenging for youth, particularly during the school day or at times when they should be doing homework. xv
  • Push notifications capitalize on youths’ sensitivity to distraction. Task-shifting is a higher-order cognitive ability not fully developed until early adulthood, so notifications may interfere with youths’ focus during class time and when they should be doing homework. xvi
  • The use and retention of youths’ data without appropriate parental consent, and/or child assent in developmentally appropriate language, capitalizes on youths’ relatively poor appreciation for long-term consequences of their actions, permanence of online content, or their ability to weigh the risks of their engagement on social media. xvii

Reliance on sleep for healthy brain development

Other than the first year of life, puberty is the most important period of brain growth and reorganization in our lifetimes. xviii Sleep is essential for healthy brain development and mental health in adolescence. xix Sleep delay or disruptions have significant negative effects on youths’ attention, behavior, mood, safety, and academic performance.

  • A lack of limits on the time of day when youth can use social media has been cited as the predominant reason why adolescents are getting less than the recommended amount of sleep, with significant implications for brain and mental health. xx


Vulnerability to malicious actors

Youth are easily deceived by predators and other malicious actors who may attempt to interact with them on social media channels. xxi

  • Connection and direct messaging with adult strangers places youth at risk of identity theft and potentially dangerous interactions, including sexploitation.

Need for parental/caregiver partnership

Research indicates that youth benefit from parental support to guide them toward safe decisions and to help them understand and appropriately respond to complex social interactions. xxii Granting parents oversight of youths’ accounts should be offered in balance with adolescents’ needs for autonomy, privacy, and independence. However, it should be easier for parents to partner with youth online in a manner that fits their family’s needs.

  • The absence of transparent and easy-to-use parental/caregiver tools increases parents’ or guardians’ difficulty in supporting youths’ experience on social media. xxiii


A path forward based on science

Change is needed soon. Solutions should reflect a greater understanding of the science in at least three ways.

First, youth vary considerably in how they use social media. Some uses may promote healthy development and others may create harm. As noted in the APA health advisory, using social media is not inherently beneficial or harmful to young people. The effects of social media depend not only on what teens can do and see online, but also on teens’ pre-existing strengths or vulnerabilities and the contexts in which they grow up.

Second, science has highlighted biological and psychological abilities/vulnerabilities that interact with the content, functions, and features built into social media platforms, and it is these aspects of youths’ social media experience that must be addressed to attenuate risks. xxiv Social media use, functionality, and permissions/consenting should be tailored to youths’ developmental capabilities. Design features created for adults may not be appropriate for children.

Third, youth are adept at working around age restrictions. Substantial data reveal a remarkable number of children aged 12 years and younger routinely using social media, indicating that current policies and practices to restrict use to older youth are not working. xxv

Policies will not protect youth unless technology companies are required to reduce the risks embedded within the platforms themselves.

As policymakers at every level assess their approach to this complex issue, it is important to note the limitations of frequently proposed policies, which are often misreported and fall far short of comprehensive safety solutions that will achieve meaningful change.

Restricting downloads

Restricting application downloads at the device level does not fully restrict youths’ access and will not meaningfully improve the safety of social media platforms. Allowing platforms to delegate responsibility to app stores does not address the vulnerabilities and harms built into the platforms.


Requiring age restrictions

Focusing only on age restrictions does not improve the platforms or address the biological and psychological vulnerabilities that persist past age 18. While age restriction proposals could offer some benefits if effectively and equitably implemented, they do not represent comprehensive improvements to social media platforms, for at least four reasons:

  • Creating a bright-line age limit ignores individual differences in adolescents’ maturity and competency.
  • These proposals fail to mitigate the harms for those above the age limit and can lead to a perception that social media is safe for adolescents above the threshold age, though neurological changes continue until age 25.
  • Completely limiting access to social media may disadvantage those who are experiencing psychological benefits from social media platforms, such as community support and access to science-based resources, which particularly impact those in marginalized populations.
  • The process of age verification requires more thoughtful consideration to ensure that the storage of official identification documents does not systematically exclude subsets of youth, create risks for leaks, or circumvent the ability of young people to maintain anonymity on social platforms.

Use of parental controls

Granting parents and caregivers greater access to their children’s social media accounts will not address risks embedded within platforms themselves. More robust and easy-to-use parental controls would help some younger age groups, but as a sole strategy, this approach ignores the complexities of adolescent development, the importance of childhood autonomy and privacy, and disparities in time or resources available for monitoring across communities. xxvi


Some parents might be technologically ill-equipped, lack the time or documentation to complete requirements, or simply be unavailable to complete these requirements. Disenfranchising some young people from these platforms creates inequities. xxvii



1 These recommendations enact policies and resolutions approved by the APA Council of Representatives including the APA Resolution on Child and Adolescent Mental and Behavioral Health and the APA Resolution on Dismantling Systemic Racism in contexts including social media. These are not professional practice guidelines but are intended to provide information based on psychological science.

2 This report seeks to elaborate on extant psychological science findings, which may be particularly relevant in the creation of policy solutions that protect young people, and to inform the development of social media safety standards.

Recommendations from APA’s health advisory on social media use in adolescence

  • Youth using social media should be encouraged to use functions that create opportunities for social support, online companionship, and emotional intimacy that can promote healthy socialization.
  • Social media use, functionality, and permissions/consenting should be tailored to youths’ developmental capabilities; designs created for adults may not be appropriate for children.
  • In early adolescence (i.e., typically 10–14 years), adult monitoring (i.e., ongoing review, discussion, and coaching around social media content) is advised for most youths’ social media use; autonomy may increase gradually as kids age and if they gain digital literacy skills. However, monitoring should be balanced with youths’ appropriate needs for privacy.
  • To reduce the risks of psychological harm, adolescents’ exposure to content on social media that depicts illegal or psychologically maladaptive behavior, including content that instructs or encourages youth to engage in health-risk behaviors such as self-harm (e.g., cutting, suicide), harm to others, or eating-disordered behavior (e.g., restrictive eating, purging, excessive exercise), should be minimized, reported, and removed; moreover, technology should not drive users to this content.
  • To minimize psychological harm, adolescents’ exposure to “cyberhate” including online discrimination, prejudice, hate, or cyberbullying especially directed toward a marginalized group (e.g., racial, ethnic, gender, sexual, religious, ability status), or toward an individual because of their identity or allyship with a marginalized group should be minimized.
  • Adolescents should be routinely screened for signs of “problematic social media use” that can impair their ability to engage in daily roles and routines, and may present risk for more serious psychological harms over time.
  • The use of social media should be limited so as to not interfere with adolescents’ sleep and physical activity.
  • Adolescents should limit use of social media for social comparison, particularly around beauty- or appearance-related content.
  • Adolescents’ social media use should be preceded by training in social media literacy to ensure that users have developed psychologically-informed competencies and skills that will maximize the chances for balanced, safe, and meaningful social media use.
  • Substantial resources should be provided for continued scientific examination of the positive and negative effects of social media on adolescent development.

Acknowledgments

We wish to acknowledge the outstanding contributions to this report made by the following individuals:

Expert advisory panel

Mary Ann McCabe, PhD, ABPP, member-at-large, Board of Directors, American Psychological Association; associate clinical professor of pediatrics, The George Washington University School of Medicine and Health Sciences

Mitchell J. Prinstein, PhD, ABPP, chief science officer, American Psychological Association; John Van Seters Distinguished Professor of Psychology and Neuroscience, University of North Carolina at Chapel Hill

Mary K. Alvord, PhD, founder, Alvord, Baker & Associates; board president, Resilience Across Borders; adjunct associate professor of psychiatry and behavioral sciences, The George Washington University School of Medicine and Health Sciences

Dawn T. Bounds, PhD, PMHNP-BC, FAAN, assistant professor, Sue & Bill Gross School of Nursing, University of California, Irvine

Linda Charmaraman, PhD, senior research scientist, Wellesley Centers for Women, Wellesley College

Sophia Choukas-Bradley, PhD, assistant professor, Department of Psychology, University of Pittsburgh

Dorothy L. Espelage, PhD, William C. Friday Distinguished Professor of Education, University of North Carolina at Chapel Hill

Joshua A. Goodman, PhD, assistant professor, Department of Psychology, Southern Oregon University

Jessica L. Hamilton, PhD, assistant professor, Department of Psychology, Rutgers University

Brendesha M. Tynes, PhD, Dean’s Professor of Educational Equity, University of Southern California

L. Monique Ward, PhD, professor, Department of Psychology (Developmental), University of Michigan

Lucía Magis-Weinberg, MD, PhD, assistant professor, Department of Psychology, University of Washington

We also wish to acknowledge the contributions to this report made by Katherine B. McGuire, chief advocacy officer, and Corbin Evans, JD, senior director of congressional and federal relations, American Psychological Association.

Selected references

i Maza, M. T., Fox, K. A., Kwon, S. J., Flannery, J. E., Lindquist, K. A., Prinstein, M. J., & Telzer, E. H. (2023). Association of habitual checking behaviors on social media with longitudinal functional brain development. JAMA Pediatrics , 177 (2), 160–167; Prinstein, M. J., Nesi, J., & Telzer, E. H. (2020). Commentary: An updated agenda for the study of digital media use and adolescent development—Future directions following Odgers & Jensen (2020). Journal of Child Psychology and Psychiatry , 61 (3), 349–352. https://doi.org/10.1111/jcpp.13219

ii Nesi, J., Choukas-Bradley, S., & Prinstein, M. J. (2018). Transformation of adolescent peer relations in the social media context: Part 1—A theoretical framework and application to dyadic peer relationships. Clinical Child and Family Psychology Review , 21 (3), 267–294. https://doi.org/10.1007/s10567-018-0261-x

iii Valkenburg, P. M., & Peter, J. (2013). The differential susceptibility to media effects model. Journal of Communication , 63 (2), 221–243. https://doi.org/10.1111/jcom.12024

iv Fareri, D. S., Martin, L. N., & Delgado, M. R. (2008). Reward-related processing in the human brain: Developmental considerations. Development and Psychopathology , 20 (4), 1191–1211; Somerville, L. H., & Casey, B. J. (2010). Developmental neurobiology of cognitive control and motivational systems. Current Opinion in Neurobiology , 20 (2), 236–241. https://doi.org/10.1016/j.conb.2010.01.006

v Shin, D. (2020). How do users interact with algorithm recommender systems? The interaction of users, algorithms, and performance. Computers in Human Behavior , 109 , 106344. https://doi.org/10.1016/j.chb.2020.106344

vi Sherman, L. E., Payton, A. A., Hernandez, L. M., Greenfield, P. M., & Dapretto, M. (2016). The power of the Like in adolescence: Effects of peer influence on neural and behavioral responses to social media. Psychological Science , 27 (7), 1027–1035. https://doi.org/10.1177/0956797616645673

vii Albert, D., Chein, J., & Steinberg, L. (2013). The teenage brain: Peer influences on adolescent decision making. Current Directions in Psychological Science , 22 (2), 114–120. https://doi.org/10.1177/0963721412471347

viii Armstrong-Carter, E., & Telzer, E. H. (2021). Advancing measurement and research on youths’ prosocial behavior in the digital age. Child Development Perspectives , 15 (1), 31–36. https://doi.org/10.1111/cdep.12396 ; Newcomb, A. F., & Bagwell, C. L. (1995). Children’s friendship relations: A meta-analytic review. Psychological Bulletin , 117 (2), 306.

ix Nesi, J., & Prinstein, M. J. (2019). In search of likes: Longitudinal associations between adolescents’ digital status seeking and health-risk behaviors. Journal of Clinical Child & Adolescent Psychology , 48 (5), 740–748. https://doi.org/10.1080/15374416.2018.1437733 ; Rotondi, V., Stanca, L., & Tomasuolo, M. (2017). Connecting alone: Smartphone use, quality of social interactions and well-being. Journal of Economic Psychology , 63 , 17–26. https://doi.org/10.1016/j.joep.2017.09.001

x Sherman, L. E., Payton, A. A., Hernandez, L. M., Greenfield, P. M., & Dapretto, M. (2016). The power of the Like in adolescence: Effects of peer influence on neural and behavioral responses to social media. Psychological Science , 27 (7), 1027–1035. https://doi.org/10.1177/0956797616645673

xi Susi, K., Glover-Ford, F., Stewart, A., Knowles Bevis, R., & Hawton, K. (2023). Research review: Viewing self-harm images on the internet and social media platforms: Systematic review of the impact and associated psychological mechanisms. Journal of Child Psychology and Psychiatry , 64 (8), 1115–1139.

xii Hartley, C. A., & Somerville, L. H. (2015). The neuroscience of adolescent decision-making. Current Opinion in Behavioral Sciences , 5 , 108–115. https://doi.org/10.1016/j.cobeha.2015.09.004

xiii Atherton, O. E., Lawson, K. M., & Robins, R. W. (2020). The development of effortful control from late childhood to young adulthood. Journal of Personality and Social Psychology , 119 (2), 417–456. https://doi.org/10.1037/pspp0000283

xiv Boer, M., Stevens, G. W., Finkenauer, C., & Van den Eijnden, R. J. (2022). The course of problematic social media use in young adolescents: A latent class growth analysis. Child Development , 93 (2), e168–e187.

xv Hall, A. C. G., Lineweaver, T. T., Hogan, E. E., & O’Brien, S. W. (2020). On or off task: The negative influence of laptops on neighboring students’ learning depends on how they are used. Computers & Education , 153 , 103901. https://doi.org/10.1016/j.compedu.2020.103901 ; Sana, F., Weston, T., & Cepeda, N. J. (2013). Laptop multitasking hinders classroom learning for both users and nearby peers. Computers & Education , 62 , 24–31. https://doi.org/10.1016/j.compedu.2012.10.003

xvi von Bastian, C. C., & Druey, M. D. (2017). Shifting between mental sets: An individual differences approach to commonalities and differences of task switching components. Journal of Experimental Psychology: General , 146 (9), 1266–1285. https://doi.org/10.1037/xge0000333

xvii Andrews, J. C., Walker, K. L., & Kees, J. (2020). Children and online privacy protection: Empowerment from cognitive defense strategies. Journal of Public Policy & Marketing , 39 (2), 205–219. https://doi.org/10.1177/0743915619883638 ; Romer D. (2010). Adolescent risk taking, impulsivity, and brain development: Implications for prevention. Developmental Psychobiology , 52 (3), 263–276. https://doi.org/10.1002/dev.20442

xviii Orben, A., Przybylski, A. K., Blakemore, S.-J., Kievit, R. A. (2022). Windows of developmental sensitivity to social media. Nature Communications , 13 (1649). https://doi.org/10.1038/s41467-022-29296-3

xix Paruthi, S., Brooks, L. J., D’Ambrosio, C., Hall, W. A., Kotagal, S., Lloyd, R. M., Malow, B. A., Maski, K., Nichols, C., Quan, S. F., Rosen, C. L., Troester, M. M., & Wise, M. S. (2016). Recommended amount of sleep for pediatric populations: A consensus statement of the American Academy of Sleep Medicine. Journal of Clinical Sleep Medicine , 12 (6), 785–786. https://doi.org/10.5664/jcsm.5866

xx Perrault, A. A., Bayer, L., Peuvrier, M., Afyouni, A., Ghisletta, P., Brockmann, C., Spiridon, M., Hulo Vesely, S., Haller, D. M., Pichon, S., Perrig, S., Schwartz, S., & Sterpenich, V. (2019). Reducing the use of screen electronic devices in the evening is associated with improved sleep and daytime vigilance in adolescents. Sleep , 42 (9), zsz125. https://doi.org/10.1093/sleep/zsz125 ; Telzer, E. H., Goldenberg, D., Fuligni, A. J., Lieberman, M. D., & Gálvan, A. (2015). Sleep variability in adolescence is associated with altered brain development. Developmental Cognitive Neuroscience , 14, 16–22. https://doi.org/10.1016/j.dcn.2015.05.007

xxi Livingstone, S., & Smith, P. K. (2014). Annual research review: Harms experienced by child users of online and mobile technologies: The nature, prevalence and management of sexual and aggressive risks in the digital age. Journal of Child Psychology and Psychiatry , 55 (6), 635–654. https://doi.org/10.1111/jcpp.12197 ; Wolak, J., Finkelhor, D., Mitchell, K. J., & Ybarra, M. L. (2008). Online “predators” and their victims: Myths, realities, and implications for prevention and treatment. American Psychologist , 63 (2), 111–128. https://doi.org/10.1037/0003-066X.63.2.111

xxii Wachs, S., Costello, M., Wright, M. F., Flora, K., Daskalou, V., Maziridou, E., Kwon, Y., Na, E.-Y., Sittichai, R., Biswal, R., Singh, R., Almendros, C., Gámez-Guadix, M., Görzig, A., & Hong, J. S. (2021). “DNT LET ’EM H8 U!”: Applying the routine activity framework to understand cyberhate victimization among adolescents across eight countries. Computers & Education , 160 , Article 104026. https://doi.org/10.1016/j.compedu.2020.104026 ; Padilla-Walker, L. M., Stockdale, L. A., & McLean, R. D. (2020). Associations between parental media monitoring, media use, and internalizing symptoms during adolescence. Psychology of Popular Media , 9 (4), 481. https://doi.org/10.1037/ppm0000256

xxiii Dietvorst, E., Hiemstra, M., Hillegers, M. H. J., & Keijsers, L. (2018). Adolescent perceptions of parental privacy invasion and adolescent secrecy: An illustration of Simpson’s paradox. Child Development , 89 (6), 2081–2090. https://doi.org/10.1111/cdev.13002 ; Auxier, B. (2020, July 28). Parenting children in the age of screens. Pew Research Center. https://www.pewresearch.org/internet/2020/07/28/parenting-children-in-the-age-of-screens/

xxiv National Academies of Sciences, Engineering, and Medicine. (2024). Social media and adolescent health . The National Academies Press. https://doi.org/10.17226/27396

xxv Charmaraman, L., Lynch, A. D., Richer, A. M., & Zhai, E. (2022). Examining early adolescent positive and negative social technology behaviors and well-being during the COVID-19 pandemic. Technology, Mind, and Behavior , 3 (1), Feb 17 2022. https://doi.org/10.1037/tmb0000062

xxvi Dietvorst, E., Hiemstra, M., Hillegers, M.H.J., & Keijsers, L. (2018). Adolescent perceptions of parental privacy invasion and adolescent secrecy: An illustration of Simpson’s paradox. Child Development , 89 (6), 2081–2090. https://doi.org/10.1111/cdev.13002

xxvii Charmaraman, L., Lynch, A. D., Richer, A. M., & Zhai, E. (2022). Examining early adolescent positive and negative social technology behaviors and well-being during the COVID-19 pandemic. Technology, Mind, and Behavior , 3 (1), Feb 17 2022. https://doi.org/10.1037/tmb0000062

Enhancing Financial Risk Prediction Through Echo State Networks and Differential Evolutionary Algorithms in the Digital Era

Published: 13 June 2024

Huan Xu 1 and Li Yang 2 (ORCID: 0009-0009-5859-5223)

In the ever-evolving landscape of financial investment, the digital era has ushered in a new paradigm characterized by technological innovation and sustainability considerations. This research paper delves into the intersection of technology, sustainability, and financial risk prediction. With the rise of digital finance and automated investment mechanisms, including blockchain technology and social media-driven market sentiment analysis, discerning investors now focus on sustainability through environmental, social, and corporate governance (ESG) criteria. However, navigating this landscape is not without challenges, such as cybersecurity risks and privacy concerns. The paper addresses these issues by proposing a financial risk prediction model that leverages echo state networks (ESN) and differential evolutionary algorithms. By quantifying various risk indicators through data transformation and employing machine learning techniques, the model enhances the accuracy and robustness of risk identification. The research introduces an optimization methodology for multiple swarm differential planning algorithms, optimizing ESN networks for risk identification within financial investment data. Experimental results validate the efficacy of the proposed method, achieving accuracy levels near 90%. This study contributes valuable insights for the future of intelligent finance by demonstrating the superiority of the MPDE-ESN model in risk recognition. Future research directions include expanding the model’s generalization performance, addressing diverse financial risks, and integrating reinforcement learning for dynamic risk determination. Additionally, optimizing feature dimensions and identifying optimal features remain key areas of investigation in this digital age of financial innovation and sustainability.
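The abstract pairs two ingredients: an echo state network, in which only a linear readout is trained on top of a fixed random reservoir, and a differential evolutionary search over the reservoir’s hyperparameters. The Python sketch below illustrates that pairing in miniature. It is not the paper’s MPDE-ESN: the multi-population scheme, the real risk indicators, and the data transformations are not specified here, so a simplified single-population DE, a ridge-regression readout, and a synthetic input series stand in, and all names and parameter ranges are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def make_esn(n_in, n_res, spectral_radius, seed=1):
    # Fixed random input and reservoir weights; the reservoir matrix is
    # rescaled so its largest eigenvalue magnitude equals spectral_radius.
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, (n_res, n_in))
    W = r.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(u, W_in, W, leak):
    # Drive the reservoir with input sequence u (T x n_in); collect states.
    x = np.zeros(W.shape[0])
    X = np.zeros((len(u), W.shape[0]))
    for t in range(len(u)):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u[t] + W @ x)
        X[t] = x
    return X

def esn_val_error(params, u, y, n_train):
    # Only the linear readout is trained (ridge regression on reservoir
    # states); the error on the held-out tail serves as the DE fitness.
    sr, leak = params
    W_in, W = make_esn(u.shape[1], 80, sr)
    X = run_reservoir(u, W_in, W, leak)
    A, b = X[:n_train], y[:n_train]
    W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(A.shape[1]), A.T @ b)
    return np.mean((X[n_train:] @ W_out - y[n_train:]) ** 2)

def differential_evolution(f, bounds, pop=10, gens=15, F=0.7, CR=0.9):
    # Simplified single-population DE/rand/1/bin over box-constrained bounds.
    lo, hi = np.array(bounds).T
    P = rng.uniform(lo, hi, (pop, len(bounds)))
    cost = np.array([f(p) for p in P])
    for _ in range(gens):
        for i in range(pop):
            idx = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            a, b, c = P[idx]
            trial = np.where(rng.random(len(bounds)) < CR,
                             np.clip(a + F * (b - c), lo, hi), P[i])
            tc = f(trial)
            if tc < cost[i]:  # greedy selection keeps the better vector
                P[i], cost[i] = trial, tc
    return P[np.argmin(cost)]

# Synthetic stand-in for a transformed risk-indicator series (the paper's
# actual indicators and transformations are not given here).
t = np.arange(600)
series = np.sin(0.05 * t) + 0.1 * rng.standard_normal(len(t))
u, y = series[:-1, None], series[1:]  # one-step-ahead prediction task

best = differential_evolution(lambda p: esn_val_error(p, u, y, 400),
                              bounds=[(0.1, 1.4), (0.1, 1.0)])
print("best (spectral radius, leak rate):", best)
print("validation MSE:", esn_val_error(best, u, y, 400))

The division of labor is the point of the design: the reservoir weights stay fixed and only the linear readout is fit in closed form, so the evolutionary search only has to tune the few scalars (here, spectral radius and leak rate) that govern the reservoir’s memory, which keeps each fitness evaluation cheap.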


Data Availability

The data in the article can be accessed on demand.



The state of AI in 2023: Generative AI’s breakout year


The latest annual McKinsey Global Survey on the current state of AI confirms the explosive growth of generative AI (gen AI) tools. Less than a year after many of these tools debuted, one-third of our survey respondents say their organizations are using gen AI regularly in at least one business function. Amid recent advances, AI has risen from a topic relegated to tech employees to a focus of company leaders: nearly one-quarter of surveyed C-suite executives say they are personally using gen AI tools for work, and more than one-quarter of respondents from companies using AI say gen AI is already on their boards’ agendas. What’s more, 40 percent of respondents say their organizations will increase their investment in AI overall because of advances in gen AI. The findings show that these are still early days for managing gen AI–related risks, with less than half of respondents saying their organizations are mitigating even the risk they consider most relevant: inaccuracy.

The organizations that have already embedded AI capabilities have been the first to explore gen AI’s potential, and those seeing the most value from more traditional AI capabilities—a group we call AI high performers—are already outpacing others in their adoption of gen AI tools. (We define AI high performers as organizations that, according to respondents, attribute at least 20 percent of their EBIT to AI adoption.)
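Concretely, that cutoff is a simple threshold test on the EBIT share a respondent attributes to AI. A minimal sketch in Python, using made-up respondent figures (only the 20 percent cutoff comes from the definition above; nothing here is survey data):

```python
# Illustrative sketch of the "AI high performer" cutoff described above.
# The respondent figures are hypothetical placeholders, not survey data.
HIGH_PERFORMER_CUTOFF = 0.20  # share of EBIT attributed to AI adoption

def is_high_performer(ebit_share_from_ai: float) -> bool:
    """Flag organizations attributing at least 20% of EBIT to AI use."""
    return ebit_share_from_ai >= HIGH_PERFORMER_CUTOFF

ebit_shares = [0.05, 0.22, 0.31, 0.08]  # four hypothetical respondents
share = sum(is_high_performer(s) for s in ebit_shares) / len(ebit_shares)
print(f"{share:.0%} of these hypothetical respondents qualify")  # -> 50%
```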

The expected business disruption from gen AI is significant, and respondents predict meaningful changes to their workforces. They anticipate workforce cuts in certain areas and large reskilling efforts to address shifting talent needs. Yet while the use of gen AI might spur the adoption of other AI tools, we see few meaningful increases in organizations’ adoption of these technologies. The percent of organizations adopting any AI tools has held steady since 2022, and adoption remains concentrated within a small number of business functions.

Table of Contents

  • It’s early days still, but use of gen AI is already widespread
  • Leading companies are already ahead with gen AI
  • AI-related talent needs shift, and AI’s workforce effects are expected to be substantial
  • With all eyes on gen AI, AI adoption and impact remain steady

  • About the research

1. It’s early days still, but use of gen AI is already widespread

The findings from the survey—which was in the field in mid-April 2023—show that, despite gen AI’s nascent public availability, experimentation with the tools is already relatively common, and respondents expect the new capabilities to transform their industries. Gen AI has captured interest across the business population: individuals across regions, industries, and seniority levels are using gen AI for work and outside of work. Seventy-nine percent of all respondents say they’ve had at least some exposure to gen AI, either for work or outside of work, and 22 percent say they are regularly using it in their own work. While reported use is quite similar across seniority levels, it is highest among respondents working in the technology sector and those in North America.

Organizations, too, are now commonly using gen AI. One-third of all respondents say their organizations are already regularly using generative AI in at least one function—meaning that 60 percent of organizations with reported AI adoption are using gen AI. What’s more, 40 percent of those reporting AI adoption at their organizations say their companies expect to invest more in AI overall thanks to generative AI, and 28 percent say generative AI use is already on their board’s agenda. The most commonly reported business functions using these newer tools are the same as those in which AI use is most common overall: marketing and sales, product and service development, and service operations, such as customer care and back-office support. This suggests that organizations are pursuing these new tools where the most value is. In our previous research, these three areas, along with software engineering, showed the potential to deliver about 75 percent of the total annual value from generative AI use cases.

In these early days, expectations for gen AI’s impact are high: three-quarters of all respondents expect gen AI to cause significant or disruptive change in the nature of their industry’s competition in the next three years. Survey respondents working in the technology and financial-services industries are the most likely to expect disruptive change from gen AI. Our previous research shows that, while all industries are indeed likely to see some degree of disruption, the level of impact is likely to vary (“The economic potential of generative AI: The next productivity frontier,” McKinsey, June 14, 2023). Industries relying most heavily on knowledge work are likely to see more disruption—and potentially reap more value. While our estimates suggest that tech companies, unsurprisingly, are poised to see the highest impact from gen AI—adding value equivalent to as much as 9 percent of global industry revenue—knowledge-based industries such as banking (up to 5 percent), pharmaceuticals and medical products (also up to 5 percent), and education (up to 4 percent) could experience significant effects as well. By contrast, manufacturing-based industries, such as aerospace, automotive, and advanced electronics, could experience less disruptive effects. This stands in contrast to the impact of previous technology waves that affected manufacturing the most and is due to gen AI’s strengths in language-based activities, as opposed to those requiring physical labor.

Responses show many organizations not yet addressing potential risks from gen AI

According to the survey, few companies seem fully prepared for the widespread use of gen AI—or the business risks these tools may bring. Just 21 percent of respondents reporting AI adoption say their organizations have established policies governing employees’ use of gen AI technologies in their work. And when we asked specifically about the risks of adopting gen AI, few respondents say their companies are mitigating the most commonly cited risk with gen AI: inaccuracy. Respondents cite inaccuracy more frequently than both cybersecurity and regulatory compliance, which were the most common risks from AI overall in previous surveys. Just 32 percent say they’re mitigating inaccuracy, a smaller percentage than the 38 percent who say they mitigate cybersecurity risks. Interestingly, this figure is significantly lower than the percentage of respondents who reported mitigating AI-related cybersecurity risks last year (51 percent). Overall, much as we’ve seen in previous years, most respondents say their organizations are not addressing AI-related risks.

2. Leading companies are already ahead with gen AI

The survey results show that AI high performers—that is, organizations where respondents say at least 20 percent of EBIT in 2022 was attributable to AI use—are going all in on artificial intelligence, both with gen AI and more traditional AI capabilities. These organizations that achieve significant value from AI are already using gen AI in more business functions than other organizations do, especially in product and service development and risk and supply chain management. When looking at all AI capabilities—including more traditional machine learning capabilities, robotic process automation, and chatbots—AI high performers also are much more likely than others to use AI in product and service development, for uses such as product-development-cycle optimization, adding new features to existing products, and creating new AI-based products. These organizations also are using AI more often than other organizations in risk modeling and for uses within HR such as performance management and organization design and workforce deployment optimization.

AI high performers are much more likely than others to use AI in product and service development.

Another difference from their peers: high performers’ gen AI efforts are less oriented toward cost reduction, which is a top priority at other organizations. Respondents from AI high performers are twice as likely as others to say their organizations’ top objective for gen AI is to create entirely new businesses or sources of revenue—and they’re most likely to cite the increase in the value of existing offerings through new AI-based features.

As we’ve seen in previous years, these high-performing organizations invest much more than others in AI: respondents from AI high performers are more than five times more likely than others to say they spend more than 20 percent of their digital budgets on AI. They also use AI capabilities more broadly throughout the organization. Respondents from high performers are much more likely than others to say that their organizations have adopted AI in four or more business functions and that they have embedded a higher number of AI capabilities. For example, respondents from high performers more often report embedding knowledge graphs in at least one product or business function process, in addition to gen AI and related natural-language capabilities.

While AI high performers are not immune to the challenges of capturing value from AI, the results suggest that the difficulties they face reflect their relative AI maturity, while others struggle with the more foundational, strategic elements of AI adoption. Respondents at AI high performers most often point to models and tools, such as monitoring model performance in production and retraining models as needed over time, as their top challenge. By comparison, other respondents cite strategy issues, such as setting a clearly defined AI vision that is linked with business value or finding sufficient resources.

The findings offer further evidence that even high performers haven’t mastered best practices regarding AI adoption, such as machine-learning-operations (MLOps) approaches, though they are much more likely than others to do so. For example, just 35 percent of respondents at AI high performers report that, where possible, their organizations assemble existing components rather than reinvent them, but that’s a much larger share than the 19 percent of respondents from other organizations who report that practice.

Many specialized MLOps technologies and practices may be needed to adopt some of the more transformative use cases that gen AI applications can deliver—and do so as safely as possible. Live-model operations is one such area, where monitoring systems and setting up instant alerts to enable rapid issue resolution can keep gen AI systems in check. High performers stand out in this respect but have room to grow: one-quarter of respondents from these organizations say their entire system is monitored and equipped with instant alerts, compared with just 12 percent of other respondents.
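As a concrete illustration of the kind of live-model check described above, here is a minimal sketch; the drift metric, threshold, and alert hook are illustrative assumptions, not a specific MLOps product or the survey’s methodology:

```python
# Minimal illustrative sketch of a live-model monitoring check with an
# instant alert. The metric, threshold, and alert hook are hypothetical.
from statistics import mean

DRIFT_THRESHOLD = 0.15  # assumed acceptable gap between live and baseline accuracy

def check_model_health(baseline_accuracy: float, recent_outcomes: list[bool]) -> None:
    """Compare rolling live accuracy against the offline baseline; alert on drift."""
    live_accuracy = mean(recent_outcomes)  # fraction of recent predictions that were correct
    if baseline_accuracy - live_accuracy > DRIFT_THRESHOLD:
        send_alert(f"Model drift: live accuracy {live_accuracy:.2f} "
                   f"vs baseline {baseline_accuracy:.2f}")

def send_alert(message: str) -> None:
    # Placeholder: in practice this would page an on-call engineer or post to a channel.
    print("ALERT:", message)

check_model_health(0.92, [True, True, False, False, True, False, False, True])
```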

3. AI-related talent needs shift, and AI’s workforce effects are expected to be substantial

Our latest survey results show changes in the roles that organizations are filling to support their AI ambitions. In the past year, organizations using AI most often hired data engineers, machine learning engineers, and AI data scientists—all roles that respondents commonly reported hiring in the previous survey. But a much smaller share of respondents report hiring AI-related software engineers—the most-hired role last year—than in the previous survey (28 percent in the latest survey, down from 39 percent). Roles in prompt engineering have recently emerged, as the need for that skill set rises alongside gen AI adoption, with 7 percent of respondents whose organizations have adopted AI reporting those hires in the past year.

The findings suggest that hiring for AI-related roles remains a challenge but has become somewhat easier over the past year, which could reflect the spate of layoffs at technology companies from late 2022 through the first half of 2023. Smaller shares of respondents than in the previous survey report difficulty hiring for roles such as AI data scientists, data engineers, and data-visualization specialists, though responses suggest that hiring machine learning engineers and AI product owners remains as much of a challenge as in the previous year.

Looking ahead to the next three years, respondents predict that the adoption of AI will reshape many roles in the workforce. Generally, they expect more employees to be reskilled than to be separated. Nearly four in ten respondents reporting AI adoption expect more than 20 percent of their companies’ workforces will be reskilled, whereas 8 percent of respondents say the size of their workforces will decrease by more than 20 percent.

Looking specifically at gen AI’s predicted impact, service operations is the only function in which most respondents expect to see a decrease in workforce size at their organizations. This finding generally aligns with what our recent research suggests: while the emergence of gen AI increased our estimate of the percentage of worker activities that could be automated (60 to 70 percent, up from 50 percent), this doesn’t necessarily translate into the automation of an entire role.

AI high performers are expected to conduct much higher levels of reskilling than other companies are. Respondents at these organizations are over three times more likely than others to say their organizations will reskill more than 30 percent of their workforces over the next three years as a result of AI adoption.

4. With all eyes on gen AI, AI adoption and impact remain steady

While the use of gen AI tools is spreading rapidly, the survey data doesn’t show that these newer tools are propelling organizations’ overall AI adoption. The share of organizations that have adopted AI overall remains steady, at least for the moment, with 55 percent of respondents reporting that their organizations have adopted AI. Less than a third of respondents continue to say that their organizations have adopted AI in more than one business function, suggesting that AI use remains limited in scope. Product and service development and service operations continue to be the two business functions in which respondents most often report AI adoption, as was true in the previous four surveys. And overall, just 23 percent of respondents say at least 5 percent of their organizations’ EBIT last year was attributable to their use of AI—essentially flat with the previous survey—suggesting there is much more room to capture value.

Organizations continue to see returns in the business areas in which they are using AI, and they plan to increase investment in the years ahead. We see a majority of respondents reporting AI-related revenue increases within each business function using AI. And looking ahead, more than two-thirds expect their organizations to increase their AI investment over the next three years.

About the research

The online survey was in the field April 11 to 21, 2023, and garnered responses from 1,684 participants representing the full range of regions, industries, company sizes, functional specialties, and tenures. Of those respondents, 913 said their organizations had adopted AI in at least one function and were asked questions about their organizations’ AI use. To adjust for differences in response rates, the data are weighted by the contribution of each respondent’s nation to global GDP.
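The GDP weighting described above is a standard post-stratification step: a respondent’s weight is the ratio of their country’s share of global GDP to that country’s share of respondents, so weighted tallies reflect economic footprint rather than raw response counts. A minimal sketch with made-up figures (the countries, shares, and data layout are illustrative assumptions, not the survey’s actual data):

```python
# Illustrative GDP-share weighting of survey responses.
# All figures below are hypothetical placeholders, not McKinsey data.
gdp_share = {"US": 0.25, "China": 0.18, "Germany": 0.04}         # share of global GDP
respondent_share = {"US": 0.40, "China": 0.10, "Germany": 0.05}  # share of responses

# Weight each country so weighted totals track GDP contribution,
# compensating for over- or under-represented respondent pools.
weights = {c: gdp_share[c] / respondent_share[c] for c in gdp_share}

def weighted_mean(responses):
    """responses: iterable of (country, value) pairs; returns the weighted mean."""
    responses = list(responses)
    total_weight = sum(weights[c] for c, _ in responses)
    return sum(weights[c] * v for c, v in responses) / total_weight

# Example: a yes/no survey question coded as 1/0.
print(weighted_mean([("US", 1), ("China", 0), ("Germany", 1)]))
```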

The survey content and analysis were developed by Michael Chui, a partner at the McKinsey Global Institute and a partner in McKinsey’s Bay Area office, where Lareina Yee is a senior partner; Bryce Hall, an associate partner in the Washington, DC, office; and senior partners Alex Singla and Alexander Sukharevsky, global leaders of QuantumBlack, AI by McKinsey, based in the Chicago and London offices, respectively.

They wish to thank Shivani Gupta, Abhisek Jena, Begum Ortaoglu, Barr Seitz, and Li Zhang for their contributions to this work.

This article was edited by Heather Hanselman, an editor in the Atlanta office.


Related articles

  • The economic potential of generative AI: The next productivity frontier
  • What is generative AI?
  • Exploring opportunities in the generative AI value chain

