EPA dictionary data

actdata makes available a number of what are known as EPA dictionaries. These dictionaries provide the measured evaluation, potency, and activity (EPA) values associated with terms.

In addition to measured (or, in one case, estimated) EPA values, the package provides metadata about the data collection and about each term. These metadata are provided as the following additional variables (a short example of how to view them follows the list):

  • dataset: An identifier unique to a particular study (e.g. nc1978; morocco2015). These keys are used within this package’s functions to access particular datasets.
  • context: The country (or other context, such as the internet) where the data were originally collected.
  • year: The year of data collection. In some cases these are approximate; always check original sources for definitive information.
  • component: indicates what category the term belongs to. Not all studies provide all possible components, and some focus on only one component. Components include:
    • identity: Words that can be used to refer to people. Identities can serve as the actors or the objects in an actor-behavior-object ACT event. They are typically nouns (e.g. academic, woman, youngster).
    • modified_identity: A person with an item that may change their affective connotation (e.g. adolescent with a flat basketball). Note that some identities modified with adjectives (e.g. white man) are mixed into the “identities” category; at present, those are classified as identities rather than modified identities.
    • behavior: Actions that people (or other entities, like governments) can perform. Most can serve as the behavior in an actor-behavior-object ACT event. These are typically verbs (e.g. wheedle, acclaim, work).
    • modifier: Typically adjectives that can be applied to identities (e.g. active, witty, young).
    • setting: Places and situations (e.g. airplane, alley, worship_service).
    • value: Traits that people can have (some overlap with modifiers) or concepts or states of being that they may consider desirable or undesirable (e.g. authenticity, accessibility, tradition, risk averse).
    • artifact: A non-human thing (e.g. cigar, gas guzzler, sports car, slippers).
  • group: The subsample of respondents who provided the rating. All datasets provide group = “all”, representing the average over the entire sample. Many data sets additionally provide separate summary statistics for male and female respondents, and some dictionaries (e.g., internationaldomesticrelations1981, gaymensanfrancisco1980) provide other subgroups; see the dictionary documentation for details. Summary statistics across the whole sample (group = “all”) are calculated slightly differently depending on the data set. Some dictionaries (e.g. the 2015 US, Morocco, and Egypt dictionaries) were originally published as average values over all respondents; in these cases, “all” is the only group provided. Other dictionaries were originally published as separate male and female subsets, without averages over all raters. For these, the package provides an approximate average calculated by averaging the values for each subsample. Because studies typically recruit approximately equal numbers of men and women, and because men’s and women’s ratings do not differ substantially on most terms, these approximate averages should be reasonably close to those that an average over all raters would produce. For more information on gender and affect control theory dictionaries, see section 4.1 of David Heise’s Expressive Order (2007).
  • instcodes: A fourteen-digit binary code that further classifies terms. See the section on institution codes below for details.
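
To see these variables in practice, the short sketch below lists each dictionary alongside its context and collection year using the epa_summary_statistics data frame described later on this page. This is a minimal example that assumes the actdata and dplyr packages are installed and loaded.

# List each data set once, with its context and year of collection
library(actdata)
library(dplyr)
dictionary_info <- epa_summary_statistics %>% 
  dplyr::distinct(dataset, context, year) %>% 
  dplyr::arrange(dataset)
head(dictionary_info)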

Available dictionaries

This package contains data from 40 different publicly available affect control theory dictionary data sets. Basic information on these dictionaries is shown in the table below. Detailed information on each dictionary, including references, is available at the end of this page.

Please cite these data! The data sets included in this package were originally collected by a number of different research teams. When using them for publication, please cite both their publications about the data (linked in the dataset detail section below) and this package.

Not sure which data set to use? If you simply need dependable, recent ratings of a wide-ranging set of concepts in a US context (such as those used in much ACT behavior-modeling research), I recommend the usfullsurveyor2015 or the usmturk2015 data sets. These are the largest and most recent general dictionaries currently available. Other data sets may be useful for questions regarding specific kinds of terms, other cultural contexts, or other points in time.
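
As a quick start, the sketch below pulls all sample-average ratings from one of these recommended dictionaries using the epa_subset() function, which is documented in the sections that follow.

# Sample-average EPA ratings from the 2015 US full surveyor dictionary
us2015 <- epa_subset(dataset = "usfullsurveyor2015", group = "all")
head(us2015)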

Dataset key Country or context Year Statistics available Components
artifactmods2022 US 2022 mean, sd, cov identity, modified_identity, artifact
calcuttaall2017 India 2017 mean, sd identity, behavior, modifier, setting, artifact
calcuttasubset2017 India 2017 mean, sd identity, behavior, modifier, setting, artifact
china1999 China 1999 mean identity, behavior, modifier, setting
dukecommunity2015 US 2015 mean, sd, cov identity, behavior, modifier
dukestudent2015 US 2015 mean, sd, cov identity, behavior, modifier
egypt2015 Egypt 2012-2014 mean, sd, cov identity, behavior, modifier
employeeorg2022 US 2022 mean, sd identity
expressive2002 US 2002 mean behavior
gaymensanfrancisco1980 US 1980 mean behavior
generaltech2020 US 2020 mean, sd artifact
germany1989 Germany 1989 mean identity, behavior, modifier
germany2007 Germany 2007 mean identity, behavior, modifier, setting
groups2017 US 2017 mean identity
groups2019 US 2019 mean identity
household1994 US 1994 mean identity, behavior
humanvalues2022 US 2022 mean, sd, cov value
indiana2003 US 2003 mean identity, behavior, modifier, setting
internationaldomesticrelations1981 Unknown 1981 mean behavior
internet1998 Internet 1998 mean identity, behavior, setting
japan1995 Japan 1989-2002 mean identity, behavior, modifier, setting
morocco2015 Morocco 2015 mean, sd, cov identity, behavior, modifier
mostafaviestimates2022 US 2022 mean, sd identity, behavior, modifier
nc1978 US 1978 mean identity, behavior, modifier, setting
nireland1977 Northern Ireland 1977 mean identity, behavior
nounphrasegrammar2019 US 2019 mean, sd identity
occs2019 US 2019 mean, sd, cov identity
occs2020 US 2020 mean, sd, cov identity
occs2021 US 2021 mean, sd, cov identity
ontario1980 Canada 1980-1986 mean identity, behavior, modifier
ontario2001 Canada 2001-2003 mean identity, behavior, modifier, setting
politics2003 US 2003 mean identity, behavior
products2022 US 2022 mean, sd, cov artifact
techvshuman2021 US 2021 mean identity
texas1998 US 1998 mean identity, behavior, modifier
uga2015 US 2015 mean, sd, cov identity, behavior, modifier
ugatech2008 US 2008 mean, sd identity, behavior, artifact
usfullsurveyor2015 US 2015 mean, sd, cov identity, behavior, modifier
usmturk2015 US 2015 mean, sd, cov identity, behavior, modifier
usstudent2015 US 2015 mean, sd, cov identity, behavior, modifier

Accessing dictionary data: summary statistics

Within the package, all summary data are stored in one data frame named epa_summary_statistics. This data frame contains all available EPA means and, when available, institution codes, variances, covariances, and respondent numbers for all terms in all dictionaries. There is one row per term-dataset-group combination.
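
A quick way to confirm this structure is to count rows by data set and respondent group directly on this data frame; a minimal sketch (assuming the package and dplyr are loaded):

# Number of term rows contributed by each data set and respondent group
epa_summary_statistics %>% 
  dplyr::count(dataset, group)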

These data are identical to (and usually sourced directly from) those provided in other places, such as affectcontroltheory.org, Interact Java, and a legacy affect control theory website.

Researchers will rarely need or want to work with all of these data at one time. To easily build subsets of this data frame, use the epa_subset() function. This function allows users to search by term, filter by dataset, return only certain summary statistics, and more. Both the original data frame and these subsets are provided in long form, making them easy to manipulate further using the Tidyverse if needed.

The columns in this data set correspond to those in the dictionary metadata table above. However, to aid sorting, only one year (instead of a range) is provided for all data sets.

# Return all sample average entries for terms containing "friend"
contains_friend <- epa_subset(expr = "friend", group = "all")
head(contains_friend)
#> # A tibble: 6 × 25
#>   term   component dataset context year  group instcodes     E     P     A   n_E
#>   <chr>  <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl> <dbl>
#> 1 best_… identity  calcut… India   2017  all   11 10000…  2.84  2.5   2.08    40
#> 2 boyfr… identity  calcut… India   2017  all   10 00000…  1.33  1.03  0.73    40
#> 3 ex_bo… identity  calcut… India   2017  all   10 00000…  0.13  0.3   0.23    39
#> 4 ex_gi… identity  calcut… India   2017  all   01 00000…  0.21  0.04 -0.05    39
#> 5 friend identity  calcut… India   2017  all   11 10000…  2.54  2.16  2.42    40
#> 6 frien… modifier  calcut… India   2017  all   10 01000…  1.84  1.36  1.68    36
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>

# Return all entries for the identity "friend"
friend <- epa_subset(expr = "friend", exactmatch = TRUE, component = "identity")
head(friend)
#> # A tibble: 6 × 25
#>   term   component dataset context year  group instcodes     E     P     A   n_E
#>   <chr>  <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl> <dbl>
#> 1 friend identity  calcut… India   2017  male  11 10000…  2.73  2.21  2.5     20
#> 2 friend identity  calcut… India   2017  fema… 11 10000…  2.34  2.11  2.34    20
#> 3 friend identity  calcut… India   2017  all   11 10000…  2.54  2.16  2.42    40
#> 4 friend identity  calcut… India   2017  male  11 10000…  2.65  1.99  2.79    19
#> 5 friend identity  calcut… India   2017  fema… 11 10000…  2.34  2.45  3.4     20
#> 6 friend identity  calcut… India   2017  all   11 10000…  2.49  2.3   3.16    39
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>

# Return the entire Ontario 1980 dataset
all_ontario1980 <- epa_subset(dataset = "ontario1980")
head(all_ontario1980)
#> # A tibble: 6 × 25
#>   term   component dataset context year  group instcodes     E     P     A   n_E
#>   <chr>  <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl> <dbl>
#> 1 aband… behavior  ontari… Canada  1980  all   01 11111… -2.74  0.01  0.74    NA
#> 2 aband… behavior  ontari… Canada  1980  male  01 11111… -2.64 -0.04  0.82    NA
#> 3 aband… behavior  ontari… Canada  1980  fema… 01 11111… -2.84  0.06  0.66    NA
#> 4 aband… modifier  ontari… Canada  1980  all   10 00010… -2.68 -1.40 -0.59    NA
#> 5 aband… modifier  ontari… Canada  1980  male  10 00010… -2.71 -1.46 -0.43    NA
#> 6 aband… modifier  ontari… Canada  1980  fema… 10 00010… -2.64 -1.33 -0.75    NA
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>

# Return this same dataset, but only include the female mean values for behaviors
f_mean_ontario1980 <- epa_subset(dataset = "ontario1980", group = "female", component = "behavior", stat = "mean")
head(f_mean_ontario1980)
#> # A tibble: 6 × 10
#>   term        component dataset  context year  group instcodes     E     P     A
#>   <chr>       <chr>     <chr>    <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl>
#> 1 abandon     behavior  ontario… Canada  1980  fema… 01 11111… -2.84  0.06  0.66
#> 2 abuse       behavior  ontario… Canada  1980  fema… 10 10100… -3.03  1.21  1.88
#> 3 acclaim     behavior  ontario… Canada  1980  fema… 10 00010…  1.9   0.8   0   
#> 4 accommodate behavior  ontario… Canada  1980  fema… 10 11111…  2.31  0.87  0.12
#> 5 accuse      behavior  ontario… Canada  1980  fema… 10 11111… -1.73  0.97  0.85
#> 6 address     behavior  ontario… Canada  1980  fema… 10 11111…  1.36  0.36 -0.15

Accessing dictionary data: individual-level data

EPA summary information is likely to be sufficient for most research questions. Existing research in affect control theory almost always uses summary data. However, the respondent-level data used to compute these summaries may also be useful in particular instances.

A number of data sets, all from 2015 or later, include individual-level data, newly made publicly available in this package. Some of these data sets are subsets of others. They are the following (see the dictionary table above for more information):

  1. morocco2015
  2. egypt2015
  3. usmturk2015
  4. dukestudent2015
  5. uga2015
  6. dukecommunity2015
  7. usstudent2015 (a combination of 4 and 5)
  8. usfullsurveyor2015 (a combination of 4, 5, and 6)
  9. occs2019
  10. occs2020
  11. artifactmods2022
  12. humanvalues2022
  13. products2022

All individual-level data are stored in the individual data frame within this package. Like the summary datasets, these data are provided in long form, with one respondent’s ratings of one term per row. Where available, respondents’ gender, race, and age are also included.

To subset these data, use the epa_subset() function with the datatype argument set to "individual".
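
Because each row holds a single respondent’s rating of a single term, summary values can be recomputed directly from these data. The sketch below recomputes mean EPA values by respondent gender for one term; it is illustrative only and assumes that the term “friend” appears in the uga2015 individual data and that the expr and exactmatch arguments behave as they do for summary data.

# Recompute mean EPA ratings for "friend" by respondent gender (illustrative)
friend_raters <- epa_subset(expr = "friend", exactmatch = TRUE,
                            dataset = "uga2015", datatype = "individual")
friend_raters %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarize(E = mean(E), P = mean(P), A = mean(A), n = dplyr::n())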

When using any of the US 2015 individual-level data sets, keep in mind that usstudent2015 and usfullsurveyor2015 are combinations of other data sets. In the individual data, rows belonging to these sets are included, but are categorized as belonging to the more specific data set (dukestudent2015, uga2015, or dukecommunity2015). You may still use the usstudent2015 and usfullsurveyor2015 keys in epa_subset() with datatype = "individual"; the function will return rows from the appropriate combination of data sets.
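
One way to see this behavior is to count the rows returned under a combined key by their specific source data set; a minimal sketch:

# Rows returned for the combined usfullsurveyor2015 key, broken down by source data set
us_individual <- epa_subset(dataset = "usfullsurveyor2015", datatype = "individual")
dplyr::count(us_individual, dataset)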

# Return all individual-level ratings for identities from the egypt2015 data set
egypt_individual <- epa_subset(dataset = "egypt2015", datatype = "individual", component = "identity")
head(egypt_individual)
#> # A tibble: 6 × 16
#>   dataset   context year  userid    gender age   raceeth race1 race2 hisp  term 
#>   <chr>     <chr>   <chr> <chr>     <chr>  <chr> <chr>   <chr> <chr> <chr> <chr>
#> 1 egypt2015 Egypt   2015  Egypt487  Female NA    NA      NA    NA    NA    team…
#> 2 egypt2015 Egypt   2015  Egypt1483 Male   NA    NA      NA    NA    NA    trav…
#> 3 egypt2015 Egypt   2015  Egypt1653 Male   NA    NA      NA    NA    NA    pagan
#> 4 egypt2015 Egypt   2015  Egypt1621 Male   NA    NA      NA    NA    NA    spor…
#> 5 egypt2015 Egypt   2015  Egypt1072 Male   NA    NA      NA    NA    NA    grad…
#> 6 egypt2015 Egypt   2015  Egypt165  Female NA    NA      NA    NA    NA    pede…
#> # ℹ 5 more variables: component <chr>, instcodes <chr>, E <dbl>, P <dbl>,
#> #   A <dbl>

# Return all ratings by female-identifying students in the usstudent2015 data set
female_students <- epa_subset(dataset = "usstudent2015", datatype = "individual") %>% 
  dplyr::filter(gender == "Female")
head(female_students)
#> # A tibble: 6 × 16
#>   dataset      context year  userid gender age   raceeth race1 race2 hisp  term 
#>   <chr>        <chr>   <chr> <chr>  <chr>  <chr> <chr>   <chr> <chr> <chr> <chr>
#> 1 dukestudent… US      2015  DStu6… Female 21    NA      Blac… No a… NA    some…
#> 2 dukestudent… US      2015  DStu5… Female 19    NA      Whit… No a… NA    some…
#> 3 dukestudent… US      2015  DStu6… Female 20    NA      Whit… No a… NA    some…
#> 4 dukestudent… US      2015  DStu5… Female 20    NA      Whit… No a… NA    some…
#> 5 dukestudent… US      2015  DStu5… Female 22 t… NA      Blac… No a… NA    aban…
#> 6 dukestudent… US      2015  DStu5… Female 18    NA      Whit… No a… NA    aban…
#> # ℹ 5 more variables: component <chr>, instcodes <chr>, E <dbl>, P <dbl>,
#> #   A <dbl>

Term table

One of the main goals of this package is to make it easy to compare meaning across dictionaries. To this end, the package provides a data frame called term_table that shows at a glance which terms are included in which dictionaries. Each column in this table represents a dictionary (labeled with its key) and each row is a term. Cell entries (0/1) indicate whether or not the specified dictionary contains the specified term. The table can easily be modified further to generate summaries across a set of dictionaries of interest; one such summary is sketched below. To see the entries for only a particular component, to search by term, or to limit to a particular set of dictionaries, use dplyr functions to filter the term table.
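
For instance, the sketch below counts, for each term, how many dictionaries include it; the dictionary columns are the 0/1 columns, so term and component are excluded from the row sums. This is one possible approach, assuming dplyr is loaded.

# For each term, count how many dictionaries include it
dictionary_counts <- term_table %>% 
  dplyr::mutate(n_dictionaries = rowSums(dplyr::across(-c(term, component)))) %>% 
  dplyr::select(term, component, n_dictionaries) %>% 
  dplyr::arrange(dplyr::desc(n_dictionaries))
head(dictionary_counts)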

# the whole table
head(term_table)
#> # A tibble: 6 × 42
#>   term   component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#>   <chr>  <chr>                <dbl>           <dbl>              <dbl>     <dbl>
#> 1 abort… identity                 1               1                  1         1
#> 2 abort… modified…                1               0                  0         0
#> 3 abort… modified…                1               0                  0         0
#> 4 abort… modified…                1               0                  0         0
#> 5 abort… modified…                1               0                  0         0
#> 6 abort… modified…                1               0                  0         0
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> #   egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> #   gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> #   germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> #   humanvalues2022 <dbl>, indiana2003 <dbl>,
#> #   internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> #   japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …

# settings only
set_tt <- term_table %>% 
  dplyr::filter(component == "setting")
head(set_tt)
#> # A tibble: 6 × 42
#>   term   component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#>   <chr>  <chr>                <dbl>           <dbl>              <dbl>     <dbl>
#> 1 abort… setting                  0               1                  1         0
#> 2 adult… setting                  0               1                  1         1
#> 3 airpl… setting                  0               1                  1         1
#> 4 ambul… setting                  0               1                  1         1
#> 5 amuse… setting                  0               1                  1         1
#> 6 april… setting                  0               1                  1         0
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> #   egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> #   gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> #   germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> #   humanvalues2022 <dbl>, indiana2003 <dbl>,
#> #   internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> #   japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …

# limit to only the two Germany dictionaries and exclude terms in neither
german_tt <- term_table %>% 
  dplyr::select(term, component, germany1989, germany2007) %>% 
  dplyr::filter(germany1989 + germany2007 >= 1)
head(german_tt)
#> # A tibble: 6 × 4
#>   term       component germany1989 germany2007
#>   <chr>      <chr>           <dbl>       <dbl>
#> 1 adolescent identity            1           0
#> 2 beginner   identity            1           0
#> 3 brat       identity            1           0
#> 4 brute      identity            1           0
#> 5 buddy      identity            1           0
#> 6 bully      identity            0           1

# limit to terms that contain "friend"
friend_tt <- term_table %>% 
  dplyr::filter(stringr::str_detect(term, "friend"))
head(friend_tt)
#> # A tibble: 6 × 42
#>   term   component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#>   <chr>  <chr>                <dbl>           <dbl>              <dbl>     <dbl>
#> 1 best_… identity                 0               1                  1         0
#> 2 boyfr… identity                 0               1                  1         1
#> 3 ex_bo… identity                 0               1                  1         0
#> 4 ex_gi… identity                 0               1                  1         0
#> 5 friend identity                 0               1                  1         0
#> 6 frien… modifier                 0               1                  1         1
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> #   egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> #   gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> #   germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> #   humanvalues2022 <dbl>, indiana2003 <dbl>,
#> #   internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> #   japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …

Institution codes

Contexts restrict the labels that we consider reasonable choices. For instance, if two people are discussing next year’s budget in a business meeting, it would seem quite unlikely for one to label the other as a “priest”. A business-related label, like “manager” or “employee”, or a label that applies across a variety of contexts, like “genius” or “jerk”, would seem more realistic.

EPA dictionaries usually contain 14-digit binary strings known as “institution codes” that encode the social contexts within which terms apply. These codes can be used by analysis software when simulating interaction.

Valid categories are (see Heise’s 2007 book Expressive Order for details):

  • male, female: The genders to which a term can typically be applied (identities only)
  • overt, surmised: Whether labeling a behavior requires interpretation or insight on the part of the observer (behaviors only)
  • place, time: Type of setting (settings only)
  • lay, business, law, politics, academe, medicine, religion, family, sexual: Social institutions to which terms may belong (identities, behaviors, and settings only)
  • monadic, group, corporal: Whether and how a term requires or implicates others (identities, behaviors, and settings only)
  • adjective, adverb: Part of speech (modifiers only)
  • emotion, trait, status, feature, emotion_spiral: Categories for modifiers

This package provides several ways to demystify and make use of these codes. See the function documentation for more details.

  • The epa_subset() function takes an institutions argument that allows a user to filter by institution.
  • The expand_instcodes() function converts institution code strings into columns containing TRUE/FALSE/NA values. These values indicate whether a category is applicable to a term and, if so, whether the term belongs to the category. Representing institutions in this way makes it easier for users to work with them.
  • The create_instcode() function takes a component and a set of logical values indicating category membership and returns a properly formatted institution code binary string. This is useful for creating new institution codes.
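
# Mean EPA values for Morocco 2015 terms tagged with the business or law institutions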
businesslaw <- epa_subset(dataset = "morocco2015", institutions = c("business", "law"), stat = "mean") %>% 
  dplyr::select(term, component, instcodes, E, P, A)
head(businesslaw)
#> # A tibble: 6 × 6
#>   term     component instcodes            E     P     A
#>   <chr>    <chr>     <chr>            <dbl> <dbl> <dbl>
#> 1 abandon  behavior  01 111111110 000 -1.81 -1.03  1.2 
#> 2 abuse    behavior  10 101000011 100 -2.62 -1.96  1.13
#> 3 accuse   behavior  10 111111111 010 -1.88 -0.97  1.34
#> 4 address  behavior  10 111111110 010  1.52  1.26 -0.64
#> 5 admonish behavior  10 011111111 000  2.58  2.09 -1.4 
#> 6 advise   behavior  10 011111110 000  2.27  2.02 -1.47

# the default is to keep terms for which there are no institution codes. 
# Change this behavior using the drop.na.instcodes argument in epa_subset().
businesslaw <- expand_instcodes(businesslaw) %>% 
  dplyr::select(-E, -P, -A)
#> At least one of the institution codes is NA.
head(businesslaw, 7)
#> # A tibble: 7 × 19
#>   term     component instcodes  male  female overt surmised lay   business law  
#>   <chr>    <chr>     <chr>      <lgl> <lgl>  <lgl> <lgl>    <lgl> <lgl>    <lgl>
#> 1 abandon  behavior  01 111111… NA    NA     FALSE TRUE     TRUE  TRUE     TRUE 
#> 2 abuse    behavior  10 101000… NA    NA     TRUE  FALSE    TRUE  FALSE    TRUE 
#> 3 accuse   behavior  10 111111… NA    NA     TRUE  FALSE    TRUE  TRUE     TRUE 
#> 4 address  behavior  10 111111… NA    NA     TRUE  FALSE    TRUE  TRUE     TRUE 
#> 5 admonish behavior  10 011111… NA    NA     TRUE  FALSE    FALSE TRUE     TRUE 
#> 6 advise   behavior  10 011111… NA    NA     TRUE  FALSE    FALSE TRUE     TRUE 
#> 7 advisor  identity  NA         NA    NA     NA    NA       NA    NA       NA   
#> # ℹ 9 more variables: politics <lgl>, academe <lgl>, medicine <lgl>,
#> #   religion <lgl>, family <lgl>, sexual <lgl>, monadic <lgl>, group <lgl>,
#> #   corporal <lgl>

newcode <- create_instcode(component = "setting", place = TRUE, family = TRUE, religion = TRUE)
print(paste("The institution code", newcode, "represents a setting that is a place and is relevant to only the family and religion domains."))
#> [1] "The institution code 10 000000110 000 represents a setting that is a place and is relevant to only the family and religion domains."

Dataset collection details

Data collection efforts are typically intended to be either general or specific. General data collections aim to provide information on the typical, normative culture of a place or context, usually a country. They measure EPA for terms that apply to a wide range of social situations, and use samples of respondents that are argued to be either representative of a country’s population at large or “cultural experts” who are deeply familiar with the types of spaces in which the culture of interest is reproduced (Heise 2007). Data from these general collections are often used in simulations of actor-behavior-object ACT events. Specific data collections measure EPA for terms in only a particular domain, and may recruit respondents from only a target subpopulation of interest. Such data are sometimes used in simulations of events, but often the reason to collect them is to better describe meanings in a particular domain.

Below, in approximate reverse chronological order within general/specific categories, are details including links for more information about the collection of each of the datasets contained in this package.

General dictionaries

mostafaviestimates2022

Description: These EPA values were estimated using Mostafavi, Porter, and Robinson’s Bidirectional Encoder Representations from Transformers (BERT) model. Most terms from previously collected dictionaries are included.

Sample: Not applicable; values were estimated, not empirically measured. However, the model was trained on the 2015 US Full Surveyor (usfullsurveyor2015) dataset.

Authors: Moeen Mostafavi, Michael D. Porter, Dawn T. Robinson

Relevant publications/more information: Mostafavi, Porter, and Robinson 2022

Calcutta 2017: calcuttaall2017 and calcuttasubset2017

Description: Semantic differential ratings of 1,469 concepts in Bengali, a language spoken by about 250 million individuals in eastern India and Bangladesh. The calcuttaall2017 dataset contains summary information from all respondents. The calcuttasubset2017 dataset contains summary information calculated using data from only those respondents who used the scales as expected. See the linked paper for details.

Sample: 20 male and 20 female native Bengali speakers living in Calcutta, India

Year collected: 2013

Authors: Shibashis Mukherjee, David Heise

Relevant publications/more information: Mukherjee and Heise 2017

United States 2015: dukecommunity2015, dukestudent2015, uga2015, usfullsurveyor2015, usmturk2015, usstudent2015

Description: Ratings of 929 identities, 814 behaviors, and 660 modifiers collected between 2012 and 2014 from several samples of people living in the United States: students at the University of Georgia (uga2015), students at Duke University (dukestudent2015), non-students living in Durham, North Carolina (dukecommunity2015), and US-based workers on Amazon Mechanical Turk (usmturk2015). Some keys refer to combinations of datasets. usstudent2015 is the combination of dukestudent2015 and uga2015. usfullsurveyor2015 is the combination of both student samples and the community sample (dukestudent2015, uga2015, and dukecommunity2015).

Sample: n = 1368 undergraduates at the University of Georgia (uga2015), n = 216 students at Duke University (dukestudent2015), n = 159 non-students living in Durham, North Carolina (dukecommunity2015), and n = 2615 US Amazon Mechanical Turk workers (usmturk2015).

Year collected: 2012-2014

Authors: Lynn Smith-Lovin, Dawn T. Robinson, Bryan C. Cannon, Jesse K. Clark, Robert Freeland, Jonathan H. Morgan, Kimberly B. Rogers

Relevant publications/more information/citations: usstudent, usfullsurveyor, uga, usmturk

egypt2015

Description: 397 identities, 368 behaviors, and 233 modifiers

Sample: 1716 residents of Cairo, Egypt

Year collected: 2012-2014

Authors: Hamid Latif, Lynn Smith-Lovin, Dawn T. Robinson, Bryan C. Cannon, Brent Curdy, Darys J. Kriegel, and Jonathan H. Morgan

Relevant publications/more information: Link to source, Robinson, Smith-Lovin, and Zhao 2020

morocco2015

Description: 397 identities, 368 behaviors, and 233 modifiers

Sample: n = 1546 residents of Rabat, Morocco

Year collected: 2014-2015

Authors: Lynn Smith-Lovin, A. Soudi, Dawn T. Robinson, Bryan C. Cannon, Brent Curdy, Darys J. Kriegel, Jonathan H. Morgan

Relevant publications/more information: Link to source, Robinson, Smith-Lovin, and Zhao 2020

germany2007

Description: 376 identities, 393 behaviors, and 331 modifiers.

Sample: Ratings were obtained with Surveyor from 1905 subjects (734 male and 1171 female) from all over Germany. The research was advertised as a “study of language and emotion” in an extensive recruitment campaign including mailing lists from different universities, weblogs, newspaper reports, and radio interviews. Most of the participants (N = 1029) were between 20 and 29 years of age, but the sample covered all ages, including N = 129 younger than 20 and N = 92 older than 60 years. The data of 83 persons (4.4%) were excluded from the analysis because they indicated that German was not their mother tongue. On average, each stimulus was rated by 29.5 male and 46.4 female raters.

Year collected: 2007

Authors: Tobias Schröder

Relevant publications/more information: Schröder 2011. Raw data available through Interact Java applet.

indiana2003

Description: Ratings of 500 Identities, 500 Behaviors, 300 Modifiers, and 200 Settings were collected at Indiana University, via the Internet using the Surveyor applet.

Sample: n = 1027 Indiana University students enrolled in the business and arts and sciences schools who had lived in the U.S.A. at age 16; the sample was about equally male and female.

Year collected: 2002-2003

Authors: Clare Francis and David R. Heise

Relevant publications/more information: Information on sample, information on term selection. Raw data available through Interact Java applet.

ontario2001

Description: Data on 993 Identities, 601 Behaviors, 500 Modifiers, and 200 Settings were gathered with the Attitude program from Guelph, Ontario, undergraduates in 2001-2002. Data on settings were gathered with the Surveyor program at Guelph in 2003. Funded by the Social Science and Humanities Research Council of Canada.

Sample: University of Guelph undergraduate students

Year collected: 2001-2003

Authors: Neil J. MacKinnon

Relevant publications/more information: Luke 2010, Affect Control Theory website entry. Raw data available through Interact Java applet.

china1999

Description: Ratings of 449 Identities, 300 Behaviors, 98 Emotions, 150 Traits, and 149 Settings were obtained with the Attitude program.

Sample: About 380 undergraduate students at Fudan University in Shanghai, China

Year collected: 1991

Authors: Herman W. Smith and Yi Cai

Relevant publications/more information: Zhao 2022. Affect control theory website. Raw data available through Interact Java applet.

texas1998

Description: Ratings of 443 Identities, 278 Behaviors, 65 Modifiers, and 1 Setting were collected at Texas Tech University with program Attitude.

Sample: There is some disagreement between sources regarding sample size. Schneider 2007 says 420 undergraduate students received a small monetary incentive for rating 413 identities. The affect control theory website entry gives a sample size of 482. It may be that this data set combines the ratings from Texas Tech students with ratings of additional terms collected at around the same time from University of Missouri students by Herm Smith (see Schneider 2007).

Year collected: 1998

Authors: Andreas Schneider

Relevant publications/more information: Schneider 2007. Raw data available through Interact Java applet.

japan1995

Description:

From the Affect Control Theory website: Ratings of 403 Identities and 307 Behaviors, and a few Settings, were obtained with the Attitude program from 323 Tohoku University students in 1989. In 1995 and 1996, 120 women students at Kyoritsu Women’s, Japan Women’s, and Teikyo Universities and 120 men students at Teikyo and Rikkyo Universities rated an additional 300 settings, 300 modifiers (mainly traits), 200 business identities, and 75 behaviors. Yoichi Murase (Rikkyo University) and Nozomu Matsubara (Tokyo University) provided access to students who rated 102 emotions, 70 behaviors, and 55 identities in 2002 using Surveyor. The total numbers of entries in the Interact lexicon are 713 Identities, 455 Behaviors, 426 Modifiers, and 300 Settings. The number of male or female raters is generally about 30 for each concept.

From Smith, Matsuno, and Umino 1994 and Smith, Umino, and Matsuno 1998: 403 identities and 307 behaviors were rated by a convenience sample of 25 men and 25 women at a national university in Japan.

Sample: University students in Japan. Some disagreement among sources regarding sample size.

Year collected: 1989-2002

Authors: Herman W. Smith, Takanori Matsuno, Shuuichirou Ike, and Michio Umino

Relevant publications/more information: Smith, Matsuno, and Umino 1994, Smith, Umino, and Matsuno 1998. Raw data available through Interact Java applet.

germany1989

Description:

From the Affect Control Theory website: Ratings of 442 Identities, 295 Behaviors, and 67 Modifiers, selected for back-translatability with the 1978 U.S.A. dictionary, were obtained with the Attitude program from 520 Mannheim students. Subjects were matched to the American undergraduate population by proportional inclusion of 12th- and 13th-grade youths (Gymnasiasten) at two German Gymnasien, along with beginning university students (Studenten des Grundstudiums) from Mannheim University, which attracts students mainly from the Rhein-Neckar region in former West Germany.

From Schneider 2004: To correspond to the undergraduate population in the United States, subjects were pupils in the thirteenth grade of Gymnasien as well as university students. 380 subjects were recruited from Mannheim University and two Gymnasien in Mannheim, a large industrial city; these institutions attract students mainly from the Rhein-Neckar region in former West Germany.

Sample: German students; disagreement among sources about sample size.

Year collected: 1989

Authors: Andreas Schneider

Relevant publications/more information: Schneider 2004. Raw data available through Interact Java applet.

ontario1980

Description: Data on 843 Identities and 593 Behaviors were obtained with paper questionnaires in 1980-1983, and 495 Modifiers were added in 1985-1986. Funded by the Social Science and Humanities Research Council of Canada.

Sample: 5534 (identities and behaviors) or 1260 (modifiers) undergraduate students at the University of Guelph. Each term was rated by approximately 35 men and 35 women.

Year collected: 1980-1986

Authors: Neil J. MacKinnon

Relevant publications/more information: MacKinnon and Luke 2002. Raw data available through Interact Java applet.

nc1978

Description: From the Affect Control Theory website: Ratings of 721 Identities, 600 Behaviors, 440 Modifiers, and 345 Settings were obtained with paper questionnaires from 1,225 North Carolina undergraduates. (Ratings for some emotion words in this data set were obtained by Heise from Indiana University undergraduates in 1985.) Funded by National Institute of Mental Health Grant 1-R01-MH29978-01-SSR.

Sample: From Smith-Lovin 1988: Data were collected from students in social sciences and humanities classes at the University of North Carolina at Chapel Hill in the 1977-78 school year. Each out-of-context stimulus was rated by approximately 56 subjects. Approximately half of the raters for each stimulus were males and half females.

Year collected: Mostly 1977-1978; some emotions from 1985

Authors: Lynn Smith-Lovin and David R. Heise

Relevant publications/more information: Smith-Lovin 1988. Raw data available through Interact Java applet.

nireland1977

Description: Ratings of 528 Identities and 498 Behaviors were obtained with paper questionnaires.

Sample: 319 Belfast teenagers in Catholic high schools

Year collected: 1977

Authors: Willigan and Heise

Relevant publications/more information: Affect Control Theory website entry. Raw data available through Interact Java applet.

Specific dictionaries

artifactmods2022

Description: Ratings of 58 identities, 52 physical artifacts, and 212 artifact-modified identities across a range of identities and artifact types.

Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.

Year collected: 2015

Authors: Rohan Lulham and Daniel B. Shank

Relevant publications/more information: Lulham and Shank 2022

employeeorg2022

Description: Ratings of organizations (e.g., library) and their employees (e.g., employee of a library).

Sample: 118-119 participants in the U.S. recruited through Amazon Mechanical Turk.

Year collected: Unknown

Authors: Daniel B. Shank and Alexander Burns

Relevant publications/more information: Shank and Burns 2022

humanvalues2022

Description: Ratings of 393 values, traits, etc. Collected with data for Lulham and Shank 2022 (artifactmods2022).

Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.

Year collected: 2015

Authors: Rohan Lulham, Daniel B. Shank, and Clementine Thurgood

Relevant publications/more information: Lulham and Shank 2022

products2022

Description: Ratings of 132 everyday products. Collected with data for Lulham and Shank 2022 (artifactmods2022).

Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.

Year collected: 2015

Authors: Rohan Lulham, Daniel B. Shank, and Clementine Thurgood

Relevant publications/more information: Lulham and Shank 2022

techvshuman2021

Description: Ratings of 59 roles as performed by a human, a computer, or an AI identity (e.g., a product assembling employee, a product assembling computer system, a product assembling artificial intelligence).

Sample: n = 549 people recruited using Prolific who had lived over half their lives in the United States.

Year collected: Unknown

Authors: Daniel B. Shank, Madison Bowen, Alexander Burns, Matthew Dew

Relevant publications/more information: Shank, Bowen, Burns, and Dew 2021

generaltech2020

Description: Ratings of 25 general technology terms (e.g. bot, artificial intelligence).

Sample: 59 Amazon Mechanical Turk workers who had lived in the US at least half their lives.

Year collected: Unknown

Authors: Daniel B. Shank, Alexander Burns, Sophia Rodriguez, Madison Bowen

Relevant publications/more information: Shank, Burns, Rodriguez, and Bowen 2020

Occupations: occs2019 and occs2020

Description: Ratings of U.S. occupational categories. Wave 1, collected in 2019 and 2020, immediately before the onset of the COVID-19 pandemic, contains ratings for all 650 U.S. census occupational categories. Wave 2, collected in 2020 in the early months of the pandemic, contains ratings for 94 occupations, with a focus on those deemed “essential” in the pandemic context.

Sample: Quasi-nationally representative sample from an online Qualtrics panel, n = 2726

Year collected: 2019-2020

Authors: Joseph M. Quinn, Robert E. Freeland, Kimberly B. Rogers, Jesse Hoey, Lynn Smith-Lovin

Relevant publications/more information: Quinn, Freeland, Rogers, Hoey, and Smith-Lovin 2022

nounphrasegrammar2019

Description: Participants rated 28 social identity concepts, which are either count or collective nouns, presented in one of five grammatical forms. These data have been used to examine the influences of determiners (a/an, the, and all) and grammatical number (singular or plural) on the affective meaning of social identity concepts.

Sample: Between 372 and 384 Amazon Mechanical Turk workers living in the United States.

Year collected: Unknown

Authors: Daniel B. Shank, Sarah E. Hercula, Brent Curdy

Relevant publications/more information: Shank, Hercula, and Curdy 2019

groups2019

Description: Ratings of 170 group-related concepts. These were collected as pilot data for Shank, Hercula, and Curdy 2019 (nounphrasegrammar2019) and Shank and Burns 2022 (employeeorg2022).

Relevant publications/more information: Shank, Hercula, and Curdy 2019, Shank and Burns 2022

groups2017

Description: EPA ratings of 64 group-related concepts, including primary groups, task groups, categories of people, and collectives.

Sample: 155 Amazon Mechanical Turk workers who had lived in the US the majority of their lives.

Year collected: 2017

Authors: Daniel B. Shank and Alexander Burns

Relevant publications/more information: Shank and Burns 2018

ugatech2008

Description: Ratings of 80 technology-related items and concepts.

Sample: n = 174 undergraduate students at the University of Georgia; 100 women and 74 men.

Year collected: 2008

Authors: Daniel B. Shank

Relevant publications/more information: Shank 2010

politics2003

Description: Ratings of political actors (individuals, collectives, groups, and organizations) and behaviors (individual and social).

Sample: Students in an introduction to sociology course at the University of Missouri-St. Louis. n = 47 men and 74 women.

Year collected: Unclear

Authors: Kyle Irwin

Relevant publications/more information: Irwin 2003, link to source

expressive2002

Description: 98 nonverbal behavior terms collected in Study 1 of Rashotte 2002. Terms were divided into four subsets. Subset 1 of the behaviors was given to 50 men and 50 women; subset 2 to 47 men and 52 women; subset 3 to 39 men and 59 women; and subset 4 to 36 men and 61 women.

Sample: 230 female and 172 male students in introductory sociology courses at a large public university in the southwestern United States.

Year collected: Unclear

Authors: Lisa Slattery Walker (formerly Lisa Slattery Rashotte)

Relevant publications/more information: Rashotte 2002, link to source

internet1998

Description: Internet-related identities, behaviors, and settings rated by Internet users in 1998.

Sample: 2431 Internet users (56% male, 44% female) who responded to an ad run for six months on the Yahoo! search engine in 1998.

Year collected: 1998

Authors: Adam B. King

Relevant publications/more information: King 2001, link to source

household1994

Description: 53 terms related to household chores, family, and relationships.

Sample: 23 male and 46 female college students

Year collected: 1994

Authors: Amy Kroska

Relevant publications/more information: Link to source

internationaldomesticrelations1981

Description: 238 domestic and inter-state behaviors that nations can engage in.

Sample: A total of 57 respondents filled out both the international and domestic semantic differential rating booklets; 28 respondents belonged to a “general” sample (group variable = “nonprofessional”) and 29 respondents to an “elite” sample (group variable = “professional”). The assignment of individuals to the two groups was based upon their level of expertise regarding international relations. Members of the elite sample included university professors of international relations, members of the United States’ Departments of State and Defense, advanced graduate students of international relations, and members of consulting firms that contract for foreign policy work. The general sample consisted largely of university undergraduates, along with other individuals who had limited or no substantive knowledge or understanding of the workings of the international system. (Azar and Lerner 1981)

Year collected: Unclear

Authors: Edward E. Azar and Steve Lerner

Relevant publications/more information: Azar and Lerner 1981, link to source

gaymensanfrancisco1980

Description: Ratings of 14 sex-related behaviors gathered from ten San Francisco gay men in the 1980s by Professor Don Barrett, California State University, San Marcos. The “unsafebetter” group is the mean EPA ratings made by those respondents who feel that unsafe sex practices are more pleasurable than safe sex practices. The “safebetter” group is the mean EPA ratings made by those respondents who think that safe sex practices are more pleasurable. Group = “all” returns the average ratings across the sample.

Sample: 10 gay men in San Francisco

Year collected: Unclear; 1980s

Authors: Donald C. Barrett

Relevant publications/more information: Link to source