EPA dictionary data
actdata makes available a number of what are known as EPA dictionaries. These dictionaries provide the measured evaluation, potency, and activity (EPA) values associated with terms.
In addition to measured (or in one case, estimated) EPA values, the package provides metadata on the data collection and the term. These metadata are provided as additional variables. These variables are:
- dataset: an identifier unique to a particular study (e.g. nc1978; morocco2015). These keys are used within this package’s functions to access particular datasets.
- context: The country (or other context, such as the internet) where the data was originally collected.
- year: The year of data collection. In some cases these are approximate; always check original sources for definitive information.
- component: indicates what category the term belongs to. Not all studies provide all possible components, and some focus on only one component. Components include:
- identity: Words that can be used to refer to people. Identities can serve as the actors or the objects in an actor-behavior-object ACT event. They are typically nouns (e.g. academic, woman, youngster).
- modified_identity: A person with an item that may change their affective connotation (e.g. adolescent with a flat basketball). Note that some identities modified with adjectives (e.g. white man) are mixed into the “identities” category; those are classified as identities rather than modified identities at present.
- behavior: Actions that people (or other entities, like governments) can perform. Most can serve as the behavior in an actor-behavior-object ACT event. These are typically verbs (e.g. wheedle, acclaim, work).
- modifier: Typically adjectives that can be applied to identities (e.g. active, witty, young)
- setting: Places and situations (e.g. airplane, alley, worship_service)
- value: Traits that people can have (some overlap with modifiers) or concepts or states of being that they may consider desirable or undesirable (e.g. authenticity, accessibility, tradition, risk averse)
- artifact: A non-human thing (e.g. cigar, gas guzzler, sports car, slippers)
- group: The subsample of the respondents who provided the rating. All datasets provide group = “all”, representing the average over the entire sample. Many data sets additionally provide separate summary statistics for male and female respondents. Some dictionaries (e.g., internationaldomesticrelations1981, gaymensanfrancisco1980) provide other subgroups; see dictionary documentation for details. Summary statistics across the whole sample (group = “all”) are calculated slightly differently depending on the data set. Some dictionaries (e.g. the 2015 US, Morocco, and Egypt dictionaries) are originally published as average values over all respondents. In these cases, all is the only provided option. Other dictionaries are originally published in male and female subsets, and average values over all raters are not provided in the originally published sets. In this case, the package provides an approximate average calculated by averaging the values for each subsample. Typically the subsamples are male and female respondents, studies recruit approximately equal numbers of men and women, and men’s and women’s ratings do not differ substantially on most terms. In these cases, we expect these approximate average values to be reasonably close to those that we would obtain from an average over all raters. For more information on gender and affect control theory dictionaries, see section 4.1 of David Heise’s Expressive Order (2007).
- instcodes: A fourteen-digit binary code that further classifies terms. See the section on institution codes below for details.
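These metadata appear as ordinary columns in the package’s data frames, so they can be inspected directly. Below is a minimal sketch (for illustration only, assuming actdata and dplyr are loaded) that counts how many rating groups each dataset provides, using the epa_summary_statistics data frame introduced later on this page.
# Count the rating groups (all, male, female, etc.) available per dataset.
# Assumes library(actdata) and library(dplyr) have been run.
epa_summary_statistics %>%
  dplyr::distinct(dataset, group) %>%
  dplyr::count(dataset, name = "n_groups")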
Available dictionaries
This package contains data from 40 different publicly available affect control theory dictionary data sets. Basic information on these dictionaries is shown in the table below. Detailed information on each dictionary, including references, is available at the end of this page.
Please cite these data! The data sets included in this package were originally collected by a number of different research teams. When using them for publication, please cite both their publications about the data (linked in the dataset detail section below) and this package.
Not sure which data set to use? If you simply need dependable recent ratings of a wide-ranging set of concepts in a US context (such as is used in much ACT behavior modeling research), I recommend the usfullsurveyor2015 or the usmturk2015 data sets. These are the largest and most recent general dictionaries that are currently available. Other data sets may be useful for questions regarding specific kinds of terms, other cultural contexts, or other points in time.
Dataset key | Country or context | Year | Statistics available | Components |
---|---|---|---|---|
artifactmods2022 | US | 2022 | mean, sd, cov | identity, modified_identity, artifact |
calcuttaall2017 | India | 2017 | mean, sd | identity, behavior, modifier, setting, artifact |
calcuttasubset2017 | India | 2017 | mean, sd | identity, behavior, modifier, setting, artifact |
china1999 | China | 1999 | mean | identity, behavior, modifier, setting |
dukecommunity2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
dukestudent2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
egypt2015 | Egypt | 2012-2014 | mean, sd, cov | identity, behavior, modifier |
employeeorg2022 | US | 2022 | mean, sd | identity |
expressive2002 | US | 2002 | mean | behavior |
gaymensanfrancisco1980 | US | 1980 | mean | behavior |
generaltech2020 | US | 2020 | mean, sd | artifact |
germany1989 | Germany | 1989 | mean | identity, behavior, modifier |
germany2007 | Germany | 2007 | mean | identity, behavior, modifier, setting |
groups2017 | US | 2017 | mean | identity |
groups2019 | US | 2019 | mean | identity |
household1994 | US | 1994 | mean | identity, behavior |
humanvalues2022 | US | 2022 | mean, sd, cov | value |
indiana2003 | US | 2003 | mean | identity, behavior, modifier, setting |
internationaldomesticrelations1981 | Unknown | 1981 | mean | behavior |
internet1998 | Internet | 1998 | mean | identity, behavior, setting |
japan1995 | Japan | 1989-2002 | mean | identity, behavior, modifier, setting |
morocco2015 | Morocco | 2015 | mean, sd, cov | identity, behavior, modifier |
mostafaviestimates2022 | US | 2022 | mean, sd | identity, behavior, modifier |
nc1978 | US | 1978 | mean | identity, behavior, modifier, setting |
nireland1977 | Northern Ireland | 1977 | mean | identity, behavior |
nounphrasegrammar2019 | US | 2019 | mean, sd | identity |
occs2019 | US | 2019 | mean, sd, cov | identity |
occs2020 | US | 2020 | mean, sd, cov | identity |
occs2021 | US | 2021 | mean, sd, cov | identity |
ontario1980 | Canada | 1980-1986 | mean | identity, behavior, modifier |
ontario2001 | Canada | 2001-2003 | mean | identity, behavior, modifier, setting |
politics2003 | US | 2003 | mean | identity, behavior |
products2022 | US | 2022 | mean, sd, cov | artifact |
techvshuman2021 | US | 2021 | mean | identity |
texas1998 | US | 1998 | mean | identity, behavior, modifier |
uga2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
ugatech2008 | US | 2008 | mean, sd | identity, behavior, artifact |
usfullsurveyor2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
usmturk2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
usstudent2015 | US | 2015 | mean, sd, cov | identity, behavior, modifier |
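A similar overview can be rebuilt programmatically from the summary data itself. The sketch below is illustrative only (assuming actdata and dplyr are loaded) and groups the epa_summary_statistics data frame, described in the next section, by dataset.
# Rebuild a compact overview of the available dictionaries from the summary data.
# Assumes library(actdata) and library(dplyr) have been run.
dict_overview <- epa_summary_statistics %>%
  dplyr::group_by(dataset, context, year) %>%
  dplyr::summarize(
    components = paste(sort(unique(component)), collapse = ", "),
    .groups = "drop"
  )
head(dict_overview)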
Accessing dictionary data: summary statistics
Within the package, all summary data is stored in one data frame named epa_summary_statistics. This data frame contains all available EPA means and, when available, institution codes, variances, covariances, and respondent numbers for all terms in all dictionaries. There is one row per term-dataset-gender group.
These data are identical to (and usually sourced directly from) those provided in other places, such as affectcontroltheory.org, Interact Java, and a legacy affect control theory website.
Researchers will rarely need or want to work with all of these data at one time. To easily build subsets of this data frame, use the epa_subset() function. This function allows users to search by term, filter by dataset, return only certain summary statistics, and more. Both the original data frame and these subsets are provided in long form, making them easy to manipulate further using the Tidyverse if needed.
The columns in this data set correspond to those in the dictionary metadata table above. However, to aid sorting, only one year (instead of a range) is provided for all data sets.
# Return all sample average entries for terms containing "friend"
contains_friend <- epa_subset(expr = "friend", group = "all")
head(contains_friend)
#> # A tibble: 6 × 25
#> term component dataset context year group instcodes E P A n_E
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 best_… identity calcut… India 2017 all 11 10000… 2.84 2.5 2.08 40
#> 2 boyfr… identity calcut… India 2017 all 10 00000… 1.33 1.03 0.73 40
#> 3 ex_bo… identity calcut… India 2017 all 10 00000… 0.13 0.3 0.23 39
#> 4 ex_gi… identity calcut… India 2017 all 01 00000… 0.21 0.04 -0.05 39
#> 5 friend identity calcut… India 2017 all 11 10000… 2.54 2.16 2.42 40
#> 6 frien… modifier calcut… India 2017 all 10 01000… 1.84 1.36 1.68 36
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> # sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> # cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
# Return all entries for the identity "friend"
friend <- epa_subset(expr = "friend", exactmatch = TRUE, component = "identity")
head(friend)
#> # A tibble: 6 × 25
#> term component dataset context year group instcodes E P A n_E
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 friend identity calcut… India 2017 male 11 10000… 2.73 2.21 2.5 20
#> 2 friend identity calcut… India 2017 fema… 11 10000… 2.34 2.11 2.34 20
#> 3 friend identity calcut… India 2017 all 11 10000… 2.54 2.16 2.42 40
#> 4 friend identity calcut… India 2017 male 11 10000… 2.65 1.99 2.79 19
#> 5 friend identity calcut… India 2017 fema… 11 10000… 2.34 2.45 3.4 20
#> 6 friend identity calcut… India 2017 all 11 10000… 2.49 2.3 3.16 39
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> # sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> # cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
# Return the entire Ontario 1980 dataset
all_ontario1980 <- epa_subset(dataset = "ontario1980")
head(all_ontario1980)
#> # A tibble: 6 × 25
#> term component dataset context year group instcodes E P A n_E
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 aband… behavior ontari… Canada 1980 all 01 11111… -2.74 0.01 0.74 NA
#> 2 aband… behavior ontari… Canada 1980 male 01 11111… -2.64 -0.04 0.82 NA
#> 3 aband… behavior ontari… Canada 1980 fema… 01 11111… -2.84 0.06 0.66 NA
#> 4 aband… modifier ontari… Canada 1980 all 10 00010… -2.68 -1.40 -0.59 NA
#> 5 aband… modifier ontari… Canada 1980 male 10 00010… -2.71 -1.46 -0.43 NA
#> 6 aband… modifier ontari… Canada 1980 fema… 10 00010… -2.64 -1.33 -0.75 NA
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> # sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> # cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
# Return this same dataset, but only include the female mean values for behaviors
f_mean_ontario1980 <- epa_subset(dataset = "ontario1980", group = "female", component = "behavior", stat = "mean")
head(f_mean_ontario1980)
#> # A tibble: 6 × 10
#> term component dataset context year group instcodes E P A
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 abandon behavior ontario… Canada 1980 fema… 01 11111… -2.84 0.06 0.66
#> 2 abuse behavior ontario… Canada 1980 fema… 10 10100… -3.03 1.21 1.88
#> 3 acclaim behavior ontario… Canada 1980 fema… 10 00010… 1.9 0.8 0
#> 4 accommodate behavior ontario… Canada 1980 fema… 10 11111… 2.31 0.87 0.12
#> 5 accuse behavior ontario… Canada 1980 fema… 10 11111… -1.73 0.97 0.85
#> 6 address behavior ontario… Canada 1980 fema… 10 11111… 1.36 0.36 -0.15
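Because the data are in long form, cross-dictionary comparisons require little extra code. The sketch below (an illustration, assuming actdata, dplyr, and the pipe are available) pulls the sample-average mean ratings of the identity “friend” from every dictionary that contains it and sorts them by evaluation.
# Compare mean EPA ratings of the identity "friend" across dictionaries.
# Assumes library(actdata) and library(dplyr) have been run.
friend_by_dict <- epa_subset(expr = "friend", exactmatch = TRUE,
                             component = "identity", group = "all",
                             stat = "mean") %>%
  dplyr::arrange(dplyr::desc(E))
head(friend_by_dict)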
Accessing dictionary data: individual-level data
EPA summary information is likely to be sufficient for most research questions. Existing research in affect control theory almost always uses summary data. However, the respondent-level data used to compute these summaries may also be useful in particular instances.
A number of data sets, all from 2015 or later, include individual-level data, newly made publicly available in this package. Some of these are subsets of others. These are the following (see the dictionary table above for more information):
- morocco2015
- egypt2015
- usmturk2015
- dukestudent2015
- uga2015
- dukecommunity2015
- usstudent2015 (a combination of dukestudent2015 and uga2015)
- usfullsurveyor2015 (a combination of dukestudent2015, uga2015, and dukecommunity2015)
- occs2019
- occs2020
- artifactmods2022
- humanvalues2022
- products2022
All individual data is located in the individual data frame within this package. Like the summary datasets, these data are provided in long form, with one respondent’s ratings of one term per row. Where available, respondents’ gender, race, and age are also included.
To subset these data, use the epa_subset() function with the datatype argument set to “individual”.
When using any of the US 2015 individual-level data sets, keep in mind that usstudent2015 and usfullsurveyor2015 are combinations of other datasets. In the individual data, rows belonging to these sets are included, but are categorized as belonging to the more specific data set (dukestudent2015, uga2015, or dukecommunity2015). You may still use the usstudent2015 and usfullsurveyor2015 keys in epa_subset() with datatype = “individual”. The function will return rows from the appropriate combination of data sets.
# Return all individual-level ratings for identities from the egypt2015 data set
egypt_individual <- epa_subset(dataset = "egypt2015", datatype = "individual", component = "identity")
head(egypt_individual)
#> # A tibble: 6 × 16
#> dataset context year userid gender age raceeth race1 race2 hisp term
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 egypt2015 Egypt 2015 Egypt487 Female NA NA NA NA NA team…
#> 2 egypt2015 Egypt 2015 Egypt1483 Male NA NA NA NA NA trav…
#> 3 egypt2015 Egypt 2015 Egypt1653 Male NA NA NA NA NA pagan
#> 4 egypt2015 Egypt 2015 Egypt1621 Male NA NA NA NA NA spor…
#> 5 egypt2015 Egypt 2015 Egypt1072 Male NA NA NA NA NA grad…
#> 6 egypt2015 Egypt 2015 Egypt165 Female NA NA NA NA NA pede…
#> # ℹ 5 more variables: component <chr>, instcodes <chr>, E <dbl>, P <dbl>,
#> # A <dbl>
# Return all ratings by female-identifying students in the usstudent2015 data set
female_students <- epa_subset(dataset = "usstudent2015", datatype = "individual") %>%
dplyr::filter(gender == "Female")
head(female_students)
#> # A tibble: 6 × 16
#> dataset context year userid gender age raceeth race1 race2 hisp term
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 dukestudent… US 2015 DStu6… Female 21 NA Blac… No a… NA some…
#> 2 dukestudent… US 2015 DStu5… Female 19 NA Whit… No a… NA some…
#> 3 dukestudent… US 2015 DStu6… Female 20 NA Whit… No a… NA some…
#> 4 dukestudent… US 2015 DStu5… Female 20 NA Whit… No a… NA some…
#> 5 dukestudent… US 2015 DStu5… Female 22 t… NA Blac… No a… NA aban…
#> 6 dukestudent… US 2015 DStu5… Female 18 NA Whit… No a… NA aban…
#> # ℹ 5 more variables: component <chr>, instcodes <chr>, E <dbl>, P <dbl>,
#> # A <dbl>
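The individual-level data can also be aggregated back into summary statistics. The sketch below (illustrative only, assuming actdata and dplyr are loaded) computes means and standard deviations of E, P, and A by term for the egypt2015 identities; the results should approximate the published summary values.
# Aggregate individual ratings of egypt2015 identities into per-term summaries.
# Assumes library(actdata) and library(dplyr) have been run.
egypt_summary_check <- epa_subset(dataset = "egypt2015",
                                  datatype = "individual",
                                  component = "identity") %>%
  dplyr::group_by(term) %>%
  dplyr::summarize(
    dplyr::across(c(E, P, A),
                  list(mean = ~ mean(.x, na.rm = TRUE),
                       sd = ~ sd(.x, na.rm = TRUE))),
    .groups = "drop"
  )
head(egypt_summary_check)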
Term table
One of the main goals of this package is to make it easy to compare meaning across dictionaries. To this end, the package provides a data frame called term_table that shows at a glance which terms are included in which dictionaries. Each column in this table represents a dictionary (labeled with its key) and each row is a term. Cell entries (0/1) indicate whether or not the specified dictionary contains the specified term. The table can easily be modified further to generate summaries across a set of dictionaries of interest. To see the entries for only a particular component, to search by term, or to limit to a particular set of dictionaries, use dplyr functions to filter the term table.
# the whole table
head(term_table)
#> # A tibble: 6 × 42
#> term component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 abort… identity 1 1 1 1
#> 2 abort… modified… 1 0 0 0
#> 3 abort… modified… 1 0 0 0
#> 4 abort… modified… 1 0 0 0
#> 5 abort… modified… 1 0 0 0
#> 6 abort… modified… 1 0 0 0
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> # egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> # gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> # germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> # humanvalues2022 <dbl>, indiana2003 <dbl>,
#> # internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> # japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …
# settings only
set_tt <- term_table %>%
dplyr::filter(component == "setting")
head(set_tt)
#> # A tibble: 6 × 42
#> term component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 abort… setting 0 1 1 0
#> 2 adult… setting 0 1 1 1
#> 3 airpl… setting 0 1 1 1
#> 4 ambul… setting 0 1 1 1
#> 5 amuse… setting 0 1 1 1
#> 6 april… setting 0 1 1 0
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> # egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> # gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> # germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> # humanvalues2022 <dbl>, indiana2003 <dbl>,
#> # internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> # japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …
# limit to only the two Germany dictionaries and exclude terms in neither
german_tt <- term_table %>%
dplyr::select(term, component, germany1989, germany2007) %>%
dplyr::filter(germany1989 + germany2007 >= 1)
head(german_tt)
#> # A tibble: 6 × 4
#> term component germany1989 germany2007
#> <chr> <chr> <dbl> <dbl>
#> 1 adolescent identity 1 0
#> 2 beginner identity 1 0
#> 3 brat identity 1 0
#> 4 brute identity 1 0
#> 5 buddy identity 1 0
#> 6 bully identity 0 1
# limit to terms that contain "friend"
friend_tt <- term_table %>%
dplyr::filter(stringr::str_detect(term, "friend"))
head(friend_tt)
#> # A tibble: 6 × 42
#> term component artifactmods2022 calcuttaall2017 calcuttasubset2017 china1999
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 best_… identity 0 1 1 0
#> 2 boyfr… identity 0 1 1 1
#> 3 ex_bo… identity 0 1 1 0
#> 4 ex_gi… identity 0 1 1 0
#> 5 friend identity 0 1 1 0
#> 6 frien… modifier 0 1 1 1
#> # ℹ 36 more variables: dukecommunity2015 <dbl>, dukestudent2015 <dbl>,
#> # egypt2015 <dbl>, employeeorg2022 <dbl>, expressive2002 <dbl>,
#> # gaymensanfrancisco1980 <dbl>, generaltech2020 <dbl>, germany1989 <dbl>,
#> # germany2007 <dbl>, groups2017 <dbl>, groups2019 <dbl>, household1994 <dbl>,
#> # humanvalues2022 <dbl>, indiana2003 <dbl>,
#> # internationaldomesticrelations1981 <dbl>, internet1998 <dbl>,
#> # japan1995 <dbl>, morocco2015 <dbl>, mostafaviestimates2022 <dbl>, …
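Since the term_table cells are 0/1 indicators, simple column sums yield term counts. The sketch below (an illustration, assuming actdata and dplyr are loaded) counts how many identity terms each dictionary contains.
# Count the number of identity terms in each dictionary using term_table.
# Assumes library(actdata) and library(dplyr) have been run.
identity_counts <- term_table %>%
  dplyr::filter(component == "identity") %>%
  dplyr::summarize(dplyr::across(-c(term, component), sum))
identity_counts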
Institution codes
Contexts restrict the labels that we consider reasonable choices. For instance, if two people are discussing next year’s budget in a business meeting, it would seem quite unlikely for one to label the other as a “priest”. A business-related label, like “manager” or “employee”, or a label that applies across a variety of contexts, like “genius” or “jerk”, would seem more realistic.
EPA dictionaries usually contain 14-digit binary strings known as “institution codes” that encode the social contexts within which terms apply. These codes can be used by analysis software when simulating interaction.
Valid categories are (see Heise’s 2007 book Expressive Order for details):
- male, female: What genders terms can typically be applied to (identities only)
- overt, surmised: Whether labeling behaviors requires interpretation or insight on the part of the observer (behaviors only)
- place, time: Type of setting (settings only)
- lay, business, law, politics, academe, medicine, religion, family, sexual: Social institutions that terms may belong to. Identities, behaviors, and settings only.
- monadic, group, corporal: How a term requires or implicates others. Identities, behaviors, and settings only.
- adjective, adverb: Part of speech (modifiers)
- emotion, trait, status, feature, emotion_spiral: Categories for modifiers.
This package provides several ways to demystify and make use of these codes. See the function documentation for more details.
- The epa_subset() function takes an institutions argument that allows a user to filter by institution.
- The expand_instcodes() function converts institution code strings into columns containing TRUE/FALSE/NA values. These values indicate whether a category is applicable to a term and, if so, whether the term belongs to the category. Representing institutions in this way makes it easier for users to work with them.
- The create_instcode() function takes a component and a set of logical values indicating category membership and returns a properly formatted institution code binary string. This is useful for creating new institution codes.
businesslaw <- epa_subset(dataset = "morocco2015", institutions = c("business", "law"), stat = "mean") %>%
dplyr::select(term, component, instcodes, E, P, A)
head(businesslaw)
#> # A tibble: 6 × 6
#> term component instcodes E P A
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 abandon behavior 01 111111110 000 -1.81 -1.03 1.2
#> 2 abuse behavior 10 101000011 100 -2.62 -1.96 1.13
#> 3 accuse behavior 10 111111111 010 -1.88 -0.97 1.34
#> 4 address behavior 10 111111110 010 1.52 1.26 -0.64
#> 5 admonish behavior 10 011111111 000 2.58 2.09 -1.4
#> 6 advise behavior 10 011111110 000 2.27 2.02 -1.47
# the default is to keep terms for which there are no institution codes.
# Change this behavior using the drop.na.instcodes argument in epa_subset().
businesslaw <- expand_instcodes(businesslaw) %>%
dplyr::select(-E, -P, -A)
#> At least one of the institution codes is NA.
head(businesslaw, 7)
#> # A tibble: 7 × 19
#> term component instcodes male female overt surmised lay business law
#> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 abandon behavior 01 111111… NA NA FALSE TRUE TRUE TRUE TRUE
#> 2 abuse behavior 10 101000… NA NA TRUE FALSE TRUE FALSE TRUE
#> 3 accuse behavior 10 111111… NA NA TRUE FALSE TRUE TRUE TRUE
#> 4 address behavior 10 111111… NA NA TRUE FALSE TRUE TRUE TRUE
#> 5 admonish behavior 10 011111… NA NA TRUE FALSE FALSE TRUE TRUE
#> 6 advise behavior 10 011111… NA NA TRUE FALSE FALSE TRUE TRUE
#> 7 advisor identity NA NA NA NA NA NA NA NA
#> # ℹ 9 more variables: politics <lgl>, academe <lgl>, medicine <lgl>,
#> # religion <lgl>, family <lgl>, sexual <lgl>, monadic <lgl>, group <lgl>,
#> # corporal <lgl>
newcode <- create_instcode(component = "setting", place = TRUE, family = TRUE, religion = TRUE)
print(paste("The institution code", newcode, "represents a setting that is a place and is relevant to only the family and religion domains."))
#> [1] "The institution code 10 000000110 000 represents a setting that is a place and is relevant to only the family and religion domains."
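Institution codes combine naturally with the subsetting tools above. As a rough sketch (assuming actdata and dplyr are loaded), the code below expands the codes for nc1978 settings and keeps only those flagged as relevant to the family institution.
# Find settings in the nc1978 dictionary coded as relevant to the family institution.
# Assumes library(actdata) and library(dplyr) have been run.
family_settings <- epa_subset(dataset = "nc1978", component = "setting",
                              group = "all", stat = "mean") %>%
  expand_instcodes() %>%
  dplyr::filter(family)
head(family_settings)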
Dataset collection details
Data collection efforts are typically intended to be either general or specific. General data collections aim to provide information on the typical, normative culture of a place or context, usually a country. They measure EPA for terms that apply to a wide range of social situations, and use samples of respondents that are argued to be either representative of a country’s population at large or “cultural experts” who are deeply familiar with the types of spaces in which the culture of interest is reproduced (Heise 2007). Data from these general data collections is often used in simulations of actor-behavior-object ACT events. Specific data collections measure EPA for terms in only a particular domain of interest, and may recruit respondents from a target subpopulation. Such data are sometimes used in simulations of events, but often they are collected to better describe meanings in a particular domain.
Below, in approximate reverse chronological order within general/specific categories, are details including links for more information about the collection of each of the datasets contained in this package.
General dictionaries
mostafaviestimates2022
Description: These EPA values were estimated using Mostafavi, Porter, and Robinson’s Bidirectional Encoder Representations from Transformers (BERT) model. Most terms from previously collected dictionaries are included.
Sample: Not applicable; values were estimated, not empirically measured. However, the model was trained on the 2015 US Full Surveyor (usfullsurveyor2015) dataset.
Authors: Moeen Mostafavi, Michael D. Porter, Dawn T. Robinson
Relevant publications/more information: Mostafavi, Porter, and Robinson 2022
Calcutta 2017: calcuttaall2017 and calcuttasubset2017
Description: Semantic differential ratings of 1,469 concepts in Bengali, a language spoken by about 250 million individuals in eastern India and Bangladesh. The calcuttaall2017 dataset contains summary information from all respondents. The calcuttasubset2017 dataset contains summary information calculated using data from just respondents who used scales as expected. See linked paper for details.
Sample: 20 male and 20 female native Bengali speakers living in Calcutta, India
Year collected: 2013
Authors: Shibashis Mukherjee, David Heise
Relevant publications/more information: Mukherjee and Heise 2017
United States 2015: dukecommunity2015, dukestudent2015, uga2015, usfullsurveyor2015, usmturk2015, usstudent2015
Description: Ratings of 929 identities, 814 behaviors, and 660 modifiers collected between 2012 and 2014 from several samples of people living in the United States: students at the University of Georgia (uga2015), students at Duke University (dukestudent2015), non-students living in Durham, North Carolina (dukecommunity2015), and US-based workers on Amazon Mechanical Turk (usmturk2015). Some keys refer to combinations of datasets. usstudent2015 is the combination of dukestudent2015 and uga2015. usfullsurveyor2015 is the combination of both student samples and the community sample (dukestudent2015, uga2015, and dukecommunity2015).
Sample: n = 1368 undergraduates at the University of Georgia (uga2015), n = 216 students at Duke University (dukestudent2015), n = 159 non-students living in Durham, North Carolina (dukecommunity2015), and n = 2615 US Amazon Mechanical Turk workers.
Year collected: 2012-2014
Authors: Lynn Smith-Lovin, Dawn T. Robinson, Bryan C. Cannon, Jesse K. Clark, Robert Freeland, Jonathan H. Morgan, Kimberly B. Rogers
Relevant publications/more information/citations: usstudent, usfullsurveyor, uga, usmturk
egypt2015
Description: 397 identities, 368 behaviors, and 233 modifiers
Sample: 1716 residents of Cairo, Egypt
Year collected: 2012-2014
Authors: Hamid Latif, Lynn Smith-Lovin, Dawn T. Robinson, Bryan C. Cannon, Brent Curdy, Darys J Kriegel and Jonathan H. Morgan
Relevant publications/more information: Link to source, Robinson, Smith-Lovin, and Zhao 2020
morocco2015
Description: 397 identities, 368 behaviors, and 233 modifiers
Sample: n = 1546 residents of Rabat, Morocco
Year collected: 2014-2015
Authors: Lynn Smith-Lovin, A. Soudi, Dawn T. Robinson, Bryan C. Cannon, Brent Curdy, Darys J. Kriegel, Jonathan H. Morgan
Relevant publications/more information: Link to source, Robinson, Smith-Lovin, and Zhao 2020
germany2007
Description: 376 identities, 393 behaviors, and 331 modifiers.
Sample: Ratings were obtained with Surveyor from 1905 subjects (734 male and 1171 female) from all over Germany. The research was advertised as a “study of language and emotion” in an extensive recruitment campaign including mailing lists from different universities, weblogs, newspaper reports and radio interviews. Most of the participants (N = 1029) were between 20 and 29 years of age, but the sample covered all ages, including N = 129 younger than 20 and N = 92 older than 60 years. The data of 83 persons (4.4%) were excluded from the analysis because they indicated that German was not their mother tongue. On average, each stimulus was rated by 29.5 male and 46.4 female raters.
Year collected: 2007
Authors: Tobias Schröder
Relevant publications/more information: Schröder 2011. Raw data available through Interact Java applet.
indiana2003
Description: Ratings of 500 Identities, 500 Behaviors, 300 Modifiers, and 200 Settings were collected at Indiana University, via the Internet using the Surveyor applet.
Sample: n = 1027 Indiana University students enrolled in the business and arts and sciences schools who lived in the U.S.A. at age 16; the sample was about equally male and female.
Year collected: 2002-2003
Authors: Clare Francis and David R. Heise
Relevant publications/more information: Information on sample, information on term selection. Raw data available through Interact Java applet.
ontario2001
Description: Data on 993 Identities, 601 Behaviors, 500 Modifiers, and 200 Settings were gathered with the Attitude program from Guelph, Ontario, undergraduates in 2001-2002. Data on settings were gathered with the Surveyor program at Guelph in 2003. Funded by the Social Science and Humanities Research Council of Canada.
Sample: University of Guelph undergraduate students
Year collected: 2001-2003
Authors: Neil J. MacKinnon
Relevant publications/more information: Luke 2010, Affect Control Theory website entry. Raw data available through Interact Java applet.
china1999
Description: Ratings of 449 Identities, 300 Behaviors, 98 Emotions, 150 Traits, and 149 Settings were obtained with the Attitude program.
Sample: About 380 undergraduate students at Fudan University in Shanghai, China
Year collected: 1991
Authors: Herman W. Smith and Yi Cai
Relevant publications/more information: Zhao 2022. Affect control theory website. Raw data available through Interact Java applet.
texas1998
Description: Ratings of 443 Identities, 278 Behaviors, 65 Modifiers, and 1 Setting were collected at Texas Tech University with program Attitude.
Sample: There is some disagreement between sources regarding sample size. Schneider 2007 says 420 undergraduate students received a small monetary incentive for rating 413 identities. The affect control theory website entry gives a sample size of 482. It may be that this data set combines the Texas Tech ratings with ratings of additional terms collected around the same time from University of Missouri students by Herm Smith (see Schneider 2007).
Year collected: 1998
Authors: Andreas Schneider
Relevant publications/more information: Schneider 2007. Raw data available through Interact Java applet.
japan1995
Description:
From Affect Control Theory website: Ratings of 403 Identities and 307 Behaviors, and a few Settings were obtained with the Attitude program from 323 Tohoku University students in 1989. In 1995 and 1996, 120 women students at Kyoritsu Women’s, Japan Women’s, and Teikyo Universities and 120 men students at Teikyo and Rikkyo Universities rated an additional 300 settings, 300 modifiers (mainly traits), 200 business identities, and 75 behaviors. Yoichi Murase (Rikkyo University) and Nozomu Matsubara (Tokyo University) provided access to students who rated 102 emotions, 70 behaviors and 55 identities in 2002 using Surveyor. Total numbers of entries in Interact lexicon are: 713 Identities, 455 Behaviors, 426 Modifiers, and 300 Settings. Number of male or female raters generally is about 30 for each concept.
From Smith, Matsuno, and Umino 1994 and Smith, Umino, and Matsuno 1998: 403 identities and 307 behaviors were rated by a convenience sample of 25 men and 25 women at a national university in Japan.
Sample: University students in Japan. Some disagreement among sources regarding sample size.
Year collected: 1989-2002
Authors: Herman W. Smith, Takanori Matsuno, Shuuichirou Ike, and Michio Umino
Relevant publications/more information: Smith, Matsuno, and Umino 1994, Smith, Umino, and Matsuno 1998. Raw data available through Interact Java applet.
germany1989
Description:
From Affect Control Theory website: Ratings of 442 Identities, 295 Behaviors, and 67 Modifiers, selected for back-translatability with the 1978 U.S.A. dictionary, were obtained with the Attitude program from 520 Mannheim students. Subjects were matched to the American undergraduate population by proportional inclusion of 12th- and 13th-grade Gymnasiasten from two German Gymnasien, along with Studenten des Grundstudiums from Mannheim University, which attracts students mainly from the Rhein-Neckar region in former West Germany.
From Schneider 2004: To correspond to the undergraduate population in the United States, subjects were pupils in the thirteenth grade of Gymnasien as well as university students. A total of 380 subjects were recruited from Mannheim University and two Gymnasien in Mannheim, a large industrial city; these institutions attract students mainly from the Rhein-Neckar region in former West Germany.
Sample: German students; disagreement among sources about sample size.
Year collected: 1989
Authors: Andreas Schneider
Relevant publications/more information: Schneider 2004. Raw data available through Interact Java applet.
ontario1980
Description: Data on 843 Identities and 593 Behaviors were obtained with paper questionnaires in 1980-1983, and 495 Modifiers were added in 1985-1986. Funded by the Social Science and Humanities Research Council of Canada.
Sample: 5534 (identities and behaviors) or 1260 (modifiers) undergraduate students at the University of Guelph. Each term was rated by approximately 35 men and 35 women.
Year collected: 1980-1986
Authors: Neil J. MacKinnon
Relevant publications/more information: MacKinnon and Luke 2002. Raw data available through Interact Java applet.
nc1978
Description: From Affect Control Theory website: Ratings of 721 Identities, 600 Behaviors, 440 Modifiers, and 345 Settings were obtained with paper questionnaires from 1,225 North Carolina undergraduates. (Ratings for some emotion words in this data set were obtained by Heise from Indiana University undergraduates in 1985.) Funded by National Institute of Mental Health Grant 1-R01-MH29978-01-SSR.
Sample: From Smith-Lovin 1988: Data were collected from students in social sciences and humanities classes at the University of North Carolina at Chapel Hill in the 1977-78 school year. Each out-of-context stimulus was rated by approximately 56 subjects. Approximately half of the raters for each stimulus were males and half females.
Year collected: Mostly 1977-1978; some emotions from 1985
Authors: Lynn Smith-Lovin and David R. Heise
Relevant publications/more information: Smith-Lovin 1988. Raw data available through Interact Java applet.
nireland1977
Description: Ratings of 528 Identities and 498 Behaviors were obtained with paper questionnaires.
Sample: 319 Belfast teenagers in Catholic high schools
Year collected: 1977
Authors: Willigan and Heise
Relevant publications/more information: Affect Control Theory website entry. Raw data available through Interact Java applet.
Specific dictionaries
artifactmods2022
Description: Ratings of 58 identities, 52 physical artifacts, and 212 artifact-modified identities across a range of identities and artifact types.
Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.
Year collected: 2015
Authors: Rohan Lulham and Daniel B. Shank
Relevant publications/more information: Lulham and Shank 2022
employeeorg2022
Description: Ratings of organizations (e.g., library) and their employees (e.g., employee of a library).
Sample: 118-119 participants in the U.S. recruited through Amazon Mechanical Turk.
Year collected: Unknown
Authors: Daniel B. Shank and Alexander Burns
Relevant publications/more information: Shank and Burns 2022
humanvalues2022
Description: Ratings of 393 values, traits, etc. Collected with data for Lulham and Shank 2022 (artifactmods2022).
Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.
Year collected: 2015
Authors: Rohan Lulham, Daniel B. Shank, and Clementine Thurgood
Relevant publications/more information: Lulham and Shank 2022
products2022
Description: Ratings of 132 everyday products. Collected with data for Lulham and Shank 2022 (artifactmods2022).
Sample: n = 825 participants recruited through Amazon Mechanical Turk who had lived in the U.S. over half their lives.
Year collected: 2015
Authors: Rohan Lulham, Daniel B. Shank, and Clementine Thurgood
Relevant publications/more information: Lulham and Shank 2022
techvshuman2021
Description: Ratings of 59 roles as performed by a human, a computer, or an AI identity (e.g., a product assembling employee, a product assembling computer system, a product assembling artificial intelligence).
Sample: n = 549 people recruited using Prolific who had lived over half their lives in the United States.
Year collected: Unknown
Authors: Daniel B. Shank, Madison Bowen, Alexander Burns, Matthew Dew
Relevant publications/more information: Shank, Bowen, Burns, and Dew 2021
generaltech2020
Description: Ratings of 25 general technology terms (e.g. bot, artificial intelligence).
Sample: 59 Amazon Mechanical Turk workers who had lived in the US at least half their lives.
Year collected: Unknown
Authors: Daniel B. Shank, Alexander Burns, Sophia Rodriguez, Madison Bowen
Relevant publications/more information: Shank, Burns, Rodriguez, and Bowen 2020
Occupations: occs2019 and occs2020
Description: Ratings of U.S. occupational categories. Wave 1, collected in 2019 and 2020, immediately before the onset of the COVID-19 pandemic, contains ratings for all 650 U.S. census occupational categories. Wave 2, collected in 2020 in the early months of the pandemic, contains ratings for 94 occupations, with a focus on those deemed “essential” in the pandemic context.
Sample: Quasi-nationally representative sample from an online Qualtrics panel, n = 2726
Year collected: 2019-2020
Authors: Joseph M. Quinn, Robert E. Freeland, Kimberly B. Rogers, Jesse Hoey, Lynn Smith-Lovin
Relevant publications/more information: Quinn, Freeland, Rogers, Hoey, and Smith-Lovin 2022
nounphrasegrammar2019
Description: Participants rated 28 social identity concepts, each either a count or a collective noun, presented in one of five grammatical forms. These data have been used to examine the influences of determiners (a/an, the, and all) and grammatical number (singular or plural) on the affective meaning of social identity concepts.
Sample: Between 372 and 384 Amazon Mechanical Turk workers living in the United States.
Year collected: Unknown
Authors: Daniel B. Shank, Sarah E. Hercula, Brent Curdy
Relevant publications/more information: Shank, Hercula, and Curdy 2019
groups2019
Description: Ratings of 170 group-related concepts. These were collected as pilot data for Shank, Hercula, and Curdy 2019 (nounphrasegrammar2019) and Shank and Burns 2022 (employeeorg2022).
Relevant publications/more information: Shank, Hercula, and Curdy 2019, Shank and Burns 2022
groups2017
Description: EPA ratings of 64 group-related concepts, including primary groups, task groups, categories of people, and collectives.
Sample: 155 Amazon Mechanical Turk workers who had lived in the US the majority of their lives.
Year collected: 2017
Authors: Daniel B. Shank and Alexander Burns
Relevant publications/more information: Shank and Burns 2018
ugatech2008
Description: Ratings of 80 technology-related items and concepts.
Sample: n = 174 undergraduate students at the University of Georgia; 100 women and 74 men.
Year collected: 2008
Authors: Daniel B. Shank
Relevant publications/more information: Shank 2010
politics2003
Description: Ratings of political actors (individuals, collectives, groups, and organizations) and behaviors (individual and social).
Sample: Students in an introduction to sociology course at the University of Missouri-St. Louis; 47 men and 74 women.
Year collected: Unclear
Authors: Kyle Irwin
Relevant publications/more information: Irwin 2003, link to source
expressive2002
Description: 98 nonverbal behavior terms collected in Study 1 of Rashotte 2002. Terms were divided into four subsets. Subset 1 of the behaviors was given to 50 men and 50 women; subset 2 to 47 men and 52 women; subset 3 to 39 men and 59 women; and subset 4 to 36 men and 61 women.
Sample: 230 female and 172 male students in introductory sociology courses at a large public university in the southwestern United States.
Year collected: Unclear
Authors: Lisa Slattery Walker (formerly Lisa Slattery Rashotte)
Relevant publications/more information: Rashotte 2002, link to source
internet1998
Description: Internet-related identities, behaviors, and settings rated by Internet users in 1998.
Sample: 2431 Internet users (56% male, 44% female) who responded to an ad run for six months on the Yahoo! search engine in 1998.
Year collected: 1998
Authors: Adam B. King
Relevant publications/more information: King 2001, link to source
household1994
Description: 53 terms related to household chores, family, and relationships.
Sample: 23 male and 46 female college students
Year collected: 1994
Authors: Amy Kroska
Relevant publications/more information: Link to source
internationaldomesticrelations1981
Description: 238 domestic and inter-state behaviors that nations can engage in.
Sample: A total of 57 respondents filled out both the international and domestic semantic differential rating booklets; 28 respondents belonged to a “general” sample (group variable = “nonprofessional”) and 29 respondents to an “elite” sample (group variable = “professional”). The assignment of individuals to the two groups was based upon their level of expertise regarding international relations. Members of the elite sample included university professors of international relations, members of the United States’ Departments of State and Defense, advanced graduate students of international relations, and members of consulting firms which contract for foreign policy work. The general sample was largely composed of university undergraduates along with other individuals who had limited or no substantive knowledge or understanding of the workings of the international system. (Azar and Lerner 1981)
Year collected: Unclear
Authors: Edward E. Azar and Steve Lerner
Relevant publications/more information: Azar and Lerner 1981, link to source
gaymensanfrancisco1980
Description: Ratings of 14 sex-related behaviors gathered from ten San Francisco gay men in the 1980s by Professor Don Barrett, California State University, San Marcos. The “unsafebetter” group is the mean EPA ratings made by those respondents who feel that unsafe sex practices are more pleasurable than safe sex practices. The “safebetter” group is the mean EPA ratings made by those respondents who think that safe sex practices are more pleasurable. Group = “all” returns the average ratings across the sample.
Sample: 10 gay men in San Francisco
Year collected: Unclear; 1980s
Authors: Donald C. Barrett
Relevant publications/more information: Link to source