EPA dictionary search and subset — epa

Returns a subset of the EPA summary or individual data that fulfills the given parameters. Filtering can be done by term, data set, component (identity, behavior, modifier, setting), type of data (summary or individual), statistics (mean, standard deviation, covariance), institutions the term belongs to, and gender of raters.

Usage

epa_subset(
  expr = ".*",
  exactmatch = FALSE,
  dataset = "everything",
  component = "everything",
  datatype = "summary",
  group = "everything",
  stat = "everything",
  stat_na_exclude = TRUE,
  instcodes = TRUE,
  institutions = "everything",
  drop.na.instcodes = FALSE
)

Arguments

expr: A term, regular expression, or list of terms or regular expressions to search for. If a list is provided, its entries are combined with "or", so every term matching at least one entry is returned. The default, ".*", matches all terms.
exactmatch: Logical indicating whether the function should return only exact matches to the expression provided. If FALSE (default), all terms containing the expression are returned.
dataset: The key of the data set (or list of multiple) to search in. Default is "everything". Call dict_info() to see available data sets.
component: The component of the dictionary to use (identity, behavior, modifier, setting). Default is "everything".
datatype: Whether to retrieve summary or individual data. Default is "summary".
group: The subgroup of respondents to use. Data sets are usually subgrouped by gender; options are male, female, and all. Default is "everything". Ignored when datatype is individual.
stat: The statistics to include in the returned subset. Options are mean, sd (standard deviation), cov (covariance), and n (number of raters). Default is "everything", which includes all of them. Terms that have no values for a requested statistic are excluded from the results. Ignored if datatype is individual.
stat_na_exclude: Logical indicating whether to exclude entries with NA values for any of the requested statistics. Default is TRUE. Ignored if stat is not specified or datatype is individual.
instcodes: Logical. Whether to include the institution codes in the output. Default is TRUE.
institutions: Character list. Institutions to include. Default is "everything".
drop.na.instcodes: Logical. When filtering by institution, whether to drop terms that have no institution code. Default is FALSE.
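
The interaction between expr and exactmatch can be easier to see on a toy example. The sketch below is not the package's implementation; it only mimics the documented behavior with base R. The vocabulary and the helper match_terms() are hypothetical and exist purely for illustration.

```r
# Hypothetical toy vocabulary; not taken from any real data set.
terms <- c("teacher", "schoolteacher", "student", "student_teacher", "doctor")

# Hypothetical helper mimicking the documented expr / exactmatch behavior.
match_terms <- function(terms, expr = ".*", exactmatch = FALSE) {
  patterns <- unlist(expr)
  if (exactmatch) {
    # Anchor each pattern so that it must match the whole term.
    patterns <- paste0("^(", patterns, ")$")
  }
  # "or" semantics: keep a term if it matches at least one pattern.
  hits <- Reduce(`|`, lapply(patterns, function(p) grepl(p, terms)))
  terms[hits]
}

# Default: every term containing the expression.
match_terms(terms, "teacher")

# Exact match: only the term equal to the expression.
match_terms(terms, "teacher", exactmatch = TRUE)

# A list of expressions: terms matching either entry.
match_terms(terms, list("doctor", "^student"))
```

Under these assumptions, the first call keeps "teacher", "schoolteacher", and "student_teacher", the second keeps only "teacher", and the third keeps "student", "student_teacher", and "doctor".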

Value

A dataset containing the entries that match the given parameters, or FALSE if no matches are found.
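
Because the return value is either a dataset or FALSE, calling code may want to check for FALSE before treating the result as a table. The following is a minimal sketch of one way to do that; n_matches() is a hypothetical helper, and the two stand-in values are made up rather than produced by epa_subset().

```r
# Hypothetical helper: number of matched rows, treating FALSE as "no matches".
n_matches <- function(result) {
  if (isFALSE(result)) {
    return(0L)
  }
  nrow(result)
}

# Stand-ins for the two kinds of return value (made-up data).
no_hits <- FALSE
some_hits <- data.frame(term = c("teacher", "student"), E = c(2.2, 1.5))

n_matches(no_hits)    # 0
n_matches(some_hits)  # 2
```

Using isFALSE() rather than a comparison such as result == FALSE avoids an element-wise comparison when the result is a table.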

Examples

epa_subset("teacher")
#> # A tibble: 201 × 25
#>    term  component dataset context year  group instcodes     E     P     A   n_E
#>    <chr> <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl> <dbl>
#>  1 scho… identity  calcut… India   2017  male  NA         2.02  1.47  0.83    20
#>  2 scho… identity  calcut… India   2017  fema… NA         1.88  1.14  1.01    20
#>  3 scho… identity  calcut… India   2017  all   NA         1.95  1.3   0.92    40
#>  4 scho… identity  calcut… India   2017  male  NA         2.15  1.72  1.95    20
#>  5 scho… identity  calcut… India   2017  fema… NA         2.27  1.83  1.47    20
#>  6 scho… identity  calcut… India   2017  all   NA         2.21  1.77  1.71    40
#>  7 scho… identity  calcut… India   2017  male  NA         2.09  1.47  1.62    20
#>  8 scho… identity  calcut… India   2017  fema… NA         1.68  1.39  0.5     20
#>  9 scho… identity  calcut… India   2017  all   NA         1.89  1.43  1.06    40
#> 10 teac… identity  calcut… India   2017  male  11 00001…  2.2   1.99  1.91    20
#> # ℹ 191 more rows
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
epa_subset(dataset = "politics2003")
#> # A tibble: 216 × 25
#>    term       component dataset context year  group instcodes     E     P      A
#>    <chr>      <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl>  <dbl>
#>  1 alderman   identity  politi… US      2003  all   NA        0.735 0.905  0.335
#>  2 alderman   identity  politi… US      2003  male  NA        0.82  0.96   0.15 
#>  3 alderman   identity  politi… US      2003  fema… NA        0.65  0.85   0.52 
#>  4 analyze_s… behavior  politi… US      2003  all   NA        2.00  1.50  -0.325
#>  5 analyze_s… behavior  politi… US      2003  male  NA        2.32  1.83  -0.53 
#>  6 analyze_s… behavior  politi… US      2003  fema… NA        1.67  1.16  -0.12 
#>  7 assembly   identity  politi… US      2003  all   NA        1.38  1.24   1.09 
#>  8 assembly   identity  politi… US      2003  male  NA        1.51  1.05   1.06 
#>  9 assembly   identity  politi… US      2003  fema… NA        1.25  1.43   1.12 
#> 10 authorize… behavior  politi… US      2003  all   NA        1.10  2.04   0.51 
#> # ℹ 206 more rows
#> # ℹ 15 more variables: n_E <dbl>, n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
epa_subset(expr = ".*woman", component = "identity", group = c("male", "female"),
    institutions = c("lay", "business"))
#> # A tibble: 72 × 25
#>    term  component dataset context year  group instcodes     E     P     A   n_E
#>    <chr> <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl> <dbl>
#>  1 batt… identity  calcut… India   2017  male  01 10100… -0.58 -0.22  0.33    12
#>  2 batt… identity  calcut… India   2017  fema… 01 10100…  0     0.43 -0.61    16
#>  3 preg… identity  calcut… India   2017  male  NA         0.46  0.27  1.02    16
#>  4 preg… identity  calcut… India   2017  fema… NA         1.97  1.23  0.56    19
#>  5 woman identity  calcut… India   2017  male  01 10000…  1.24  0.58 -0.03    15
#>  6 woman identity  calcut… India   2017  fema… 01 10000…  1.2   0.55  0.85    14
#>  7 woma… identity  calcut… India   2017  male  NA         0.55  0.34  0.11    20
#>  8 woma… identity  calcut… India   2017  fema… NA         1.04  0.57  0       20
#>  9 batt… identity  calcut… India   2017  male  01 10100… -0.58 -1.37 -2.02    12
#> 10 batt… identity  calcut… India   2017  fema… 01 10100…  0     0.13  0.01    16
#> # ℹ 62 more rows
#> # ℹ 14 more variables: n_P <dbl>, n_A <dbl>, sd_E <dbl>, sd_P <dbl>,
#> #   sd_A <dbl>, cov_EE <dbl>, cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>,
#> #   cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
epa_subset(dataset = "morocco2015", stat = "cov", stat_na_exclude = FALSE)
#> # A tibble: 1,448 × 16
#>    term     component dataset context year  group instcodes cov_EE cov_EP cov_EA
#>    <chr>    <chr>     <chr>   <chr>   <chr> <chr> <chr>      <dbl>  <dbl>  <dbl>
#>  1 abandon  behavior  morocc… Morocco 2015  all   01 11111…   4.43   1.76  -1.6 
#>  2 abortio… identity  morocc… Morocco 2015  all   11 00000…   5.5    1.66  -1.39
#>  3 abuse    behavior  morocc… Morocco 2015  all   10 10100…   4.12   0.37  -1.69
#>  4 abusive  modifier  morocc… Morocco 2015  all   10 01000…   5.98   3.34  -1.1 
#>  5 accommo… modifier  morocc… Morocco 2015  all   10 01000…   4.38   2.8   -0.87
#>  6 accuse   behavior  morocc… Morocco 2015  all   10 11111…   6      1.66  -1.15
#>  7 address  behavior  morocc… Morocco 2015  all   10 11111…   3.61   1.43  -1.68
#>  8 admonish behavior  morocc… Morocco 2015  all   10 01111…   4.4    1.23  -1.88
#>  9 adolesc… identity  morocc… Morocco 2015  all   11 10000…   3.03   1.23  -0.19
#> 10 adult    identity  morocc… Morocco 2015  all   11 10000…   4.21   2.43   0.3 
#> # ℹ 1,438 more rows
#> # ℹ 6 more variables: cov_PE <dbl>, cov_PP <dbl>, cov_PA <dbl>, cov_AE <dbl>,
#> #   cov_AP <dbl>, cov_AA <dbl>
epa_subset(dataset = "usmturk2015", datatype = "individual")
#> # A tibble: 264,844 × 16
#>    dataset     context year  userid gender age   raceeth race1 race2 hisp  term 
#>    <chr>       <chr>   <chr> <chr>  <chr>  <chr> <chr>   <chr> <chr> <chr> <chr>
#>  1 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    tele…
#>  2 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    bewi…
#>  3 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    coal…
#>  4 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    unde…
#>  5 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    appl…
#>  6 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    gran…
#>  7 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    love 
#>  8 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    barr…
#>  9 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    humb…
#> 10 usmturk2015 US      2015  MTurk1 Female 51    NA      Whit… No a… No    land…
#> # ℹ 264,834 more rows
#> # ℹ 5 more variables: component <chr>, instcodes <chr>, E <dbl>, P <dbl>,
#> #   A <dbl>