Format a data frame for use in BayesACT or bayesactR
Source: R/export_for_analysis.R
format_for_bayesact.Rd
This function converts dictionary subsets into the format that BayesACT
requires as input. The reformatted data frames can be passed to the
bayesactR::add_actor() function in the bayesactR package (recommended),
or they can be saved and provided directly to the BayesACT C interface.
Arguments
- data
data frame or tibble object to reformat
- stat
character. The summary statistic to include. Options are "mean", "sd", and "cov".
Value
A reformatted data frame. Output data frames have the following columns, in left to right order:
A term column of type "character"
3 columns containing the given E, P, and A mean values (in that order)
For summary statistic type "mean": 3 numeric placeholder columns. BayesACT does not use them in calculations, but requires them in its inputs for compatibility with the Interact Java format.
For summary statistic type "sd": 3 numeric columns containing the given E, P, and A standard deviation values.
For summary statistic type "cov": 9 numeric columns containing the given covariance values. These are, in order, EE, EP, EA, PE, PP, PA, AE, AP, AA. Note that the covariance matrix is symmetric, so some columns duplicate others (EP and PE, EA and AE, PA and AP).
An institution codes column. BayesACT does not use this information, but does require it.
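The column order given for the "cov" statistic corresponds to reading a 3 x 3 covariance matrix row by row. The following is a minimal sketch in base R of how that order and the duplicated entries arise. The matrix values are made up for illustration, and the "cov_" prefix on the names follows the input column naming described in Details; it is an assumption that output columns carry the same names.

```r
# Dimension labels, in E, P, A order.
dims <- c("E", "P", "A")

# outer() builds a 3x3 matrix of label pairs; transposing before flattening
# gives row-major order: EE, EP, EA, PE, PP, PA, AE, AP, AA.
pairs <- as.vector(t(outer(dims, dims, paste0)))
cov_cols <- paste0("cov_", pairs)

# A made-up symmetric covariance matrix for a single term (illustrative only).
sigma <- matrix(c(1.2, 0.3, 0.1,
                  0.3, 0.9, 0.2,
                  0.1, 0.2, 1.5),
                nrow = 3, byrow = TRUE, dimnames = list(dims, dims))

# Flatten the matrix row by row and label each entry.
cov_row <- setNames(as.vector(t(sigma)), cov_cols)

# Because the matrix is symmetric, mirrored entries hold the same value.
stopifnot(cov_row[["cov_EP"]] == cov_row[["cov_PE"]])
```

Here cov_row holds nine values, of which only six are distinct: the three variances and three covariances.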
Details
Input data frames have the following requirements:
must have a column labeled "term" that is coercible to character. Duplicate entries are allowed but not recommended in this column.
must have three columns titled "E", "P", and "A" that are coercible to numeric. These will be treated as sentiment mean scores in BayesACT.
if summary statistic type is sd (standard deviation), must have three columns with titles "sd_E", "sd_P", "sd_A" that are coercible to numeric.
if summary statistic type is cov (covariance), must have nine columns with titles "cov_XX", where XX ranges over all nine ordered two-letter pairs of E, P, and A (EE, EP, EA, PE, PP, PA, AE, AP, AA), that are coercible to numeric
if an institution codes column (titled "instcodes") is not provided, an arbitrary one will be added (this information is required but not used by BayesACT)
all other columns will be dropped
These column name and format requirements are generally satisfied by subsets
of the epa_summary_statistics dataset created by epa_subset().
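For a data frame built by hand rather than with epa_subset(), it may help to check the column requirements before reformatting. The sketch below uses base R only. The helper required_columns() is hypothetical and not part of the package, and the terms and values in the example data frame are made up.

```r
# Hypothetical input with made-up values; only the column names follow the
# requirements listed in Details.
input <- data.frame(
  term = c("doctor", "patient"),
  E = c(1.9, 0.6), P = c(1.7, -0.9), A = c(0.2, -0.7),
  sd_E = c(1.1, 1.3), sd_P = c(1.0, 1.2), sd_A = c(1.4, 1.1),
  notes = c("x", "y"),  # extra column; per the documentation it would be dropped
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): the column names that the
# documentation lists as required for a given summary statistic.
required_columns <- function(stat = c("mean", "sd", "cov")) {
  stat <- match.arg(stat)
  dims <- c("E", "P", "A")
  extra <- switch(stat,
    mean = character(0),
    sd   = paste0("sd_", dims),
    cov  = paste0("cov_", as.vector(t(outer(dims, dims, paste0))))
  )
  c("term", dims, extra)
}

# Columns that are required but absent from the input.
missing_sd  <- setdiff(required_columns("sd"), names(input))
missing_cov <- setdiff(required_columns("cov"), names(input))
```

In this example missing_sd is empty, so the data frame has the columns needed for format_for_bayesact(input, stat = "sd"), while missing_cov lists the nine "cov_XX" columns that would have to be added before requesting stat = "cov".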
Examples
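if (FALSE) {
  # Subset the identity terms from the egypt2015 dataset, with covariances
  data <- epa_subset(dataset = "egypt2015", component = "identity",
                     institutions = c("law", "business"), stat = "cov")
  # Reformat the subset for use with BayesACT
  data_bayesformat <- format_for_bayesact(data, stat = "cov")
}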