Setting up and running simulations

library(bayesactR)
#> Loading required package: actdata
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(actdata)

There are three stages to running BayesACT simulations: a setup stage, a run stage, and an analysis stage. The setup and run stages are described on this page. For more information on output and analysis, visit the results information page.

Setup: Creating input files

BayesACT needs some information about actors, their relationships with one another, and the structure of their interaction. These parameters are provided to the C package via specially formatted text files. These files can be created by hand, and many examples are provided in the BayesACT C package and documentation. However, bayesactR also provides functions that generate them automatically. Automatic generation of the required input files is more reproducible and less error-prone than manual creation, and it is easier when running batches of simulations.

The information needed to generate the proper input files can be divided into three types: information about individual actors, information about the actors’ initial relationships with one another, and information about the kinds of events that can occur in the simulation. Within bayesactR, this information is structured in a way that parallels the structure often used in social network analysis and agent-based modeling. Information specific to individual actors is stored in a data frame structured like a nodelist. Information about relationships between actors is stored in a data frame structured like an edgelist. Finally, information about events is stored in a sequentially ordered data frame; this information is conceptually similar to the algorithms that define action in agent-based models.

Actor nodelist

The actor nodelist contains information that is specific to each actor in a simulation. Specifically, we need to know which sentiment dictionaries and equations represent each actor’s understanding of the world, and (optionally) we can define parameter values that control how they manage uncertainty. bayesactR pairs with actdata to make specifying dictionaries and equations easy.

The recommended way to create the nodelist is to generate it with a pair of provided functions, blank_nodelist() and add_actor().

Dictionaries

Each actor needs four dictionaries representing (a) meanings of identities they assign to themselves, (b) meanings they assign to behaviors, (c) meanings of identities they assign to their interaction partners, and (d) meanings assigned to modifiers and emotions. These dictionaries can be provided in one of three ways (mixing and matching is allowed):

Using actdata data set keys. If you are working with publicly available ACT sentiment dictionaries and equation sets, you can specify dictionaries using keywords from the actdata package. This package is a repository for standardized versions of many publicly available ACT sentiment dictionaries and equation data sets, and it and bayesactR were developed to complement each other. If using dictionaries and/or equations from actdata, just provide the applicable keyword as the dicts argument. If you wish to use different dictionaries for the four components, provide four dictionary keys as a character vector (e.g., c("uga2015", "nc1978", "uga2015", "uga2015")). For information about available data sets and group subsets, see the actdata help pages on dictionaries or call actdata::dict_info().
Using data frame objects. Dictionaries can be provided as data frame objects. This is particularly useful when you wish to use a subset of terms from a public dictionary. For example, perhaps you only want your agents to be able to take a limited set of behaviors, or identities from just one institution, rather than having access to the whole list. The actdata::epa_subset() function makes creating subsets from public data straightforward. You can also provide your own data as a properly formatted data frame; see actdata::format_for_bayesact(), which checks and fixes this formatting for you. You can either provide a single data frame with a column titled “component” that determines which rows apply to which of the four categories above, or provide a list of four data frames in the order above (use list() rather than c() to create this list).
Using file paths. Finally, you may provide a filepath to the dictionary files. These must already be properly formatted for BayesACT: bayesactR does not check or reformat them, and simply passes the files directly to the C code.

Below is an example showing the syntax for specifying dictionaries using data set keys. Further down the page there is an example using data frame objects that are subsets of public dictionaries.

library(bayesactR)

# blank_nodelist() creates an empty data frame with the correct column labels
nodelist <- blank_nodelist()

# add_actor() appends a line representing an actor to this data frame. If dictionaries, equations, or dict/eqn stats or groups are not specified, they will revert to defaults. 
nodelist <- add_actor(nodelist, 
                      name = "Ingrid", 
                      dicts = "germany2007", 
                      eqns = "germany2007", eqns_group = "all")

# To add another actor, use add_actor() again. Different parameter values can be specified for each actor.
# For Felix we use the actdata keyword for the Germany 2007 sentiment dictionary and equations, and we use the values collected from men.
nodelist <- add_actor(nodelist, 
                      name = "Felix", 
                      dicts = "germany2007", 
                      dict_group = "male", 
                      eqns = "germany2007", eqns_group = "all", 
                      alphas = 1)

knitr::kable(nodelist)

name	dict	dict_stat	dict_group	eqns	eqns_group	alphas
Ingrid	germany2007, germany2007, germany2007, germany2007	mean	all	germany2007	all	NA
Felix	germany2007, germany2007, germany2007, germany2007	mean	male	germany2007	all	1

A note about dictionaries and cross-cultural interaction

When two actors are based in the same culture, it is reasonable to assign them the same set of dictionaries and equations. When two actors are from different cultures, we may want to instead assign them different dictionaries and equations. This is possible in BayesACT, but there is a caveat: the dictionaries for all actors (except for modifier dictionaries) must contain the same sets of words. If the set of terms differs between dictionaries, one agent will not be able to comprehend the action performed or identity assigned by the other, and BayesACT will crash. The recommended workaround is simply to subset each of the desired dictionaries so that each contains only the terms that are present in all others. The actdata::epa_subset() function in actdata makes this kind of manipulation reasonably straightforward.
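The subsetting workaround can be sketched in base R with toy data. The two data frames below are made-up stand-ins for real dictionaries (the terms and E values are hypothetical and are not drawn from any actdata data set); only the term column matters for the check.

```r
# Toy stand-ins for two cultures' identity dictionaries (values are invented).
dict_a <- data.frame(term = c("teacher", "student", "genius", "bore"),
                     E = c(1.5, 1.2, 1.9, -1.1),
                     stringsAsFactors = FALSE)
dict_b <- data.frame(term = c("student", "teacher", "bore", "judge"),
                     E = c(1.0, 1.7, -0.8, 0.9),
                     stringsAsFactors = FALSE)

# Terms that appear in both dictionaries
shared_terms <- intersect(dict_a$term, dict_b$term)

# Keep only the shared terms in each dictionary
dict_a_sub <- dict_a[dict_a$term %in% shared_terms, ]
dict_b_sub <- dict_b[dict_b$term %in% shared_terms, ]

# Both subsets should now contain exactly the same set of terms
setequal(dict_a_sub$term, dict_b_sub$term)
```

A check like setequal() on the term columns is a cheap way to confirm, before writing any input files, that no actor can be handed a term that is missing from the other actor's dictionary.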

Equations

In addition to dictionaries, each actor also needs two sets of equation coefficients:

1. Impression equation coefficients, which determine ideal elements of A-B-O (actor-behavior-object) events. In actdata, these are referred to as type “impressionabo”.
2. Emotion equation coefficients, which estimate emotional reactions to events. In actdata, these are referred to as type “emotionid”.

As with dictionaries, these equations can be provided using actdata keys, data frames, or filepaths. If using an actdata key, also pay attention to the eqns_group argument of add_actor(): not all equation type-group combinations are available. Call eqn_info() to see which combinations are valid.

Call ?add_actor() for more details on creating the nodelist.

Other parameters

The nodelist is also where optional parameters that control how actors manage uncertainty and strange situations can be adjusted. These parameters are alpha, beta, and delta. See ?add_actor() for more details.

An example:

In this example, we say that Sally is American and so uses the meanings in one of the recent U.S. dictionaries. We say Reem is Egyptian and uses meanings from Egyptian dictionaries. We subset both the U.S. and Egypt dictionaries to contain the same set of identities and behaviors.

BayesACT takes into account uncertainty around identity meanings. This uncertainty can be represented either by an arbitrary constant around the mean values or by standard deviation or covariance information calculated from EPA measurement data. Most older public datasets contain only mean values, but more recent data collections (2015 or newer) also contain standard deviation and covariance information. In this example we use two recent datasets and we perform the BayesACT simulation using the covariance information they contain.

egypt_identity <- actdata::epa_subset(dataset = "egypt2015", 
                                      component = "identity", 
                                      group = "all", 
                                      stat = c("mean", "cov")) %>% 
  dplyr::semi_join(actdata::epa_subset(dataset = "usfullsurveyor2015", 
                                       component = "identity", 
                                       group = "all"), 
                   by = "term")

us_identity <- actdata::epa_subset(dataset = "usfullsurveyor2015", 
                                   component = "identity", 
                                   group = "all", 
                                   stat = c("mean", "cov")) %>% 
  dplyr::semi_join(actdata::epa_subset(dataset = "egypt2015", 
                                       component = "identity", 
                                       group = "all"), 
                   by = "term")

egypt_behavior <- actdata::epa_subset(dataset = "egypt2015", 
                                      component = "behavior", 
                                      group = "all", 
                                      stat = c("mean", "cov")) %>% 
  dplyr::semi_join(actdata::epa_subset(dataset = "usfullsurveyor2015", 
                                       component = "behavior", 
                                       group = "all"), 
                   by = "term")

us_behavior <- actdata::epa_subset(dataset = "usfullsurveyor2015", 
                                   component = "behavior", 
                                   group = "all", 
                                   stat = c("mean", "cov")) %>% 
  dplyr::semi_join(actdata::epa_subset(dataset = "egypt2015", 
                                       component = "behavior", 
                                       group = "all"), 
                   by = "term")

head(egypt_identity)
#> # A tibble: 6 × 19
#>   term  component dataset context year  group instcodes     E     P     A cov_EE
#>   <chr> <chr>     <chr>   <chr>   <chr> <chr> <chr>     <dbl> <dbl> <dbl>  <dbl>
#> 1 abor… identity  egypt2… Egypt   2015  all   11 00000… -1.89 -0.58  0.88   5.99
#> 2 adol… identity  egypt2… Egypt   2015  all   11 10000…  0.17  0.06  0.6    4.45
#> 3 adult identity  egypt2… Egypt   2015  all   11 10000…  1.37  0.94  0.02   3.31
#> 4 adul… identity  egypt2… Egypt   2015  all   10 00000… -2.99 -1.88  2.26   4.26
#> 5 adul… identity  egypt2… Egypt   2015  all   01 00000… -3.44 -2.19  2.14   2.3 
#> 6 air_… identity  egypt2… Egypt   2015  all   11 00010…  2.17  1.95  0.14   3.51
#> # ℹ 8 more variables: cov_EP <dbl>, cov_EA <dbl>, cov_PE <dbl>, cov_PP <dbl>,
#> #   cov_PA <dbl>, cov_AE <dbl>, cov_AP <dbl>, cov_AA <dbl>
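To make the nine cov_* columns easier to read, the sketch below arranges one row's values into a 3 x 3 covariance matrix over the E, P, and A dimensions. The column naming scheme is taken from the output above, but the numbers are invented for illustration and the row-by-row layout is an assumption about how the names map to matrix cells, not a description of how BayesACT reads them.

```r
# One row's covariance columns, named as in the output above.
# Only the naming scheme comes from the data; the numbers here are invented.
cov_row <- c(cov_EE = 2.0, cov_EP = 0.5, cov_EA = 0.1,
             cov_PE = 0.5, cov_PP = 1.5, cov_PA = 0.3,
             cov_AE = 0.1, cov_AP = 0.3, cov_AA = 1.8)

dims <- c("E", "P", "A")

# Fill row by row so that cov_EP lands in row E, column P
cov_mat <- matrix(cov_row, nrow = 3, byrow = TRUE,
                  dimnames = list(dims, dims))

cov_mat

# A covariance matrix should be symmetric: cov_EP equals cov_PE, and so on
isSymmetric(cov_mat)
```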

# We can provide these data frames by passing a list to the dicts argument.
# The order is c(agent_identity, agent_behavior, object_identity, agent_emotion).
# Modifier term sets do not have to match, so instead of going to the trouble of creating modifier subsets, 
# here we pass the actdata dataset key for the modifier slot instead.
nodelist <- blank_nodelist()
nodelist <- add_actor(nodelist, 
                      name = "Sally", 
                      dicts = list(us_identity, us_behavior, us_identity, "usfullsurveyor2015"))
# Reem also uses the Egyptian equations (the default, which Sally uses, is us2010).
nodelist <- add_actor(nodelist, 
                      name = "Reem", 
                      dicts = list(egypt_identity, egypt_behavior, egypt_identity, "egypt2015"), 
                      eqns = "egypt2014", eqns_group = c("all", "f"), 
                      alphas = 1)

Interaction edgelist

The interaction edgelist contains information that defines relationships between actors, in particular the identities they will ascribe to themselves and their alter at the outset of an interaction. The process for creating this edgelist is very similar to that for creating the nodelist. Call ?add_interaction() for more details.

# creates a blank data frame with the correct column names
edgelist <- blank_edgelist()

# Note that interactions are directed--how one actor views herself does not necessarily match how her partner views her. 
# The focal actor is referred to as the agent, and the partner is referred to as the object or client. 
# Sally views herself as a teacher and Reem as a student when they interact.
edgelist <- add_interaction(edgelist, 
                            agent = "Sally", object = "Reem", 
                            agent_ident = "teacher", agent_ident_prob = 1, 
                            object_ident = "student", object_ident_prob = 1)

# When she interacts with Sally, Reem usually sees herself as a student (p = .9) but sometimes as a genius (p = .1). She usually sees Sally as a teacher (p = .85) but occasionally as a bore (p = .15). 
edgelist <- add_interaction(edgelist, 
                            agent = "Reem", object = "Sally", 
                            agent_ident = c("student", "genius"), agent_ident_prob = c(.9, .1), 
                            object_ident = c("teacher", "bore"), object_ident_prob = c(.85, .15))

knitr::kable(edgelist)

agent	object	agent_ident	agent_ident_prob	object_ident	object_ident_prob
Sally	Reem	teacher	1	student	1
Reem	Sally	student, genius	0.9, 0.1	teacher, bore	0.85, 0.15
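When an actor can hold more than one identity, the identity labels and their probabilities need to line up, and the probabilities should sum to 1. The helper below is a hypothetical pre-check written in base R; check_ident_probs() is not part of bayesactR, and the tolerance value is an arbitrary choice.

```r
# Hypothetical helper (not part of bayesactR): checks that identity labels and
# probabilities have matching lengths, that no probability is negative, and
# that the probabilities sum to 1 within a small tolerance.
check_ident_probs <- function(idents, probs, tol = 1e-8) {
  length(idents) == length(probs) &&
    all(probs >= 0) &&
    abs(sum(probs) - 1) < tol
}

check_ident_probs(c("student", "genius"), c(.9, .1))   # valid
check_ident_probs(c("teacher", "bore"), c(.85, .15))   # valid
check_ident_probs(c("teacher", "bore"), c(.85, .25))   # invalid: sums to 1.1
```

Comparing against a tolerance rather than testing sum(probs) == 1 avoids false alarms from floating-point rounding.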

Event list

The last piece of input information is the event list. The event list is a data frame that has one line per turn in the simulation and defines who can act and what they can do on that turn.

bayesactR provides a function, basic_event_df(), for generating relatively simple events files in which actors take either the BayesACT-optimal action, the Interact-optimal action, or a specific action from the dictionary on each of their turns. Actors must alternate turns at some regular interval.

It is possible to use more complex events files (see the BayesACT C package documentation), but this simple structure should suffice for many applications. It can also be used as a base from which to build more complex specifications.

The contents of this file are more cryptic than the nodelist and edgelist. Each row represents one turn in the simulation. On each turn, one or both actors will have an entry in their action column; this is what they will do on that turn. Similarly, neither, one, or both may have an entry in their emotion column; this is the emotion they will express. Asterisks in these columns indicate that the action or emotion will be the one that is optimal (least deflecting) according to BayesACT. Exclamation points mean that it is optimal according to affect control theory (Interact). A plus sign after the entry indicates that a small amount of noise will be added to the other party’s perception of the action or emotion.
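The symbols described above can be summarized with a small decoder. This is only an illustrative sketch: describe_event_code() is a hypothetical helper, not a bayesactR function, and the assumption that a specific dictionary term can be written with a trailing plus sign (as in "advise+") follows from the description above rather than from the BayesACT documentation.

```r
# Hypothetical decoder (not part of bayesactR) for a single action or emotion
# entry in the event list.
describe_event_code <- function(code) {
  # A trailing plus sign means noise is added to the other party's perception
  noisy <- grepl("\\+$", code)
  base <- sub("\\+$", "", code)

  what <- if (base == "*") {
    "BayesACT-optimal"
  } else if (base == "!") {
    "Interact-optimal"
  } else {
    paste0("specific term '", base, "'")
  }

  paste0(what,
         if (noisy) ", perceived with noise" else ", perceived without noise")
}

describe_event_code("*+")
describe_event_code("!")
describe_event_code("advise+")
```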

# Sally and Reem will take 10 turns using the default specifications: bayesact optimal actions and no emotion expression. A small amount of noise will be added to each person's perception of the other's action--this means that there is a chance actions will be misinterpreted by the observing party. 
eventlist <- basic_event_df(n = 10, 
                            actors = c("Sally", "Reem"), 
                            noise = c("a1_action", "a2_action"))

knitr::kable(eventlist)

agent	agent_action	object
Sally	*+	Reem
Reem	*+	Sally
Sally	*+	Reem
Reem	*+	Sally
Sally	*+	Reem
Reem	*+	Sally
Sally	*+	Reem
Reem	*+	Sally
Sally	*+	Reem
Reem	*+	Sally

Writing input data frames to file

Now that we have created the three data frames, we need to generate the text files that the BayesACT C code takes as input. bayesactR provides the write_input_from_df() function for this purpose. In addition to the three data frames, this function needs file names for the two text files that it will write. The sim file contains the actor and interaction information and must have the extension .txt. The event file contains the event information and must have the extension .events. This function also requires the filepath for the directory that houses the BayesACT C package on your machine (see the section on downloading and installing the BayesACT C package below). By default, the function will put the text files it generates in a directory called “bayesact_input” under your current working directory. If you want the files to be saved to a different directory, provide a filepath to the input_dir argument. This function returns the filepath at which it saved the input files.
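Because the two file names must carry specific extensions, it can help to check them before writing anything. The sketch below uses tools::file_ext() from base R; check_input_names() is a hypothetical helper and is not part of bayesactR.

```r
# Hypothetical pre-flight check (not part of bayesactR): stops with an error
# if either file name has the wrong extension.
check_input_names <- function(simfilename, eventfilename) {
  stopifnot(tools::file_ext(simfilename) == "txt",
            tools::file_ext(eventfilename) == "events")
  invisible(TRUE)
}

check_input_names("readme_simfile.txt", "readme_eventfile.events")

# The default input location described above, built from the working directory
default_input_dir <- file.path(getwd(), "bayesact_input")
default_input_dir
```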

write_input_from_df(nodelist, edgelist, eventlist, 
                    simfilename = "readme_simfile.txt", 
                    eventfilename = "readme_eventfile.events", 
                    bayesact_dir = "/path/to/my/bayesact/Cpackage/top/level/directory/")

Running BayesACT using bayesactR

The function used to run simulations is run_bayesact(). It requires the file name we gave the input sim file, the path to the top level directory of the BayesACT C package, the path to the directory where the input files were saved (if something other than “bayesact_input” under the working directory, which is the default), and the path where the output should be saved (“bayesact_output” under the current working directory is the default).

This will probably take a minute or two to run (longer if there are more events).

run_bayesact(simfilename = "readme_simfile.txt", 
             bayesact_dir = "/path/to/my/bayesact/Cpackage/top/level/directory/")

Batches of simulations

Now that we know how to set up, run, and analyze results from a single situation, we can think about how to efficiently scale up. It is likely that many BayesACT applications will require running simulations over a range of parameter settings, making running batches in a replicable way very useful.

The functions described above are all amenable to being run inside of loops. Notably, it may not be necessary to recreate every dataframe for every run. For example, if a user wants to run a number of simulations where the probability that the actor takes each of their possible identities (agent_ident_prob in the interaction edgelist) varies, only the edgelist needs to be modified. The nodelist and events file can be created once and passed to write_input_from_df() repeatedly.

# What happens if the probability that Reem sees Sally as a teacher versus a bore varies? Let's run 5 simulations with different values of object_ident_prob to find out.
# This is defined in the edgelist, so the nodelist and events files can just be created once. This code is the same as used above. 

nodelist <- blank_nodelist()
nodelist <- add_actor(nodelist, 
                      name = "Sally", 
                      dicts = list(us_identity, us_behavior, us_identity, "usfullsurveyor2015"))
nodelist <- add_actor(nodelist, 
                      name = "Reem", 
                      dicts = list(egypt_identity, egypt_behavior, egypt_identity, "egypt2015"), 
                      eqns = "egypt2014", eqns_group = c("all", "f"), 
                      alphas = 1)

eventlist <- basic_event_df(n = 6, 
                            actors = c("Sally", "Reem"), 
                            noise = c("a1_action", "a2_action"))

# We need to create a different edgelist for each of the simulations and write out different input files for each as well. We will do this in a for loop. 
# the list of probabilities of Reem seeing Sally as a teacher that we will loop over
p_teacher <- seq(.05, .95, .2)

for (i in seq_along(p_teacher)) {
  this_p_teacher <- p_teacher[i]
  # the probabilities of all identities combined must always sum to 1
  this_p_bore <- 1 - this_p_teacher
  
  
  edgelist <- blank_edgelist()
  edgelist <- add_interaction(edgelist, 
                              agent = "Sally", object = "Reem", 
                              agent_ident = "teacher", agent_ident_prob = 1, 
                              object_ident = "student", object_ident_prob = 1)
  edgelist <- add_interaction(edgelist, 
                              agent = "Reem", object = "Sally", 
                              agent_ident = c("student", "genius"), agent_ident_prob = c(.9, .1), 
                              object_ident = c("teacher", "bore"), object_ident_prob = c(this_p_teacher, this_p_bore))
  
  # each simulation file needs a different name
  simname <- paste0("readme_simfile_batch_", i, ".txt")
  
  # write out input files
  write_input_from_df(nodelist, edgelist, eventlist, 
                      simfilename = simname, 
                      eventfilename = "readme_eventfile_batch.events", 
                      bayesact_dir = "/path/to/my/bayesact/Cpackage/top/level/directory/")
  
  # run the current simulation
  run_bayesact(simname, bayesact_dir = "/path/to/my/bayesact/Cpackage/top/level/directory/")
}
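Before launching a batch, it can be useful to lay out the planned runs in a small table and confirm that the parameter values and file names are what you expect. The sketch below uses only base R and does not call BayesACT; the batch_plan data frame is an illustrative convenience, not something bayesactR requires.

```r
# The same grid of probabilities used in the loop above
p_teacher <- seq(.05, .95, .2)

# One row per planned simulation: the run index, both identity probabilities,
# and the sim file name that the loop would generate for that run
batch_plan <- data.frame(
  run = seq_along(p_teacher),
  p_teacher = p_teacher,
  p_bore = 1 - p_teacher,
  simfilename = paste0("readme_simfile_batch_", seq_along(p_teacher), ".txt"),
  stringsAsFactors = FALSE
)

batch_plan
```

Keeping a table like this alongside the output also makes it straightforward to match each result file back to the parameter values that produced it.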