Chapter 4 User Guide

This chapter provides detailed guidance on running simulations, configuring parameters, and interpreting results.

4.1 Running Your First Simulation

4.1.1 Basic Simulation

The simplest way to run a simulation uses all default parameters:

# Load the simulation engine
source("simulation/engine.R")

# Run with defaults
results <- run_asa_simulation()

4.1.2 Understanding the Output

The simulation returns a list with four components:

results$final_organization  # Final state data.table
results$metrics            # Time series metrics
results$parameters         # Parameters used
results$organization_snapshots  # Periodic snapshots
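
To see what each component holds before diving into analysis, inspect them directly (the exact columns depend on the metrics your engine version records):

str(results$parameters)                 # Parameters actually used for the run
head(results$metrics)                   # One row of metrics per time step
head(results$final_organization)        # One row per agent at the end of the run
length(results$organization_snapshots)  # Number of periodic snapshots saved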

4.2 Simulation Parameters

4.2.1 Overview of All Parameters

Parameter                        Type              Default                  Description
identity_categories              character vector  c("A","B","C","D","E")   Possible identity categories
growth_rate                      numeric           0.01                     Proportion to hire each cycle
hiring_frequency                 integer           12                       Steps between hiring cycles
selection_criteria               character         "conscientiousness"      How to select hires
n_interactions_per_step          integer           5                        Interactions per agent per step
interaction_window               integer           10                       Steps to consider for satisfaction
turnover_threshold               numeric           -10                      Satisfaction threshold for leaving
turnover_type                    character         "threshold"              Type of turnover model
base_turnover_rate               numeric           0.05                     Base probability of leaving
n_new_applicants                 integer           50                       New applicants per hiring cycle
applicant_attraction_threshold   numeric           -0.5                     Min attraction to stay in pool
max_application_time             integer           12                       Steps before application expires
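
Any subset of these parameters can be overridden; anything left out keeps its default. A minimal sketch, using the same run_asa_simulation() call introduced in Section 4.1:

# Override only the parameters you care about; the rest keep their defaults
my_params <- list(
  growth_rate = 0.03,        # 3% of current headcount hired each cycle
  hiring_frequency = 6       # Hire every 6 steps
)

results <- run_asa_simulation(params = my_params)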

4.2.2 Detailed Parameter Guide

4.2.2.1 Identity Categories

Controls the types of identities agents can have:

# Default categories (alphabetical labels)
params <- list(identity_categories = c("A", "B", "C", "D", "E"))

# Custom categories (e.g., departments) override the default
# alphabetical labels (A-E)
params <- list(identity_categories = c("Engineering", "Sales", 
                                      "Marketing", "Operations"))

4.2.2.2 Growth and Hiring

Configure organizational growth:

params <- list(
  growth_rate = 0.02,        # 2% growth per cycle
  hiring_frequency = 4,      # Hire every 4 steps
  n_new_applicants = 100,    # Large applicant pool
  selection_criteria = "fit" # Select based on fit
)

Selection criteria options:

  • "conscientiousness": Highest conscientiousness scores
  • "fit": Best person-organization fit
  • "random": Random selection (baseline)

4.2.2.3 Interaction Settings

Control how agents interact:

params <- list(
  n_interactions_per_step = 10,  # More interactions
  interaction_window = 20        # Longer memory
)

4.2.2.4 Turnover Configuration

Two turnover models available:

Threshold Model:

params <- list(
  turnover_type = "threshold",
  turnover_threshold = -5  # Leave if satisfaction < -5
)

Probabilistic Model:

params <- list(
  turnover_type = "probabilistic",
  base_turnover_rate = 0.10  # 10% base turnover
)

4.3 Common Simulation Scenarios

4.3.1 Scenario 1: High-Growth Startup

startup_params <- list(
  growth_rate = 0.10,           # 10% growth per hiring cycle
  hiring_frequency = 4,         # Hire every 4 steps
  selection_criteria = "fit",   # Culture fit is the priority
  turnover_threshold = -3,      # Low tolerance for dissatisfaction
  n_new_applicants = 200        # Large applicant pool
)

results <- run_asa_simulation(
  n_steps = 260,
  initial_size = 20,
  params = startup_params
)

4.3.2 Scenario 2: Stable Corporation

corp_params <- list(
  growth_rate = 0.005,          # 0.5% growth per hiring cycle
  hiring_frequency = 12,        # Hire every 12 steps (the default)
  selection_criteria = "conscientiousness",
  turnover_type = "probabilistic",
  base_turnover_rate = 0.02     # 2% base turnover probability
)

results <- run_asa_simulation(
  n_steps = 520,
  initial_size = 500,
  params = corp_params
)

4.3.3 Scenario 3: Diversity-Focused Organization

diversity_params <- list(
  growth_rate = 0.02,
  selection_criteria = "random",  # Reduce selection bias
  n_interactions_per_step = 20,   # Increase mixing
  interaction_window = 30         # Longer relationship building
)

# Also modify agent preferences
# (Requires custom initialization - see Developer Guide)

4.4 Analyzing Results

4.4.1 Time Series Analysis

library(data.table)  # the results tables are data.tables; also provides frollapply() used in Section 4.9
library(ggplot2)
library(dplyr)

# Calculate moving averages
results$metrics %>%
  mutate(
    ma_satisfaction = zoo::rollmean(avg_satisfaction, 10, fill = NA),
    ma_size = zoo::rollmean(size, 10, fill = NA)
  ) %>%
  ggplot(aes(x = time)) +
  geom_line(aes(y = avg_satisfaction), alpha = 0.3) +
  geom_line(aes(y = ma_satisfaction), color = "blue", linewidth = 1)

4.4.2 Identity Dynamics

# Extract identity proportions over time
identity_props <- results$organization_snapshots %>%
  lapply(function(snapshot) {
    snapshot[is_active == TRUE, .N, by = identity_category] %>%
      mutate(prop = N / sum(N), 
             time = snapshot$time[1])
  }) %>%
  bind_rows()

# Plot identity evolution
ggplot(identity_props, aes(x = time, y = prop, color = identity_category)) +
  geom_line(linewidth = 1) +
  labs(title = "Identity Category Evolution",
       y = "Proportion")

4.4.3 Turnover Analysis

# Calculate per-period turnover rates
# Approximate departures as step-to-step declines in headcount (hiring only
# happens every hiring_frequency steps, so most decreases reflect turnover)
turnover_analysis <- results$metrics %>%
  arrange(time) %>%
  mutate(
    period = floor(time / 12),  # 12-step periods
    departures = pmax(lag(size, default = first(size)) - size, 0)
  ) %>%
  group_by(period) %>%
  summarise(
    turnover_count = sum(departures),
    avg_size = mean(size),
    turnover_rate = turnover_count / avg_size
  )

4.5 Saving and Loading Results

4.5.1 Saving Simulation Output

# Save with automatic file naming
save_simulation_results(results, "my_simulation")

# Creates:
# - my_simulation_metrics.csv
# - my_simulation_params.rds
# - my_simulation_final_org.csv
# - my_simulation_snapshots.rds (if requested)

4.5.2 Loading Previous Results

# Load saved results
metrics <- fread("my_simulation_metrics.csv")
params <- readRDS("my_simulation_params.rds")
final_org <- fread("my_simulation_final_org.csv")

# Recreate results object
results <- list(
  metrics = metrics,
  parameters = params,
  final_organization = final_org
)

4.6 Batch Simulations

4.6.1 Parameter Sweeps

# Define parameter grid (keep strings as characters, not factors)
param_grid <- expand.grid(
  growth_rate = c(0.01, 0.02, 0.05),
  turnover_threshold = c(-10, -5, -2),
  selection_criteria = c("conscientiousness", "fit", "random"),
  stringsAsFactors = FALSE
)

# Run simulations
all_results <- list()
for(i in 1:nrow(param_grid)) {
  params <- as.list(param_grid[i,])
  
  results <- run_asa_simulation(
    n_steps = 260,
    initial_size = 100,
    params = params,
    verbose = FALSE
  )
  
  all_results[[i]] <- results$metrics %>%
    mutate(
      growth_rate = params$growth_rate,
      turnover_threshold = params$turnover_threshold,
      selection_criteria = params$selection_criteria,
      run_id = i
    )
}

# Combine results
combined_results <- bind_rows(all_results)

4.6.2 Replication Studies

# Run multiple replications
n_replications <- 10
replications <- list()

for(rep in 1:n_replications) {
  set.seed(rep)  # Different random seed
  
  results <- run_asa_simulation(
    n_steps = 260,
    initial_size = 100,
    params = my_params
  )
  
  replications[[rep]] <- results$metrics %>%
    mutate(replication = rep)
}

# Analyze variance across replications
bind_rows(replications) %>%
  group_by(time) %>%
  summarise(
    mean_size = mean(size),
    sd_size = sd(size),
    mean_satisfaction = mean(avg_satisfaction),
    sd_satisfaction = sd(avg_satisfaction)
  )

4.7 Performance Tips

4.7.1 Memory Management

# For large simulations, reduce snapshot frequency
results <- run_asa_simulation(
  n_steps = 1000,
  initial_size = 5000,
  params = list(
    snapshot_frequency = 50  # Only save every 50 steps
  )
)

# Clear memory between runs
rm(results)
gc()

4.7.2 Speed Optimization

# Reduce interaction frequency for faster runs
fast_params <- list(
  n_interactions_per_step = 2,  # Fewer interactions
  interaction_window = 5        # Shorter memory
)

# Profile simulation performance
library(profvis)
profvis({
  results <- run_asa_simulation(n_steps = 100)
})

4.8 Troubleshooting

4.8.1 Common Issues

No hiring occurring:

  • Check that growth_rate > 0
  • Verify hiring_frequency aligns with n_steps
  • Ensure the applicant pool is not empty

Rapid organization collapse:

  • Increase turnover_threshold (make it less negative)
  • Reduce base_turnover_rate
  • Check the satisfaction calculations

Unrealistic homogenization:

  • Increase n_interactions_per_step
  • Use selection_criteria = "random"
  • Verify diversity preferences
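
A few quick checks for these symptoms, sketched against the metric columns used elsewhere in this guide (size, avg_satisfaction, blau_index); adjust the names if your metrics table differs:

# No hiring: did headcount ever increase?
any(diff(results$metrics$size) > 0)

# Rapid collapse: distribution of step-to-step headcount changes
summary(diff(results$metrics$size))

# Homogenization: is diversity still changing, or has it flattened out?
tail(results$metrics$blau_index, 20)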

4.8.2 Debugging Tools

# Enable detailed logging
debug_results <- run_asa_simulation(
  n_steps = 20,
  initial_size = 10,
  verbose = TRUE,
  params = list(debug = TRUE)
)

# Inspect a specific saved snapshot (here, the first one)
snapshot_1 <- results$organization_snapshots[[1]]
summary(snapshot_1)

4.9 Metrics Deep Dive

Understanding the metrics output is crucial for interpreting simulation results. This section provides detailed explanations of each metric, their calculations, and what they reveal about organizational dynamics.

4.9.1 Overview of Metrics

The simulation tracks over 20 metrics at each time step, grouped into several categories:

  1. Organizational Composition: Size and identity distribution
  2. Diversity Indices: Multiple measures of heterogeneity
  3. Personality Distributions: Big Five trait statistics
  4. Satisfaction Metrics: Employee well-being indicators
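
To see exactly which metrics your build records, list the columns of the metrics table:

# List every metric column tracked at each time step
names(results$metrics)

# For example, the identity-composition columns
grep("^prop_", names(results$metrics), value = TRUE)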

4.9.2 Identity and Diversity Metrics

4.9.2.1 Blau’s Index (Default)

# Formula: 1 - Σ(p_i^2)
# Where p_i is the proportion of category i
  • Range: 0 (homogeneous) to 0.8 (maximum diversity with 5 categories)
  • Interpretation: Probability two randomly selected employees differ in identity
  • Why it matters: Standard I-O psychology metric for categorical diversity
  • Example: 0.75 indicates high diversity; 0.25 indicates one dominant group
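
Blau's index can be recomputed by hand from the active roster as a sanity check (a sketch using the is_active and identity_category columns referenced in Section 4.4):

# Recompute Blau's index from the final organization roster
p <- prop.table(table(
  results$final_organization[is_active == TRUE, identity_category]
))
blau <- 1 - sum(p^2)
blau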

4.9.2.2 Shannon Entropy

# Formula: -Σ(p_i * log(p_i))
# Where p_i is the proportion of category i
  • Range: 0 (homogeneous) to log(5) ≈ 1.61 (equal distribution)
  • Interpretation: Information-theoretic measure of uncertainty
  • Why it matters: More sensitive to rare categories than Blau’s
  • Example: 1.5 indicates near-equal distribution; 0.5 indicates strong dominance
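
Shannon entropy follows from the same category proportions; empty categories are dropped so the 0 * log(0) terms do not produce NaN:

# Shannon entropy from the identity-category proportions
p <- prop.table(table(
  results$final_organization[is_active == TRUE, identity_category]
))
p <- p[p > 0]                # drop empty categories (0 * log(0) is undefined)
shannon <- -sum(p * log(p))
shannon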

4.9.2.3 Category Proportions (prop_A through prop_E)

  • Range: 0 to 1 for each category
  • Interpretation: Fraction of employees in each identity category
  • Why it matters: Direct view of organizational composition
  • Patterns to watch:
    • Gradual drift toward homogeneity
    • Sudden shifts after mass turnover
    • Equilibrium distributions
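
If your metrics table includes these prop_ columns, they can be reshaped to long format and plotted directly. A sketch, assuming results$metrics is a data.table with the time_step column used later in this section:

library(data.table)
library(ggplot2)

# Reshape prop_A ... prop_E into long format and plot their trajectories
prop_long <- melt(
  results$metrics,
  id.vars = "time_step",
  measure.vars = patterns("^prop_"),
  variable.name = "category",
  value.name = "proportion"
)

ggplot(prop_long, aes(x = time_step, y = proportion, color = category)) +
  geom_line() +
  labs(title = "Identity Category Proportions Over Time", y = "Proportion")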

4.9.3 Personality Trait Metrics

For each Big Five trait, the simulation tracks:

4.9.3.1 Average Values (avg_openness, etc.)

  • Range: 0 to 1
  • Interpretation: Mean trait level in the organization
  • Organizational implications:
    • Openness: Innovation potential, change readiness
    • Conscientiousness: Reliability, performance orientation
    • Extraversion: Communication patterns, collaboration
    • Agreeableness: Conflict levels, team cohesion
    • Emotional Stability: Stress resistance, turnover risk

4.9.3.2 Standard Deviations (sd_openness, etc.)

  • Range: 0 to ~0.5 (theoretical max)
  • Interpretation: Trait heterogeneity in the organization
  • Why it matters:
    • Low SD indicates cultural convergence
    • High SD suggests diverse perspectives
    • Zero SD means complete homogenization
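
As a cross-check, trait dispersion can be computed straight from the roster. A sketch that assumes the final_organization table stores Big Five traits under columns such as conscientiousness and openness (verify the names in your build):

# Cross-check the reported sd_* metrics against the active roster
results$final_organization[is_active == TRUE,
  .(sd_conscientiousness = sd(conscientiousness),
    sd_openness          = sd(openness))]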

4.9.4 Satisfaction Metrics

4.9.4.1 Average Satisfaction (avg_satisfaction)

  • Range: Typically -20 to +20
  • Interpretation: Overall employee well-being
  • Key thresholds:
    • Above 0: Generally positive environment
    • Below -5: Risk of increased turnover
    • Below -10: Crisis level (default turnover threshold)
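
These thresholds are easy to monitor programmatically; for example, the first step at which the organization enters the turnover-risk zone (assuming the time_step and avg_satisfaction columns used in this section):

# First time step where average satisfaction falls below the risk threshold
risk_steps <- results$metrics$time_step[results$metrics$avg_satisfaction < -5]
if (length(risk_steps) > 0) risk_steps[1] else NA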

4.9.4.2 Satisfaction Standard Deviation (sd_satisfaction)

  • Range: 0 to ~10
  • Interpretation: Variation in employee experiences
  • Warning signs:
    • High SD with low average: Polarized organization
    • Increasing SD: Growing disparities
    • Very low SD: Possible groupthink

4.9.5 Interpreting Metric Interactions

4.9.5.1 The Diversity-Satisfaction Paradox

# Common pattern
plot(results$metrics$blau_index, results$metrics$avg_satisfaction)
  • High diversity often correlates with lower initial satisfaction
  • Homophily preferences drive this relationship
  • Long-term benefits may offset short-term costs

4.9.5.2 Personality Convergence Cascade

# Track convergence
convergence_rate <- diff(results$metrics$sd_conscientiousness)
  • Selection on one trait affects all traits
  • Convergence accelerates over time
  • Can lead to organizational blindspots

4.9.5.3 Turnover Spirals

# Identify spiral onset
turnover_indicator <- results$metrics$organization_size < 
                     lag(results$metrics$organization_size)
  • Low satisfaction → turnover → lower diversity → lower satisfaction
  • Critical to catch early
  • Intervention points: hiring strategy, satisfaction boost

4.9.6 Advanced Metric Analysis

4.9.6.1 Creating Composite Indices

# Organizational Health Index (weights are illustrative)
initial_size <- results$metrics$organization_size[1]  # Headcount at the first recorded step

results$metrics[, health_index :=
  0.3 * (avg_satisfaction + 10) / 20 +        # Normalized satisfaction
  0.3 * blau_index +                          # Diversity
  0.2 * (organization_size / initial_size) +  # Growth relative to start
  0.2 * (1 - sd_satisfaction / 10)            # Cohesion
]

4.9.6.2 Detecting Phase Transitions

# Find inflection points
library(changepoint)
cpt_diversity <- cpt.mean(results$metrics$blau_index)
plot(cpt_diversity)

4.9.6.3 Metric Stability Analysis

# Rolling window stability
window <- 26  # Half year
results$metrics[, `:=`(
  diversity_stability = frollapply(blau_index, window, sd),
  satisfaction_stability = frollapply(avg_satisfaction, window, sd)
)]

4.9.7 Visualization Best Practices

4.9.7.1 Multi-Metric Dashboard

library(ggplot2)
library(patchwork)

# Standardize metrics for comparison (as.numeric() drops the matrix
# attributes that scale() attaches)
results$metrics[, `:=`(
  std_diversity = as.numeric(scale(blau_index)),
  std_satisfaction = as.numeric(scale(avg_satisfaction)),
  std_size = as.numeric(scale(organization_size))
)]

# Create aligned time series
p_combined <- ggplot(results$metrics, aes(x = time_step)) +
  geom_line(aes(y = std_diversity, color = "Diversity")) +
  geom_line(aes(y = std_satisfaction, color = "Satisfaction")) +
  geom_line(aes(y = std_size, color = "Size")) +
  scale_color_manual(values = c("Diversity" = "purple", 
                               "Satisfaction" = "green",
                               "Size" = "blue")) +
  labs(title = "Standardized Organizational Metrics",
       y = "Standardized Value (z-score)",
       color = "Metric") +
  theme_minimal()

4.9.7.2 Phase Space Visualization

# 3D phase space
library(plotly)
plot_ly(results$metrics, 
        x = ~blau_index, 
        y = ~avg_satisfaction, 
        z = ~organization_size,
        type = "scatter3d",
        mode = "lines+markers",
        color = ~time_step,
        colors = viridisLite::viridis(100))  # viridis palette; "Viridis" is not a brewer palette name

4.9.8 Common Misinterpretations to Avoid

  1. Correlation ≠ Causation: High diversity causing low satisfaction may be mediated by homophily preferences
  2. Snapshot Bias: Single time points miss dynamics - always examine trajectories
  3. Scale Sensitivity: Raw values less meaningful than trends and relative changes
  4. Metric Gaming: Optimizing one metric often degrades others
  5. Initial Condition Dependence: Early randomness can have lasting effects

4.9.9 Using Metrics for Model Validation

Compare simulation metrics to empirical data:

# Example validation checks

# Turnover: empirical benchmark vs. the simulated pattern of annual declines
empirical_turnover_rate <- 0.15  # Annual

# Share of simulated years (52 steps each) showing a net decline in headcount
yearly_sizes <- results$metrics$organization_size[seq(1, 260, by = 52)]
simulated_decline_share <- mean(diff(yearly_sizes) < 0)

# Diversity: does the empirical Blau's index fall within the simulated range?
empirical_diversity <- 0.65  # Blau's index from survey
simulated_diversity_range <- range(results$metrics$blau_index)

validation_passed <- empirical_diversity >= simulated_diversity_range[1] &&
  empirical_diversity <= simulated_diversity_range[2]