Chapter 3 Architecture Overview
This chapter describes the technical architecture of the ASA ABM v2 system, its design principles, and component interactions.
3.1 Design Principles
3.1.1 Performance First
- Built on data.table for maximum performance in R
- Vectorized operations wherever possible
- Efficient memory management
3.1.2 Modularity
- Clear separation of concerns
- Independent, testable components
- Easy to extend or replace modules
3.2 System Architecture
┌─────────────────────────────────────────────────────────┐
│                     User Interface                      │
│              (R Scripts, Shiny Apps, etc.)              │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                    Simulation Engine                    │
│                  (simulation/engine.R)                  │
├─────────────────────────────────────────────────────────┤
│  • Orchestrates simulation flow                         │
│  • Manages time steps                                   │
│  • Collects metrics                                     │
│  • Handles I/O                                          │
└─────────┬───────────────────────────────┬───────────────┘
          │                               │
┌─────────▼──────────┐           ┌────────▼───────────┐
│    Core Modules    │           │ Simulation Modules │
├────────────────────┤           ├────────────────────┤
│ • organization.R   │           │ • hiring.R         │
│ • agent.R          │           │ • turnover.R       │
│ • interactions.R   │           │ • (future modules) │
└────────────────────┘           └────────────────────┘
3.3 Core Modules
3.3.1 organization.R
Manages the organization data structure and organizational-level operations.
Key Functions:
- create_organization(): Initialize organization with agents
- calculate_identity_diversity(): Compute diversity metrics
- get_organization_summary(): Extract summary statistics
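The text does not say which diversity metric calculate_identity_diversity() computes. As an illustration only, Blau's index is one common measure of categorical diversity. The helper name blau_index below is hypothetical and is not the package's function.

```r
# Blau's index: 1 - sum(p_k^2), where p_k is the share of category k.
# 0 means everyone shares one category; higher values mean more diversity.
blau_index <- function(categories) {
  shares <- table(categories) / length(categories)
  1 - sum(shares^2)
}

blau_index(c("A", "A", "B", "B"))  # two equal groups -> 0.5
blau_index(c("A", "A", "A", "A"))  # a single group   -> 0
```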
Data Structure:
Organization <- data.table(
  agent_id              # Unique identifier
  identity_category     # Categorical identity
  openness              # Big Five traits...
  conscientiousness
  extraversion
  agreeableness
  emotional_stability
  diversity_preference  # Preferences
  homophily_preference
  attraction            # State variables
  satisfaction
  tenure
  hire_date             # Metadata
  is_active
)
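The listing above is a schema rather than runnable code. A minimal sketch of a table with these columns might look as follows, assuming the data.table package is installed. The column types, the standard-normal trait draws, and the helper name make_example_organization are assumptions for illustration; the real create_organization() may differ.

```r
library(data.table)

# Hypothetical helper, not the package's create_organization().
make_example_organization <- function(n_agents, identity_categories) {
  data.table(
    agent_id             = seq_len(n_agents),
    identity_category    = sample(identity_categories, n_agents, replace = TRUE),
    openness             = rnorm(n_agents),   # assumed: standardized traits
    conscientiousness    = rnorm(n_agents),
    extraversion         = rnorm(n_agents),
    agreeableness        = rnorm(n_agents),
    emotional_stability  = rnorm(n_agents),
    diversity_preference = rnorm(n_agents),
    homophily_preference = rnorm(n_agents),
    attraction           = 0,                 # state variables start at zero
    satisfaction         = 0,
    tenure               = 0,
    hire_date            = 0L,                # assumed: time step of hire
    is_active            = TRUE
  )
}

set.seed(1)
org <- make_example_organization(100, c("A", "B", "C", "D", "E"))
str(org)
```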
3.3.2 agent.R
Handles individual agents and applicant pools.
Key Functions:
- create_applicant_pool(): Generate potential hires
- calculate_applicant_attraction(): Compute org attraction
- applicants_to_employees(): Convert hired applicants
Applicant Structure:
3.4 Simulation Modules
3.5 Data Flow
3.5.1 Initialization Phase
create_organization()     → Initial agents
        ↓
initialize_interactions() → Empty interaction table
        ↓
create_applicant_pool()   → Initial applicant pool
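The initialization sequence can be sketched as a function that assembles one state list. The three *_stub functions are stand-ins with assumed signatures and columns, written in base R so the example is self-contained; they are not the package's implementations.

```r
# Stand-in stubs; signatures and column names are assumptions.
create_organization_stub <- function(n_agents) {
  data.frame(agent_id = seq_len(n_agents), is_active = TRUE)
}
initialize_interactions_stub <- function() {
  # Empty table with the columns later steps would fill in
  data.frame(focal_agent = integer(), target_agent = integer(),
             time_step = integer())
}
create_applicant_pool_stub <- function(pool_size) {
  data.frame(applicant_id = seq_len(pool_size))
}

# Runs the three initialization steps in order and bundles the results.
initialize_simulation <- function(n_agents, pool_size) {
  list(
    organization = create_organization_stub(n_agents),
    interactions = initialize_interactions_stub(),
    applicants   = create_applicant_pool_stub(pool_size)
  )
}

state <- initialize_simulation(n_agents = 10, pool_size = 5)
sapply(state, nrow)
```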
3.6 Performance Optimizations
3.8 Configuration Management
3.8.1 Parameter Structure
params <- list(
  # Organization
  identity_categories = c("A", "B", "C", "D", "E"),
  # Hiring
  growth_rate = 0.01,
  hiring_frequency = 12,
  selection_criteria = "conscientiousness",
  # Interactions
  n_interactions_per_step = 5,
  interaction_window = 10,
  # Turnover
  turnover_type = "threshold",
  turnover_threshold = -10,
  # ... additional parameters
)
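One way to work with a parameter list like this is to keep a set of defaults and merge user overrides into it. The sketch below uses base R's modifyList(); the helper name build_params and the choice to reject unknown names are assumptions, and only a subset of the parameters is shown.

```r
# Defaults copied from a subset of the parameter list above.
default_params <- list(
  identity_categories = c("A", "B", "C", "D", "E"),
  growth_rate = 0.01,
  hiring_frequency = 12,
  turnover_type = "threshold",
  turnover_threshold = -10
)

# Hypothetical helper: merge overrides into defaults, rejecting typos.
build_params <- function(overrides = list(), defaults = default_params) {
  unknown <- setdiff(names(overrides), names(defaults))
  if (length(unknown) > 0) {
    stop("Unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  modifyList(defaults, overrides)
}

params <- build_params(list(growth_rate = 0.05))
params$growth_rate       # 0.05 (overridden)
params$hiring_frequency  # 12   (default kept)
```

Rejecting unknown names catches misspelled parameters early instead of silently running with defaults.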
3.10 Testing Architecture
3.11 Architecture Decision Log
This section documents key architectural decisions, their rationale, and trade-offs considered.
3.11.1 ADR-001: Why data.table?
Decision: Use data.table as the primary data structure for agents and organizations.
Context: R offers several data frame implementations: base data.frame, tibble, and data.table.
Rationale:
- Performance: data.table can be 10-100x faster for large datasets, depending on the operation
- Memory efficiency: Reference semantics avoid copying
- Syntax: Concise syntax for complex operations
- Scalability: Handles millions of agents efficiently
Trade-offs:
- Steeper learning curve than tidyverse
- Less intuitive for R beginners
- Non-standard evaluation can be confusing
Benchmarks:
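A comparison of this kind can be timed with system.time(). The sketch below is illustrative only and assumes the data.table package is installed: the table size, the column names, and the grouped-mean workload are assumptions, and timings vary by machine, so no expected numbers are shown.

```r
library(data.table)

set.seed(42)
n <- 1e5  # hypothetical population size
df <- data.frame(
  identity_category = sample(c("A", "B", "C", "D", "E"), n, replace = TRUE),
  satisfaction = rnorm(n),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)

# Grouped mean with base R
time_df <- system.time(
  res_df <- aggregate(satisfaction ~ identity_category, data = df, FUN = mean)
)

# Same grouped mean with data.table; keyby sorts the result by group
time_dt <- system.time(
  res_dt <- dt[, .(satisfaction = mean(satisfaction)), keyby = identity_category]
)

# Elapsed seconds for each approach
print(c(data.frame = time_df[["elapsed"]], data.table = time_dt[["elapsed"]]))
```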
3.11.2 ADR-002: Functional vs Object-Oriented Design
Decision: Use functional programming with immutable data structures (mostly).
Context: R supports both functional and OO (S3, S4, R6) paradigms.
Rationale:
- Simplicity: Functions are easier to test and reason about
- Parallelization: Pure functions enable future parallel processing
- Debugging: Easier to trace data flow
- R idioms: More aligned with R community practices
Trade-offs:
- No encapsulation of agent state
- More verbose for complex state management
- Requires discipline to maintain purity
Exception: data.table’s reference semantics for performance-critical updates.
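The difference between an in-place update and an independent copy can be seen in a small example, assuming the data.table package is installed. The table contents are made up for illustration.

```r
library(data.table)

org <- data.table(agent_id = 1:3, tenure = c(0, 1, 2))

# `:=` updates the column in place: no copy is made, so every
# variable bound to the same table sees the change.
alias <- org
alias[, tenure := tenure + 1]
org$tenure   # 1 2 3, changed through the alias

# copy() gives an independent table for functions that must not
# modify their input.
snapshot <- copy(org)
snapshot[, tenure := 0]
org$tenure   # still 1 2 3
```

A function that uses := on its argument therefore mutates the caller's table, which is why the purity rule needs discipline around these updates.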
3.11.3 ADR-003: Simulation State Management
Decision: Pass the complete simulation state through function arguments and return values rather than relying on global variables.
Context: Need to track organization, interactions, and metrics across time steps.
Rationale:
- Testability: Each function can be tested in isolation
- Reproducibility: No hidden state affects results
- Clarity: Data flow is explicit
- Debugging: Can inspect state at any point
Implementation:
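A minimal sketch of explicit state passing, written in base R so it runs on its own. The function name run_step and the state fields (organization, metrics, time) are assumptions for illustration, not the engine's actual structure.

```r
# Hypothetical step function: takes the full state, returns the next state.
run_step <- function(state, params) {
  org <- state$organization
  org$tenure <- org$tenure + 1            # every agent ages by one step
  state$organization <- org
  state$time <- state$time + 1
  # Record this step's metrics from the updated state
  state$metrics[[state$time]] <- data.frame(
    time = state$time,
    mean_tenure = mean(org$tenure)
  )
  state
}

state <- list(
  organization = data.frame(agent_id = 1:3, tenure = c(0, 2, 4)),
  metrics = list(),
  time = 0
)
for (t in 1:3) state <- run_step(state, params = list())

history <- do.call(rbind, state$metrics)
history$mean_tenure  # 3 4 5
```

Because nothing lives outside `state`, the state after any step can be saved, inspected, or fed to a test directly.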
3.11.4 ADR-004: Vectorization Strategy
Decision: Vectorize operations wherever possible and avoid explicit loops.
Context: Explicit loops in R pay interpreter overhead on every iteration, while vectorized operations run in compiled C code.
Rationale:
- Performance: 50-200x speedup for agent operations
- Readability: Express operations on entire populations
- R-native: Leverages R’s strengths
Example:
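A small illustration of the same operation written both ways. The threshold value and variable names are made up for the example and are not package defaults.

```r
set.seed(1)
n <- 10000
satisfaction <- rnorm(n)
threshold <- -1   # hypothetical cut-off

# Loop version: one comparison per agent
leaves_loop <- logical(n)
for (i in seq_len(n)) {
  leaves_loop[i] <- satisfaction[i] < threshold
}

# Vectorized version: one comparison over the whole population
leaves_vec <- satisfaction < threshold

identical(leaves_loop, leaves_vec)  # TRUE
```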
3.11.5 ADR-005: Module Boundaries
Decision: Separate by domain concepts, not technical layers.
Context: Could organize by technical role (data, logic, presentation) or domain.
Rationale:
- Cohesion: Related functionality stays together
- Understanding: Matches mental model of simulation
- Extension: Easy to add new organizational concepts
Structure:
core/
  agent.R          # All agent-related functions
  organization.R   # All org-related functions
simulation/
  hiring.R         # Complete hiring process
  turnover.R       # Complete turnover process
3.11.6 ADR-006: Metric Calculation Timing
Decision: Calculate metrics after each time step, not on-demand.
Context: Could calculate metrics lazily when needed or eagerly each step.
Rationale:
- Consistency: All metrics from same state
- Performance: One pass through data per step
- History: Complete time series available
Trade-offs:
- Memory: Stores more data than on-demand calculation
3.11.7 ADR-007: Parameter Validation
Decision: Use checkmate for runtime parameter validation.
Context: Could use base R checks, custom validation, or external package.
Rationale:
- Comprehensive: Rich set of check functions
- Performance: Minimal overhead
- Messages: Clear error messages for users
- Consistency: Standard validation across codebase
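A sketch of what such validation might look like for a few of the parameters from section 3.8.1, assuming the checkmate package is installed. The validate_params helper and the specific bounds are assumptions; the package's actual checks may be stricter or differ.

```r
library(checkmate)

# Hypothetical validator; the bounds below are illustrative assumptions.
validate_params <- function(params) {
  assert_list(params, names = "unique")
  assert_character(params$identity_categories, min.len = 1,
                   any.missing = FALSE, unique = TRUE)
  assert_number(params$growth_rate, lower = 0)
  assert_count(params$hiring_frequency, positive = TRUE)
  assert_string(params$turnover_type)
  assert_number(params$turnover_threshold)
  invisible(params)
}

good <- list(
  identity_categories = c("A", "B", "C", "D", "E"),
  growth_rate = 0.01,
  hiring_frequency = 12,
  turnover_type = "threshold",
  turnover_threshold = -10
)
validate_params(good)   # passes silently

bad <- modifyList(good, list(growth_rate = -1))
result <- try(validate_params(bad), silent = TRUE)
inherits(result, "try-error")  # TRUE: negative growth rate is rejected
```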
3.11.8 ADR-008: Random Number Generation
Decision: Use R’s built-in RNG with explicit seed management.
Context: Reproducibility requires careful RNG handling.
Rationale:
- Standard: Works with all R workflows
- Reproducible: set.seed() ensures repeatability
- Simple: No external dependencies
Best Practice:
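One pattern consistent with this decision is to pass the seed in explicitly and set it once at the start of a run. The run_replication wrapper is hypothetical, and rnorm() stands in for a full simulation.

```r
# Hypothetical wrapper: one seed in, one reproducible run out.
run_replication <- function(seed, n_agents = 5) {
  set.seed(seed)
  rnorm(n_agents)   # stands in for a full simulation run
}

identical(run_replication(42), run_replication(42))  # TRUE: same seed, same result
identical(run_replication(42), run_replication(43))  # FALSE: different seed
```

Storing the seed alongside the results makes any single run repeatable later.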
3.11.9 ADR-009: Identity Categories
Decision: Use character vectors for identity categories, not factors.
Context: R traditionally used factors for categorical data.
Rationale:
- Flexibility: Easy to add new categories
- No surprises: Factors can have unexpected behavior
- Performance: Character operations are fast in data.table
- Simplicity: Fewer type conversions needed
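Two of the factor behaviors this decision avoids can be shown in base R. The values are made up for illustration.

```r
# Adding a new category to a character vector just works.
ids_chr <- c("A", "B")
ids_chr[3] <- "C"
ids_chr                      # "A" "B" "C"

# The same assignment on a factor yields NA (with a warning),
# because "C" is not an existing level.
ids_fct <- factor(c("A", "B"))
suppressWarnings(ids_fct[3] <- "C")
is.na(ids_fct[3])            # TRUE

# Converting a factor to numeric returns the internal level codes.
codes <- factor(c("10", "20"))
as.numeric(codes)            # 1 2, not 10 20
```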
3.11.10 ADR-010: Extension Mechanism
Decision: Prepare for extensions through function factories and strategy patterns.
Context: Need to support custom hiring strategies, metrics, etc.
Rationale:
- Open/Closed: Extend without modifying core
- User-friendly: Clear extension points
- Type-safe: Functions validate at runtime
Example:
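A minimal sketch of a function factory that produces a hiring strategy. The factory name make_top_trait_selector, its signature, and the applicant columns are assumptions for illustration, not the package's extension API.

```r
# Hypothetical factory: returns a selection function closed over `trait`.
make_top_trait_selector <- function(trait) {
  force(trait)  # evaluate now so the closure keeps this value
  function(applicants, n_hires) {
    # Runtime validation of the inputs
    stopifnot(trait %in% names(applicants), n_hires >= 0)
    ranked <- applicants[order(applicants[[trait]], decreasing = TRUE), ]
    head(ranked, n_hires)
  }
}

applicants <- data.frame(
  applicant_id = 1:4,
  conscientiousness = c(0.2, 0.9, 0.5, 0.7)
)

select_hires <- make_top_trait_selector("conscientiousness")
select_hires(applicants, n_hires = 2)$applicant_id  # 2 4
```

A simulation loop that accepts any function with this (applicants, n_hires) shape can swap strategies without changes to the core code.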
3.12 Design Philosophy Summary
- Performance First: Every decision considers large-scale simulations
- Functional Core: Pure functions with explicit data flow
- Domain-Driven: Structure mirrors organizational concepts
- Extension Ready: Clear points for customization
- R-Idiomatic: Leverages R’s vectorization strengths
These decisions create a system that is:
- Fast enough for million-agent simulations
- Simple enough for research modifications
- Robust enough for production use
- Clear enough for educational purposes