
Introduction

Edge mixture models can be used to detect different types of social connections by looking for clustering in edge weights. bisonR provides support for edge mixture models with the bison_mixture() function. There are two key outputs from a bisonR edge mixture model. First, it generates a posterior probability distribution over the number of components (clusters, interpreted as social connection types), from 1 up to a maximum \(K\), over the entire network. Second, for each edge it calculates the probability that the dyad belongs to each possible component. To show how this all works, first we’ll load bisonR and simulate some data:

library(bisonR)
#> Loading required package: cmdstanr
#> This is cmdstanr version 0.6.0
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /home/runner/.cmdstan/cmdstan-2.32.2
#> - CmdStan version: 2.32.2
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:dplyr':
#> 
#>     as_data_frame, groups, union
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.19.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(mclust)
#> Package 'mclust' version 6.0.0
#> Type 'citation("mclust")' for citing this R package in publications.
#> 
#> Attaching package: 'mclust'
#> The following object is masked from 'package:brms':
#> 
#>     me

In this example we simulate data from a binary edge model with 2 underlying clusters (also called components), with dyads belonging to either cluster with equal probability:

sim_data <- simulate_bison_model_mixture("binary", num_components = 2, component_weights = c(0.5, 0.5))
df <- sim_data$df_sim
head(df)
#> # A tibble: 6 × 5
#> # Groups:   node_1_id [1]
#>   node_1_id node_2_id event duration age_diff
#>   <fct>     <fct>     <dbl>    <dbl>    <dbl>
#> 1 1         2             6        6        1
#> 2 1         3             0       12        1
#> 3 1         4             3        4        1
#> 4 1         5            10       11        1
#> 5 1         6            14       16        1
#> 6 1         7             2       11        1

Since the data are already aggregated (i.e. a single row for each dyad), we can use the binary conjugate model for speed:

fit_edge <- bison_model(
  (event | duration) ~ dyad(node_1_id, node_2_id), 
  data=df, 
  model_type="binary_conjugate"
)
#> No priors set by user, using default priors instead. We recommend setting and checking priors explicitly for reliable inference.

For the sake of this tutorial we’ll skip the usual diagnostic checks, but in a real analysis we’d recommend running them at this point to make sure the edge model fits well, as sketched below. Once we’re satisfied with the fit, we can use the bison_mixture() function to fit the edge mixture model:
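As a rough illustration of the kind of check we mean, a posterior predictive check could be run along the following lines. This is a minimal sketch: the helper name and arguments (plot_predictions() with num_draws) are assumed from the main bisonR tutorial and may differ between package versions.

# Posterior predictive check: do data simulated from the fitted edge model
# resemble the observed data? (Helper and arguments assumed from the main
# bisonR tutorial; adjust to the version of bisonR you are using.)
plot_predictions(fit_edge, num_draws = 20)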

# verbose=FALSE for tutorial purposes
fit_mixture <- bison_mixture(fit_edge, num_components=5, verbose=FALSE)
summary(fit_mixture)
#> === Fitted dyadic mixture model ===
#> Maximum number of components: 5
#> Best model: 1 components
#> Probability of best model: 87.8%
#> === Component probabilities ===
#>            1    2     3 4 5
#> P(K=k) 0.878 0.11 0.012 0 0
#> === Component means for best model (K = 1) ===
#>              1
#> mean 0.5339477
#> === Edge component probabilities for best model (K = 1) ===
#>         1
#> 1 <-> 2 1
#> 1 <-> 3 1
#> 1 <-> 4 1
#> 1 <-> 5 1
#> 1 <-> 6 1
#> 1 <-> 7 1
#> ...

There’s quite a bit going on in the summary, so let’s break it down. The top section shows the best-fitting model and the probability assigned to that model. The second section shows the probabilities of the models corresponding to each number of components (\(K = 1, 2, 3, ...\)). You’ll notice the model with the highest probability is the same as the best-fitting model from the section above. The third section gives the posterior means of the components under the best-fitting model. The final section shows, for each dyad, the probability that its edge belongs to each component. This can be interpreted as the probability that a dyad belongs to a given component/cluster/social connection type.

The information shown in the summary above can also be accessed using the functions get_network_component_probabilities() and get_edge_component_probabilities(). These two functions make it possible to access the mixture model output programmatically for downstream analysis or plotting.

To get network-level probabilities over the number of components, we can use the get_network_component_probabilities() function:

get_network_component_probabilities(fit_mixture)
#>   num_components  probability
#> 1              1 8.782013e-01
#> 2              2 1.097848e-01
#> 3              3 1.184975e-02
#> 4              4 1.586454e-04
#> 5              5 5.507505e-06
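Because this is returned as a plain dataframe, it is easy to pass on to plots or tables. For example, here is a quick sketch using base R graphics and the num_components and probability columns shown above:

# Visualise the posterior distribution over the number of components
network_probs <- get_network_component_probabilities(fit_mixture)
barplot(
  height = network_probs$probability,
  names.arg = network_probs$num_components,
  xlab = "Number of components (K)",
  ylab = "Posterior probability"
)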

To get probabilities of component membership for a given number of components, we can use the get_edge_component_probabilities() function, where 3 is the number of components we assume to exist in the network:

get_edge_component_probabilities(fit_mixture, 3)
#>     node_1 node_2 P(K = 1) P(K = 2) P(K = 3)
#> 1        1      2    0.005    0.295    0.700
#> 2        1      3    0.925    0.070    0.005
#> 3        1      4    0.125    0.440    0.435
#> 4        1      5    0.005    0.370    0.625
#> 5        1      6    0.005    0.410    0.585
#> 6        1      7    0.825    0.155    0.020
#> 7        1      8    0.085    0.370    0.545
#> 8        1      9    0.015    0.360    0.625
#> 9        1     10    0.010    0.325    0.665
#> 10       1     11    0.870    0.115    0.015
#> 11       1     12    0.550    0.315    0.135
#> 12       1     13    0.900    0.095    0.005
#> 13       1     14    0.005    0.280    0.715
#> 14       1     15    0.910    0.080    0.010
#> 15       2      3    0.740    0.210    0.050
#> 16       2      4    0.870    0.115    0.015
#> 17       2      5    0.795    0.185    0.020
#> 18       2      6    0.905    0.090    0.005
#> 19       2      7    0.135    0.515    0.350
#> 20       2      8    0.910    0.085    0.005
#> 21       2      9    0.895    0.105    0.000
#> 22       2     10    0.875    0.120    0.005
#> 23       2     11    0.005    0.375    0.620
#> 24       2     12    0.000    0.245    0.755
#> 25       2     13    0.880    0.110    0.010
#> 26       2     14    0.000    0.310    0.690
#> 27       2     15    0.000    0.205    0.795
#> 28       3      4    0.900    0.090    0.010
#> 29       3      5    0.010    0.225    0.765
#> 30       3      6    0.005    0.380    0.615
#> 31       3      7    0.005    0.260    0.735
#> 32       3      8    0.050    0.455    0.495
#> 33       3      9    0.005    0.235    0.760
#> 34       3     10    0.005    0.225    0.770
#> 35       3     11    0.885    0.105    0.010
#> 36       3     12    0.830    0.155    0.015
#> 37       3     13    0.000    0.280    0.720
#> 38       3     14    0.375    0.395    0.230
#> 39       3     15    0.860    0.130    0.010
#> 40       4      5    0.850    0.150    0.000
#> 41       4      6    0.820    0.155    0.025
#> 42       4      7    0.885    0.105    0.010
#> 43       4      8    0.105    0.490    0.405
#> 44       4      9    0.005    0.240    0.755
#> 45       4     10    0.895    0.100    0.005
#> 46       4     11    0.005    0.235    0.760
#> 47       4     12    0.005    0.275    0.720
#> 48       4     13    0.870    0.120    0.010
#> 49       4     14    0.820    0.170    0.010
#> 50       4     15    0.920    0.065    0.015
#> 51       5      6    0.020    0.400    0.580
#> 52       5      7    0.010    0.285    0.705
#> 53       5      8    0.005    0.440    0.555
#> 54       5      9    0.865    0.130    0.005
#> 55       5     10    0.020    0.460    0.520
#> 56       5     11    0.900    0.100    0.000
#> 57       5     12    0.035    0.375    0.590
#> 58       5     13    0.005    0.220    0.775
#> 59       5     14    0.005    0.435    0.560
#> 60       5     15    0.795    0.190    0.015
#> 61       6      7    0.125    0.480    0.395
#> 62       6      8    0.880    0.110    0.010
#> 63       6      9    0.700    0.220    0.080
#> 64       6     10    0.705    0.230    0.065
#> 65       6     11    0.595    0.295    0.110
#> 66       6     12    0.875    0.120    0.005
#> 67       6     13    0.550    0.295    0.155
#> 68       6     14    0.700    0.225    0.075
#> 69       6     15    0.125    0.355    0.520
#> 70       7      8    0.010    0.180    0.810
#> 71       7      9    0.000    0.375    0.625
#> 72       7     10    0.000    0.290    0.710
#> 73       7     11    0.005    0.355    0.640
#> 74       7     12    0.005    0.320    0.675
#> 75       7     13    0.010    0.190    0.800
#> 76       7     14    0.005    0.205    0.790
#> 77       7     15    0.880    0.115    0.005
#> 78       8      9    0.770    0.205    0.025
#> 79       8     10    0.055    0.425    0.520
#> 80       8     11    0.000    0.355    0.645
#> 81       8     12    0.215    0.430    0.355
#> 82       8     13    0.020    0.360    0.620
#> 83       8     14    0.010    0.255    0.735
#> 84       8     15    0.000    0.295    0.705
#> 85       9     10    0.895    0.100    0.005
#> 86       9     11    0.165    0.385    0.450
#> 87       9     12    0.885    0.105    0.010
#> 88       9     13    0.875    0.115    0.010
#> 89       9     14    0.000    0.330    0.670
#> 90       9     15    0.165    0.365    0.470
#> 91      10     11    0.015    0.365    0.620
#> 92      10     12    0.040    0.390    0.570
#> 93      10     13    0.715    0.235    0.050
#> 94      10     14    0.015    0.395    0.590
#> 95      10     15    0.045    0.340    0.615
#> 96      11     12    0.010    0.395    0.595
#> 97      11     13    0.855    0.140    0.005
#> 98      11     14    0.630    0.270    0.100
#> 99      11     15    0.835    0.150    0.015
#> 100     12     13    0.880    0.120    0.000
#> 101     12     14    0.895    0.105    0.000
#> 102     12     15    0.880    0.120    0.000
#> 103     13     14    0.050    0.365    0.585
#> 104     13     15    0.870    0.120    0.010
#> 105     14     15    0.000    0.325    0.675
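For some downstream analyses it can be convenient to collapse these probabilities into a point estimate by assigning each dyad to its most probable component. A minimal sketch is shown below; note that this discards the uncertainty in component membership, so it should be used with care, and it assumes the first two columns are the node IDs, as in the output above.

# Assign each dyad to its most probable component (a point summary only)
edge_probs <- get_edge_component_probabilities(fit_mixture, 3)
prob_matrix <- as.matrix(edge_probs[, -(1:2)])  # drop the node ID columns
edge_probs$best_component <- max.col(prob_matrix)
head(edge_probs)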

It’s often useful to know where the mixture components are located, so we can get their posterior means using the get_component_means() function for a given number of components. Here we’ll calculate the posterior means of the components for the mixture model with 3 components:

get_component_means(fit_mixture, 3)
#>             50%         5%       95%
#> K = 1 0.0963855 0.01645787 0.1593153
#> K = 2 0.8115574 0.14686768 0.8852002
#> K = 3 0.9064475 0.82612172 0.9964323
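If a later analysis needs a single “strong” connection type, one option is to pick the component with the largest posterior median edge weight. A small sketch, assuming the first column holds the 50% quantile as in the output above:

# Identify the component with the largest posterior median edge weight
component_means <- get_component_means(fit_mixture, 3)
strong_component <- which.max(component_means[, 1])  # column 1 = 50% quantile
strong_component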

Using components in downstream analysis

Now say that we’re interested in the distinction between strong and weak partners. Specifically, we want to know whether a node’s strength, calculated over its strong connections only, predicts a node-level trait. This is a non-standard analysis, so it will require some manual data wrangling to fit the right kind of model.

For demonstration purposes, we’ll create a dataframe that might be used for this kind of analysis:

df_nodal <- data.frame(node_id=as.factor(levels(df$node_1_id)), trait=rnorm(15))
df_nodal
#>    node_id         trait
#> 1        1  1.8148543187
#> 2        2  1.1480231824
#> 3        3 -0.4418980666
#> 4        4  0.9203127436
#> 5        5 -0.6076233793
#> 6        6  0.2453155151
#> 7        7 -0.5759586577
#> 8        8  0.6012258206
#> 9        9  0.3977612763
#> 10      10 -1.4698397602
#> 11      11  1.6477284018
#> 12      12 -0.4248575037
#> 13      13 -0.3203189079
#> 14      14  0.3859820914
#> 15      15 -0.0004909646

This dataframe provides the trait information we need for the analysis. Next, we need to calculate strength using only the edges categorised as strong by the mixture model. Since both component membership (strong or weak) and edge weights are probabilistic, we have to sample from both posteriors as we build a new posterior of mixture-based strength:

# Set number of nodes and number of posterior samples
num_nodes <- fit_edge$num_nodes
num_draws <- 5 # Keep short for demo purposes.

# Create a list of igraph networks from the edge model to represent the network posterior
nets <- bison_to_igraph(fit_edge, num_draws)

# Create an empty matrix to hold strengths of top mixture component
mix_strengths <- matrix(0, num_draws, num_nodes)

# Create an empty list for imputed versions of the dataframe
imputed_data <- list()

# Loop through each posterior sample and calculate strength over strong edges only.
for (i in 1:num_draws) {
  # Component membership for this draw: 1 if the dyad falls in the strong
  # (second) component of the two-component model, 0 if it falls in the weak one
  edge_components <- 1 * (fit_mixture$edge_component_samples[i, 2, ] == 2)
  
  # Keep the weights of strong edges and set the weights of weak edges to zero
  E(nets[[i]])$weight <- E(nets[[i]])$weight * edge_components
  
  # Copy the nodal dataframe and add mixture-based strength for this draw
  imputed_data[[i]] <- df_nodal
  imputed_data[[i]]$mix_strength <- strength(nets[[i]])
}
imputed_data[[2]]
#>    node_id         trait mix_strength
#> 1        1  1.8148543187     6.564745
#> 2        2  1.1480231824     5.574640
#> 3        3 -0.4418980666     6.671651
#> 4        4  0.9203127436     4.145211
#> 5        5 -0.6076233793     7.859528
#> 6        6  0.2453155151     3.621058
#> 7        7 -0.5759586577     8.959540
#> 8        8  0.6012258206     8.569989
#> 9        9  0.3977612763     4.892605
#> 10      10 -1.4698397602     7.275292
#> 11      11  1.6477284018     5.605594
#> 12      12 -0.4248575037     5.208479
#> 13      13 -0.3203189079     5.052516
#> 14      14  0.3859820914     8.665744
#> 15      15 -0.0004909646     3.610796

Now that we have a set of imputed dataframes containing the posteriors of mixture-based node strength, we can fit the model using brm_multiple():

fit_brm <- brm_multiple(trait ~ mix_strength, imputed_data, silent=2, refresh=0)
summary(fit_brm)
#> Warning: Parts of the model have not converged (some Rhats are > 1.05). Be
#> careful when analysing the results! We recommend running more iterations and/or
#> setting stronger priors.
#>  Family: gaussian 
#>   Links: mu = identity; sigma = identity 
#> Formula: trait ~ mix_strength 
#>    Data: imputed_data (Number of observations: 15) 
#>   Draws: 20 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 20000
#> 
#> Population-Level Effects: 
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept        1.10      1.02    -0.94     3.08 1.05      274     2266
#> mix_strength    -0.14      0.16    -0.46     0.18 1.06      204     2488
#> 
#> Family Specific Parameters: 
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.98      0.22     0.66     1.51 1.01     1909    11795
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

This was a short tutorial on how to use bison_mixture() to fit mixture models in bisonR. If you have any questions, please get in touch.