To investigate the effect of sample size, we randomly selected subsets of 5, 6, 7, 9, 11, 13, 16, 25, or 40 samples from each GTEx tissue that had at least 70 samples to choose from. Ten trials were run for each sample size. The plots below show the performance of the networks that each workflow created from a given number of samples. We also examined the effects of gene expression similarity across samples, the standard deviation of count sums across samples, and tissue type. Only log2(auPRC/prior) results are shown here; the other measures (precision at 20% recall and area under the ROC curve) yield very similar results and figures.
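The subsampling design described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function name `draw_subsamples`, the `samples_by_tissue` input layout, and the fixed seed are hypothetical, and it assumes that samples are drawn without replacement within a trial and independently between trials.

```python
import random

SAMPLE_SIZES = [5, 6, 7, 9, 11, 13, 16, 25, 40]  # sizes listed in the text
N_TRIALS = 10                                    # trials per sample size
MIN_SAMPLES = 70                                 # tissue eligibility threshold


def draw_subsamples(samples_by_tissue, seed=0):
    """Return (tissue, size, trial, chosen_sample_ids) tuples.

    samples_by_tissue: hypothetical dict mapping a tissue name to a list
    of sample IDs. Tissues with fewer than MIN_SAMPLES samples are skipped.
    """
    rng = random.Random(seed)
    draws = []
    for tissue, sample_ids in sorted(samples_by_tissue.items()):
        if len(sample_ids) < MIN_SAMPLES:
            continue
        for size in SAMPLE_SIZES:
            for trial in range(N_TRIALS):
                # Assumption: sampling without replacement within a trial.
                chosen = rng.sample(sample_ids, size)
                draws.append((tissue, size, trial, chosen))
    return draws
```

Each eligible tissue then contributes 9 sample sizes x 10 trials = 90 experiments, each of which would be passed to every workflow to build one network.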
Sample size refers to the number of samples used to create a network.
Sample gene expression similarity is determined by subsetting all samples to the 50% most variable genes in the tissue they came from, then calculating the Spearman correlation between all pairs of samples in the experiment and taking the median value.
Count sum diversity is calculated for a given experiment by taking the standard deviation of the sum of counts in each sample.
Tissue refers to the tissue that the samples were taken from.
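The two numeric covariates defined above could be computed along the following lines. This is a hedged sketch using only the Python standard library. All function names and input layouts are hypothetical, and two details that the text does not specify are assumptions: gene variability is measured by variance across the tissue's samples, and the standard deviation is the sample (n - 1) form.

```python
import statistics
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Raises ZeroDivisionError if either profile is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def most_variable_genes(tissue_counts, fraction=0.5):
    """tissue_counts: {gene: [count in each sample of the tissue]}.

    Assumption: "most variable" means highest variance.
    """
    ranked = sorted(
        tissue_counts,
        key=lambda g: statistics.pvariance(tissue_counts[g]),
        reverse=True,
    )
    return ranked[: max(2, int(len(ranked) * fraction))]


def expression_similarity(experiment_counts, genes):
    """experiment_counts: {sample: {gene: count}} for one experiment.

    Returns the median Spearman correlation over all sample pairs,
    restricted to the given genes.
    """
    profiles = {s: [c[g] for g in genes] for s, c in experiment_counts.items()}
    rhos = [spearman(profiles[a], profiles[b]) for a, b in combinations(profiles, 2)]
    return statistics.median(rhos)


def count_sum_diversity(experiment_counts):
    """Standard deviation of per-sample total counts (sample SD assumed)."""
    totals = [sum(c.values()) for c in experiment_counts.values()]
    return statistics.stdev(totals)
```

In this sketch the variable genes are chosen once per tissue from all of its samples, while the correlations and count sums are computed only over the samples drawn for a given experiment.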