Methods & Data

Data presented on PreFer Genes are based on binary logistic regression analyses of two datasets (M and H), conducted in SPSS 23 (IBM) using the "Forward LR" method (forward stepwise selection based on the likelihood-ratio statistic).

Genes in Dataset M were categorized as either candidates or non-candidates according to phenotypes in male knockout mice (Mouse Genome Informatics). Categorization in Dataset H was based on the available literature (PubMed). The following variables were included in the logistic regressions:

  • dN/dS: pairwise dN/dS estimates for human-mouse 1-to-1 orthologues, downloaded from Ensembl version 86
  • Network parameters (node degree, closeness centrality, betweenness centrality): extracted from a human protein-protein interaction (PPI) network generated and analysed with Cytoscape 3.4.0, using data from IntAct, APID, MINT, DIP-IMEx, MatrixDB, and InnateDB-IMEx (all integrated by PSICQUIC in October 2016) as well as BioGRID (version 3.4.141)
  • Closeness to candidate genes in the PPI network: minimal shortest path to other candidate proteins and number of candidate proteins among the direct neighbours
  • Expression fold change: calculated from project E-MTAB-2836, contrasting RNA expression in human testis with expression in human brain, heart, and ovary
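
The two "closeness to candidate genes" measures listed above can be illustrated with a minimal Python sketch. It is not the Cytoscape workflow used for PreFer Genes; it assumes the PPI network is treated as undirected and unweighted, and the function names, protein labels (A to E), and toy edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def candidate_closeness(adjacency, candidates, protein):
    """Return (minimal shortest path to another candidate,
    number of candidates among the direct neighbours).

    The path length is None if no other candidate is reachable.
    """
    # A candidate protein is never counted as being close to itself.
    others = set(candidates) - {protein}
    direct = len(adjacency.get(protein, set()) & others)

    # Breadth-first search: in an unweighted graph, the first candidate
    # reached lies at the minimal shortest-path distance.
    seen = {protein}
    queue = deque([(protein, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in others:
            return dist, direct
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None, direct


# Hypothetical toy network: A-B, B-C, C-D, A-E; D and E are candidates.
toy_network = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")])
toy_candidates = {"D", "E"}

for name in ("A", "B", "E"):
    print(name, candidate_closeness(toy_network, toy_candidates, name))
```

In this toy network, protein A has one candidate (E) among its direct neighbours, whereas B has none and reaches its nearest candidate in two steps. Excluding the query protein from the candidate set mirrors the wording "other candidate proteins" above.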