Assignment 4, Discovering Gene Sets from PPI Networks Weighted by GO Annotations

(Qualifying Exam)

Background
A gene set is a group of functionally related genes. Gene sets are often defined as genes that share a biological pathway, are co-expressed under specific conditions, or are regulated by the same transcription factor. Recent studies have attempted to identify gene sets using (1) biological networks and (2) functional annotations from ontologies. In particular, protein-protein interaction (PPI) datasets are commonly used to identify gene sets, because genes that are densely connected in a PPI network are likely to share functions.

Description
  1. Construct a weighted PPI network.
    • Use the annotation-based semantic similarity scores (generated in Assignment-3) as edge weights for the PPIs (provided in Assignment-2).
  2. Implement any weighted graph clustering algorithm.
    • Apply a weighted graph clustering algorithm to the weighted PPI network for discovering gene sets.
    • Collect the resulting clusters of size 3 or greater.
  3. Evaluate the clustering results.
    • Compare the clusters to the human protein complex dataset (provided in Assignment-3).
    • Measure the F1 score between each cluster and the best-matching protein complex, and average the scores across the clusters.
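
The three steps above can be sketched end to end on toy data. This is only a minimal illustration, not a required design: the function names, the toy gene labels, the in-memory input format, and the choice to give unscored pairs a weight of 0.0 are all assumptions. The clustering step here is a deliberately simple stand-in (connected components over edges whose weight meets a threshold); any weighted graph clustering algorithm can take its place.

```python
from collections import defaultdict


def build_weighted_graph(ppis, similarity):
    """Step 1: attach a similarity score to each PPI as an undirected edge weight."""
    graph = defaultdict(dict)
    for a, b in ppis:
        # Assumption: a pair with no similarity score gets weight 0.0.
        w = similarity.get(frozenset((a, b)), 0.0)
        graph[a][b] = w
        graph[b][a] = w
    return graph


def threshold_components(graph, min_weight):
    """Step 2 (stand-in): connected components using only edges with weight >= min_weight."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr, w in graph[node].items():
                if w >= min_weight and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        clusters.append(component)
    return clusters


def f1(cluster, complex_):
    """F1 score between one cluster and one protein complex (both sets of genes)."""
    overlap = len(cluster & complex_)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(complex_)
    return 2 * precision * recall / (precision + recall)


def average_best_f1(clusters, complexes):
    """Step 3: average, over clusters, of the F1 with the best-matching complex."""
    if not clusters:
        return 0.0
    best = [max((f1(c, k) for k in complexes), default=0.0) for c in clusters]
    return sum(best) / len(best)


# Hypothetical toy inputs; the real data come from Assignments 2 and 3.
ppis = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
        ("D", "E"), ("E", "F"), ("D", "F")]
similarity = {
    frozenset(("A", "B")): 0.9, frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.85, frozenset(("C", "D")): 0.1,
    frozenset(("D", "E")): 0.7, frozenset(("E", "F")): 0.9,
    frozenset(("D", "F")): 0.2,
}
complexes = [{"A", "B", "C", "G"}, {"D", "E"}]

graph = build_weighted_graph(ppis, similarity)
clusters = [c for c in threshold_components(graph, min_weight=0.5) if len(c) >= 3]
avg_f1 = average_best_f1(clusters, complexes)
print(clusters, avg_f1)
```

The threshold of 0.5 is arbitrary and only serves the toy example. With real data, the weighting and the clustering algorithm both affect which clusters survive the size filter, so the average F1 is best compared across several settings.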

Submission
  • Submit (1) your Python code "Assignment4.py", (2) the clustering results, and (3) the average F1 score via LearnUs.