Assignment 6, Graph Clustering by Maximal Clique Search

Due Date: 11:59pm, November 25, 2024

 

Purpose
  • Understanding of density-based graph clustering algorithms
  • Practice of applying graph clustering techniques to large-scale graph data

Description
Genetic interaction data are provided to discover gene sets densely connected each other. The densely connected genes are likely to have the same cellular functions. Implement the maximal clique algorithm using the anti-monotonic property for density-based graph clustering. Your python code should take an input file ("assignment6_input.txt"), and return an output file ("assignment6_output.txt") which has the cliques of size-8 or greater. In an output file, print output clusters in a decreasing order of cluster size. Print each cluster at each line in the format of the size and gene names in the cluster, for example, "4: YBR160W YDR224C YPL231W YBR081C". In each cluster, print the gene names in an arphabetic order.

Maximal Clique Search using Antimonotonic Property
  1. Create initial clusters:
    1. Find all cliques of size-2.
  2. Increase k, size of cliques, iteratively:
    1. Apply selective joining.
    2. Find all cliques of size-k.
    3. Repeat steps 2-1 and 2-2 until no more cliques are made.

Data Set
Gene-gene interaction data are given in a tab-delimited text file. Each row represents an interaction (edge) between two genes (vertices).

Submission
Submit your Python code, "assignment6.py", and the output file via LearnUs.

Note
  • Just calling the library function of Maximal Cliques in a module is NOT allowed.
  • The algorithm in your code must handle the anti-monotonic property.
  • Put your name as a comment at the first line of your code.