Genetic interaction data are provided to discover gene sets densely connected each other. The densely connected genes are likely to have the same cellular functions. Implement the Graph Entropy algorithm for density-based graph clustering.
- Select a seed node of the highest degree among those which are not in any clusters previously determined, and form an initial cluster including the seed node and its neighbors.
- Remove iteratively each neighbor of the seed in a decreasing order of degree if the removal of the neighbor decreases graph entropy.
- Add iteratively each node on the outer boundary of the current cluster in a decreasing order of degree if the addition of the node decreases graph entropy.
- Return the current cluster
- Repeat steps-1, 2, 3, and 4 until there is no seed node to be selected.
Your python code should take an input file ("assignment6_input.txt"), and return an output file ("assignment7_output.txt") which has the clusters of size greater than 3. In an output file, print output clusters in a decreasing order of cluster size. Print each cluster at each line in the format of the size and gene names in the cluster, for example, "4: YBR160W YDR224C YPL231W YBR081C". In each cluster, print the gene names in an arphabetic order.
|