python - FInd all subsets of a set of pairs wherein every value is connected to every other -


this logic problem i'm not quite equipped deal seems.

i have data set of paired samples each identified unique number. pair indicates level of relatedness between samples. want group these samples such every member of group supported pair every other member of group.

for example, in data set below pairs 6, 7, , 28 constitute group of 3. pairs 22 through 27 constitute group of 4, in case there 4 groups of 3 inside group don't want in output. more difficult pairs 10, 11, , 17 group of 3.

data set:

    [,1] [,2]  [1,]    6  267  [2,]    9   10  [3,]   11   12  [4,]   79   80  [5,]   96  570  [6,]  314  583  [7,]  314  584  [8,]  425  426  [9,]  427  428 [10,]  427  429 [11,]  427  430 [12,]  427  472 [13,]  427  473 [14,]  427  474 [15,]  428  430 [16,]  428  473 [17,]  429  430 [18,]  430  472 [19,]  430  473 [20,]  430  474 [21,]  472  474 [22,]  517  519 [23,]  517  520 [24,]  517  521 [25,]  519  520 [26,]  519  521 [27,]  520  521 [28,]  583  584 [29,]  649  650 

a couple things data set: value in column 2 of pair greater value in column 1 , values in column 1 in ascending order.

this data set simple version of problem limited number of pairs 1 sample (the 427 6) more complicated data sets have number of levels need solution agnostic factor. think recursion of sort way go , have been playing around such in r new concept , not getting right answer.

i can't first person ever haven't found yet, because describing problem bit awkward in search engine.

anyway, if knows appreciate help!

edit -

expected output (each line group):

 [1,]    6  267  [2,]    9   10  [3,]   11   12  [4,]   79   80  [5,]   96  570  [6,]  314  583  584  [7,]  425  426  [8,]  427  428  430  473  [9,]  427  429  430 [10,]  427  430  472  474 [11,]  517  519  520  521 [12,]  649  650 

to clarify mean each member of group being supported pair every other member: group 1 has 2 members therefore requires 1 supporting pair pair 1. group 6 has 3 members , therefore requires support 3 pairs 6, 7, , 28 proving each group member pairs both of others (this can visualized triangle each member vertex , connecting lines pairs). group 8 has 4 members , therefore requires support of 6 pairs 9, 11, 13, 15, 16, 19 (you can picture square each member vertex , lines between vertices giving 6 pairs). hope clarifies! hard explain , solution seems non-trivial.

update 3 add pivoting

this suggested optimisation module below, should reduce number of recursions significantly. add end of module code, , replace loop for $v ( keys %$p ) in _bron_kerbosh for $v ( _choose_pivot($p, $x) )

# find element u of p u x such many possible of # neighbours fall in p # sub _choose_pivot {     ( $p, $x ) = @_;      @p = keys %$p;     @choice = @p;      $u ( @p, keys %$x ) {         $nu = $neighbours{$u};         %nu = map +( $_ => 1 ), @$nu;         @subset = grep { not $nu{$_} } @p;         @choice = @subset if @subset < @choice;     }      @choice; } 

update 2 module

wikipedia describes bron-kerbosch algorithm finding maximal cliques in graph. says

although other algorithms solving clique problem have running times are, in theory, better on inputs have few maximal independent sets, bron–kerbosch algorithm , subsequent improvements reported being more efficient in practice alternatives.

so since cpan appears have no clique module can find thought useful implement it. code. should copy , save graph/cliques/bron_kerbosch.pm. shall prepare tests , put on cpan shortly

package graph::cliques::bron_kerbosch;  use strict; use warnings; use v5.8.3;  use exporter qw/ import /;  our @export_ok = qw/ get_cliques /;  ( %neighbours, @cliques );  sub get_cliques {      ( $edges ) = @_;     %neighbours = ();     @cliques = ();      $edge ( @$edges ) {         ( $n1, $n2 ) = @$edge;         $neighbours{$n1}{$n2} = 1;         $neighbours{$n2}{$n1} = 1;     }     $_ = [ keys %$_ ] values %neighbours;      ( %r, %p, %x );     $p{$_} = 1 map @$_, @$edges;      _bron_kerbosch( \( %r, %p, %x ) );      @cliques; }  sub _bron_kerbosch {     ( $r, $p, $x ) = @_;      unless ( %$p or %$x ) {         push @cliques, [ keys %$r ];         return;     }      $v ( keys %$p ) {          $nv = $neighbours{$v};          %r_ = ( %$r, $v => 1 );         %p_ = map { $_ => 1 } _intersect( [ keys %$p ], $nv);         %x_ = map { $_ => 1 } _intersect( [ keys %$x ], $nv);          _bron_kerbosch( \( %r_, %p_, %x_ ) );          delete $p->{$v};         $x->{$v} = 1;     } }  sub _intersect {     ( $aa, $ab ) = @_;     %ab = map { $_ => 1 } @$ab;     grep $ab{$_}, @$aa; }  1; 

and program drives module using own data. get_cliques executes in under millisecond on system

use strict; use warnings;  use graph::cliques::bron_kerbosch qw/ get_cliques /;  # read data array of arrays, converting question's r # output. each element of @edges contains pair of nodes of graph # @edges; while ( <data> ) {     @pair = split;     next unless @pair > 2 , shift( @pair ) =~ /\[/;     push @edges, \@pair; }  # call utility function list of cliques # @groups = get_cliques( \@edges );  # extract hash keys change array of hashes array of sorted # arrays, sort array first size of clique , # first value in each group # $_ = [ sort { $a <=> $b } @$_ ] @groups; @groups = sort { @$a <=> @$b or $a->[0] <=> $b->[0] } @groups; print join( ' ', map { sprintf '%3d', $_ } @$_ ), "\n" @groups;  __data__     [,1] [,2]  [1,]    6  267  [2,]    9   10  [3,]   11   12  [4,]   79   80  [5,]   96  570  [6,]  314  583  [7,]  314  584  [8,]  425  426  [9,]  427  428 [10,]  427  429 [11,]  427  430 [12,]  427  472 [13,]  427  473 [14,]  427  474 [15,]  428  430 [16,]  428  473 [17,]  429  430 [18,]  430  472 [19,]  430  473 [20,]  430  474 [21,]  472  474 [22,]  517  519 [23,]  517  520 [24,]  517  521 [25,]  519  520 [26,]  519  521 [27,]  520  521 [28,]  583  584 [29,]  649  650 

output

  6 267   9  10  11  12  79  80  96 570 425 426 649 650 314 583 584 427 429 430 427 428 430 473 427 430 472 474 517 519 520 521 



update 1

okay have here known mathematically graph, , describing, every value connected every other value, called complete graph

knowing lets use google, , there question "find complete sub-graphs within graph" here on stack overflow tells complete subgraph called clique, has its own set of clique problems, of yours "listing maximal cliques". wikipedia tells "these problems hard"!

on basis checked cpan clique module , found graph::clique assumed have plug in question. has problems

  • it looks cliques of specific size

  • it's broken, , dies message

    can't use string ("1") scalar ref while "strict refs" in use 
  • because of sorting bug, works numeric node names have same number of digits

it uses brute-force technique employs regex method, while quite clever not fast

as better place start nothing fixed , added calling code checks whether smaller clique found earlier subset of larger one. result program seems want

note though, think expected data wrong, contains cliques subsets of others in list, commented beneath question. , can't want include all subsets, otherwise example list node pairs instead of of them. there 7 two-node cliques in data; [517, 521] isn't 1 of them because subset of [517, 519, 520, 521]

this program runs in under 6 seconds on system. algorithm works looking cliques of successively larger sizes until none found. far biggest delay here establishing there no cliques 5 nodes in data, takes around 5 seconds. finding of 4 nodes or less takes less second

use strict; use warnings;  use list::moreutils qw/ uniq /;  # read data array of arrays. each element of @edges contains # pair of nodes of graph # @edges; push @edges, [ split ] while <data>;  # keep asking cliques of larger size until find none. remove # found subsets of new ones # @groups;  ( $size = 2; @cliques = get_cliques( $size, \@edges ); ++$size ) {      @cliques = map +{ map +( $_ => 1 ), split }, @cliques;      ( $i = 0; $i < @groups; ) {          $group = $groups[$i];          $subset = {             $clique = $_;             { $clique->{$_} } keys %$group;         } @cliques;          if ( $subset ) {             splice @groups, $i, 1;         }         else {             ++$i;         }     }      push @groups, @cliques; }  # extract hash keys change array of hashes array of sorted # arrays, sort array first size of clique , # first value in each group # $_ = [ sort { $a <=> $b } keys %$_ ] @groups; @groups = sort { @$a <=> @$b or $a->[0] <=> $b->[0] } @groups; print join( ' ', map { sprintf '%3d', $_ } @$_ ), "\n" @groups;   # subroutine based on non-functional `graph::clique` cpan module # edward wijaya, <ewijaya@singnet.com.sg> # sub get_cliques {      ( $k, $edges ) = @_;      $string = {         @vertices = sort { $a <=> $b } uniq map @$_, @$edges;         @edges = map "$_->[0]-$_->[1]", sort { $a->[0] <=> $b->[0] } @{$edges};         local $" = ',';              # fix syntax colouring "         "@vertices;@edges";     };      $regex = join '[^;]+', ('\b(\d+)\b') x $k;     $regex .= '[^;]*;';     $regex .= "\n";      $i ( 1 .. $k-1 ) {         $j ( $i+1 .. $k ) {             $regex .= sprintf '(?=.*\b\g%d-\g%d\b)', $i, $j;             $regex .= "\n";         }     }      # backtrack regain identified k-cliques (credit mike mikero)     @cliques;     $regex .= '(?{ push (@cliques, join(" ", map ${$_}, 1..$k) ) })(*fail)' . "\n";     #print $regex, "\n";      {         no strict 'refs';         use re 'eval';         $string =~ /$regex/x;     }      @cliques; }  __data__ 6 267 9 10 11 12 79 80 96 570 314 583 314 584 425 426 427 428 427 429 427 430 427 472 427 473 427 474 428 430 428 473 429 430 430 472 430 473 430 474 472 474 517 519 517 520 517 521 519 520 519 521 520 521 583 584 649 650 

output

  6 267   9  10  11  12  79  80  96 570 425 426 649 650 314 583 584 427 429 430 427 430 472 474 427 428 430 473 517 519 520 521 



original post

this reasonably straightforward once have tossed red herring every member of each group must in pair every other member. believe data structured in way such each group represented every possible pair within it, , problem 1 of gathering values paired any other member of each group

this code perhaps little dense, work done within for loop. 2 data structures maintained in parallel. @groups array of hashes keys members of group. keep members unique if added multiple times. , %group_for hash relating each member element of @groups has been placed

the for loop processes each pair looking group either of pair has been placed. if neither have appeared before new group (anonymous hash) pushed onto array. %groups_for hash updated show both members have been placed

the output section converts groups hashes arrays, sorts each group, , sorts groups in order of first member

use strict; use warnings;  @data;  push @data, [ split ] while <data>;  @groups; %group_for;  $pair ( @data ) {    $group = $group_for{$pair->[0]} || $group_for{$pair->[1]};   push @groups, $group = {} if not $group;    $group->{$_} = 1 @$pair;   $group_for{$_} = $group @$pair; }  # change array of hashes array of sorted values, sort array # first value in each group, , display # $_ = [ sort { $a <=> $b } keys %$_ ] @groups; @groups = sort { $a->[0] <=> $b->[0] } @groups;  print join(' ', map { sprintf '%3d', $_ } @$_), "\n" @groups;   __data__   6 267   9  10  11  12  79  80  96 570 314 583 314 584 425 426 427 428 427 429 427 430 427 472 427 473 427 474 428 430 428 473 429 430 430 472 430 473 430 474 472 474 517 519 517 520 517 521 519 520 519 521 520 521 583 584 649 650 

output

  6 267   9  10  11  12  79  80  96 570 314 583 584 425 426 427 428 429 430 472 473 474 517 519 520 521 649 650 

Comments