this logic problem i'm not quite equipped deal seems.
i have data set of paired samples each identified unique number. pair indicates level of relatedness between samples. want group these samples such every member of group supported pair every other member of group.
for example, in data set below pairs 6, 7, , 28 constitute group of 3. pairs 22 through 27 constitute group of 4, in case there 4 groups of 3 inside group don't want in output. more difficult pairs 10, 11, , 17 group of 3.
data set:
[,1] [,2] [1,] 6 267 [2,] 9 10 [3,] 11 12 [4,] 79 80 [5,] 96 570 [6,] 314 583 [7,] 314 584 [8,] 425 426 [9,] 427 428 [10,] 427 429 [11,] 427 430 [12,] 427 472 [13,] 427 473 [14,] 427 474 [15,] 428 430 [16,] 428 473 [17,] 429 430 [18,] 430 472 [19,] 430 473 [20,] 430 474 [21,] 472 474 [22,] 517 519 [23,] 517 520 [24,] 517 521 [25,] 519 520 [26,] 519 521 [27,] 520 521 [28,] 583 584 [29,] 649 650
a couple things data set: value in column 2 of pair greater value in column 1 , values in column 1 in ascending order.
this data set simple version of problem limited number of pairs 1 sample (the 427 6) more complicated data sets have number of levels need solution agnostic factor. think recursion of sort way go , have been playing around such in r new concept , not getting right answer.
i can't first person ever haven't found yet, because describing problem bit awkward in search engine.
anyway, if knows appreciate help!
edit -
expected output (each line group):
[1,] 6 267 [2,] 9 10 [3,] 11 12 [4,] 79 80 [5,] 96 570 [6,] 314 583 584 [7,] 425 426 [8,] 427 428 430 473 [9,] 427 429 430 [10,] 427 430 472 474 [11,] 517 519 520 521 [12,] 649 650
to clarify mean each member of group being supported pair every other member: group 1 has 2 members therefore requires 1 supporting pair pair 1. group 6 has 3 members , therefore requires support 3 pairs 6, 7, , 28 proving each group member pairs both of others (this can visualized triangle each member vertex , connecting lines pairs). group 8 has 4 members , therefore requires support of 6 pairs 9, 11, 13, 15, 16, 19 (you can picture square each member vertex , lines between vertices giving 6 pairs). hope clarifies! hard explain , solution seems non-trivial.
update 3 add pivoting
this suggested optimisation module below, should reduce number of recursions significantly. add end of module code, , replace loop for $v ( keys %$p )
in _bron_kerbosh
for $v ( _choose_pivot($p, $x) )
# find element u of p u x such many possible of # neighbours fall in p # sub _choose_pivot { ( $p, $x ) = @_; @p = keys %$p; @choice = @p; $u ( @p, keys %$x ) { $nu = $neighbours{$u}; %nu = map +( $_ => 1 ), @$nu; @subset = grep { not $nu{$_} } @p; @choice = @subset if @subset < @choice; } @choice; }
update 2 module
wikipedia describes bron-kerbosch algorithm finding maximal cliques in graph. says
although other algorithms solving clique problem have running times are, in theory, better on inputs have few maximal independent sets, bron–kerbosch algorithm , subsequent improvements reported being more efficient in practice alternatives.
so since cpan appears have no clique module can find thought useful implement it. code. should copy , save graph/cliques/bron_kerbosch.pm
. shall prepare tests , put on cpan shortly
package graph::cliques::bron_kerbosch; use strict; use warnings; use v5.8.3; use exporter qw/ import /; our @export_ok = qw/ get_cliques /; ( %neighbours, @cliques ); sub get_cliques { ( $edges ) = @_; %neighbours = (); @cliques = (); $edge ( @$edges ) { ( $n1, $n2 ) = @$edge; $neighbours{$n1}{$n2} = 1; $neighbours{$n2}{$n1} = 1; } $_ = [ keys %$_ ] values %neighbours; ( %r, %p, %x ); $p{$_} = 1 map @$_, @$edges; _bron_kerbosch( \( %r, %p, %x ) ); @cliques; } sub _bron_kerbosch { ( $r, $p, $x ) = @_; unless ( %$p or %$x ) { push @cliques, [ keys %$r ]; return; } $v ( keys %$p ) { $nv = $neighbours{$v}; %r_ = ( %$r, $v => 1 ); %p_ = map { $_ => 1 } _intersect( [ keys %$p ], $nv); %x_ = map { $_ => 1 } _intersect( [ keys %$x ], $nv); _bron_kerbosch( \( %r_, %p_, %x_ ) ); delete $p->{$v}; $x->{$v} = 1; } } sub _intersect { ( $aa, $ab ) = @_; %ab = map { $_ => 1 } @$ab; grep $ab{$_}, @$aa; } 1;
and program drives module using own data. get_cliques
executes in under millisecond on system
use strict; use warnings; use graph::cliques::bron_kerbosch qw/ get_cliques /; # read data array of arrays, converting question's r # output. each element of @edges contains pair of nodes of graph # @edges; while ( <data> ) { @pair = split; next unless @pair > 2 , shift( @pair ) =~ /\[/; push @edges, \@pair; } # call utility function list of cliques # @groups = get_cliques( \@edges ); # extract hash keys change array of hashes array of sorted # arrays, sort array first size of clique , # first value in each group # $_ = [ sort { $a <=> $b } @$_ ] @groups; @groups = sort { @$a <=> @$b or $a->[0] <=> $b->[0] } @groups; print join( ' ', map { sprintf '%3d', $_ } @$_ ), "\n" @groups; __data__ [,1] [,2] [1,] 6 267 [2,] 9 10 [3,] 11 12 [4,] 79 80 [5,] 96 570 [6,] 314 583 [7,] 314 584 [8,] 425 426 [9,] 427 428 [10,] 427 429 [11,] 427 430 [12,] 427 472 [13,] 427 473 [14,] 427 474 [15,] 428 430 [16,] 428 473 [17,] 429 430 [18,] 430 472 [19,] 430 473 [20,] 430 474 [21,] 472 474 [22,] 517 519 [23,] 517 520 [24,] 517 521 [25,] 519 520 [26,] 519 521 [27,] 520 521 [28,] 583 584 [29,] 649 650
output
6 267 9 10 11 12 79 80 96 570 425 426 649 650 314 583 584 427 429 430 427 428 430 473 427 430 472 474 517 519 520 521
update 1
okay have here known mathematically graph, , describing, every value connected every other value, called complete graph
knowing lets use google, , there question "find complete sub-graphs within graph" here on stack overflow tells complete subgraph called clique, has its own set of clique problems, of yours "listing maximal cliques". wikipedia tells "these problems hard"!
on basis checked cpan clique module , found graph::clique
assumed have plug in question. has problems
it looks cliques of specific size
it's broken, , dies message
can't use string ("1") scalar ref while "strict refs" in use
- because of sorting bug, works numeric node names have same number of digits
it uses brute-force technique employs regex method, while quite clever not fast
as better place start nothing fixed , added calling code checks whether smaller clique found earlier subset of larger one. result program seems want
note though, think expected data wrong, contains cliques subsets of others in list, commented beneath question. , can't want include all subsets, otherwise example list node pairs instead of of them. there 7 two-node cliques in data; [517, 521] isn't 1 of them because subset of [517, 519, 520, 521]
this program runs in under 6 seconds on system. algorithm works looking cliques of successively larger sizes until none found. far biggest delay here establishing there no cliques 5 nodes in data, takes around 5 seconds. finding of 4 nodes or less takes less second
use strict; use warnings; use list::moreutils qw/ uniq /; # read data array of arrays. each element of @edges contains # pair of nodes of graph # @edges; push @edges, [ split ] while <data>; # keep asking cliques of larger size until find none. remove # found subsets of new ones # @groups; ( $size = 2; @cliques = get_cliques( $size, \@edges ); ++$size ) { @cliques = map +{ map +( $_ => 1 ), split }, @cliques; ( $i = 0; $i < @groups; ) { $group = $groups[$i]; $subset = { $clique = $_; { $clique->{$_} } keys %$group; } @cliques; if ( $subset ) { splice @groups, $i, 1; } else { ++$i; } } push @groups, @cliques; } # extract hash keys change array of hashes array of sorted # arrays, sort array first size of clique , # first value in each group # $_ = [ sort { $a <=> $b } keys %$_ ] @groups; @groups = sort { @$a <=> @$b or $a->[0] <=> $b->[0] } @groups; print join( ' ', map { sprintf '%3d', $_ } @$_ ), "\n" @groups; # subroutine based on non-functional `graph::clique` cpan module # edward wijaya, <ewijaya@singnet.com.sg> # sub get_cliques { ( $k, $edges ) = @_; $string = { @vertices = sort { $a <=> $b } uniq map @$_, @$edges; @edges = map "$_->[0]-$_->[1]", sort { $a->[0] <=> $b->[0] } @{$edges}; local $" = ','; # fix syntax colouring " "@vertices;@edges"; }; $regex = join '[^;]+', ('\b(\d+)\b') x $k; $regex .= '[^;]*;'; $regex .= "\n"; $i ( 1 .. $k-1 ) { $j ( $i+1 .. $k ) { $regex .= sprintf '(?=.*\b\g%d-\g%d\b)', $i, $j; $regex .= "\n"; } } # backtrack regain identified k-cliques (credit mike mikero) @cliques; $regex .= '(?{ push (@cliques, join(" ", map ${$_}, 1..$k) ) })(*fail)' . "\n"; #print $regex, "\n"; { no strict 'refs'; use re 'eval'; $string =~ /$regex/x; } @cliques; } __data__ 6 267 9 10 11 12 79 80 96 570 314 583 314 584 425 426 427 428 427 429 427 430 427 472 427 473 427 474 428 430 428 473 429 430 430 472 430 473 430 474 472 474 517 519 517 520 517 521 519 520 519 521 520 521 583 584 649 650
output
6 267 9 10 11 12 79 80 96 570 425 426 649 650 314 583 584 427 429 430 427 430 472 474 427 428 430 473 517 519 520 521
original post
this reasonably straightforward once have tossed red herring every member of each group must in pair every other member. believe data structured in way such each group represented every possible pair within it, , problem 1 of gathering values paired any other member of each group
this code perhaps little dense, work done within for
loop. 2 data structures maintained in parallel. @groups
array of hashes keys members of group. keep members unique if added multiple times. , %group_for
hash relating each member element of @groups
has been placed
the for
loop processes each pair looking group either of pair has been placed. if neither have appeared before new group (anonymous hash) pushed onto array. %groups_for
hash updated show both members have been placed
the output section converts groups hashes arrays, sorts each group, , sorts groups in order of first member
use strict; use warnings; @data; push @data, [ split ] while <data>; @groups; %group_for; $pair ( @data ) { $group = $group_for{$pair->[0]} || $group_for{$pair->[1]}; push @groups, $group = {} if not $group; $group->{$_} = 1 @$pair; $group_for{$_} = $group @$pair; } # change array of hashes array of sorted values, sort array # first value in each group, , display # $_ = [ sort { $a <=> $b } keys %$_ ] @groups; @groups = sort { $a->[0] <=> $b->[0] } @groups; print join(' ', map { sprintf '%3d', $_ } @$_), "\n" @groups; __data__ 6 267 9 10 11 12 79 80 96 570 314 583 314 584 425 426 427 428 427 429 427 430 427 472 427 473 427 474 428 430 428 473 429 430 430 472 430 473 430 474 472 474 517 519 517 520 517 521 519 520 519 521 520 521 583 584 649 650
output
6 267 9 10 11 12 79 80 96 570 314 583 584 425 426 427 428 429 430 472 473 474 517 519 520 521 649 650
Comments
Post a Comment