find homogeneous groups of items based on pairwise information
Suppose you have run experiments under four treatments (treatments 1, 2, 3, and 4) and collected many independent samples from each. The question is how to divide the treatments into groups whose members are not statistically different from one another.
This package, pairs2groups, finds the groups into which the treatments may be divided such that each member of a group is not significantly different from the other members of that group.
A picture may make this clearer. Samples from each treatment are plotted below in a boxplot. The letters below each box name the groups to which each treatment belongs. For example, all treatments in group ‘a’ are not statistically significantly different from each other.
In this example, treatment 1 is in groups ‘a’ and ‘b’. Therefore, it is statistically significantly different from treatment 4, which is not in either of these groups. Treatment 1 is not statistically significantly different from treatments 2 or 3, with which it shares membership in groups ‘b’ and ‘a’, respectively.
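The rule for reading the letters can be written down directly: two treatments are not significantly different exactly when they share at least one group letter. The sketch below assumes hypothetical labels; only treatment 1's letters (‘a’ and ‘b’) come from the text, while the labels for treatments 2, 3, and 4 and the helper name `shares_group` are made up for illustration.

```python
# Hypothetical group letters for the four treatments. Only treatment 1's
# labels ('a' and 'b') are stated in the text; the others are assumed.
group_strings = {1: "ab", 2: "b", 3: "a", 4: "c"}


def shares_group(i, j, labels=group_strings):
    """Return True if treatments i and j have at least one group letter in common."""
    return bool(set(labels[i]) & set(labels[j]))


print(shares_group(1, 2))  # True: both are in group 'b'
print(shares_group(1, 3))  # True: both are in group 'a'
print(shares_group(1, 4))  # False: no common letter, so they differ
```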
The problem is how to find the (minimal but complete) set of homogeneous groups of a collection of items. A homogeneous group is defined to have no member that is ‘different’ (defined below) from any other member.
Consider the problem of n items and pairwise knowledge of whether each item is ‘different’ or ‘not different’ from every other item. This ‘different’ relation is symmetric (A is different from B means B is different from A), but not transitive (knowing that A is different from B and that B is different from C does not determine the relation between A and C).
How can groups be constructed such that every member population of a group is not different from the other populations in the group?
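A small illustration of why the lack of transitivity matters, using assumed toy data: A is different from C, but B is different from neither. Storing each pair as a `frozenset` makes the relation symmetric by construction. The names `different` and `is_homogeneous` are hypothetical and not part of the package.

```python
from itertools import combinations

# Assumed toy data: A differs from C, but neither differs from B.
different = {frozenset(("A", "C"))}  # unordered pairs, so the relation is symmetric


def is_homogeneous(group, different_pairs=different):
    """A group is homogeneous if no pair of its members is marked 'different'."""
    return all(
        frozenset(pair) not in different_pairs
        for pair in combinations(group, 2)
    )


print(is_homogeneous(["A", "B"]))       # True
print(is_homogeneous(["B", "C"]))       # True
print(is_homogeneous(["A", "B", "C"]))  # False: A and C differ
```

Here B belongs to two homogeneous groups at once, which is why a single item may carry several group letters.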
The source code and issue tracker for this library are at http://github.com/astraw/pairs2groups
perform statistical comparisons and call find_homogeneous_groups()
The statistical test used is the Mann-Whitney U test.
Parameters:
- populations : a sequence of sequences
- significance_level : float, optional
- two_tailed : bool, optional
- force_letter : bool, optional

Returns:
- group_info : dictionary
Examples
This example generates four populations: three from the same distribution and the last from a different distribution. Then label_homogeneous_groups() is used to find which of these populations belong to groups whose members are not statistically significantly different.
>>> import numpy as np
>>> from pairs2groups import label_homogeneous_groups
>>> pop1 = np.random.normal(size=(100,))
>>> pop2 = np.random.normal(size=(100,))
>>> pop3 = np.random.normal(size=(100,))
>>> pop4 = np.random.normal(size=(100,)) + 2
>>> populations = [pop1, pop2, pop3, pop4]
>>> group_info = label_homogeneous_groups(populations)
>>> group_info # doctest: +SKIP
{'p_values': array([[   NaN,  0.578,  0.705,  0.   ],
       [ 0.578,    NaN,  0.855,  0.   ],
       [ 0.705,  0.855,    NaN,  0.   ],
       [ 0.   ,  0.   ,  0.   ,    NaN]]),
 'medians': [0.071, -0.010, -0.0156, 2.054],
 'group_strings': ['a', 'a', 'a', ''], 'groups': [(0, 1, 2)]}
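To make the pairwise comparison step concrete, here is a minimal sketch of how a list of ‘different’ pairs could be derived from populations with a two-sided Mann-Whitney U test. It uses `scipy.stats.mannwhitneyu`, which may not be how the package computes its p-values. The function name `pairwise_different` and the 0.05 threshold are assumptions, and no correction for multiple comparisons is applied.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def pairwise_different(populations, significance_level=0.05):
    """Return index pairs (i, j) whose two-sided p-value is below the level.

    Hypothetical helper; the 0.05 default is an assumption.
    """
    different_pairs = []
    for i, j in combinations(range(len(populations)), 2):
        _, p = mannwhitneyu(populations[i], populations[j],
                            alternative="two-sided")
        if p < significance_level:
            different_pairs.append((i, j))
    return different_pairs


rng = np.random.default_rng(0)
pops = [rng.normal(size=100), rng.normal(size=100) + 2]
print(pairwise_different(pops))
```

The resulting list of index pairs has the same shape as the different_pairs argument described for find_homogeneous_groups().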
Find all homogeneous groups of not-different pairs.
The algorithm used is as follows, where S is the set of all n items.
1. Set k equal to n, and T equal to S.
2. Set m equal to n choose k. Take all (m in number) k-element subsets of T. Denote the i-th subset of T as U_i.
3. For i in (0, ..., m-1):
   - 3a. If no pair within U_i is different, then U_i is a group. Remember it.
   - 3b. Else, set k equal to k-1, and T equal to U_i. Go to step 2.
Parameters:
- different_pairs : list of 2-tuples
- N_popupations : int
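The search described above can be sketched recursively. This is an illustrative reimplementation under stated assumptions, not the package's own code: the name `find_groups_sketch` is hypothetical, non-maximal groups are filtered out at the end, and singleton groups are reported even though the example output for label_homogeneous_groups() does not list them.

```python
from itertools import combinations


def find_groups_sketch(different_pairs, n_items):
    """Illustrative search for maximal groups containing no 'different' pair."""
    different = {frozenset(p) for p in different_pairs}
    found = set()

    def search(T):
        # T is a group if none of its pairs is marked different.
        if all(frozenset(p) not in different for p in combinations(T, 2)):
            found.add(frozenset(T))
            return
        # Otherwise, try every subset of T with one fewer element.
        for U in combinations(T, len(T) - 1):
            search(U)

    search(tuple(range(n_items)))
    # Keep only maximal groups: drop any group strictly contained in another.
    maximal = [g for g in found if not any(g < h for h in found)]
    return sorted(tuple(sorted(g)) for g in maximal)


print(find_groups_sketch([(0, 3), (1, 3), (2, 3)], 4))  # [(0, 1, 2), (3,)]
print(find_groups_sketch([(0, 2)], 3))                  # [(0, 1), (1, 2)]
```

The sketch revisits some subsets more than once and its cost grows exponentially with the number of items, which is acceptable only for small collections such as a handful of treatments.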
pairs2groups is a software package for Python.
After installation, run the test suite with:
nosetests pairs2groups.util pairs2groups -v -v --with-doctest
a 2-tuple that hashes the same regardless of order
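One way such an order-insensitive pair could be built is by hashing and comparing on a `frozenset` of the two values. The class below is a hypothetical stand-in written for illustration; its name and internals are assumptions and do not describe the package's own class.

```python
class UnorderedPair:
    """Hypothetical pair whose hash and equality ignore element order."""

    def __init__(self, a, b):
        self.items = (a, b)

    def _key(self):
        # A frozenset has no order, so (a, b) and (b, a) share one key.
        return frozenset(self.items)

    def __eq__(self, other):
        if not isinstance(other, UnorderedPair):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __iter__(self):
        # Allows tuple(pair) to recover the values in their original order.
        return iter(self.items)


print(UnorderedPair(0, 1) == UnorderedPair(1, 0))         # True
print(len({UnorderedPair(0, 1), UnorderedPair(1, 0)}))    # 1
print(tuple(UnorderedPair(0, 1)))                         # (0, 1)
```

Because equal pairs hash equally, membership tests such as "is this pair in the set of different pairs?" work regardless of the order in which the pair was recorded.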
get a list of all pairs of values of S
>>> [tuple(p) for p in get_all_pairs( [0,1,2,3] )]
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
get all subsets of T of length k
Returns len(T) choose k subsets.
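As a point of comparison, the standard library can enumerate the same subsets. The sketch below uses `itertools.combinations` and checks the count against `math.comb` (Python 3.8 or later); the name `k_subsets` is hypothetical and is not the package's function.

```python
from itertools import combinations
from math import comb


def k_subsets(T, k):
    """Return all k-element subsets of T as a list of tuples."""
    return list(combinations(T, k))


T = [0, 1, 2, 3]
subsets = k_subsets(T, 3)
print(subsets)                          # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(len(subsets) == comb(len(T), 3))  # True
```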
test whether list of sets A is equal to list of sets B
>>> A = [frozenset([1])]
>>> B = [frozenset([1])]
>>> is_list_of_sets_equal(A, B)
True
>>> B = [frozenset([2])]
>>> is_list_of_sets_equal(A, B)
False
>>> B = [frozenset([1]),frozenset([2])]
>>> is_list_of_sets_equal(A, B)
False
>>> B = []
>>> is_list_of_sets_equal(A, B)
False
remove jth element of T
>>> T = [0, 1, 2, 3]
>>> take_not( T, 1)
[0, 2, 3]