algorithm, closest point between list elements
algorithm, closest point between list elements
I have n ordered lists of unequal size (where I do not know beforehand how many lists there are going to be). I need to find the minimum average distance between one element in each list.
For example given n=3 for three lists:
a = [14, 22, 36, 48]
b = [14, 23, 30, 72]
c = [1, 18, 24]
The output should be (22,23,24) because:
mean(abs(22-23), abs(23-24), abs(22-24)) = 1.33333
which is the smallest among all the points in the example above.
I tried to implement it in Python as following
def alligner(aoa):
'''
read arrays of arrays of peaks and return closest peaks
'''
#one of arrays is empty
if not [y for x in aoa for y in x]:
return None
# there is the same nr in all array no need to do anything
candidate = set.intersection(*map(set, aoa))
if candidate:
# returns intersect
return [max(list(candidate))] * len(aoa)
else:
#tried cartesian product via bumpy malloc err
pass
My doubt is now regarding the implementation of the other part. I thought about using the cartesian product for generating all combination but is going into memory issues. My guesses would be do generate all the combination somehow (maybe itertools??) and loop through all of those but I don't know if there s any algorithm that tackles this problem that I could use.
I do not need code but only hints whether there is any efficient way for solving this or the brute force with n for loops on the permutated lists is the only one
EDIT
Regarding the size of the problem the nr of list is maximum 100 (fixed) while the nr of elements can be vary but I would say that the presented example with 4 or 5 points per list is a realistic scenario.
All points are non-negative.
Tried the proposed itertools solution but of course not memory issues but has been running since hours and stucked on the 3rd element.
It isn't quite clear what is meant by "minimize the difference between all points". You can minimize one number.
– n.m.
Jul 1 at 14:23
From the set of points chosen from the sets (I'm assuming one point per set), you can compute pairwise differences. It's the sum of those differences the OP wants to minimize, I believe.
– chepner
Jul 1 at 15:32
@n.m. yes as written by chepner I want to minimize the pairwise absolute difference between all points in a set (or alternatively as said also the sum of the difference can be used or even the standard deviation between the proposed points)
– D.A.
Jul 2 at 13:26
If memory is the only problem, you can just use
itertools.product
to iterate all the possible combinations, without realizing them in a list, first. But then, if memory is an issue, it's likely that time will also be, and just using a generator won't fix that.– tobias_k
Jul 2 at 15:55
itertools.product
4 Answers
4
This method is a brute force method, but uses a method of elimination similar to Dijkstra's algorithm, that results in far fewer cases (making an algorithm that is most likely orders of magnitude faster, especially for big lists or lots of lists). Tell me if you don't understand it, and I can clarify. The implementation can be found here: https://github.com/nerryoob/closestPoint
What you are doing is making a list of the different combination of numbers (i.e. answers)? Best at the start (index 0), worst at the end, or vice versa, see what works best. You'll make the result list for the first input list only, completely ignoring the others. For one list, of course, all the items are the solution - they all have total difference 0. So just copy the first input list into the result list
Next, probably with a while loop, follow this algorithm. Take the top item and pop it from the result list. Store its value. Go to the next input list and for each item in that next input list, make a copy of the top item you just popped that also has the item of the next input list. Find the new overall difference and insert the new item based off of that into the list. Repeat until the top solution has all of the lists in. This means that you guarantee you have the best solution (of at least joint 1st), whilst spending far less time on the combinations that are obviously not the solution
Example (the
number in brackets is the total difference)
[14, 22, 36, 48]
[14, 23, 30, 72]
[1, 18, 24]
Result list is [14(0), 22(0), 36(0), 48(0)]
[14(0), 22(0), 36(0), 48(0)]
Keep on repeating and you end up with 22, 23, 24 on top. Because this has all n lists in it, you can therefore stop and give back the answer
To optimise it:
can you say something about the complexity of your algorithm?
– kutschkem
Jul 2 at 11:10
Is not clear to me how to stop it (coding it right now). By checking the nr of elements in a solution you would stop with 14-14-18 and if you do want to check the top solution you need to have all the combination for finding the best one.
– D.A.
Jul 3 at 8:04
D.A I think you misinterpreted it. If a solution is at the top and has all the lists in, then by definition it is the best, as the total differences for the others cannot go down as you make them more complete. Carry on you end up with 22 and 23 on top, add in 24 and you end up with 22 and 23 and 24 on top with a total difference of 3- the correct solution
– nerryoob
Jul 3 at 11:27
The implementation has been tested and does work. If you want I can create an interactive one that walks through the steps
– nerryoob
Jul 3 at 11:48
First of all, optimizing the mean of the differences is the same as optimizing the sum of the differences.
If you model you problem as directed graph, this can be solved:
Let your lists be A, B, C. Every entry of your lists is a vertex of the graph v_ai
where a is the list and i the index.
v_ai
For every index i in A, j in B, add an edge v_ai -> v_bj
with weigth abs(A(i) - B(j))
v_ai -> v_bj
abs(A(i) - B(j))
For every index i in B, j in C, add an edge v_bi -> v_cj
with weigth abs(B(i) - C(j))
v_bi -> v_cj
abs(B(i) - C(j))
For every index i in C, j in A, add an edge v_ci -> v_aj
with weigth abs(C(i) - A(j))
v_ci -> v_aj
abs(C(i) - A(j))
What you are looking for now is the minimum cycle in this graph. Use this answer for an O(n^3) algorithm. (a modified Floyd-Warshall algorithm)
Is it really just a cycle? What if there are four lists?
– tobias_k
Jul 2 at 14:58
@tobias_k Then you do A -> B -> C -> D -> A
– kutschkem
Jul 2 at 15:12
@tobias_k ... no that isn't right, is it? Since then you are missing A - C, and B - D ...
– kutschkem
Jul 2 at 15:13
@tobias_k On the other hand, | A - C | <= | A - B | + | B - C | (triangle inequality), so at least it is an approximation. I will see if I can prove that a cycle minimizes the whole thing, but I am not really sure.
– kutschkem
Jul 2 at 15:21
I'm not sure about the best way to find the optimal solution but one heuristic might be to examine ranges. If our lists are sorted, we can check if the list has an element within a range using a binary search. So we can divide and conquer, attempting to narrow ranges that include an element from each list. Because of the nature of mean computation, we might unfortunately also be interested in ranges that include elements from many but not all lists, since a collection of very close numbers with a few outliers might produce a smaller differences-mean than more variation within a smaller range; this complicates the solution quite a bit.
We don't really know a lot about the size of your problem, i.e. how many lists, and how many elements per list. For starters, and for setting a baseline, you could just use itertools.product
to iterate all the possible combinations of elements from the three lists, without realizing them in a list. You can then iterate those and find the best one, or pass them directly into min
and use a special key
function using itertools.combinations
and sum
to find the one with the lowest average distance (if the sum is lowest, so is the average).
itertools.product
min
key
itertools.combinations
sum
>>> a = [14, 22, 36, 48]
>>> b = [14, 23, 30, 72]
>>> c = [1, 18, 24]
>>> len(list(itertools.product(a, b, c)))
48
>>> min(itertools.product(a, b, c),
... key=lambda t: sum(abs(n-m) for n, m in itertools.combinations(t, 2)))
(22, 23, 24)
Depending on the size of your problem, this might be much too slow, but maybe it's sufficient.
the nr of list is maximum a 100 and the nr of points within each list could be variable but not more than 35-40 (while the average should be below 10)
– D.A.
Jul 3 at 7:34
@D.A. Okay, with 100 lists with 10 elements each, you have 10^100 combinations, so this won't work. Still, this might help others with having a similar but smaller problem.
– tobias_k
Jul 3 at 8:19
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Seems like more of a graph theory like problem? math.stackexchange.com/questions/581831/…. If not, perhaps a tree-like structure for each list would be your best bet.
– ZaxR
Jul 1 at 14:11