[CODE-REVIEW] Determine if given lists intersect

bahmanm@lemmy.ml · edited 1 year ago

Cid@lemmy.sdf.org · edited 1 year ago

If you don’t care about the intersection values:

Init an empty hashmap with a best guess at the size
Iterate sequences
  Iterate elements
    If elt in hashmap return t
    Add elt to hashmap
return nil
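One caveat with the steps above: a value repeated inside a single sequence would also trip the check. Here is a minimal sketch, assuming Emacs Lisp and `seq.el`, that stores the index of the sequence each element was first seen in, so only hits across different sequences count. The name `my/any-shared-element-p` is made up for illustration.

```elisp
(require 'seq)

(defun my/any-shared-element-p (sequences)
  "Return t if two different members of SEQUENCES share an element.
Hypothetical helper; elements are compared with `equal'."
  (let ((owner (make-hash-table :test #'equal)) ; element -> index of its sequence
        (index 0))
    (catch 'found
      (seq-doseq (seq sequences)
        (seq-doseq (item seq)
          (let ((prev (gethash item owner)))
            (cond
             ;; First sighting: remember which sequence it came from.
             ((null prev) (puthash item index owner))
             ;; Seen before in another sequence: stop right away.
             ((/= prev index) (throw 'found t)))))
        (setq index (1+ index)))
      nil)))
```

Because it iterates with `seq-doseq`, the sketch should accept vectors as well as lists, e.g. `(my/any-shared-element-p '([1 2] (2 3)))`.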

If you have maybe a million+ elements, a db like sqlite might help: create a unique index and insert each item until you get a unique constraint failure.

bahmanm@lemmy.ml · 1 year ago

Yes, that’s essentially the snippet in my post 👍

Cid@lemmy.sdf.org · 1 year ago

Personally, I’d drop all the macro template syntax, applys, intersects, etc., and simplify the function arg to just: (sequences)

let item-map make-hash-table
  dolist seq sequences
    dolist item seq
      ;; gethash / return / or setf & continue
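Filled in, that skeleton might look roughly like the following. This is a sketch assuming Emacs Lisp and list inputs (since it uses `dolist`); the function name is hypothetical, and `catch`/`throw` stands in for the early return.

```elisp
(defun my/sequences-intersect-p (sequences)
  "Return t as soon as an element shows up a second time in SEQUENCES.
SEQUENCES is a list of lists; elements are compared with `equal'."
  (let ((item-map (make-hash-table :test #'equal)))
    (catch 'found
      (dolist (seq sequences)
        (dolist (item seq)
          ;; gethash: already recorded means we can return early ...
          (when (gethash item item-map)
            (throw 'found t))
          ;; ... otherwise record the item and continue.
          (puthash item t item-map)))
      nil)))
```

Note that, as written, a duplicate inside a single list also counts as a hit, which may or may not be what you want.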

charje@lemmy.ml · edited 1 year ago

Instead of storing intersect-p as a variable and keeping it until the end of the loop, you can return early as soon as you find the first intersection.

Even though a hash table has better asymptotic run time, you might find after benchmarking that the O(n^2) approach is faster for your use case. If you are set on using a hash table, you might consider setting the initial size to something a bit larger (relative to the input lists) to avoid having to dynamically grow the hash table.
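To make the comparison concrete, here is a rough sketch of both variants for two lists, assuming Emacs Lisp with `cl-lib`. The function names and the "twice the input length" sizing are illustrative guesses, not tuned values.

```elisp
(require 'cl-lib)

(defun my/intersect-p-quadratic (list1 list2)
  "O(n^2) nested scan; stops at the first shared element."
  (cl-some (lambda (x) (and (member x list2) t)) list1))

(defun my/intersect-p-hashed (list1 list2)
  "Hash-table variant, pre-sized so the table need not grow."
  (let ((seen (make-hash-table
               :test #'equal
               ;; Assumption: a bit larger than the number of keys we insert.
               :size (max 16 (* 2 (length list1))))))
    (dolist (x list1)
      (puthash x t seen))
    ;; Stops at the first element of LIST2 found in the table.
    (cl-some (lambda (y) (gethash y seen)) list2)))
```

Timing each one on your real inputs with something like `(benchmark-run 1000 (my/intersect-p-quadratic a b))` should show which wins; for short lists the nested scan may well come out ahead.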

I think also the return value of the inner loop is never used…

I personally like to keep my test assertions top level so I can interactively run each one by itself.
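For instance, something along these lines, where each assertion is its own top-level form and can be evaluated on its own with `C-x C-e`. This is only a sketch: `my/lists-intersect-p` is a hypothetical stand-in for the function under test, built here on `seq-intersection`.

```elisp
(require 'cl-lib)
(require 'seq)

(defun my/lists-intersect-p (a b)
  "Stand-in for the function under test: t if A and B share an element."
  (and (seq-intersection a b) t))

;; Each form below is independent; evaluate any one by itself.
(cl-assert (my/lists-intersect-p '(1 2 3) '(3 4)))

(cl-assert (not (my/lists-intersect-p '(1 2) '(3 4))))

(cl-assert (not (my/lists-intersect-p '() '(1))))
```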

bahmanm@lemmy.ml · 1 year ago

Thanks for the code review and feedback. Here’s a 2nd attempt: https://pastebin.com/WBqs9u8L

I essentially threw away my bloated Java/C#-esque implementation and started from scratch. Please let me know what you think 🙏

charje@lemmy.ml · 1 year ago

In this version it is definitely a bit harder to follow what is going on.

bahmanm@lemmy.ml · edited 1 year ago

Oh!? And I was under the impression that the code reads more naturally than the initial version 😂

Let me try putting it in words and see if it makes sense to you:

Given sequences seq1 and seq2 and a sequence of sequences sequences, seq-intersect-p should return non-nil if at least one pair of the input sequences intersects.

If seq1 and seq2 intersect return t
Recursively check if seq1 intersects w/ any element in sequences. If it does, return t. Otherwise we know seq1 is safe to be ignored - no intersection whatsoever.
Recursively check if seq2 intersects w/ any element in sequences. If they don’t, we know seq2 is safe to be ignored too.
Recursively check if any elements of sequences intersect w/ each other.

There’s no caching or optimisation in this version. So it’s always O(n²).
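For readers without access to the pastebin, the four steps could be sketched roughly as below. This is a reconstruction from the description only, not the code in the link; it assumes Emacs Lisp with `seq.el`, a `&rest` argument for sequences, and made-up `my/` names.

```elisp
(require 'seq)

(defun my/pair-intersects-p (a b)
  "Return t if sequences A and B share at least one element."
  (and (seq-intersection a b) t))

(defun my/intersects-any-p (seq sequences)
  "Return t if SEQ intersects any member of the list SEQUENCES."
  (and sequences
       (or (my/pair-intersects-p seq (car sequences))
           (my/intersects-any-p seq (cdr sequences)))))

(defun my/seq-intersect-p (seq1 seq2 &rest sequences)
  "Return t if at least one pair of the given sequences intersects."
  (or
   ;; Step 1: the first two sequences against each other.
   (my/pair-intersects-p seq1 seq2)
   ;; Steps 2 and 3: seq1, then seq2, against everything else.
   (my/intersects-any-p seq1 sequences)
   (my/intersects-any-p seq2 sequences)
   ;; Step 4: seq1 and seq2 are now ruled out; recurse on the rest
   ;; (only when at least two sequences remain).
   (and (cdr sequences)
        (apply #'my/seq-intersect-p sequences))))
```

For example, `(my/seq-intersect-p '(1 2) '(3 4) '(5 6) '(6 7))` only finds its hit in the last step, when the remaining two sequences are compared with each other.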

bahmanm@lemmy.ml · 1 year ago

test assertions top level

Noted. Makes sense.

bahmanm@lemmy.ml · 1 year ago

Thanks all for your feedback 🙏

I’m going to stick to “version 2” which, to my mind, reads more naturally. I’ll definitely consider the iterative suggestions for the sake of performance if I ever decide to submit a patch upstream. But for now, what I’ve got does the job for me dealing w/ sequences w/ fewer than 50 elements.
