cross-posted from: https://lemmy.ml/post/4591838
Suppose I need to find out if the intersection of an arbitrary number of lists or sequences is empty.
Instead of the obvious O(n2) approach I used a hash table to achieve an O(n) implementation.
Now,
loop
mini-language aside, is this idiomatic elisp code? Could it be improved w/o adding a lot of complexity?You can view the same snippet w/ syntax highlighting on pastebin.
(defun seq-intersect-p (seq1 seq2 &rest sequences) "Determine if the intersection of SEQ1, SEQ2 and SEQUENCES is non-nil." (cl-do* ((sequences `(,seq1 ,seq2 ,@sequences) (cdr sequences)) (seq (car sequences) (car sequences)) (elements (make-hash-table :test #'equal)) (intersect-p nil)) ((or (seq-empty-p sequences)) intersect-p) (cl-do* ((seq seq (cdr seq)) (element (car seq) (car seq))) ((or intersect-p (seq-empty-p seq)) intersect-p) (if (ht-get elements element) (setf intersect-p t) (ht-set elements element t))))) (defun test-seq-intersect-p () "Test cases." (cl-assert (null (seq-intersect-p '() '()))) (cl-assert (null (seq-intersect-p '(1) '()))) (cl-assert (null (seq-intersect-p '(1 2) '(3 4) '(5 6)))) (cl-assert (seq-intersect-p '(1 2) '(3 4) '(5 6) '(1))) t) (test-seq-intersect-p)
Version 2
(defun seq-intersect-p (first second & sequences)
"Determine if FIRST, SECOND and any of the sequences in SEQUENCES have
an intersection."
(if (seq-empty-p sequences)
(seq-intersection first second)
(or (seq-intersection first second)
(apply #'seq-intersect-p
first
(seq-first sequences)
`,@(seq-rest sequences))
(apply #'seq-intersect-p
second
(seq-first sequences)
`,@(seq-rest sequences))
(apply #'seq-intersect-p
(seq-first sequences)
(seq-elt sequences 2)
`,@(seq-rest (seq-rest sequences))))))
If you don’t care about the intersection values:
init empty hashmap w best guess on size Iterate sequences Iterate elements If elt in hashmap return t Add elt to hashmap return nil
If you have maybe million+ elements, a db like sqlite might help. Unique index, insert each item til you get a unique constraint failure.
Yes, that’s essentially the snippet in my post 👍
Personally, I’d drop all the macro template syntax, applys, intersects, etc. And simplify the function arg to be just: (sequences)
let item-map make-hash-table dolist seq sequences dolist item seq ;; gethash / return / or setf & continue
Instead of storing intersect-p as a variable and keeping it until the end of the loop, you can return early as soon as you find the first intersection.
Even though a hash table has better symtotic run time, you might find after benchmarking that the O(n^2) is faster for your use case. If you are set on using a hash table, you might consider setting the initial size to something a bit larger (relative to the input lists) to avoid having to dynamically grow the hash table.
I think also the return value of the inner loop is never used…
I personally like to keep my tests assertions top level so I can interactively run each one by itself.
Thanks for the code review and feedback. Here’s a 2nd attempt: https://pastebin.com/WBqs9u8L
I essentially threw away my bloated Java/C#'esq implementation and started from scratch. Please let me know what you think 🙏
This version is definitely a bit harder to follow what is going on.
Oh!? And I was under the impression that the code reads more naturally than the initial version 😂
Let me try putting it in words and see if it makes sense to you:
Given sequences
seq1
andseq2
and sequence of sequencessequences
,seq-intersect-p
should return non-nil
if at least one pair of the input sequences have got an intersection.- If
seq1
andseq2
intersect returnt
- Recursively check if
seq1
intersects w/ any element insequences
. If it does, returnt
. Otherwise we knowseq1
is safe to be ignored - no intersection whatsoever. - Recursively check if
seq2
intersects w/ any element insequences
. If they don’t, we knowseq2
is safe to be ignored too. - Recursively check if any elements of
sequences
intersect w/ each other.
There’s no caching or optimisation in this version. So it’s always O(n2).
- If
tests assertions top level
Noted. Makes sense.
Thanks all for your feedback 🙏
I’m going to stick to “version 2” which, to my mind, reads more naturally. I’ll definitely consider the iterative suggestions for the sake of performance if I ever decide to submit a patch upstream. But for now, what I’ve got does the job for me dealing w/ sequences w/ less than 50 elements.