Advent of Code 2024 part 3

Dec 16, 2024
Part 1 in Uiua was fairly simple, with the grid dimensions hardcoded for brevity.
For Part 2, while it was fairly easy to make an image, I struggled a bit with how to iterate through the steps to manually find the tree. So I jumped over to Clojure, where this was straightforward.
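The Clojure version revolves around a few small helpers (parse-inp, step, print-map) that the later snippets rely on. Roughly — this is a sketch of their shape rather than the exact code, and it assumes aoc/get-input returns the raw puzzle input as a string — they look like:
;; Each robot is a map {:p [x y] :v [dx dy]}, parsed from lines like "p=0,4 v=3,-3".
(defn parse-inp [inp]
  (for [[_ px py vx vy] (re-seq #"p=(-?\d+),(-?\d+) v=(-?\d+),(-?\d+)" inp)]
    {:p (mapv parse-long [px py])
     :v (mapv parse-long [vx vy])}))
;; Advance one robot n steps on a w x h grid with wraparound.
(defn step [{:keys [p v] :as bot} w h n]
  (assoc bot :p [(mod (+ (first p) (* n (first v))) w)
                 (mod (+ (second p) (* n (second v))) h)]))
;; Print the grid as ASCII art, one character per cell.
(defn print-map [w h bots]
  (let [occupied (set (map :p bots))]
    (doseq [y (range h)]
      (println (apply str (for [x (range w)]
                            (if (occupied [x y]) \# \.)))))))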
Llama 3.2-vision with ASCII art
Having solved the problem, I decided to play around with ollama and the llama3.2-vision model to see whether it could find the tree in the ASCII-art output. I first wrote a function to send a prompt to a local ollama server:
(defn ollama
  "Send prompt q to the local ollama server. Extra request fields
  (e.g. {:images [...]}) can be passed in opts and are merged into the body."
  [q & [opts]]
  (try
    (-> (http/post "http://localhost:11434/api/generate"
                   {:body (json/encode (merge {:model "llama3.2-vision"
                                               :prompt q
                                               :stream false}
                                              opts))})
        :body
        (json/parse-string true)
        :response)
    (catch Exception e
      (print "exception querying ollama: " (.getMessage e))
      (pprint (:body (ex-data e))))))
(comment
  (ollama "Is the sky blue? Only say yes or no.") ; "Yes."
  )
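For reference, the requires these snippets lean on look roughly like the following — the specific libraries here (babashka.http-client and cheshire) are stand-ins for whichever HTTP client and JSON library you have handy, and aoc/get-input is just a personal helper for fetching puzzle input:
(ns aoc2024.day14
  (:require [babashka.http-client :as http]   ; any http client with a post fn works
            [cheshire.core :as json]          ; json/encode and json/parse-string
            [clojure.pprint :refer [pprint]]
            [clojure.string :as st]
            [clojure.java.io :refer [file]]))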
And a function to step through and prompt, basically the same as the manual version:
(defn find-tree [& args]
  (loop [bots (parse-inp (aoc/get-input 2024 14))
         w 101
         h 103
         i 0]
    (let [m (with-out-str (print-map w h bots))
          p (str "Does this look like a christmas tree? Only say yes or no.\n" m)
          r (ollama p)]
      (println p)
      (println i)
      (println r)
      (when (st/index-of (st/lower-case r) "no")
        (recur (mapv #(step % w h 1) bots) w h (inc i))))))
It did terribly, often finding false positives and (when "Only say yes or no." was omitted from the prompt) giving poor reasoning.
Llama 3.2-vision with image
However, I realized that llama3.2-vision was trained on images, not ASCII art, so I really should be rendering to an image file and including that in the prompt. This took me down a deep rabbit hole of Java image classes: building an image with java.awt.image.BufferedImage, writing it out with javax.imageio.ImageIO, and of course Base64-encoding it with java.util.Base64. A great part of Clojure being hosted on the JVM is access to this huge library, but not being steeped in Java tradition, I find it all a bit cryptic compared to doing this sort of thing in Go.
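The interop boils down to a handful of imports, something like:
(import '(java.awt.image BufferedImage)
        '(javax.imageio ImageIO)
        '(java.util Base64)
        '(java.io ByteArrayOutputStream))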
(defn render-map
  "Render map to PNG and write to OutputStream os, and optionally also to file."
  [w h bots os scale & [filename]]
  (let [img (BufferedImage. (* w scale) (* h scale) BufferedImage/TYPE_INT_RGB)]
    (doseq [b bots]
      (.setRGB img
               (* scale (first (:p b)))
               (* scale (second (:p b)))
               scale scale
               (into-array Integer/TYPE (repeat scale 0x00ff88))
               0 0))
    (when filename (ImageIO/write img "png" (file filename)))
    (ImageIO/write img "png" os)))
(defn render-map-base64
  "Render map to PNG and return its base64 encoding. Optionally also write to file."
  [w h bots scale & [filename]]
  (let [os (ByteArrayOutputStream.)
        enc (Base64/getEncoder)]
    (render-map w h bots os scale filename)
    (.encodeToString enc (.toByteArray os))))
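A quick REPL check that the plumbing works (the filename here is arbitrary, just somewhere to dump a PNG for eyeballing):
(comment
  (let [bots (parse-inp (aoc/get-input 2024 14))]
    ;; returns the base64 string and also writes day14.png to disk
    (render-map-base64 101 103 bots 4 "day14.png"))
  )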
(defn find-tree-image [& args]
  (loop [bots (mapv #(step % 101 103 1) (parse-inp (aoc/get-input 2024 14)))
         w 101
         h 103
         i 1]
    (let [m (render-map-base64 w h bots 1)
          p (str "Does this image contain a christmas tree? Respond yes or no.")
          r (ollama p {:images [m]})]
      (println p)
      (println i)
      (println r)
      (when (st/index-of r "No")
        ;; advance a full vertical period (h = 103 steps) per iteration,
        ;; which keeps the number of slow LLM queries down
        (recur (mapv #(step % w h 103) bots) w h (+ i 103))))))
This worked perfectly! The LLM had no problem finding the tree and gave no false positives. It wasn't fast, taking about a minute per query on an M3 Pro with 18 GB of RAM, but it was a fun exercise.