Some Clojure scripts to digest Hong Kong District Council Election data.
Step 1: Create some utility functions to simplify parsing HTML tables.
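;; parse, extract-from, text and attr look like the Reaver scraping API;
;; this require is an assumption, adjust it if another library is in use.
(require '[reaver :refer [parse extract-from text attr]])

;; Fetch a URL (or file) and parse it into a document.
(def parse-slurp (comp parse slurp))

;; Header (th) texts of the first element matching `tbl`.
(defn- read-hdrs [tbl data]
  (->> (extract-from data tbl [:x] "tr th" text)
       first vals first))

;; One header->cell map per table row; rows without td cells are dropped.
(defn- read-data [url]
  (let [d    (parse-slurp url)
        hdrs (read-hdrs "table" d)]
    (->> (extract-from d "table tr" [:x] "td" text)
         (map :x)
         (remove nil?)
         (map #(zipmap hdrs %)))))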
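The last step of read-data is the part that turns raw cell texts into records: each row is zipped with the header texts. A minimal sketch of just that step in plain Clojure, using made-up headers and rows in place of parsed HTML (the names `hdrs`, `rows` and `rows->maps` are illustrative only):

```clojure
;; Hypothetical stand-ins for what the scraping step yields:
;; one seq of header texts, and one seq of cell texts per table row.
(def hdrs ["Code" "Name"])

(def rows [["A01" "Candidate One"]
           nil                        ; a row with no td cells (e.g. the header row)
           ["A02" "Candidate Two"]])

(defn rows->maps
  "Drops empty rows and turns each remaining row into a header->cell map."
  [hdrs rows]
  (->> rows
       (remove nil?)
       (map #(zipmap hdrs %))))

(rows->maps hdrs rows)
;; => ({"Code" "A01", "Name" "Candidate One"} {"Code" "A02", "Name" "Candidate Two"})
```

Because `zipmap` stops at the shorter of its two arguments, a row with fewer cells than headers simply ends up with fewer keys.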
Step 2: Ungroup the result table (i.e. split the merged cells).
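(->> (extract-from data "table.contents2 tr" [:x :y] "td[rowspan]" text "td" text)
     (remove #(every? nil? (vals %)))
     ;; Carry the last seen merged (rowspan) cells forward onto rows that lack them.
     (reduce (fn [[last res] {:keys [x y]}]
               (if (nil? x)
                 [last (conj res (concat last y))]
                 [x (conj res y)]))
             [nil []])
     second ; the accumulator is [last res]; keep only the rows
     (map #(zipmap hdrs %)))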
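To see what the ungrouping does without fetching any HTML, here is a small self-contained sketch on hand-written rows. The sample data and the names `rows`, `ungroup` and `prev` are hypothetical; `prev` is used instead of `last` only to avoid shadowing `clojure.core/last`. Note the `second` at the end: the reduce carries a `[prev res]` pair, so the rows have to be unwrapped before any further processing.

```clojure
;; Hypothetical rows, shaped like the result of the extraction:
;; :x holds the texts of the merged (rowspan) cells, or nil on continuation rows;
;; :y holds the texts of every cell actually present in the row.
(def rows
  [{:x ["A01"] :y ["A01" "1" "Candidate One"]}
   {:x nil     :y ["2" "Candidate Two"]}
   {:x ["A02"] :y ["A02" "1" "Candidate Three"]}])

(defn ungroup
  "Copies the most recent merged-cell values onto rows that lack them."
  [rows]
  (->> rows
       (reduce (fn [[prev res] {:keys [x y]}]
                 (if (nil? x)
                   [prev (conj res (concat prev y))] ; continuation row: prepend carried cells
                   [x (conj res y)]))                ; new group: remember its merged cells
               [nil []])
       second))

(ungroup rows)
;; => [["A01" "1" "Candidate One"] ("A01" "2" "Candidate Two") ["A02" "1" "Candidate Three"]]
```

After this step every row has the same number of cells, so it can be zipped with the headers.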
Step 3: Use the utility functions created in Step 1 to parse the master nomination table.
(->> ""
Step 4: Collect all other nomination data.
(->> (extract-from (parse-slurp "")
                   "table tr td"
                   [:x] "a" (attr :href))
     (map :x)
     (remove nil?)
     (apply concat)
     (filter #(and (string? %) (re-matches #"\.\./pdf/nomination.*html" %)))
     (map #(->> % (drop 2) (apply str) (str "")))
     (mapcat read-data)
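The link handling in this step can be tried in isolation. The sketch below keeps only hrefs that look like nomination pages and turns them into absolute URLs by dropping the leading `..` and prepending a base URL. The base URL and sample hrefs are placeholders (the real URL is not given here), and `subs` is used as a shorter equivalent of the `drop`/`apply str` combination:

```clojure
;; Placeholder base URL; substitute the real site root.
(def base-url "http://example.org")

;; Hypothetical hrefs as they might come out of the link extraction.
(def hrefs ["../pdf/nomination_a.html" "index.html" nil "../pdf/other.pdf"])

(defn nomination-urls
  "Keeps relative links to nomination pages and makes them absolute."
  [base hrefs]
  (->> hrefs
       (filter #(and (string? %) (re-matches #"\.\./pdf/nomination.*html" %)))
       (map #(str base (subs % 2))))) ; strip the leading ".."

(nomination-urls base-url hrefs)
;; => ("http://example.org/pdf/nomination_a.html")
```

Each resulting URL can then be passed to read-data, and `mapcat` flattens the per-page rows into one sequence.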
Step 5: Join the tables created in Step 3 (noms) and Step 4 (noms2).
($join [["選區代號" "獲提名人士姓名 (姓氏先行)"] ["選區號碼" "姓名"]] noms2 noms)
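For readers unfamiliar with joining on pairs of differently named columns, the idea can be sketched in plain Clojure. This is an inner join written for illustration, not the `$join` implementation; in particular, `$join` may treat unmatched rows differently. The function `join-on`, the sample rows and the "Extra"/"Other" columns are hypothetical; only the key column names are taken from the call above.

```clojure
(defn join-on
  "Inner join of two seqs of maps. A row of `left` matches a row of `right`
  when its `left-keys` values equal that row's `right-keys` values."
  [[left-keys right-keys] left right]
  (let [key-of (fn [ks] (fn [row] (mapv #(get row %) ks)))
        idx    (group-by (key-of right-keys) right)] ; index the right table by key values
    (for [l left
          r (get idx ((key-of left-keys) l))]
      (merge r l))))

;; Hypothetical sample rows.
(def noms2 [{"選區代號" "A01" "獲提名人士姓名 (姓氏先行)" "Candidate One" "Extra" "x"}])
(def noms  [{"選區號碼" "A01" "姓名" "Candidate One" "Other" "y"}
            {"選區號碼" "A02" "姓名" "Candidate Two" "Other" "z"}])

(join-on [["選區代號" "獲提名人士姓名 (姓氏先行)"] ["選區號碼" "姓名"]] noms2 noms)
;; one merged row: only ("A01", "Candidate One") appears in both tables
```

Matching on both the constituency code and the name guards against two candidates in the same constituency being merged into one row.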
Step 6: Join the nomination data with the election results and write the output to an Excel file.
(-> ($join [["選區號碼" "候選人編號"] ["Constituency Code" "Candidate Number"]]
(save-xls "output.xls"))
