Clojure scripts for digesting Hong Kong District Council Election data.
Step 1: create utility functions to simplify parsing HTML tables
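;; Assumed dependencies: reaver (HTML parsing/extraction) and Incanter (datasets, Excel output).
(require '[reaver :refer [parse extract-from text attr]]
         '[incanter.core :refer [to-dataset $join]]
         '[incanter.excel :refer [save-xls]])

;; Fetch a URL and parse the response into a document.
(def parse-slurp (comp parse slurp))

;; Return the header cell texts of the first element matched by selector `tbl`.
(defn- read-hdrs [tbl data]
  (->> (extract-from data tbl [:x] "tr th" text)
       first vals first))

;; Read every table row at `url` as a map of header -> cell text.
(defn- read-data [url]
  (let [d    (parse-slurp url)
        hdrs (read-hdrs "table" d)]
    (->> (extract-from d "table tr" [:x] "td" text)
         (map :x)
         (remove nil?)   ; rows without <td> cells, e.g. the header row
         (map #(zipmap hdrs %)))))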
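The row-to-map step in `read-data` can be tried without any network access. The following is a minimal sketch in plain Clojure; `sample-hdrs`, `sample-rows` and `rows->maps` are hypothetical names introduced here for illustration, and the sample values are made up rather than taken from the election site.

```clojure
;; Hypothetical sample data standing in for the scraped header and row cells.
;; The leading nil mimics a row that produced no <td> cells.
(def sample-hdrs ["Code" "Name"])
(def sample-rows [nil ["A01" "Chan"] ["A02" "Lee"]])

(defn rows->maps
  "Drop rows with no cells, then pair each remaining row with the headers."
  [hdrs rows]
  (->> rows
       (remove nil?)
       (map #(zipmap hdrs %))))

(rows->maps sample-hdrs sample-rows)
;; => ({"Code" "A01", "Name" "Chan"} {"Code" "A02", "Name" "Lee"})
```

Keying each row by its header text is what later allows the datasets to be joined on named columns.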
Step 2: ungroup the result table, i.e. split the merged (rowspan) cells so that every row carries its own copy of the shared values
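;; `data` is the parsed page and `hdrs` the header names (see Step 1).
(->> (extract-from data "table.contents2 tr" [:x :y] "td[rowspan]" text "td" text)
     (remove #(every? nil? (vals %)))           ; skip rows with no <td> cells
     ;; x = text of the merged (rowspan) cells, y = text of all cells in the row.
     ;; Rows without merged cells reuse the values from the last row that had them.
     (reduce (fn [[merged res] {:keys [x y]}]
               (if (nil? x)
                 [merged (conj res (concat merged y))]
                 [x (conj res y)]))
             [nil []])
     last                                       ; keep the accumulated rows
     (map #(zipmap hdrs %))
     to-dataset)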
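To see what the `reduce` does, the sketch below runs the same accumulator over hand-written rows instead of scraped ones. `ungroup` and `sample-rows` are hypothetical names, and the row values are invented; `:x` plays the role of the merged-cell text and `:y` the text of all cells found in the row.

```clojure
(defn ungroup
  "Prepend the most recent merged-cell values to rows that lack them."
  [rows]
  (->> rows
       (reduce (fn [[merged res] {:keys [x y]}]
                 (if (nil? x)
                   ;; No merged cells in this row: reuse the remembered ones.
                   [merged (conj res (concat merged y))]
                   ;; Row starts a new group: remember its merged cells.
                   [x (conj res y)]))
               [nil []])
       last))

;; Hypothetical rows: the second row sits under the first row's merged cell.
(def sample-rows
  [{:x ["A01"] :y ["A01" "Chan"]}
   {:x nil     :y ["Lee"]}
   {:x ["A02"] :y ["A02" "Wong"]}])

(ungroup sample-rows)
```

With these inputs the second row comes back with "A01" in front of "Lee", so all three rows have the same number of columns and can be zipped with the headers.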
Step 3: use the utility functions created in Step 1 to parse the master nomination table
;; Bound to `noms` so that Step 5 can refer to it.
(def noms
  (->> "http://www.elections.gov.hk/dc2015/pdf/2015_DCE_Valid_Nominations_C.html"
       read-data
       to-dataset))
Step 4: collect all other nomination data
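;; Bound to `noms2` so that Step 5 can refer to it.
(def noms2
  (->> (extract-from (parse-slurp "http://www.elections.gov.hk/dc2015/chi/nominat2.html")
                     "table tr td"
                     [:x] "a" (attr :href))
       (map :x)
       (remove nil?)
       (apply concat)
       ;; Keep only the relative links to the per-district nomination pages.
       (filter #(and (string? %) (re-matches #"\.\./pdf/nomination.*html" %)))
       ;; Drop the leading ".." and prepend the site root to get an absolute URL.
       (map #(->> % (drop 2) (apply str) (str "http://www.elections.gov.hk/dc2015")))
       (mapcat read-data)
       to-dataset))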
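The link filtering and URL rewriting can be checked in isolation. This is a minimal sketch; `base-url` and `nomination-url` are hypothetical names, the sample hrefs are invented, and `subs` is used in place of `(drop 2)` plus `(apply str)` as an equivalent way to strip the leading "..".

```clojure
(def base-url "http://www.elections.gov.hk/dc2015")

(defn nomination-url
  "Return the absolute URL for a relative nomination link, or nil otherwise."
  [href]
  (when (and (string? href)
             (re-matches #"\.\./pdf/nomination.*html" href))
    ;; (subs href 2) drops the leading "..", leaving "/pdf/...".
    (str base-url (subs href 2))))

;; Hypothetical hrefs: only the first one matches the pattern.
(keep nomination-url ["../pdf/nomination_a.html" "index.html" nil])
;; => ("http://www.elections.gov.hk/dc2015/pdf/nomination_a.html")
```

Using `keep` folds the filter and the map into one pass, since non-matching links simply yield nil.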
Step 5: join the tables created in Step 3 (noms) and Step 4 (noms2)
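;; noms2 keys: constituency code, nominee name (surname first); noms keys: constituency number, name.
($join [["選區代號" "獲提名人士姓名 (姓氏先行)"] ["選區號碼" "姓名"]] noms2 noms)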
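Joining on columns that are named differently in each table can be illustrated with the standard library alone. The sketch below swaps in `clojure.set/join`, which performs an inner join over sets of maps, so it is not the same operation as Incanter's `$join`; it only shows how a key mapping pairs the columns. `noms2-sample`, `noms-sample` and `joined` are hypothetical names, all row values are invented, and the "職業" (occupation) column is an assumption added for illustration.

```clojure
(require '[clojure.set :as set])

;; Hypothetical one-row and two-row tables using the column names from Step 5.
(def noms2-sample
  #{{"選區代號" "A01" "獲提名人士姓名 (姓氏先行)" "Chan" "職業" "Teacher"}})

(def noms-sample
  #{{"選區號碼" "A01" "姓名" "Chan" "候選人編號" "1"}
    {"選區號碼" "A02" "姓名" "Lee"  "候選人編號" "2"}})

;; The map pairs each key column of the first table with its
;; counterpart in the second table.
(def joined
  (set/join noms2-sample noms-sample
            {"選區代號" "選區號碼"
             "獲提名人士姓名 (姓氏先行)" "姓名"}))

(count joined)
;; => 1
```

Only the "A01"/"Chan" rows agree on both key columns, so the result holds a single merged row carrying the columns of both tables.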
Final step: join the nomination data with the election results and output to Excel
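;; Left keys: constituency number, candidate number. `mm` is presumably the joined
;; nomination data from Step 5, and `results` the election results dataset.
(-> ($join [["選區號碼" "候選人編號"] ["Constituency Code" "Candidate Number"]]
           mm
           results)
    (save-xls "output.xls"))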
Copyright © 2015 RMCV
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.