CSS selectors in Go

I'm still enamored with parsers - parsers are just neat. My most recent endeavour in that direction has been a CSS selector library - because apparently I always wanted one and just didn't know it until now 1.

Just like with my websocket library for Clojure, this one is mostly the result of reading through the (less boring and convoluted than expected) specs (e.g. this one) - but I definitely also got some inspiration from Andy Balholm's cascadia and Eric Chiang's css libraries - especially once it came to testing and benchmarking, because I'm lazy and free test cases are free test cases.

What separates my library from the existing ones is that it is much more extensible - PseudoClasses, PseudoFunctions, Matchers & Combinators are looked up in maps which are exposed as public variables. It's also shorter (roughly 20% fewer LOC) and more modular (lex and parse are separate steps) and thus hopefully easier to grok.
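To give a rough idea of what "looked up in maps" buys you, here is a minimal sketch. It is not the library's actual code: Node, AttrMatcher, MatchAttr and the "~i=" operator are hypothetical names made up for illustration, and only the idea of an exported lookup map called Matchers comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, stripped-down stand-in for an HTML element.
type Node struct {
	Tag   string
	Attrs map[string]string
}

// AttrMatcher reports whether an attribute value satisfies a selector value.
// The type name and signature are assumptions for this sketch.
type AttrMatcher func(attrValue, selectorValue string) bool

// Matchers is an exported lookup table keyed by attribute operator.
// Because it is a plain package-level map, callers can add or replace entries.
var Matchers = map[string]AttrMatcher{
	"=":  func(a, v string) bool { return a == v },
	"^=": func(a, v string) bool { return v != "" && strings.HasPrefix(a, v) },
	"$=": func(a, v string) bool { return v != "" && strings.HasSuffix(a, v) },
	"*=": func(a, v string) bool { return v != "" && strings.Contains(a, v) },
}

// MatchAttr resolves the operator at match time instead of in a switch
// statement, so operators registered later are picked up automatically.
func MatchAttr(n *Node, name, op, value string) (bool, error) {
	m, ok := Matchers[op]
	if !ok {
		return false, fmt.Errorf("unknown attribute operator %q", op)
	}
	a, ok := n.Attrs[name]
	if !ok {
		return false, nil
	}
	return m(a, value), nil
}

func main() {
	// Register a made-up operator: case-insensitive equality.
	Matchers["~i="] = strings.EqualFold

	n := &Node{Tag: "a", Attrs: map[string]string{"rel": "NoFollow"}}
	fmt.Println(MatchAttr(n, "rel", "~i=", "nofollow"))
	fmt.Println(MatchAttr(n, "rel", "=", "nofollow"))
}
```

The trade-off is that a mutable package-level map is global state, so anything registered this way applies to every selector compiled afterwards.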

Benchmarking

Benchmarking showed that quite a few of my naive approaches were less than optimal - e.g. using strings.Fields to split the attribute value for the ~= attribute selector (commit).

I already had a benchmark test at that point - which just runs all the selectors in the <style></style> block of benchmark.html against the benchmark.html file and validates that the selection output is as expected, i.e. as defined in benchmark.json 2.

From there it's as simple as

go test -run=none -bench=Niklas -memprofile=profile # also try -cpuprofile=profile
go tool pprof --web css.test profile

and there we go - a nice graphical representation of where all our memory goes! The tooling for Go is just so nice!

  • before (commit) (before.svg)
  • after (commit) (after.svg)

As you can see, strings.Fields allocated quite a lot of memory in the ~= attribute selector - switching to the string indexing approach used by cascadia made for quite a performance improvement. That, plus a few further inspections of the memprofile and cpuprofile, in the end helped make my css library just as fast as cascadia :).
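For illustration, the difference between the two approaches looks roughly like this. It is a sketch in the spirit of the change, not code copied from either library, and the function names includesFields and includesIndex are made up here.

```go
package main

import (
	"fmt"
	"strings"
)

// cssWhitespace lists the characters CSS treats as whitespace.
// Note that strings.Fields splits on any Unicode whitespace, which is
// a slightly larger set.
const cssWhitespace = " \t\r\n\f"

// includesFields is the naive version of [attr~=word]: strings.Fields
// allocates a fresh slice for every element that gets checked.
func includesFields(s, word string) bool {
	if word == "" {
		return false
	}
	for _, f := range strings.Fields(s) {
		if f == word {
			return true
		}
	}
	return false
}

// includesIndex walks the attribute value in place. It only reslices s,
// so it does not allocate.
func includesIndex(s, word string) bool {
	if word == "" || strings.ContainsAny(word, cssWhitespace) {
		return false
	}
	for s != "" {
		i := strings.IndexAny(s, cssWhitespace)
		if i == -1 {
			// Last token: compare whatever is left.
			return s == word
		}
		if s[:i] == word {
			return true
		}
		// Skip the token and the whitespace character after it.
		s = s[i+1:]
	}
	return false
}

func main() {
	class := "nav  nav-item\tactive"
	for _, w := range []string{"active", "nav", "item", ""} {
		fmt.Printf("%q: fields=%v index=%v\n", w, includesFields(class, w), includesIndex(class, w))
	}
}
```

Both functions give the same answers for ordinary class lists; the second one just gets there without creating garbage on every call, which is what matters when a selector is evaluated against every element in a document.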

If you want to read more about profiling, check out this blog post on the official Go blog.

Footnotes


1

No, but seriously - I always end up wanting to scrape something and I wasn't happy with the semantics of goquery - and then I somehow got sidetracked into writing a CSS selector library because writing parsers is neat. I'm also still working on the library I want to replace goquery with - soup - but that's going more slowly. Exploring GTFS feeds for Berlin is more interesting right now and the sun is out so there's less time to code.

2

I plan on writing a blog post on that kind of fixture-based testing. For testing pure functions this approach has been pure bliss - more readable and easier to maintain than handwritten tests, as the fixtures are automatically generated using the same workflow that is actually run in the tests.

To make it short, the JSON just describes behaviour at one point in time, and any changes in behaviour are easily compared and possibly committed using git diff.
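Until that post exists, here is a minimal sketch of the idea, assuming an "update" switch that regenerates the fixture from the same code path that the check uses. The helper checkFixture, the stand-in function wordLengths and the file name are all hypothetical; the real tests in the library may well look different.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// wordLengths is a stand-in for the pure function under test.
func wordLengths(s string) map[string]int {
	out := map[string]int{}
	for _, w := range strings.Fields(s) {
		out[w] = len(w)
	}
	return out
}

// checkFixture serializes got as indented JSON and compares it with the
// file at path. With update set it (re)writes the file instead, so the
// fixture is produced by exactly the code that is later verified.
func checkFixture(path string, got interface{}, update bool) error {
	// encoding/json sorts map keys, so the output is deterministic.
	have, err := json.MarshalIndent(got, "", "  ")
	if err != nil {
		return err
	}
	have = append(have, '\n')
	if update {
		return os.WriteFile(path, have, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if !bytes.Equal(have, want) {
		return fmt.Errorf("output differs from %s:\nwant: %s\ngot:  %s", path, want, have)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "fixtures")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "wordlengths.json")

	// Record the current behaviour once.
	if err := checkFixture(path, wordLengths("css selectors in go"), true); err != nil {
		panic(err)
	}
	// Unchanged behaviour passes.
	fmt.Println("same input:", checkFixture(path, wordLengths("css selectors in go"), false))
	// Changed behaviour is reported as an error.
	fmt.Println("changed input fails:", checkFixture(path, wordLengths("css selectors"), false) != nil)
}
```

In a real test suite the update switch would typically be a test flag, and the fixture would live next to the tests so that git diff shows exactly what changed.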