Notes
Making Discogs Data 13% Smaller with Parquet
Recently, I have been working with the Discogs data dumps. Discogs uploads monthly dumps of their database in a gzipped XML format. They release dumps for artists, labels, masters, and releases. I was curious about converting them to the Parquet file format. Parquet is a binary columnar file format heavily used in data engineering. It supports nested structures and lets each column use a different compression algorithm. It is also natively supported by databases such as ClickHouse or DuckDB. I mostly wanted to compare the size of a Parquet file with that of a compressed XML file. Would Parquet files be smaller than gzipped XML? If so, by how much? And how fast would the conversion be?
0b5vr GLSL Techno Live Set - "0mix"
A 7-minute techno live set created entirely in GLSL shaders that fits in just 64KB. Yes, 64KB. This WebGL intro by 0b5vr was submitted to the Revision 2023 demoscene competition. Procedural visuals meet algorave meets extreme compression. My mind is blown.
Small-scale data engineering with Go and PostgreSQL: a few lessons learned
I just released dgtools, a command line utility to work with the Discogs data dumps. This little endeavor was supposed to be a quick side quest, but it transformed into a rabbit hole.
Discogs is the go-to service for record collectors. They might have one of the biggest databases for physical music releases. Every month, they release compressed XML dumps of a subset of their database under a CC0 license. Tools already exist to import them into a PostgreSQL database, but I wanted the flexibility of a custom-built solution. I started building something in a Ruby on Rails app but quickly switched to Go as I didn't want to pay the ActiveRecord performance cost.
OpenSimplex noise
OpenSimplex noise is a gradient noise function designed to avoid patent issues with simplex noise while fixing the directional artifacts in Perlin noise. It uses a different grid structure with stretched hypercubic honeycombs and larger kernel sizes, making it smoother but slower than simplex noise.
OpenSimplex noise - Wikipedia (en.wikipedia.org)
The Art of Rosa Menkman
Late to the party (as I often can be), I recently discovered Rosa Menkman’s work while at NXT Museum for the “Still Processing” exhibition.
Turns out, Rosa Menkman has quite the background in Glitch Art, having worked on theorizing it and having produced artworks acquired for the Stedelijk Museum collection. Although I am usually not a big fan of video essays, I ended up being very interested in two of her productions. The first one is about racial and sexist biases in analog and digital image processing. The second one is about the changing nature of rainbows due to atmospheric conditions (pollution) or changes in our analog wetware (eyes).