Notes
Chevy Ray on Creating Hundreds of Fonts Using Rust
Chevy Ray goes into a lot of detail on building her own tool to generate 175 (!!) pixel fonts. The post walks through the technical implementation, including converting pixel clusters into TrueType contours, calculating kerning automatically, and deploying everything to itch.io with command-line scripts. Very cool read.
Chevy Ray | How I Created 175 Fonts Using Rust (chevyray.dev)
Making Discogs Data 13% Smaller with Parquet
Recently, I have been working with the Discogs data dumps. Discogs uploads monthly dumps of its database in a gzipped XML format, split into four files: artists, labels, masters, and releases. I was curious about converting them to the Parquet file format. Parquet is a binary columnar format heavily used in data engineering: it allows a different compression algorithm per column, supports nested structures, and is natively supported by databases such as ClickHouse and DuckDB. I was mostly curious about the size of a Parquet file versus a compressed XML file. Would the Parquet files be smaller than the gzipped XML? If so, by how much? And how fast would the conversion be?
Implementation
Querying DuckDB's parquet_metadata function shows the compressed and uncompressed size (in KiB) of each column:

```sql
SELECT
    path_in_schema,
    type,
    encodings,
    compression,
    (total_compressed_size / 1024) AS compressed_size,
    (total_uncompressed_size / 1024) AS uncompressed_size
FROM parquet_metadata('file.parquet');
```
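The conversion itself boils down to streaming records out of the XML dump on one side and appending them to a Parquet writer on the other. As a rough illustration (a minimal sketch, not the actual tool), here is what the writing side could look like in Go with the github.com/parquet-go/parquet-go library and a simplified Label record; the per-column compression mentioned above is selected via struct tags:

```go
package main

import (
	"log"
	"os"

	"github.com/parquet-go/parquet-go"
)

// Label is a simplified stand-in for the real Discogs label schema.
// The codec name after the column name picks per-column compression.
type Label struct {
	ID      int64  `parquet:"id"`
	Name    string `parquet:"name,zstd"`
	Profile string `parquet:"profile,zstd"`
}

func main() {
	out, err := os.Create("labels.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	w := parquet.NewGenericWriter[Label](out)

	// In a real converter these rows would come from a streaming
	// XML decoder rather than a literal slice.
	rows := []Label{
		{ID: 1, Name: "Planet E", Profile: "Detroit techno label"},
	}
	if _, err := w.Write(rows); err != nil {
		log.Fatal(err)
	}
	// Close flushes the row groups and writes the Parquet footer.
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```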
Results
Conversion speed
| Type     | Records    | Time   | Records / second |
|----------|------------|--------|------------------|
| Labels   | 2,274,143  | 12.48s | 182,222          |
| Artists  | 9,174,834  | 63.44s | 144,713          |
| Masters  | 2,459,324  | 69.77s | 35,249           |
| Releases | 18,412,655 | 34m14s | 8,964            |
File size
| Type     | .xml.gz | Parquet | Difference |
|----------|---------|---------|------------|
| Labels   | 83M     | 72M     | -13.2%     |
| Artists  | 441M    | 397M    | -9.9%      |
| Masters  | 577M    | 537M    | -6.7%      |
| Releases | 10.74G  | 10.14G  | -5.5%      |
0b5vr GLSL Techno Live Set - "0mix"
A 7-minute techno live set created entirely in GLSL shaders that fits in just 64 KB. Yes, 64 kilobytes. This WebGL intro by 0b5vr was submitted to the Revision 2023 demoscene competition. Procedural visuals meets algorave meets extreme compression. My mind is blown.
Small-scale data engineering with Go and PostgreSQL: a few lessons learned
I just released dgtools, a command-line utility for working with the Discogs data dumps. This little endeavor was supposed to be a quick side quest, but it turned into a rabbit hole.
Discogs is the go-to service for record collectors, and it might have one of the biggest databases of physical music releases. Every month, they release compressed XML dumps of a subset of their database under a CC0 license. Tools already exist to import them into a PostgreSQL database, but I wanted the flexibility of a custom-built solution. I started building something in a Ruby on Rails app but quickly switched to Go, as I didn't want to pay the ActiveRecord performance cost.
I have quite a bit of experience with Go. I started learning it in 2012, and some code of mine still runs on thousands of production servers worldwide. I haven't really touched the language for a few years though. So it was interesting to write a small project and learn a few things.
Go's XML parser is great to work with. With a mix of streaming and parsing into tagged structs, it is both convenient and performant. I wasn't expecting it to be SO easy to reliably parse XML.
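The pattern is to walk the document with xml.Decoder and let DecodeElement unmarshal each record element into a tagged struct, so only one record is in memory at a time. A minimal sketch (the element and field names are illustrative, not the exact Discogs schema):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strings"
)

// Label maps a <label> element onto a struct via xml tags.
type Label struct {
	ID   int64  `xml:"id"`
	Name string `xml:"name"`
}

func main() {
	doc := `<labels>
	  <label><id>1</id><name>Planet E</name></label>
	  <label><id>2</id><name>Warp</name></label>
	</labels>`

	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF once the stream is exhausted
		}
		// Only materialize one <label> at a time, so memory use stays
		// flat no matter how large the dump is.
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "label" {
			var l Label
			if err := dec.DecodeElement(&l, &se); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%d: %s\n", l.ID, l.Name)
		}
	}
}
```

For the real dumps, the decoder would read from a gzip.Reader wrapping the file instead of a string.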
Go's ecosystem is rich. I thought pq was the go-to choice for working with PostgreSQL from Go, but its development has stalled. I hit a bug where pq is unable to bulk load data into jsonb columns. pgx is now the right choice for that kind of workload. Similarly, there are plenty of good libraries for structuring a CLI application. And don't get me started on the cool TUI libs!
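For the record, bulk loading with pgx goes through its CopyFrom method, which speaks the COPY wire protocol. A hedged sketch assuming pgx/v5 and a hypothetical labels table with a jsonb data column:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// The third value targets the jsonb column: pgx marshals Go values
	// to JSON for jsonb, the kind of load where pq failed for me.
	rows := [][]any{
		{int64(1), "Planet E", map[string]any{"country": "US"}},
	}

	// CopyFrom scales to millions of rows without building giant INSERTs.
	n, err := conn.CopyFrom(ctx,
		pgx.Identifier{"labels"},
		[]string{"id", "name", "data"},
		pgx.CopyFromRows(rows),
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("copied %d rows", n)
}
```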
I forgot how amazing Go's embed feature is. Embedding migration files into the binary is trivially easy and incredibly useful. The ability to distribute a single binary is my favorite feature of Go.
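For reference, the whole pattern is just a directive and an embed.FS (a sketch assuming the migration files live under a migrations/ directory next to the source):

```go
package main

import (
	"embed"
	"fmt"
	"log"
)

// The compiler copies every matching file into the binary at build time.
//
//go:embed migrations/*.sql
var migrations embed.FS

func main() {
	entries, err := migrations.ReadDir("migrations")
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		sql, err := migrations.ReadFile("migrations/" + e.Name())
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("-- %s (%d bytes)\n", e.Name(), len(sql))
		// A real tool would run each file against the database here.
	}
}
```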
While Go correctly produces and consumes multi-line CSV, I could not get PostgreSQL to import it. Unless I missed something, embedded newlines in a CSV file mean that PostgreSQL cannot process it correctly with a COPY FROM 'filename' query.
Go is a framework as much as it is a language. The documentation, standard library, and bundled tools add up to an amazing out-of-the-box developer experience for small-scale projects like this one.
OpenSimplex noise
OpenSimplex noise - Wikipedia (en.wikipedia.org)
- OpenSimplex noise
- procedural generation
- computer graphics
- gradient noise
- simplex noise
- Perlin noise