George V. Reilly

Compressing Tar Files in Parallel

TL;DR: use tar -I pigz or tar -I lbzip2 to compress large tar files much more quickly.

I investigated various ways of compressing a 7GiB tar file.

The built-in --gzip and --bzip2 compression methods in GNU tar are single-threaded. If you invoke an external compressor with --use-compress-program, you can cut compression time dramatically, at the cost of slightly worse compression ratios.

You can use pigz as a parallel replacement for gzip and lbzip2 as a parallel version of bzip2. Both of them will make heavy use of all the cores in your system, greatly reducing the real (wall-clock) time relative to the user (CPU) time.
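As a rough sketch of how the same invocation might be scripted from Python: the helper names `tar_command` and `compress` below are hypothetical, and the code assumes GNU tar plus pigz or lbzip2 are installed and on the PATH. It only wraps the `tar -I` form from the TL;DR.

```python
import shutil
import subprocess


def tar_command(archive, paths, compressor="pigz"):
    """Build a GNU tar command line that pipes through an external compressor.

    -I is GNU tar's short form of --use-compress-program.
    """
    return ["tar", "-I", compressor, "-cf", archive, *paths]


def compress(archive, paths, compressor="pigz"):
    """Run tar with the chosen compressor; raise if the compressor is missing."""
    if shutil.which(compressor) is None:
        raise FileNotFoundError(f"{compressor} not found on PATH")
    subprocess.run(tar_command(archive, paths, compressor), check=True)


# Hypothetical usage (file names are placeholders):
#   compress("backup.tar.gz", ["data"])                # parallel gzip via pigz
#   compress("backup.tar.bz2", ["data"], "lbzip2")     # parallel bzip2 via lbzip2
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.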

Single-threaded compression timing shows that gzip is a lot faster than bzip2:

$ time tar --bzip2 -cf 

Implementing the Tree command in Rust, part 2: Printing Trees

In Part 1, we saw how to walk directory trees, recursively using fs::read_dir to construct an in-memory tree of FileNodes. In Part 2, we’ll implement the rest of the core of the tree command: printing the directory tree with Box Drawing characters.

Let’s take a look at some output from tree:

.
├── alloc.rs
├── ascii.rs
├── os
│   ├── wasi
│   │   ├── ffi.rs
│   │   ├── mod.rs          ➊
│   │   └── net             ➋
│   │       └── mod.rs
│   └── windows
│       ├── ffi.rs    

Implementing the Tree command in Rust, part 1: Walking Directories

I’ve been learning Rust lately. I started by reading several books, including Rust in Action, Code Like a Pro in Rust, and most of Programming Rust. Now, I’m starting to actually write code.

I read the Command-Line Rust book last month, which challenged readers to write their own implementations of the tree command.

I decided to accept the challenge.

At its simplest, tree prints a directory tree, using some of the Unicode Box Drawing characters to show the hierarchical relationship, as in the image at right.

I’ve split the code into two phases, which will be covered in two blog posts.

  1. Walking the directory tree on disk to build an in-memory tree.
  2. Pretty-printing the in-memory tree.
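The two phases can be sketched roughly as follows. This is a Python illustration, not the Rust code from the posts: the `FileNode` class here is a simplified stand-in, `walk` and `render` are hypothetical names, and details such as hidden files, symlink handling, and error handling are assumptions or omitted.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Simplified in-memory node: a name plus child nodes for directories."""
    name: str
    is_dir: bool = False
    children: list = field(default_factory=list)


def walk(path):
    """Phase 1: recursively walk the directory on disk and build the tree."""
    # Assumption: symlinked directories are not followed, to avoid cycles.
    is_dir = os.path.isdir(path) and not os.path.islink(path)
    node = FileNode(name=os.path.basename(path) or path, is_dir=is_dir)
    if is_dir:
        for entry in sorted(os.listdir(path)):
            node.children.append(walk(os.path.join(path, entry)))
    return node


def render(node, prefix=""):
    """Phase 2: pretty-print the children of node with Box Drawing characters."""
    lines = []
    for index, child in enumerate(node.children):
        last = index == len(node.children) - 1
        lines.append(prefix + ("└── " if last else "├── ") + child.name)
        # Below a last child the vertical bar stops; otherwise it continues.
        lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


# Hypothetical usage: print(".", *render(walk(".")), sep="\n")
```

Keeping the walk separate from the rendering means the printer knows whether an entry is the last child before it draws the connector, which is what decides between `├──` and `└──`.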

While it’s certainly possible to print a …

fsymbols for Unicode weirdness

My display name on Twitter currently looks like @ɢᴇᴏʀɢᴇᴠʀᴇɪʟʟʏ@ᴛᴇᴄʜ.ʟɢʙᴛ, an attempt to route around Twitter’s apparent censorship of Mastodon information.

I used the FSymbols Generators to produce several variants.

@𝕘𝕖𝕠𝕣𝕘𝕖𝕧𝕣𝕖𝕚𝕝𝕝𝕪@𝕥𝕖𝕔𝕙.𝕝𝕘𝕓𝕥
ʇqƃʅ.ɥɔǝʇ@ʎʅʅᴉǝɹʌǝƃɹoǝƃ@
@𝗀𝖾𝗈𝗋𝗀𝖾𝗏𝗋𝖾𝗂𝗅𝗅𝗒@𝗍𝖾𝖼𝗁.𝗅𝗀𝖻𝗍
@𝘨𝘦𝘰𝘳𝘨𝘦𝘷𝘳𝘦𝘪𝘭𝘭𝘺@𝘵𝘦𝘤𝘩.𝘭𝘨𝘣𝘵
@𝑔𝑒𝑜𝑟𝑔𝑒𝑣𝑟𝑒𝑖𝑙𝑙𝑦@𝑡𝑒𝑐ℎ.𝑙𝑔𝑏𝑡
@𝙜𝙚𝙤𝙧𝙜𝙚𝙫𝙧𝙚𝙞𝙡𝙡𝙮@𝙩𝙚𝙘𝙝.𝙡𝙜𝙗𝙩
@𝚐𝚎𝚘𝚛𝚐𝚎𝚟𝚛𝚎𝚒𝚕𝚕𝚢@𝚝𝚎𝚌𝚑.𝚕𝚐𝚋𝚝
@𝔤𝔢𝔬𝔯𝔤𝔢𝔳𝔯𝔢𝔦𝔩𝔩𝔶@𝔱𝔢𝔠𝔥.𝔩𝔤𝔟𝔱

Many of these variants come from the Unicode block “Mathematical Alphanumeric Symbols”.

There are a lot more things you can do with Unicode than just upside-down text.

Backwards Ranges in Python

In Python, if you want to specify a sequence of numbers from a up to (but excluding) b, you can write range(a, b). This generates the sequence a, a+1, a+2, ..., b-1. You start at a and keep going until the next number would be b.

In Python 3, range is lazy and the values in the sequence do not materialize until you consume the range.

>>> range(3,12)
range(3, 12)
>>> list(range(3,12))
[3, 4, 5, 6, 7, 8, 9, 10, 11]

Trey Hunner makes the point that range is a lazy iterable rather than an iterator.
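A short sketch of what that distinction means in practice: a range can be iterated repeatedly and supports len() and indexing, while the iterator obtained from it is consumed once. The variable names are only for illustration.

```python
r = range(3, 12)

# An iterable can be looped over as many times as you like.
first_pass = list(r)
second_pass = list(r)

# An iterator obtained from the range is single-use.
it = iter(r)
head = next(it)        # consumes the first value
rest = list(it)        # consumes everything that remains
exhausted = list(it)   # nothing left

# The range itself is not an iterator: next() rejects it.
try:
    next(r)
except TypeError:
    range_is_iterator = False
else:
    range_is_iterator = True

# Unlike an iterator, a range knows its length and supports indexing.
size = len(r)
final = r[-1]
```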

You can also step by an increment other than one: range(a, b, s). This generates a, a+s, a+2*s, ..., b-s (assuming that (b - …

Ulysses at 100

On 2nd February 1882, in the Dublin suburb of Rathgar, a son was given unto John and May Joyce. James Joyce celebrated his 40th birthday in Paris on 2nd February 1922 by receiving the first printed copy of his novel Ulysses. Parts of it had already been published in literary magazines and the book was eagerly received by the cognoscenti. It took more than a decade for Ulysses to be published in Britain and the United States. Censors had considered the book obscene, but the courts established that it had legitimate literary merit.

For decades, Ulysses was poorly received in Ireland. The book was considered blasphemous and obscene by many. Worse, Joyce had …

Diffing a fragment of a file

A while back, I had extracted some code out of a large file into a separate file and made some modifications. I wanted to check that the differences were minimal. Let’s say that the extracted code had been between lines 123 and 456 of large_old_file.

diff -u <(sed -n '123,456p;457q' large_old_file) new_file

What’s happening here?
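In brief: sed -n suppresses sed's default printing, 123,456p prints only lines 123 through 456, and 457q makes sed quit at line 457 instead of reading the rest of the file. The <(...) syntax is shell process substitution, which hands that output to diff as if it were a file. A rough Python analogue, as a sketch only (the function name `diff_fragment` is hypothetical and the output is not guaranteed to match diff -u byte for byte), could look like this:

```python
import difflib
from itertools import islice


def diff_fragment(old_path, new_path, first, last):
    """Unified diff of lines first..last (1-based, inclusive) of old_path against new_path."""
    with open(old_path) as old, open(new_path) as new:
        # islice stops reading after line `last`, much like sed's 457q.
        fragment = list(islice(old, first - 1, last))
        new_lines = new.readlines()
    return list(
        difflib.unified_diff(
            fragment,
            new_lines,
            fromfile=f"{old_path} (lines {first}-{last})",
            tofile=new_path,
        )
    )


# Hypothetical usage:
#   print("".join(diff_fragment("large_old_file", "new_file", 123, 456)))
```

An empty result means the extracted file is identical to the original line range.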

A similar example: Diff a Transformed …

40 Years of Programming

40 years ago this month, I sat down at a computer and wrote a program. (Or "programme", as I spelled it then.) It was the first time I had ever used a computer. Very few people had used computers in 1982, in Ireland or elsewhere.

What was the program? No idea. Just a few lines of AppleSoft Basic. But it was enough to get me hooked and change my life.

I still get a hit when a little bit of code unlocks in my brain. It’s quite addictive. There’s always more to learn and to see.

I wrote more about this in 2012: 30 Years of Programming.

On Circumnavigating the Aubreyiad Again

At the beginning of 2021, prompted by Russell Crowe’s defense of Master and Commander, I began yet another re-read of the twenty Aubrey-Maturin novels. Or, as the fandom would have it, another circumnavigation. It’s probably my fifth or sixth circumnavigation, since I bought the complete boxed set as a Christmas present to myself in the early aughts.

I completed the twentieth book, Blue at the Mizzen, yesterday, and also the few pages of the final, unfinished novel, 21. (I also read about 120 other books in 2021, down from a stupendous 200 books in 2020, but that’s neither here nor there.)

I think I'm due for another re-read of Patrick O'Brian's Aubrey/Maturin novels (all …

Review: Crafting Interpreters

Author: Robert Nystrom
Rating: ★ ★ ★ ★ ★
Publisher: Genever Benning
Copyright: 2021
Pages: 640
Keywords: programming, interpreters
Reading period: 10–28 December, 2021

I’ve read hundreds of technical books over the last 40 years. Crafting Interpreters is an instant classic, and far more readable and fun than many of the classics.

Nystrom covers a lot of ground in this book, building two very different interpreters for Lox, a small dynamic language of his own design. He takes us through every line of jlox, a Java-based tree-walk interpreter, and of clox, a bytecode virtual machine written in C.

For the first implementation, jlox, he covers such topics as scanning, parsing expressions with recursive descent, evaluating expressions, control flow, functions …
