TL;DR: use tar -I pigz or tar -I lbzip2
to compress large tar files much more quickly.
I investigated various ways of compressing a 7GiB tar file.
The built-in --gzip and --bzip2 compression methods in GNU tar
are single-threaded.
If you invoke an external compressor with --use-compress-program,
you can get some huge reductions in compression time,
with slightly worse compression ratios.
You can use pigz as a parallel replacement for gzip
and lbzip2 as a parallel version of bzip2.
Both of them will make heavy use of all the cores in your system,
greatly reducing the real time relative to the user time.
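For anyone driving this from a script, here is a minimal Python sketch of assembling such an invocation. It assumes GNU tar (where -I is the short form of --use-compress-program) and that pigz or lbzip2 is on the PATH. The names tar_command and compress are hypothetical helpers, not part of any library.

```python
import shutil
import subprocess


def tar_command(archive, source, compressor="pigz"):
    """Build a GNU tar argv that creates `archive` via an external compressor."""
    # -I is GNU tar's short form of --use-compress-program.
    return ["tar", "-I", compressor, "-cf", archive, source]


def compress(archive, source, compressor="pigz"):
    """Run tar with the chosen compressor; fail early if it is not installed."""
    if shutil.which(compressor) is None:
        raise FileNotFoundError(f"{compressor} was not found on PATH")
    subprocess.run(tar_command(archive, source, compressor), check=True)
```

Calling compress("backup.tar.bz2", "some_dir", "lbzip2") would run the lbzip2 variant; the archive and directory names here are placeholders.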
Single-threaded compression timing:
gzip is a lot faster than bzip2:
$ time tar --bzip2 -cf
…continue.
In Part 1, we saw how to walk directory trees,
recursively using fs::read_dir
to construct an in-memory tree of FileNodes.
In Part 2, we’ll implement the rest of the core of the tree command:
printing the directory tree with Box Drawing characters.
Let’s take a look at some output from tree:
.
├── alloc.rs
├── ascii.rs
├── os
│   ├── wasi
│   │   ├── ffi.rs
│   │   ├── mod.rs ➊
│   │   └── net ➋
│   │       └── mod.rs
│   └── windows
│       ├── ffi.rs
…continue.
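The prefix logic behind this kind of output can be sketched in a few lines. This is an illustration in Python rather than the post's Rust, and it assumes the in-memory tree is a nested dict mapping names to child dicts (or None for files). The function render_tree and the sample data are hypothetical.

```python
def render_tree(tree, prefix=""):
    """Yield one line per entry of a nested dict, using Box Drawing connectors."""
    entries = sorted(tree.items())
    for index, (name, children) in enumerate(entries):
        is_last = index == len(entries) - 1
        yield prefix + ("└── " if is_last else "├── ") + name
        if children:
            # Under a last entry the vertical guide stops, so pad with spaces;
            # otherwise continue the guide with "│".
            child_prefix = prefix + ("    " if is_last else "│   ")
            yield from render_tree(children, child_prefix)


sample = {
    "alloc.rs": None,
    "os": {"wasi": {"ffi.rs": None, "net": {"mod.rs": None}}},
}

print(".")
print("\n".join(render_tree(sample)))
```

The key design point is that each level only needs to know whether it is the last sibling: that decides both its own connector and the prefix handed down to its children.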
I’ve been learning Rust lately.
I started by reading several books,
including Rust in Action,
Code Like a Pro in Rust,
and most of Programming Rust.
Now, I’m starting to actually write code.
I read the Command-Line Rust book last month,
which challenged readers to write
their own implementations of the tree command.
I decided to accept the challenge.
At its simplest, tree prints a directory tree,
using some of the Unicode Box Drawing characters
to show the hierarchical relationship,
as in the image at right.
I’ve split the code into two phases,
which will be covered in two blog posts.
- Walking the directory tree on disk to build an in-memory tree.
- Pretty-printing the in-memory tree.
While it’s certainly possible to print a …continue.
My display name on Twitter currently looks like @ɢᴇᴏʀɢᴇᴠʀᴇɪʟʟʏ@ᴛᴇᴄʜ.ʟɢʙᴛ,
an attempt to route around Twitter’s apparent censorship of Mastodon information.
I used the FSymbols Generators to produce several variants.
@𝕘𝕖𝕠𝕣𝕘𝕖𝕧𝕣𝕖𝕚𝕝𝕝𝕪@𝕥𝕖𝕔𝕙.𝕝𝕘𝕓𝕥
ʇqƃʅ.ɥɔǝʇ@ʎʅʅᴉǝɹʌǝƃɹoǝƃ@
@𝗀𝖾𝗈𝗋𝗀𝖾𝗏𝗋𝖾𝗂𝗅𝗅𝗒@𝗍𝖾𝖼𝗁.𝗅𝗀𝖻𝗍
@𝘨𝘦𝘰𝘳𝘨𝘦𝘷𝘳𝘦𝘪𝘭𝘭𝘺@𝘵𝘦𝘤𝘩.𝘭𝘨𝘣𝘵
@𝑔𝑒𝑜𝑟𝑔𝑒𝑣𝑟𝑒𝑖𝑙𝑙𝑦@𝑡𝑒𝑐ℎ.𝑙𝑔𝑏𝑡
@𝙜𝙚𝙤𝙧𝙜𝙚𝙫𝙧𝙚𝙞𝙡𝙡𝙮@𝙩𝙚𝙘𝙝.𝙡𝙜𝙗𝙩
@𝚐𝚎𝚘𝚛𝚐𝚎𝚟𝚛𝚎𝚒𝚕𝚕𝚢@𝚝𝚎𝚌𝚑.𝚕𝚐𝚋𝚝
@𝔤𝔢𝔬𝔯𝔤𝔢𝔳𝔯𝔢𝔦𝔩𝔩𝔶@𝔱𝔢𝔠𝔥.𝔩𝔤𝔟𝔱
Many of these variants come from
Unicode Block “Mathematical Alphanumeric Symbols”.
There are a lot more things you can do with Unicode
than just upside-down text.
In Python, if you want to specify a sequence of numbers
from a up to (but excluding) b,
you can write range(a, b).
This generates the sequence a, a+1, a+2, ..., b-1.
You start at a
and keep going until the next number would be b.
In Python 3, range is lazy
and the values in the sequence do not materialize
until you consume the range.
>>> range(3,12)
range(3, 12)
>>> list(range(3,12))
[3, 4, 5, 6, 7, 8, 9, 10, 11]
Trey Hunner makes the point that
range is a lazy iterable
rather than an iterator.
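To make the iterable-versus-iterator distinction concrete, here is a short check that runs in any Python 3 interpreter. It uses only built-ins; the variable names are arbitrary.

```python
r = range(3, 12)

# A range answers len(), indexing, and membership tests without building a list.
assert len(r) == 9
assert r[0] == 3 and r[-1] == 11
assert 7 in r and 12 not in r

# It is an iterable, not an iterator: next() only works on the result of iter().
try:
    next(r)
except TypeError:
    print("range is not an iterator")

# Each iter() call starts a fresh pass, so the same range can be consumed repeatedly.
first_pass = list(r)
second_pass = list(r)
assert first_pass == second_pass

it = iter(r)
assert next(it) == 3
assert next(it) == 4
```

An iterator would be exhausted after the first list() call; the range is not, which is the practical consequence of the distinction.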
You can also step by an increment other than one: range(a, b, s).
This generates a, a+s, a+2*s, ..., b-s
(assuming that (b -
…continue.
On 2nd February 1882, in the Dublin suburb of Rathgar,
a son was given unto John and May Joyce.
James Joyce celebrated his 40th birthday in Paris on 2nd February 1922
by receiving the first printed copy of his novel Ulysses.
Parts of it had already been published in literary magazines and
the book was eagerly received by the cognoscenti.
It took more than a decade for Ulysses to be published
in Britain and the United States.
Censors had considered the book obscene,
but the courts established that it had legitimate literary merit.
For decades, Ulysses was poorly received in Ireland.
The book was considered blasphemous and obscene by many.
Worse, Joyce had …continue.
A while back, I had extracted some code out of a large file
into a separate file and made some modifications.
I wanted to check that the differences were minimal.
Let’s say that the extracted code had been between
lines 123 and 456 of large_old_file.
diff -u <(sed -n '123,456p;457q' large_old_file) new_file
What’s happening here?
- sed -n '123,456p' is printing lines 123–456 of large_old_file.
- The 457q tells sed to abandon the file at line 457.
Otherwise, it will keep reading all the way to the end.
- The <(sed ...) is an example of process substitution.
The output of the sed invocation
becomes the first input of the diff command.
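The same idea can be sketched in Python, for cases where a shell with process substitution isn't available. This is an approximation under stated assumptions rather than a drop-in replacement for the one-liner: extract_lines and diff_extracted are hypothetical helper names, and itertools.islice stands in for the sed range, stopping once the last wanted line has been read.

```python
import difflib
from itertools import islice


def extract_lines(lines, first, last):
    """Return lines first..last (1-indexed, inclusive) from an iterable of lines."""
    # islice stops pulling from the source after line `last`,
    # so the rest of a large file is never read.
    return list(islice(lines, first - 1, last))


def diff_extracted(old_path, new_path, first, last):
    """Unified diff between a line range of old_path and the whole of new_path."""
    with open(old_path) as old, open(new_path) as new:
        old_lines = extract_lines(old, first, last)
        new_lines = new.readlines()
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"{old_path}:{first}-{last}",
            tofile=new_path,
        )
    )
```

With the file names from the example above, print(diff_extracted("large_old_file", "new_file", 123, 456)) would show the differences, and an empty string means the two are identical.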
A similar example: Diff a Transformed …continue.
40 years ago this month,
I sat down at a computer and wrote a program.
(Or "programme", as I spelled it then.)
It was the first time I had ever used a computer.
Very few people had used computers in 1982,
in Ireland or elsewhere.
What was the program?
No idea.
Just a few lines of AppleSoft Basic.
But it was enough to get me hooked and change my life.
I still get a hit when a little bit of code unlocks in my brain.
It’s quite addictive.
There’s always more to learn and to see.
I wrote more about this in 2012: 30 Years of Programming.
At the beginning of 2021,
prompted by Russell Crowe’s defense of Master and Commander,
I began yet another re-read of the twenty Aubrey-Maturin novels.
Or, as the fandom would have it, another circumnavigation.
It’s probably my fifth or sixth circumnavigation,
since I bought the complete boxed set as a Christmas present to myself
in the early aughts.
I completed the twentieth book, Blue at the Mizzen, yesterday,
and also the few pages of the final, unfinished novel, 21.
(I also read about 120 other books in 2021,
down from a stupendous 200 books in 2020,
but that’s neither here nor there.)
Author: Robert Nystrom
Rating: ★ ★ ★ ★ ★
Publisher: Genever Benning
Copyright: 2021
Pages: 640
Keywords: programming, interpreters
Reading period: 10–28 December, 2021
I’ve read hundreds of technical books over the last 40 years.
Crafting Interpreters is an instant classic,
and far more readable and fun than many of the classics.
Nystrom covers a lot of ground in this book,
building two very different interpreters for Lox,
a small dynamic language of his own design.
He takes us through every line of
jlox, a Java-based tree-walk interpreter,
and of clox, a bytecode virtual machine written in C.
For the first implementation, jlox,
he covers such topics as scanning,
parsing expressions with recursive descent,
evaluating expressions, control flow,
functions …continue.