We had a performance regression in a test suite recently
when the median test time jumped by two minutes.
We tracked it down to this (simplified) code fragment:
task_inclusions = [ some_collection_of_tasks() ]
invalid_tasks = [t.task_id() for t in airflow_tasks
                 if t.task_id() not in task_inclusions]
This looks fairly innocuous—and it was—until the size of the result returned from some_collection_of_tasks()
jumped from a few hundred to a few thousand.
The in comparison operator conveniently works
with all of Python's standard sequences and collections,
but its efficiency varies.
For a list and other sequences,
in must search …continue.
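The cure that the excerpt is building toward can be sketched like this (with stand-in data, not the original Airflow objects): build a set once, so each `in` check is an average O(1) hash lookup instead of an O(n) scan of a list.

```python
# Stand-in data; the real code pulled task IDs from Airflow tasks.
task_inclusions = {"load", "transform"}           # set: O(1) average membership
all_task_ids = ["extract", "load", "transform", "report"]

invalid_tasks = [task_id for task_id in all_task_ids
                 if task_id not in task_inclusions]
# invalid_tasks == ["extract", "report"]
```

With a list of inclusions, the comprehension is O(len(tasks) × len(inclusions)); with a set it is O(len(tasks)), which is why the regression only surfaced when the inclusion list grew to a few thousand entries.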
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”
Now they have two problems.
— Jamie Zawinski
A Twitter thread about very long regexes
reminded me of the longest regex that I ever ran afoul of,
a particularly horrible multilevel mess
that had worked acceptably on the 32-bit .NET CLR,
but brought the 64-bit CLR to its knees.
Whenever I ran our ASP.NET web application [on Win64],
it would go berserk, eat up all 4GB of my physical RAM,
push the working set of IIS's w3wp.exe to 12GB,
and max out one of my 4 cores!
The only way to maintain any sanity was to run iisreset
every 20 minutes to …continue.
I uploaded some presentations to SpeakerDeck.com tonight.
Here are various presentations of mine at SpeakerDeck.com and SlideShare.net:
LKRhash is a hashtable that scales to multiple processors and to millions of items.
LKRhash was invented at Microsoft in 1997
by Per-Åke (Paul) Larson of Microsoft Research
and Murali Krishnan and George Reilly of Internet Information Services.
LKRhash has been used in many Microsoft products.
The techniques that give LKRhash its performance
include linear hashing, cache-friendly data structures, and fine-grained locking.
If Microsoft had had 20% time,
LKRhash would have been my main 20% project.
I put a lot of …continue.
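Of the techniques listed above, fine-grained locking is the easiest to illustrate. Here is a toy lock-striping sketch in Python, not LKRhash's actual C++ implementation: the key space is split across independently locked buckets, so threads touching different stripes don't contend.

```python
import threading

class StripedDict:
    """Toy lock-striping illustration (not LKRhash itself):
    each stripe holds its own dict and its own lock, so writers
    to different stripes proceed in parallel."""

    def __init__(self, num_stripes=8):
        self._stripes = [({}, threading.Lock()) for _ in range(num_stripes)]

    def _stripe(self, key):
        # Route each key to one stripe by hash.
        return self._stripes[hash(key) % len(self._stripes)]

    def put(self, key, value):
        d, lock = self._stripe(key)
        with lock:
            d[key] = value

    def get(self, key, default=None):
        d, lock = self._stripe(key)
        with lock:
            return d.get(key, default)
```

LKRhash combines this idea with linear hashing, which grows the table one bucket at a time instead of rehashing everything at once.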
I was investigating the performance of a web app today,
and I spent some time looking at the Flame Chart visualization
in Chrome's profiling tools, which helped identify some problems.
Flame Charts are like Brendan Gregg's Flame Graphs,
except that the charts are sorted by time,
while the graphs are sorted alphabetically.
Quoting from Gregg's recent ACM Queue article:
A flame graph has the following characteristics:
- A stack trace is represented as a column of boxes,
where each box represents a function (a stack frame).
- The y-axis shows the stack depth,
ordered from root at the bottom to leaf at the top.
The top box shows the function
that was on-CPU when the …continue.
Despite being a bona fide performance expert—I spent a couple of years as the Performance Lead
for Microsoft's IIS web server product about 15 years ago—I still forget to measure rather than assume.
I wrote some code today that imported nearly 300,000 nodes into a graph
from a 500MB XML file.
The code was not particularly fast and I assumed that it was the XML parser.
I had been using the built-in streaming parser, cElementTree iterparse.
I assumed that using the lxml iterparse would make the code faster.
It didn't.
Then I had the bright idea of temporarily disabling the per-node processing,
which left only the XML parsing.
Instead of …continue.
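The measurement trick in that excerpt, timing the pipeline with and without one phase to see where the time actually goes, can be sketched with a small helper (a hypothetical utility, not the post's code):

```python
import time

def timed(label, fn, *args):
    """Run fn(*args), print its wall-clock time, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Wrap each candidate phase, e.g. the full import versus parsing alone:
#   timed("parse + process", run_full_import)
#   timed("parse only", run_parse_only)
```

If "parse only" accounts for a small fraction of "parse + process", the parser was never the bottleneck, which is exactly what measuring (rather than assuming) revealed here.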
The canonical for-loop in C and C++ is written thus,
counting up i = 0, i = 1, ..., i = N-1:
for (int i = 0; i < N; i++) {
// loop body
}
(In pre-C99 C, you have to declare int i before the for-loop.)
Let's unpack that for-loop into an equivalent while-loop:
int i = 0;
while (i < N) {
// loop body
i = i + 1;
}
In other words, we initialize i to zero.
Then, before every execution of either loop,
we check i < N.
If i is still within bounds,
we execute the loop body.
Then we postincrement …continue.