Feed Atlas
OPML directory + server-side RSS reader

xania.org


Latest posts

  • 2025 in Review
    Dec 31, 2025

    Written by me, proof-read by an LLM. Details at end. 2025 has been quite a year for me. The big ticket things for me were having the majority of the year on a non-compete, a new job, and some videos and conference talks. It was a bumper year for my public talks, which included: I also appeared in a number of Computerphile videos: On the front, I finally solved a three-year-old problem with - ou

  • Thank you
    Dec 25, 2025

    Written by me, proof-read by an LLM. Details at end. It's the 25th! Whatever you celebrate this time of year, I wish you the very best and hope you are having a lovely day. For me, this is a family time: I'm not at all religious but was brought up to celebrate Christmas. So, today we'll be cooking a massive roast dinner and enjoying family time. This series was an idea I had around this time las

  • When compilers surprise you
    Dec 24, 2025

    Written by me, proof-read by an LLM. Details at end. Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation I could hardly believe it. I was looking at loop optimisation, and wrote something like this simple function that sums all the numbers up to a given value: So far so decent: GCC has done some preliminary checks, then fallen into a loop t
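The excerpt cuts off before the post's code, so the following is only a minimal sketch of the kind of function it describes. The name `sum_below` and the choice of an exclusive upper bound are assumptions for illustration, not taken from the post.

```c
// Hypothetical reconstruction: sums 0, 1, ..., value - 1 with a plain loop.
// The exclusive bound is an assumption; it also avoids an endless loop
// when value is the largest unsigned number.
unsigned sum_below(unsigned value) {
    unsigned total = 0;
    for (unsigned i = 0; i < value; ++i) {
        total += i;  // unsigned arithmetic wraps on overflow, which is well defined
    }
    return total;
}
```

Mathematically this loop computes value * (value - 1) / 2, so a loop like this is a natural candidate for the loop optimisations the post goes on to discuss.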

  • Switching it up a bit
    Dec 23, 2025

    Written by me, proof-read by an LLM. Details at end. The standard wisdom is that switch statements compile to jump tables. And they do - when the compiler can't find something cleverer to do instead. Let's start with a really simple example: Here the compiler has spotted the relationship between x and the return value, and rewritten the code as: - pretty neat. No jump table, just maths! if (x < 5

  • Clever memory tricks
    Dec 22, 2025

    Written by me, proof-read by an LLM. Details at end. After exploring SIMD vectorisation over the last of , let's shift gears to look at another class of compiler cleverness: memory access patterns. String comparisons seem straightforward enough - check the length, compare the bytes, done. But watch what Clang does when comparing against compile-time constants, and you'll see some rather clever t

  • When SIMD Fails: Floating Point Associativity
    Dec 21, 2025

    Written by me, proof-read by an LLM. Details at end. Yesterday we saw SIMD work beautifully with integers. But floating point has a surprise in store. Let's try summing an array: Looking at the core loop, the compiler has pulled off a clever trick: The compiler is using a vectorised add instruction which treats the as 8 separate integers, adding them up individually to the corresponding element

  • SIMD City: Auto-vectorisation
    Dec 20, 2025

    Written by me, proof-read by an LLM. Details at end. It's time to look at one of the most sophisticated optimisations compilers can do: autovectorisation. Most "big data" style problems boil down to "do this maths to huge arrays", and the limiting factor isn't the maths itself, but the feeding of instructions to the CPU, along with the data it needs. To help with this problem, CPU designers came

  • Chasing your tail
    Dec 19, 2025

    Written by me, proof-read by an LLM. Details at end. Inlining is fantastic, as we've seen recently. There's a place it surely can't help though: recursion! If we call our own function, then surely we can't inline... Let's see what the compiler does with the classic recursive "greatest common divisor" routine - surely it can't avoid calling itself? And yet: The compiler is able to avoid the recurs
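The post's own listing is not part of this excerpt. As a minimal sketch, the classic recursive routine is commonly written as below; the names `gcd` and `gcd_loop` are assumptions. The second function shows by hand how a call in tail position can be turned into a loop, which is one way recursion like this can be avoided.

```c
// Classic Euclidean algorithm, written recursively.
// The recursive call is the last thing the function does (a tail call).
unsigned gcd(unsigned a, unsigned b) {
    if (b == 0) {
        return a;
    }
    return gcd(b, a % b);
}

// Hand-written loop equivalent: instead of calling ourselves,
// update the arguments and go round again.
unsigned gcd_loop(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}
```

Both versions return the same results; the loop form needs no extra stack frames however many steps the algorithm takes.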

  • Partial inlining
    Dec 18, 2025

    Written by me, proof-read by an LLM. Details at end. We've learned how important inlining is to optimisation, but also that it might sometimes cause code bloat. Inlining doesn't have to be all-or-nothing! Let's look at a simple function that has a fast path and slow path; and then see how the compiler handles it. In this example we have some function that has a really trivial fast case for numb

  • Inlining - the ultimate optimisation
    Dec 17, 2025

    Written by me, proof-read by an LLM. Details at end. Sixteen days in, and I've been dancing around what many consider the fundamental compiler optimisation: inlining. Not because it's complicated - quite the opposite! - but because inlining is less interesting for what it does (copy-paste code), and more interesting for what it enables. Initially inlining was all about avoiding the expense of the

  • Calling all arguments
    Dec 16, 2025

    Written by me, proof-read by an LLM. Details at end. Today we're looking at calling conventions - which aren't purely optimisation related but are important to understand. The calling convention is part of the ABI (Application Binary Interface), and varies from architecture to architecture and even OS to OS. Today I'll concentrate on the System V ABI for x86 on Linux, as (to me) it's the most san

  • Aliasing
    Dec 15, 2025

    Written by me, proof-read by an LLM. Details at end. Yesterday we ended on a bit of a downer: aliasing stopped optimisations dead in their tracks. I know this is supposed to be the Advent of Compiler Optimisations, not the Advent of Compiler Giving Up! Knowing why your compiler can't optimise is just as important as knowing all the clever tricks it can pull off. Let's take a simple example of a c

  • When LICM fails us
    Dec 14, 2025

    Written by me, proof-read by an LLM. Details at end. Yesterday's LICM post ended with the compiler pulling invariants like size() and get_range out of our loop - clean assembly, great performance. Job done, right? Not quite. Let's see how that optimisation can disappear. Let's say you had a const char *, and wanted to write a function to return if there was an exclamation mark or not: Here we're

  • Loop-Invariant Code Motion
    Dec 13, 2025

    Written by me, proof-read by an LLM. Details at end. Look back at our simple loop example - there's an optimisation I completely glossed over. Let me show you what I mean: On every loop iteration we are calling to compare the index value, and to check if the index has reached the end of the vector. However, looking in the assembly, the compiler has pulled the size calculation out of the loop ent

  • Unswitching loops for fun and profit
    Dec 12, 2025

    Written by me, proof-read by an LLM. Details at end. Sometimes the compiler decides the best way to optimise your loop is to... write it twice. Sounds counterintuitive? Let's change our sum example from before to optionally return a sum-of-squares: At the compiler turns the ternary into: - using a multiply and add () instruction to do the multiply and add, and conditionally picking either or

  • Pop goes the...population count?
    Dec 11, 2025

    Written by me, proof-read by an LLM. Details at end. Who among us hasn't looked at a number and wondered, "How many one bits are in there?" No? Just me then? Actually, this "population count" operation can be pretty useful in some cases like data compression algorithms, , and . How might one write some simple C to return the number of one bits in an unsigned 64 bit value?cryptography, chess, erro
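The excerpt ends before the post's answer, so here is a minimal sketch of two common ways to write this in plain C. The function names are assumptions, and neither version is claimed to be the one the post uses.

```c
#include <stdint.h>

// Straightforward version: test the lowest bit, shift right, repeat.
unsigned popcount_shift(uint64_t value) {
    unsigned count = 0;
    while (value != 0) {
        count += (unsigned)(value & 1u);
        value >>= 1;
    }
    return count;
}

// Alternative: value & (value - 1) clears the lowest set bit,
// so the loop runs once per one bit rather than once per bit position.
unsigned popcount_clear_lowest(uint64_t value) {
    unsigned count = 0;
    while (value != 0) {
        value &= value - 1;
        ++count;
    }
    return count;
}
```

Both functions return the same count for every input; they differ only in how many loop iterations they take.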

  • Unrolling loops
    Dec 10, 2025

    Written by me, proof-read by an LLM. Details at end. A common theme for helping the compiler optimise is to give it as much information as possible. Using the right signedness of types, targeting the right CPU model, keeping loop iterations independent, and for today's topic: telling it how many loop iterations there are going to be ahead of time. Taking the range-based sum example , but using a

  • Induction variables and loops
    Dec 09, 2025

    Written by me, proof-read by an LLM. Details at end. Loop optimisations often surprise us. What looks expensive might be fast, and what looks clever might be slow. Yesterday we saw how the compiler canonicalises loops so it (usually) doesn't matter how you write them, they'll come out the same. What happens if we do something a little more expensive inside the loop? Let's take a look at something

  • Going loopy
    Dec 08, 2025

    Written by me, proof-read by an LLM. Details at end. Which loop style is "best"? This very question led to the creation of Compiler Explorer! In 2011 I was arguing with my team about whether we could switch all our loops from ordinal or iterator-style to the "new" range-for. I wrote a small shell script to iteratively show the compiler output as I edited code in , and the seed of was born. Compi

  • Multiplying our way out of division
    Dec 07, 2025

    Written by me, proof-read by an LLM. Details at end. I occasionally give presentations to undergraduates, and one of my favourites is taking the students on a journey of optimising a "binary to decimal" routine. There are a number of tricks, which I won't go in to here, but the opening question I have is "how do you even turn a number into its ASCII representation?" If you've never stopped to th
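The excerpt stops before the post's answer. One common textbook approach is sketched below, assuming a 32-bit unsigned input; the name `to_decimal` and its interface are assumptions for illustration. It peels off the lowest digit with a remainder by 10, then divides by 10, and finally reverses the digits.

```c
#include <stddef.h>
#include <stdint.h>

// Writes the decimal form of number into out as a NUL-terminated string
// and returns out. The caller must supply at least 11 bytes:
// up to 10 digits for a 32-bit value, plus the terminator.
char *to_decimal(uint32_t number, char *out) {
    char reversed[10];
    size_t length = 0;
    do {
        // '0' + digit gives the ASCII character for that digit.
        reversed[length++] = (char)('0' + number % 10);
        number /= 10;
    } while (number != 0);
    // Digits were produced lowest first, so copy them out backwards.
    for (size_t i = 0; i < length; ++i) {
        out[i] = reversed[length - 1 - i];
    }
    out[length] = '\0';
    return out;
}
```

The do/while form ensures that zero still produces the single digit "0". Each digit costs one division and one remainder by the constant 10.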