Here’s a list of resources I enjoyed, with a few comments.
Papers
- Hoard:
A Scalable Memory Allocator for Multithreaded Applications: Hoard is
a parallel memory allocator that avoids fragmentation and false sharing
by hoarding memory from the single-threaded system memory allocators and
giving it out in parallel for better performance.
- The
Emperor’s Old Clothes: Hoare, of quicksort and ALGOL fame, explains
why simplicitly is a virtue, and how a committee can destroy a language.
He mentions how bounds checking was implemented in ALGOL, and
composition allowed it to grow to become a simple yet powerful
language.
- Growing
a Language: A paper about how to build a language to grow – a
language should be flexible, have a welcoming community, have generics
and operator overloading, and worse is better.
- Three
Approaches to the Quantitative Definition of Information: This paper
formulates what’s now called Kolmogorov complexity, which states that
the entropy of an object is determined by the smallest possible
programming language that can express said information. This explains
compression, signal processing, and many other things in CS in just 5
pages.
- Technology and
Courage: A paper by technology pioneer, Ivan Sutherland, who was on
the team at MIT who built the first tablet. This paper goes into his
high level thoughts about business, software, and life.
- Reflections
on Trusting Trust: A classic paper on how software isn’t really
trustable unless you examine both the tools to build it and the code
itself. This paper was surprisngly prescient, given the security
problems we have now with software.
- Hazard
Pointers: Safe Memory Reclamation for Lock-Free Objects: It was
thought for a long time that garbage collection was a prerequisite to
fast concurrent data structures, due to the lack of efficient
bookkeeping for when to free parts of a data structure correctly. This
paper discusses hazard pointers, a way to mark parts of a data structure
as freeable even without garbage collection.
- A
Lazy Concurrent List-Based Set Algorithm: This paper details
concurrent Skip Lists, which was implemented in the java collections
library in Java 1.6.
- Crash
Only Software: A paper on explaining why crash-only software is
good, and a classification of such software.
- Better bitmap
performance with Roaring bitmaps: Roaring bitmaps, a faster data
structure and more storage efficient data structure for bitmaps, by
using both run-length encoding and array packing.
- RRB-Trees: Efficient
Immutable Vectors: RRB-Trees, the Relaxed Radix Balanced Tree, is a
purely functional data structure that is an improved version of the
HAMT, the Hash Array Mapped Trie, which is the vector data structure in
Scala and Clojure.
- Time
Bounds for Selection: A paper that explains the
PICK
selection algorithm, which can select the ith smallest of n numbers in
O(n) time. This algorithm is more commonly known as quickselect.
- Quicksort:
A paper that explains the classic quicksort algorithm, the first O(n log
n) sorting algorithm that took sublinear memory.
- End
to End Arguments in System Design: A paper on a design principle,
the “End to End” argument, that explains why having functionality at the
lower levels of a system may be redundant or useless compared to putting
them at a higher level of a system.
- Pattern Defeating
Quicksort: A sorting algorithm for the Dutch National flag problem,
which can solve it in O(nk) time. This is the current unstable sort
algorithm in rust, with an ~5-10% better performance over the current
rust stable sort, timsort.
Links
Courses
Books
- ARM
System Developer’s Guide: Details the ARM ISA. A bit dated at this
point, but covers the fundamentals.
- Algorithms
and Data Structures for Massive Datasets: Algorithms and data
structures that scale to meet the demands of large datasets.
- Algorithms for Modern
Hardware: A great resource for learning about performance
engineering.
- Antifragile:
A sequel to the Black Swan, focusing on things that get stronger when
put under stress, and how to build systems that do the same.
- Behind
Deep Blue: A book about the team that built deep blue, the first
computer to defeat the world chess champion.
- Computational
Geometry: a book that looks at algorithms geometrically. There’s
sections on calculating nearest neighbors, object collision, mapping
algorithms, dimension reduction, graphs, and even querying a database. I
didn’t know computational geometry had so many applications!
- Crafting
Interpreters: Learn about compilers by implementing two interpreters
for a full featured language named lox.
- Database Internals: A Deep
Dive: A book that teaches databases in two parts: storage engines,
and then as distributed databases.
- Designing
Data Driven Databases: The best book for learning about distributed
systems.
- Game Programming
Patterns: A book about Design Patterns for game development. The
examples are in C++, and focused on games, but can be applied to many
domains outside of it.
- High
Performance MySQL: Learn about how to use and deploy MySQL, while
squeezing as much performance as possible while dodging pitfalls.
- Irrational
Exuberance: A book about economic bubbles and regression to the
mean.
- Kill
it with Fire: explains how to maintain and extend legacy systems,
with notes on leading teams and driving change.
- Learn you a Haskell for Great
Good!: An introduction to haskell, a statically and strongly typed
functional programming language.
- Learn you some Erlang for
Great Good!: An introductory book to Erlang/OTP, a functional
programming language with lots of libraries suited for web
programming.
- Meditations:
A classic book by Marcus Aurelius on the stoic philosophy, with gems
that still ring true today.
- Operating Systems,
Three Easy Pieces: A book that tackles teaching operating systems in
three pieces: virtualization, concurrency, and persistence.
- Probabilistic
Data Structures and Algorithms for Big Data Applications: A book on
probabilistic data structures that are useful for big data. The six
chapters cover many data structures for each problem.
- Programming
Pearls: Algorithms and data structures that Jon Bentely, the creator
of kd-trees explains in succint prose. There’s a lot of great exercises
and the author’s storytelling makes the book an entertaining and fast
read.
- Proofs:
A long form Mathematics Textbook: An approachable book on proofs
with lots of problems and stories. Proofs and being able to read
mathematical notation are much more useful than I would’ve thought.
- Purely
Functional Data Structures: A book on data structures for functional
languages, like SML.
- SQL Performance
Explained: The book that made indexes click for me, with
accompanying SQL code in many dialects of SQL, like SQL Server, Oracle,
MySQL, and Postgres.
- Seven
Concurrency Models in Seven Weeks: concurrency models in many
different languages, including java, clojure, and erlang, and how they
differ, with quick explanations on the pros and cons of each.
- Seven
Databases in Seven Weeks: a whirlwind tour of some common NoSQL
databases, like Redis, Neo4j, DynamoDB, and Hbase.
- Systems
Performance: Enterprise and the Cloud: Learn how to profile and
improve performance and observability of systems. A must have for
anybody learning in a big tech company.
- The
Art of Multiprocessor Programming: A book that goes into great depth
about concurrent programming, with thorough exercises.
- The
Black Swan: An options trader explains why outlier events are
actually quite common and hard to hedge against. I picked this up for
the nod to Popper, but stayed for the thoughts on the market.
- The
Innovator’s Dilemma: How companies adjust (and don’t adjust) to
change, and how incumbents lose market share to new competitors.
- The Linux Programming
Interface: a book on the linux OS, glibc, and system calls. Waiting
for a second edition soemday to cover io_uring, cgroups, and other
things.
- The
Little Book of Semaphores: A gentle introduction to mutual exclusion
by introducing semaphores, the basis of almost all concurrent
algorithms.
Utilities
- fish: A shell with completions
and sane scripting
- pagefind:
Static search that works offline without needing to run a local
server
- pandoc: A universal document
converter
- pdf2txt: Reading
pdfs as plain text
- ripgrep: Grep
but faster
- scalene: To
diagnose python performance problems
- termux: Run a unix terminal on
your android phone
- panamax: Mirror
crates.io locally