Here are 20 random technology-oriented Wikipedia links I recently collected after re-organizing troves of bookmarked links
accumulated over the past few years. These articles offer a peek into the wide variety of topics there are to learn about in Computer Science.
ABL. Always. Be. Learning. Curiosity and well-organized browser bookmarks are your friends.
I support Wikipedia with a donation nearly every year. It's an amazing resource to learn about everything and I'm very grateful for it.
Thank you for existing, Wikipedia. It's a great jumping-off point for learning about something I don't understand, which describes much of this list.
Truly enjoying this Intro to Database Systems course from Carnegie Mellon University. Some really great breakdowns of common join algorithms in this lecture. Here are my notes.
Lecture 11 - Join Algorithms (CMU Databases Systems / Fall 2019)
Prof. Andy Pavlo, Carnegie Mellon Database Group
Table Positioning for a Join
"In general, your smaller table should be the 'left' table when joining two tables." The professor demonstrates better performance by making the smaller table the "outer" table in a join.
Block Nested Loop Join [mysql example]
- "The brute force approach"
- A good option for joining when you have enough memory to hold a large table.
- Always pick the smaller table as the outer table.
- Buffer as much of your outer table in memory as possible to reduce redundant I/O.
- Loop over the inner table or use an index.
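The bullets above can be sketched roughly as follows. This is a minimal illustration, not the lecture's or MySQL's implementation: plain Python lists stand in for tables on disk, and the names `block_nested_loop_join`, `block_size`, `departments`, and `employees` are made up for the example.

```python
from itertools import islice


def block_nested_loop_join(outer, inner, outer_key, inner_key, block_size=2):
    """Join two lists of dict rows, buffering `block_size` outer rows at a time."""
    results = []
    outer_iter = iter(outer)
    while True:
        # Buffer a block of the (smaller) outer table in memory.
        block = list(islice(outer_iter, block_size))
        if not block:
            break
        # One full scan of the inner table per outer block, not per outer row.
        for inner_row in inner:
            for outer_row in block:
                if outer_row[outer_key] == inner_row[inner_key]:
                    results.append({**outer_row, **inner_row})
    return results


# Hypothetical sample data: the smaller table is the outer table.
departments = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]
employees = [
    {"name": "ann", "dept_id": 1},
    {"name": "bob", "dept_id": 2},
    {"name": "cy", "dept_id": 1},
]
joined = block_nested_loop_join(departments, employees, "dept_id", "dept_id")
```

The larger the block, the fewer times the inner table has to be scanned, which is why buffering as much of the outer table as possible helps.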
Index Nested Loop Join [CS Course definition]
Useful when an index on the join key is available, or when you can create one to use for the join.
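A rough sketch of the idea, assuming a Python dict can stand in for a real database index (a real system would typically use something like a B+tree). The helper names `build_index` and `index_nested_loop_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the rows that carry it (stand-in for a real index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(outer, inner_index, outer_key):
    """For each outer row, look up matching inner rows instead of scanning them."""
    results = []
    for outer_row in outer:
        # .get() avoids inserting empty entries into the defaultdict on a miss.
        for inner_row in inner_index.get(outer_row[outer_key], []):
            results.append({**outer_row, **inner_row})
    return results


# Hypothetical sample data.
customers = [{"customer_id": 1, "name": "ann"}, {"customer_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "customer_id": 2},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
orders_index = build_index(orders, "customer_id")
rows = index_nested_loop_join(customers, orders_index, "customer_id")
```

The inner loop of the brute force approach becomes a lookup, so the inner table is never scanned in full.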
Sort-Merge Join [wikipedia]
Useful if one or both tables are already sorted on the join key. Maximizes sequential I/O.
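Here is a small sketch of the sort and merge phases, assuming in-memory lists of dict rows and a shared key name. The function name `sort_merge_join` and the sample rows are made up for illustration. If the inputs were already sorted on the join key, the two `sorted` calls could be skipped.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them with two cursors."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    results = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    results.append({**left[i], **right_row})
                i += 1
            j = j_end
    return results


# Hypothetical sample data with duplicate keys on both sides.
left_rows = [{"id": 2, "a": "x"}, {"id": 1, "a": "y"}, {"id": 2, "a": "z"}]
right_rows = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 2, "b": "r"}]
merged = sort_merge_join(left_rows, right_rows, "id")
```

Both cursors only move forward, which is where the sequential access pattern comes from.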
Hash Join
Generally the best performance for large datasets.
- Phase #1 Build (Hash Table)
- Phase #2 Probe
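The two phases can be sketched like this, assuming the build side fits in memory and using a Python dict as the hash table. The names `hash_join`, `build_rows`, and `probe_rows` and the sample data are assumptions for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Basic in-memory hash join over lists of dict rows."""
    # Phase 1 (build): hash the smaller table on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Phase 2 (probe): scan the other table once and look up each key.
    results = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            results.append({**match, **row})
    return results


# Hypothetical sample data.
teams = [{"team_id": 1, "team": "db"}, {"team_id": 2, "team": "web"}]
people = [
    {"person": "ann", "team_id": 2},
    {"person": "bob", "team_id": 1},
    {"person": "cy", "team_id": 9},  # no matching team, so it is dropped
]
team_members = hash_join(teams, people, "team_id", "team_id")
```

Each table is read once, in contrast to the repeated inner scans of the nested loop approach.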
Use a Bloom filter to optimize the probe phase. It supports two set operations:
- insert a key
- lookup a key
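A toy Bloom filter with those two operations might look like the sketch below. The sizes (`num_bits`, `num_hashes`) and the choice of seeded SHA-256 hashes are arbitrary assumptions for illustration, not tuned values or the lecture's implementation. A lookup can return a false positive but never a false negative, so in a hash join the filter can be filled with the build-side keys and checked before probing the hash table to skip keys that definitely have no match.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: probabilistic set membership with no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def lookup(self, key):
        # True means "possibly present", False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for build_key in [1, 2, 3]:
    bloom.insert(build_key)
found = bloom.lookup(2)  # inserted keys are always reported as present
```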
Additional Reading on Bloom Filters
Let's implement a Bloom Filter
Bloom Filters Debunked
Grace Hash Join [wikipedia]
- "Do hash joins when things don't fit in memory."
- Hash both tables into buckets (partitions) using the same hash function, then do a nested loop join on each pair of matching buckets. If the buckets do not fit in memory, use recursive partitioning until everything fits in memory for the join.
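The partition-then-join idea can be sketched as follows. This is a simplified illustration: Python lists stand in for partitions that a real system would spill to disk, recursive partitioning is left out, and the names `partition`, `grace_hash_join`, and `num_partitions` are assumptions.

```python
def partition(rows, key, num_partitions):
    """Split rows into buckets based on the hash of the join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def grace_hash_join(outer, inner, key, num_partitions=4):
    """Partition both tables with the same hash function, then join bucket pairs."""
    outer_parts = partition(outer, key, num_partitions)
    inner_parts = partition(inner, key, num_partitions)
    results = []
    # Rows with equal keys always land in the same bucket number,
    # so only matching bucket pairs need to be compared.
    for outer_part, inner_part in zip(outer_parts, inner_parts):
        for outer_row in outer_part:
            for inner_row in inner_part:
                if outer_row[key] == inner_row[key]:
                    results.append({**outer_row, **inner_row})
    return results


# Hypothetical sample data.
outer_rows = [{"id": i, "a": i * 10} for i in range(6)]
inner_rows = [{"id": i, "b": i + 100} for i in (1, 3, 5, 7, 5)]
grace_result = grace_hash_join(outer_rows, inner_rows, "id")
```

Because each bucket pair is much smaller than the full tables, the per-bucket join can run entirely in memory.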
"Split outer relation into partitions based on the hash key."
Prof. Andy Pavlo on Hash Join algorithm
- Hashing is almost always better than sorting for operator execution.
"No join algorithm works well in all scenarios."
- Prof. Andy Pavlo
Algorithm: a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
I recently read "Homo Deus", a book by Yuval Noah Harari. It explores the idea that humans may simply be algorithmic decision-making systems, AKA self-aware, self-learning algorithms. There are many parallels between a human and a computer algorithm.
The author says there are organic (human) and non-organic (machine) algorithms. Non-organic algorithms will someday have far more capability than organic algorithms. In some cases, such as diagnosing medical conditions, they already do.
He also recaps the history of humanity as a trend towards Humanism, the misguided notion that humans can know themselves. According to the author, the reality is that algorithms will likely come to know us better than we could ever know ourselves. Note: he is very well credentialed in his studies of human history.
Going forward, he predicts a shift from humanism towards dataism or techno-humanism. Basically, algorithms and data will know us better than we know ourselves. Therefore, machine algorithms will be better qualified to make decisions for us than ourselves. And many (if not all) of our choices will be made by powerful non-organic algorithms.
Everything we do would be decided by a machine to maximize health, happiness, and optimal living. The idea of a human's entire life being driven by algorithms sounds dystopian to me, but the author makes a convincing case.