Lo-Fi Python

Nov 28, 2020

20 Eclectic Computer Science Wikipedia Articles

Here are 20 random technology-oriented Wikipedia links I recently collected after re-organizing troves of bookmarked links accumulated over the past few years. These articles peek into the wide variety of things to learn about that exist in Computer Science. ABL. Always. Be. Learning. Curiosity and well organized browser bookmarks are your friend.

I support Wikipedia with a donation nearly every year. It's an amazing resource to learn about everything and I'm very grateful for it. Thank you for existing, Wikipedia. It's is a great jumping off point to learn about something I don't understand, which is much of this list here.

Algorithms & Theories

Deterministic algorithm

Greater fool theory (economics)

Man or boy test - compiler theory from Donald Knuth

Recursion - the most common algorithm I read about in passing

Databases

Cache

Cache invalidation

Caching is a common technique to store data so that it can be quickly fetched later with limited usage of database resources.

Column-oriented DBMS:

Data definition language

Database normalization

Lock (mutex)

Hybrid transactional/analytical processing (HTAP)

Online transaction processing (OLTP) - Making lots of writes to the database.

Online analytical processing (OLAP) - Reading the database. Lots of joins.

Developer

Asynchrony Used for performing operations in parallel.

RPC

Event-driven programming

Async is always event-driven, but not the other way round.

Stephen Chung - Stack Overflow

Functional Programming

Garbage collection - automatic memory management

Parallel Computing

Reinforcement learning - one paradigm of machine learning

Remote procedure call

List of Lists of Lists - check out the "Technology" section

Final Reminders

Thank you for reading, hope you found something you liked! Here's another post with more free resources for learning computer science online.

Aug 09, 2020

Pondering Join Algorithms

Truly enjoying this Intro to Database Systems course from Carnegie Mellon University. Some really great breakdowns of common join algorithms in this lecture. Here are my notes.

Lecture 11- Join Algorithms(CMU Databases Systems / Fall 2019)

Prof. Andy Pavlo, Carnegie Mellon Database Group

Join Algorithms

screenshot from lecture

Table Positioning for a Join

"In general, your smaller table should be the "left" table when joining two tables."... Professor demonstrates better performance by making the smaller table the "outer" table in a join.

Block Nested Loop Join [mysql example]

  • "The brute force approach"
  • If you have enough memory to hold a large table, a good option for joining.
  • Always pick the smaller table as the outer table.
  • Buffer as much of your outer table in memory as possible to reduce redundant I/O.
  • Loop over the inner table or use an index.

Index Nested Loop Join [CS Course definition]

If indexes are available, or you could create an index to use for a join.

Sort-Merge Join [wikipedia]

Useful if one or both tables are sorted on a join key. Maximize sequential I/O.

Sort - Merge Join

screenshot from lecture

Hash Join

Best performance. For large datasets.

  1. Phase #1 Build (Hash Table)
  2. Phase #2 Probe

Use a Bloom Filter set operations for probe phase optimization.

  1. insert a key
  2. lookup a key

Additional Reading on Bloom Filters

Let's implement a Bloom Filter

Bloom Filters Debunked

Grace Hash Join [wikipedia]

  • "Do hash joins when things don't fit in memory."
  • Use a hash table for each table. Break the tables into buckets then do a nested loop join on each bucket. If the buckets do not fit in memory, use recursive partitioning. Then everything fits in memory for the join.

"Split outer relation into partitions based on the hash key."

Prof. Andy Pavlo on Hash Join algorithm

  • Hashing is almost always better than sorting for operator execution.

"No join algorithm works well in all scenarios."

-Prof. Andy Pavlo

webmention

webmention