Lo-Fi Python

Jul 06, 2022

The Things They Don't Tell You About Ampersands and XML

In an XML document, you need to escape any ampersands in your text as &amp;.

I began a new coding project. Sure, there's documentation for the API that solves my problem. I find out it uses XML. Extensible Markup Language, a classic API format. Cool. I craft a beautiful script that works at first. Or so it seems!

Later on, I realize it doesn't work as well as I believed. It turns out, if I want a server to accept my XML document, escaping certain characters might be required. The documentation didn't mention this. It was my first time using XML, how would I know?

I noticed a script only worked for a handful of requests. It failed for most, returning a 400 status code. Suspecting the issue was likely in my payload, I studied the data of the request bodies that failed compared to the others that succeeded. All of the payload bodies that failed contained text with an ampersand.

Suspecting it might be an XML + ampersand related issue, I googled and found this Stack Overflow post, which explains the ampersand escaping situation. There are a handful of characters that must be escaped. Otherwise, the server may reject your request.
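Here is a minimal sketch of two ways to handle the escaping with Python's standard library. The `title` tag and the sample text are made up for illustration; they are not from the API I was using.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Tom & Jerry <Season 1>"  # hypothetical field value

# Option 1: escape the text yourself before building the XML string.
# escape() converts &, < and > to &amp;, &lt; and &gt;.
manual_payload = f"<title>{escape(raw_text)}</title>"

# Option 2: let ElementTree escape the text when it serializes the element.
element = ET.Element("title")
element.text = raw_text
built_payload = ET.tostring(element, encoding="unicode")

print(manual_payload)
print(built_payload)
```

Building the document with a library instead of string formatting means you don't have to remember the escaping rules at all.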

These are the things they often don't tell you. Those little details you must sometimes realize for yourself, unless someone bothers to mention it or write it down. Now you know something that cost me an hour or two of tinkering to realize!

`Image Source <https://github.com/sichkar-valentyn/XML_files_in_Python/blob/master/example.xml>`__


Want to read more on HTTP? Check out my guide on making HTTP requests with Python.

Jun 24, 2022

Hammock-Driven Development Notes

Occasionally you will find a video or talk that really resonates with you. Rich Hickey's "Hammock Driven Development", a self-described "rant", is packed with wisdom. I keep coming back to re-watch it, and today I have written down some key points from this amazing rant!

Key Ideas

Take more time to think through your problem.

When was the last time you...

thought about something for a whole day?

thought about something for a whole month or year?

Hammock Driven Development, https://www.youtube.com/watch?v=f84n5oFoZBc

On Bugs

  • Bugs are cheaper to fix in development.
  • Least expensive to avoid in design.
  • Most expensive to fix in production.

Analysis & Design, Simplified

  • Identify the problem you are trying to solve.
  • Assess whether the proposed solution solves that problem.

On Problem Solving

solving problems by Rich Hickey

Problem Solving (cont.)

  • State the problem out loud.
  • Understand the problem's facts, context and constraints.
  • What don't you know?
  • Find problems in your solution.
  • Write it all down.

More Input, Better Output

  • Read in and around your space.
  • Look critically at other solutions.
  • You can't connect things you don't know about.

On Focus

  • On the hammock, no one knows if you're sleeping and they don't bother you because of this.
  • Computers are distracting.
  • Let loved ones know you are going to be "gone", focusing deeply for some time.

Waking Mind vs Background Mind

  • The waking mind is good at critical thinking.
  • Use waking time to assign tasks to background mind.
  • The background mind is good at making connections and good at strategy.

Sleep According to Scientific American:

  • While sleeping, the brain processes info learned during the day.
  • Sleep makes memories stronger and weeds out irrelevant details.
  • Our brain finds hidden relations among memories to solve waking problems.

Closing Ideas

Write the proposed solution down. Hammock time is important "mind's eye time". We switch from "input mode" to "recall mode" during hammock time. Wait overnight, or sometimes months, to think about your problem. Sleep sober for best results! Eventually coding is required, and your feedback loop is important, but "don't lean on it too much". You will be wrong; facts and requirements will change. Mistakes happen. That's fine. Do not be afraid of being wrong. /rant

The notes in this blog post are paraphrased from this rant.

Feb 14, 2021

So You Want to Learn Python?

Here are a few Python concepts for beginners to explore if you are starting out with the language. In this post, I'll highlight my favorite "must-learn" tools to master that come with your Python installation. Understanding them will make you a more capable Python programmer and problem solver.

  1. Built-in Functions. They are awesome! You can do so much with these. Learn to apply them. You won't regret it! See also: An Intro to Python's Built-in Functions
  2. String methods. Want to capitalize, lowercase or replace characters in text? How about checking if a str.isdigit()? Get to know Python's string methods. I use these frequently. Also, the pandas string method implementations are great for applying them to tabular data.
  3. Docstrings. I truly enjoy adding docstrings at the beginning of my functions. They add clarity and ease of understanding.
  4. The Mighty Dictionary. Lists and tuples are useful too, but dictionaries are so handy with the ability to store and access key-value pairs.
  5. List Comprehensions. These allow you to perform transformations on lists in one line of code! I love the feeling when I apply a list comprehension that is concise, yet readable.
  6. Lambda Expressions. These can be used to apply a function "on the fly". I love their succinctness. It took me a few years to become comfortable with them. Sometimes it makes sense to use a lambda expression instead of a regular function to transform data.
  7. Date Objects. Wielding date objects and formatting them to your needs is a pivotal Python skill. Once you have it down, it unlocks a lot of automation and scripting abilities when combined with libraries like pathlib, os or glob for reading file metadata and then executing an action based on the date of the file, for example. I use date.today() a lot when I want to fetch today's date and timedelta to compare two dates. The datetime module is your friend, dive in. Must know for custom date formatting: strftime() and strptime(). See also: Time Format Codes
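Here is a small sketch that combines a list comprehension, a lambda expression and the datetime tools mentioned above. The date strings and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"
raw_dates = ["02/14/2021", "06/27/2020", "08/09/2020"]  # hypothetical input

# List comprehension: parse every string into a date object with strptime().
parsed = [datetime.strptime(text, DATE_FORMAT).date() for text in raw_dates]

# Lambda expression: sort the original strings by the date they represent.
oldest_first = sorted(raw_dates, key=lambda text: datetime.strptime(text, DATE_FORMAT))

# Subtracting two dates returns a timedelta.
span = max(parsed) - min(parsed)

# Adding a timedelta to a date returns a new date.
week_after = min(parsed) + timedelta(days=7)

# strftime() formats a date for display.
labels = [day.strftime("%B %d, %Y") for day in parsed]

print(oldest_first)  # ['06/27/2020', '08/09/2020', '02/14/2021']
print(span.days)     # 232
print(week_after)    # 2020-07-04
print(labels[0])     # February 14, 2021
```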

For tabular data, I often use pd.to_datetime() to convert a series of strings to datetime objects:

# install pandas with this command: python -m pip install pandas
import pandas as pd
events = [
    ["USA Born", "1776-07-04"],
    ["WTC Bombings", "2001-09-11"],
    ["Biden Inauguration", "2021-01-20"],
]
df = pd.DataFrame(events, columns=["events", "dates"])
# convert a pandas series of strings to datetime objects
df.dates = pd.to_datetime(df.dates)
print(df.dtypes)
print(df.head())

Just the tip of the iceberg...

The amazing part of Python is that its community has developed an astonishing plethora of external libraries which can be installed by pip. Usually I'll learn how to use new libraries after googling to find a well-written README on GitHub or helpful documentation. The language comes with an impressive line-up of baked-in tools and libraries way beyond what I've mentioned here. But I think this is a great start. Get to know these common Python language features and you'll be surprised how much you can do!

Additional Comprehensive Python Learning Resources

How long did it take you to learn Python?

Practical Python Programming (free course)

Google Python Style Guide

What the f*ck Python!

PySanity

Aug 09, 2020

Pondering Join Algorithms

Truly enjoying this Intro to Database Systems course from Carnegie Mellon University. Some really great breakdowns of common join algorithms in this lecture. Here are my notes.

Lecture 11 - Join Algorithms (CMU Database Systems / Fall 2019)

Prof. Andy Pavlo, Carnegie Mellon Database Group

Join Algorithms

screenshot from lecture

Table Positioning for a Join

"In general, your smaller table should be the 'left' table when joining two tables." The professor demonstrates better performance by making the smaller table the "outer" table in a join.

Block Nested Loop Join [mysql example]

  • "The brute force approach"
  • A good option for joining if you have enough memory to hold a large table.
  • Always pick the smaller table as the outer table.
  • Buffer as much of your outer table in memory as possible to reduce redundant I/O.
  • Loop over the inner table or use an index.
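As a rough in-memory sketch of the idea (a real DBMS works with disk pages, not Python lists), the smaller table is read in blocks and the inner table is scanned once per block. The table contents and the `block_size` parameter are hypothetical.

```python
def block_nested_loop_join(outer, inner, key, block_size=2):
    """Join two lists of dicts on `key`, buffering the outer table in blocks."""
    results = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]  # buffered chunk of the outer table
        for inner_row in inner:  # one scan of the inner table per block
            for outer_row in block:
                if outer_row[key] == inner_row[key]:
                    results.append({**outer_row, **inner_row})
    return results


# The smaller table is the outer table.
departments = [
    {"dept_id": 1, "dept": "Eng"},
    {"dept_id": 2, "dept": "Sales"},
    {"dept_id": 3, "dept": "Legal"},
]
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Bo", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
    {"name": "Di", "dept_id": 4},
    {"name": "Ed", "dept_id": 2},
]

joined = block_nested_loop_join(departments, employees, key="dept_id")
print([row["name"] for row in joined])  # ['Ana', 'Bo', 'Cy', 'Ed']
```

A bigger block means fewer scans of the inner table, which is why buffering as much of the outer table as possible pays off.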

Index Nested Loop Join [CS Course definition]

Useful if indexes are available, or if you can create an index to use for the join.
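A minimal sketch of the idea, assuming a sorted list searched with `bisect` as a stand-in for a real B+tree index. The helper names and sample rows are hypothetical.

```python
from bisect import bisect_left


def build_sorted_index(rows, key):
    """Return (sorted_keys, positions), a toy stand-in for a B+tree index."""
    entries = sorted((row[key], position) for position, row in enumerate(rows))
    return [k for k, _ in entries], [p for _, p in entries]


def index_nested_loop_join(outer, inner, key, index):
    """For each outer row, look up matches in the inner table through the index."""
    sorted_keys, positions = index
    results = []
    for outer_row in outer:
        i = bisect_left(sorted_keys, outer_row[key])  # index lookup, no full scan
        while i < len(sorted_keys) and sorted_keys[i] == outer_row[key]:
            results.append({**outer_row, **inner[positions[i]]})
            i += 1
    return results


departments = [
    {"dept_id": 1, "dept": "Eng"},
    {"dept_id": 2, "dept": "Sales"},
    {"dept_id": 3, "dept": "Legal"},
]
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Bo", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
    {"name": "Di", "dept_id": 4},
    {"name": "Ed", "dept_id": 2},
]

employee_index = build_sorted_index(employees, key="dept_id")
joined = index_nested_loop_join(departments, employees, "dept_id", employee_index)
print([row["name"] for row in joined])  # ['Ana', 'Cy', 'Bo', 'Ed']
```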

Sort-Merge Join [wikipedia]

Useful if one or both tables are sorted on a join key. Maximize sequential I/O.
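Here is a simplified in-memory sketch of a sort-merge join. It sorts both inputs first; if a table is already sorted on the join key, that step is free. The function name and sample rows are hypothetical.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on `key`, then merge them with two cursors."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    results = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            run_end = j
            while run_end < len(right) and right[run_end][key] == left_key:
                run_end += 1
            # Pair every left row that has this key with the whole run.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:run_end]:
                    results.append({**left[i], **right_row})
                i += 1
            j = run_end
    return results


departments = [
    {"dept_id": 2, "dept": "Sales"},
    {"dept_id": 3, "dept": "Legal"},
    {"dept_id": 1, "dept": "Eng"},
]
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Bo", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
    {"name": "Di", "dept_id": 4},
    {"name": "Ed", "dept_id": 2},
]

joined = sort_merge_join(departments, employees, key="dept_id")
print([row["name"] for row in joined])  # ['Ana', 'Cy', 'Bo', 'Ed']
```

Both cursors only ever move forward, which is where the sequential I/O benefit comes from.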

Sort - Merge Join

screenshot from lecture

Hash Join

Best performance. For large datasets.

  1. Phase #1 Build (Hash Table)
  2. Phase #2 Probe
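The two phases can be sketched with a plain Python dictionary as the hash table. This is an in-memory illustration with hypothetical table contents, not how a DBMS lays out its hash table.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    # Phase 1 (build): hash every row of the smaller table on the join key.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row[key]].append(row)

    # Phase 2 (probe): look up each row of the other table in the hash table.
    results = []
    for row in probe_rows:
        for match in hash_table.get(row[key], []):
            results.append({**match, **row})
    return results


departments = [
    {"dept_id": 1, "dept": "Eng"},
    {"dept_id": 2, "dept": "Sales"},
    {"dept_id": 3, "dept": "Legal"},
]
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Bo", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
    {"name": "Di", "dept_id": 4},
    {"name": "Ed", "dept_id": 2},
]

joined = hash_join(departments, employees, key="dept_id")
print([row["name"] for row in joined])  # ['Ana', 'Bo', 'Cy', 'Ed']
```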

Use a Bloom filter to optimize the probe phase. It supports two set operations:

  1. insert a key
  2. lookup a key
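Below is a toy Bloom filter with those two operations. The bit-array size, the number of hashes and the SHA-256 based hashing are arbitrary choices for this sketch, not what the lecture prescribes.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: lookups can give false positives, never false negatives."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions from one key by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def insert(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def lookup(self, key):
        # True means "maybe present", False means "definitely not present".
        return all(self.bits[position] for position in self._positions(key))


# Build side: record every join key from the build table.
bloom = BloomFilter()
for dept_id in (1, 2, 3):
    bloom.insert(dept_id)

# Probe side: skip the hash table lookup when the filter says "definitely not".
probe_keys = [1, 2, 4, 2]
candidates = [key for key in probe_keys if bloom.lookup(key)]
print(candidates)
```

Keys that were inserted always pass the filter. A key that was never inserted is usually rejected, but it can occasionally slip through as a false positive, so the hash table still has the final say.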

Additional Reading on Bloom Filters

Let's implement a Bloom Filter

Bloom Filters Debunked

Grace Hash Join [wikipedia]

  • "Do hash joins when things don't fit in memory."
  • Use a hash table for each table. Break the tables into buckets then do a nested loop join on each bucket. If the buckets do not fit in memory, use recursive partitioning. Then everything fits in memory for the join.
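A simplified sketch of the partitioning idea: both tables are split into buckets with the same hash function, so matching keys always land in the same bucket, and each bucket pair is joined on its own. Recursive partitioning and spilling to disk are left out, and the names and data are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Split rows into buckets using a hash of the join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def grace_hash_join(outer, inner, key, num_partitions=4):
    outer_buckets = partition(outer, key, num_partitions)
    inner_buckets = partition(inner, key, num_partitions)
    results = []
    for bucket_id, outer_bucket in outer_buckets.items():
        # Only rows in the same bucket can match, so join bucket by bucket.
        for outer_row in outer_bucket:
            for inner_row in inner_buckets.get(bucket_id, []):
                if outer_row[key] == inner_row[key]:
                    results.append({**outer_row, **inner_row})
    return results


departments = [
    {"dept_id": 1, "dept": "Eng"},
    {"dept_id": 2, "dept": "Sales"},
    {"dept_id": 3, "dept": "Legal"},
]
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Bo", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
    {"name": "Di", "dept_id": 4},
    {"name": "Ed", "dept_id": 2},
]

joined = grace_hash_join(departments, employees, key="dept_id")
print(sorted(row["name"] for row in joined))  # ['Ana', 'Bo', 'Cy', 'Ed']
```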

"Split outer relation into partitions based on the hash key."

Prof. Andy Pavlo on Hash Join algorithm

  • Hashing is almost always better than sorting for operator execution.

"No join algorithm works well in all scenarios."

-Prof. Andy Pavlo


Jun 27, 2020

Characterizing Database Workloads & Storage Models

Thank you Carnegie Mellon Database Group for putting this online! These are my notes from watching on YouTube.

Carnegie Mellon Databases Storage II, Lecture 4

Prof. Andy Pavlo [Watch on YouTube]

The Problem and Solution

How should the DBMS represent the database in storage files on disk? Solve it by choosing the right storage model for your target workload. The right strategy depends on whether you are reading or writing data and on how many joins you are performing.

Workload Characterization

OLTP (Online Transaction Processing): "Simple queries with lots of writes."

OLAP (Online Analytical Processing): "Read only queries. Lots of joins. Doing a lot of reads, but they're more complex."

HTAP (Hybrid Transactional Analytical Processing): "is trying to do both of them. You still want to ingest new data, but analyze it as it comes in. It's used for companies making decisions on the fly as people are browsing websites, like internet advertising companies."


Storage Models

screenshots from the lecture

n-ary model

N-ary used to be the dominant model until the '80s.

DSM model

Additional Reading: All Things Distributed

Column Store Vs. Row Store RDBMS

Row-oriented DBMS (Row Store)

  • PostgreSQL, MySQL
  • Row Store = use for OLTP

Column-oriented DBMS (Column Store)

  • Redshift, BigQuery
  • Column Store = use for OLAP

If the types within a column are consistent, you can compress that column's data in a column store.
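To make the difference concrete, here is a toy sketch of the same table in both layouts, using plain Python structures. The table contents are made up, and run-length encoding is just one example of a compression scheme that benefits from a column layout.

```python
from itertools import groupby

# Row store: each record is kept together. Good for fetching or writing
# one whole record at a time (OLTP).
rows = [
    (1, "Ana", "US", 120.0),
    (2, "Bo", "US", 80.0),
    (3, "Cy", "CA", 100.0),
]
row_record = rows[1]  # one lookup returns every attribute of the record

# Column store: each attribute is kept together. Good for scanning a few
# attributes across many records (OLAP).
columns = {
    "id": [row[0] for row in rows],
    "name": [row[1] for row in rows],
    "country": [row[2] for row in rows],
    "amount": [row[3] for row in rows],
}
total = sum(columns["amount"])  # the aggregate touches only one column


def run_length_encode(values):
    """Compress consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(row_record)                             # (2, 'Bo', 'US', 80.0)
print(total)                                  # 300.0
print(run_length_encode(columns["country"]))  # [('US', 2), ('CA', 1)]
```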

Jun 23, 2020

Free Computer Science Courses and Talks To Absorb

Below you'll find a balanced curriculum of juicy courses and videos that are available for free on the internet. I'll definitely be diving into most of these in the 2nd half of 2020. Stay curious!

University CS Courses For Free

CS50's Web Programming with Python and JavaScript | Harvard University

CS 61-C Great Ideas in Computer Architecture (Machine Structures), Spring 2015 | UC Berkeley

CS 109: Data Science, 2015 | Harvard University

Mathematical Modeling of Football, Fall 2020 | Uppsala Universitet

CS 162 - Operating Systems and Systems Programming, Fall 2013 | UC Berkeley

15-445/645 Intro to Database Systems, Fall 2019 | Carnegie Mellon University

15-721 Advanced Database Systems, Spring 2020 | Carnegie Mellon University

Missing Semester: Shell Tools & Scripting, Spring 2020 | MIT

6.824 Distributed Systems, Spring 2020 | MIT

CSE 373 - Analysis of Algorithms, 2016 | Stony Brook University

CS 4150 Algorithms, Spring 2020 | University of Utah

CS 241 System Programming, Spring 2020 [course wiki] | University of Illinois

CS 6120: Advanced Compilers: The Self-Guided Online Course | Cornell University

Intriguing Coursera Classes

DevOps Culture and Mindset | UC-Davis

Computer Science: Algorithms, Theory, and Machines | Princeton University

Excel Fundamentals for Data Analysis | Macquarie University

Build a Data Science Web App with Streamlit and Python | Guided Project [$10]

Programming Talks & Tutorials

These programming talks piqued my interest, highly recommended.

David Beazley | Built in Super Heroes [YouTube]

Mr. Beazley shows how to use pure Python built-in functions to clean and analyze the City of Chicago's food inspection data. No pandas in this talk, behold the power of the Python standard library. Spoiler: Don't eat at O'Hare airport. He also has a new course, available for free:

David Beazley | Practical Python Programming [Course]

This is not a course for absolute beginners on how to program a computer. It is assumed that you already have programming experience in some other programming language or Python itself.

Sebastian Witowski | Modern Python Developer's Toolkit [YouTube]

An overview covering editing tools and setup from PyCon 2020. Honing your development environment is crucial to being an efficient coder. This example uses VS Code. I use Atom as my primary text editor. The most recommended linters are usually pylint, flake8 or pyflakes.

Jake VanderPlas | Reproducible Data Analysis in Jupyter [YouTube]

This 10-video series is a must-watch for aspiring data scientists and analysts if you use Python. Includes a git workflow demonstration, working in Jupyter Notebooks and many other essentials.

Rich Hickey | Hammock Driven Development [YouTube]

Sometimes, the best thing we can do is step away from the keyboard. I really enjoy this speaker's communication style.

Eric J. Ma | Demystifying Deep Learning for Data Scientists [YouTube]

Tutorial-style Python machine learning walk-through from PyCon 2020.

Julie Michelman | Pandas, Pipelines, and Custom Transformers [YouTube]

This video shows a deep dive into the world of scikit-learn and machine learning. PyCon and PyData videos usually include some cutting edge tech. Machine learning moves so fast there are always new tools surfacing. But certain libraries like scikit-learn, TensorFlow, Keras and PyTorch have been constant.

Ville Tuuls | A Billion Rows per Second: Metaprogramming Python for Big Data [YouTube]

Make your data dense by tactically re-arranging into efficient structures and compiling it down to lower-level bytes. This details a successful Python / Postgres / Numba / Multicorn big data implementation.

Video & Course Grab Bag

Discover the role of Python in space exploration [course]

Microsoft and NASA made a free course about Python in space! 🤓

Ted Nelson | Computers for Cynics [YouTube]

I find these videos to be an entertaining, thought-provoking take on software history. Recommended by Joe Armstrong, the creator of Erlang.

GNU Typist [Tutorial]

You may be able to teach yourself to type more efficiently with this tutorial. I definitely need to do this. It's worth mentioning, per Rich Hickey: with a proper design phase, you'll spend less time typing in the first place!

Extra Credit: Python Wikipedia Library

import wikipedia [GitHub]

Apr 11, 2020

Reflections on 5 Years of Solving Problems with Python

Prior to learning Python, I had no programming experience. I worked in marketing for a book publisher and did not perform well at my job. It was not a good fit. They eventually fired me. As my previous job unraveled, I discovered Python and the Coursera course, Programming for Everybody (Getting Started with Python). Fortunately, that course jump-started me onto a path of learning and reading each day. My aim was to make my own website, a goal that I accomplished. I needed to know how the sausage was made.

Looking back from 2020, I can safely say Python changed my life. Because of it, I now have a fulfilling marketing data-oriented career. I'm also grateful for the financial stability that came with it. I love to learn about the language and continue to improve my abilities to solve problems with new tools, not only Python.

Below are pieces of wisdom picked up from my experiences. They are the result of many hours of study, reading, mistakes, luck, toil and eventual glory.

These are thought-provoking adages and guidelines, not absolute truths in all cases.

  1. Developing a habit of learning pays off over time, no matter what the subject is. It is an investment in yourself that compounds.
  2. Follow your own curiosity. It's less important to compare what you know to others. Compare what you know today to what you knew yesterday. Don't worry about how long it takes to learn.
  3. Watch educational or technical conference talks on sites like YouTube or InfoQ. Rich Hickey, Brandon Rhodes and David Beazley are some of my favorite speakers. Watch talks from all languages, not just Python. Often the concepts apply to any programming language.
  4. Use an RSS reader. Anytime you find a good blog, subscribe by RSS or email to get new posts. I use the Feeder Chrome extension/Android app.
  5. The Zen of Python contains a lot of wisdom. I like the concept of Explicit is better than implicit. This implies declaring your actions in written or oral fashion, providing additional context. Consider favoring easier to read solutions over clever one-liners. For example:
    • List comprehensions are useful and "pythonic", use them! But sometimes it's easier to use a for loop to hash out an idea. (Contrarily, avoiding the Initialize Then Modify pattern benefits those comfortable with comprehensions.)
    • Explicitly using keyword arguments versus positional arguments is another way to make your code easier to understand.
  6. Can you explain the solution simply? If not, try to clarify your understanding or maybe there's a simpler way. In Python, there are often several ways to accomplish the same goal. But keep in mind the Zen of Python: There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Look for the obvious way. An example of this is string formatting. I've heard f-strings are the hot new way to do this now.
  7. Don't be afraid to change course if things don't feel right. Ask yourself while coding, "Does this feel efficient?" Recently I was trying to format a JSON string so I approached it like I had in the past, by exporting the request from Postman and formatting the JSON string with Python's format() built-in. But this time, the curly braces were confusing me, I was struggling and it wasn't working. I googled around and saw Python's json module and df.to_json() in pandas. They were a much easier and better-looking solution. But it still wasn't working. Finally, I used the Postman approach and f-strings to format a successful payload. The third try worked! F-strings are super nice and clean.
  8. If you're stuck, there's probably a free online course or blog post that explains whatever is confusing you. Use the Googles. When in doubt, Google the error message.
  9. Begin your project by writing a list of requirements. This often leads to good questions and cases that may need to be addressed. The book Code Complete 2 covers establishing project requirements in great detail, along with the other stages in the life-cycle of a software project. I'm really enjoying this book and highly recommend it.
  10. Names are really important. Take time to think about a good name for your variables and functions. Also, name your scripts well. I name my scripts using action verbs. For example, my script that organizes accumulated files on my desktop into folders is named clean_desktop_files.py. When I see this script months later, its name reminds me of the action the script is performing. I believe it's better to err on the side of longer, more descriptive names for variables and functions. It makes code easier to understand. But there is a trade-off with length to keep in check.
  11. Moving a block of code into a function can abstract away repetitive code and increase its readability.
  12. Each function should do one thing only. Follow the single-responsibility principle.
  13. Train yourself to think in data structure conversions. The Python dictionary is very useful and can be converted to and from lists, tuples, sets, etc. I often find it more efficient to convert data to a different structure to organize it. Usually I am googling things like "convert class object to python dictionary" because dictionaries are easy to work with or convert to other structures. The vars() built-in is great for converting objects to a dictionary. For example, once you have a dictionary, you might be able to solve your problem by converting it to a dataframe.
  14. Use only the data you need. Reading in just the essential data helps avoid memory issues and hanging programs. In pandas, the usecols argument in pd.read_csv() is great for this. This creates a dataframe with 2 columns:
df = pd.read_csv('emails.csv', usecols=['name','email'])
  15. Assume that if something is broken, it's because of something you've done. Start from the assumption that your code contains the bug and work outward by eliminating possibilities. Avoid jumping to quick conclusions. Instead, carefully consider possible reasons for why something is happening. Many times, I find my 2nd or 3rd hypothesis is actually true.
  16. There will be times when you'll look at someone else's choices and wonder why they did things a certain way. Consider the possibility that they know more than you in this domain.
  17. Beware of sequencing errors. Are your tasks, scripts or functions executing in an efficient order to reach your end goal? Look to unblock bottlenecks and correct chronological mistakes in your processes.
  18. Before you send that email asking for help, go back and take another look. There's also no shame in asking for help. Be sure you proofread your email before sending.
  19. Status code 200 does not guarantee your API request was successful. You may want to write a test to confirm success that doesn't rely on response status codes.
  20. Unfortunately, testing gets shunned sometimes. Make it a priority. I enjoy writing pytest tests more than most other code. Why? Because tests confirm my scripts are working to some degree, detect bugs and provide a refactoring safety net.
  21. Refactoring your code is a crucial step in making it better. Coming back to my code after a few weeks, months or years brings clarity, experience and a new perspective. It feels good to improve the quality of my old work.
  22. Consolidate your tasks. Bundling things can save you a bundle of time! Identify redundant patterns and remove if possible. Observe yourself while working. Any repetitive manual process can probably be automated away. Recently, I figured out how to use a Windows batch file to instantly activate my Python virtual environment. It took me a few years of tediously pasting the cd and activate commands into command prompt every day to realize. Now it's a snap.
  23. Stack Overflow is a useful resource. But the top answers may be outdated. Check the other less popular answers sometimes. Or...
  24. Read the documentation! An updated or more elegant solution might be there. I recently found os.makedirs(path, exist_ok=True) in the os docs. I didn't know about the exist_ok argument. I was creating folders with a more complicated alternative from Stack Overflow for years. I use this way all the time now. In the same vein, if you need the local system username, the Python docs state getpass.getuser() is preferred over os.getlogin().
  25. Write documentation explaining how to use your projects. Even if you can only muster a quick README text file, that's better than nothing. Within your code, docstrings are a nice addition. I have yet to use Sphinx, but it is a popular choice for generating documentation.
  26. Teaching others feels good and solidifies your knowledge. Writing and pair programming are great ways to improve your understanding and pass your skills along to other people. While we're on the subject of writing...
  27. Write everything down! Your head is not good at storing information in memory. Computers are. This frees your mind to come up with new ideas rather than expending energy to remember what you've already done. It also helps you plan. I use a Notepad text file to keep a running to-do list. You could also use services like Trello or Microsoft Planner. While writing code, use comments and docstrings conservatively for quick notes, clarifications or reminders. The important thing is to write it down somewhere.
  28. When editing your writing, continually ask yourself, "Do I need this word or phrase?" for every word you write.
"Brevity is the soul of wit." - William Shakespeare (Hamlet)
  29. Draw inspiration from culture, nature and professional disciplines outside of your own. Insights can be mined from anything. Don't dismiss a situation as mundane without first scanning for knowledge nuggets and gems.
  30. Better solutions often come to me after gaining time and experience with a problem. Building software is an iterative cycle of adjustment towards consistently fulfilling the needs of those it serves in 100% of cases. In a perfect world, you'd never have bugs. But edge cases tend to pop up in ways you didn't think of when you first wrote a solution. There will also be projects where requirements or business rules change. Consider that possibility when you are designing your solution.
  31. It's possible to find a job that you're excited about and genuinely enjoy the work.
  32. Respect your craft, whether it's coding or another profession. A skilled carpenter needs precision, practice and focus to make something beautiful. Approach your craft with the same mindset and pride in making your best art.
  33. We all have holes in our knowledge. Be receptive to other ways of thinking. The best way to learn is from other humans. Everyone has different backgrounds and experiences. I have never used object oriented programming, classes or certain command line tools like ssh. I have a loose understanding of these things but have not yet applied them to my projects. Working with paths (os and pathlib) still gives me fits sometimes. These are knowledge gaps that I want to fill in. Additionally, we don't know what we don't know. Try to illuminate the fog of your unknown.
  34. Choosing to dedicate to learning Python is among the best decisions I've made.
  35. Attitude is more important than intelligence. Anyone can learn to program, play guitar or fly an airplane. You can become an adept problem solver. Acquire an attitude to support your determination and persistence.
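A few of the Python-specific tips in this list can be sketched in a handful of lines: keyword arguments, vars() and os.makedirs() with exist_ok. The Contact class, the format_contact function and the folder names are hypothetical examples, and the folders are created inside a temporary directory so nothing is left behind.

```python
import os
import tempfile


class Contact:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def format_contact(name, email, separator=": "):
    """Return a one-line description of a contact."""
    return f"{name}{separator}{email}"


contact = Contact(name="Ana", email="ana@example.com")

# vars() converts the object's attributes to a dictionary.
contact_dict = vars(contact)

# Keyword arguments make it obvious which value is which.
line = format_contact(name=contact.name, email=contact.email, separator=" <-> ")

with tempfile.TemporaryDirectory() as base_folder:
    target = os.path.join(base_folder, "reports", "2020")
    os.makedirs(target, exist_ok=True)
    # Calling it again does not raise, because the folder is allowed to exist.
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)

print(contact_dict)  # {'name': 'Ana', 'email': 'ana@example.com'}
print(line)          # Ana <-> ana@example.com
print(created)       # True
```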

Brandon Rhodes: Stopping to Sharpen Your Tools - PyWaw Summit 2015

I'll leave you with the 4 P's and 4 C's from my Programming for Everybody Coursera course graduation ceremony. Cultivating these principles will guide you to growing your education and finding a positive course in life:

4 P's: Passion, Purpose, Persistence, Playfulness

4 C's: Choice, Commitment, Connection, Completion

Thank you for reading and I hope this post helps you on your own educational journey.