Lo-Fi Python

Jan 08, 2023

pymarketer: an HTTP + Spreadsheet Wrangling Python package

Typically, this blog reviews other libraries from Python's vast ecosystem. This time, it's my own package, pymarketer, which I made for fun. It was created in a single day and can be installed from the Github repo. Have a go at my most read post if you need help with pip.

Install with pip from the source Github repo:

python -m pip install git+https://github.com/erickbytes/pymarketer.git

The pymarketer package helps you do things like:

  • merge all the tabs of an Excel file into one CSV
  • generate HTTP code
  • make a word cloud image
  • split a CSV
  • merge CSVs
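
To give a feel for the kind of task in that list, here is a minimal sketch of splitting a CSV using only the standard library. This is not pymarketer's implementation. The function name split_csv, the rows_per_file parameter and the "_partN" file naming are all made up for this example.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(path, rows_per_file=1000):
    """Hypothetical helper: split one CSV into numbered parts, repeating the header."""
    path = Path(path)
    parts = []
    with path.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file, nothing to split
            return parts
        while True:
            # pull the next batch of rows without loading the whole file
            rows = list(islice(reader, rows_per_file))
            if not rows:
                break
            part = path.with_name(f"{path.stem}_part{len(parts) + 1}.csv")
            with part.open("w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(rows)
            parts.append(part)
    return parts
```

Calling split_csv("beers.csv", rows_per_file=2) on a file with 5 data rows would write three part files next to the original, each starting with the header row.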

Generating a Word Cloud with the pymarketer Package via wordcloud

import pandas as pd
import pymarketer as pm

xl = "Chicago Breweries.xlsx"
df = pd.read_excel(xl)
# Make a wordcloud from a pandas dataframe.
wordcloud = pm.word_cloud(df)
wordcloud.to_file("Text Word Cloud Visualization.jpg")
Python wordcloud example

This package relied on several Python libraries to complete:

I'll likely expand on this in the future. Anyone who wrangles data might be able to apply this package to good profit. At minimum, you might find it interesting to take a look at the project's __init__.py to see how some of the functions are implemented.

Additional Resources

Jul 08, 2022

Launching a Live Static Site Blog via Pelican, Github and Cloudflare Pages

Proud to announce my newest side project blog, Diversified Bullish, is live at divbull.com. It is made with Pelican and the Blue Penguin theme. I'm planning to write about stocks and investing there moving forward in addition to this blog which focuses on Python programming.

The divbull.com Github repo serves the static files generated by Pelican via Cloudflare pages. It's free, unless you purchase a domain. I purchased my .com domain with Namecheap before I learned about Cloudflare pages. I followed these instructions to set up my new financial blog. If you're interested, you can subscribe to an RSS feed here to follow when I post something new.

The dashboard provides a number of framework-specific presets. These presets provide the default build command and build output directory values for the selected framework. If you are unsure what the correct values are for this section, refer to Build configuration. If you do not need a build step, leave the Build command field blank.

https://developers.cloudflare.com/pages/get-started/

Cloudflare pages deployment details

Working in the Cloudflare pages build dashboard is sweet. It took me about 5 failed Pelican build commands to get the site to deploy. Finally, I was able to get the site build to complete by leaving the build command blank. Cloudflare was able to scoop up my Pelican "output" folder contents and render the blog. How cool. I feel like I've done the impossible, launching a passable quality blog with top shelf tools this quickly for under $10!

Initially, I spent a few hours getting to know Pelican. Once I correctly installed a theme I liked, I banged out a few philosophical financial musings to give the blog some posts. Then I had the static files generated but no clue how to serve them. Enter Cloudflare pages, a free option to host a blog.

Connecting the repo to Cloudflare pages, adding the files to the repo and finding the correct build command added a few more hours. In total, it took me about 1-2 days to make a live site since I did not know about Pelican or Cloudflare pages when I began playing with a Pelican blog in April. This was my first static site launch!

Cloudflare build settings

Generating a Blue Penguin themed Pelican blog.

showing Pelican blog workflow

Head over to divbull.com to see this Pelican, Github and Cloudflare pages stack in action.

Like static site generators? Check out this post about static site generator libraries in Python.

May 09, 2022

An Ode to Code

Making time to code can be done every day. Carve out those little moments where you can automate tedious tasks or study up on that hot new Python library that takes your quality to another level.

Take time to reorganize and refactor in your favorite text editor. Break your script, then break it again. Break it until it works. Absorb your new abilities as a machine literate human and build skills on top of skills. Make a breakthrough. The code is great. It makes sense. Another tweak here, another tweak there. Run black on it and then have a go at PEP-8 to brush up on your style. More tweaks, and need to add some docstrings for more clarity.

Another one bites the dust. Who knows where your skills could grow. Following the code is a delightful road. Some days it's hard. Some days it's easy. But it's fulfilling if you treat it like a locksmith does keys. Knowledge is flowing. The craft is built in each moment. Challenges overcome. Battles won. New innovations to munge.

Code is the medium to communicate with machines and leverage their efficiency for convenient means. Tighter the web we weave with transistors and screens, the more we'll need dignified intermediaries of man and machine. Here's to the good code and the bad code we all will write. May we never let something stop our logical flights to code a better dream.

Jan 30, 2022

Fix Spelling and Grammar with language_tool_python and textblob

Below are two practical Python libraries for text processing. This function uses textblob's spelling correction along with language_tool_python, which applies grammatical corrections via the Language Tool API. I added these text processing transformations to my concept text generation app. These are free, public APIs up to around 20 requests per second. You can send in text and receive back an improved version, ideally correcting and improving your writing.

I found 2 errors when I piped the text of this post into the below code: the proper noun "textblob" corrected to "text blow's" and the word "app" corrected to "pp". Be sure to proof your results. Regardless, I like having these two Python tools in my bag!

Install textblob and language_tool_python (help with pip):

pip install language-tool-python
pip install -U textblob
python -m textblob.download_corpora
import language_tool_python
from textblob import TextBlob

def fix_spelling_and_grammar(text):
    """Returns str: text transformed by language tool and text blob
    1) Apply language tool API correction
    Language Tool Public API: https://dev.languagetool.org/public-http-api
    https://languagetool.org/http-api/swagger-ui/#!/default/post_check
    python library: https://pypi.org/project/language-tool-python/

    2) Apply textblob's spell check to the text"""
    try:
        # use the public API, language English
        tool = language_tool_python.LanguageToolPublicAPI('en-US')
        text = tool.correct(text)
        b = TextBlob(text)
        return str(b.correct())
    except Exception:
        # if either correction step fails, fall back to the text as it stands
        return text

text = "Language is incredble. Fascinatng how hoomans have so many."
transformed_text = fix_spelling_and_grammar(text)
print(transformed_text)
# Result: Language is incredible. Fascinating how humans have so many.

Jan 08, 2022

Experiencing Flow While Coding

Yesterday, I experienced a flow state where I became manically obsessed with perfecting a script I was working on. I think it's beautiful code, about 100 lines long without docstrings. It solves a real need and it felt great to write it. Some scripts feel terrible to write and you know they're bad. However, this one felt like one of the best I've ever written.

Flow seems like a mythical, unattainable state these days as portrayed in media, but we can all agree... we love it. When you're in flow, you know it and you feel a grace in improving your work. For coders, maybe it's by wrapping up a few lines here and there into functions. Refactoring, reordering, handling loose ends or edge cases, writing docstrings with supporting documentation and clarifying that you really understand what's happening... these things are all mundane at times but critical to writing reliable code.

While doing these typical tasks, you're attaining skill and mastery, one of the highest dopamine hits humans can register legally in all 50 states. You know how much better this iteration of code is than when you first learned to write software. You take bits and pieces from past projects and fit them all together into a cohesive, purposeful program. For example, I was tickled to use Python's readlines() file reading function to get the last line of a text file. I learned about this function in my first ever free Python course on Coursera, 7 years ago. Thanks again Dr. Chuck!
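
For the curious, the readlines() trick might look something like the sketch below. The function name last_line is just a label picked for this example, not the name used in the script described above.

```python
def last_line(path):
    """Return the last line of a text file without its trailing newline."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()  # reads every line into a list
    # An empty file has no lines, so return an empty string instead of raising IndexError.
    return lines[-1].rstrip("\n") if lines else ""
```

Note that readlines() loads the whole file into memory. For a very large log, collections.deque(f, maxlen=1) is one way to keep only the final line while iterating.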

This time, I realized my flow when researching ISO 8601 time format strings and guiding them into an HTTP request with the requests library. A new solution emerged, regurgitated from a prior project and mashed up into a more refined form to satisfy the project's requirements. I combined old and new ideas into a better solution than I had ever thought, a fitting complement for the API at hand. Time will tell if the solution will actually work as well as I hope.
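
Here is a rough sketch of the ISO 8601 part of that idea, using only the standard library. The parameter names "start" and "end", the 24 hour window and the helper name iso_window are assumptions for illustration; the real API and its parameters are not shown in this post.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso_window(end, hours=24):
    """Hypothetical helper: build ISO 8601 start/end strings for a time window."""
    start = end - timedelta(hours=hours)
    return {
        "start": start.isoformat(timespec="seconds"),
        "end": end.isoformat(timespec="seconds"),
    }


# A fixed, timezone-aware timestamp keeps the example reproducible.
end = datetime(2022, 1, 7, 12, 0, 0, tzinfo=timezone.utc)
params = iso_window(end)
print(params["start"])  # 2022-01-06T12:00:00+00:00
print(params["end"])    # 2022-01-07T12:00:00+00:00
# urlencode shows how the strings get escaped inside a query string.
print(urlencode(params))
```

With the requests library, a dict like this can be passed as requests.get(url, params=params), which handles the URL encoding for you.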

Flow is real. You can find work that puts you in a flow state, and it doesn't have to be super interesting work to get there. The learning process pays rewards in competency when exposure to different domains combine. Einstein knew a form of this as combinatory play. Repetition enhances this effect and solidifies your foundation. Flow makes it fun! Only rarely do I feel the highest level of engrossment in my work. I sensed I was flowing on this recent project. You can find these types of challenges too. Keep searching for your flow!

Jan 06, 2022

ftfy, The Wonky Text Fixing Python Library

Every Python programmer has undoubtedly come across some crazy characters. The ftfy library "Fixes Text For You" and acts like a Swiss Army knife when you've got questionable characters breaking your script. In my case, an HTTP request was failing because of weird cryptic letters hiding in the data where there was only supposed to be an apostrophe. This library fixed my text and made it appear flawless. I really like ftfy because it solves a common problem, fixing "mojibake" or mangled characters. It's a good tool to have when you see these types of issues!

Install with pip:

pip install ftfy
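
To see where mojibake comes from, here is a small standard library sketch that creates it on purpose and then reverses it. The sample string and the UTF-8 / Windows-1252 pairing are assumptions chosen because they are a common cause of a mangled apostrophe; they are not necessarily the exact characters from my failing request.

```python
original = "It\u2019s"  # "It’s" with a curly apostrophe (U+2019)

# Mojibake: UTF-8 bytes get decoded with the wrong codec (here Windows-1252).
mangled = original.encode("utf-8").decode("cp1252")
# mangled is now "Itâ€™s"

# If you know which two encodings were mixed up, the damage can be reversed by hand.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

ftfy's fix_text function is designed to detect and undo this kind of mix-up without you having to name the encodings yourself.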

See also: Python Unicode How To

Dec 19, 2021

Memory Monitoring Python Libraries + Tools

If you write Python code, there's probably been a time or two when you saw the dreaded "MemoryError". This happens after one of your Python scripts stops because your computer has no spare RAM to execute it. I recently experienced this frustration whilst trying to write hundreds of thousands of csv files. However, this time I grasped for tools that support smarter memory management. Now, I can watch my computer's memory bounce around with the Windows Resource Monitor. Python has quite a few memory profiling libraries for monitoring memory too!

Python Libraries and Guides

Memory Management Overview, Python documentation

Memory Profiler: "monitor memory usage of Python code"

psutil: "Cross-platform lib for process and system monitoring in Python"

py-spy: "Sampling profiler for Python programs"

pyinstrument: "🚴 Call stack profiler for Python. Shows you why your code is slow!"

Scalene: "a high-performance, high-precision CPU, GPU, and memory profiler for Python"

Glances: "Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems."

Yappi: "Yet Another Python Profiler, but this time thread & coroutine & greenlet aware."

Fil: "A Python memory profiler for data processing and scientific computing applications" (Video)

line_profiler: "Line-by-line profiling for Python"

pprofile: "Line-granularity, thread-aware deterministic and statistic pure-python profiler"

Guppy 3: "Python programming environment and heap analysis toolset"

See also: The Python Profilers, Python documentation

CPython standard distribution comes with three deterministic profilers. cProfile, Profile and hotshot. cProfile is implemented as a C module based on lsprof, Profile is in pure Python and hotshot can be seen as a small subset of a cProfile.

Yappi Github, https://github.com/sumerc/yappi

Windows Tools

Task Manager: Windows process management tool with some memory analytics

Collect Data in Windows with Performance Monitor

Resource Monitor: Windows tool with Memory, CPU, Disk and Network monitoring tabs

Resource Monitor can stop processes from running and view in use, standby (Cached) and free memory. This shows 7 Python scripts running and 49% of total memory is being consumed. Looks like we are running steady and safely below "MemoryError" overflow. We might be able to add a few more scripts with 51% of RAM available!


Memory Tips and Guides

  • Use only the data you need. Any data you read in and aren't using is held in memory. The usecols argument in pandas is a great way to read a csv and only use the columns you need.
  • Reading data in chunks with the chunksize argument is another way to reduce memory usage for large datasets.
  • Measuring the memory usage of a Pandas dataframe
  • Some tools are line oriented, others are function oriented. If your code contains large functions, you might favor a line based profiling tool.
  • Be aware of the overhead some memory tools may incur. memory_profiler was clocked with a whopping 270x slowdown per the Scalene PyCon talk below. The talk shows an awesome comparison of these Python profiling libraries:
Scalene Pycon US 2021 Talk
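
The first two tips can be tried without pandas. The sketch below is a standard library stand-in for what the usecols and chunksize arguments of pd.read_csv do: it keeps only the requested columns and yields rows in chunks, with tracemalloc reporting peak memory. The function name read_in_chunks and the generated sample data are assumptions for this example.

```python
import csv
import io
import tracemalloc
from itertools import islice


def read_in_chunks(handle, columns, chunk_size=1000):
    """Yield lists of rows, keeping only the requested columns."""
    reader = csv.DictReader(handle)
    while True:
        chunk = [{c: row[c] for c in columns} for row in islice(reader, chunk_size)]
        if not chunk:
            return
        yield chunk


# Small in-memory CSV standing in for a large file on disk.
data = "id,name,notes\n" + "".join(f"{i},beer{i},{'x' * 50}\n" for i in range(10_000))

tracemalloc.start()
total = 0
for chunk in read_in_chunks(io.StringIO(data), columns=["id"], chunk_size=500):
    # Only one chunk of one column is held in memory at a time.
    total += sum(int(row["id"]) for row in chunk)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"sum of ids: {total}, peak traced memory: {peak / 1024:.0f} KiB")
```

tracemalloc ships with Python and only counts allocations made by the interpreter, so treat the number as a relative guide for comparing approaches rather than the process total you'd see in Resource Monitor.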

Recommended Reading

Conclusion

When you'll see "MemoryError" depends on your computer's hardware, the size of your dataset and what operations you need to script out. Generally speaking, I/O or file reads and writes are more expensive operations.

The tools in this post will help you anticipate how much computing power you have available, monitor your memory consumption more closely and avoid pushing your computer past its limits. You can do things like reading data in chunks and only using the columns you need to reduce your memory consumption. Realizing these tools and strategies can make getting things done with Python a smoother ride.

Jul 28, 2021

8 Promising Python Static Site Generators

A static site generator creates static HTML files, often from Markdown source, to serve as a website. They're commonly used to host blogs but not exclusively. I recently researched my options to roll a static site in Python. I'm assessing a few of them as a potential future self-hosted blogging solution for this WordPress blog. Or maybe I'll spin up a new one!

Why Statics?

Most "modern" websites are dynamic in the sense that the contents of the site live in a database, and are converted into presentation-ready HTML only when a user wants to see the page. That's great. However, it presents some minor issues that static site generators try to solve.

In a static site, the whole site, every page, everything, is created before the first user even sees it and uploaded to the server as a simple folder full of HTML files (and images, CSS, etc).

The Nikola Handbook - https://getnikola.com/handbook.html#why-static
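
To make the idea concrete, here is a toy sketch of what a static site generator does at its core: take some content, pour it into a template and write plain HTML files to a folder before any visitor arrives. It is not how Pelican or Nikola are implemented. The build_site function, the PAGE template and the sample post are all made up for this example.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# One shared page template; real generators use richer engines like Jinja.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def build_site(posts, output_dir):
    """Hypothetical builder: write one HTML file per post into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, post in posts.items():
        page = PAGE.substitute(
            title=html.escape(post["title"]),
            body=html.escape(post["body"]),
        )
        target = out / f"{slug}.html"
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written


posts = {"hello-world": {"title": "Hello World", "body": "My first static post."}}
files = build_site(posts, tempfile.mkdtemp())
print([f.name for f in files])  # ['hello-world.html']
```

The resulting folder is the whole website. Any static host can serve it as-is, with no database or server-side code involved.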

Static Site Generator Python Libraries

listed in largest to smallest order by # of Github project stars

Pelican | Github - 11K Stars

Seems to be the front running static site generator in Python's ecosystem. It contains a convenient pelican-importer tool to import existing content from WordPress, Dotclear, or RSS feeds. Enjoying the modular nature of the pelican-plugins and pelican-themes!

Lektor | Github - 3.5K Stars

Intriguing CMS project touting a "Python API", plugins for tools like Webpack and talented maintainers including the author of Flask.

Cactus | Github - 3.5K stars

"Simple but powerful static website generator using Python and the Django template system... typical users would be designers that are tech-savvy, want to use templates, but don't like to mess with setting up django or S3." (Mac OS) Demo Video

Nikola | Github - 2.2K stars

Viable option to host your site with the informative Nikola Handbook walking you through each step. Plugins for Jupyter Notebooks, post processing filters, a Wordpress importer command line tool and about 40 ready to go themes to find the perfect style.

Makesite | Github - 1.6K Stars

Offers less configuration, using only a single makesite.py file.

Hyde | Github - 1.6K stars

Port from Jekyll, a Ruby static site generator. It has since formed its own "evil twin" identity.

Mynt | Github - 400 stars

"Designed to give you all the features of a CMS with none of the often rigid implementations of those features."

Staticjinja | Github - 250 Stars

"Minimalist Python library for building static websites with Jinja."

Additional Resources

Update! I launched a Pelican blog about investing with Cloudflare pages. It's my first live static blog. Read more about it here.

Apr 06, 2021

Aggregating A Python Error Summary from Log Files

Follow these steps to maintain more reliable scripts and catch more of your traceback errors:

  1. Automate your scripts to run daily, weekly, monthly, etc.
  2. Log your traceback errors with the logging module. I tend to dump all of my logs into a single folder.
  3. Automate aggregating the logs and parsing tracebacks.
  4. Start a feedback loop of fixing the tracebacks until 0 tracebacks remain.
  5. Re-run the script and confirm the tracebacks disappeared.
import itertools
import os

def parse_errors(log):
    """look in each log file, line by line for Python error keywords"""
    errors = list()
    with open(log,'r') as f:
        for line in f:
            if 'Traceback' in line or 'Error' in line:
                # replace commas for csv
                line = line.strip().replace(',','')
                errors.append([log,line])
        return errors

# Parse traceback errors from logs in working directory, then write them to a csv file.
logs = [f for f in os.listdir(os.getcwd()) if '.log' in f.lower()]
tracebacks = [parse_errors(log) for log in logs]
# dedupe consecutive repeated errors within each log with itertools.groupby
tracebacks = [[e for e, _ in itertools.groupby(t)] for t in tracebacks]
with open('Log Traceback Errors.csv', 'w') as fhand:
    fhand.write('Log,Traceback') # csv header row
    for t in tracebacks:
        for error in t:
            fhand.write(f"\n{','.join(error)}")

This pure Python script allows me to home in on potential automation problem areas with my scheduled Python scripts. It doesn't catch the entire traceback. Rather, it shows the error type and the name of the log file that contains that error in a csv. I use this log aggregation script to monitor my daily or weekly scheduled Python scripts, along with pytest tests.
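
The aggregation only works if the tracebacks land in a log file in the first place. Here is a minimal sketch of that logging step with the standard logging module. The helper names get_logger and run_logged, the message text and the log format are assumptions for this example, not the exact setup in my scripts.

```python
import logging


def get_logger(log_path):
    """Return a logger that appends timestamped records to log_path."""
    logger = logging.getLogger(log_path)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def run_logged(job, log_path):
    """Run job(); on failure, write the full traceback to the log and return False."""
    logger = get_logger(log_path)
    try:
        job()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback text
        logger.exception("job failed")
        return False
    logger.info("job finished")
    return True
```

A failed run leaves a "Traceback (most recent call last):" line and a final line naming the exception, such as "ZeroDivisionError: division by zero", in the log file. Those are the lines a keyword search for 'Traceback' or 'Error' picks up.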

Noteworthy gains from aggregating my logs:

  • less fear of missing mistakes
  • more freedom to improve the code
  • catch the mistakes faster

See also: Python Documentation - Basic Logging Tutorial

Feb 14, 2021

So You Want to Learn Python?

Here are a few Python concepts for beginners to explore if you are starting out with the language. In this post, I'll highlight my favorite "must-learn" tools to master that come with your Python installation. Understanding them will make you a more capable Python programmer and problem solver.

  1. Built-in Functions. They are awesome! You can do so much with these. Learn to apply them. You won't regret it! See also: An Intro to Python's Built-in Functions
  2. String methods. Want to capitalize, lowercase or replace characters in text? How about checking if a str.isdigit()? Get to know Python's string methods. I use these frequently. Also, the pandas string method implementations are great for applying them to tabular data.
  3. Docstrings. I truly enjoy adding docstrings at the beginning of my functions. They add clarity and ease of understanding.
  4. The Mighty Dictionary. Lists and tuples are useful too, but dictionaries are so handy with the ability to store and access key-value pairs.
  5. List Comprehensions. These allow you to perform transformations on lists in one line of code! I love the feeling when I apply a list comprehension that is concise, yet readable.
  6. Lambda Expressions. These can be used to apply a function "on the fly". I love their succinctness. It took me a few years to become comfortable with them. Sometimes it makes sense to use a lambda expression instead of a regular function to transform data.
  7. Date Objects. Wielding date objects and formatting them to your needs is a pivotal Python skill. Once you have it down, it unlocks a lot of automation and scripting abilities when combined with libraries like pathlib, os or glob for reading file metadata and then executing an action based on the date of the file, for example. I use date.today() a lot when I want to fetch today's date and timedelta to compare two dates. The datetime module is your friend, dive in. Must know for custom date formatting: strftime() and strptime(). See also: Time Format Codes
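
As a quick taste of the date tools in item 7, here is a small sketch using only the datetime module. The dates and the is_stale helper are made up for this example.

```python
from datetime import date, datetime, timedelta

# strptime parses a string into a datetime using format codes.
launched = datetime.strptime("2022-07-08", "%Y-%m-%d")

# strftime formats a datetime back into a string.
print(launched.strftime("%b %d, %Y"))  # Jul 08, 2022

# Subtracting two dates gives a timedelta.
gap = date(2023, 1, 8) - launched.date()
print(gap.days)

# timedelta also shifts dates forward or backward.
week_ago = date.today() - timedelta(days=7)


def is_stale(file_date, days=30, today=None):
    """Hypothetical helper: True if file_date is more than `days` days before today."""
    today = today or date.today()
    return today - file_date > timedelta(days=days)


print(is_stale(date(2022, 1, 1), days=30, today=date(2022, 3, 1)))  # True
```

A check like is_stale is the sort of thing you can pair with file metadata from pathlib or os to act only on old files.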

For tabular data, I often use pd.to_datetime() to convert a series of strings to datetime objects:

# install pandas with this command: python -m pip install pandas
import pandas as pd
events = [
    ["USA Born", "1776-07-04"],
    ["9/11 Attacks", "2001-09-11"],
    ["Biden Inauguration", "2021-01-20"],
]
df = pd.DataFrame(events, columns=["events", "dates"])
# convert a pandas series of strings to datetime objects
df.dates = pd.to_datetime(df.dates)
print(df.dtypes)
print(df.head())

Just the tip of the iceberg...

The amazing part of Python is that its community has developed an astonishing plethora of external libraries which can be installed by pip. Usually I'll learn how to use new libraries after googling to find a well-written README on Github or helpful documentation. The language comes with an impressive line-up of baked-in tools and libraries way beyond what I've mentioned here. But I think this is a great start. Get to know these common Python language features and you'll be surprised how much you can do!

Additional Comprehensive Python Learning Resources

How long did it take you to learn Python?

Practical Python Programming (free course)

Google Python Style Guide

What the f*ck Python!

PySanity
