Apr 15, 2021
You read the whole thing front to back? Every word? Stop and think about it. What is the computer trying to tell you?
Sometimes something is obvious but we still need reminding of it. I found myself thinking this yet again today. There are many times when I would have saved myself time and grief by carefully reading the error.
Carefully read the error when you see a traceback. Then once you've taken it all in, consider what to do next. If you're stumped, googling it might yield a solution. But make sure you actually read the error first.
Apr 06, 2021
Follow these steps to maintain more reliable scripts and catch more of your traceback errors:
- Automate your scripts to run daily, weekly, monthly, etc.
- Log your traceback errors with the logging module. I tend to dump all of my logs into a single folder.
- Automate aggregating the logs and parsing the tracebacks.
- Start a feedback loop of fixing the tracebacks until 0 tracebacks remain.
- Re-run the script and confirm the tracebacks have disappeared.
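The logging step above can be sketched in a few lines. This is a minimal, illustrative example (the log filename and message are my own placeholders, not from the original scripts):

```python
import logging

# Minimal sketch of the logging step: write tracebacks to a log file
# that the aggregation script below can later parse.
logging.basicConfig(filename="my_script.log", level=logging.ERROR, force=True)

try:
    1 / 0  # stand-in for the real work a scheduled script does
except ZeroDivisionError:
    # logging.exception records the message plus the full traceback
    logging.exception("Scheduled script failed")
```

With this in place, the log file contains the word "Traceback" and the error type, which is exactly what the parser below keys on.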
import itertools
import os


def parse_errors(log):
    """Look in a log file, line by line, for Python error keywords."""
    errors = list()
    with open(log, 'r') as f:
        for line in f:
            if 'Traceback' in line or 'Error' in line:
                # replace commas for csv
                line = line.strip().replace(',', '')
                errors.append([log, line])
    return errors


# Parse traceback errors from logs in the working directory, then write them to a csv file.
logs = [f for f in os.listdir(os.getcwd()) if '.log' in f.lower()]
tracebacks = [parse_errors(log) for log in logs]
# dedupe adjacent duplicates in the list of lists with itertools.groupby
tracebacks = [t for t, _ in itertools.groupby(tracebacks)]
with open('Log Traceback Errors.csv', 'w') as fhand:
    fhand.write('Log,Traceback')  # csv header row
    for t in tracebacks:
        for error in t:
            fhand.write(f"\n{','.join(error)}")
This pure-Python script lets me home in on potential problem areas in my scheduled automation. It doesn't capture the entire traceback. Rather, it writes the error type and the name of the log file containing that error to a csv. I use this log aggregation script, along with pytest tests, to monitor my daily and weekly scheduled Python scripts.
Noteworthy gains from aggregating my logs:
- less fear of missing mistakes
- more freedom to improve the code
- catch the mistakes faster
See also: Python Documentation - Basic Logging Tutorial
Feb 14, 2021
Here are a few Python concepts for beginners to explore if you are starting out with the language.
In this post, I'll highlight my favorite "must-learn" tools to master that come with your Python installation.
Understanding them will make you a more capable Python programmer and problem solver.
- Built-in Functions. They are awesome! You can do so much with these. Learn to apply them. You won't regret it! See also: An Intro to Python's Built-in Functions
- String methods. Want to capitalize, lowercase or replace characters in text? How about checking if a str.isdigit()? Get to know Python's string methods. I use these frequently. Also, the pandas string method implementations are great for applying them to tabular data.
- Docstrings. I truly enjoy adding docstrings at the beginning of my functions. They add clarity and ease of understanding.
- The Mighty Dictionary. Lists and tuples are useful too, but dictionaries are so handy with the ability to store and access key-value pairs.
- List Comprehensions. These allow you to perform transformations on lists in one line of code! I love the feeling when I apply a list comprehension that is concise, yet readable.
- Lambda Expressions. These can be used to apply a function "on the fly". I love their succinctness. It took me a few years to become comfortable with them. Sometimes it makes sense to use a lambda expression instead of a regular function to transform data.
- Date Objects. Wielding date objects and formatting them to your needs is a pivotal Python skill. Once you have it down, it unlocks a lot of automation and scripting abilities when combined with libraries like pathlib, os or glob for reading file metadata and then executing an action based on the date of the file, for example. I use date.today() a lot when I want to fetch today's date and timedelta to compare two dates. The datetime module is your friend, dive in. Must know for custom date formatting: strftime() and strptime(). See also: Time Format Codes
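Here's a quick, illustrative sketch touching several of these concepts at once (the sample data is made up):

```python
from datetime import date, timedelta

visits = {"Mon": 3, "Tue": 7, "Wed": 5}                   # the mighty dictionary
busiest = max(visits, key=visits.get)                     # a built-in function
print(busiest.upper())                                    # a string method
doubled = [v * 2 for v in visits.values()]                # a list comprehension
label = (lambda d: d.strftime("%Y-%m-%d"))(date.today())  # lambda + strftime
yesterday = date.today() - timedelta(days=1)              # date arithmetic
print(label, yesterday)
```

A few lines like this exercise dictionaries, built-ins, string methods, comprehensions, lambdas and date objects all together.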
For tabular data, I often use pd.to_datetime() to convert a series of strings to datetime objects:
# install pandas with this command: python -m pip install pandas
import pandas as pd

events = [
    ["USA Born", "1776-07-04"],
    ["WTC Bombings", "2001-09-11"],
    ["Biden Inauguration", "2021-01-20"],
]
df = pd.DataFrame(events, columns=["events", "dates"])
# convert a pandas series of strings to datetime objects
df.dates = pd.to_datetime(df.dates)
print(df.dtypes)
print(df.head())
Just the tip of the iceberg...
The amazing part of Python is that its community has developed an astonishing plethora of external libraries, installable with pip. I usually learn how to use a new library by googling until I find a well-written README on GitHub or helpful documentation. The language comes with an impressive line-up of baked-in tools and libraries well beyond what I've mentioned here, but I think this is a great start. Get to know these common Python language features and you'll be surprised how much you can do!
Additional Comprehensive Python Learning Resources
How long did it take you to learn Python?
Practical Python Programming (free course)
Google Python Style Guide
What the f*ck Python!
PySanity
Feb 07, 2021
After 12 years with Gmail as my primary email inbox, I wanted to clear out old "Promotions" emails. This can be done with some clever use of Gmail's search syntax shown below.
I wanted to preserve my "Starred" emails, but delete old emails to free up space. I was able to delete 58,000 "Promotions" emails! I wrote this post because I feel it might save you a little time figuring it out yourself. I also included brief details on possible Python tools for Gmail and IMAP below if you are considering scripting your contact management.
I didn't need Python in the end after reading some Gmail search operator examples. For example, adding a hyphen before a search term in Gmail acts as a NOT operator, similar to the unary invert operator (~) in Python: it excludes the criteria from a search rather than including it. I was able to use this to exclude my starred emails. You can also add a filter for "has attachment". I used that to star any emails with attachments, then deleted the rest, excluding the "starred" ones.
Gmail search syntax to get "Promotions" minus "Starred" emails and filter on a date range:
category:promotions -is:starred after:2019/12/31 before:2021/1/1
Selecting "all" emails in your search
Once you've selected "All" from the checkbox dropdown, click "Select all conversations that match this search". Now you can apply actions such as "add a star" or "delete" them by clicking the (⋮) vertical ellipses menu:
To Python or not?
I also considered Python tools for interfacing with Gmail to accomplish this. There doesn't seem to be an easy way to group emails by "Category" in the Gmail API or IMAP (short for Internet Message Access Protocol).
imaplib is an "IMAP4 protocol client" in the Python standard library. Usually Python's smtplib is the first library that comes up for email. Don't forget about imaplib! It might be more suitable for searching based on text in your emails or creating labeled segments yourself, then applying actions to them.
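For the curious, here's a hedged sketch of what an imaplib search might look like. The account name and app password are placeholders, IMAP must be enabled in your Gmail settings, and the connection steps are left commented out since they require a live account:

```python
import imaplib


def imap_date_query(after, before):
    """Build an IMAP SEARCH criteria string for a date range.
    IMAP dates use the DD-Mon-YYYY format, e.g. 01-Jan-2020."""
    return f'(SINCE "{after}" BEFORE "{before}")'


query = imap_date_query("01-Jan-2020", "01-Jan-2021")
print(query)

# Connection steps (not run here; placeholders for a real account):
# M = imaplib.IMAP4_SSL("imap.gmail.com")
# M.login("you@gmail.com", "app-password")
# M.select("INBOX")
# typ, msg_ids = M.search(None, query)
```

Note there's still no IMAP search key for Gmail's "Promotions" category, which is why the search-operator approach in the web UI won out.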
Additional Gmail + Python Resources
Gmail search operators
imaplib - python standard library interface to email accounts [Stack Overflow example]
enabling IMAP in your gmail
Gmail API Python Quickstart
Gmail python library - There are also pythonic wrappers to the Gmail API like this one.
imaplib - Python Module of the Week
Use Google Like a Pro
Jan 16, 2021
How do you calculate stock valuation metrics like the Sharpe ratio? Recently I've been reading
about common stock valuation metrics
and wondered how I can apply them to my stock portfolio. I started reading about different
metrics, sketching formulas and entertained writing a script to calculate these metrics.
But Python has no shortage of finance-related libraries. After some furious googling
I found ffn, a way better option than rolling my
own formulas. It's a "financial function" library, installable with pip.
It will be interesting to observe how these metrics vary in my portfolio and learn more
of ffn's API. I like that they use
pandas dataframes
within their library because I'm already familiar with them. At minimum, it's good to understand
what stock formulas purport to measure and what it means if the measure is low or high.
It makes sense to compare stocks in similar industries or competitors like NKE
and ADDYY. This is a neat tool
for stock nerds who want to level up their stock analysis, make smart decisions and ideally pad the portfolio!
The funny thing is... my lowest university grade was a "C" in my only Finance class.
It wasn't easy for me to grasp. But with Python it's a little more interesting and easier to apply.
Anyone can level up their finance skills thanks to a cornucopia of finance calculation libraries in the Python ecosystem.
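To make the idea concrete, here's a minimal sketch of one such metric, the Sharpe ratio, computed by hand with pandas. This is my own simplified formula (annualized mean excess return over volatility), not ffn's implementation, and the prices are made up:

```python
import numpy as np
import pandas as pd


def sharpe_ratio(prices, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a series of daily closing prices."""
    returns = prices.pct_change().dropna()
    excess = returns - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std()


prices = pd.Series([100, 101, 102.5, 101.8, 103.0, 104.2])
print(round(sharpe_ratio(prices), 2))
```

Libraries like ffn compute this and many more metrics for you, but seeing the formula once helps you interpret what a high or low value means.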
Recommended Reading: A Brief Introduction - ffn documentation
Install ffn with pip:
python -m pip install ffn
Here's the code to get stock data with ffn:
import ffn

# ffn.get returns a pandas dataframe of market data.
data = ffn.get(
    'tsla,spot,nflx,nke,addyy',
    start='2019-01-01',
    end='2020-12-31'
)
print(data.head())
stats = data.calc_stats()
print(stats.display())
side note on the pyfolio library
I first considered using pyfolio to pull stock data. It doesn't work "out of the box" to deliver the results pictured in its "single stock" example documentation. You'd need to find another way to fetch your market data or somehow patch the Yahoo Finance API within pyfolio. I preferred ffn, mostly because it worked right away after pip installing it and running the above code.
2024 Update
For a capable finance module, I recommend yfinance. It has worked well for me also.
ffn and pyfolio both depend on the Yahoo Finance API, which tends to change and break these libraries.
Troubleshooting traceback errors may be required.
There are other Python financial analysis libraries worth trying as well.
Nov 28, 2020
Here are 20 random technology-oriented Wikipedia links I recently collected after re-organizing troves of bookmarked links
accumulated over the past few years. These articles peek into the wide variety of things to learn about that exist in Computer Science.
ABL. Always. Be. Learning. Curiosity and well organized browser bookmarks are your friend.
I support Wikipedia with a donation nearly every year. It's an amazing resource to learn about everything and I'm very grateful for it.
Thank you for existing, Wikipedia. It's a great jumping-off point for learning about things I don't understand, which covers much of this list.
Oct 11, 2020
Introduction
Recently an old idea came back to life. I've posted to a Facebook Page
for several years as part of a project I started on a whim. The goal of the page is to share anything positive
and inspirational by famous thinkers, artists and creators I read, or simply something positive to meditate on.
It was partially inspired by the discipline of "Positive Psychology".
Basically, William James was a cool dude. Martin Seligman is too.
I believe that positive feelings create positive outcomes and we can "game" ourselves into this feedback loop with literature
and other habits that support well-being like sleep and exercise.
After building up years of posts, I pondered how to capture the dataset of quote images and use it to generate new positive-minded prose.
This post details one implementation and alternatives I considered to accomplish this goal.
All of the data and code in this post is published on Github.
I may eventually post my entire Flask website there too!
Here's how I made my latest project, positivipy.
Project Overview
- Export all Facebook post images from my page
- Convert images to quote text with Optical character recognition (OCR)
- Data cleaning via pandas and manual correction
- Train on past quotes and generate new quotes with a Markov chain
1. Export all Facebook post images from my page
Facebook made this easy. I exported all of my timeline photos by following these instructions.
2. Converting images to quote text with OCR
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo
Wikipedia - https://en.wikipedia.org/wiki/Optical_character_recognition
Once I had a folder of .jpg images, I used the Google Vision API's OCR to detect the text in the images. I also considered using the open source Calamari OCR library, but my research found that Google's Vision API may be more effective at detecting text.
Since I had only 771 images, I was able to extract text on all of them and stay within Google's free plan (1,000 requests / month). I followed these installation instructions on my Ubuntu Linux computer. It worked well on most of the images. Here's the code I used to detect text in all my images and save it in a .csv file:
import io
import os

from google.cloud import vision
import pandas as pd

"""Setup Instructions
1) Save this as detect_image_text.py
2) Create a folder named 'photos' and put your photos in it.
3) In your terminal, run: python detect_image_text.py
"""


def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            "{}\nFor more info on error messages, check: "
            "https://cloud.google.com/apis/design/errors".format(
                response.error.message))
    texts = response.text_annotations
    # the first annotation contains the full detected text
    if texts:
        return texts[0].description


images = os.listdir("photos")
img_text = list()
for i, img in enumerate(images):
    try:
        img_path = f"photos/{img}"
        text = detect_text(img_path)
        img_text.append(text)
        print(f"{i}: {text}")
    except Exception:
        print(f"Failed: {img}")

quotes_df = pd.DataFrame(img_text, columns=["Text"])
csv = "Extracted Image Text.csv"
quotes_df.to_csv(csv, index=False)
3. Data cleaning via pandas and manual correction
The data did not come back perfect, but I was pleased with the Google Vision API's results. It saved me a lot of time compared to manually transcribing the images!
Next I used pandas to clean the data. You can see more in a Jupyter notebook with all of the code on github.
Then I manually removed the author or source names, keeping only the quote text.
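The pandas cleaning amounted to operations along these lines. A small illustrative sketch (the sample rows are made up; the "Text" column name matches the OCR script's output):

```python
import pandas as pd

# Simulate OCR output: stray whitespace, a failed detection (None), an empty string
df = pd.DataFrame({"Text": ["  Stay positive. ", None, "Keep going.", ""]})
df["Text"] = df["Text"].str.strip()   # trim stray whitespace
df = df.dropna(subset=["Text"])       # drop rows where OCR returned nothing
df = df[df["Text"] != ""]             # drop empty strings
print(df)
```

The real notebook on GitHub has the full cleaning steps; this just shows the general shape of them.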
4. Train on past quotes and generate new quotes
GPT-3, The State of the Art Option
Initially, I considered machine learning options for generating new text.
GPT-3, released in 2020 by OpenAI, is the current state-of-the-art model
for text generation. However, its API is accessible by invite only. If I get access,
I think I'll try using it with the GPT-Sandbox Python library.
I searched around for other text generation Python libraries on GitHub and found a promising
one named gpt-2-simple, which utilizes GPT-3's predecessor.
However, it requires an old version of TensorFlow, and I feel less inclined to learn older versions of
machine learning libraries. Currently, I'm waiting for GPT-3 access. I may try the GPT-2 route if I don't
get a chance at GPT-3.
A "Simple is Better Than Complex" Approach: Markov Chain
I wondered, are there any simpler options for text generation in python? Enter the Markov chain, which I stumbled across while Googling.
A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
Wikipedia - https://en.wikipedia.org/wiki/Markov_chain
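To demystify the definition above, here's a tiny hand-rolled word-level Markov chain. This is my own illustrative toy, not markovify's actual algorithm; the training text and seed are arbitrary:

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain


def generate(chain, start, length=8, seed=42):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


chain = build_chain("be kind be brave be kind be calm")
sentence = generate(chain, "be")
print(sentence)
```

Each step looks only at the previous word, which is exactly the "depends only on the state attained in the previous event" property from the definition.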
Using the markovify library
Google pointed me to this post from Analytics India Magazine
showing the "Markovify" library.
Markovify makes generating your own Markov chain very easy! Install with pip:
pip install markovify
Here's the code to create Markov chain on the quote text:
import markovify

# text = the cleaned quote text from the csv above

# Build a Markov chain model.
text_model = markovify.Text(text)

# Print five randomly generated sentences.
for i in range(5):
    print(text_model.make_sentence())
Markov chains are below the level of sophistication of machine learning technologies like GPT-3 or GPT-2.
But Markov chains demonstrate how we can apply mathematics to mimic results or at least achieve an MVP
with a simpler approach. Another intriguing tool worth mentioning is the nltk library,
which offers natural language capabilities.
Eventually I will try the more sophisticated way using machine learning, but at least I am enjoying a quick
taste of success with a Markov chain. Here's what some cherry-picked results look like!
Ok, they're not great, but not too shabby either for my first time generating text from examples:
Maybe in the future I will use this for posts on my Facebook page, but it's not quite ready yet!
I really enjoyed the process of researching this challenge and hope this post helps you evaluate
your own text generation possibilities. This was fun to learn about. And best of all, I achieved
satisfying, albeit primitive results within one weekend. Thanks for reading and stay positive.
Check out the Markov chain in the wild here.
Update
In April 2025, my positivipy app surpassed 800 page views according to PythonAnywhere's
"resources loaded" traffic counter. The app is still running nearly 10 years later and drawing visits from around the globe!
Oct 07, 2020
How do you download YouTube videos? The easiest answer is to google for a YouTube downloader site.
But then I thought, I wonder if I can make something?
Boredom, my curiosity and some googling turned up the pytube3 library, "A lightweight, dependency-free Python 3 library (and command-line utility) for downloading YouTube Videos." Lo and behold, 3 hours of experimentation later, I made a video downloader with Python. 😃
I used pytube3 with Flask and pythonanywhere to accomplish the task. I was pleasantly surprised at how it came together and simply worked! Here's how to make a primitive YouTube video downloader.
Install the pytube library in the pythonanywhere bash console with pip
pip3.8 install --user pytube3 --upgrade
If you're not using pythonanywhere, install Flask (it's already included in pythonanywhere)
python -m pip install flask
import logging
import sys

from pytube import YouTube
from flask import Flask, request, send_file

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
app = Flask(__name__)


@app.route("/")
def youtube_downloader():
    """Render HTML form to accept YouTube URL."""
    html_page = """<html><head>
        <title>YouTube Downloader</title></head>
        <body><h2>Enter URL to download YouTube Vids!</h2>
        <div class="form">
        <form action="/download_video" method="post">
        <input type="text" name="URL">
        <input type="submit" value="Submit">
        </form></div><br><br>
        </body></html>"""
    return html_page


@app.route("/download_video", methods=["GET", "POST"])
def download_video():
    """First pytube downloads the file locally in pythonanywhere:
    /home/your_username/video_name.mp4
    Then use Flask's send_file() to download the video
    to the user's Downloads folder.
    """
    try:
        youtube_url = request.form["URL"]
        download_path = YouTube(youtube_url).streams[0].download()
        fname = download_path.split("/")[-1]
        return send_file(fname, as_attachment=True)
    except Exception:
        logging.exception("Failed download")
        return "Video download failed!"
Minimum Viable Prototype Achieved
This is more of a proof of concept than a workable solution. It works for many of the videos I tried, but it occasionally had trouble with certain ones. I tested it successfully on videos of up to 10 minutes. Maybe it works more consistently when the file size is smaller, or there is a bug affecting certain types of videos? For me, some videos of only a few minutes failed, so your results may vary. The videos that failed returned errors like KeyError: 'cipher' and KeyError: 'url'.
Honorable Mentions
youtube-dl: Command-line program to download videos from YouTube.com and other video sites
YoutubeDownload: GUI and CLI for downloading YouTube video/audio
Oct 04, 2020
My goal was to automate posting positive photos and quotes to my Facebook page, "Positive Thoughts Daily", with the Unsplash and Facebook APIs. Here's how I did it!
This implementation relies on my own collection of photos on Unsplash. I will manually select which photos I like to get added to my collection. Then my app will take the new photos and post one a day for me.
Side note: the free version of the Unsplash API is capped at 50 requests per week, which was enough for me.
Setting Up The Facebook API
- Sign up for a Facebook developer account
- Create a new Facebook app
- Add "page_manage_posts" and "pages_read_user_content" permissions to your app in the Graph API Explorer
- Get a "short lived" user access token from the Graph API explorer (optional: fetch a "long lived" user access token, which lasts up to 60 days)
- Use your user access token to fetch your page's access token
Optional: fetch long lived access token with curl
curl -i -X GET "https://graph.facebook.com/oauth/access_token?grant_type=fb_exchange_token&client_id={app-id}&client_secret={app-secret}&fb_exchange_token={short-lived-user-access-token}"
Fetch your Facebook page's access token
curl -i -X GET "https://graph.facebook.com/{your-user-id}/accounts?access_token={user-access-token}"
Setting up Unsplash
- Sign up for an Unsplash developer account
- Install the python-unsplash library. In the terminal enter:
python -m pip install python-unsplash
- Decide what photo you want to post. This example fetches a random photo from my Unsplash collection. You can also fetch any photo at random, or pass in a query to get a certain type of photo.
from unsplash.api import Api
from unsplash.auth import Auth
import requests

"""Python-Unsplash library Github:
https://github.com/yakupadakli/python-unsplash
"""
client_id = "add_your_client_id"
client_secret = "add_your_client_secret"
redirect_uri = "add_your_redirect_uri"
code = ""
auth = Auth(client_id, client_secret, redirect_uri, code=code)
api = Api(auth)

# returns a python list containing a class
image = api.photo.random(collections=66610223)  # my collection id
image_id = image[0].id

# Use image_id to get the random photo's download link from a collection.
url = f"https://api.unsplash.com/photos/{image_id}/download?client_id={client_id}"
r = requests.get(url)
print(r.text)
image = r.json()
download_link = image['url']
Posting the Unsplash Image to Facebook
"""Use download link and post to page with Facebook API."""
page_id = "add_page_id_from_about_section"
url = f"https://graph.facebook.com/{page_id}/photos?access_token={page_access_token}&url={download_link}"
r = requests.post(url)
post_ids = r.json()
print(post_ids)
Post Project Reflections
This was my first time working with the Facebook API. Honestly, it's a little crazy trying to balance all the token types in your head. There are about 5 different types of tokens used for different things, so there is a bit of a learning curve! Ultimately I was able to figure out how to post a photo. It's a good challenge to build your API skills. The Unsplash API requires no OAuth tokens and is easier to pick up.
My Facebook page posts are now triggered by page loads on this website! I am using a MySQL database to track which images I post to make sure I don't duplicate any posts and to make sure I only post once every 24 hours. Ah, I love the smell of fresh automation in the morning. 😀
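The dedupe-and-throttle idea can be sketched like so. Note this uses sqlite3 as a stand-in for my MySQL database, and the table and column names here are my own invention for illustration:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (image_id TEXT PRIMARY KEY, posted_at TEXT)")


def should_post(conn, image_id):
    """Post only if this image is new and the last post was 24+ hours ago."""
    cur = conn.execute("SELECT 1 FROM posts WHERE image_id = ?", (image_id,))
    if cur.fetchone():
        return False  # already posted this image
    last = conn.execute("SELECT MAX(posted_at) FROM posts").fetchone()[0]
    if last and datetime.fromisoformat(last) > datetime.now() - timedelta(hours=24):
        return False  # posted within the last 24 hours
    return True


if should_post(conn, "abc123"):
    conn.execute("INSERT INTO posts VALUES (?, ?)",
                 ("abc123", datetime.now().isoformat()))
```

Each page load runs a check like this before hitting the Facebook API, which prevents duplicate posts and enforces the once-a-day cadence.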
Supplementary Links
Sep 20, 2020
introduction
SQLite is one of the heavy hitters in the database space, up there with other popular choices like MySQL, Postgres, Microsoft SQL Server, Cassandra and MariaDB. There is no shortage of database technologies, but SQLite is certainly one that is commonly used. It also has a positive reputation.
Its terminal interface reminds me of MySQL. The syntax of both seem similarly "SQL-like" and easy to pick up.
I finally got around to test-driving a SQLite database this week. In this post, I've listed my impressions of some practical SQLite commands. The "dot" syntax is helpful for doing a lot of things, as you'll see below. I'll conclude by briefly exploring the sqlite3 Python library in the Python interpreter.
getting started
I installed SQLite from the terminal with apt on Ubuntu Linux.
There are also downloads for Windows. A popular GUI is SQLite Studio.
create a new database
sqlite3 PythonMarketer.db
create a new db + new table and import a csv file to the table, "Readers"
sqlite3 PythonMarketer.db
.mode csv Readers
.import PythonMarketerReaders2015-2020.csv Readers
[source]
create a table
CREATE TABLE Readers (Country TEXT, Visits INTEGER);
add new column with a default value
ALTER TABLE Readers ADD COLUMN Notes TEXT DEFAULT '0'; -- "Notes" is an example column name
show all help (and . syntax) options
.help
show all tables
.tables
show table creation statement (table schema)
.schema Readers
exit sqlite terminal
.exit
show databases
.databases
show all indexes
.indexes
"show" various DB settings
.show
Pictured: "showing" DB settings and "EXPLAIN-ing" a query
Exploring sqlite operators: the GLOB operator
Exploring my new table with the Python sqlite3 library in the Python interpreter
sqlite3 is in the Python standard library; it's always a nice convenience to simply import it! Here we are connecting to an existing .db file with sqlite3.connect().
Below, getting a cursor object that holds our SELECT query results. Then iterating through each row of the cursor object with a for loop, as demonstrated in the documentation.
Comparable Cursors and PEP 249
The cursor object has a variety of methods you can call on it for database operations and to execute SQL. You can read more about them in the sqlite3 module documentation. This library also follows PEP 249 - Python Database API Specification for recommended Database API interfaces.
I've noticed that in pyodbc, for example, the cursor object looks and feels the same as the cursor object in sqlite3. This is because they are both likely following PEP 249. Very cool!
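Here's the PEP 249 pattern end to end with sqlite3, using an in-memory database and the Readers table from earlier (the sample rows are made up):

```python
import sqlite3

# connect, get a cursor, execute, iterate: the PEP 249 rhythm
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Readers (Country TEXT, Visits INTEGER)")
cur.executemany("INSERT INTO Readers VALUES (?, ?)",
                [("USA", 1200), ("India", 950)])

# iterating over an executed cursor yields one tuple per row
for country, visits in cur.execute("SELECT * FROM Readers ORDER BY Visits DESC"):
    print(country, visits)
```

Swap `sqlite3.connect()` for a pyodbc connection string and the cursor calls read almost identically, which is the whole point of the spec.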