Lo-Fi Python

Feb 09, 2022

How to Convert a Python Dictionary to and from a pandas DataFrame

This is an example of how to cast a Python dict into a dataframe and vice versa. I picked up the df to dict part from this Python and R tips post and the dict to df part from a Stack Overflow post. The below adaptation begins by converting an "NFL quarterbacks" Python dictionary into a dataframe and then back into a dict.

Sometimes a dictionary is adequate to solve a problem with handy methods like get() and items(). You can also do a ton with a dict comprehension. When more complex tabular data operations are needed, the pandas pd.DataFrame class is well equipped for the job. Dictionaries and dataframes are delightfully interoperable, like Tom Brady and any football team on the planet.
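As a small illustration of the dictionary tools mentioned above, here is a minimal sketch using items(), get() and a dict comprehension. The sample data and variable names are made up for this example.

```python
# Hypothetical sample data for illustration
qb_teams = {
    "Joe Burrow": "Cincinnati Bengals",
    "Tom Brady": "Tampa Bay Buccaneers",
}

# dict comprehension: swap keys and values to look up a QB by team
team_qbs = {team: name for name, team in qb_teams.items()}

# get() returns a default instead of raising KeyError for a missing key
print(team_qbs.get("Cincinnati Bengals", "Team not found."))
print(team_qbs.get("Dallas Cowboys", "Team not found."))
```

When the data stays this simple, a plain dict is usually enough and pandas is optional.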

import pprint
import pandas as pd

qbs_dict = {
    "Matthew Stafford":"Los Angeles Rams",
    "Joe Burrow":"Cincinnati Bengals",
    "Tom Brady":"Tampa Bay Buccaneers",
    "Pat Mahomes":"Kansas City Chiefs",
    "Tony Romo":"Dallas Cowboys"
}
qbs_df = pd.DataFrame(qbs_dict.items(), columns=["name", "team"])
print(qbs_df.info())
qbs_dict = pd.Series(qbs_df.team.values, index=qbs_df.name.values).to_dict()
pprint.pprint(qbs_dict, sort_dicts=True)
print(qbs_dict.get("Tom Brady", "Name not found."))

Terminal Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    5 non-null      object
 1   team    5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes
None

{'Joe Burrow': 'Cincinnati Bengals',
 'Matthew Stafford': 'Los Angeles Rams',
 'Pat Mahomes': 'Kansas City Chiefs',
 'Tom Brady': 'Tampa Bay Buccaneers',
 'Tony Romo': 'Dallas Cowboys'}

Tampa Bay Buccaneers

Did you notice that pprint sorts dicts by default?

Here the printed dict is reordered alphabetically on the QB's names. Per the pprint docs, you can alter this behavior if desired via a keyword argument new in Python version 3.8:

pprint.pprint(qbs_dict, sort_dicts=False)
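To compare the two behaviors side by side, here is a minimal sketch using pprint.pformat, which returns the string pprint would print. The small dict is sample data for this example, and sort_dicts needs Python 3.8 or newer.

```python
import pprint

teams = {"Tom Brady": "Tampa Bay Buccaneers", "Joe Burrow": "Cincinnati Bengals"}

# pformat returns the formatted string instead of printing it
sorted_view = pprint.pformat(teams, sort_dicts=True)
insertion_view = pprint.pformat(teams, sort_dicts=False)

print(sorted_view)      # keys in alphabetical order
print(insertion_view)   # keys in insertion order
```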

pandas Documentation

pandas installation documentation

pandas.DataFrame

pandas.Series

pandas.DataFrame.to_dict

pandas.DataFrame.info

Python Standard Library Documentation

pprint.pprint

dict

dict.get

Jan 30, 2022

Fix Spelling and Grammar with language_tool_python and textblob

Below are two practical Python libraries for text processing. The function below uses textblob's spelling correction along with language_tool_python, which applies grammatical corrections via the Language Tool API. I added these text processing transformations to my concept text generation app. textblob runs locally, while the Language Tool public API is free to use within its rate limits (the public API docs list around 20 requests per minute per IP, so check them for current limits). You can send text and receive back a corrected version, ideally improving your writing.

I found 2 errors when I piped the text of this post into the below code: the proper noun "textblob" corrected to "text blow's" and the word "app" corrected to "pp". Be sure to proof your results. Regardless, I like having these two Python tools in my bag!

Install

textblob

language_tool_python

help with pip

pip install language-tool-python
pip install -U textblob
python -m textblob.download_corpora
import language_tool_python
from textblob import TextBlob

def fix_spelling_and_grammar(text):
    """Returns str: text transformed by language tool and text blob
    1) Apply language tool API correction
    Language Tool Public API: https://dev.languagetool.org/public-http-api
    https://languagetool.org/http-api/swagger-ui/#!/default/post_check
    python library: https://pypi.org/project/language-tool-python/

    2) Apply textblob's spell check to the text"""
    try:
        # use the public API, language English
        tool = language_tool_python.LanguageToolPublicAPI('en-US')
        text = tool.correct(text)
        b = TextBlob(text)
        return str(b.correct())
    except Exception:
        # if either library fails, fall back to whatever text we have so far
        return text

text = "Language is incredble. Fascinatng how hoomans have so many."
transformed_text = fix_spelling_and_grammar(text)
print(transformed_text)
# Result: Language is incredible. Fascinating how humans have so many.
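Since automated corrections can go wrong, it helps to see exactly what changed before accepting the result. The following is a rough sketch using the standard library's difflib. The helper name changed_words is hypothetical and is not part of textblob or language_tool_python.

```python
import difflib

def changed_words(original, corrected):
    """Return (before, after) pairs for each word span that differs."""
    before, after = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes

original = "Language is incredble. Fascinatng how hoomans have so many."
corrected = "Language is incredible. Fascinating how humans have so many."
for old, new in changed_words(original, corrected):
    print(f"{old!r} -> {new!r}")
```

Reviewing the pairs makes it easier to catch a bad correction, like a proper noun being rewritten.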

Jan 23, 2022

How to Upgrade Requests in the Bash Console

This command upgrades your Python requests library with pip, Python's package manager. It is tailored for a PythonAnywhere environment. I suppose this command works on any Bash console, but if you're running your app with PythonAnywhere, you can find the Bash console here:

https://www.pythonanywhere.com/user/your_username/consoles/

Install requests with this command:

python3.8 -m pip install requests --upgrade --user

Substitute in whatever your Python version is. This command upgrades the requests library on a PythonAnywhere app. If any libraries depend on a specific version of requests, a warning appears like this one I saw for the python-unsplash library.

ERROR: python-unsplash 1.1.0 has requirement requests==2.20.0, but you'll have requests 2.27.1 which is incompatible.
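To confirm which version ended up installed after the upgrade, one option is the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch, and the helper name installed_version is made up for this example.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("requests"))
```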

Jan 10, 2022

Analyzing Messi vs. Ronaldo with the FIFA API + jq + curl

Who is the world's greatest footballer, Messi or Ronaldo? EA Sports surely has calculated the answer to this question in their player ratings. They rate peak Cristiano Ronaldo, Lionel Messi and Luka Modrić at 99 overall, with Neymar and Lewandowski at 98. Anecdotally, Messi has won 7 Ballon d'Or awards, the highest individual football honor one can achieve each year. Ronaldo has won 5. Modrić has won 1. Lewandowski was runner up this year, but has never won the honor. Neymar has never won a Ballon d'Or.

In FIFA, a player's video game representation is modeled intricately in a series of traits and specialties characterizing each player. The "Ultimate Team" EA Sports API is viewable as a plain json page or more cheekily with one line of curl and jq, a "command line json processor":

curl 'https://www.easports.com/fifa/ultimate-team/api/fut/item' | jq '.'

Enter this in a shell or command line. The result is beautiful, readable, pretty printed json!
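For comparison, a rough Python equivalent of what jq '.' does is json.dumps with an indent. The payload below is a small made-up stand-in for an API response, not the real structure of the EA endpoint.

```python
import json

# Made-up payload standing in for an API response body
raw = '{"items": [{"name": "Messi", "rating": 99}, {"name": "Ronaldo", "rating": 99}]}'

data = json.loads(raw)
# indent=2 pretty prints; ensure_ascii=False keeps characters like "ć" readable
pretty = json.dumps(data, indent=2, ensure_ascii=False)
print(pretty)
```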

Messi (Top) Vs. Ronaldo (Bottom) FIFA Player Ratings

These ratings represent the players at their peak of their careers. Messi is a better dribbler, while Ronaldo has more power and strength. Messi has the edge in free kicks, curve in his shot and "longshots" 99 to 98 over Cristiano. They are tied at "finishing", each with 99. Ronaldo has the "Power Free-Kick" trait, whereas Messi has "Chip Shot", "Finesse Shot" and "Playmaker" traits giving him an edge.

EA's ratings suggest that both are prominent goal scorers, with a slight edge to Messi in finesse and shooting from distance. However, there's something to be said for kicking the ball really damn hard. Ronaldo has superior raw shot power and a lethal combo of more powerful jump and stronger headers. All this combined with an "Aerial Threat" specialty enables Ronaldo to vault above and around defenders to smash in golazos off the volley. Ronaldo sizes up to 6' 2" (187 cm) vs. Messi's 5' 7" (170 cm) frame. The Portuguese man definitely has an advantage in getting higher in the air. But the Argentinian is quite darty.

Messi has incredible accuracy from distance. He's also a better passer all around and has perfect "vision", great qualities for winning football games. Only in crossing does he have a lower passing rating. Ronaldo is also 10 points better at "penalties" or penalty kicks. The closer he gets to the goal, the more dangerous he is. Messi is more dangerous with the ball while dribbling, passing or shooting except when taking a PK.

Advantages can be gained in many different aspects of soccer. EA has developed a fun dataset to model these all time greats across several football skill dimensions. In 2022's version of the game, Messi is rated a 93, with Cristiano 91. Clearly these two are worthy of top honors. Don't forget Robert Lewandowski, with a 92 rating, who consistently lights up the Champions League and Bundesliga.

jq ftw

I had never used jq before this. Really enjoyed the quick, stylish and practical view of some json. This cool terminal display and syntax highlighting was on my Chromebook shell. It's neat how easily you can pretty print json with jq. I rate it a 99 for json pretty processing and pretty printing on the FIFA scale. Read more in the jq documentation!

Jan 08, 2022

Experiencing Flow While Coding

Yesterday, I experienced a flow state where I became manically obsessed with perfecting a script I was working on. I think it's beautiful code, about 100 lines long without docstrings. It solves a real need and it felt great to write it. Some scripts feel terrible to write and you know they're bad. However, this one felt like one of the best I've ever written.

Flow seems like a mythical, unattainable state these days as portrayed in media, but we can all agree... we love it. When you're in flow, you know it and you feel a grace in improving your work. For coders, maybe it's by wrapping up a few lines here and there into functions. Refactoring, reordering, handling loose ends or edge cases, writing docstrings with supporting documentation and clarifying that you really understand what's happening... these things are all mundane at times but critical to writing reliable code.

While doing these typical tasks, you're attaining skill and mastery, one of the highest dopamine hits humans can register legally in all 50 states. You know how much better this iteration of code is than when you first learned to write software. You take bits and pieces from past projects and fit them all together into a cohesive, purposeful program. For example, I was tickled to use Python's readlines() file reading function to get the last line of a text file. I learned about this function in my first ever free Python course on Coursera, 7 years ago. Thanks again Dr. Chuck!
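For anyone curious, grabbing the last line of a text file with readlines() can look roughly like this. It is a sketch with a throwaway file, and the helper name last_line is made up here.

```python
import tempfile
from pathlib import Path

def last_line(path):
    """Return the final line of a text file without its newline, or None if empty."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return lines[-1].rstrip("\n") if lines else None

# Demo with a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "example.txt"
    demo_path.write_text("first\nsecond\nlast\n", encoding="utf-8")
    result = last_line(demo_path)
print(result)
```

readlines() loads the whole file into memory, which is fine for small files.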

This time, I realized my flow when researching ISO 8601 time format strings and guiding them into an HTTP request with the requests library. A new solution emerged, regurgitated from a prior project and mashed up into a more refined form to satisfy the project's requirements. I combined old and new ideas into a better solution than I had ever thought, a fitting complement for the API at hand. Time will tell if the solution will actually work as well as I hope.
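As a rough idea of what building ISO 8601 strings for a request can look like, here is a sketch with the standard library's datetime. The parameter names start_time and end_time are hypothetical, since every API defines its own.

```python
from datetime import datetime, timedelta, timezone

def iso_window(end, hours=24):
    """Return (start, end) ISO 8601 strings for the window ending at `end`."""
    start = end - timedelta(hours=hours)
    return start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")

end = datetime(2022, 1, 7, 12, 0, tzinfo=timezone.utc)
start_str, end_str = iso_window(end)

# Hypothetical query parameters; a real API defines its own names
params = {"start_time": start_str, "end_time": end_str}
print(params)
```

A dict like this could then be passed to requests.get via its params argument.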

Flow is real. You can find work that puts you in a flow state, and it doesn't have to be super interesting work to get there. The learning process pays rewards in competency when exposure to different domains combine. Einstein knew a form of this as combinatory play. Repetition enhances this effect and solidifies your foundation. Flow makes it fun! Only rarely do I feel the highest level of engrossment in my work. I sensed I was flowing on this recent project. You can find these types of challenges too. Keep searching for your flow!

Jan 07, 2022

Create a Column of Values in Pandas with df.assign()

Pandas is amazing, what else is there to say? Learning the nuances of its API has paid off tons of times when it helped me get stuff done.

I recently picked up the pandas dataframe's "assign" function for creating a new column of values. This is an elegant way to set a column of values in tabular data with the pandas library. Below you'll see two ways to set a column of values in pandas. In the first way, I am chaining two assign functions together to create 2 new columns, "sound" and "type". I prefer using assign because it looks better and it does not result in any warnings from pandas. Highly recommend getting familiar with pandas functions like assign and API nuances like Series accessors to up your tabular data game.

import pandas as pd
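cats = ["Garfield","Meowth","Tom"]
# pass the list itself so each cat becomes one row in the "cats" column
df = pd.DataFrame(cats, columns=["cats"])
# best way
df = df.assign(sound="Meow").assign(type="Cartoon")
print(df.head())

# alternative way that also works, but can trigger a SettingWithCopyWarning
# when df is a slice of another dataframe
# df["sound"] = "Meow"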

DataFrame.assign : Can evaluate an expression or function to create new values for a column. pandas source code: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/frame.py#L4421-L4487
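Since assign can also take a function, here is a minimal sketch of passing callables. Each lambda receives the dataframe and returns the values for the new column. The column names name_length and shout are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"cats": ["Garfield", "Meowth", "Tom"]})

# A callable receives the DataFrame and returns the new column's values
df = df.assign(
    name_length=lambda d: d["cats"].str.len(),
    shout=lambda d: d["cats"].str.upper(),
)
print(df)
```

Callables are handy in method chains, where the intermediate dataframe has no variable name to refer to.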

Jan 06, 2022

ftfy, The Wonky Text Fixing Python Library

Every Python programmer has undoubtedly come across some crazy characters. The ftfy library "Fixes Text For You" and acts like a Swiss Army knife when you've got questionable characters breaking your script. In my case, an HTTP request was failing because weird cryptic letters were hiding in the data where there was only supposed to be an apostrophe. This library fixed my text and made it appear flawless. I really like ftfy because it solves a common problem, fixing "mojibake" or mangled characters. It's a good tool to have when you see these types of issues!

Install with pip:

pip install ftfy
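To show where those cryptic letters come from, here is a standard library sketch of one common kind of mojibake: UTF-8 bytes decoded as Windows-1252. This is only a manual illustration of a single case, not how ftfy works internally. ftfy detects and repairs many such cases for you.

```python
# A right single quotation mark, often used as a curly apostrophe
original = "it\u2019s"

# Mojibake: UTF-8 bytes wrongly decoded as Windows-1252
mangled = original.encode("utf-8").decode("cp1252")
print(mangled)

# Undo that one specific mistake by reversing the round trip
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired)
```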

See also: Python Unicode How To

Jan 05, 2022

How to Track Python Script Completion

Did your script run to completion? Sure, you might log some tracebacks along the way or terminate the program early with sys.exit(). But did your script actually run all the way to the end? I like having a visual confirmation, logged when a script finishes as intended, so I know whether it finished or not. Task runners typically have run status tracking as well. I have yet to use the trace module, but it seems worth checking out, and visualization tools like heartrate are worth mentioning too, depending on how you run your scripts. Use logging and trace to up the reliability of your scripts.

An easy way to track this is with the logging module:

import logging

def improvise():
    """Improv Tutorial: https://www.youtube.com/watch?v=C6wY9OwqJ2A"""
    try:
        print("Boom! Detective Michael Scarn, I'm with the FBI!")
        return None
    except:
        logging.exception("Error occurred during improv.")
        return None

# basicConfig defaults to WARNING, so set INFO to record the completion message
FORMAT = '%(asctime)s %(levelname)-8s %(message)s'
logging.basicConfig(filename="improvise.log", format=FORMAT, level=logging.INFO)
improvise()
logging.info("Improvisation finished!")
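One way to take the same idea a step further is to wrap the script's main task so that both outcomes get logged along with the run time. This is a sketch under my own assumptions. The function name run_and_report and the logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("completion_demo")  # hypothetical logger name

def run_and_report(task):
    """Run task(); log whether it finished and how long it took.

    Returns True if task completed without raising, else False.
    """
    start = time.perf_counter()
    try:
        task()
    except Exception:
        # logger.exception records the traceback along with the message
        logger.exception("Script failed after %.2fs", time.perf_counter() - start)
        return False
    logger.info("Script finished in %.2fs", time.perf_counter() - start)
    return True

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    run_and_report(lambda: print("working..."))
```

The return value makes it easy to set an exit code or alert on failure.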

Dec 31, 2021

Phone Number Cleaning Regex + pandas Series Example

This is a solution I worked out recently to strip phone numbers into a uniform format. To install pandas with pip, enter in command prompt:

python -m pip install pandas

The pandas library has regex built in and it's pretty neat! Behold the power of pandas and a regular expression to do trivial telephone tidying:

strip phone formatting with Python
import pandas as pd
s = pd.Series(data=["(010) 001-1010"], name="Phone", dtype="str")
# remove parentheses, hyphens and spaces with pandas + regex
s = s.str.replace(pat=r"\(|\)|-| ", repl="", regex=True)
print(s)
# resulting number: "0100011010"

Regex is cool.

Grasping the intricacies of what this code is doing feels elegant when you connect the dots... or pipes. The replace is done via a pandas str accessor. In the pat string, the parentheses are escaped with backslashes and the alternatives are separated by pipes "|". The pipes act as an or operator, succinctly chaining multiple characters together for matching and in this case replacing them with nothing. Pretty nifty. If you read the pandas docs, you'll find regex is accessible in different parts of the API. Dive in, it's some of my favorite documentation to snoop. There is so much you can do with pandas. This example demonstrates how its flexible functions get the job done efficiently.
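The same pattern can be tried outside pandas with the standard library's re module. This sketch compares the pipe-separated pattern with two alternatives I'm adding for illustration: a character class and the \D shorthand for any non-digit.

```python
import re

phone = "(010) 001-1010"

# Alternation, as in the pandas example above
with_pipes = re.sub(r"\(|\)|-| ", "", phone)

# Equivalent character class: any one of ( ) - or space
with_class = re.sub(r"[()\- ]", "", phone)

# Broader option: drop every non-digit character
digits_only = re.sub(r"\D", "", phone)

print(with_pipes, with_class, digits_only)
```

The \D version also strips characters like dots or plus signs, which may or may not be what you want.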

Further Reading:

pandas.Series documentation

pandas str.replace documentation

Source of the famous “Now you have two problems” quote

Dec 19, 2021

Memory Monitoring Python Libraries + Tools

If you write Python code, there's probably been a time or two when you saw the dreaded "MemoryError". Python raises it when a script needs more memory than your computer can spare, and the script stops. I recently experienced this frustration whilst trying to write hundreds of thousands of csv files. However, this time I reached for tools that support smarter memory management. Now, I can watch my computer's memory bounce around with the Windows Resource Monitor. Python has quite a few memory profiling libraries for monitoring memory too!

Python Libraries and Guides

Memory Management Overview, Python documentation

Memory Profiler: "monitor memory usage of Python code"

psutil: "Cross-platform lib for process and system monitoring in Python"

py-spy: "Sampling profiler for Python programs"

pyinstrument: "🚴 Call stack profiler for Python. Shows you why your code is slow!"

Scalene: "a high-performance, high-precision CPU, GPU, and memory profiler for Python"

Glances: "Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems."

Yappi: "Yet Another Python Profiler, but this time thread & coroutine & greenlet aware."

Fil: "A Python memory profiler for data processing and scientific computing applications" (Video)

line_profiler: "Line-by-line profiling for Python"

pprofile: "Line-granularity, thread-aware deterministic and statistic pure-python profiler"

Guppy 3: "Python programming environment and heap analysis toolset"
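Besides the third-party libraries listed above, the standard library includes tracemalloc, which can be a quick first look without installing anything. A minimal sketch follows. The list being allocated is just filler to make the numbers move.

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable so the numbers move
data = [str(n) * 10 for n in range(100_000)]

# Memory currently traced and the peak since start(), in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")

# Show the top allocation sites by line
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```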

See also: The Python Profilers, Python documentation

CPython standard distribution comes with three deterministic profilers. cProfile, Profile and hotshot. cProfile is implemented as a C module based on lsprof, Profile is in pure Python and hotshot can be seen as a small subset of a cProfile.

Yappi Github, https://github.com/sumerc/yappi
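Note that hotshot was a Python 2 module. Python 3 ships cProfile and profile. Here is a short sketch of profiling a function with cProfile and reading the report with pstats. The slow_sum workload is made up for this example.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately simple workload to profile."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
total = slow_sum(200_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

cProfile measures time rather than memory, so it complements the memory tools above.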

Windows Tools

Task Manager: Windows process management tool with some memory analytics

Collect Data in Windows with Performance Monitor

Resource Monitor: Windows tool with Memory, CPU, Disk and Network monitoring tabs

Resource Monitor can stop processes and shows in use, standby (cached) and free memory. This shows 7 Python scripts running with 49% of total memory consumed. Looks like we are running steady and safely below "MemoryError" overflow. We might be able to add a few more scripts with 51% of RAM available!


Memory Tips and Guides

  • Use only the data you need. Any data you read in and aren't using is held in memory. The usecols argument in pandas is a great way to read a csv and only use the columns you need.
  • Reading data in chunks with the chunksize argument is another way to reduce memory usage for large datasets.
  • Measuring the memory usage of a Pandas dataframe
  • Some tools are line oriented, others are function oriented. If your code contains large functions, you might favor a line based profiling tool.
  • Be aware of the overhead some memory tools may incur. memory_profiler was clocked with a whopping 270x slowdown per the Scalene PyCon talk below. The talk shows an awesome comparison of these Python profiling libraries:
Scalene Pycon US 2021 Talk
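The usecols and chunksize tips above can be sketched as follows. The tiny in-memory CSV and its column names are made up, standing in for a large file on disk.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a large file on disk
csv_text = "id,city,value\n1,Austin,10\n2,Boston,20\n3,Chicago,30\n"

# usecols: load only the columns you need
slim = pd.read_csv(io.StringIO(csv_text), usecols=["id", "value"])
print(slim.memory_usage(deep=True).sum(), "bytes")

# chunksize: iterate over the file a few rows at a time
total_value = 0
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=["value"], chunksize=2):
    total_value += int(chunk["value"].sum())
print(total_value)
```

With chunks, only a few rows are held in memory at once, so the running total works even when the whole file would not fit.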

Recommended Reading

Conclusion

When you'll see "MemoryError" depends on your computer's hardware, the size of your dataset and what operations you need to script out. Generally speaking, I/O or file reads and writes are more expensive operations.

The tools in this post will help you anticipate how much computing power you have available, monitor your memory consumption more closely and avoid pushing your computer past its limits. You can do things like reading data in chunks and only using the columns you need to reduce your memory consumption. Knowing these tools and strategies can make getting things done with Python a smoother ride.
