Feb 09, 2022
This is an example of how to cast a Python dict into a dataframe and vice versa. I picked up the df to dict part from this Python and R tips post and the dict to df part from a Stack Overflow post. The below adaptation begins by converting an "NFL quarterbacks" Python dictionary into a dataframe and then back into a dict.
Sometimes a dictionary is adequate to solve a problem with handy methods like get() and items(). You can also do a ton with a dict comprehension. When more complex tabular data operations are needed, the pandas pd.DataFrame class is well equipped for the job. Dictionaries and dataframes are delightfully interoperable, like Tom Brady and any football team on the planet.
import pprint
import pandas as pd

qbs_dict = {
    "Matthew Stafford": "Los Angeles Rams",
    "Joe Burrow": "Cincinnati Bengals",
    "Tom Brady": "Tampa Bay Buccaneers",
    "Pat Mahomes": "Kansas City Chiefs",
    "Tony Romo": "Dallas Cowboys",
}

# dict -> DataFrame: each (key, value) pair becomes one row
qbs_df = pd.DataFrame(list(qbs_dict.items()), columns=["name", "team"])
# info() prints its summary itself and returns None, so don't wrap it in print()
qbs_df.info()

# DataFrame -> dict: a Series of teams indexed by name, converted with to_dict()
qbs_dict = pd.Series(qbs_df["team"].values, index=qbs_df["name"].values).to_dict()
pprint.pprint(qbs_dict, sort_dicts=True)
print(qbs_dict.get("Tom Brady", "Name not found."))

Terminal Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    5 non-null      object
 1   team    5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes
{'Joe Burrow': 'Cincinnati Bengals',
'Matthew Stafford': 'Los Angeles Rams',
'Pat Mahomes': 'Kansas City Chiefs',
'Tom Brady': 'Tampa Bay Buccaneers',
'Tony Romo': 'Dallas Cowboys'}
Tampa Bay Buccaneers
Did you notice that pprint sorts dicts by default?
Here the printed dict is reordered alphabetically by the QBs' names. Per the pprint docs, you can turn this off with the sort_dicts keyword argument, new in Python 3.8:
pprint.pprint(qbs_dict, sort_dicts=False)
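pd.Series(...).to_dict() is one route from a dataframe back to a dict. A few other ways to get the same name-to-team mapping are sketched below on a small hypothetical two-row frame: set_index plus Series.to_dict(), a plain dict(zip(...)), and DataFrame.to_dict(orient="records") for when you want one dict per row instead of a single mapping.

```python
import pandas as pd

# Small hypothetical frame with the same "name"/"team" columns as above
qbs_df = pd.DataFrame(
    [("Joe Burrow", "Cincinnati Bengals"), ("Tom Brady", "Tampa Bay Buccaneers")],
    columns=["name", "team"],
)

# 1) Make "name" the index, select "team", and convert that Series
via_index = qbs_df.set_index("name")["team"].to_dict()

# 2) Plain Python: zip the two columns into key/value pairs
via_zip = dict(zip(qbs_df["name"], qbs_df["team"]))

# 3) DataFrame.to_dict(orient="records") yields one dict per row instead
records = qbs_df.to_dict(orient="records")

print(via_index == via_zip)  # True
print(records[0])  # {'name': 'Joe Burrow', 'team': 'Cincinnati Bengals'}
```

The first two produce the same key/value dict; pick whichever reads better to you.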
pandas Documentation
pandas installation documentation
pandas.DataFrame
pandas.Series
pandas.DataFrame.to_dict
pandas.DataFrame.info
Python Standard Library Documentation
pprint.pprint
dict
dict.get
Jan 30, 2022
Below are two practical Python libraries for text processing. This function uses textblob's spelling correction along with language_tool_python, which applies grammatical corrections via the LanguageTool API. I added these text processing transformations to my concept text generation app. TextBlob's correction runs locally, and the LanguageTool public API is free within its rate limits (around 20 requests per minute per IP at the time of writing). You send in text and get back a corrected version, ideally an improvement on your writing.
I found 2 errors when I piped the text of this post into the code below: the proper noun "textblob" was changed to "text blow's" and the word "app" to "pp". Be sure to proof your results. Regardless, I like having these two Python tools in my bag!
Install
textblob
language_tool_python
help with pip
pip install language-tool-python
pip install -U textblob
python -m textblob.download_corpora
import language_tool_python
from textblob import TextBlob


def fix_spelling_and_grammar(text):
    """Returns str: text transformed by LanguageTool and TextBlob.

    1) Apply LanguageTool API correction
       LanguageTool Public API: https://dev.languagetool.org/public-http-api
       https://languagetool.org/http-api/swagger-ui/#!/default/post_check
       Python library: https://pypi.org/project/language-tool-python/
    2) Apply TextBlob's spell check to the text
    """
    try:
        # Use the public LanguageTool API with US English rules
        tool = language_tool_python.LanguageToolPublicAPI("en-US")
        text = tool.correct(text)
        # Then run TextBlob's local spelling correction on the result
        return str(TextBlob(text).correct())
    except Exception:
        # On any failure (network error, rate limit, etc.), return the text as it stands
        return text


text = "Language is incredble. Fascinatng how hoomans have so many."
transformed_text = fix_spelling_and_grammar(text)
print(transformed_text)
# Result: Language is incredible. Fascinating how humans have so many.
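Because the corrections need proofing, a small helper built on the standard library's difflib can list which words changed between your input and the corrected text. This is a hypothetical helper (changed_words is an illustrative name), not part of textblob or language_tool_python. It compares whitespace-separated words, so punctuation stays attached to its word.

```python
import difflib
from typing import List, Tuple


def changed_words(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for each run of words that differs."""
    before, after = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


original = "Language is incredble. Fascinatng how hoomans have so many."
corrected = "Language is incredible. Fascinating how humans have so many."
for old, new in changed_words(original, corrected):
    print(f"{old!r} -> {new!r}")
```

Skimming this list is a quick way to catch a proper noun or an "app" that got "corrected" into something else.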
Jan 23, 2022
This command upgrades your Python requests library with pip, Python's package manager. I tailored it for a PythonAnywhere environment, though it should work in any Bash console. If you're running your app on PythonAnywhere, you can find the Bash console here:
- https://www.pythonanywhere.com/user/your_username/consoles/
Upgrade requests with this command:
python3.8 -m pip install requests --upgrade --user
Substitute your own Python version for 3.8. This command upgrades the requests library for your PythonAnywhere user. If any installed libraries depend on a specific version of requests, pip reports a conflict like this one I saw for the python-unsplash library:
ERROR: python-unsplash 1.1.0 has requirement requests==2.20.0, but you'll have requests 2.27.1 which is incompatible.
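To confirm which version of requests a given interpreter actually sees after the upgrade, here is a minimal sketch using importlib.metadata (in the standard library since Python 3.8). installed_version is an illustrative helper name, not a pip feature. Run it with the same interpreter you upgraded, e.g. python3.8.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("requests:", installed_version("requests"))
```

If this prints a different version than pip reported, you are most likely running a different Python than the one you upgraded.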
Jan 10, 2022
Who is the world's greatest footballer, Messi or Ronaldo? EA Sports has surely calculated the answer to this question in its player ratings. It rates peak Cristiano Ronaldo, Lionel Messi and Luka Modrić at 99 overall, with Neymar and Lewandowski at 98. For comparison, Messi has won 7 Ballon d'Or awards, the highest individual football honor one can achieve each year. Ronaldo has won 5 Ballon d'Or awards and Modrić has won 1. Lewandowski was runner-up this year but has never won the honor. Neymar has never won a Ballon d'Or.
In FIFA, a player's video game representation is modeled intricately in a series of traits and specialties characterizing each player. The "Ultimate Team" EA Sports API is viewable as a plain json page or more cheekily with one line of curl and jq, a "command line json processor":
curl 'https://www.easports.com/fifa/ultimate-team/api/fut/item' | jq '.'
Enter this in a shell or command line. The result is beautiful, readable, pretty printed json!
Messi (Top) Vs. Ronaldo (Bottom) FIFA Player Ratings
These ratings represent the players at the peak of their careers. Messi is a better dribbler, while Ronaldo has more power and strength. Messi has the edge in free kicks, shot curve and "longshots", 99 to 98 over Cristiano. They are tied at "finishing", each with 99. Ronaldo has the "Power Free-Kick" trait, whereas Messi has the "Chip Shot", "Finesse Shot" and "Playmaker" traits, giving him an edge.
EA's ratings suggest that both are prominent goal scorers, with a slight edge to Messi in finesse and shooting from distance. However, there's something to be said for kicking the ball really damn hard. Ronaldo has superior raw shot power and a lethal combo of a more powerful jump and stronger headers. All this combined with an "Aerial Threat" specialty enables Ronaldo to vault above and around defenders to smash in golazos off the volley. Ronaldo sizes up to 6' 2" (187 cm) vs. Messi's 5' 7" (170 cm) frame. This Portuguese man definitely has an advantage in getting higher in the air. But the Argentinian is quite darty.
Messi has incredible accuracy from distance. He's also a better passer all around and has perfect "vision", great qualities for winning football games. Only in crossing does he have a lower passing rating. Ronaldo is also 10 points better at "penalties" or penalty kicks. The closer he gets to the goal, the more dangerous he is. Messi is more dangerous with the ball while dribbling, passing or shooting except when taking a PK.
Advantages can be gained in many different aspects of soccer. EA has developed a fun dataset to model these all time greats across several football skill dimensions. In 2022's version of the game, Messi is rated a 93, with Cristiano 91. Clearly these two are worthy of top honors. Don't forget Robert Lewandowski, with a 92 rating, who consistently lights up the Champions League and Bundesliga.
jq ftw
I had never used jq before this and really enjoyed its quick, stylish and practical view of some json. This cool terminal display and syntax highlighting was on my Chromebook shell. It's neat how easily you can pretty print json with jq. I rate it a 99 for json processing and pretty printing on the FIFA scale. Read more in the jq documentation!
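If jq isn't installed, Python's standard library can pretty print json too. From the shell, curl 'https://www.easports.com/fifa/ultimate-team/api/fut/item' | python -m json.tool works, and in code json.dumps with indent does the same job. A minimal sketch follows; the sample payload is made up and does not reflect the real shape of the EA Sports response.

```python
import json


def pretty_print_json(raw: str, indent: int = 2) -> str:
    """Parse a json string and re-serialize it with indentation."""
    # ensure_ascii=False keeps characters like the "ć" in Modrić readable
    return json.dumps(json.loads(raw), indent=indent, ensure_ascii=False)


# Hypothetical sample payload, not the real API response
raw = '{"name": "Lionel Messi", "rating": 93, "traits": ["Chip Shot", "Finesse Shot"]}'
print(pretty_print_json(raw))
```

jq still wins for filtering and colorized output; this is just the no-install fallback.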
Jan 08, 2022
Yesterday, I experienced a flow state where I became manically obsessed
with perfecting a script I was working on. I think it's beautiful code,
about 100 lines long without docstrings. It solves a real need
and it felt great to write it. Some scripts feel terrible to write and
you know they're bad. However, this one felt like one of the best I've
ever written.
Flow seems like a
mythical, unattainable state these days as portrayed in media, but we
can all agree... we love it. When you're in flow, you know it and you
feel a grace in improving your work. For coders, maybe it's by wrapping
up a few lines here and there into functions. Refactoring, reordering,
handling loose ends or edge cases, writing docstrings with supporting
documentation and clarifying that you really understand what's
happening... these things are all mundane at times but critical to
writing reliable code.
While doing these typical tasks, you're attaining skill and mastery, one
of the highest dopamine hits humans can register legally in all 50
states. You know how much better this iteration of code is than when you
first learned to write software. You take bits and pieces from past
projects and fit them all together into a cohesive, purposeful program.
For example, I was tickled to use Python's readlines() file reading
function
to get the last line of a text file. I learned about this function in my
first ever free Python course on Coursera, 7 years ago. Thanks again
Dr. Chuck!
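readlines() reads every line into a list, so the last element is the last line. A minimal sketch of that approach, plus a hypothetical streaming variant using collections.deque that keeps only one line in memory at a time (both function names are illustrative), which matters for very large files.

```python
import os
import tempfile
from collections import deque


def last_line_readlines(path):
    """readlines() approach: simple, but loads the whole file into memory."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return lines[-1].rstrip("\n") if lines else None


def last_line_streaming(path):
    """Stream the file, keeping only the most recent line in a bounded deque."""
    with open(path, encoding="utf-8") as f:
        tail = deque(f, maxlen=1)
    return tail[0].rstrip("\n") if tail else None


# Usage on a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("first\nsecond\nlast\n")
    path = tmp.name
readlines_result = last_line_readlines(path)
streaming_result = last_line_streaming(path)
os.remove(path)
print(readlines_result, streaming_result)  # last last
```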
This time, I noticed I was in flow while researching ISO 8601 time format
strings and passing them into an HTTP request with the requests library.
A new solution emerged, regurgitated from a prior project and mashed up
into a more refined form to satisfy the project's requirements. I
combined old and new ideas into a better solution than I had thought
possible, a fitting complement to the API at hand. Time will tell if the
solution will actually work as well as I hope.
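For readers curious about the ISO 8601 piece, here is a minimal sketch, not the script described above: datetime.isoformat() produces ISO 8601 strings, and passing them as a params dict lets the HTTP library percent-encode them. The "start"/"end" parameter names and the endpoint in the comment are hypothetical; urllib.parse.urlencode stands in to show the encoding without a network call.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso8601_window(hours, now=None):
    """Return hypothetical "start"/"end" query params covering the last `hours` hours."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    # isoformat() emits ISO 8601, e.g. 2022-01-08T12:00:00+00:00
    return {
        "start": start.isoformat(timespec="seconds"),
        "end": now.isoformat(timespec="seconds"),
    }


params = iso8601_window(24, now=datetime(2022, 1, 8, 12, 0, tzinfo=timezone.utc))
print(params)
# With requests you would pass the dict and let it do the encoding, e.g.
# requests.get("https://api.example.com/events", params=params)  # hypothetical URL
print(urlencode(params))  # "+" becomes %2B and ":" becomes %3A
```

Letting the library encode the dict matters because a raw "+" hand-typed into a query string is commonly decoded as a space.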
Flow is real. You can find work that puts you in a flow state, and it
doesn't have to be super interesting work to get there. The learning
process pays rewards in competency when your exposure to different
domains comes together. Einstein knew a form of this as combinatory play.
Repetition enhances this effect and solidifies your foundation. Flow
makes it fun! Only rarely do I feel the highest level of engrossment in
my work. I sensed I was flowing on this recent project. You can find
these types of challenges too. Keep searching for your flow!
Jan 07, 2022
Pandas is amazing, what else is there to say? Learning the nuances of its API has paid off countless times when I needed to get stuff done.
I recently picked up the pandas DataFrame "assign" method for creating a new column of values. This is an elegant way to set a column of values in tabular data with the pandas library. Below you'll see two ways to set a column of values in pandas. In the first way, I am chaining two assign calls together to create 2 new columns, "sound" and "type". I prefer using assign because it reads better, and because it returns a new dataframe instead of modifying one in place, it sidesteps the SettingWithCopyWarning that pandas can raise when you set a column on a slice of another dataframe. I highly recommend getting familiar with pandas functions like assign and API nuances like Series accessors to up your tabular data game.
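import pandas as pd

cats = ["Garfield", "Meowth", "Tom"]
# one column named "cats" with one row per cat
df = pd.DataFrame({"cats": cats})
# best way: each assign() returns a new DataFrame, so calls can be chained
df = df.assign(sound="Meow").assign(type="Cartoon")
print(df.head())
# alternative way that also works, but can trigger a SettingWithCopyWarning
# when df is a slice of another DataFrame
# df["sound"] = "Meow"
# df["type"] = "Cartoon"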
DataFrame.assign : Can evaluate an expression or function to create new values for a column.
pandas source code: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/frame.py#L4421-L4487
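DataFrame.assign also accepts callables, which is what "evaluate an expression or function" refers to: each callable is passed the DataFrame being built, and its return value becomes the new column. A small sketch; the name_length and shout columns are made up for illustration. Keyword arguments are applied in order, so a later callable can reference a column created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({"cats": ["Garfield", "Meowth", "Tom"]})

df2 = df.assign(
    sound="Meow",                                   # scalar: broadcast to every row
    name_length=lambda d: d["cats"].str.len(),      # callable: computed from d
    shout=lambda d: d["sound"].str.upper() + "!",   # uses "sound" from the line above
)
print(df2)
print(df.columns.tolist())  # ['cats']
```

Note that df itself still has only the "cats" column: assign never modifies its caller, which is what makes chaining safe.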
Jan 06, 2022
Every Python programmer has undoubtedly come across some crazy characters. The ftfy library "Fixes Text For You" and acts like a Swiss Army knife when you've got questionable characters breaking your script. In my case, an HTTP request was failing because of weird cryptic letters hiding in the data when it was only supposed to be an apostrophe. This library fixed my text and made it appear flawless. I really like ftfy because it solves a common problem, fixing "mojibake" or mangled characters. It's a good tool to have when you see these types of issues!
Install with pip:
pip install ftfy
See also: Python Unicode How To
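ftfy.fix_text() is the library's main entry point. To see the kind of damage it repairs, here is a hand-rolled sketch of one common case using only the standard library: UTF-8 bytes mistakenly decoded as Windows-1252, which turns a curly apostrophe into "â€™". demojibake_cp1252 is a hypothetical helper, not part of ftfy, and it handles only this single pattern; ftfy detects many more and is the safer choice on real data.

```python
def demojibake_cp1252(text):
    """Undo UTF-8 bytes that were mistakenly decoded as Windows-1252."""
    try:
        # Re-encode to recover the original bytes, then decode them correctly
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mojibake, so leave it alone


original = "it\u2019s"  # "it's" with a curly apostrophe
mangled = original.encode("utf-8").decode("cp1252")
print(mangled)                     # itâ€™s
print(demojibake_cp1252(mangled))  # it’s
```

Clean text like "café" fails the UTF-8 decode step and is returned unchanged, which is the kind of guard a real fixer needs.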
Jan 05, 2022
Did your script run to completion? Sure, you might log some tracebacks along the way or terminate the program early with sys.exit(). But did your script actually run all the way to the end? I have yet to use the trace module, but it seems worth checking out. Visualization tools like heartrate are worth mentioning too, depending on how you are running your scripts. Task runners typically have run status tracking as well. I like having visual confirmation by logging some information when a script finishes as intended. It's nice to know whether your scripts finished or not. Use logging and trace to gain confidence that your scripts run reliably.
An easy way to track this is with the logging module:
import logging

# Configure logging once, before any log calls. level=INFO is needed for
# logging.info() messages to be written (the default level is WARNING).
FORMAT = "%(asctime)s %(levelname)-8s %(message)s"
logging.basicConfig(filename="improvise.log", format=FORMAT, level=logging.INFO)


def improvise():
    """Improv Tutorial: https://www.youtube.com/watch?v=C6wY9OwqJ2A"""
    try:
        print("Boom! Detective Michael Scarn, I'm with the FBI!")
    except Exception:
        # logging.exception() records the message plus the full traceback
        logging.exception("Error occurred during improv.")


improvise()
logging.info("Improvisation finished!")
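Since the trace module came up, here is a minimal sketch of one way it could confirm that a function actually ran: trace.Trace with countfuncs=True records every Python function called while runfunc() is active. The improvise stand-in is hypothetical; for a real script you might wrap your main() the same way, or run python -m trace --listfuncs your_script.py from the shell.

```python
import trace


def improvise():
    """Hypothetical stand-in for a script's main work."""
    return "Boom! Detective Michael Scarn, I'm with the FBI!"


# countfuncs=True records each Python function called under runfunc();
# trace=False keeps the tracer from echoing every executed line.
tracer = trace.Trace(count=False, trace=False, countfuncs=True)
result = tracer.runfunc(improvise)

# calledfuncs maps (filename, modulename, funcname) -> 1
called = {funcname for (_, _, funcname) in tracer.results().calledfuncs}
print(result)
print("improvise ran:", "improvise" in called)
```

Tracing slows code down noticeably, so this suits debugging runs more than production.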
Dec 31, 2021
This is a solution I worked out recently to normalize phone numbers into a uniform format. To install pandas with pip, enter this in a command prompt:
python -m pip install pandas
The pandas library has regex built in and it's pretty neat! Behold the power of pandas and a regular expression to do trivial telephone tidying:
import pandas as pd

s = pd.Series(data=["(010) 001-1010"], name="Phone", dtype="str")
# remove parentheses, hyphens and spaces with pandas + regex
# (a raw string keeps the backslash escapes intact for the regex engine)
s = s.str.replace(pat=r"\(|\)|-| ", repl="", regex=True)
print(s)
# resulting number: "0100011010"
Regex is cool.
Grasping the intricacies of what this code is doing feels elegant when you connect the dots... or pipes. The replacement is done via the pandas str accessor. In the pat string, the parentheses are escaped with backslashes, and each alternative is separated by a pipe "|". The pipe acts as an "or" operator, succinctly chaining multiple characters together for matching and, in this case, replacing each match with nothing. Pretty nifty. If you read the pandas docs, you'll find regex is accessible in different parts of the API. Dive in, it's some of my favorite documentation to snoop. There is so much you can do with pandas. This example demonstrates how its flexible functions get the job done efficiently.
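The same cleanup can be written a couple of other ways. A sketch on a few made-up numbers: the character class [()\- ] matches the same four characters as the pipe-separated pattern, while \D (any non-digit) is broader and also strips dots and a leading "+". Which one you want depends on whether other characters should survive.

```python
import pandas as pd

s = pd.Series(["(010) 001-1010", "010.001.1010", "+1 010 001 1010"], name="Phone", dtype="str")

# Character class: same four characters as r"\(|\)|-| "
no_punct = s.str.replace(r"[()\- ]", "", regex=True)

# \D matches any non-digit, so dots and "+" go too
digits_only = s.str.replace(r"\D", "", regex=True)

print(no_punct.tolist())
print(digits_only.tolist())
```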
Further Reading:
pandas.Series documentation
pandas str.replace documentation
Source of the famous “Now you have two problems” quote
Dec 19, 2021
If you write Python code, there's probably been a time or two when you saw the dreaded "MemoryError".
Python raises it when your script tries to allocate more memory than your computer can spare, and the script stops.
I recently experienced this frustration whilst trying to write hundreds of thousands of csv files. However, this time I grasped for tools that support smarter memory management.
Now, I can watch my computer's memory bounce around with the Windows Resource Monitor. Python has quite a few memory profiling libraries for monitoring memory too!
Python Libraries and Guides
Memory Management Overview, Python documentation
Memory Profiler: "monitor memory usage of Python code"
psutil: "Cross-platform lib for process and system monitoring in Python"
py-spy: "Sampling profiler for Python programs"
pyinstrument: "🚴 Call stack profiler for Python. Shows you why your code is slow!"
Scalene: "a high-performance, high-precision CPU, GPU, and memory profiler for Python"
Glances: "Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems."
Yappi: "Yet Another Python Profiler, but this time thread & coroutine & greenlet aware."
Fil: "A Python memory profiler for data processing and scientific computing applications" (Video)
line_profiler: "Line-by-line profiling for Python"
pprofile: "Line-granularity, thread-aware deterministic and statistic pure-python profiler"
Guppy 3: "Python programming environment and heap analysis toolset"
See also: The Python Profilers, Python documentation
"CPython standard distribution comes with three deterministic profilers. cProfile, Profile and hotshot. cProfile is implemented as a C module based on lsprof, Profile is in pure Python and hotshot can be seen as a small subset of a cProfile."
Yappi Github, https://github.com/sumerc/yappi
Note that hotshot only shipped with Python 2; Python 3's standard library has cProfile and profile.
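The standard library also ships tracemalloc, which is not in the list above but needs no install. A minimal sketch: start tracing, run some allocation-heavy code (build_rows is a hypothetical stand-in for your own workload), then read the current and peak traced memory and the top allocation sites by line.

```python
import tracemalloc


def build_rows(n):
    """Hypothetical workload that allocates n small strings."""
    return [f"row {i}" for i in range(n)]


tracemalloc.start()
rows = build_rows(100_000)
current, peak = tracemalloc.get_traced_memory()  # bytes, as (current, peak)
snapshot = tracemalloc.take_snapshot()           # must be taken while tracing
tracemalloc.stop()

print(f"current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")
for stat in snapshot.statistics("lineno")[:3]:   # biggest allocators first
    print(stat)
```

tracemalloc only sees memory allocated through Python's allocator, so memory held by some C extensions may not show up.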
Windows Tools
Task Manager: Windows process management tool with some memory analytics
Collect Data in Windows with Performance Monitor
Resource Monitor: Windows tool with Memory, CPU, Disk and Network monitoring tabs
Memory Tips and Guides
- Use only the data you need. Any data you read in and aren't using is held in memory. The usecols argument in pandas is a great way to read a csv and only use the columns you need.
- Reading data in chunks with the chunksize argument is another way to reduce memory usage for large datasets.
- Measuring the memory usage of a Pandas dataframe
- Some tools are line oriented, others are function oriented. If your code contains large functions, you might favor a line based profiling tool.
- Be aware of the overhead some memory tools may incur. memory_profiler was clocked with a whopping 270x slowdown per the Scalene PyCon talk, which shows an awesome comparison of these Python profiling libraries.
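A minimal sketch of the usecols, chunksize and memory-measuring tips above, using a tiny made-up CSV held in memory with io.StringIO so it runs anywhere. DataFrame.memory_usage(deep=True) counts the actual size of string values rather than just the object pointers.

```python
import io

import pandas as pd

# Hypothetical CSV; in practice this would be a path to a large file
csv_text = "id,name,notes\n1,a,x\n2,b,y\n3,c,z\n4,d,w\n"

# usecols: parse only the columns you need
slim = pd.read_csv(io.StringIO(csv_text), usecols=["id", "name"])
print(slim.memory_usage(deep=True).sum(), "bytes")

# chunksize: iterate over DataFrames of at most 2 rows instead of one big frame
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=["id"], chunksize=2):
    total += chunk["id"].sum()
print(total)  # 10
```

With chunks, only one chunk needs to be in memory at a time, so aggregate as you go rather than collecting every chunk into a list.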
Recommended Reading
Conclusion
Whether you'll see "MemoryError" depends on your computer's hardware, the size of your dataset and what operations you need to script out. Generally speaking, I/O (file reads and writes) is among the more expensive operations.
The tools in this post will help you anticipate how much computing power you have available, monitor your memory consumption more closely and avoid pushing your computer past its limits.
You can do things like reading data in chunks and only using the columns you need to reduce your memory consumption.
Using these tools and strategies can make getting things done with Python a smoother ride.