Lo-Fi Python

Jul 06, 2022

The Things They Don't Tell You About Ampersands and XML

In an XML document, you need to escape any ampersands in your text as &amp;

I began a new coding project. Sure, there's documentation for the API that solves my problem. I find out it uses XML. Extensible Markup Language, a classic API format. Cool. I craft a beautiful script that works at first. Or so it seems!

Later on, I realize it doesn't work as well as I believed. It turns out, if I want a server to accept my XML document, escaping certain characters might be required. The documentation didn't mention this. It was my first time using XML, how would I know?

I noticed a script only worked for a handful of requests. It failed for most, returning a 400 status code. Suspecting the issue was likely in my payload, I studied the data of the request bodies that failed compared to the others that succeeded. All of the payload bodies that failed contained text with an ampersand.

Suspecting an XML + ampersand issue, I googled and found this Stack Overflow post, which explains the ampersand escaping situation. XML has five predefined escapes: &amp; for &, &lt; for <, &gt; for >, &quot; for " and &apos; for '. A raw & or < in your text makes the document malformed, so the server may reject it.
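Here is a minimal sketch of the fix using the standard library's xml.sax.saxutils.escape. The sample text and the Company element name are made up for illustration. By default escape handles &, < and >, and the extra mapping I pass in covers quotes as well, which is an assumption about what your server expects.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical text value that would break an unescaped XML payload.
company = "Tom & Jerry's <Pizza>"

# escape() handles &, < and > by default; the optional dict adds quote characters.
safe_text = escape(company, {'"': "&quot;", "'": "&apos;"})

# Hypothetical element name, just to show the escaped text in place.
xml_body = f"<Company>{safe_text}</Company>"
print(xml_body)

# Parsing the document gives back the original, unescaped text.
print(ET.fromstring(xml_body).text)
```

If you build your XML with ElementTree instead of string formatting, it escapes text for you when the document is serialized.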

These are the things they often don't tell you. Those little details you must sometimes realize for yourself, unless someone bothers to mention it or write it down. Now you know something that cost me an hour or two of tinkering to realize!

`Image Source <https://github.com/sichkar-valentyn/XML_files_in_Python/blob/master/example.xml>`__


Want to read more on HTTP? Check out my guide on making HTTP requests with Python.

Jan 10, 2022

Analyzing Messi vs. Ronaldo with the FIFA API + jq + curl

Who is the world's greatest footballer, Messi or Ronaldo? EA Sports surely has calculated the answer to this question in their player ratings. They rate peak Cristiano Ronaldo, Lionel Messi and Luka Modrić at 99 overall, with Neymar and Lewandowski at 98. Anecdotally, Messi has won 7 Ballon d'Or, the highest individual football honor one can achieve each year. Ronaldo has won 5 Ballon d'Or. Modrić has won 1 Ballon d'Or. Lewandowski was runner up this year, but has never won the honor. Neymar has never won a Ballon d'Or.

In FIFA, a player's video game representation is modeled intricately in a series of traits and specialties characterizing each player. The "Ultimate Team" EA Sports API is viewable as a plain json page or more cheekily with one line of curl and jq, a "command line json processor":

curl 'https://www.easports.com/fifa/ultimate-team/api/fut/item' | jq '.'

Enter this in a shell or command line. The result is beautiful, readable, pretty printed json!

Messi (Top) Vs. Ronaldo (Bottom) FIFA Player Ratings

These ratings represent the players at their peak of their careers. Messi is a better dribbler, while Ronaldo has more power and strength. Messi has the edge in free kicks, curve in his shot and "longshots" 99 to 98 over Cristiano. They are tied at "finishing", each with 99. Ronaldo has the "Power Free-Kick" trait, whereas Messi has "Chip Shot", "Finesse Shot" and "Playmaker" traits giving him an edge.

EA's ratings suggest that both are prominent goal scorers, with a slight edge to Messi in finesse and shooting from distance. However, there's something to be said for kicking the ball really damn hard. Ronaldo has superior raw shot power and a lethal combo of more powerful jump and stronger headers. All this combined with an "Aerial Threat" specialty enables Ronaldo to vault above and around defenders to smash in golazos off the volley. Ronaldo sizes up to 6' 2" (187 cm) vs. Messi's 5' 7" (170 cm) frame. This Portuguese man definitely has an advantage in getting higher in the air. But the Argentinian is quite darty.

Messi has incredible accuracy from distance. He's also a better passer all around and has perfect "vision", great qualities for winning football games. Only in crossing does he have a lower passing rating. Ronaldo is also 10 points better at "penalties" or penalty kicks. The closer he gets to the goal, the more dangerous he is. Messi is more dangerous with the ball while dribbling, passing or shooting except when taking a PK.

Advantages can be gained in many different aspects of soccer. EA has developed a fun dataset to model these all time greats across several football skill dimensions. In 2022's version of the game, Messi is rated a 93, with Cristiano 91. Clearly these two are worthy of top honors. Don't forget Robert Lewandowski, with a 92 rating, who consistently lights up the Champions League and Bundesliga.

jq ftw

I had never used jq before this. Really enjoyed the quick, stylish and practical view of some json. This cool terminal display and syntax highlighting was on my Chromebook shell. It's neat how easily you can pretty print json with jq. I rate it a 99 for json pretty processing and pretty printing on the FIFA scale. Read more in the jq documentation!

Jan 07, 2022

Create a Column of Values in Pandas with df.assign()

Pandas is amazing, what else is there to say? Learning the nuances of its API has paid off many times when it helped me get stuff done.

I recently picked up the pandas dataframe's "assign" function for creating a new column of values. This is an elegant way to set a column of values in tabular data with the pandas library. Below you'll see two ways to set a column of values in pandas. In the first way, I am chaining two assign calls together to create 2 new columns, "sound" and "type". The second way, shown commented out, sets the column directly with bracket notation. I prefer using assign because it looks better, and because it returns a new dataframe it avoids the SettingWithCopyWarning that pandas can raise when you set a column on a slice of another dataframe. Highly recommend getting familiar with pandas functions like assign and API nuances like Series accessors to up your tabular data game.

import pandas as pd

cats = ["Garfield", "Meowth", "Tom"]
# One row per cat in a single "cats" column.
df = pd.DataFrame(cats, columns=["cats"])
# best way: assign returns a new dataframe with the added columns
df = df.assign(sound="Meow").assign(type="Cartoon")
print(df.head())

# alternative way that also works, but can trigger a SettingWithCopyWarning
# when df is a slice of another dataframe
# df["sound"] = "Meow"

DataFrame.assign : Can evaluate an expression or function to create new values for a column. pandas source code: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/frame.py#L4421-L4487
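To illustrate the "expression or function" part, here is a small sketch where assign receives a lambda. The name_length column is a made-up example; the lambda gets the dataframe and uses the .str Series accessor to compute each value.

```python
import pandas as pd

df = pd.DataFrame({"cats": ["Garfield", "Meowth", "Tom"]})

# A callable passed to assign receives the dataframe and returns the new column's values.
df = df.assign(name_length=lambda d: d["cats"].str.len())
print(df)
```

Passing a function is handy when chaining, because each step can refer to columns created in the step before it.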

May 18, 2020

A Guide To Making HTTP Requests To APIs With JSON & Python

This contains all of my best API-related knowledge picked up since learning how to use them. All APIs have their own style, quirks and unique requirements. This post explains general terminology, tips and examples if you're looking to tackle your first API.

Here's what is covered:

  1. API & HTTP Lingo You Should Know
  2. Testing and Exporting Python Request Code from Postman (Optional)
  3. Formatting Your Request
  4. Example GET and POST Requests
  5. "Gotchyas" To Avoid
  6. Sidebar: requests.Session()
  7. Dig deeper into requests by raising your HTTPConnection.debuglevel

Terminology Clarification: I will refer to "items" or "data" throughout this post. Substitute contacts or whatever data you are looking for. For example, you might be fetching a page of contacts from your CRM. Or fetching your tweets from Twitter's API. Or, searching the Google location API, you might look up an address and get back geo-location coordinates.

API & HTTP Lingo You Should Know

Hypertext Transfer Protocol (HTTP)

Per Mozilla, "Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes. HTTP follows a classical client-server model, with a client opening a connection to make a request, then waiting until it receives a response."

HTTP: you = client. API = way to communicate with server

Application Programming Interface (API)

Per Wikipedia, the purpose of an API is to simplify "programming by abstracting the underlying implementation and only exposing objects or actions the developer needs."

Representational State Transfer (REST)

REST is an architectural style of web APIs. It is the dominant architecture that many APIs use. Simple Object Access Protocol (SOAP) is another style I've heard of, but it seems less common nowadays.

A REST API is built for interoperability and has properties like: "simplicity of a uniform interface" and "visibility of communication between components by service agents." [Wikipedia] If an API follows REST, it has many good principles baked in.

GET, POST and PATCH

These are three common types of request methods.

  • GET: Read data returned, such as all of your tweets in the Twitter API.
  • POST: Create a new item, like writing a new tweet. Can also update existing data. Tweets aren't editable though!
  • PATCH: Similar to POST, this is typically used for updating data.

URL or "Endpoint"

This is the web address you send your request to. Some APIs have multiple endpoints for different functionality.

URL Parameters

Values you pass to tell the API what you want. They are defined by the API specifications, which are usually well documented. In Python's requests library, you pass them as a dictionary to the params keyword argument. Sometimes they are passable directly within the endpoint url string.
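To make the query string less abstract, here is a small sketch with the standard library's urllib.parse.urlencode. The endpoint and parameter names are hypothetical. When you hand a dictionary to the params keyword argument, requests builds the query string for you in much the same way.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
endpoint = "https://exampleapi.com/widgets"
params = {"page": 2, "per_page": 100, "q": "red hot"}

# urlencode() escapes each value and joins the pairs with "&".
query_string = urlencode(params)
full_url = f"{endpoint}?{query_string}"
print(full_url)
```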

Body or "Payload"

To make a request, you send a payload to the url. Often this is a JSON string containing the fields and values the API expects, AKA the request body. With the requests library, you can also pass an actual Python dictionary and let requests encode it for you.

JavaScript Object Notation (JSON)

JSON is a data interchange format that nearly every language can read and write. Usually it is the default way to pass data into and receive data from an API. If making a POST, you can check your json object is formatted correctly by using a json linter. Or try Python's json.tool! You can also pretty print your JSON or python dictionary with the pprint module. If you're using json.dumps, remember it has pretty printing accessible through the indent keyword argument! These features are accessible in the standard library. Isn't Python great? See also: Python 101 - An Intro to Working with JSON
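A quick sketch of the round trip with the standard library's json module. The item dictionary is made up and stands in for whatever data your API deals in.

```python
import json

# Hypothetical item, standing in for whatever data your API returns.
item = {"name": "Widget", "tags": ["a", "b"], "price": 9.99}

# Dictionary -> JSON string, pretty printed with sorted keys.
json_str = json.dumps(item, indent=2, sort_keys=True)
print(json_str)

# JSON string -> dictionary again.
round_trip = json.loads(json_str)
print(round_trip == item)
```

From a shell, python -m json.tool followed by a file name pretty prints a JSON file and complains if it is not valid JSON.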

Pages

API data is commonly returned in multiple pages when there is a lot of data returned. Each page can be accessed one request at a time. Sometimes you can specify how many items you want on a page. But there is usually a maximum items per page limit like 100.
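Paging usually boils down to a loop like the sketch below. The fetch_page function is a hypothetical stand-in for a real HTTP request, so the example runs without a server. A real API will have its own page parameter names and its own way of signaling the last page.

```python
def fetch_page(page, per_page=2):
    """Hypothetical stand-in for an HTTP request that returns one page of items."""
    all_items = ["a", "b", "c", "d", "e"]
    start = (page - 1) * per_page
    return all_items[start:start + per_page]


def fetch_all_items():
    """Request page after page until the API returns an empty page."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items


print(fetch_all_items())
```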

Status Code

Each response usually gives you a numeric code corresponding to what happened when the server tried to handle your request. There is also usually a message returned.
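The standard library knows the common codes and their messages. Below is a small sketch using http.HTTPStatus. The is_success helper is my own illustration, not part of any library.

```python
from http import HTTPStatus

# Print a few common status codes next to their standard messages.
for code in (200, 201, 400, 401, 404, 500):
    print(code, HTTPStatus(code).phrase)


def is_success(code):
    """Codes in the 200s mean the server accepted the request."""
    return 200 <= code < 300


print(is_success(201), is_success(400))
```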

Headers

These usually contain website cookies and authorization info. They also may tell the API what kind of data you want back. JSON and XML are the two most common types of data to return. You can request a return format with the Accept header, while the Content-Type header describes the format of the data you are sending.

If you need to parse an XML response, check out Python's stock ElementTree API. I've only seen a few APIs using XML responses, such as the USPS Address Validation API.
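As a rough sketch of what parsing looks like, here is ElementTree reading a made-up XML response. The element names are invented for illustration; a real API defines its own.

```python
import xml.etree.ElementTree as ET

# Made-up XML response body. Real APIs define their own element names.
xml_text = """<Response>
  <Address>
    <City>Sydney</City>
    <State>NSW</State>
  </Address>
</Response>"""

root = ET.fromstring(xml_text)
# findtext() follows a simple path from the root element and returns the text inside.
city = root.findtext("Address/City")
state = root.findtext("Address/State")
print(city, state)
```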

Authorization

Authorization varies widely. This is the level of identification you need to pass to the API to make a request. Public APIs might require none. Some just need a username and password. Others use the OAuth standard, which is a system involving credentials and tokens for extra security.

Authorization Scheme Example [Mozilla]

Authorization: <auth-scheme> <authorisation-parameters>

# headers python dict example
token = "Add_Your_Token_Here"  # placeholder value
headers = {"Authorization": f"Basic {token}"}
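For the Basic scheme specifically, the token is just "username:password" run through base64. Here is a sketch with made-up credentials. Other schemes, like Bearer, hand you a token to use as-is.

```python
import base64

# Made-up credentials, for illustration only.
username = "aladdin"
password = "opensesame"

# Basic auth is "username:password", base64 encoded.
credentials = f"{username}:{password}".encode("utf-8")
token = base64.b64encode(credentials).decode("ascii")
headers = {"Authorization": f"Basic {token}"}
print(headers)
```

Keep in mind base64 is an encoding, not encryption, so Basic auth should only travel over https.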

Creating the Request JSON

I recommend using Postman in most cases, depending on the complexity of the API. If the JSON syntax is straightforward, you can format your data as a python dictionary, then convert it to a JSON string with json.dumps from the standard library's json module. But JSON can be tricky sometimes. You may also need to pass a dictionary of HTTP headers.

Some APIs have "Postman Collections", a set of ready-made requests for the API that Postman can export as Python (or any language) code. In those cases, it might make sense to use those resources.

Path One: Make HTTP request with json & requests libraries

Format Python dict with json.dumps from the standard library's json module. Infer API requirements from documentation. Use requests for HTTP.

Path Two: Make HTTP request with Postman & requests library

Use Postman to generate the JSON payload. Plug headers and payload into requests. Use requests library for HTTP.

Postman has a friendly interface for plugging in all your pieces and tinkering with your request body until it works. Make it easier on yourself and use Postman, especially if there are collections. An alternative is to troubleshoot in Python if you are confident in your grasp of the API. I use both options depending on my familiarity with the API at hand.

Formatting Your Request

  1. Once you have the request working, you may export your Postman request to almost any language. For Python, you can sometimes export to the requests, http.client or urllib libraries. Hit the "code" button in Postman and then copy your code.
  2. Paste your Postman headers, payload and url into your existing code.
  3. You may want to use a dict or string formatting to pass values to your request parameters or url.
  4. If the API uses a token or other form of authorization that needs to be refreshed intermittently, I usually have a function that returns a token: token = fetch_token(). Then put the token in the headers dict: {"Authorization": f"basic {token}"}.
  5. Finally, pass your headers and payload to your requests.get, requests.post, or requests.request function along with the endpoint url. You're now ready to test the request.

If you choose not to use Postman, you can use the json library. See the use of json.dumps() to convert a dictionary to a JSON string in example #2 below.

Python Installation

You can install requests with pip. Alternatively, http.client is included within the Python standard library. If you want to convert HTTP response data to a dataframe or csv, install pandas.

python -m pip install requests
python -m pip install pandas

Example #1: GET the geolocation details of any public location with the Google API

This was modified from another example of Google's Geocoding API. To use this, you need to create a developer account with Google and paste your API key below.

import requests


# Find the best double-cheeseburger + fries $7 can buy.
payload = {"key": "Add_Google_API_Key_Here", "address": "Redhot Ranch"}
url = "https://maps.googleapis.com/maps/api/geocode/json"
# Optional: set a 5 second timeout for the http request.
r = requests.get(url=url, params=payload, timeout=5)
print(r.text)
print(r.status_code)
data = r.json()

# Extract the latitude, longitude and formatted address of the first matching location.
# "results" is empty if the API key is rejected or nothing matches the address.
if data.get("results"):
    first_match = data["results"][0]
    latitude = first_match["geometry"]["location"]["lat"]
    longitude = first_match["geometry"]["location"]["lng"]
    formatted_address = first_match["formatted_address"]
    print(longitude)
    print(latitude)
    print(formatted_address)

# Optional: convert response into a dataframe with pandas.
# import pandas as pd
# location_df = pd.json_normalize(data['results'])
# location_df.to_csv('Locations.csv')

Above you can see:

  • requests makes it easy to see the server's raw text response with response.text
  • requests also makes JSON decoding easy with response.json()
  • pd.json_normalize() is convenient to convert the response dictionary to a dataframe.

Example #2: Encode a Python dictionary to json string and POST to a hypothetical API

  1. Create a dictionary with request body data and pretty inspect it with pprint.
  2. Encode the dictionary as a json string with json.dumps from the standard library's json module.
  3. POST the encoded JSON to the endpoint url with requests.

import pprint
import json
import requests


def dict_to_json_data():
    """Create request body with fictional contact details."""
    payload = {
        "first_name": "P",
        "last_name": "Sherman",
        "address": "42 Wallaby Way",
        "address_2": "",
        "city": "Sydney",
        "state": "NSW",
        "country": "AU",
        "zip": "2000",
    }
    pprint.pprint(payload)
    json_str = json.dumps(payload, ensure_ascii=True)
    # Optional: encode json str to utf-8.
    return json_str.encode("utf-8")


def post_data(json_str, token):
    """This is a fictional API request that passes a json string to requests.
    It decodes the server response with response.json() and
    returns a dictionary value by calling the data's keys.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "cache-control": "no-cache",
    }
    r = requests.request(
        method="POST",
        url="https://SomeSoftwareAPI.com/people/",
        data=json_str,
        headers=headers,
        timeout=5,
    )
    data = r.json()
    print(data.keys())
    # Call dict keys to get their values.
    contact_id = data["contact_id"]
    return contact_id


token = "Add_API_Token_Here"  # placeholder value
json_str = dict_to_json_data()
contact_id = post_data(json_str, token)

requests.request keyword argument alternatives for passing data

params – (optional) Dictionary, list of tuples or bytes to send in the query string for the Request.

data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request

json – (optional) A JSON serializable Python object to send in the body of the Request

[requests API documentation]
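To see the difference between the three keyword arguments without sending anything, you can build a request and inspect it. This is a sketch against a hypothetical endpoint; prepare() only constructs the request, it does not contact a server.

```python
import json
import requests

url = "https://exampleapi.com/widgets"  # hypothetical endpoint
item = {"name": "Widget"}

# Nothing is sent here: prepare() only builds the request so it can be inspected.
with_params = requests.Request("GET", url, params={"page": 2}).prepare()
with_data = requests.Request("POST", url, data=item).prepare()
with_json = requests.Request("POST", url, json=item).prepare()

# params end up in the url's query string.
print(with_params.url)
# A dict passed as data is form-encoded in the body.
print(with_data.headers["Content-Type"], with_data.body)
# A dict passed as json is serialized to JSON in the body.
print(with_json.headers["Content-Type"], json.loads(with_json.body))
```

Example #2 above passes an already encoded JSON string through data and sets the Content-Type header by hand. Passing the dictionary through json does both steps for you.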

"Gotchyas" To Avoid

  • Status codes are your friend. They offer a hint at why your request is not working. If you see 200 or 201, that's a good sign. Status codes are usually helpful, but sometimes they can be misleading, so read the response message too.
  • Ensure you are defining the correct content-type. I had an experience where Postman defined two conflicting content-type headers and it caused my request to fail. The server's error message indicated the problem was in my JSON, so it took me a while to figure out the headers were the problem.
  • Sometimes it makes a difference if your url has http:// vs. https:// in it. Usually https:// is preferred.

Sidebar: requests.Session()

You might be able to improve performance by using a requests "session" object.

import requests


# A session reuses the underlying connection across requests ("keep-alive") + stores cookies.
s = requests.Session()
for page in range(2):
    url = f"https://exampleapi.com/widgets/{page}"
    r = s.get(url, timeout=5)
    print(r.text)
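A session is also a convenient place to set headers once, since they are merged into every request the session makes. A small sketch with a placeholder token and a hypothetical url; prepare_request() builds the request without sending it, so you can check what would go out.

```python
import requests

s = requests.Session()
# Headers set on the session are merged into every request it sends.
s.headers.update({"Authorization": "Bearer example-token"})  # placeholder token

# prepare_request() builds the request without sending it.
request = requests.Request("GET", "https://exampleapi.com/widgets/0")
prepared = s.prepare_request(request)
print(prepared.headers["Authorization"])
```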

Dig deeper into requests by raising your HTTPConnection.debuglevel

HTTPResponse.debuglevel: A debugging hook. If debuglevel is greater than zero, messages will be printed to stdout as the response is read and parsed. Source: http.client Python Docs

from http.client import HTTPConnection
import requests


HTTPConnection.debuglevel = 1
payload = {"key": "Add_Google_API_Key_Here", "address": "90 Miles"}
url = "https://maps.googleapis.com/maps/api/geocode/json"
r = requests.get(url=url, params=payload, timeout=5)
print(r.text)

Web Server Gateway Interface (WSGI, pronounced "Wis-Ghee")

"As described in PEP3333, the Python Web Server Gateway Interface (WSGI) is a way to make sure that web servers and python web applications can talk to each other." Gunicorn is one of a few Python WSGI servers. web2py is a WSGI-compatible web framework I have used.

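To give a feel for what WSGI asks of an application, here is a minimal sketch. The app function and its response text are made up. Instead of starting a server, the example calls the app directly with a fake request environment from the standard library's wsgiref, the same way a WSGI server would.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: a callable taking environ and start_response."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from WSGI"]


# Build a fake request environ and capture what the app hands to the server.
environ = {}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(app(environ, start_response))
print(captured["status"], body)
```

A WSGI server such as Gunicorn is pointed at a callable like this, typically as gunicorn followed by the module name, a colon and the callable's name.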
Conclusion

I remember APIs seemed mysterious and daunting before I had used them. But like all things, they can be conquered with knowledge, understanding and tenacity to keep trying until you figure it out. Good luck!

Requests Documentation

requests.request() API documentation

requests.get() API documentation

requests.post() API documentation

Supplementary Reading

Google's HTTP Timing Explanation

List of Interesting "Unofficial" APIs

Proxy servers

Making 1 million requests with python-aiohttp

Nginx

Create, read, update and delete (CRUD)

Jan 14, 2018

Python File Handling Basics

The basis of many great programs revolves around a simple set of operations:

  1. Open a file.
  2. Do something with the file contents.
  3. Save the new file for the user.

Python is nice and simple for this. Paste the below lines into a text editor and save as a .py file. You need to have Python 3 installed. In the same folder as your .py file, save a .txt file with some words in it. Alright, let's write some code:

file_name = input("Enter your file name. e.g. words.txt ")
file_handle = open(file_name, "r")
lines = file_handle.readlines()
print(lines)
file_handle.close()

In line 1, we ask the user to enter their file name with Python's input function. When the program runs, the user enters their text file name with extension. This line stores the name in a variable called file_name.

In line 2, we open your text file and store it in a variable I have named file_handle. Think of the file handle as a bridge between your code and the text file. Quick point about the 'r' above: that tells the program to open the file in "Read" mode. There are several different file modes in programming. Some modes are just for reading an existing file, some are just for writing a new file, and some are capable of both. This Stack Overflow post is well written and details the differences between file modes. Once established, the file handle allows you to read the file's contents or write new contents to the file.

In line 3, we are calling the .readlines() method on our file handle. This method reads the file contents and stores them, line by line, into a list named "lines". An alternative method is .read(), which reads the whole file and stores its contents as one string. Try switching this out in place of .readlines() to check out the difference.

In line 4, we are printing the stored lines to show them to the user. We now have the file contents, ready to be used however we please.

In line 5, we are closing the file.
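If you want to see the .readlines() vs. .read() difference side by side, here is a sketch that creates its own throwaway file first, so it runs without a words.txt on hand. The file contents are made up.

```python
import tempfile
from pathlib import Path

# Create a small throwaway file so the example runs anywhere.
path = Path(tempfile.mkdtemp()) / "words.txt"
path.write_text("first line\nsecond line\n")

file_handle = open(path, "r")
as_list = file_handle.readlines()  # one list item per line, newline characters included
file_handle.close()

file_handle = open(path, "r")
as_string = file_handle.read()  # the whole file as a single string
file_handle.close()

print(as_list)
print(repr(as_string))
```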

Below, we are going to write a new file using the with statement, which is generally accepted as the best way to read or write a file:

with open("Notes.txt", "w") as fhand:
    fhand.write("Did you know whales can live up to 90 years?")

In line 1, we are calling the open function again that we used in the first example, but this time, notice the "w". This indicates that we are opening the file in "write" mode. The with statement stores the file handle in a variable named fhand.

In line 2, we are calling the .write() method on our file handle, fhand, and passing it our text to be saved in our new file.

There is no line for closing the file this time. When the indented block ends, the with statement closes the file for us, completing the creation of Notes.txt in the same folder as our .py program file.
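Here is a small sketch that checks the with statement really does close the file, and shows "a" (append) mode as one more file mode to try. It writes to a temporary folder so it won't clutter your project, and the second note is made-up text.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "Notes.txt"

with open(path, "w") as fhand:
    fhand.write("Did you know whales can live up to 90 years?")

# Outside the with block, the file has already been closed for us.
print(fhand.closed)

# "a" (append) mode adds to the end of the file instead of overwriting it.
with open(path, "a") as fhand:
    fhand.write("\nSecond note.")

print(path.read_text())
```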

Your program is now ready to be run. Double-click your .py file to execute it, or run it from a terminal with python followed by your file name.

Before learning Python, file operations were a mystery to me. It took me a while to understand this clearly, and I wanted to share. Once you master these basic file operations, programming gets to be a lot more fun. Do try it out for yourself :D