Lo-Fi Python | Page 10

Sep 13, 2020

Delete All Your Tweets with Tweepy and the Twitter API

You may want to download an archive of your tweets before deleting them. I did this and it took about a day to get my archive download.

How To Purge Your Tweet History with Python

Per the Tweepy library documentation, install tweepy with pip. It worked fine in my python 3.8 virtual environment.

pip install tweepy

Sign up for a Twitter Developer account and create an app. I named mine "tweetcleanr".
Find your app under "Projects & Apps". Edit your app's permissions to "Read + Write + Direct Messages".
After you update your permissions, select the "Keys and tokens" tab. Then regenerate new API keys. Then paste them in the below script.

Save the below script as a python file. In command prompt or terminal, run python delete_tweets.py or whatever you want to name it!
You'll be asked to go to a link and enter an authorization code. Then you'll see your tweets being deleted like pictured below.

delete_tweets.py

I found this Github Gist via Google and updated the print and input statements to Python 3. I also added the traceback module in case you need to debug it. Initially, I received an error telling me to complete step 3 above. I didn't see the error message at first, until adding traceback.print_exc() like you see below.

import tweepy
import traceback

"""Delete All Your Tweets - Github Gist by davej
Credit: https://gist.github.com/davej/113241
Ported to Python 3 by Lo-Fi Python: https://lofipython.com/delete-all-your-tweets-with-tweepy-and-the-twitter-api/
"""
CONSUMER_KEY = "get_from_dev_portal"
CONSUMER_SECRET = "get_from_dev_portal"


def oauth_login(consumer_key, consumer_secret):
    """Authenticate with twitter using OAuth"""

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth_url = auth.get_authorization_url()

    verify_code = input(
        "Authenticate at %s and then enter you verification code here > " % auth_url
    )
    auth.get_access_token(verify_code)

    return tweepy.API(auth)


def batch_delete(api):
    print(
        "You are about to delete all tweets from the account @%s."
        % api.verify_credentials().screen_name
    )
    print("Does this sound ok? There is no undo! Type yes to carry out this action.")
    do_delete = input("> ")
    if do_delete.lower() == "yes":
        for status in tweepy.Cursor(api.user_timeline).items():
            try:
                api.destroy_status(status.id)
                print("Deleted:", status.id)
            except Exception:
                traceback.print_exc()
                print("Failed to delete:", status.id)


if __name__ == "__main__":
    api = oauth_login(CONSUMER_KEY, CONSUMER_SECRET)
    print("Authenticated as: %s" % api.me().screen_name)

    batch_delete(api)

✅ Twitter Cleanse Complete

Twitter has a really slick developer dashboard. Its API combined with the tweepy library got the job done for me. It's great when stuff just works. And it only cost me about 1 hour to complete. Time to start a clean slate. Here's to looking forward.

Supplementary Reading

Tweepy Documentation Tutorial

Twitter's API Tutorials

Twitter Postman Tutorial

posted at 21:07 · coding, programming, python · api tweepy tweets twitter

Aug 09, 2020

Pondering Join Algorithms

Truly enjoying this Intro to Database Systems course from Carnegie Mellon University. Some really great breakdowns of common join algorithms in this lecture. Here are my notes.

Lecture 11- Join Algorithms(CMU Databases Systems / Fall 2019)

Prof. Andy Pavlo, Carnegie Mellon Database Group

screenshot from lecture

Table Positioning for a Join

"In general, your smaller table should be the "left" table when joining two tables."... Professor demonstrates better performance by making the smaller table the "outer" table in a join.

Block Nested Loop Join [mysql example]

"The brute force approach"
If you have enough memory to hold a large table, a good option for joining.
Always pick the smaller table as the outer table.
Buffer as much of your outer table in memory as possible to reduce redundant I/O.
Loop over the inner table or use an index.

Index Nested Loop Join [CS Course definition]

If indexes are available, or you could create an index to use for a join.

Sort-Merge Join [wikipedia]

Useful if one or both tables are sorted on a join key. Maximize sequential I/O.

screenshot from lecture

Hash Join

Best performance. For large datasets.

Phase #1 Build (Hash Table)
Phase #2 Probe

Use a Bloom Filter set operations for probe phase optimization.

insert a key
lookup a key

Additional Reading on Bloom Filters

Let's implement a Bloom Filter

Bloom Filters Debunked

Grace Hash Join [wikipedia]

"Do hash joins when things don't fit in memory."
Use a hash table for each table. Break the tables into buckets then do a nested loop join on each bucket. If the buckets do not fit in memory, use recursive partitioning. Then everything fits in memory for the join.

"Split outer relation into partitions based on the hash key."

Prof. Andy Pavlo on Hash Join algorithm

Hashing is almost always better than sorting for operator execution.

"No join algorithm works well in all scenarios."

-Prof. Andy Pavlo

webmention

posted at 18:24 · coding, data, Databases, programming · algorithms computer science joins learning study technology

Jul 15, 2020

Benefits of Go and Threads in Distributed Systems

Preface

These are my YouTube lecture notes from MIT's Distributed Systems course. Thank you MIT and Professor Morris!

MIT 6.824 Distributed Systems

Lecture 2: RPC and Threads - Feb 7, 2020

Prof. Robert Morris (Spring 2020)

Introduction

Go is a popular programming language choice so my ears perked up when this lecture began. These notes were taken as the professor explains why he teaches his class in Go. He also mentioned he'd be able to teach it with Python or Java. He used C++ years ago.

The beginning of this lecture was a great summary of:

key benefits of Golang
what threads are and why they're great
how Go, threads and async tie together

Go is Good for Distributed Systems

Go is concurrency-friendly. With concurrent threads, you can effectively split a task such as making web requests to a server into many threads, completing them simultaneously.

Golang's Convenient Features and Benefits

convenient Remote Procedure Call library (RPC) C++ lacks anything comparable?
"Go is type-safe and memory-safe, unlike C++"
garbage collection
"the language is simple, unlike C++"
"good support for threads, locking and synchronization between threads"
in Go, "goroutines" are threads
professor's recommended reading to learn Go: Effective Go

Why use threads?

I/O Concurrency
Multi-core Parallelism
Convenience, e.g. "create 10 threads that sleep for a second and then do a little bit of work"

"Threads are the main tool we're using to manage concurrency in programs."

-Prof. Robert Morris

Contrast WithEvent-driven Programming("Asynchronomous")

A single thread, single loop that waits for an event.

Combining Threads and Event Driven Programming

"Create one thread for each procedure call."... "On each of the threads run a stripped down event driven loop. Sort of one event loop per core. That results in parallelism and I/O concurrency."

-Prof. Robert Morris

Postface: Concurrent Python Context

I've rarely if ever used multiple threads in Python. Simply running a single threaded script seems sufficient for most of my tasks. Maybe I could speed up API requests by splitting into threads when making a few hundred thousand requests? Apparently I'm missing out on concurrent threading efficiency gains.

I once experimented with the multiprocessing module's Process class, which worked on Linux but not Windows for me. I ended up taking an simpler, single thread approach instead. I've also heard of using multiprocessing pool objects. There's also the asyncio library concurrent.futures modules to consider. The ProcessPoolExecutor looks promising.

Python also has the queue module. I haven't used it yet but at one point I watched a talk where Raymond Hettinger recommended queue as a good option if you want concurrency in Python.

It seems there are many options available in Python but it's not clear which tools should be deployed and when. And your chosen concurrency strategy may add extra complexity. Handle with care. Or consider learning Go if you want to use threads to scale your distributed system.

Update: Python Concurrency Success

I recently deployed the ThreadPoolExecutor from the concurrent.futures module to efficiently move thousands of files to a new folder. So Python does have fairly accessible alternatives to concurrency. I guess I'll need to try Go sometime to compare!

from concurrent.futures import ThreadPoolExecutor
import numpy as np
import shutil
import os

def main():
    """Move files concurrently from the current working directory to a new folder.
    This script is adapted from the Python ThreadPoolExecutor documentation:
    https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.shutdown
    """
    csvs = [f for f in os.listdir(os.getcwd()) if '.csv' in f]
    split_num = len(csvs) / 4 + 1
    file_batches = np.array_split(csvs, split_num)
    # write to local folder named "csvs"
    dst_folder = "/csvs"
    with ThreadPoolExecutor(max_workers=4) as e:
        for i, files in enumerate(file_batches):
            csv_A, csv_B, csv_C, csv_D = files
            e.submit(shutil.move, csv_A, dst_folder)
            e.submit(shutil.move, csv_B, dst_folder)
            e.submit(shutil.move, csv_C, dst_folder)
            e.submit(shutil.move, csv_D, dst_folder)

if __name__ == '__main__':
    main()

Additional Reading

New Case Studies About Google's Use of Go

go.dev

posted at 00:24 · coding, concurrency, data, education, Go, lectures notes, programming · computers distributed systems golang MIT python rpc

Jun 27, 2020

Characterizing Database Workloads & Storage Models

Thank you Carnegie Mellon Database Group for putting this online! These are my notes from watching on YouTube.

Carnegie Mellon Databases Storage II, Lecture 4

Prof. Andy Pavlo [Watch on YouTube]

The Problem and Solution

How should the DBMS represent the database in storage files on disk? Solve it by choosing the right storage model for your target workload. The right strategy varies if you are reading data, writing data and with how many joins you are performing.

Workload Characterization

OLTP (Online Transaction Processing): "Simple queries with lots of writes."

OLAP (Online Analytical Processing): "Read only queries. Lots of joins. Doing a lot of reads, but they're more complex."

HTAP (Hybrid Transactional Analytical Processing): "is trying to do both of them. You still want to ingest new data, but analyze it as it comes in. It's used for companies making decisions on the fly as people are browsing websites, like internet advertising companies."

Storage Models

screenshots from the lecture

N-ary used to be the dominant model until the '80s.

Additional Reading: All Things Distributed

Column Store Vs. Row Store RDBMS

Row-oriented DBMS(Row Store)

PostgreSQL, MySQL
Row Store = use OLTP

Column-oriented DBMS(Column Store)

Red Shift, BigQuery
Column Store = use OLAP

If types are consistent, you can compress data into single column store.

posted at 13:22 · data, Databases, education, performance · Carnegie Mellon database storage learning lectures notes

Jun 23, 2020

Free Computer Science Courses and Talks To Absorb

Below you'll find a balanced curriculum of juicy courses and videos that are available for free on the internet. I'll definitely be diving into most of these in the 2nd half of 2020. Stay curious!

University CS Courses For Free

CS50's Web Programming with Python and JavaScript | Harvard University

CS 61-C Great Ideas in Computer Architecture (Machine Structures), Spring 2015 | UC Berkeley

CS 109: Data Science, 2015 | Harvard University

Mathematical Modeling of Football, Fall 2020 | Uppsala Universitet

CS 162 - Operating Systems and Systems Programming, Fall 2013 | UC Berkeley

15-445/645 Intro to Database Systems, Fall 2019 | Carnegie Mellon University

15-721 Advanced Database Systems, Spring 2020 | Carnegie Mellon University

Missing Semester: Shell Tools & Scripting, Spring 2020 | MIT

6.824 Distributed Systems, Spring 2020 | MIT

CSE 373 - Analysis of Algorithms, 2016 | Stony Brook University

CS 4150 Algorithms, Spring 2020 | University of Utah

CS 241 System Programming, Spring 2020 [course wiki] | University of Illinois

CS 6120: Advanced Compilers: The Self-Guided Online Course | Cornell University

Intriguing Coursera Classes

DevOps Culture and Mindset | UC-Davis

Computer Science: Algorithms, Theory, and Machines | Princeton University

Excel Fundamentals for Data Analysis | Macquarie University

Build a Data Science Web App with Streamlit and Python | Guided Project [$10]

Programming Talks & Tutorials

These programming talks piqued my interest, highly recommended.

David Beazley | Built in Super Heroes [YouTube]

Mr. Beazley shows how to use pure Python built-in functions to clean and analyze the City of Chicago's food inspection data. No pandas in this talk, behold the power of the Python standard library. Spoiler: Don't eat at O'hare airport. He also has a new course, available for free:

David Beazley | Practical Python Programming [Course]

This is not a course for absolute beginners on how to program a computer. It is assumed that you already have programming experience in some other programming language or Python itself.

Sebastian Witowski | Modern Python Developer's Toolkit [YouTube]

An overview covering editing tools and setup from PyCon 2020. Honing your development environment is crucial to being an efficent coder. This example uses VS Code. I use Atom as my primary text editor. The most recommended linters are usually pylint, flake8 or pyflakes.

Jake VanderPlas | Reproducible Data Analysis in Jupyter [YouTube]

This 10 video series is a must-watch for aspiring data scientists and analysts if you use Python. Includes a git workflow demonstration, working in Jupyter Notebooks and many other essentials.

Rich Hickey | Hammock Driven Development [YouTube]

Sometimes, the best thing we can do is step away from the keyboard. I really enjoy this speaker's communication style.

Eric J. Ma | Demystifying Deep Learning for Data Scientists [YouTube]

Tutorial-style Pythonmachine learning walk-through from PyCon 2020.

Julie Michelman | Pandas, Pipelines, and Custom Transformers [YouTube]

This video shows a deep dive into the world of sci-kit learn and machine learning. PyCon and PyData videos usually include some cutting edge tech. Machine learning moves so fast there are always new tools surfacing. But certain libraries like sci-kit learn, TensorFlow, keras and PyTorch have been constant.

Ville Tuuls | A Billion Rows per Second: Metaprogramming Python for Big Data [YouTube]

Make your data dense by tactically re-arranging into efficient structures and compiling it down to lower-level bytes. This details a successful Python / Postgres / Numba / Multicorn big data implementation.

Video & Course Grab Bag

Discover the role of Python in space exploration [course]

Microsoft and NASA made a free course about Python in space! 🤓

Ted Nelson | Computers for Cynics [YouTube]

I find these videos to be an entertaining, thought-provoking take on software history. Recommended from Joe Armstrong, the creator of Erlang.

GNU Typist [Tutorial]

You may be able to teach yourself to type more efficiently with this tutorial. I definitely need to do this. It's worth mentioning, per Rich Hickey: with a proper design phase, you'll spend less time typing in the first place!

Extra Credit: Python Wikipedia Library

import wikipedia [GitHub]

Supplementary Resources

posted at 03:35 · coding, data, Databases, education, programming, python, software · free education harvard knowledge learning MIT university

May 25, 2020

Integrating MySQL with Flask, pandas and pythonanywhere

Sometimes a spark comes from seemingly nowhere. That's when you reach for your tools and create. After a series of successful experiments, I decided this stack might be my quickest, best shot to get a functional website up and running in Python. I was pleasantly surprised to make rapid progress over the span of a quarantine weekend. Here are the steps to create a MySQL backed website with Flask.

Hosting With pythonanywhere

pythonanywhere is a web hosting service like GoDaddy. If you host your app with them, MySQL is the default database. Postgres integration is available at higher price tiers.

To get your Flask app's database up and running you need to:

Create your database (see the "Databases" tab in pythonanywhere)
Use the mysql terminal to create your tables
Use the mysql.connector API to connect to your table and execute SQL from your Flask app.

Essential MySQL Terminal Commands

Show MySQL Version

SELECT VERSION();

List tables in a database

SHOW TABLES;

Show All MySQL Variable Values

SHOW VARIABLES;

Creating a Table

CREATE TABLE Marijuana (id INT AUTO_INCREMENT PRIMARY KEY, email VARCHAR(350), date VARCHAR(350));

Create a Table with a JSON column

CREATE TABLE Marijuana (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `store` varchar(200) NOT NULL,
  `details` json DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Add New Column and specify column to insert AFTER

ALTER TABLE Marijuana
ADD COLUMN date VARCHAR(100) AFTER other_column;

Alter Datatype of a Column

ALTER TABLE Marijuana MODIFY id INT AUTO_INCREMENT PRIMARY KEY;

Describe a Table

DESCRIBE Marijuana;

View All Records in a Table

SELECT * from Marijuana;

Using LIKE in MySQL

Select 10 Newest Records

SELECT * from Marijuana ORDER BY id DESC LIMIT 10;

"Explaining" A Query

EXPLAIN SELECT * from Marijuana;

Using "ANALYZE TABLE" tooptimize themis periodically recommended by MySQL:

ANALYZE TABLE Marijuana;

Installing Libraries in PythonAnywhere

You can use pip to install python libraries within the PythonAnywhere bash terminal. Go to the consoles tab and start a new bash terminal. Then to install a library, such as pandas:

python -m pip3.8 install --user pandas

Flask app with mysql.connector API, SQL and pandas

A Flask app making a mysql database connection with pandas:

Creating an error log with logging.
Connecting to a mysql database hosted through Flask and pythonanywhere
Then reading a table to a pandas dataframe

import mysql.connector
from flask import Flask
import pandas as pd
from datetime import date
import logging
import sys

app = Flask(__name__)
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)


@app.route("/")
def hello_world():
    """Call database and return data from df. Then display homepage."""
    try:
        email_df = get_database_table_as_dataframe()
        print(email_df.shape)
        html_page = render_homepage()
        return html_page
    except:
        logging.exception("Failed to connect to database.")


def render_homepage():
    """Note: you should use Flask's render_template to render HTML files.
    However, for example you can make a quick f-string HTML page that works in this code.
    """
    html_page = f"""<html><head><link rel='stylesheet' href="/static/styles/some_file.css"><link rel="shortcut icon" type="image/x-icon" href="static/favicon.ico">
                    <Title>Dispensary Alerts</Title></head>
                    <body><h2></h2>
                    <p>Get alerts for your dope.</p><br>
                    <h6><b>Sign Up</b></h6><br>
                    <div class="form">
                    <form action="/add_signup_to_db" method="post" style="width:420px;text-align:center;display:block;" >
                    <input type="text" name="Signup Form">
                    <input type="submit" value="Submit">
                    </form></div><br><br>
                    <p><b>Current Time:</b>
                    {str(date.today())} </p></body></html>"""
    return html_page


def get_database_table_as_dataframe():
    """Connect to a table named 'Emails'. Returns pandas dataframe."""
    try:
        connection = mysql.connector.connect(
            host="username.mysql.pythonanywhere-services.com",
            db="username$DatabaseName",
            user="username",
            password="password",
        )

        email_df = pd.read_sql(sql="""SELECT * FROM Emails""", con=connection)
        logging.info(email_df.head())
        return email_df
    except:
        logging.exception("Failed to fetch dataframe from DB.")
        return "Oops!"


@app.route("/add_signup_to_db", methods=["GET", "POST"])
def add_signup_to_db(email, date):
    """Pass data as SQL parameters with mysql."""
    try:
        connection = mysql.connector.connect(
            host="username.mysql.pythonanywhere-services.com",
            db="username$DatabaseName",
            user="username",
            password="password",
        )
        cursor = connection.cursor()
        sql = """INSERT INTO Emails (message, date) VALUES (%s, %s) """
        record_tuple = (email, date)
        cursor.execute(sql, record_tuple)
        connection.commit()
    except mysql.connector.Error as error:
        logging.info("Failed to insert into MySQL table {}".format(error))
    except:
        logging.exception("Error inserting records to DB.")
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
        return "MySQL connection is closed"

Iterative Development

Below: making my website look less like a "my first HTML" website, experimenting with my app's message name and adding a sign-up form connected to the database.

Note: if you see this error when making a request in pythonanywhere:

OSError: Tunnel connection failed: 403 Forbidden

It's likely because you are "whitelisted" on the free plan. Upgrading to the $5/month plan will fix it!

Scoping The Full Stack

I'm really enjoying this web development stack. Here are all of the tools and library choices for this website:

HTML
CSS
web framework: Flask library
email: Flask-Mail library(SMTP)
API calls to external websites: requests and json libraries
data handling: MySQL database, mysql.connector API, pandas library
file system: logging, os and sys libraries
(may add) payment processing: Braintree Library
web hosting: pythonanywhere

Finding Your Flask Groove

Flask is a little scary at first, but reasonable once you get a grasp of the basic syntax. Using the logging module to establish access, error and server log feeds was a big step to finding my Python traceback fixing groove. It's a work in progress.

Recapping My Python Web Development and Database Experiences

I previously created a website with web2py, another Python web framework like Flask and Django. I think it was a decent choice for me at that point in my Python journey. Back then, I connected a MongoDB back-end to web2py. I randomly picked Mongo out of the DB hat and it worked well enough.

My Python Web Development and Database Tools

App #1 web2py + MongoDB

App #2 Flask + MySQL

Future App? py4web + pyDAL + PostgreSQL

Future App? tornado + streamlit (or) Flask + Dash (+ SQLite)

Of these two diverse Python stacks, I favor MySQL and Flask. But I learned a lot from watching web2py's tutorial videos and it's less intimidating for beginners. And I barely scratched the surface of web2py's "pure Python" pyDAL (Database Abstraction Layer), which seems pretty dope.

web2py's creator has a new framework in progress called py4web. It has the same DAL and inherits many other web2py qualities. Definitely looking forward to exploring the DAL on my first py4web website. I'll likely use it to connect to PostgreSQL or SQLite. Maybe I'll install pyDAL with pip in the meantime.

Final Thoughts

Both of my websites are hosted with pythonanywhere, which gives you a text editor and bash terminal to run your scripts in a shell environment. I'm so pleased with all of these tools. They fit together smoothly and made creating my website a fun experience.

posted at 21:48 · data, Databases, ODBC, pandas, programming, SQL, web development, web2py · flask mysql py4web pyDAL python

May 18, 2020

A Guide To Making HTTP Requests To APIs With JSON & Python

This contains all of my best API-related knowledge picked up since learning how to use them. All APIs have their own style, quirks and unique requirements. This post explains general terminology, tips and examples if you're looking to tackle your first API.

Here's what is covered:

API & HTTP Lingo You Should Know
Testing and Exporting Python Request Code from Postman (Optional)
Formatting Your Request
Example GET and POST Requests
"Gotchyas" To Avoid
Sidebar: requests.Session()
Dig deeper into requests by raising your HTTPConnection.debuglevel

Terminology Clarification: I will refer to "items" or "data" throughout this post. This could be substituted for contacts or whatever data you are looking for. For example, you might be fetching a page of contacts from your CRM. Or fetching your tweets from Twitter's API. Or searching the Google location API, you might look up an address and return geo-location coordinates.

API & HTTP Lingo You Should Know

Hypertext Transfer Protocol (HTTP)

Per Mozilla, "Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes. HTTP follows a classical client-server model, with a client opening a connection to make a request, then waiting until it receives a response."

HTTP: you = client. API = way to communicate with server

Application Programming Interface (API)

Per Wikipedia, the purpose of an API is to simplify "programming by abstracting the underlying implementation and only exposing objects or actions the developer needs."

Representational State Transfer (REST)

REST is an architectural style of web APIs. It is the dominant architecture that many APIs use. Simple Object Access Protocol (SOAP) is another style I've heard of, but it seems less common nowadays.

A REST API is built for interoperability and has properties like: "simplicity of a uniform interface" and "visibility of communication between components by service agents." [Wikipedia] If an API follows REST, it has many good principles baked in.

GET, POST and PATCH

These are three common types of request methods.

GET: Read data returned, such as all of your tweets in the Twitter API.
POST: Create a new item, like writing a new tweet. Can also update existing data. Tweets aren't editable though!
PATCH: Similar to POST, this is typically used for updating data.

URL or "Endpoint"

This is the website target to send your request. Some APIs have multiple endpoints for different functionality.

URL Parameters

Values you pass to tell the API what you want. They are defined by the API specifications, which are usually well documented. In Python's requests library, they may be passed as keyword arguments. Sometimes they are passable directly within the endpoint url string.

Body or "Payload"

To make a request, you send a payload to the url. Often this is a JSON string with the API's URL parameters and values, AKA the request body. If the API is written specifically for Python, it might accept an actual Python dictionary.

Javascript Object Notation (JSON)

JSON is the data interchange standard for all languages. Usually it is the default way to pass data into and receive data from an API. If making a POST, you can check your json object is formatted correctly by using a json linter. Or try Python's json.tool! You can also pretty print your JSON or python dictionary with the pprint module. If you're using json.dumps remember it has pretty printing accessible by keyword arguments! These features are accessible in the standard library. Isn't Python great? See also: Python 101 - An Intro to Working with JSON

Pages

API data is commonly returned in multiple pages when there is a lot of data returned. Each page can be accessed one request at a time. Sometimes you can specify how many items you want on a page. But there is usually a maximum items per page limit like 100.

Status Code

Each request usually gives you a numeric code corresponding to happened when the server tried to handle your request. There is also usually a message returned.

Headers

These usually contain website cookies and authorization info. They also may tell the API what kind of data you want back. JSON and XML are the two most common types of data to return. You can specify the return format in the content-type headers.

If you need to parse an XML response, check out Python's stock ElementTree API. I've only seen a few APIs using XML responses, such as the USPS Address Validation API.

Authorization

Authorization varies widely. This is the level of identification you need to pass to the API to make a request. Public APIs might require none. Some just need a username and password. Others use the Oauth standard, which is a system involving credentials and tokens for extra security.

Authorization Scheme Example [Mozilla]

Authorization: <auth-scheme> <authorisation-parameters>

# headers python dict example
headers = {"Authorization": f"basic {token}"}

Creating the Request JSON

I recommend using Postman in most cases, depending on the complexity of the API. If the JSON syntax is straightforward, you can format your data as a python dictionary, then convert it to a JSON object with json.dumps from the standard library's json module. But JSON can be tricky sometimes. You may also need to pass a dictionary of HTTP headers.

Some APIs have "Postman Collections", a set of Python (or any language) script templates for the API. In those cases, it might make sense to use those resources.

Path One: Make HTTP request with json & requests libraries

Format Python dict with json.dumps from the standard library's json module. Infer API requirements from documentation. Use requests for HTTP.

Path Two: Make HTTP request with Postman & requests library

Use Postman to generate the JSON payload. Plug headers and payload into requests. Use requests library for HTTP.

Postman has a friendly interface for plugging in all your pieces and tinkering with your request body until it works. Make it easier on yourself and use Postman, especially if there are collections. An alternative is to troubleshoot in Python if you are confident in your grasp of the API. I use both options depending on my familiarity with the API at hand.

Formatting Your Request

Once you have the request working, you may export your Postman request to almost any language. For Python, you can sometimes export to the requests, http.client or urllib libraries. Hit the "code" button in Postman and then copy your code.
Paste your Postman headers, payload and url into your existing code.
You may want to use a dict or string formatting to pass values to your request parameters or url.
If the API uses a token or other form of authorization that needs to be refreshed intermittently, I usually have a function that returns a token. token = fetch_token() Then put the token in the headers dict. {"Authorization": f"basic {token}"} Finally pass your headers and payload to your requests.get, requests.post, or requests.request function along with the endpoint url. You're now ready to test the request.

If you choose not to use Postman, you can use the json library. See the use of json.dumps() to convert a dictionary to a JSON object in example #2 below.

Python Installation

You can install requests with pip. Alternatively, http.client is included within the Python standard library. If you want to convert HTTP response data to a dataframe or csv, install pandas.

python -m pip install requests
python -m pip install pandas

Example #1: GET the geolocation details of any public location with the Google API

This was modified from another example of Google's Geolocation API. To use this, you need to create a developer account with Google and paste your API keys below.

import requests


# Find the best double-cheeseburger + fries $7 can buy.
payload = {"key": "Add_Google_API_Key_Here", "address": "Redhot Ranch"}
url = "https://maps.googleapis.com/maps/api/geocode/json"
# Optional: set a 5 second timeout for the http request.
r = requests.get(url=url, params=payload, timeout=5)
print(r.text)
print(r.status_code)
data = r.json()

# Extract the latitude, longitude and formatted address of the first matching location.
latitude = data["results"][0]["geometry"]["location"]["lat"]
longitude = data["results"][0]["geometry"]["location"]["lng"]
formatted_address = data["results"][0]["formatted_address"]
print(longitude)
print(latitude)
print(formatted_address)

# Optional: convert response into a dataframe with pandas.
# import pandas as pd
# location_df = pd.json_normalize(data['results'])
# location_df.to_csv('Locations.csv')

Above you can see:

requests makes it easy to see the server's text response also with response.text
requests also makes JSON encoding easy with response.json()
pd.json_normalize() is convenient to convert the response dictionary to a dataframe.

Example #2: Encode a Python dictionary to json string and POST to a hypothetical API

Create a dictionary with request body data and pretty inspect it with pprint.
Encode the json string with json.dumps from the standard library's json module.
POST the encoded JSON to the endpoint url with requests.

import pprint
import json
import requests


def dict_to_json_data():
    """Create request body with fictional contact details."""
    payload = {
        "first_name": "P",
        "last_name": "Sherman",
        "address": "42 Wallaby Way",
        "address_2": "",
        "city": "Sydney",
        "state": "NSW",
        "country": "AU",
        "zip": "2000",
    }
    pprint.pprint(payload)
    json_str = json.dumps(payload, ensure_ascii=True)
    # Optional: encode json str to utf-8.
    return json_str.encode("utf-8")


def post_data(json_str):
    """This is a fictional API request that passes a json object to requests.
    It decodes the server response with response.json() and
    Returns dictionary value by calling the data's keys.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "cache-control": "no-cache",
    }
    r = requests.request(
        method="POST",
        url="https://SomeSoftwareAPI.com/people/",
        data=json_str,
        headers=headers,
    )
    data = r.json()
    print(data.keys())
    # Call dict keys to get their values.
    contact_id = data["contact_id"]
    return contact_id


json_str = dict_to_json_data()
contact_id = post_data(json_str)

requests.request keyword argument alternatives for passing data

params – (optional) Dictionary, list of tuples or bytes to send in the query string for the Request.

data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request

json – (optional) A JSON serializable Python object to send in the body of the Request

[requests API documentation]

"Gotchyas" To Avoid

Status codes are your friend. They offer a hint at why your request is not working. If you see 200 or 201, that's a good sign. They're usually helpful, but sometimes they can be misleading.
Ensure you are defining the correct content-type. I had an experience where Postman defined two conflicting content-type headers and it caused my request to fail. The server's error message indicated the problem was in my JSON, so it took me a while to figure out the headers were the problem.
Sometimes it makes a difference if your url has http:// vs. https:// in it. Usually https:// is preferred.

Sidebar: requests.Session()

You might be able to improve performance by using a requests "session" object.

import requests


# A session adds a "keep-alive" header to your HTTP connection + stores cookies across requests.
s = requests.Session()
for page in range(0, 2):
    url = f"https://exampleapi.com/widgets/{str(page)}"
    r = s.get(url)
    print(r.text)

Dig deeper into requests by raising your HTTPConnection.debuglevel

HTTPResponse.debuglevel: A debugging hook. If debuglevel is greater than zero, messages will be printed to stdout as the response is read and parsed. Source: http.client Python Docs

from http.client import HTTPConnection
import requests


HTTPConnection.debuglevel = 1
payload = {"key":"Add_Google_API_Key_Here", "address":"90 Miles"}
url = "https://maps.googleapis.com/maps/api/geocode/json"
r = requests.get(url=url, params=payload, timeout=5)
print(r.text)

Web Server Gateway Interface (WSGI, pronounced "Wis-Ghee")

"As described in PEP3333, the Python Web Server Gateway Interface (WSGI) is a way to make sure that web servers and python web applications can talk to each other." Gunicorn is one of a few Python WSGI clients. web2py is another WSGI client and web framework I have used.

Conclusion

I remember APIs seemed mysterious and daunting before I had used them. But like all things, they can be conquered with knowledge, understanding and tenacity to keep trying until you figure it out. Good luck!

Requests Documentation

requests.request() API documentation

requests.get() API documentation

requests.post() API documentation

Supplementary Reading

Google's HTTP Timing Explanation

List of Interesting "Unofficial" APIs

Proxy servers

Making 1 million requests with python-aiohttp

Nginx

Create, read, update and delete (CRUD)

posted at 05:41 · Chicago, coding, HTTP, json, pandas, programming · api data python requests sandwiches web

May 06, 2020

Script Windows Like A Pro: Command Line, Batch Files, Remote Desktop Connection and pywin32

Here are a few useful corners of the vast array of Windows scripting tools.

Helpful Windows Command Line Documentation

A Few General Windows Commands

Use find to look in a text file to count the lines matching a string:

find /C "FAIL" < "Test_Results.txt"
# returns: 0 if no match or # of lines found, e.g. 2,50,100

I wrote a post on findstr, which offers similar functionality.

clip: pipe commands into the clipboard.
If: If Statements based on if files exist.
List ip address-related info:

ipconfig

Check system bit (usually 64-bit or 32-bit):

wmic os getosarchitecture

Automate Windows Scripts with Batch Files

Batch files can be run from command prompt or by double-clicking them. Here's an example of text in a batch file that activates a python virtual environment. Swap in your username and environment if you've created it.

cmd /k "cd C:\Users\your_username\PythonEnv\Scripts & activate & cd .. & dir"

Save above as a .bat file.
This uses cmd to open a new command prompt in a Windows batch file.
cdinto my python virtual env then activate it by running a batch file.
Then call dir to print directory contents.

Set a custom system 'last_name' variable to be recalled later.

set /p last_name=Enter a last name:
echo %last_name%
pause

Here we print it out with echo. Then pause.

Line continuation in batch files: Use ^ to continue your batch file scripts on a new line.

System Assessment Tools: powercfg and sfc

Display system stats:

systeminfo

Use powercfg to assess power, sleep and system states

powercfg /SLEEPSTUDY

Use sfc to perform a system file check:

# scan and repair
sfc /SCANNOW
# scan, but do not repair:
sfc /VERIFYONLY

Accessing a Remote Computer From the Command Line

You may want to ping a remote computer to see if it's running. Add your ip address instead of the below 1s and 0s:

ping 01.10.10.01

Log into your Remote Desktop with mstsc:

Run Remote Desktop Connection, save an RDP file from Windows Desktop Client.
You may need to adjust your credentials on your local machine.
Finally, trigger login to an active window from command prompt:

mstsc RDP_File_Name.rdp

WinRM and WinRS can allow terminal access to your Remote Desktop. You may need to set your wifi network to private. To configure winrm:

winrm quickconfig

Log into a remote computer with winrs and run ipconfig:

winrs -r:https://myserver.com -t:600 -u:administrator -p:$%fgh7 ipconfig

Check Out Python's pywin32 Module

This module is extremely useful for scripting out Windows applications. For example, I've made good use of its interfaces to Outlook and Task Scheduler. Install with pip:

python -m pip install pywin32

Here's an example to send an Outlook email:

import win32com.client

outlook = win32com.client.Dispatch('outlook.application')
mail = outlook.CreateItem(0)
mail.To = '[email protected]'
mail.CC = '[email protected]'
mail.Subject = 'Moneyball Review'
mail.Body = """Moneyball is an inspiring movie, based on real events.
            Brad Pitt, Jonah Hill and Philip Seymour Hoffmann gave great performances.
            The trade deadline scene is delightful. Wow.
            Chris Pratt as Hatteberg too. What a solid film.
            Money isn't everything. Playing ball is.
            """
mail.Attachments.Add('Baseball_Analysis.csv')
mail.Send()

posted at 14:11 · automation, command prompt, Scripts, Windows · command line remote computer scripting shell

Apr 11, 2020

Reflections on 5 Years of Solving Problems with Python

Prior to learning Python, I had no programming experience. I worked in marketing for a book publisher and did not perform well at my job. It was not a good fit. They eventually fired me. As my previous job unraveled, I discovered Python and the Coursera course, Programming for Everybody (Getting Started with Python). Fortunately, that course jump-started me onto a path of learning and reading each day. My aim was to make my own website, a goal that I accomplished. I needed to know how the sausage was made.

Looking back from 2020, I can safely say Python changed my life. Because of it, I was able to land a fulfilling marketingdata-oriented job. I'm also grateful for the financial stability that came with it. I love to learn about the language. I'll continue to improve my abilities to solve problems with new tools, not only Python.

Below are pieces of wisdom picked up from my experiences. They are the result of many hours of study, reading, mistakes, luck, toil and eventual glory.

These are thought-provoking adages and guidelines, not absolute truths in all cases.

Developing a habit of learning pays off over time, no matter what the subject is. It is an investment in yourself that compounds.
Follow your own curiosity. It's less important to compare what you know to others. Compare what you know today to what you knew yesterday. Don't worry about how long it takes to learn.
Watch educational or technical conference talks on sites like YouTube or InfoQ. Rich Hickey, Brandon Rhodes and David Beazley are some of my favorite speakers. Watch talks from all languages, not just Python. Often the concepts apply to any programming language.
Use an RSS reader. Anytime you find a good blog, subscribe by RSS or email to get new posts. I use the Feeder Chrome extension\Android app.
The Zen of Python contains a lot of wisdom. I like the concept of Explicit is better than implicit. This implies declaring your actions in written or oral fashion, providing additional context. Consider favoring easier to read solutions over clever one-liners. For example:
- List comprehensions are useful and "pythonic", use them! But sometimes it's easier to use a for loop to hash out an idea. (Contrarily, avoiding the Initialize Then Modify pattern benefits those comfortable with comprehensions.)
- Explicitly using keyword arguments versus positional arguments is another way to make your code easier to understand.
Can you explain the solution simply? If not, try to clarify your understanding or maybe there's a simpler way. In Python, there are often several ways to accomplish the same goal. But keep in mind the Zen of Python: There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Look for the obvious way. An example of this is string formatting. I've heard f-strings are the hot new way to do this now.
Don't be afraid to change course if things don't feel right. Ask yourself while coding, "Does this feel efficient?" Recently I was trying to format a json string so I approached it like I had in the past, by exporting the request from Postman and formatting the json string with python's format() built-in. But this time, the curly braces were confusing me, I was struggling and it wasn't working. I googled and around and saw python's json module and df.to_json() in pandas. They were a much easier and better-looking solution. But it still wasn't working. Finally, i used the Postman approach and f-strings to format a successful payload. The third try worked! F-strings are super nice and clean.
If you're stuck, there's probably a free online course or blog post that explains whatever is confusing you. Use the Googles. When in doubt, Google the error message.
Begin your project by writing a list of requirements. This often leads to good questions and cases that may need to be addressed. The book Code Complete 2 covers establishing project requirements in great detail, along with the other stages in the life-cycle of a software project. I'm really enjoying this book and highly recommend it.
Names are really important. Take time to think about a good name for your variables and functions. Also, name your scripts well. I name my scripts using action verbs. For example, my script that organizes accumulated files on my desktop into folders is named clean_desktop_files.py. When I see this script months later, its name reminds me the action the script is performing. I believe it's better to err on the side of longer, more descriptive names for variables and functions. It makes code easier to understand. But there is a trade-off with length to keep in check.
Moving a block of code into a function can abstract away repetitive code and increase its readability.
Each function should do one thing only. Follow the single-responsibility principle.
Train yourself to think in data structure conversions. The Python dictionary is very useful and can be converted to and from lists, tuples, sets, etc. I often find it more efficient to convert to a different structure to efficiently organize it. Usually I am googling things like "convert class object to python dictionary" because dictionaries are easy to work with or convert to other structures. The vars() built-in is great for converting objects to a dictionary. For example, once you have a dictionary, you might be able to solve your problem by converting it to a dataframe.
Use only the data you need. Reading in just the essential data helps avoid memory issues and hanging programs. In pandas, the usecols argument in pd.read_csv() is great for this. This creates a dataframe with 2 columns:

df = pd.read_csv('emails.csv', usecols=['name','email'])

Assume that if something is broken, it's because of something you've done. Start from the assumption that your code contains the bug and work outward by eliminating possibilities. Avoid jumping to quick conclusions. Instead, carefully consider possible reasons for why something is happening. Many times, I find my 2nd or 3rd hypothesis is actually true.
There will be times when you'll look at someone else's choices and wonder why they did things a certain way. Consider the possibility that they know more than you in this domain.
Beware of sequencing errors. Are your tasks, scripts or functions executing in an efficient order to reach your end goal? Look to unblock bottlenecks and correct chronological mistakes in your processes.
Before you send that email asking for help, go back and take another look. There's also no shame in asking for help. Be sure you proofread your email before sending.
Status code 200 does not guarantee your API request was successful. You may want to write a test to confirm success that doesn't rely on response status codes.
Unfortunately, testing gets shunned sometimes. Make it a priority. I enjoy writing pytest tests more than most other code. Why? Because tests confirm my scripts are working to some degree, detect bugs and provide a refactoring safety net.
Refactoring your code is a crucial step in making it better. Coming back to my code after a few weeks, months or years brings clarity, experience and a new perspective. It feels good to improve the quality of my old work.
Consolidate your tasks. Bundling things can save you a bundle of time! Identify redundant patterns and remove if possible. Observe yourself while working. Any repetitive manual process can probably be automated away. Recently, I figured out how to use a Windows batch file to instantly activate my Python virtual environment. It took me a few years of tediously pasting the cd and activate commands into command prompt every day to realize. Now it's a snap.
Stack Overflow is a useful resource. But the top answers may be outdated. Check the other less popular answers sometimes. Or...
Read the documentation! An updated or more elegant solution might be there. I recently found os.makedirs(path, exist_ok=True) in the os docs. I didn't know about the exist_ok argument. I was creating folders with a more complicated alternative from Stack Overflow for years. I use this way all the time now. In the same vein, if you need the local system username, the Python docs state getpass.getuser() is preferred over os.getlogin().
Write documentation explaining how to use your projects. Even if you can only muster a quick README text file, that's better than nothing. Within your code, docstrings are a nice addition. I have yet to use Sphinx, but it is a popular choice for generating documentation.
Teaching others feels good and solidifies your knowledge. Writing and pair programming are great ways to improve your understanding and pass your skills along to other people. While we're on the subject of writing...
Write everything down! Your head is not good at storing information in memory. Computers are. This frees your mind to come up with new ideas rather than expending energy to remember what you've already done. It also helps you plan. I use a Notepad text file to keep a running to-do list. You could also use services like Trello or Microsoft Planner. While writing code, use comments and docstrings conservatively for quick notes, clarifications or reminders. The important thing is to write it down somewhere.
When editing your writing, continually ask yourself, "Do I need this word or phrase?" for every word you write.

"Brevity is the soul of wit." - William Shakespeare (Hamlet)

Draw inspiration from culture, nature and professional disciplines outside of your own. Insights can be mined from anything. Don't dismiss a situation as mundane without first scanning for knowledge nuggets and gems.
Better solutions often come to me after gaining time and experience with a problem. Building software is an iterative cycle of adjustment towards consistently fulfilling the needs of those it serves in 100% of cases. In a perfect world, you'd never have bugs. But edge cases tend to pop up in ways you didn't think of when you first wrote a solution. There will also be projects where requirements or business rules change. Consider that possibility when you are designing your solution.
It's possible to find a job that you're excited about and genuinely enjoy the work.
Respect your craft, whether it's coding or another profession. A skilled carpenter needs precision, practice and focus to make something beautiful. Approach your craft with the same mindset and pride in making your best art.
We all have holes in our knowledge. Be receptive to other ways of thinking. The best way to learn is from other humans. Everyone has different backgrounds and experiences. I have never used object oriented programming, classes or certain command line tools like ssh. I have a loose understanding of these things but have not yet applied them to my projects. Working with paths (os and pathlib) still gives me fits sometimes. These are knowledge gaps that I want to fill in. Additionally, we don't know what we don't know. Try to illuminate the fog of your unknown.
Choosing to dedicate to learning Python is among the best decisions I've made.
Attitude is more important than intelligence. Anyone can learn to program, play guitar or fly an airplane. You can become an adept problem solver. Acquire an attitude to support your determination and persistence.

Brandon Rhodes: Stopping to Sharpen Your Tools - PyWaw Summit 2015

I'll leave you with the 4 P's and 4 C's from my Programming for Everybody Coursera course graduation ceremony. Cultivating these principles will guide you to growing your education and finding a positive course in life:

4 P's: Passion, Purpose, Persistence, Playfulness

4 C's: Choice, Commitment, Connection, Completion

Thank you for reading and I hope this post helps you on your own educational journey.

posted at 17:15 · coding, education, Marketing, programming, python, wisdom · advice learning problem solving writing

Mar 22, 2020

Easy Tune-ups For Your Windows Computer

This post covers 6 handy Windows tools:

Windows Experience Index
Disk Cleanup
Windows Update Troubleshooter
Windows Update
Microsoft Support and Recovery Assistant
Disk Defragmenter

First, get your baseline Windows Experience Index score.

These metrics are a way to monitor your system's performance over time.

Go to "Control PanelSystem and SecuritySystem"
Select "Your Windows Experience Index needs to be refreshed."
Note the metrics you see and the last updated date. If you've never refreshed them, these are likely your computer's factory scores.
Then click "Refresh Now" and note the results. These are your computer's current scores.

Run Disk Cleanup

The name says it all. Go to the Start menu and search 'Disk Cleanup'. Running this freed up 40 GB of C: drive space for me.

Install OS patches and software updates with Windows Update.

First, troubleshoot Windows Update to fix any errors.

The Windows Update Troubleshooteris supported by Windows and may help with updates that fail. I downloaded the troubleshooter for Windows Update and BITS, the Windows Background Intelligence Transfer System.

The troubleshooter analyzes Windows Update and tries to fix the errors it finds. After running, it provides a status update on the issues it finds.

Run Windows Update to upgrade your software.

Windows Update usually updates your software reliably. However, some updates may fail or are not triggered automatically. Installing updates, especially security patches for your operating system is typically a good idea.In my case, several Windows 7 OS security patches had not auto-updated, some from 6 months ago.

Checking for Software Updates

Go to your start menu and search for 'Windows Update'.
I clicked 'Check online for updates from Windows Update' also.
When you restart your computer, use a power cord for your laptop.
I found more new updates twice after installing new updates and restarting my system. Some updates are required before another update may be installed.

Microsoft Support and Recovery Assistant

Got Microsoft Errors? Check out the Microsoft Support and Recovery Assistant. It may help you if you're having trouble with Microsoft Office, Skype or any other Windows tools.

Finally, defragment your C: drive.

Defragmentation is like spring cleaning for your computer's hard disk. It optimizes your drive's data for more efficient computing and frees up space for other activities.

Go to your start menu and search for 'Disk Defragmenter'.
Click 'Analyze disk' to check your C: Drive's fragmented rate.
If the fragmented rate is above 10%, Windows recommends to defragment your C: Drive. As you can see below, mine had a whopping 48% fragmentation rate. 😨 My poor computer had never been defragged in 2.5 years of use.

46% Less Fragmented Disk Space After Two Defrags

Running the defragmenter once reduced my drive's fragmentation from 48% to 32%. Re-running the defragmenter dropped my C: drive to a 2% fragmented rate. That's more like it. 🤓

Fragmentation makes your hard disk do extra work that can slow down your computer. Removable storage devices such as USB flash drives can also become fragmented. Disk Defragmenter in Windows rearranges fragmented data so your disks and drives can work more efficiently.

Source: Ways to Improve Your Computer's Performance

My Windows 7 System Improvement Results

Raised Windows Experience Index base sub-score from 4.9 to 5.0/7.9.
Added 40 GB of hard drive space thanks to Disk Cleanup.
Patched operating system security vulnerabilities and all software is up to date.
Fixed any misbehaving Windows products.
Decreased fragmented drive space from 48% to 2%. Windows recommends keeping it under 10%.

On paper, that looks great. Hopefully it means less spinning lag wheels and programs not responding when you really shoulda saved that document... We'll see.

posted at 22:59 · performance, Windows · maintenance operating system windows update

← Previous Next → Page 10 of 14