## When is the best time to tweet?

Twitter has made a few recent changes that make it hard to follow the chronology of tweets in my timeline. I’ve mostly accepted that, but as a result I don’t feel like I have a sense of when my followers are most active anymore. (Of course, there are a few people whose active times would be classified as ‘always’.) Anyway, these changes make me feel a little disconnected from my roughly 175 current tweeps and that makes me sad.

But beyond the emotional pain of no longer being able to really, I mean really, connect with one’s twitter followers on an existential level, some people care about all those Twitter status markers like retweet, reply, and like counts. For them, knowing when their followers are most active can help improve those numbers and determine the best time to tweet.

Regardless of where you fall on that completely contrived spectrum, this is something that’s analyzable and would be cool to know. And for those that care, it can give some insight on the best time to engage with your Twitter followers.

So here’s a snippet of what I found using data from my Twitter followers’ activity over the last 28 days. Day of the week is along the y-axis, hour of the day is along the x-axis.

Number of unique followers posting or retweeting during a given hour

Number of retweets from followers during a given hour

You can see mid-afternoon is the most active time for my followers. Interestingly, as you get closer to Friday, the mid-afternoon activity increases in intensity and happens earlier.

I’m not going to bother running through code for this one in this write up. I’m thinking about throwing this up as a simple webservice for others to use; if so, I’ll do a detailed write up then.

## Resources for Learning Basic Python Programming

As part of the Python for Data Science video series I wanted to provide some basic Python programming resources for those who may be new to Python. The list of links below is designed to get new Python programmers off to a quick start and it focuses on things that are most relevant to data analysis. (E.g., there’s nothing in here about writing custom classes.)

If you have suggestions for other links, feel free to mention them in the comments below.

General overviews:

Installing packages:

Importing packages:

Data types:

Control flow:

Defining functions:

## Making GitHub Art

The contribution heatmaps on GitHub profiles are interesting. Although they are intended to be passive data visualizations, they don’t have to be. Specifically, they can act as a 7xN-pixel (very slowly) scrolling display. After realizing this, I decided I had to do something to shape the blank canvas that is my GitHub commit log.

“An artist is somebody who produces things that people don’t need to have.”
― Andy Warhol

### The plan

Ostensibly, it should be pretty straightforward. The color of each cell of the heatmap is based on the number of commits made that day, so one just needs to automate the appropriate number of commits per day to get the desired shading. For simplicity, I decided to start by using the darkest shade possible to build some text.

### The execution

And to be honest, it pretty much was that simple. The most difficult part was finding a Python library to automate the git commits. Many StackOverflow discussions essentially suggested rolling your own functions because it is relatively simple and flexible. Had I been building something I cared about more, that might have been the way to go, but I was determined not to spend more than a few minutes on this project and I didn’t need a lot of flexibility. I really wanted to find something off-the-shelf with good documentation.

#### Connecting to GitHub

I tried a few valiant entries into the Python/GitHub API space, but what some lacked in functionality the others lacked in documentation. Finally, I tried github3.py and found the right mix. Without too much trouble, I was able to automate connecting to GitHub and making commits. After a little research it looked like ~40 commits per day would be enough to keep the color scaling the way I wanted it.

There is a link to the GitHub repo at the end of this post. These are the main functions for connecting and committing to GitHub:

import csv
import time
from github3 import login

# Comma separated credentials are stored
# in the first row of auth.csv.
# (The helper name and repo lookup below are illustrative; see the
# linked repo for the exact code.)
def connect_to_github():
    with open('auth/auth.csv', newline='') as f:
        for row in csv.reader(f):
            user, password = row
    session = login(user, password)
    return(session)

session = connect_to_github()
repo = session.repository('bryancshepherd', 'GitHubArt')

# The function that submits the commits.
# num_of_commits should be set to something quite a bit higher
# than your normal number of daily commits. Changing num_of_commits
# may also require changing sleep_time so that things still
# complete in a reasonable amount of time.
def do_typing(num_of_commits=30, sleep_time=20):
    for i in range(num_of_commits):
        # Create a file (one commit)
        data = 'typing file'
        repo.create_file(path='files/dotfile.txt',
                         message='Create dot file',
                         content=data.encode('utf-8'))

        # Get the file reference for later use
        file_sha = repo.contents(path='files/dotfile.txt').sha

        # Delete the file (a second commit)
        repo.delete_file(path='files/dotfile.txt',
                         message='Delete dot file',
                         sha=file_sha)

        time.sleep(sleep_time)


#### Translating letters to a usable format

With a way to connect in hand, the code needed to know when to connect. Basically, I needed an on/off switch for every day represented on the heatmap. If the switch is on, the committing function should run, making the cell dark. If it is off, the committing function shouldn’t run, leaving the cell gray (or close to it, depending on what other commits are made that day).

Since we’re using the heatmap to display text, a matrix-based font seemed to make sense. If you’ve seen dot-matrix font styles, these will look familiar. Each position in the matrix corresponds to a day on the heatmap. I used values of ‘1’ and ‘0’ to indicate on and off days, respectively. (And technically these are lists, not matrices, but they are laid out like matrices to make them easier to create.)

As an example, here is the setup for the letter ‘A’:

letters_dict = {
    'A' : [0,1,1,1,0,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0,
           1,1,1,1,1,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0],
    ...
}


These matrices are time consuming to create, so I’ve only created the few that I needed. If you make more, feel free to send them along via a pull request.

#### Automating and scheduling the runs

Now that I had a way to do commits programmatically and something to commit, I needed a way to schedule the Python script to run at the appropriate time. The ultimate goal was to be able to tell the script what I wanted to do at the beginning and have it run unsupervised for a few weeks until it completed.

This was achieved with PythonAnywhere.com and a little bash script. PythonAnywhere is a Python-oriented hosting environment. Among many other things, it can be used to schedule Python scripts to run at certain times of the day. A free account allows one daily task and HTTP calls to a whitelist of domains. Fortunately, one task is all we need to run and GitHub.com is on the whitelist.

After uploading the Python code, I created a really simple bash script that calls the main Python script and is scheduled to run daily:

#!/bin/sh
python3.5 GitHubArt/main.py '$echo "Hi"' '2016-07-31'


And that’s all. The first parameter is the message to display on the GitHub heatmap, the second is the date on which to start typing. Since the GitHub heatmap starts with Sunday at the top, this date should also be a Sunday.
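
For anyone curious how the daily run decides whether to “type,” here is a minimal sketch of the scheduling logic. The helper name and details are illustrative rather than the repo’s exact code; it assumes the letters_dict and do_typing shown above and a start date that falls on a Sunday.

from datetime import date

# Illustrative sketch only; the function name and structure are
# placeholders, not the repo's exact implementation.
def should_commit_today(message, start_date, today=None):
    today = today or date.today()
    # Concatenate the 7x6 letter matrices column by column, so each
    # column corresponds to one week (one column of the heatmap).
    columns = []
    for letter in message.upper():
        matrix = letters_dict[letter]  # 42 values, row-major (7 rows x 6 cols)
        for col in range(6):
            columns.append([matrix[row * 6 + col] for row in range(7)])
    days_elapsed = (today - start_date).days
    if days_elapsed < 0 or days_elapsed >= len(columns) * 7:
        return False
    week, weekday = divmod(days_elapsed, 7)  # heatmap column, then row
    return columns[week][weekday] == 1

# Example: if today is an 'on' day, run the committing function.
if should_commit_today('A', date(2016, 7, 31)):
    do_typing(num_of_commits=40)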

### Philosophical Implications

I took a very obvious approach for someone with no artistic talent – I am using this functionality to print out a *nix command. Christo would not be impressed. Honestly, though, the prospect of using this to make art seems really cool. Given that the intensity can differ for each cell in the heatmap, it is essentially as versatile as a grayscale palette. If I had an artistic bone in my body I might give it a shot. For now, I’ll just use text and appreciate my simple creations as a Buddhist would, for their intrinsic and ephemeral beauty.

You can find all the project files here:
https://github.com/bryancshepherd/GitHubArt

## Article XII Alexa Skill

I’ve redone this post several times already and haven’t been able to get the tone in line with the rest of the site, so I’ll just stick to the facts:

• Donald Trump said as president he would support Article XII of the Constitution.
• There is no Article XII of the Constitution.
• I love my Amazon Echo.
• I have been wanting to write a skill for my Amazon Echo (a.k.a. Alexa).
• I wrote an Alexa skill that describes some things Article XII might cover, if it existed.
• It is mostly based on this work by Tim Carr.
• If you want to submit additional things that Article XII might cover (if it existed), tweet them to @bryancshepherd with the hashtag #whatisarticleXII or post them in the comments. I will add them to the next version of the skill if the initial version passes Amazon’s review.
• There is no chance this skill will pass Amazon’s review.

One of the responses below is selected at random each time Alexa is asked ‘Alexa, what is Article Twelve?’

Article XII of the U.S. Constitution…

• requires that all dogs be trained to shoot free throws, in the event that such a skill is required to settle an international dispute.
• governs the creation, distribution, and taxation of loofahs.
• states that harboring pink, furry, intergalactic lifeforms is prohibited.
• makes it illegal to carry different denominations of change in the same pocket.
• describes the process for making dingy whites all nice and sparkly again.
• requires that cats look apathetic and nonchalant after doing something dumb.
• certifies these are not the droids you’re looking for.
• prohibits making the ‘It must be free’ joke to cashiers when products do not ring up correctly.
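
For the curious, the skill’s response logic boils down to picking one of those strings at random. Here is a minimal sketch; the deployed skill follows Tim Carr’s fact-skill template, so the names below are illustrative, not the actual skill code.

import random

# Illustrative sketch of the response selection, using a few of the
# responses listed above.
ARTICLE_XII_FACTS = [
    "requires that all dogs be trained to shoot free throws, in the event "
    "that such a skill is required to settle an international dispute.",
    "governs the creation, distribution, and taxation of loofahs.",
    "certifies these are not the droids you're looking for.",
]

def get_article_xii_response():
    return "Article XII of the U.S. Constitution " + random.choice(ARTICLE_XII_FACTS)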

## Tracking Reddit Freshness with Python, D3.js, and an RPi

#### Background

A couple of years ago I purchased some Raspberry Pis to build a compute cluster at home. A cluster of RPis isn’t going to win any computing awards, but they’re fun little devices and since they run a modified version of Debian Linux, much of what you learn working with them is generalizable to the real world. Also, it’s fun to say you have a compute cluster at your house, right?

Unfortunately, the compute cluster code and project notes are lost to the ages, so let us never speak of it again. However, after finishing that project I moved on to another one. That work was almost lost to the ages too, but as I was cleaning up the RPis for use as print and backup servers, I came across the aging and dusty code. Although old, this code might be useful to others, so I figured I would write it up quickly and post the code to GitHub. This post describes using the Raspberry Pi, Python, Reddit, D3, and some basic statistics to set up a simple dashboard that displays the freshness of the Reddit front page and r/new.

#### Dashboarding

D3.js was the exciting new visualization solution at the time so I decided to use the Pi’s Apache-based webstack to serve a small D3-based dashboard.

#### Getting the data

Raspberry Pis come with several OS choices. Raspbian is a version of Debian tailored to the Raspberry Pi and is usually the best option for general use. It comes preinstalled with Python 2.7 and 3, meaning it’s easy to get up and running for general computing. Given that Python is available out of the box, I often end up using the PRAW module to get toy data. PRAW is a wrapper for the Reddit API. It’s a well written package that I’ve used for several projects because of its ease of use and the depth of the available data. PRAW can be added in the usual way with:

sudo python3 -m pip install praw

(If you’re not using Python 3, substitute your Python version in the command above.)

PRAW is straightforward to use, but I’m trying to keep this short so it’s beyond the scope of this write up. You can check out the well-written docs and examples at the link above and also check my code in the GitHub repo.

#### Analyzing the data

The analysis of the data was just to get to something that could be displayed on the dashboard, not to do fancy stats. The Pearson correlation numbers are essentially just placeholders, so don’t put much weight in them. However, the r/new analysis is based on the percent of new articles on each API pull and ended up showing some interesting trends – just a reminder that value comes from the appropriateness of your stats, not the complexity of them.
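
For reference, the “percent new” measure is just a comparison of each r/new pull against the previous one. A minimal sketch (the function name is illustrative, not the repo’s exact code):

# Sketch of the 'percent new' calculation: the share of article IDs in the
# current r/new pull that were not present in the previous pull.
def percent_new(current_ids, previous_ids):
    if not current_ids:
        return 0.0
    new_count = len(set(current_ids) - set(previous_ids))
    return 100.0 * new_count / len(current_ids)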

Where statistics or data manipulation were required they were done with Scipy and/or pandas. The Pearson correlation metric is defined as:

$r_{xy} =\frac{\sum ^n_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^n_{i=1}(x_i - \bar{x})^2} \sqrt{\sum ^n_{i=1}(y_i - \bar{y})^2}}$

but you knew that. I just wanted to add some LaTeX to make this writeup look better.
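
In practice that just amounts to a call to Scipy. A minimal example (the rank vectors here are made up for illustration):

from scipy.stats import pearsonr

# Example: correlate article ranks from the previous front-page pull with
# their ranks on the current pull (the placeholder stat described above).
previous_ranks = [1, 2, 3, 4, 5]
current_ranks = [2, 1, 3, 5, 4]
r_value, p_value = pearsonr(previous_ranks, current_ranks)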

#### Displaying the data

D3 is the main workhorse in the data visualization, primarily using SVG. The code is relatively simple, as far as these things go. There are essentially two key code blocks, one for displaying the percent of new articles in r/new, the other for displaying a correlation of article ranks on the front page. Below is a snippet that covers the r/new chart:


// Create some required variables
var parseDate = d3.time.format("%Y%m%d%H%M%S").parse;
var x = d3.time.scale()
.range([0, width]);
var y = d3.scale.linear()
.range([height, 0]);
var xAxis = d3.svg.axis()
.scale(x)
.orient("bottom");
var yAxis = d3.svg.axis()
.scale(y)
.orient("left");

// Define the line
var line = d3.svg.line()
.x(function(d) { return x(d["dt"]); })
.y(function(d) { return y(+d["pn"]); });

// Skip a few lines ... (check the code in the repo for details)

// Set some general layout parameters
var s = d3.selectAll('svg');
s = s.remove();
var svg2 = d3.select("body").append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");

// Bring in the data
d3.csv("./data/corr_hist.csv", function(data) {

dataset2 = data.map(function(d) {
return {
corr: +d["correlation"],
dt: parseDate(d["datetime"]),
pn: +d["percentnew"],
rmsd : +d["rmsd"]
};
});

// Define the axes and chart text
x.domain(d3.extent(dataset2, function(d) { return d.dt; }));
y.domain([0,100]);
svg2.append("g")
.attr("class", "x axis")
.attr("transform", "translate(0," + height + ")")
.call(xAxis);
svg2.append("g")
.attr("class", "y axis")
.call(yAxis)
.append("text")
.attr("transform", "rotate(-90)")
.attr("y", 6)
.attr("dy", ".71em")
.style("text-anchor", "end")
.text("% new");
svg2.append("path")
.datum(dataset2)
.attr("class", "line")
.attr("d", line);

svg2.append("text")
.attr("x", (width / 2))
.attr("y", -8)
.attr("text-anchor", "middle")
.style("font-size", "16px")
.style("font-weight", "bold")
.text("Freshness of articles in r/new");

});


And here’s a screenshot of what you get with that bit of code:

As you can see, the number of new submissions follows a very strong trend, peaking in the early afternoon (EST) and hitting a low point in the early morning hours. Depending on your perspective, Americans need to spend less time on Reddit at work or Australia needs to pick up the pace.

If you want to know more of the details head on over to the GitHub repo.

## Physical Computing – Puppies VS Kittens on Reddit

“The best way to have a good idea is to have a lot of ideas.” – Linus Pauling

#### Physical Computing

Although the term is used in a lot of ways, physical computing usually refers to using software to monitor and process analog inputs and then use that data to control mechanical processes in the real world. It’s already commonplace in some areas (e.g., autopilots), but it will be all the rage as the Internet of Things grows and automates. It’s also often used in interactive art and museum exhibits like the Soundspace exhibit at the Museum of Life and Science in Durham, NC. In this case we’re manipulating the brightness of two LEDs based on the popularity of animals on Reddit, which I’d say is closer to the art end of the spectrum than the autopilot end.

#### RPis

Raspberry Pis are great for generating ideas. Because they consume very little power and have a very small form factor, they almost beg you to think of a task to set them on and then shove them away in a corner for a while to work on it. Because they run a modified version of Debian, it’s easy to take advantage of things like the Apache webstack and Python to get things up and running quickly. In fact, in a previous post, I showed an example of using a Pi to fetch data and serve it to a simple D3-based dashboard.

This is another project that takes advantage of the Reddit API, Python, and the RPi’s GPIO interface to visually answer the age-old question “What’s more popular on the internet – puppies or kittens?”

#### Get data via the Reddit API

Reddit has a very easy to use API, especially in combination with the Python PRAW module, so I use it for a lot of little projects. On top of the easy interface, Reddit is a very popular website, ergo lots of data. For this project I used Python to access the Reddit API and grab frequency counts for mentions of puppies and kittens. As you can see in the code (GitHub repo), I actually used a few canine and feline related terms, but ‘puppies’ and ‘kittens’ are where the interest and ‘aww’ factor are, so I’m sticking with that for the title.

The PRAW module does all the work getting the comments. After installing the module all that’s required is three lines of code:

import praw
r = praw.Reddit('Term tracker by u/mechanicalreddit') # Put your own Reddit username in the user agent string
allComments = r.get_comments('all', limit=750) # The maximum number of comments that can be fetched at a time is 1000


You now have a lazy generator (allComments) that you can work with to pull comment details. After fetching the comments, tokenizing, and a few other details that you can look at in the Git repo, we have a string of tokens (resultsSet) that we can send to a function that keeps a running sum for each set of terms:

def countAndSumThings(resultsSet, currentCounts):
    resultsSet = resultsSet.lower()
    for thing in currentCounts:
        thingLower = thing.lower()
        searchThing = ' ' + thingLower + ' '
        thingCount = resultsSet.count(searchThing)
        currentCounts[thing] += thingCount
    return currentCounts


#### Visualizing the data – i.e., Little Blinky Things

Setting up the RPi GPIO circuitry to control the LEDs is beyond the scope of this post, and it’s covered better elsewhere than I could manage here. Here are a couple of resources you may find helpful:

• http://www.instructables.com/id/Easiest-Raspberry-Pi-GPIO-LED-Project-Ever/
• http://www.thirdeyevis.com/pi-page-2.php
• http://raspi.tv/2013/how-to-use-soft-pwm-in-rpi-gpio-pt-2-led-dimming-and-motor-speed-control

On the software side, the RPi.GPIO module provides the basic functionality. The first part of this code initializes the hardware and prepares it to handle the last two lines which set the brightness of the LEDs.

# The RPi/Python/LED interface
import RPi.GPIO as GPIO ## Import GPIO library
GPIO.setmode(GPIO.BCM) ## Use Broadcom (BCM) pin numbering

# Initialize LEDs
# Green light
GPIO.setup(22, GPIO.OUT) ## Set up GPIO pin 22 as an output

# Initialize pulse width modulation on GPIO 22. Frequency=100Hz and OFF
pG = GPIO.PWM(22, 100)
pG.start(0)

# Red light
GPIO.setup(25, GPIO.OUT) ## Set up GPIO pin 25 as an output

# Initialize pulse width modulation on GPIO 25. Frequency=100Hz and OFF
pR = GPIO.PWM(25, 100)
pR.start(0)

# Skip some intermediary code...(see repo for the details)

# Update lighting
pG.ChangeDutyCycle(greenIntensity)
pR.ChangeDutyCycle(redIntensity)


Put this in a loop and you’re good to go with continual updating.
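
For a sense of how the pieces fit together, here is a rough sketch of such a loop. It reuses r, countAndSumThings, pG, and pR from the snippets above; the term lists, the join-and-lowercase tokenizing step, and the raw-percentage intensity mapping are simplifications of what’s in the repo (and, as discussed below, raw percentages turn out not to map well to perceived brightness).

import time

dogCounts = {'puppies': 0, 'dogs': 0}
catCounts = {'kittens': 0, 'cats': 0}

while True:
    # Pull a fresh batch of comments and flatten them into one string
    comments = r.get_comments('all', limit=750)
    tokens = ' '.join(comment.body for comment in comments)

    # Update the running counts for each set of terms
    dogCounts = countAndSumThings(tokens, dogCounts)
    catCounts = countAndSumThings(tokens, catCounts)

    # Map the counts to duty-cycle percentages and update the LEDs
    total = sum(dogCounts.values()) + sum(catCounts.values())
    if total > 0:
        greenIntensity = 100.0 * sum(dogCounts.values()) / total
        redIntensity = 100.0 * sum(catCounts.values()) / total
        pG.ChangeDutyCycle(greenIntensity)
        pR.ChangeDutyCycle(redIntensity)

    time.sleep(60)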

And here’s a picture of the final setup. I added a cover to disperse the light a little bit and help with some of the issues with visual perception discussed below:

#### Puppies or Kittens?

If the Internet is 90% cats, Reddit is an anomaly. Dogs were usually the most popular topic of conversation.

#### Some frustrations with…

##### …hardware

I didn’t have any breadboard jumper wires to connect the LEDs and I had some difficulty finding them. I had some male-to-male jumpers from an Arduino kit, but connecting an RPi to a breadboard requires female-to-male connectors. I expected Radio Shack to have them, but no luck. Incidentally, an old 40-wire IDE connector will also work with the newer 40-pin RPi GPIO header, but an 80-wire connector will not. Note that while both types of cable have 40-socket headers and will physically fit onto the RPi board, attempting to use an 80-wire cable will almost certainly break your Pi. If you’re not sure which type of cable you have, just count the ridges formed by the individual wires in the ribbon. If you still have wires to count after you get to 40, you can’t use that cable. Unless, that is, you want to do what I ultimately did and use individual wires from the cable:

The individual wires of IDE cables are easy to separate and can be stripped with a tight pinch. Because they are single wires, they are easier to manage than if you tried to work with something like speaker wire.

##### …the visualization method

Brightness, it turns out, is not a great way to indicate magnitude. Because of the way our eyes perceive color and the way that LED brightness is affected by increases in current, it’s not a very clear comparison. The relationship between LED brightness and current is not linear, which means the raw percentages can’t be used directly as intensity levels.
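
A common workaround (not something this project implemented) is to run the percentage through a rough gamma-style correction so the duty cycle tracks perceived brightness a little better:

# Hypothetical tweak, not part of the original project: apply a power-law
# (gamma) correction before setting the duty cycle.
def perceptual_duty_cycle(percent, gamma=2.2):
    return 100.0 * (percent / 100.0) ** gamma

pG.ChangeDutyCycle(perceptual_duty_cycle(greenIntensity))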

##### …the data

Reddit gets very busy sometimes, and during those times posting of comments can slow down considerably due to commenters receiving page errors and a comment backlog developing. Presumably every comment eventually gets posted and is available to the API, but I’m not 100% sure. If comprehensiveness was a concern this would need to be looked into in more detail.

#### What’s next

In hindsight it would have been better to make the lights blink faster or slower rather than change the brightness. If there’s a version 0.2 of this project, that will be one of the changes.

Currently you have to change the code directly to change the search terms. Although setting up an interactive session or GUI is overkill for this application, it would be pretty easy to have the Reddit bot that does the term searches check its Reddit messages every so often for a list of new search terms. In that case, changing the search terms would be as easy as sending a Reddit message in some predefined format.
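
As a sketch of that idea (hypothetical code, not in the repo; it assumes the PRAW session is authenticated so the bot can read its own inbox):

# Hypothetical sketch of the message-based configuration idea: check the
# bot's unread messages for a line like "terms: puppies, kittens, ..."
def check_for_new_terms(r, current_terms):
    for message in r.get_unread(limit=None):
        if message.body.lower().startswith('terms:'):
            current_terms = [t.strip() for t in message.body[len('terms:'):].split(',')]
        message.mark_as_read()
    return current_terms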

And finally, maybe you don’t care about the relative popularity of pets on Reddit. With the upcoming election, plugging in the candidate names might be interesting. Now that I have some real breadboard wires, I’ll probably set that up the next time I have a free weekend. But go ahead, feel free to clone the repo and do something useful.

## Calling Python from R with rPython

Python has generated a good bit of buzz over the past year as an alternative to R. Personal biases aside, an expert makes the best use of the available tools, and sometimes Python is better suited to a task. As a case in point, I recently wanted to pull data via the Reddit API. There isn’t an R package that provides easy access to the Reddit API, but there is a very well designed and documented Python module called PRAW (or, the Python Reddit API Wrapper). Using this module I was able to develop a Python-based solution to get and analyze the data I needed without too much trouble.

However, I prefer working in R, so I was glad to discover the rPython package, which enables calling Python scripts from R. After finding rPython, I was able to rewrite my purely Python script as a primarily R-based program.

If you want to use rPython there are a couple of prerequisites you’ll need to address if you haven’t already. No surprise, you’ll need to have Python installed. After that, you’ll need to install the PRAW module via pip install praw. Finally, install the rPython package from CRAN. (But see the note below first if you’re on Windows.)

After you’ve completed those steps, it’s as easy as writing your Python script and adding a line or two to your R code.

First create a Python script that imports the praw module and does the first data call:

import praw

# Set the user agent information
# IMPORTANT: Change this if you borrow this code. Reddit has very strong
# guidelines about how to report user agent information
r = praw.Reddit('Check New Articles script based on code by ProgrammingR.com')

# Create a (lazy) generator that will get the data when we call it below
new_subs = r.get_new(limit=100)

# Get the data and put it into a usable format
new_subs=[str(x) for x in new_subs]


Since the Python session is persistent, we can also create a shorter Python script that we can use to fetch updated data without reimporting the praw module:

# Create a (lazy) generator that will get the data when we call it below
new_subs = r.get_new(limit=100)

# Get the data and create a list of strings
new_subs=[str(x) for x in new_subs]


Finally, some R code that calls the Python script and gets the data from the Python variables we create:

library(rPython)

# Load/run the main Python script
# (the file name here is a placeholder for whatever you saved the script as)
python.load("get_new_subs.py")

# Get the variable
new_subs_data <- python.get('new_subs')