## When is the best time to tweet?

Twitter has made a few recent changes that make it hard to follow the chronology of tweets in my timeline. I’ve mostly accepted that, but as a result I don’t feel like I have a sense of when my followers are most active anymore. (Of course, there are a few people whose active times would be classified as ‘always’.) Anyway, these changes make me feel a little disconnected from my roughly 175 current tweeps and that makes me sad.

But beyond the emotional pain of no longer being able to really, I mean really, connect with one’s Twitter followers on an existential level, some people care about all those Twitter status markers like retweet, reply, and like counts. For them, knowing when their followers are most active can help improve those numbers and determine the best time to tweet.

Regardless of where you fall on that completely contrived spectrum, this is something that’s analyzable and would be cool to know. And for those that care, it can give some insight on the best time to engage with your Twitter followers.

So here’s a snippet of what I found using data from my Twitter followers’ activity over the last 28 days. Day of the week is along the y-axis, hour of the day is along the x-axis.

Number of unique followers posting or retweeting during a given hour

Number of retweets from followers during a given hour

You can see mid-afternoon is the most active time for my followers. Interestingly, as you get closer to Friday, the mid-afternoon activity increases in intensity and happens earlier.

I’m not going to bother running through code for this one in this write up. I’m thinking about throwing this up as a simple webservice for others to use; if so, I’ll do a detailed write up then.
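
For anyone who wants a starting point in the meantime, here is a minimal sketch of the binning step behind a chart like the first one. It is not the code used for the charts above. It assumes the activity has already been collected as `(user_id, timestamp)` pairs; the function name `activity_grid` and the sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def activity_grid(events):
    """Count unique users per (weekday, hour) cell.

    `events` is an iterable of (user_id, datetime) pairs (an assumed format).
    Returns a 7x24 list of lists; row 0 is Monday, column 0 is midnight.
    """
    seen = defaultdict(set)
    for user_id, ts in events:
        seen[(ts.weekday(), ts.hour)].add(user_id)
    return [[len(seen[(day, hour)]) for hour in range(24)] for day in range(7)]


# Made-up sample data, only to show the shape of the input.
sample = [
    ("alice", datetime(2016, 3, 7, 14, 5)),   # a Monday, 2 pm
    ("alice", datetime(2016, 3, 7, 14, 40)),  # same user, same hour: counted once
    ("bob", datetime(2016, 3, 7, 14, 59)),
    ("bob", datetime(2016, 3, 11, 13, 0)),    # a Friday, 1 pm
]
grid = activity_grid(sample)
```

The resulting grid can be handed to any heatmap plotting function. Counting retweets instead of unique users would mean summing events per cell rather than collecting users in a set.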

## Resources for Learning Basic Python Programming

As part of the Python for Data Science video series I wanted to provide some basic Python programming resources for those who may be new to Python. The list of links below is designed to get new Python programmers off to a quick start and it focuses on things that are most relevant to data analysis. (E.g., there’s nothing in here about writing custom classes.)

If you have suggestions for other links, feel free to mention them in the comments below.

General overviews:

Installing packages:

Importing packages:

Data types:

Control flow:

Defining functions:

## Left-Handed Chord Charts for Guitar and Mandolin

Left-handed chord charts are hard to come by, especially charts that are good for displaying on an iPad or printing and keeping in a songbook. Given that frustration, you can imagine how glad I was to come across some pretty comprehensive left-handed chord charts for guitar, mandolin, and ukulele on a site called Matt’s Music Monday.

Matt has done a really great thing for all us lefties out there. This format is perfect for quick reference as you’re trying to learn a song. I’ve posted a snippet of one of his charts below so you can see why you should go check them out. As you can see, this is good stuff. So if you’re looking for left-handed chord charts for guitar, mandolin, or ukulele, head over to his site to get the original files; they’re more complete and that way he gets the site traffic love.

## Making GitHub Art

The contribution heatmaps on GitHub profiles are interesting. Although they are intended to be passive data visualizations, they don’t have to be. Specifically, they can act as a 7xN pixel –very slowly– scrolling display. After realizing this, I decided I had to do something to shape the blank canvas that is my GitHub commit log.

“An artist is somebody who produces things that people don’t need to have.”
― Andy Warhol

## The plan

Ostensibly, it should be pretty straightforward. The color of each cell of the heatmap is based on the number of commits made that day, so one just needs to automate the appropriate number of commits per day to get the desired shading. For simplicity, I decided to start by using the darkest shade possible to build some text.

## The execution

And to be honest, it pretty much was that simple. The most difficult part was finding a Python library to automate the git commits. Many StackOverflow discussions essentially suggested rolling your own functions because it is relatively simple and flexible. Had I been building something I cared about more, that might have been the way to go, but I was determined not to spend more than a few minutes on this project and I didn’t need a lot of flexibility. I really wanted to find something off-the-shelf with good documentation.

#### Connecting to GitHub

I tried a few valiant entries into the Python/GitHub API space, but what some lacked in functionality the others lacked in documentation. Finally, I tried github3.py and found the right mix. Without too much trouble, I was able to automate connecting to GitHub and making commits. After a little research it looked like ~40 commits per day would be enough to keep the color scaling the way I wanted it.

There is a link to the GitHub repo at the end of this post. These are the main functions for connecting and committing to GitHub:

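from github3 import login
import csv
import time

# Comma separated credentials are stored
# in the first row of auth.csv.
# NOTE: parts of this function were lost from the post. The name
# get_session and the csv parsing are a best-guess reconstruction.
# GitHub's API no longer accepts account passwords, so the second
# value likely needs to be a personal access token: login(token=...).
def get_session(auth_path='auth/auth.csv'):
    with open(auth_path, newline='') as f:
        text = csv.reader(f)
        for row in text:
            username, password = row[0], row[1]
            break
    session = login(username, password=password)
    return session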

# The function that submits the commits.
# The number of commits should be set to
# something quite a bit higher than your
# normal number of daily commits. Changing num_of_commits
# may also require changing sleep_time
# so that things still complete in a reasonable
# amount of time.
# `repo` is a github3 repository object, e.g. from
# session.repository(owner, name). It is passed in here because
# the original repository lookup did not survive in the post.
def do_typing(repo, num_of_commits=30, sleep_time=20):

    for i in range(num_of_commits):
        # Create a file (the commit message is an added placeholder;
        # create_file requires one)
        data = 'typing file'
        repo.create_file(path = 'files/dotfile.txt',
                         message = 'Add dot file',
                         content = data.encode('utf-8'))

        # Get the file reference for later use
        file_sha = repo.contents(path = 'files/dotfile.txt').sha

        # Delete the file
        repo.delete_file(path = 'files/dotfile.txt',
                         message = 'Delete dot file',
                         sha = file_sha)

        time.sleep(sleep_time)


#### Translating letters to useable format

With a way to connect in hand, the code needed to know when to connect. Basically, I needed an on/off switch for every day represented on the heatmap. If the switch is on, the committing function should run, making the cell dark. If it is off, the committing function shouldn’t run, leaving the cell gray (or close to it, depending on what other commits are made that day).

Since we’re using the heatmap to display text, a matrix-based font seemed to make sense. If you’ve seen dot-matrix font styles, these will look familiar. Each position in the matrix corresponds to a day on the heatmap. I used values of ‘1’ and ‘0’ to indicate on and off days, respectively. (And technically these are lists, not matrices, but they are laid out like matrices to make them easier to create.)

As an example, here is the setup for the letter ‘A’:

letters_dict = {
    'A' : [0,1,1,1,0,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0,
           1,1,1,1,1,0,
           1,0,0,0,1,0,
           1,0,0,0,1,0],
    ...
}


These matrices are time-consuming to create, so I’ve only created the few that I needed. If you make more, feel free to send them along via a pull request.
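
To make the on/off lookup concrete, here is a minimal sketch of how a date could be mapped to a cell in one of these lists. It is an illustration, not the code from the repo. It assumes each letter is stored as 7 rows of 6 values (as in the 'A' above), that each heatmap column is one week running Sunday to Saturday, and that the message starts on a Sunday; `is_on`, `ROWS`, `COLS`, and the dates are hypothetical.

```python
from datetime import date, timedelta

ROWS, COLS = 7, 6  # 7 weekdays tall, 6 weeks wide per letter

letters_dict = {
    'A': [0,1,1,1,0,0,
          1,0,0,0,1,0,
          1,0,0,0,1,0,
          1,0,0,0,1,0,
          1,1,1,1,1,0,
          1,0,0,0,1,0,
          1,0,0,0,1,0],
}


def is_on(message, start, today):
    """Return True if `today` should be a dark cell.

    `start` is assumed to be the Sunday on which the first
    column of the first letter begins.
    """
    days = (today - start).days
    if days < 0:
        return False
    week, weekday = divmod(days, ROWS)      # heatmap column, heatmap row
    letter_index, col = divmod(week, COLS)  # which letter, which column in it
    if letter_index >= len(message):
        return False                        # the message has finished
    return letters_dict[message[letter_index]][weekday * COLS + col] == 1


start = date(2016, 1, 3)  # a Sunday, chosen arbitrarily
schedule = [is_on('A', start, start + timedelta(days=d)) for d in range(14)]
```

The lists are written row by row, but the heatmap fills column by column (one week at a time), which is why the index is `weekday * COLS + col` rather than a simple running count of days.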

#### Automating and scheduling the runs

Now that I had a way to do commits programmatically and something to commit, I needed a way to schedule the Python script to run at the appropriate time. The ultimate goal was to be able to tell the script what I wanted to do at the beginning and have it run unsupervised for a few weeks until it completed.

This was achieved with PythonAnywhere.com and a little bash script. PythonAnywhere is a Python-oriented hosting environment. Among many other things, it can be used to schedule Python scripts to run at certain times of the day. A free account allows one daily task and HTTP calls to a whitelist of domains. Fortunately, one task is all we need to run and GitHub.com is on the whitelist.

After uploading the Python code, I created a really simple bash script that calls the main Python script and is scheduled to run daily:

#!/bin/sh

#### And you’re done

With that information in hand, you should be good to go, hopefully in less time than it would have taken otherwise.

* The distinction between LaTeX and MathJax may or may not be important for your purposes.

## Physical Computing – Puppies VS Kittens on Reddit

“The best way to have a good idea is to have a lot of ideas.” – Linus Pauling

#### Physical Computing

Although the term is used in a lot of ways, physical computing usually refers to using software to monitor and process analog inputs and then using that data to control mechanical processes in the real world. It’s already commonplace in some areas (e.g., autopilots), but it will be all the rage as the Internet of Things grows and automates. It’s also often used in interactive art and museum exhibits like the Soundspace exhibit at the Museum of Life and Science in Durham, NC. In this case we’re manipulating the brightness of two LEDs based on the popularity of animals on Reddit, which I’d say is closer to the art end of the spectrum than the autopilot end.

#### RPis

Raspberry Pis are great for generating ideas. Because they consume very little power and have a very small form factor, they almost beg you to think of tasks that you want them to get started on and then shove them away in a corner for a while to work on. Because they run a modified version of Debian, it’s easy to take advantage of things like the Apache webstack and Python to get things up and running quickly. In fact, in a previous post, I showed an example of using a Pi to fetch data and serve it to a simple D3-based dashboard.

This is another project that takes advantage of the Reddit API, Python on the RPi, and the GPIO interface on the RPi to visually answer the age-old question “What’s more popular on the internet – puppies or kittens?”

#### Get data via the Reddit API

Reddit has a very easy to use API, especially in combination with the Python PRAW module, so I use it for a lot of little projects. On top of the easy interface, Reddit is a very popular website, ergo lots of data. For this project I used Python to access the Reddit API and grab frequency counts for mentions of puppies and kittens. As you can see in the code (GitHub repo), I actually used a few canine and feline related terms, but ‘puppies’ and ‘kittens’ are where the interest and ‘aww’ factor are, so I’m sticking with that for the title.

The PRAW module does all the work getting the comments. After installing the module all that’s required is three lines of code:

import praw
r = praw.Reddit('Term tracker by u/mechanicalreddit') # Change 'mechanicalreddit' to your Reddit username
allComments = r.get_comments('all', limit=750) # The maximum number of comments that can be fetched at a time is 1000


You now have a lazy generator (allComments) that you can work with to pull comment details. After fetching the comments, tokenizing, and a few other details that you can look at in the Git repo, we have the tokens joined into a single space-separated string (resultsSet) that we can send to a function that keeps a running sum for each set of terms:

def countAndSumThings(resultsSet, currentCounts):
    resultsSet = resultsSet.lower()
    for thing in currentCounts:
        thingLower = thing.lower()
        searchThing = ' '+thingLower+' '
        thingCount = resultsSet.count(searchThing)
        currentCounts[thing]+=thingCount
    return currentCounts
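
One thing to be aware of with the space-padded search above is that it misses a term at the very start or end of the text, or one followed by punctuation. Here is a small alternative sketch using regular-expression word boundaries. It is not the function from the repo; the name `count_terms` and the sample sentences are made up for illustration.

```python
import re


def count_terms(text, current_counts):
    """Add whole-word occurrences of each term in `text` to a running tally."""
    text = text.lower()
    for term in current_counts:
        # \b matches at word edges, so punctuation and string ends are fine.
        pattern = r'\b' + re.escape(term.lower()) + r'\b'
        current_counts[term] += len(re.findall(pattern, text))
    return current_counts


counts = {'puppies': 0, 'kittens': 0}
count_terms('Puppies are great. My kittens chase the puppies!', counts)
count_terms('kittens', counts)
```

Because the dictionary is updated in place, calling the function once per batch of comments keeps a running total across batches, the same way the original does.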


#### Visualizing the data – i.e., Little Blinky Things

Setting up the RPi GPIO circuitry to control the LEDs is beyond the scope of this post, and it’s also covered better elsewhere than I could do. Here are a couple of resources you may find helpful:

• http://www.instructables.com/id/Easiest-Raspberry-Pi-GPIO-LED-Project-Ever/
• http://www.thirdeyevis.com/pi-page-2.php
• http://raspi.tv/2013/how-to-use-soft-pwm-in-rpi-gpio-pt-2-led-dimming-and-motor-speed-control

On the software side, the RPi.GPIO module provides the basic functionality. The first part of this code initializes the hardware and prepares it to handle the last two lines, which set the brightness of the LEDs.

# The RPi/Python/LED interface
import RPi.GPIO as GPIO ## Import GPIO library
GPIO.setmode(GPIO.BCM) ## Use Broadcom (BCM) GPIO numbering rather than physical pin numbers

# Initialize LEDs
# Green light
GPIO.setup(22, GPIO.OUT) ## Setup GPIO pin 22 to OUT

# Initialize Pulse width modulation on GPIO 22. Frequency=100Hz and OFF
pG = GPIO.PWM(22, 100)
pG.start(0)

# Red light
GPIO.setup(25, GPIO.OUT) ## Setup GPIO pin 25 to OUT

# Initialize Pulse width modulation on GPIO 25. Frequency=100Hz and OFF
pR = GPIO.PWM(25, 100)
pR.start(0)

# Skip some intermediary code...(see repo for the details)

# Update lighting (duty cycles are percentages from 0 to 100)
pG.ChangeDutyCycle(greenIntensity)
pR.ChangeDutyCycle(redIntensity)


Put this in a loop and you’re good to go with continual updating.
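
The intermediary code that turns counts into `greenIntensity` and `redIntensity` lives in the repo and isn't reproduced here. As a rough idea of what such a step could look like, here is a minimal sketch that maps two running counts to duty cycles by relative share. The function name `counts_to_duty_cycles` and the share-based mapping are assumptions for illustration, not necessarily what the repo does.

```python
def counts_to_duty_cycles(green_count, red_count):
    """Map two running counts to PWM duty cycles (0-100) by relative share."""
    total = green_count + red_count
    if total == 0:
        return 0.0, 0.0  # nothing counted yet, so keep both LEDs off
    return 100.0 * green_count / total, 100.0 * red_count / total


# Example: 30 mentions for one set of terms, 10 for the other.
green_intensity, red_intensity = counts_to_duty_cycles(30, 10)
```

Inside the update loop, the two returned values would be what gets passed to `ChangeDutyCycle`, which expects a percentage between 0 and 100.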

And here’s a picture of the final setup. I added a cover to disperse the light a little bit and help with some of the issues with visual perception discussed below:

#### Puppies or Kittens?

If the Internet is 90% cats, Reddit is an anomaly. Dogs were usually the most popular topic of conversation.

#### Some frustrations with…

##### …hardware

I didn’t have any breadboard jumper wires to connect the LEDs and I had some difficulty finding them. I had some male-to-male jumpers from an Arduino kit, but connecting an RPi to a breadboard requires female-to-male connectors. I expected Radio Shack to have them, but no luck. Incidentally, an old 40-wire IDE cable will also work with the newer 40-pin RPi GPIOs, but an 80-wire cable will not. Note that while the connectors for both have 40 sockets and will physically fit onto the RPi board, attempting to use an 80-wire cable will almost certainly break your Pi. If you’re not sure which type of cable you have, just count the ridges that the wires make in the cable. If you still have wires to count after you get to 40, you can’t use that cable. Unless, that is, you want to do what I ultimately did and use individual wires from the cable:

The individual wires of IDE cables are easy to separate and can be stripped with a tight pinch. Because they are single wires, they are easier to manage than if you tried to work with something like speaker wire.

##### …the visualization method

Brightness, it turns out, is not a great way to indicate magnitude. Because of the way our eyes perceive color and the way that LED brightness is affected by increases in current, it’s not a very clear comparison. The relationship between LED brightness and current is not linear, which means the raw percentages can’t be used as intensity levels.
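
One common workaround for the perception problem is to apply a power-law ("gamma") correction before setting the duty cycle. The sketch below illustrates that general idea; it is not something from the repo. The function name `gamma_correct` is made up, and the default exponent of 2.2 is a rule-of-thumb assumption, not a value measured for these LEDs.

```python
def gamma_correct(percent, gamma=2.2):
    """Convert a desired perceived brightness (0-100) to a PWM duty cycle (0-100).

    gamma=2.2 is an assumed rule-of-thumb exponent; tune it by eye.
    """
    percent = max(0.0, min(100.0, percent))  # clamp to the valid range
    return 100.0 * (percent / 100.0) ** gamma


half_brightness_duty = gamma_correct(50)  # well below 50
```

With a correction like this, a value meant to look "half as bright" ends up as a much lower duty cycle than 50, which tends to make differences at the low end easier to see.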

##### …the data

Reddit gets very busy sometimes, and during those times posting of comments can slow down considerably due to commenters receiving page errors and a comment backlog developing. Presumably every comment eventually gets posted and is available to the API, but I’m not 100% sure. If comprehensiveness were a concern, this would need to be looked into in more detail.

#### What’s next

In hindsight it would have been better to make the lights blink faster or slower rather than change the brightness. If there’s a version 0.2 of this project, that will be one of the changes.

Currently you have to change the code directly to change the search terms. Although setting up an interactive session or GUI is overkill for this application, it would be pretty easy to have the Reddit bot that does the term searches check its Reddit messages every so often for a list of new search terms. In that case, changing the search terms would be as easy as sending a Reddit message in some predefined format.
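
As a sketch of what that "predefined format" could look like, the snippet below parses a message body with one group of terms per line, such as `dogs: puppy, puppies`. Both the format and the function name `parse_terms_message` are invented for illustration; fetching the message from Reddit is left out.

```python
def parse_terms_message(body):
    """Parse lines like 'dogs: puppy, puppies' into {'dogs': ['puppy', 'puppies']}."""
    groups = {}
    for line in body.splitlines():
        if ':' not in line:
            continue  # ignore anything that isn't in the expected format
        label, _, terms = line.partition(':')
        words = [t.strip().lower() for t in terms.split(',') if t.strip()]
        if label.strip() and words:
            groups[label.strip().lower()] = words
    return groups


new_terms = parse_terms_message("Dogs: puppy, Puppies\ncats: kitten,, kittens \nhello")
```

Lines that don't match the format are skipped rather than raising an error, so a stray message in the inbox wouldn't stop the bot.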

And finally, maybe you don’t care about the relative popularity of pets on Reddit. With the upcoming election, plugging in the candidate names might be interesting. Now that I have some real breadboard wires, I’ll probably set that up the next time I have a free weekend. But go ahead, feel free to clone the repo and do something useful.

## Playing with Gradient Descent in R

Gradient Descent is a workhorse in the machine learning world. As proof of its importance, it is one of the first algorithms that Andrew Ng discusses in his canonical Coursera Machine Learning course. There are many flavors and adaptations, but starting simple is usually a good thing. In this example, it is used to minimize the cost function (the sum of squared errors, or SSE, scaled by $\frac{1}{2m}$) for obtaining parameter estimates for a linear model. I.e.:

$\text{minimize} J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2$

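Gradient descent repeatedly moves each parameter a small step, scaled by the learning rate $\alpha$, against the partial derivative of $J$ with respect to that parameter. For a linear model, $h_\theta(x) = \theta_0 + \theta_1 x$, the update rules become: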

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}\right)$

Where $\theta_0$ is our intercept and $\theta_1$ is the parameter estimate of our only predictor variable.

Ng’s course is Octave-based, but manually calculating the algorithm in an R script is a fun, simple exercise and if you’re primarily an R-user it might help you understand the algorithm better than the Octave examples. The full code is in this repository, but here is the walkthrough:

• Create some linearly related data with known relationships
• Write a function that takes the data and starting (or current) estimates as inputs
• Calculate the cost based on the current estimates
• Adjust the estimates in the direction indicated by the gradient, with the step size scaled by the learning rate $\alpha$.
• Recursively run the function, providing the new parameter estimates each time
• Stop when the estimate converges (i.e., meets the stopping criteria based on the change in the estimates)
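
The steps above can be sketched in a few lines. This version is in Python rather than R, to match the code elsewhere on this blog, and it is not the script from the repository. It uses a loop instead of recursion, noise-free made-up data so the expected answer is known, and hypothetical names and defaults (`gradient_descent`, `alpha=0.05`, `tol=1e-9`).

```python
def cost(xs, ys, theta0, theta1):
    """J(theta0, theta1): squared errors scaled by 1/(2m)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, tol=1e-9, max_iter=100000):
    """Fit y = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # starting estimates
    for _ in range(max_iter):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously from the same errors.
        new0 = theta0 - alpha * grad0
        new1 = theta1 - alpha * grad1
        converged = abs(new0 - theta0) < tol and abs(new1 - theta1) < tol
        theta0, theta1 = new0, new1
        if converged:
            break
    return theta0, theta1


# Linearly related data with a known relationship: y = 3 + 2x.
xs = [i / 10 for i in range(50)]
ys = [3 + 2 * x for x in xs]
theta0, theta1 = gradient_descent(xs, ys)
final_cost = cost(xs, ys, theta0, theta1)
```

If `alpha` is set too high for the scale of the data, the estimates diverge instead of converging, so it is worth printing the cost every so often while experimenting.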

This code is for a simple single-variable model. Adding more variables means calculating the partial derivative of the cost with respect to each parameter. In other words, adding a version of the $\theta_1$ update rule for each feature in the model. I.e.,

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}\right)$

I sometimes use Gradient Descent as a ‘Hello World’ program when I’m playing with statistical packages. It helps you get a feel for the language and its capabilities.