Collaborating on GitHub with a Private Repository

In this post, I will explain:

  • how to configure a repository on GitHub for collaboration
  • a workflow using master, QA, and PROD branches for the different stages of your development cycle
  • how to commit a change
  • how to promote a change through the workflow

How to configure a repository on GitHub for collaboration

We’ll assume that a GitHub user named “stevo” has already created the remote repository. Now suppose another GitHub user named “john” would like to collaborate on it. The first thing to do is give john access to the remote repository.

Here are the steps to add john as a collaborator:
1. john creates an SSH key pair (a private and a public key) if he doesn’t already have one. I label each key [username]@[hostname], where username is your local user name and hostname is your local host name; the -C option stores that label as the key’s comment.

ssh-keygen -t rsa -C "[username]@[hostname]"

2. john copies his public SSH key to the clipboard

cat ~/.ssh/id_rsa.pub | xclip -sel clip

3. john adds the public SSH key to his own GitHub account (I label it [username]@[hostname]). He should not send it to stevo to add to stevo’s account: a key on stevo’s account authenticates as stevo and would give john access to every repository stevo owns.
GitHub > Settings > SSH and GPG keys > New SSH key > paste the public key into the Key field
4. stevo adds john as a collaborator on the repository
GitHub > [repository_name] > Settings > Collaborators > Add people > enter john’s GitHub username

The workflow

Before we begin, I’ll define some git concepts that we need to understand:
Working tree: Files that are ‘checked out’ for editing.
The Index: Git stages changed files in “the index” before they are committed to the local repository. This allows individual files (or even individual diff blocks) to be committed even when other changes exist in the working tree.
Local repository: Git maintains a complete copy of all files, branches and tags of the repository in the .git directory at the top of a working tree.
Remote repositories: Git can also track repositories maintained outside the .git directory, either elsewhere on the filesystem or on a different host.

The development lifecycle contains these stages: DEV -> QA -> PROD
We’ll represent these stages in our repository with branches: the master branch for the DEV stage, the QA branch for the QA stage, and the PROD branch for the PROD stage.

Clone the remote repository onto your local machine and change into it. Git checks out the remote’s default branch, master, and creates a folder called [repository_name] in your current directory.

git clone git@github.com:stevo/[repository_name].git
cd [repository_name]

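Create two remote branches, QA and PROD, by pushing master to origin under each new name (git branch on its own would only create local branches, and it has no -b option):

git push origin master:QA
git push origin master:PROD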

To see all the branches available, run this command:

git branch -a

This command should output the following (the star marks the branch you’re currently on; remotes/origin/HEAD points to the remote repository’s default branch):

* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/PROD
  remotes/origin/QA
  remotes/origin/master

A ‘tracking branch’ in Git is a local branch that is connected to a remote branch. When you push and pull on that branch, it automatically pushes and pulls to the remote branch that it is connected with.
Create a tracking branch for both QA and PROD by running these commands in your local repository:

git branch --track QA origin/QA
git branch --track PROD origin/PROD

You should now have all these branches available:

  PROD
  QA
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/PROD
  remotes/origin/QA
  remotes/origin/master
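To try the branch setup without touching GitHub, here is a sketch that uses a throwaway local bare repository as a stand-in for stevo’s repository (an assumption for illustration; `sandbox`, `seed` and `origin.git` are hypothetical names). It creates QA and PROD on the remote by pushing master under those names, then sets up the local tracking branches.

```shell
set -e
# Throwaway sandbox (assumption): a local bare repo stands in for stevo's GitHub repository.
sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/master
git -C "$sandbox/seed" -c user.name=stevo -c user.email=stevo@example.com \
    commit -q --allow-empty -m "initial commit"
git clone -q --bare "$sandbox/seed" "$sandbox/origin.git"

# john clones the shared repository; master is checked out.
git clone -q "$sandbox/origin.git" "$sandbox/john"
cd "$sandbox/john"

# Create QA and PROD on the remote, both starting from master.
git push -q origin master:QA master:PROD

# Create local branches that track the new remote branches.
git branch --track QA origin/QA
git branch --track PROD origin/PROD

# List local and remote-tracking branches.
git branch -a
```

Once you are happy with how the branches behave, delete the sandbox directory; nothing in it is connected to GitHub.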

Committing a change

Step 1: Switch to the master branch

git checkout master

Step 2: Update the master branch from the remote master branch (origin/master):

git pull

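Step 3a: If you’ve created a new file and you’d like to check it in:
Stage the new file in the index

git add [path/to/filename]

Commit the staged file to the local repository

git commit -m "Insert comments here"

Push the commit from the local repository to the remote repository

git push origin master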

Step 3b: If you’ve updated an existing file in the local project and you’d like to commit the changes:
Commit changed file to local repository

git commit [path/to/filename] -m "Insert comments here"

Push the commit from the local repository to the remote repository

git push origin master

Step 3c: If you’ve deleted an existing file in the local project and you’d like to commit the change:
Delete the file and stage the deletion

git rm [path/to/filename]

Commit the deletion to the local repository

git commit -m "Deleted file"

Push the commit from the local repository to the remote repository

git push origin master
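The three cases can be exercised end to end in a scratch repository. This sketch again uses a local bare repository in place of GitHub (a hypothetical setup; `notes.txt` is just an example file) and shows that each change is committed locally before `git push` sends it to the remote.

```shell
set -e
# Throwaway sandbox (assumption): an empty local bare repo stands in for GitHub.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"
git -C "$sandbox/origin.git" symbolic-ref HEAD refs/heads/master
git clone -q "$sandbox/origin.git" "$sandbox/john"
cd "$sandbox/john"
git symbolic-ref HEAD refs/heads/master   # make sure the unborn branch is master
git config user.name john
git config user.email john@example.com

# Step 3a: stage a new file, commit it locally, then push.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
git push -q origin master

# Step 3b: commit an edit to a tracked file by naming its path, then push.
echo "second draft" >> notes.txt
git commit -q -m "Update notes.txt" notes.txt
git push -q origin master

# Step 3c: git rm deletes the file and stages the deletion; commit, then push.
git rm -q notes.txt
git commit -q -m "Delete notes.txt"
git push -q origin master
```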

Promoting a change through the workflow

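Step 1: Switch to the master branch and sync it with the remote master branch

git checkout master
git pull origin master

Step 2: Diff the QA branch against the master branch to review the changes you’re about to promote

git diff QA master

Step 3: Switch to the QA branch and update it from origin/QA (Note: after switching, your latest master changes are no longer in the working tree because you’re now on the QA branch. Cool!)

git checkout QA
git pull

Step 4: Promote code from the master branch to the QA branch, then push QA to the remote (Note: this promotes everything on master that isn’t on QA yet)

git merge master
git push origin QA

Step 5: Switch to the PROD branch and update it from origin/PROD

git checkout PROD
git pull

Step 6: Promote code from the QA branch to the PROD branch, then push PROD to the remote (Note: this promotes everything on QA that isn’t on PROD yet)

git merge QA
git push origin PROD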
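Here is the whole promotion path run against a scratch setup: a local bare repository with master, QA and PROD branches stands in for the GitHub repository, and `feature.txt` is a hypothetical change. It is a sketch for experimenting with the workflow, not a deployment script.

```shell
set -e
# Throwaway sandbox (assumption): a local bare repo with master, QA and PROD
# stands in for the GitHub repository.
sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git symbolic-ref HEAD refs/heads/master
git -c user.name=stevo -c user.email=stevo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch QA
git branch PROD
git clone -q --bare "$sandbox/seed" "$sandbox/origin.git"

# john's clone, with local tracking branches for QA and PROD.
git clone -q "$sandbox/origin.git" "$sandbox/john"
cd "$sandbox/john"
git config user.name john
git config user.email john@example.com
git branch -q --track QA origin/QA
git branch -q --track PROD origin/PROD

# A change lands on master (the DEV stage).
echo "v2" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q origin master

# Promote master -> QA.
git checkout -q master
git pull -q origin master
git diff --stat QA master      # review what is about to be promoted
git checkout -q QA
git pull -q
git merge -q master
git push -q origin QA

# Promote QA -> PROD.
git checkout -q PROD
git pull -q
git merge -q QA
git push -q origin PROD
```

In this toy run every merge is a fast-forward; in a real repository, where QA or PROD may carry their own commits, git merge can create a merge commit or stop on conflicts that you resolve before pushing.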

HTTP post request script with authentication

Rigorous automated unit testing is how I keep my bases covered. It helps me maintain confidence and, frankly, my sanity. Running the tests is a piece of cake. Manually creating a new test bed containing about 100 image files … not so much. I have a Grails app that uses Spring Security (formerly known as Acegi Security) for authentication. That means each HTTP POST request must carry the session cookie of a logged-in user, which I keep in a cookie file.

Here’s the GSP (Groovy Server Pages) markup for uploading a single image. Note the name attribute of the input element, “file”. The multipart HTTP POST request contains a set of key/value pairs, and the image file is sent under the “file” key.

<g:form action="save" method="post" enctype="multipart/form-data">
    <dl>
        <dt><label for="file">File:</label></dt>
        <dd><input id="file" name="file" size="40" type="file" /></dd>
    </dl>
    <span>
        <input type="submit" value="Create" />
    </span>
</g:form>

To upload a single image, I wrote two shell scripts and put them in my bin folder.

The first script creates a local cookie file:
bakecookie username password

The second script constructs a HTTP POST request using curl and sends it to the web server:
postimage me.JPG

Firing these HTTP POST requests in batches is just a matter of looping the second script over a directory of image files.
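The two scripts aren’t shown in the post, so here is a hedged sketch of what they could look like, written as bash functions wrapping curl. The login URL `/j_spring_security_check` and the `j_username`/`j_password` field names are assumptions (they are the Spring Security 2/3 defaults used by older versions of the Grails spring-security-core plugin; an Acegi-era app may use different ones), and `BASE_URL`, the `/image/save` path, `COOKIE_JAR` and `postall` are hypothetical names.

```shell
# Hypothetical settings: adjust to your application.
BASE_URL="${BASE_URL:-http://localhost:8080/myapp}"
COOKIE_JAR="${COOKIE_JAR:-$HOME/.myapp_cookies.txt}"

# bakecookie USERNAME PASSWORD
# Logs in once and saves the session cookie to COOKIE_JAR.
bakecookie() {
  curl --silent --show-error \
       --cookie-jar "$COOKIE_JAR" \
       --data-urlencode "j_username=$1" \
       --data-urlencode "j_password=$2" \
       "$BASE_URL/j_spring_security_check" > /dev/null
}

# postimage FILE
# Sends a multipart POST with the image under the "file" key,
# matching name="file" in the form, using the saved session cookie.
postimage() {
  curl --silent --show-error \
       --cookie "$COOKIE_JAR" \
       --form "file=@$1" \
       "$BASE_URL/image/save"
}

# postall DIR
# Loops postimage over every .JPG file in DIR.
postall() {
  local f
  for f in "$1"/*.JPG; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    postimage "$f"
  done
}
```

Usage: `bakecookie stevo secret`, then `postimage me.JPG` for one file or `postall ~/testbed` for a whole directory.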

Application Performance Benchmarking

A while ago I wrote an engine for processing images. The algorithm reads the image data, parses the segments, reconciles the data in some of the segments, serializes the data into bytes, and finally writes to an output file. I have an idea for improving the algorithm, but first I’d like to record a performance benchmark so that when I implement the new algorithm, I can quantify just how much faster it is. Here’s what I did:

1) Log the times of each step in the algorithm

You can subdivide your algorithm into as many steps as you want, depending on the granularity you need. I’m logging through the Log4J logging framework, and each performance statement has this format:
2010-03-09 11:10:31,033 INFO - start[1267311835950] time[82] tag[reconcile] message[apollo-1-fgoiwkz9sd73-1267311835950]

start – the epoch timestamp, in milliseconds, when the event started
time – the duration of the event, in milliseconds
tag – the tag name for this timing call. Tags are used to group timing logs, so each block of code being timed should have a unique tag. Tags can take a hierarchical format using dot notation. I put the step ID here.
message – additional text printed with the logging statement. I put my own custom request ID (hostname-cloneNumber-sessionId-timeStamp) here.

2) Parse the performance logs into a CSV

I’m interested in seeing an ordered list of average times and standard deviation for each tag.
First, I need to parse the logs into a format I can work with, like a CSV. To create the CSV, run the following Perl script:

USAGE:
log_parser.pl < performance.log > performance.csv

INPUT:
2010-03-09 11:10:31,033 INFO - start[1267311835950] time[82] tag[reconcile] message[apollo-1-fgoiwkz9sd73-1267311835950]
...

OUTPUT:
2010-03-09, 11:10:31, 1267311835950, 82, reconcile, apollo-1-fgoiwkz9sd73-1267311835950
...
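log_parser.pl itself isn’t shown in the post. As a hedged stand-in, here is a sed one-liner wrapped in a shell function (`log_parser` is a hypothetical name) that produces the same CSV layout from the sample line, including dropping the milliseconds from the time of day. It assumes every performance line follows the format above and silently skips any other line.

```shell
# log_parser [FILE...]: sed-based stand-in for log_parser.pl (hypothetical).
# Reads performance log lines from FILE(s) or stdin and prints
# "date, time, start, duration, tag, message" for each matching line.
log_parser() {
  sed -n -E 's/^([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}),[0-9]+ .*start\[([0-9]+)] time\[([0-9]+)] tag\[([^]]*)] message\[([^]]*)].*$/\1, \2, \3, \4, \5, \6/p' "$@"
}
```

Usage: `log_parser < performance.log > performance.csv`.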

3) Tally the information in the CSV and create a report

To create an ordered list of average times and standard deviation for each tag, run the following command:

USAGE:
performance_stat_cruncher.pl < performance.csv > stats.csv

INPUT:
2010-03-09, 11:10:31, 1267311835950, 82, reconcile, apollo-1-fgoiwkz9sd73-1267311835950
...

OUTPUT (tag, average time, standard deviation):
read, 1, 1
parse, 2.5, 3.70135110466435
reconcile, 12.123595505618, 5.12617810771974
...
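Likewise, performance_stat_cruncher.pl isn’t shown. This awk sketch (hypothetical function `stat_cruncher`) groups rows by the tag in field 5 and prints the mean and standard deviation of the time in field 4. Two choices here are assumptions: tags are listed in the order they first appear, and it uses the population standard deviation, which may not match what the original script computes.

```shell
# stat_cruncher [FILE...]: awk stand-in for performance_stat_cruncher.pl (hypothetical).
# Input rows: date, time, start, duration, tag, message
# Output rows: tag, mean duration, population standard deviation
stat_cruncher() {
  awk -F', ' '
    NF >= 5 {
      tag = $5
      if (!(tag in n)) order[++ntags] = tag   # remember first-seen order
      n[tag]++
      sum[tag] += $4
      sumsq[tag] += $4 * $4
    }
    END {
      for (i = 1; i <= ntags; i++) {
        t = order[i]
        mean = sum[t] / n[t]
        var = sumsq[t] / n[t] - mean * mean
        if (var < 0) var = 0   # guard against tiny negative rounding error
        printf "%s, %.6g, %.6g\n", t, mean, sqrt(var)
      }
    }' "$@"
}
```

Usage: `stat_cruncher < performance.csv > stats.csv`. Switching to the sample standard deviation would mean dividing the sum of squared deviations by n - 1 instead of n.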

CONCLUSION

That’s it! Once you’ve implemented a performance logging framework in your application, producing reports can be done in 2 simple steps. Here’s the command that ties it all together:
log_parser.pl performance.log | performance_stat_cruncher.pl