The Internet is Improving Productivity More Than We Realize

Everybody knows economic productivity is the key to how fast an economy can grow, right? So the chart below from the BLS (Bureau of Labor Statistics) shows a disturbing lack of productivity growth over the last 8 years:



To someone working in IT, that seems totally wrong, because the Internet has been a great productivity booster for well over a decade. You can get questions answered on StackExchange, use or write open source software on GitHub, and collaborate remotely using a ton of different tools (email, screensharing, group chat, etc.).

The standard productivity measure doesn't capture all these new technologies and ways of working. So I was glad to see economists looking at ways to measure the Internet's contribution to productivity in "The Past Two Decades: The Coming of the Information Economy Looks to Have Doubled Our True Rate of Economic Growth":

...So figure on not 1.75%/year but rather 3.5%/year as the true rate of increase of the American economy’s productivity over the past two decades…

That's awesome for the long-term prospects of economic growth!
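Doubling the annual rate matters a lot more than it sounds, because growth compounds. Here's a quick back-of-the-envelope sketch that compounds both rates over the article's two-decade window. The rates and the 20-year horizon come from the quote above, and `cumulative_growth` is just an illustrative helper:

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Return the total growth factor after compounding annual_rate for years."""
    return (1 + annual_rate) ** years


measured = cumulative_growth(0.0175, 20)   # rate in the official statistics
estimated = cumulative_growth(0.035, 20)   # the article's "true" rate

# Express each factor as a percentage increase over the 20 years.
print(f"1.75%/year for 20 years: +{(measured - 1) * 100:.0f}%")
print(f"3.5%/year for 20 years:  +{(estimated - 1) * 100:.0f}%")
```

At the measured rate, productivity rises by about 41% over the period. At the estimated rate it roughly doubles, rising by about 99%.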

Cross-Platform Non-Cloud Personal Backups

I've been trying to figure out my backup strategy at home. The data I want to keep (critical documents and home photos/movies) is about 70GB. My current strategy is to keep backups on several machines at home, and I'm trying to avoid using cloud storage. I've been using unison, which is a great backup tool. Some of its cool features are:

  • syncs between two machines, across any OS
  • tells you what files changed whenever you run it, and lets you override its default guesses as to which way to copy/delete files
  • pretty fast incremental backup (under a minute to detect all diffs on my data set)

Its only major downside is that it's not actively maintained. However, it's gone through a lot of testing, and I've used it reliably for years, so I think it's a pretty solid app. It's designed so that it never leaves your system in a bad state, even if interrupted. And it's open source, though it's written in OCaml, which is not the most widely known language.
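Unison's real change detection is more sophisticated than this. It keeps an archive of each replica's state from the last sync. Still, the basic idea behind "tells you what files changed" can be sketched in a few lines: snapshot each file's size and modification time, then diff against the previous snapshot. This is a minimal illustration only. `snapshot` and `diff_snapshots` are hypothetical helpers, not part of unison:

```python
import os
from typing import Dict, List, Tuple

# Maps a path (relative to the root) to (size in bytes, mtime in nanoseconds).
Snapshot = Dict[str, Tuple[int, int]]


def snapshot(root: str) -> Snapshot:
    """Record size and modification time for every file under root."""
    result: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify paths as added, deleted, or modified between two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }
```

Comparing metadata instead of file contents is what keeps a rescan of tens of gigabytes fast. The trade-off is that a change which preserves both size and mtime goes unnoticed.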

For a completely different approach, I looked into using a source-control system as my backup strategy, specifically the distributed source control system Mercurial (hg). After some testing, here are the problems I encountered:

  1. hg crashes on big files (it needs 3x to 5x as much RAM as the size of your largest file). However, the largefiles extension, which ships with hg but is turned off by default, can work around this problem.
  2. hg does not preserve modification times of files. There are a few extensions which can work around this, but I wasn't really happy with them.
  3. you will need at least double your storage when using hg, since hg (like all major distributed source control systems) stores a copy of every file you add in a hidden folder (.hg).
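One common way to work around problem #2 is to record modification times in a file the repository does track, then reapply them after an update. Here's a minimal sketch of that idea. It assumes a hypothetical JSON sidecar file that you commit along with your data; the `.mtimes.json` name and both helper names are made up for illustration and are not taken from any hg extension:

```python
import json
import os
from typing import Dict

# Hypothetical sidecar file name; commit it along with your data.
MTIME_FILE = ".mtimes.json"


def save_mtimes(root: str) -> Dict[str, int]:
    """Write every file's mtime (ns) under root to MTIME_FILE, skipping .hg."""
    mtimes: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if ".hg" in dirnames:
            dirnames.remove(".hg")  # don't descend into the repository store
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel != MTIME_FILE:
                mtimes[rel] = os.stat(full).st_mtime_ns
    with open(os.path.join(root, MTIME_FILE), "w") as f:
        json.dump(mtimes, f, indent=1, sort_keys=True)
    return mtimes


def restore_mtimes(root: str) -> int:
    """Reapply saved mtimes after a checkout; returns how many files were touched."""
    with open(os.path.join(root, MTIME_FILE)) as f:
        mtimes = json.load(f)
    count = 0
    for rel, mtime_ns in mtimes.items():
        full = os.path.join(root, rel)
        if os.path.exists(full):
            st = os.stat(full)
            # Keep the access time; overwrite only the modification time.
            os.utime(full, ns=(st.st_atime_ns, mtime_ns))
            count += 1
    return count
```

You would run `save_mtimes` before each commit and `restore_mtimes` after each update or clone. Remembering that extra step every time is part of why none of these workarounds felt satisfying.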

I didn't test git, but it will definitely have problems #2 and #3 above. Oh well, back to unison for me for the time being.

P.S. I looked at git-annex as another approach, because it certainly is aiming at someone like me. However, its design seemed a little rickety to me -- I believe it replaces all your files with symlinks, it makes your files read-only by default (you need to explicitly "checkout" a file to change it) and it forks git commands under the covers. I think a backup solution needs to be simple and not have too many dependencies.

How to Integrate App Engine app with Google Drive

Let's assume you're writing a Google App Engine app in Python, and you want to use the Google File Picker API to select files from the user's Google Drive and the Google Drive API to download the selected files. You can write your server-side App Engine app in Python, and then use JavaScript for the file picker and file download. It's not super hard, but I couldn't find it completely documented anywhere. I only found one description of the tough issues.

At a high-level, your architecture is:

- Server: App Engine app in Python, which includes:

    - Google API Python client

    - your custom code

- Client: browser app (we like AngularJS), which includes:

    - Google File Picker js support

    - Google Drive js support

    - your custom code

App Engine App Changes

1. Start with your standard App Engine app.

2. Download and install the Google API Python client into your app. It supports oauth2.

3. Add the OAuth 2.0 setup code at the top of your file:

    import os

    import httplib2
    from google.appengine.api import memcache
    from oauth2client.appengine import oauth2decorator_from_clientsecrets

    # Put the scopes you want below; here are the scopes for read-only Drive
    # access and getting the user's email address.
    SCOPES = [
        'https://www.googleapis.com/auth/drive.readonly',
        'https://www.googleapis.com/auth/userinfo.email',
    ]

    # CLIENT_SECRETS, name of a file containing the OAuth 2.0 information for this
    # application, including client_id and client_secret, which are found
    # on the API Access tab on the Google APIs Console.
    CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')

    # Helpful message to display in the browser if the CLIENT_SECRETS file is missing.
    MISSING_CLIENT_SECRETS_MESSAGE = '''File client_secrets.json is missing.'''

    # Create decorator.
    http = httplib2.Http(memcache)
    decorator = oauth2decorator_from_clientsecrets(
        CLIENT_SECRETS,
        scope=SCOPES,
        message=MISSING_CLIENT_SECRETS_MESSAGE)
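4. Add a security check to your main page:

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import template

    # Below is the handler for the home page.
    # Notice it uses the decorator we just defined.
    # If you're authed then it shows you the standard home page,
    # else it shows you an unauth.html page with a link to authorize the app.
    class MainHandler(webapp.RequestHandler):  # handler name is illustrative

        @decorator.oauth_aware
        def get(self):
            if decorator.has_credentials():
                self.response.out.write(template.render('index.html', {}))
            else:
                url = decorator.authorize_url()
                self.response.out.write(template.render('unauth.html',
                                                        {'authorize_url': url}))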

So far, we've secured the app so that only users who have a valid Google Account and have given our app permission can access it.

Javascript Changes

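On the client side, right before you're about to use one of the Google services, you should make an ajax call to your server. The server code should return the user's current access token (refreshing it first if it has expired), or an error if the user hasn't authorized the app:

    import json

    # Handler for the ajax call (a method of its own RequestHandler subclass).
    @decorator.oauth_aware
    def get(self):
        if decorator.has_credentials():
            if decorator.credentials.access_token_expired:
                decorator.credentials.refresh(httplib2.Http())  # get a fresh token
            result = dict(
                access_token=decorator.credentials.access_token,
                error='',
                expires_in='10000',  # string duration in seconds. Value doesn't matter in this context.
                state=decorator._scope,
            )
        else:
            result = dict(access_token='', error='User is not logged in or authenticated')
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json.dumps(result))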
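That handler mixes two concerns: App Engine plumbing and the JSON contract that the browser relies on. Pulling the contract into a plain function makes it easy to unit-test without the SDK. Here's a minimal sketch. It assumes a credentials object shaped like oauth2client's, with `access_token`, `access_token_expired` and `refresh(http)`. `build_token_payload` is a hypothetical name:

```python
import json
from typing import Any, Callable, Optional


def build_token_payload(credentials: Optional[Any], scope: str,
                        make_http: Callable[[], Any]) -> str:
    """Return the JSON body the ajax endpoint sends to the browser.

    credentials is None when the user hasn't authorized the app; otherwise it
    is assumed to look like an oauth2client credentials object.
    """
    if credentials is None:
        payload = {"access_token": "",
                   "error": "User is not logged in or authenticated"}
    else:
        if credentials.access_token_expired:
            credentials.refresh(make_http())  # fetch a fresh access token
        payload = {
            "access_token": credentials.access_token,
            "error": "",
            "expires_in": "10000",  # string, in seconds; value doesn't matter here
            "state": scope,
        }
    return json.dumps(payload)
```

Inside the real handler, you could call `build_token_payload(decorator.credentials if decorator.has_credentials() else None, decorator._scope, httplib2.Http)` and write the result to the response.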

When the client receives this response, it should check the error field. If there is an error, show it to the user. Otherwise, store the whole response in a JavaScript variable (called GOOGLE_OAUTH2_ACCESS_TOKEN below), because you'll need it shortly. Now that you've validated the user's current login status, you can use the Google File Picker like this:

        var picker = new google.picker.PickerBuilder()
            .addView(new google.picker.View(google.picker.ViewId.DOCS_IMAGES))
            .setOAuthToken(GOOGLE_OAUTH2_ACCESS_TOKEN.access_token)
            .setCallback(pickerCallback).build();  // pickerCallback: your own handler (illustrative name)
        picker.setVisible(true);
Above, you can see that we stored the entire result of the ajax call in the JavaScript variable GOOGLE_OAUTH2_ACCESS_TOKEN, and we're passing its access_token attribute to the Google File Picker.

And here is some sample javascript to use the Google Drive API to read the contents of a file selected above:

        // Assumes the Drive v2 client library is loaded (gapi.client.load('drive', 'v2', ...))
        // and gapi has the token, e.g. via gapi.auth.setToken(GOOGLE_OAUTH2_ACCESS_TOKEN).
        grabFileFromGoogleDrive: function (fileId, imageBlock) {
            var self = this;
            var request = gapi.client.drive.files.get({
                'fileId': fileId
            });
            request.execute(function(resp) {
                // resp has fields such as: title, description, mimeType
                function gotFileContents(contents) {
                    // contents has binary data of file contents (null on failure)
                }
                self.grabFileContentsFromGoogleDrive(resp, gotFileContents);
            });
        },

        grabFileContentsFromGoogleDrive: function(file, callback) {
            if (file.downloadUrl) {
                var xhr = new XMLHttpRequest();
                var url = file.downloadUrl + '&access_token=' + GOOGLE_OAUTH2_ACCESS_TOKEN.access_token;
                xhr.open('GET', url);
                xhr.responseType = 'arraybuffer'; // only way to get binary files properly
                xhr.onload = function() {
                    callback(xhr.response); // notice we're using response not responseText
                };
                xhr.onerror = function() {
                    callback(null);
                };
                xhr.send();
            } else {
                callback(null);
            }
        }
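The JavaScript above appends `&access_token=` by string concatenation. That only works if `downloadUrl` already contains a query string. If you ever build the same URL on the server side, a safer approach is to merge the parameter into whatever query the URL already has. Here's a minimal sketch using only the standard library. `with_access_token` is a hypothetical helper, and sending the token in an `Authorization: Bearer` header is the usual alternative to a query parameter:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_access_token(url: str, token: str) -> str:
    """Return url with an access_token query parameter added (or replaced)."""
    parts = urlsplit(url)
    # Keep existing parameters in order, dropping any stale access_token.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "access_token"]
    query.append(("access_token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This helper works whether or not the URL already carries parameters, and it never produces a duplicate `access_token`.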

There Are Only Two Ways to Enforce Unique Constraints in Google App Engine

Well, it sucks but it's true. There's a bunch of noise on the Internet about how to do unique constraints in Google App Engine, but it seems like these are the only two safe ways. The thing to remember is, the only uniqueness that GAE will guarantee is on key names.


Approach #1: Make the unique field be the key name

As long as you can promise yourself that you'll never need to change the value, make the unique field the "key name" of each entity in the table. Then you can call get_or_insert(key_name) on your db.Model subclass to either retrieve an existing entity or create a new one. Too bad it's really hard to find a field like this in the real world. Email addresses never change? Let me check my hotmail account. Social security numbers never change? Well, unless there was a typo. Every time I think I've found a unique field that can never change, I'm proven wrong. So you'll never catch me doing this. But YMMV.
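`get_or_insert` is safe because App Engine wraps the lookup and the possible insert in a single transaction on that key name. Here's a minimal in-memory sketch of those semantics. It is not the App Engine API: `KeyNameStore` is a hypothetical stand-in, and a lock plays the role of the transaction:

```python
import threading
from typing import Any, Dict


class KeyNameStore:
    """In-memory stand-in for one entity kind, keyed by key name."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()  # stands in for the datastore transaction

    def get_or_insert(self, key_name: str, **props: Any) -> Dict[str, Any]:
        """Return the existing row for key_name, or create it with props."""
        with self._lock:
            row = self._rows.get(key_name)
            if row is None:
                row = dict(props)
                self._rows[key_name] = row
            return row
```

A second call with the same key name returns the first row untouched. That's also why the key name can never change: "renaming" means creating a new entity under a new key and migrating everything that refers to the old one.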

Approach #2: Create a separate table to track all used ids

This is pretty slick -- I found it in this article: Add a Unique Constraint to Google App Engine. The approach is to create a new table called Unique, and make the "key name" of each row in Unique be a combination of the name of the table and the unique field value. The downside is that you have extra storage overhead: a new table with the list of unique fields, plus the default indices that GAE makes for it. This is fairly analogous to declaring a unique index in SQL databases, since a unique SQL index is merely a persisted list of unique fields. There's just less overhead in a SQL database implementation because it's only one list, whereas in GAE this user-generated unique index is really one list (for the table) plus the default indices GAE makes for any entity.
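Here's a minimal in-memory sketch of the Unique-table idea. The `table:value` key-name format and the `claim`/`release` names are assumptions for illustration, not the referenced article's code. On App Engine, the check-and-put would run inside a datastore transaction, which the lock imitates here:

```python
import threading
from typing import Set


class UniqueIndex:
    """Tracks used values as key names of the form '<table>:<value>'."""

    def __init__(self) -> None:
        self._key_names: Set[str] = set()
        self._lock = threading.Lock()  # stands in for a datastore transaction

    @staticmethod
    def key_name(table: str, value: str) -> str:
        return f"{table}:{value}"

    def claim(self, table: str, value: str) -> bool:
        """Reserve value for table; return False if it's already taken."""
        key = self.key_name(table, value)
        with self._lock:
            if key in self._key_names:
                return False
            self._key_names.add(key)
            return True

    def release(self, table: str, value: str) -> None:
        """Free a value, e.g. when the owning row is deleted or changed."""
        with self._lock:
            self._key_names.discard(self.key_name(table, value))
```

Unlike approach #1, this lets the unique value change: claim the new value, update the row, then release the old one.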


I must say that Google is aware of this issue. It looks like they'd have to make some major architectural changes to implement this. My guess is that if they ever fix it, they'll sneak in approach #2, above their BigTable support and below the API that we users use.

Easy Way to Test Offline HTML5 Web Apps

Assuming you've already written your cache.manifest and everything according to the HTML5 Offline Web Applications spec, you'll want to test it. So far, the easiest way I've found to test it is in the Chrome browser on your desktop. To test:

1. Go to your url. Keep your javascript console open, and you should initially see messages about the "Application Cache" being filled in. It will tell you if you have any bad links in your cache.manifest file, and you should fix those.

2. Once you're viewing your cacheable web page, click Refresh. This causes the browser to test that the manifest has good links, and again you should fix any problems you see.

3. Now that you've tested it in your desktop's browser, you can test on an iPad by pulling the link up in Safari and tapping the "Add to Home Screen" link. This creates an icon for your app on your iPad home screen. You can then tap that icon to test that everything works. And to be really sure it works, turn on Airplane mode in Settings and test that you can still launch your app.
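Chrome's console will flag bad links, but it can be handy to catch them before you even load the page. Below is a minimal sketch that parses a cache.manifest and lists explicit CACHE entries that have no matching file under your document root. It treats site-absolute paths as relative to the document root, skips full http(s) URLs, and ignores NETWORK: and FALLBACK: entries. The function names are hypothetical:

```python
import os
from typing import List

SECTION_HEADERS = {"CACHE:", "NETWORK:", "FALLBACK:", "SETTINGS:"}


def explicit_entries(manifest_text: str) -> List[str]:
    """Return the URLs the manifest explicitly asks the browser to cache."""
    lines = manifest_text.splitlines()
    if not lines or not lines[0].startswith("CACHE MANIFEST"):
        raise ValueError("first line must be 'CACHE MANIFEST'")
    section = "CACHE:"  # entries before any section header are explicit entries
    entries = []
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line in SECTION_HEADERS:
            section = line
        elif section == "CACHE:":
            entries.append(line)
    return entries


def missing_files(manifest_text: str, docroot: str) -> List[str]:
    """List CACHE entries with no corresponding file under docroot."""
    missing = []
    for url in explicit_entries(manifest_text):
        if "://" in url:
            continue  # full URLs would need an HTTP check; skipped here
        local = url.split("?")[0].lstrip("/")
        if not os.path.isfile(os.path.join(docroot, local)):
            missing.append(url)
    return missing
```

You could run this as part of your deploy step. An empty list means every explicit entry has a file behind it.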

A note about testing in Chrome: you cannot clear the application cache by running the standard "Clear Browsing Data". Instead, go to the magic URL chrome://appcache-internals/ and click "Remove this AppCache" to clear it. On the iPad, going to Settings, Safari, and "Clear Cache" appears to clear out and invalidate your local application cache.

HTML5 Storage Wars - localStorage vs. IndexedDB vs. Web SQL

Currently, there are three competing approaches for saving serious amounts of data (i.e., persistently, and bigger than cookies) locally in your browser:

  1. Web Storage
  2. Indexed Database API
  3. Web SQL Database

These names sure seem similar, but the implementations sure are different. Let's quickly summarize what they do, their PROs and CONs, and what I like best at the moment, though I'm sure my opinions will age quickly as the technology matures.

All these technologies use same-origin protection for data access (i.e., JavaScript on a page can only access data stored by pages from the same origin: the same scheme, host, and port). That's fine and not a differentiator, so I won't mention it below.

Web Storage

Web Storage, and specifically the localStorage part of it, is a really simple key/value persistence system.

PRO: Really simple API.

PRO: Already available in all major new browsers.

CON: No query language, schemas, really nothing you'd normally call a database. So it wouldn't scale well where you need to impose organization on a larger data set.

CON: They didn't put transactional safety into the standard. I don't think I can sleep at night with an app running that might have race conditions and then have the risk of corrupt data.
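To make the race-condition worry concrete: with nothing but get and set calls, a read-modify-write from two tabs can interleave. The sketch below replays one such interleaving deterministically in Python. A plain dict stands in for localStorage, and the helper names are hypothetical:

```python
from typing import Dict, List

# A plain dict standing in for localStorage: string keys, string values.
storage: Dict[str, str] = {"visits": "0"}


def read_visits() -> int:
    return int(storage["visits"])


def write_visits(value: int) -> None:
    storage["visits"] = str(value)


# Two tabs each want to add one visit. Replay this interleaving:
#   tab A reads, tab B reads, tab A writes, tab B writes.
seen: List[int] = []
seen.append(read_visits())   # tab A reads 0
seen.append(read_visits())   # tab B reads 0
write_visits(seen[0] + 1)    # tab A writes 1
write_visits(seen[1] + 1)    # tab B writes 1, overwriting A's update

final = read_visits()
print(f"expected 2 visits, stored {final}")
```

Both IndexedDB and Web SQL let you group the read and the write into one transaction, and that is the tool you need to rule out this kind of lost update.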

Indexed Database API

IndexedDB is basically a simple flat-file database with hierarchical key/value persistence and basic indexing.

PRO: If you're a NoSQL type of person, then this might fit the bill perfectly.

CON: Not yet available in most new browsers.

CON: If you wanted SQL, you're not getting it here. Though in the future, it might be a great building block for implementing a SQL engine for the browser.

Web SQL Database

Web SQL is basically sqlite embedded into the browser. I've used sqlite off and on for a few years. At first I was turned off by the name (excuse my superficiality) and by the architecture (just a flat-file database). But after digging into it, I found that it's a rock-solid platform and great for production use, as long as you keep its limitations in mind. Those limitations aren't so much about size (I think you can go to at least 1 GB without a problem). They're more about what's inherent in flat-file databases (a poor fit for high levels of concurrency) and about missing features (stored procs and other higher-end database features).

PRO: Fast and pretty feature-rich sql implementation (besides select/insert/update/delete, you can do joins, inner selects, etc).

CON: Available in Chrome and webkit-based browsers (Safari, etc.) but not Firefox or IE.

CON: The darn W3C Working Group has put a hold on the standard since they say they want at least two independent implementations of the standard, and there's only one so far, since everybody is using sqlite.

I wish the standards group would make a special case for sqlite and approve this standard -- it's in the public domain, so it's available to everybody in the world, no strings attached. Sqlite is a perfect fit for browsers - its limitations are not problems. Only one user is using a browser at a time, so no concurrency issues.  And no one wants high-end database features in a simple no-administration database. I think that's why both Android and iOS use sqlite for storage.
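Since Web SQL is sqlite under the hood, Python's built-in sqlite3 module is a convenient way to try out the kind of SQL you'd run in the browser. Here's a small sketch with made-up tables that shows a join and an inner select. In the browser, you'd issue the same statements through Web SQL's executeSql:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO line_items VALUES (1, 'pen', 3), (1, 'ink', 1), (2, 'pad', 10);
""")

# Join: total quantity ordered per customer.
totals = conn.execute("""
    SELECT o.customer, SUM(li.qty)
    FROM orders o JOIN line_items li ON li.order_id = o.id
    GROUP BY o.customer
    ORDER BY o.customer
""").fetchall()

# Inner select: customers with any single line item of qty >= 5.
bulk_buyers = conn.execute("""
    SELECT customer FROM orders
    WHERE id IN (SELECT order_id FROM line_items WHERE qty >= 5)
""").fetchall()

print(totals)       # [('alice', 4), ('bob', 10)]
print(bulk_buyers)  # [('bob',)]
```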

Bottom line

If you're only deploying on mobile platforms, then Web SQL is a no-brainer. Or if you're running on desktops and can require Chrome or Safari as your browser, then Web SQL is also for you. I wouldn't use the other two standards in any heavy-duty app at the moment.

Python on Android -- Easy as Pie

Here's what I did to get Python running on my HTC Incredible:

Step 1: Install a barcode scanner, so that it's easier to install custom apps: install the ZXing app.

Step 2: Install the SL4A application (Scripting Layer for Android) by going to the SL4A project home page and using the previously installed "Barcode Scanner" app (the title on the ZXing app's icon) to scan the SL4A barcode (or just tap the link if you're reading this on your Android). This downloads a .apk file, which you then run to install the app directly onto your phone, since it's not yet available in the Market. You might have to change your phone's settings to allow installing non-Market apps: go to Settings / Applications and make sure "Unknown sources" is checked.

Step 3: Install the Python interpreter:

  1. Run app SL4A.
  2. Press the menu button, then View, then Interpreters. You should only have Shell at first.
  3. Press the menu button, then Add, then select Python 2.6.2. After the .apk file for Python downloads, run it to install it.

Now you can edit and run Python scripts on your Android, and you have a lot of the Android functionality available. Here's my first test script:

import android

droid = android.Android()

droid.ttsSpeak('Hello World')

You can press Help to see the available API; it's a subset of the entire Android API.

Open Source Data Modelling - Power*Architect

There are a ton of half-finished open source data modeling projects out there. I've just been evaluating them for a client. At the moment, Power*Architect seems the best for my needs:

  • open source
  • multi-platform - Java in this case
  • multi-database support - I need PostgreSQL and Oracle
  • pdf export of ER diagram
  • alive and not abandonware

And to top it off, it's actually easy to use! Take that, ERwin. To be fair, it doesn't have all the features of ERwin, but that's OK for me.

Summary of Python Web Frameworks, Beginning of 2008

Let me summarize our recommendations for Python web frameworks:

  • Pylons: for database-driven applications
  • Plone (built on Zope3): for CMS (content management systems)
  • Twisted: for multiple/custom network protocols or extremely high volume

Overall, the Python web community is seeing a lot of active development and evolution. Whereas the Ruby community has one go-to web framework (Ruby on Rails), the Python community has a plethora of choices. Luckily, a Python standard was created: PEP 333, aka WSGI (pronounced "whiskey"; it stands for Web Server Gateway Interface). It defines a standard interface between web servers and Python web applications or frameworks. All of the above frameworks support this standard now, and it allows plug-and-play middleware.
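To show what "plug-and-play middleware" means in practice: a WSGI application is just a callable that takes `environ` and `start_response`, and middleware is a callable that wraps another one. Here's a minimal sketch; the `hello_app` and `PoweredByMiddleware` names are illustrative:

```python
from typing import Callable, Iterable, List, Tuple


def hello_app(environ, start_response) -> Iterable[bytes]:
    """A complete WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from WSGI\n"]


class PoweredByMiddleware:
    """Wraps any WSGI app and adds a response header."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ, start_response):
        def add_header(status: str, headers: List[Tuple[str, str]], exc_info=None):
            headers.append(("X-Powered-By", "WSGI middleware"))
            return start_response(status, headers, exc_info)
        return self.app(environ, add_header)


# Middleware composes by wrapping: any WSGI app can go inside.
application = PoweredByMiddleware(hello_app)
```

To try it in a browser, pass `application` to `wsgiref.simple_server.make_server("localhost", 8000, application)` and call `serve_forever()`. Because the middleware only relies on the WSGI calling convention, it can wrap a Pylons app just as easily as this toy one.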

We recommend Pylons for database-driven application development because internally it is built on WSGI. If TurboGears 2.0 development proceeds as designed and it is written on top of Pylons, then we will recommend it. Here are the middleware components that we recommend using with Pylons, and the features of each that we really love:

Database ORM: SQLAlchemy

  • well designed: provides a nice object interface to SQL tables, but also lets you get to the underlying SQL if you want to
  • really efficient: can load entire graph of objects in one SQL query - e.g., load item and line item details in one round-trip to the database

Templating: Mako

  • fast: it's the fastest of the Python templating engines
  • simple: you use standard Python in your templates, instead of custom tag libraries

Request Dispatching: Routes

  • flexible: regular expressions with intelligent name binding to map tidy-looking urls to your internal object hierarchy
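Routes has its own pattern syntax, but the underlying idea (named placeholders in a URL template, bound to variables on a match) can be sketched with the standard `re` module. This is not the Routes API. `compile_route`, `dispatch` and the `{name}` placeholder syntax here are assumptions for illustration:

```python
import re
from typing import Dict, List, Optional, Tuple


def compile_route(template: str) -> re.Pattern:
    """Turn '/blog/{year}/{slug}' into a regex with named groups."""
    pattern = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", template):
        pattern += re.escape(template[pos:m.start()])
        pattern += f"(?P<{m.group(1)}>[^/]+)"  # each placeholder matches one path segment
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(f"^{pattern}$")


def dispatch(routes: List[Tuple[re.Pattern, Dict[str, str]]],
             path: str) -> Optional[Dict[str, str]]:
    """Return the first matching route's defaults merged with URL variables."""
    for regex, defaults in routes:
        m = regex.match(path)
        if m:
            return {**defaults, **m.groupdict()}
    return None


routes = [
    (compile_route("/blog/{year}/{slug}"), {"controller": "blog", "action": "view"}),
    (compile_route("/users/{id}"), {"controller": "users", "action": "show"}),
]
```

A real dispatcher would then look up the controller and call the action with those bound variables.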

Form handling: ToscaWidgets

  • no repetition: you model your form elements using Python objects, and use this model both for display and validation
  • flexible: you can either use the default or specify your own template for html generation
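The "no repetition" point is easiest to see in miniature: declare each field once, then drive both rendering and validation from that single declaration. Here's a minimal sketch of the idea only. ToscaWidgets' actual classes differ, and `Field`, `render_form` and `validate` are hypothetical:

```python
import html
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    name: str
    label: str
    required: bool = False
    max_length: int = 100


# The single source of truth for this form.
SIGNUP_FORM: List[Field] = [
    Field("email", "Email", required=True),
    Field("nickname", "Nickname", max_length=20),
]


def render_form(fields: List[Field]) -> str:
    """Generate HTML inputs from the field model."""
    rows = []
    for f in fields:
        req = " required" if f.required else ""
        rows.append(
            f'<label>{html.escape(f.label)} '
            f'<input name="{html.escape(f.name)}" maxlength="{f.max_length}"{req}></label>'
        )
    return "<form>\n" + "\n".join(rows) + "\n</form>"


def validate(fields: List[Field], data: Dict[str, str]) -> Dict[str, str]:
    """Check submitted data against the same model; returns field -> error."""
    errors = {}
    for f in fields:
        value = data.get(f.name, "").strip()
        if f.required and not value:
            errors[f.name] = f"{f.label} is required"
        elif len(value) > f.max_length:
            errors[f.name] = f"{f.label} must be at most {f.max_length} characters"
    return errors
```

Adding a field means touching one list, so the generated HTML and the server-side checks can't drift apart.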

Every category of middleware above has competition, but these are our current recommendations. Even though Pylons is a relatively new project, it is built on more mature components that are on at least their second major version.