Electric Duncan

Wednesday, August 09, 2017

NASA/EOSDIS Earthdata

Update

It's been a few years since I posted on this blog -- most of the technical content I've contributed over the past couple of years has appeared in the following:

The LFE Blog

The Clojang Blog

But since the publication of the Mastering matplotlib book, I've gotten more and more into satellite data. The book, it goes without saying, focused on Python for the analysis and interpretation of satellite data (in one of the many topics covered). After that I spent some time working with satellite and GIS data in general using Erlang and LFE. Ultimately though, I found that more and more projects were using the JVM for this sort of work, and in particular, I noted that Clojure had begun to show up in a surprising number of Github projects.

EOSDIS

Enter NASA's Earth Observing System Data and Information System (see also earthdata.nasa.gov and EOSDIS on Wikipedia), a key part of the agency's Earth Science Data Systems Program. It's essentially a concerted effort to bring together the mind-blowing amounts of earth-related data being collected throughout, around, and above the world so that scientists may easily access and correlate earth science data for their research.

Related NASA projects include the following:

The acronym menagerie can be bewildering, but digging into the various NASA projects is ultimately quite rewarding (greater insights, previously unknown resources, amazing research, etc.).

Clojure

Back to the Clojure reference I made above: I've been contributing to the nasa/Common-Metadata-Repository open source project (hosted on Github) for a few months now, and it's been amazing to see how all this data from so many different sources gets added, indexed, updated, and generally made so much more available to any who want to work with it. The private sector always seems to be so far ahead of large projects in terms of tech and continuously improving updates to existing software, so it's been pretty cool to see a large open source project in the NASA Github org make so many changes that find ways to keep helping their users do better research. Even better, users regularly receive new features across a large, complex collection of libraries and services, thanks in part to the benefits that come from using a functional programming language.

It may seem like nothing to you, but the fact that there are now directory pages for various data providers (e.g., GES_DISC, i.e., Goddard Earth Sciences Data and Information Services Center) makes a big difference for users of this data. The data provider pages now also offer easy access to collection links such as UARS Solar Ultraviolet Spectral Irradiance Monitor. Admittedly, the directory pages still take a while to load, but there are improvements on the way for page load times and other related tasks. If you're reading this a month after this post was written, there's a good chance it's already been fixed by now.

Summary

In summary, it's been a fun personal journey from looking at Landsat data for writing a book to working with open source projects that really help scientists to do their jobs better :-) And while I have enjoyed using the other programming languages to explore this problem space, Clojure in particular has been a delightfully powerful tool for delivering new features to the science community.

Friday, July 10, 2015

Mastering matplotlib: Acknowledgments

The Book

Well, after nine months of hard work, the book is finally out! It's available both on Packt's site and Amazon.com. Getting up early every morning to write takes a lot of discipline; it takes even more to say "no" to enticing rabbit holes or herds of Yak with luxurious coats ripe for shaving ... (truth be told, I still did a bit of that).

The team I worked with at Packt was just amazing. Highly professional and deeply supportive, they were a complete pleasure with which to collaborate. It was the best experience I could have hoped for. Thanks, guys!

The technical reviewers for the book were just fantastic. I've stated elsewhere that my one regret was that the process with the reviewers did not have a tighter feedback loop. I would have really enjoyed collaborating with them from the beginning so that some of their really good ideas could have been integrated into the book. Regardless, their feedback as I got it later in the process helped make this book more approachable by readers, more consistent, and more accurate. The reviewers have bios at the beginning of the book -- read them, and look them up! These folks are all amazing!

The one thing that slipped in the final crunch was the acknowledgements, and I hope to make up for that here, as well as through various emails to everyone who provided their support, either directly or indirectly.

Acknowledgments

The first two folks I reached out to when starting the book were both physics professors who had published very nice matplotlib problems -- one set for undergraduate students and another from work at the National Radio Astronomy Observatory. I asked for their permission to adapt these problems to the API chapter, and they graciously granted it. What followed were some very nice conversations about matplotlib, programming, physics, education, and publishing. Thanks to Professor Alan DeWeerd, University of Redlands, and Professor Jonathan W. Keohane, Hampden-Sydney College. Note that Dr. Keohane has a book coming out in the fall from Yale University Press entitled Classical Electrodynamics -- it will contain examples in matplotlib.

Other examples adapted for use in the API chapter included one by Professor David Bailey, University of Toronto. Though his example didn't make it into the book, it gets full coverage in the Chapter 3 IPython notebook.

For one of the EM examples I needed to derive a particular equation for an electromagnetic field in two wires traveling in opposite directions. It's been nearly 20 years since my post-Army college physics, so I was very grateful for the existence and excellence of SymPy which enabled me to check my work with its symbolic computations. A special thanks to the SymPy creators and maintainers.

Please note that if there are errors in the equations, they are my fault! Not that of the esteemed professors or of SymPy :-)

Many of the examples throughout the book were derived from work done by the matplotlib and Seaborn contributors. The work they have done on the documentation in the past 10 years has been amazing -- the community is truly lucky to have such resources at their fingertips.

In particular, Benjamin Root is an astounding community supporter on the matplotlib mail list, helping users of every level with all of their needs. Benjamin and I had several very nice email exchanges during the writing of this book, and he provided some excellent pointers, as he was finishing his own title for Packt: Interactive Applications Using Matplotlib. It was geophysicist and matplotlib savant Joe Kington who originally put us in touch, and I'd like to thank Joe -- on everyone's behalf -- for his amazing answers to matplotlib and related questions on StackOverflow. Joe inspired many changes and adjustments in the sample code for this book. In fact, I had originally intended to feature his work in the chapter on advanced customization (but ran out of space), since Joe has one of the best examples out there for matplotlib transforms. If you don't believe me, check out his work on stereonets. There are many of us who hope that Joe will be authoring his own matplotlib book in the future ...

Olga Botvinnik, a contributor to Seaborn and PhD candidate at UC San Diego (and BioEng/Math double major at MIT), provided fantastic support for my Seaborn questions. Her knowledge, skills, and spirit of open source will help build the community around Seaborn in the years to come. Thanks, Olga!

While on the topic of matplotlib contributors, I'd like to give a special thanks to John Hunter for his inspiration, hard work, and passionate contributions which made matplotlib a reality. My deepest condolences to his family and friends for their tremendous loss.

Quite possibly the tool that had the single-greatest impact on the authoring of this book was IPython and its notebook feature. This brought back all the best memories from using Mathematica in school. Combined with the Python programming language, I can't imagine a better platform for collaborating on math-related problems or producing teaching materials for the same. These compliments are not limited to the user experience, either: the new architecture using ZeroMQ is a work of art. Nicely done, IPython community! The IPython notebook index for the book is available in the book's Github org here.

In Chapters 7 and 8 I encountered a bit of a crisis when trying to work with Python 3 in cloud environments. What was almost a disaster ended up being rescued by the work that Barry Warsaw and the rest of the Ubuntu team did in Ubuntu 15.04, getting Python 3.4.2 into the release and available on Amazon EC2. You guys saved my bacon!

Chapter 7's fictional case study examining the Landsat 8 data for part of Greenland was based on one of Milos Miljkovic's tutorials from PyData 2014, "Analyzing Satellite Images With Python Scientific Stack". I hope readers have just as much fun working with satellite data as I did. Huge thanks to NASA, USGS, the Landsat 8 teams, and the EROS facility in Sioux Falls, SD.

My favourite section in Chapter 8 was the one on HDF5. This was greatly inspired by Yves Hilpisch's presentation "Out-of-Memory Data Analytics with Python". Many thanks to Yves for putting that together and sharing with the world. We should all be doing more with HDF5.

Finally, and this almost goes without saying, the work that the Python community has done to create Python 3 has been just phenomenal. Guido's vision for the evolution of the language, combined with the efforts of the community, has made something great. I had more fun working on Python 3 than I have had in many years.

Thursday, January 01, 2015

Scientific Computing and the Joy of Language Interop

The scientific computing platform for Erlang/LFE has just been announced on the LFE blog. Though written in the Erlang Lisp syntax of LFE, it's fully usable from pure Erlang. It wraps the new py library for Erlang/LFE, as well as the ErlPort project. More importantly, though, it wraps Python 3 libs (e.g., math, cmath, statistics, and more to come) and the ever-eminent NumPy and SciPy projects (those are in-progress, with matplotlib and others to follow).

(That LFE blog post is actually a tutorial on how to use lsci for performing polynomial curve-fitting and linear regression, adapted from the previous post on Hy doing the same.)

With the release of lsci, one can now start to easily and efficiently perform computationally intensive calculations in Erlang/LFE (and any other Erlang Core-compatible language, e.g., Elixir, Joxa, etc.) That's super-cool, but it's not quite the point ...

While working on lsci, I found myself experiencing a great deal of joy. It wasn't just the fact that supervision trees in a programming language are insanely great. Nor just the fact that scientific computing in Python is one of the best in any language. It wasn't only being able to use two syntaxes that I love (LFE and Python) cohesively, in the same project. And it wasn't the sum of these either -- you probably see where I'm going with this ;-) The joy of these and many other fantastic aspects of inter-operation between multiple powerful computing systems is truly greater than the sum of its parts.

I've done a bunch of Julia lately and am a huge fan of this language as well. One of the things that Julia provides is explicit interop with Python. Julia is targeted at the world of scientific computing, aiming to be a compelling alternative to Fortran (hurray!), so their recognition of the enormous contribution the Python scientific computing community has made to the industry is quite wonderful to see.

A year or so ago I did some work with Clojure and LFE using Erlang's JInterface. Around the same time I was using LFE on top of Erjang, calling directly into Java without JInterface. This is the same sort of Joy that users of Jython have, and there are many more examples of languages and tools working to take advantage of the massive resources available in the computing community.

Obviously, language inter-op is not new. Various FFIs have existed for quite some time (I'm a big fan of the Common Lisp CFFI), but what is new (relatively, that is ... as I age, anything in the past 10 years is new) is that we are seeing this not just for programs reaching down into C/C++, but reaching across, to other higher-level languages, taking advantage of their great achievements -- without having to reinvent so many wheels.

When this level of cooperation, credit, etc., is done in the spirit of openness, peer-review, code-reuse, and standing on the shoulders of giants (or enough people to make giants!), we get joy. Beautiful, wonderful coding joy.

And it's so much greater than the sum of the parts :-)

Saturday, December 27, 2014

Improved Python Support in Erlang/LFE

The previous post on Python support in Erlang/LFE made Hacker News this week, climbing in fits and starts to #19 on the front page. That resulted in the biggest spike this blog has seen in several months.

It's a shame, in a way, since it came a few days too early: there's a new library out for the Erlang VM (written in LFE) which makes it much easier to use Python from Erlang (the language from Sweden that's famous for impressing both your mum and your cats).

The library is simply called py. It's a wrapper for ErlPort, providing improved usability for Python-specific code as well as an Erlang process supervision tree for the ErlPort Python server. It has an extensive README that not only covers the usual examples with LFE, but also gives a full accounting of usage in Erlang's more common Prolog-inspired syntax. The LFE Blog has a new post with code examples as well as a demonstration of the py supervision tree (e.g., killing Python server processes and having them restart automatically) which hasn't actually made it into the README yet -- so get it while it's hot!
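For readers who haven't met supervision before, the restart behaviour mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the idea and assumes nothing about how py or OTP supervisors are implemented; the names `supervise` and `crashing_worker` are hypothetical and exist only for this example.

```python
import subprocess
import sys


def supervise(command, max_restarts):
    """Run command, restarting it each time it exits abnormally.

    Returns the number of restarts performed before the child either
    exited cleanly or the restart limit was reached.
    """
    restarts = 0
    while True:
        child = subprocess.Popen(command)
        exit_code = child.wait()
        if exit_code == 0:
            return restarts  # clean exit: nothing left to supervise
        if restarts >= max_restarts:
            return restarts  # restart limit reached: give up
        restarts += 1


# Hypothetical worker that always crashes, standing in for a killed
# Python server process.
crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
restart_count = supervise(crashing_worker, max_restarts=3)
print("restarts:", restart_count)
```

A real supervisor also tracks restart frequency over time and can manage whole trees of children; the sketch keeps only the "notice the exit, start it again" loop.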

The most exciting bits are yet to come: there are open tickets for:

work on multiple Python server processes
scheduling code execution to these, and
full Python distribution infrastructure with parallel execution.

This could drastically change the picture for compute-intensive tasks in Erlang, Elixir, LFE, and Joxa. The Erlang VM was never intended to excel at the sort of problems that Python has traditionally focused on... yet it provides the sort of infrastructure that the Python community has been agonizing over for more than a decade. For Pythonistas, this may not be a very big deal ... but for the Erlang and functional programming communities, the LFE py project could be a life-saver for any number of projects which need easy access to the strengths of Python.

Friday, November 28, 2014

Scientific Computing with Hy and IPython

This blog post is a bit different than other technical posts I've done in the past in that the majority of the content is not in the blog or in gists; instead, it is in an IPython notebook. I adored Mathematica back in the 90s, so you can imagine how much I love the IPython Notebook app. I'll have more to say on that at a future date.

I've been doing a great deal of NumPy and matplotlib again lately, every day for hours a day. In conjunction with the new features in Python 3, this has been quite a lot of fun -- the most fun I've had with Python in years (thanks Guido, et al!). As you might have guessed, I'm also using it with Erlang (specifically, LFE), but that too is for a post yet to come.

With all this matplotlib and numpy work in standard Python, I've been going through Lisp withdrawals and needed to work with it from a fresh perspective. Needless to say, I had an enormous amount of fun doing this. Naturally, I decided to share with folks how one can do the latest and greatest with the tools of Python scientific computing, but in the syntax of the Python community's best kept secret: Clojure-Flavoured Python (Github, Twitter, Wikipedia).

Spoiler: observed data and polynomial curve fitting

Looking about for ideas, I decided to see what Clojure's Incanter project had for tutorials, and immediately found what I was looking for: Linear regression with higher-order terms, a 2009 post by David Edgar Liebke.
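As a rough idea of what "linear regression with higher-order terms" involves, here is a minimal sketch of a least-squares polynomial fit in plain Python. It is not the notebook's code (which is written in Hy and uses NumPy), the sample data is made up, and the helper names `solve_linear_system`, `fit_polynomial`, and `evaluate` are hypothetical. It solves the normal equations directly, which is fine for small, low-degree problems like this one.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augment the matrix so row operations update both sides at once.
    aug = [list(row) + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, degree):
    """Return coefficients [c0, c1, ..., c_degree] minimizing squared error."""
    size = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)]
           for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(ata, aty)


def evaluate(coefficients, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** power for power, c in enumerate(coefficients))


# Made-up sample data generated from y = 1 + 2x + 3x^2 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefficients])
```

Because the sample data is noise-free, the recovered coefficients should land very close to the 1, 2, 3 used to generate it; with real observations they would only approximate the underlying trend.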

Nearly every cell in the tutorial notebook is in Hy, and for that we owe a huge thanks to yardsale8 for his Hy IPython magics code. For those that love Python and Lisp equally, who are familiar with the ecosystems' tools, Hy offers a wonderful option for being highly productive with a language supporting Lisp- and Clojure-style macros. You can get your work done, have a great time doing it, and let that inner code artist out!

(In fact, I've started writing a macro for one of the examples in the tutorial, offering a more Lisp-like syntax for creating class methods. We'll see what Paul Tagliamonte has to say about it when it's done ... !)

If you want to check out the notebook code and run it locally, just do the following:

This will do the following:

Create a virtualenv using Python 3
Download all the dependencies, and then
Start up the notebook using a local IPython HTTP server

If you just want to read along, you're more than welcome to do that as well, thanks to the IPython NBViewer service. Here's the link: Scientific Computing with Hy: Linear Regressions.

One thing I couldn't get working was the community-provided code for generating tables of contents in IPython notebooks. If you have any expertise in this area, I'd love to get your feedback to see how I need to configure the custom ihy IPython profile for this tutorial.

Without that, I've opted for the manual approach and have provided a table of contents here:

Introduction

Preparation

If all goes well, you will enjoy that as much as I did :-)

More soon ...

Friday, November 21, 2014

ErlPort: Using Python from Erlang/LFE

Update 1: This post has a sequel here.

Update 2: There is a new LFE library that provides more idiomatic access to Python from LFE/Erlang by wrapping ErlPort and creating convenience functions. Lisp macros were, of course, involved in its making.

This is a short little blog post I've been wanting to get out there ever since I ran across the erlport project a few years ago. Erlang was built for fault-tolerance. It had a goal of unprecedented uptimes, and these have been achieved. It powers 40% of our world's telecommunications traffic. It's capable of supporting amazing levels of concurrency (remember the 2007 announcement about the performance of YAWS vs. Apache?).

With this knowledge in mind, a common mistake by folks new to Erlang is to think these performance characteristics will be applicable to their own particular domain. This has often resulted in failure, disappointment, and the unjust blaming of Erlang. If you want to process huge files, do lots of string manipulation, or crunch tons of numbers, Erlang's not your bag, baby. Try Python or Julia.

But then, you may be thinking: I like supervision trees. I have long-running processes that I want to be managed per the rules I establish. I want to run lots of jobs in parallel on my 64-core box. I want to run jobs in parallel over the network on 64 of my 64-core boxes. Python's the right tool for the jobs, but I wish I could manage them with Erlang.

(There are sooo many other options for the use cases above, many of them really excellent. But this post is about Erlang/LFE :-)).

Traditionally, if you want to run other languages with Erlang in a reliable way that doesn't bring your Erlang nodes down with badly behaved code, you use Ports. (more info is available in the Interoperability Guide). This is what JInterface builds upon (and, incidentally, allows for some pretty cool integration with Clojure). However, this still leaves a pretty significant burden for the Python or Ruby developer for any serious application needs (quick one-offs that only use one or two data types are not that big a deal).
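To give a feel for that burden, here is a minimal sketch of just the message framing an external program has to handle, assuming the Erlang side opened the port with the `{packet, 4}` option (each message is preceded by a 4-byte big-endian length). The function names are hypothetical, an in-memory buffer stands in for the port's pipes, and the sketch deliberately ignores the harder part: encoding and decoding the Erlang terms inside each payload.

```python
import io
import struct

# 4-byte big-endian unsigned length, matching a port opened with {packet, 4}.
HEADER = struct.Struct(">I")


def write_packet(stream, payload):
    """Write one length-prefixed message to a binary stream."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_packet(stream):
    """Read one length-prefixed message, or return None at end of stream."""
    header = stream.read(HEADER.size)
    if not header:
        return None
    if len(header) < HEADER.size:
        raise EOFError("truncated packet header")
    (length,) = HEADER.unpack(header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated packet payload")
    return payload


# Round trip through an in-memory buffer standing in for the port's pipes.
buffer = io.BytesIO()
write_packet(buffer, b"hello")
write_packet(buffer, b"erlang")
buffer.seek(0)
messages = [read_packet(buffer), read_packet(buffer), read_packet(buffer)]
print(messages)
```

Everything beyond this framing (term encoding, type mapping, error handling, process lifecycle) is the kind of work a library can take off the developer's hands.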

erlport was created by Dmitry Vasiliev in 2009 in an effort to solve just this problem, making it easier to use and integrate Erlang with more common languages like Python and Ruby. The project is maintained, and in fact has just received a few updates. Below, we'll demonstrate some usage in LFE with Python 3.

If you want to follow along, there's a demo repo you can check out:
Change into the repo directory and set up your Python environment:
Next, switch over to the LFE directory, and fire up a REPL:
Note that this will first download the necessary dependencies and compile them (that's what the [snip] is eliding).

Now we're ready to take erlport for a quick trip down to the local:
And that's all there is to it :-)

Perhaps in a future post we can dive into the internals, showing you more of the glory that is erlport. Even better, we could look at more compelling example usage, approaching some of the functionality offered by such projects as Disco or Anaconda.

Tuesday, July 29, 2014

OSCON 2014 Theme Song - Andrew Sorensen's Live Coding Keynote

Andrew Sorensen live-coding at OSCON 2014

Keynote

Shortly after Andrew Sorensen began the performance segment of his keynote at OSCON 2014, the #oscon Twitter topic began erupting with posts about the live coding session. Comments, retweets, and additional links persisted for that day and the next. In short, Andrew was a hit :-)

My first encounter with Andrew's work was a few years ago when I was getting back into Lisp. I was playing with generative music with Overtone (and then, a bit later, experimenting with SuperCollider, Hy, and Twisted) and came across his piece A Study in Keith. You might want to take a break from reading this post and watch that now ...

When Andrew started up his presentation, I didn't immediately recognize him. In fact, when the code was displayed on the big screens, I assumed it was Clojure until I looked closely and saw he was using (define ...) and not (defn ...). This seemed very familiar, and then I remembered Impromptu, which ultimately led to my discovery of Extempore (see more links below) and the realization that this is what Andrew was using to live code.

At the end of the performance a bunch of us jumped up and gave a standing ovation. (In fact, you can hear me yell out "YEAH" at the end of his presentation when he says "And there we go."). It was quite a show. It seemed that OSCON 2014 had been given a theme song. The next step was getting the source code ...

Andrew's gist (Dark Github Theme)

Sharing the Code

Andrew gave a presentation on Extempore in the ballroom right after the keynote. This too was fantastic and resulted in much tweeting.

Afterwards a bunch of us went up front and chatted with him, enthusing about his work, the recent presentation, the keynote, and his previously published pieces.

I had Andrew's ear for a moment, and asked him if he was interested in sharing his keynote source -- there had been several requests for it on Twitter (that also got retweeted and/or favourited). Without hesitation, he gave an enthusiastic "yes" and we were off and running for the lounge where we could sit down to create a gist (and grab a cappuccino!). The availability of the source was announced immediately, to the delight of many.

Setting Up Extempore

Sublime Text 3 connected to Extempore

Later that night in my hotel room, I had time to download and run Extempore ... and discovered that I couldn't actually play the keynote code, since there was some implicit setup I was missing. However, after some digging around on the docs site and the mail list, music was pouring forth from my laptop -- to my great joy :-D

To ensure anyone else who is not familiar with Extempore can also have this pleasure, I've put together all the prerequisites and setup necessary in a forked gist, in multiple parts. I will go through those in this blog post. Also: all of my testing and live coding was done using Ben Swift's Extempore Sublime Text plugin.

The first step is getting all the dependencies. You'll want to start the downloads right away, since they are large (the sample files are compressed .wavs). While that's going on, you can install Extempore using Homebrew (this worked for me on Mac OS X with no additional tweaking/configuration necessary):

With Extempore running, let's do some setup. We're going to need to:

load some libraries (this takes a while for them to compile),
define some samples, and then
define some musical note aliases for convenience (and visual clarity).

The easiest way to use the files below is to clone the gist repo and load them up in Sublime Text, executing blocks of text by highlighting them and then pressing ^x^x.

Here is the file for the first two bullets mentioned above:

You will need to edit this file to point to the locations where your samples were downloaded. Also, at the very end there are some lines of code you can execute to make sure that your samples are working.

Now let's define the note aliases. You can just highlight the entire contents of this file in Sublime Text and then ^x^x:

At this point, we're ready to play!

Playing the Music

To get started on the music, open up the fourth file from the clone of the gist and ^x^x the root, scale, and left-hand-notes-* constants.

Here is the evolution of the left hand part:
Go ahead and start that first one playing (^x^x the definition as well as the call). Wait for a bit, and then execute the next one, etc. Once you've started playing the final left hand form, you can switch to the wider range of notes defined/updated at the bottom.

Next, you'll want to bring in the right hand ... then bassline ... then the higher fmsynth sparkles for the right hand:

Then you'll increase the energy with the drum section:

Finally, you'll bring it to the climax, and then start the gentle fade out:

A slightly modified code listing for the final keynote form is here:

Variation on a Theme

I have recorded a variation of Andrew's keynote based on the code above, for your listening pleasure :-) You can listen to it in your browser or download it.

This version plays part of the left hand piano an octave lower. There's a tiny bit of clipping in places, and I accidentally jazzed it up (and for too long!) with a hi-hat change in the middle. There are also some awkward transitions and volume oddities. However, these should be inspiration for you to make your own variation of the OSCON 2014 Theme Song :-)

The "script" used for the recording can be found here.

Links of Note

Some of these were mentioned above, some haven't been. All relate to Extempore :-)

A Study in Keith.
The Disklavier Sessions - 2013
Andrew on Github: https://github.com/digego
Twitter: https://twitter.com/digego
Extempore:

Site: http://extempore.moso.com.au/
Docs: http://benswift.me/extempore-docs/index.html
Mail list: https://groups.google.com/forum/#!forum/extemporelang
Source code: https://github.com/digego/extempore

Monday, July 28, 2014

The Future of Programming - Adopting The Functional Paradigm?

Series Links

An Overview
Themes at OSCON 2014
Adopting the Functional Paradigm?
Retrospective on Paradigms
The Rise of Polyglotism
Preparing for the Future

Survivors' Breakfast

The previous post covered some thoughts on the future-looking programming themes present at OSCON 2014.

Following that wonderful conference, long-time Open Source advocate, Pythonista, and instructor Steve Holden was kind enough to host his third annual "OSCON Survivors' Breakfast" with tens of esteemed attendees, speakers, and organizers enjoying great company and conversation, relaxing together after the flurry of conference activity, planning a leisurely day in Portland, and -- most immediately -- having some much-needed breakfast.

The view from the 23rd floor was quite an eyeful, and the conversation ranged across equally panoramic topics. As I sat with Alex Martelli, Anna Ravenscroft, and Katie Miller, the conversation inevitably turned to thoughts programmatical. One thread of the discussion was so compelling that it helped crystallize this series of blog posts. That was kicked off with Katie's question:

Why [have some large companies] not embraced functional programming to the extent that other large ones have?

Multiple points of discussion spawned from this, some of which still continue. The rest of this post explores these.

Large Companies?

What constitutes a large company? We settled on discussing Fortune 500 companies, which, by definition are:

U.S. Companies
Ranked by gross revenue (after adjustments for excise taxes).

Afterwards, I looked up the 2013 top 25 tech companies in the Fortune 500. I've listed them below; in parentheses is the Fortune 500 ranking. After the dash are the functional programming languages used on various company projects -- these are listed only if I have talked to someone who has worked on a project (or interviewed for a job that used the language), or if I have read an article by an employee who has stated that they use the listed language(s) [1].

Apple (6) - Swift, Clojure, Scala
AT&T (11) - Haskell
HP (15) - F#, Scala
Verizon Communications (16) - Scala
IBM (20) - Scala
Microsoft (35) - F#, F*
Comcast (46) - Scala
Amazon (49) - Haskell, Scala, Erlang
Dell (51) - Erlang, Scala
Intel (54) - Haskell, SML, PLT Scheme
Google (55) - Haskell [2]
Cisco (60) - Scala
Ingram Micro (76) - ?
Oracle (80) - Scala
Avnet (117) - ?
Tech Data (119) - ?
Emerson Electric (123) - ?
Xerox (131) - Scala
EMC (133) - Scala
Arrow Electronics (141) - ?
CenturyLink (150) - ?
Computer Sciences Corp. (176) - ?
eBay (196) - Scala
TI (218) - ?
Western Digital (222) - ?

The companies which have committed to projects guessed to be of significant business value written in FP languages include: Apple, HP, and eBay. Possibly also Oracle and Intel. So, as a rough estimate, between 3 and 5 of the top 25 U.S. tech companies have made a significant investment in FP.

Why not Google?

The next two sections offer summaries of some views on this.

Ideal Use Case?

Is an FP language suitable for large organisations? Are smaller companies better served by them? During breakfast, it was postulated that dealing with such things as immutable data, handling I/O in pure FP languages, and creating/using higher order functions is easier for small startups due to the shorter amount of time required to hire or train a critical mass of skilled programmers.

It is certainly true that it will take larger organisations longer to train their personnel simply due to sheer numbers and, even with enough trainers, logistics. But this argument can be made for any corporate level of instruction; in my book, this cancels out on both sides and is not an argument unique to hard topics, much less one specifically pertinent to FP adoption.

Brain Fit?

I've heard this one a bit: "Some people just don't think in FP terms." They need loops and iteration, not higher order functions and recursion. Joel Spolsky makes reference to this in his article The Guerrilla Guide to Interviewing. In particular, he says that "For some reason most people seem to be born without the part of the brain that understands pointers." This has been applied to topics in FP as well as C.

To be fair, Joel's comment was probably made with a bit of lightness and not meant to be a statement on the nature of mind or a theory of cognition. The context of the article is a very practical one: hiring. When trying to identify whether a programmer would be an asset for your team, you're not living in the space of cognitive theory, rather you inhabit the realm of quick approximations, gut instincts, and fast, economical decisions.

Regardless, I find this perspective -- Type Physicalism [3] -- fairly objectionable. This is because I see it as a kind of intellectual "racism." Early social sciences utilized this form of reasoning to justify all sorts of discriminatory thinking in the name of "science", reinforcing a rigid mentality of "us" vs. "them." In my own experience, I've seen this sort of approach used to shut down exploration, to enforce elitism, and to dismiss ideas that threaten the authority of the status quo.

Rather than seeing the problem of comprehending FP as a physical limitation of the individual, I see instructional failure as the obstacle to overcome. If we start with the proposition that certain brains are deficient, we are essentially abandoning education. It is the responsibility of the instructor to engage creatively with each student's learning style. When adhering to the idea that certain brains are limited, one discards creative engagement; one doesn't even consider working with the students and their learning styles. This is a view that, however implicitly, can be used to shun diversity and dismiss potential.

I believe the essence of what Joel was shooting for can be approached in a much kinder fashion (adapted for an FP discussion):

None of us was born knowing GOTO statements, global state, mutable data, or for loops. There are many programmers alive, though, whose first contact with programming involved one or more of these. That's their "home town", as it were; their programmatic birth place. Having utilized -- as well as taught -- imperative, OOP, and functional styles of programming, I do not feel that one is intrinsically any harder than another. However, they are sometimes so vastly different from each other in style or syntax or semantics that once a student has solidified around the concepts of a particular paradigm, it can be a challenge retraining to work easily in another.
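To make the "loops vs. higher order functions and recursion" contrast concrete, here is a minimal sketch in Python. The function names and the toy problem (summing the squares of the even numbers in a list) are mine, chosen purely for illustration; they don't come from any of the companies or conversations mentioned here. All three versions compute the same thing; only the style differs.

```python
from functools import reduce


def sum_even_squares_loop(numbers):
    """Imperative style: a for loop updating a mutable accumulator."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_hof(numbers):
    """Higher-order functions: filter, map, and reduce, with no mutation."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, n: acc + n, squares, 0)


def sum_even_squares_rec(numbers):
    """Recursion: handle the first element, then recurse on the rest."""
    if not numbers:
        return 0
    head, *tail = numbers
    rest = sum_even_squares_rec(tail)
    return head * head + rest if head % 2 == 0 else rest


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(sum_even_squares_loop(data))
    print(sum_even_squares_hof(data))
    print(sum_even_squares_rec(data))
```

Whichever version reads as "natural" probably says more about where one first learned to program than about the shape of one's brain.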

Why the Objections?

If both "ideal use case" and "brain fit" are given as arguments against adopting FP (or any other new paradigm) in large organisations, and neither is considered logically or philosophically valid, what's at the root of the resistance?

It is not uncommon for changes in an industry or field of study to be met with resistance. Very often, the amount of resistance is proportional to how big the change is, or how different it is from the status quo. I suspect that this is really what we're seeing when companies take a stance against FP. There are very often valid business concerns: "we've made an investment in OOP" or "it will cost too much to train/hire/migrate to FP."

I would remind those company leaders, though, that new sources of revenue, product innovation, and changes in market adoption do not often come from maintaining or enforcing the current state. Instead, that is an identifying characteristic of companies whose relevance is fading.

Even if your company has market dominance or is a monopoly, there is still a good incentive for exploring alternative paradigms. At the very least, one can uncover inefficiencies and apply new knowledge to remove duplication of efforts, increase margins, etc.

Careers

As a manager, I have found that about half of the senior engineers up for promotion have very little to no interest in taking on different (new to them) programmatic paradigms. They consider current burdens sufficient (or too much) and would rather spend what little free time they have available to them in improving existing systems.

Senior engineers who have a more academic or research bent (or are easily bored) are much more likely to embrace this sort of change. Interestingly, senior engineers who have little to no competitive drive will more readily pick up something new if the need arises. This may be due to such things as not perceiving accumulated knowledge as territory to defend, for example.

Younger engineers with less experience (and less of an investment made in a particular school of thought) are much more willing to take on new challenges. I believe there are many reasons for this, one of which may include an interest in becoming more professionally competitive with their peers.

Junior or senior, I have found that programmers who are currently looking to find new employment are nearly invariably not only willing to take on the challenge of learning different paradigms, but are usually going about that proactively and engaging in self-study.

I want to work with programmers who can take on any problem space in any paradigm and find creative solutions, contributing as valued members of a team. This is certainly an ideal set of characteristics, but one that I have seen in the wilds of the workplace on multiple occasions. It has nothing to do with FP or OOP paradigms, but rather with the people themselves.

Even if a company is locked into well-established processes and views on programming, it may find it in its best interest to offer a more open-minded approach to those employees who would enjoy that. Its retention rates could very well increase dramatically.

Do We Need To?

Philosophy and hiring strategies aside, do we -- as programmers, software projects, or organizations that support programming -- need to take on the burden of learning or adopting functional programming? Quite possibly not.

If Google's plans around Go involve building a new operating system (in the spirit of 1970s C and UNIX), the systems programmers may find pure functions too cumbersome to work with. FP may be too burdensome a fit for that type of work.

If one is not tied to a historical analogy with UNIX, as Mozilla is not with Rust, doing something like creating a new browser engine (or running a remote services company) may be a good fit for FP, especially if one has data showing reduced error counts when using type systems.

As we shall see illustrated in the next post, the usual advice continues to apply: the decision of which paradigm to employ for any given project should be dictated by the best fit and not ideological inflexibility. The bearing this has on programming is innovation: it is the early adopters who have the best chance of leading us into the future.

Up next: Retrospective on Programming Paradigms
Previously: Themes at OSCON 2014

Footnotes

[1] If anyone has additional information as to which FP languages are used by these top 25 companies, please let me know, and I will include that information. Bonus points for knowing of business-critical applications.

[2] Google Switzerland are using Haskell.

[3] Type Physicalism is a form of reductive materialism, also known as the Mind-Brain Identity Theory, that does not allow for mental states to be realized in organisms or computational systems that do not have a brain. See "Criticisms of Type Physicality" at http://en.wikipedia.org/wiki/Identity_theory_of_mind#Multiple_realizability.

Sunday, July 27, 2014

The Future of Programming - Themes at OSCON 2014

Series Links

An Overview
Themes at OSCON 2014
Adopting the Functional Paradigm?
Retrospective on Paradigms
The Rise of Polyglotism
Preparing for the Future

A Qualitative OSCON Debrief

As you might have noticed from the OSCON Twitter-storm this year, the conference was a blast. Even if you weren't physically present, given the 17 tracks, you can imagine that the presentations -- and subsequent conversations -- were deeply varied.

This was the second OSCON I'd attended; the first was in 2008 as a guest of Michael Bernstein, a friend who was speaking there. OSCON 2008 was a zoo -- I'm not sure of the actual body count, but I've heard that attendees + vendors + miscellaneous topped 12,000 people over the course of the week (I would love to hear if someone has hard data on that -- googling didn't reveal much). OSCON 2008 was dominated by Big Data, Hadoop, endless buzzword bingo, and business posturing by all sorts. The most interesting bits of that conference were the outlines that formed around the conversations people weren't having. In fact, over the following 6 months, that's what I spent my spare time pondering: what people didn't say at OSCON.

This year's conference seemed like a completely different animal. It felt like it had easily one-third to one-half the number of attendees of 2008. Where that one had all the anonymizing feel of rush-hour in a major metropolitan hub, OSCON 2014 had a distinctly small-town vibe to it -- I was completely charmed. Conversations (overheard as well as participated in) were not littered with examples from the latest bizspeak, but rather focused on essence. The interactions were not continually distracted, but rather steadily focused, allowing people to form, express, and dispute complete thoughts with their peers.

Conversations

So what were people talking about this year? Here are some of the topics I heard covered during lunches, in hallways, and at podiums; at pubs, in restaurants and at parks [1]:

What communities are thriving?
Which [projects, organisations, companies, etc.] are treating their people right?
What successful processes are being followed at [project, organisation, etc.]?
Who is hiring and why should someone want to work there?
Where can I go to learn X? Who is teaching X? Who shares the most about X?
Which [projects, organisations] support X?
Why don't more [people, projects, organisations] care about [possible future X]?
Why don't more [people, projects, organisations] spend more time investigating the history of X for "lessons learned"?
There was so much more X in computing during the 60s and 70s -- what happened? [2]
Why are we reinventing X?
When is X going to be invented, and who's going to do it?
Everything is changing! I can't keep up anymore.
I want to keep up, but how?
Why can't we stop making so many X?
Nobody cares about Y anymore; we're all doing X now.
Full stack developers!
Haskell!
Fault-tolerant systems!

After lots of reflection, here's how I classified most of the conversations I heard:

Developing communities,
Developing careers and/or personal/professional qualities, and
Developing software,

along lines such as:

Effective maintenance, maturity, and health,
Focusing on the "art", eventual mastery, and investments of time,
Tempering bare pragmatism with something resembling science or academic excellence,
Learning the new to bolster the old,
Inspiring innovation from a place of contemplation and analysis,
Mining the past for great ideas, and
Figuring out how to better share and spread the adoption of good ideas.

Themes

Generalized to such a degree, this could have been pretty much any congregation of interested, engaged minds since the dawn of civilization. So what does it look like if we don't normalize quite so much? Weighing these with what may well be my own bias (and the bias of like-minded peers), I submit to your review these themes:

A very strong interest in programming (thinking and creating) vs. integration (assessing and consuming).
An express desire to become better at abstraction (higher-order functions, composition, and types) to better deal with growing systems complexities.
An interest in building even more complicated systems.
A fear of reimplementing past mistakes or of letting dust gather on past intellectual achievements.

As you might have guessed, these number very highly among the reasons why the conference was such an unexpected pleasure for me. But it should also not come as a surprise that these themes are present:

We have had several years of companies such as Google and Amazon (AWS) building and deploying some of the most sophisticated examples of logic-made-manifest in human history. This has created perceived value in our industry and many wish to emulate it. Similarly, we have single purpose distributed systems being purchased for nearly 20 billion USD -- a different kind of complexity, with a different kind of perceived reward.
In the 70s and 80s, OOP adoption brought with it the ability to create large software systems in ways that people had not dared dream or were impractical to realize. Today's growing adoption of the Functional paradigm is giving early signs of allowing us to better integrate complex systems with more predictability and fewer errors.
Case studies of improvements in productivity, or in the capacity to handle highly complex or previously intractable problems with better abstractions, have ignited the passions of many. Not wanting to limit their scope of knowledge or sources of inspiration, people are not simply limiting themselves to the exploration of such things as Category Theory -- they are opening the vaults of computer science with such projects as Papers We Love.

There's a brave new world in the making. It's a world for programmers and thinkers, for philosophers and makers. There's a lot to learn, but it's really not so different from older worlds: the same passions drive us, the same idealism burns brightly. And it's nice to see that these themes arise not only in small, highly specialized venues such as university doctoral programs and StrangeLoop (or LambdaJam), but also in larger intersections of the industry like OSCON (or more general-audience ones like Meetups).

Up next: Adopting the Functional Paradigm?
Previously: An Overview

Footnotes

[1] It goes without saying that any one attendee couldn't possibly be exposed to enough conversations to form a perfectly accurate sense of the total distribution of conversation topics. No claim to the contrary is being made here :-)

[2] I strongly adhere to the multifaceted hypothesis proposed by Bret Victor here in the section titled "Why did all these ideas happen during this particular time period?"

The Future of Programming - An Overview

Art by Philip Straub

There's a new series of blog posts coming, inspired by on-going conversations with peers, continuous inspection of the development landscape, habitual navel-gazing, and participation at the catalytic OSCON 2014. As you might have inferred, these will be on the topic of "The Future of Programming."

Not to be confused with Bret Victor's excellent talk last year at DBX, these posts will be less about individual technologies or developer user experience, and more about historic trends and viewing the present (and near future) through such a lens.

In this mini-series, the aim is to present posts on the following topics:

An Overview
Themes at OSCON 2014
Adopting the Functional Paradigm?
Retrospective on Paradigms
The Rise of Polyglotism
Preparing for the Future

I did a similar set of posts, conceived in late 2008 and published in 2009 on the future of cloud computing entitled After the Cloud. It was a very successful series and the cloud industry seems to be heading towards some of the predictions made in it -- ZeroVM and Docker are an incremental step towards the future of distributed processes/functions outlined in To Atomic Computation and Beyond.

In that post, though, are two quotes from industry greats. These provide an excellent context for this series as well, hinting at an overriding theme:

Alan Kay, 1998: A crucial key to growing large systems is effective communications between components.
Joe Armstrong, 2004: To effectively model and solve problems in a distributed manner, we need concurrency... this is made easier when we isolate processes and do not share data.

In the decade since these statements were made, we have seen individuals, projects, and companies take that vision to heart -- and succeed as a result. But as an industry, we continue to struggle with the definition of our art; we are still tormented by change -- both from within and externally -- and do not seem to adapt to it well.

These posts will peer into such places ... in the hope that such inspection might guide us better through the tangled forest of our present into the unimagined forest of our future.

Up next: Themes at OSCON 2014 ...