Functional Programming in Scala: Review

Sun 01 October 2017

A few months ago I completed the Coursera specialisation "Functional Programming in Scala". I found the experience extremely rewarding and learned a great deal, so I thought I'd write up a quick review of the content, which did vary in quality and intended audience.

The specialisation contains 4 courses and a capstone project. It combines two previous Coursera courses that ran in 2015 but are no longer available -- "Functional Programming Principles in Scala" and "Principles of Reactive Programming" -- with two entirely new courses and the capstone project. The 2015 courses were very highly recommended, so I felt confident this would be a high-quality specialisation.

In general, these courses are on the more academic side of the MOOC scale, feeling similar to courses you might take as an undergraduate CS student. They were quite different to the sort of training you find on Code School, Pluralsight or the like. This was what I was looking for. I had already read a lot about purely functional languages like Haskell and some of the theory behind FP. I wanted to discover how these techniques might be applied to real-world software engineering. In this regard, the courses more than met my aims, although there are definitely some areas where they could improve.

Functional Programming Principles in Scala

The first course introduces Scala and the principles of functional programming. It is presented by the creator of Scala, Martin Odersky, and gave a solid foundation in both. His teaching style is excellent and it was a pleasure to learn about Scala from the ultimate expert. The material was taken from the 2015 courses but has not aged at all.

One of the strengths was how we were introduced to practical topics like unit testing and property testing (QuickCheck) alongside more theoretical principles like avoiding mutation of state.
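As a rough illustration of how the two fit together, here is a minimal sketch of a property-style test for a function that never mutates its input. It is hand-rolled on top of `scala.util.Random` rather than written with a property-testing library, and the names `insertSorted`, `isSorted` and `holdsForRandomInputs` are made up for this example, not taken from the course.

```scala
import scala.util.Random

object PropertySketch {
  // Pure function: builds a new sorted list and leaves the input untouched.
  def insertSorted(x: Int, sorted: List[Int]): List[Int] = sorted match {
    case Nil => List(x)
    case head :: tail =>
      if (x <= head) x :: sorted else head :: insertSorted(x, tail)
  }

  // True when every element is <= its successor.
  def isSorted(xs: List[Int]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a <= b }

  // Hypothetical helper: checks the property on `runs` random inputs.
  // A fixed seed keeps the run reproducible.
  def holdsForRandomInputs(runs: Int, seed: Long): Boolean = {
    val rng = new Random(seed)
    (1 to runs).forall { _ =>
      val sorted = List.fill(rng.nextInt(20))(rng.nextInt(100)).sorted
      val x = rng.nextInt(100)
      val result = insertSorted(x, sorted)
      // Property: the result is sorted, one element longer, and contains x.
      isSorted(result) && result.length == sorted.length + 1 && result.contains(x)
    }
  }
}
```

Because `insertSorted` has no side effects, the property can be checked on any number of generated inputs without any setup or teardown, which is a large part of why the two topics sit so well together.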

This course would be a great start for those wanting to learn Scala or FP and stands on its own. I don't hesitate to give it 5 stars.

Functional Program Design in Scala

The second course aims to follow on from the previous course and expand on its material with more advanced topics. Unfortunately the results were mixed. It didn't have the sense of continuity of the first course and some of the subject matter was beginning to become outdated.

The course takes material from the 2015 FP principles course and selected topics from the 2015 Reactive Programming course. The FP principles lectures were given by Martin Odersky and were of the same high quality as before. The Reactive lectures were given by Erik Meijer, the author of the Rx framework.

Erik Meijer gives a characteristically energetic performance as he presents the theory of separating effects from pure functions, particularly for modelling failure and asynchronous tasks.
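To give a flavour of the idea, here is a minimal sketch using only the Scala standard library, in which failure is modelled with `Try` and asynchrony with `Future`. The object and function names (`EffectsSketch`, `average`, `parseInts`, `averageAsync`) are invented for illustration and are not from the lectures.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object EffectsSketch {
  // Pure core: no parsing, no exceptions, no threads.
  def average(xs: List[Int]): Double = xs.sum.toDouble / xs.size

  // Failure as a value: a bad input gives a Failure instead of a thrown exception.
  def parseInts(raw: List[String]): Try[List[Int]] =
    Try(raw.map(_.trim.toInt))

  // Asynchrony as a value: the work runs on the global execution context and
  // any exception thrown while parsing ends up as a failed Future.
  def averageAsync(raw: List[String]): Future[Double] =
    Future(raw.map(_.trim.toInt)).map(average)
}
```

The pure function `average` knows nothing about either effect. The effect types wrap around it, so the caller decides where failures are handled and when to wait for a result.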

I am a fan of Erik's online lectures and his inspiring advocacy of functional programming. However, stripped of the context from the rest of the Reactive course, these lectures would have left many people confused. His style is so completely different to Martin's that it was obvious the two halves had been planned in isolation. Also, library support for managing effects has moved a long way since these lectures were made. The course didn't cover Rx or Akka from the 2015 course, presumably to keep it short, despite them being recommended later in the capstone project.

That said, I definitely learned a lot and the assignments remained fun and challenging. I would give it 2 stars.

Parallel Programming

This course focussed on task and data parallelism on a single machine, i.e. without distributing the work. It covered parallel data structures and performance optimisation.

I found this course the most challenging of the set. Parallel algorithms on functional data structures were covered in great detail, and passing the assignments took a lot of second-guessing about what was required.

A slight criticism I had was that, after 2 courses of mainly pure functional programming, this course expected you to use mutation and loops at the lowest levels of your code in order to achieve efficiency. This was probably a good lesson to learn, but it was jarring to have to slip back into the imperative style, and the instructor could have done more to guide us in that direction and justify why it was necessary.
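The pattern looks roughly like the sketch below: a `while` loop with local `var`s at the leaf, hidden behind a function that stays pure from the caller's point of view, with the parallelism layered on top. This uses standard library `Future`s rather than the constructs supplied with the course, and `sumSegment`, `parallelSum` and the `chunkSize` parameter are hypothetical names chosen for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LocalMutationSketch {
  // Sequential leaf: imperative inside, but the mutation never escapes.
  def sumSegment(xs: Array[Int], from: Int, until: Int): Long = {
    var i = from
    var acc = 0L
    while (i < until) {
      acc += xs(i)
      i += 1
    }
    acc
  }

  // Splits the array into chunks, sums each chunk in its own Future,
  // then combines the partial sums.
  def parallelSum(xs: Array[Int], chunkSize: Int): Long = {
    require(chunkSize > 0, "chunkSize must be positive")
    val starts = 0 until xs.length by chunkSize
    val partials = starts.map { from =>
      Future(sumSegment(xs, from, math.min(from + chunkSize, xs.length)))
    }
    // Blocking here keeps the sketch short; the 10 second timeout is arbitrary.
    Await.result(Future.sequence(partials), 10.seconds).sum
  }
}
```

The tight loop avoids allocating intermediate collections for every chunk, which is the efficiency argument for it. Because each task only reads the shared array and writes to its own local variables, the tasks cannot interfere with one another.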

Well worth 4 stars.

Big Data Analysis with Scala and Spark

I was looking forward to this course as the subject is close to my day job as a Data Engineer. Unfortunately I had to wait several months before the course became available and I could easily have missed that it had finally started.

However, when it came, it was worth the wait. The course was very well presented by Dr Heather Miller and the assignments were well designed, particularly since big data is difficult to simulate on a laptop. As someone very familiar with relational databases, I found the subject matter relatively easy going, but it gave me a good grounding in how Spark works, including more advanced topics like DataFrames and SQL.

What was missing was any chance to gain experience deploying a Spark cluster or getting a job to run on an existing cluster. I understand it is difficult to have assignments on this topic since not everyone will have access to multiple machines. However, there could have been a module without an assignment which demonstrated setting up a stand-alone cluster on AWS.

I got over this omission by using Spark in the Capstone project, where I ran it on a stand-alone cluster that I created in Amazon and on my work's shared filesystem cluster. This took some time to get right but was well worth it.

Another 4 star course, but for very different reasons from the Parallel Programming course. It was less challenging but well taught and covered a good range of topics.

Functional Programming in Scala Capstone

This was a satisfying finale to the specialisation which did a great job of combining the skills we had learned throughout the courses. I hope to post a complete report in another blog post but, if you are interested, take a look at my git repository and the end result.

Category: Functional Programming Tagged: scala

What is a dataframe

Sun 24 April 2016

Back when Big Data was a genuinely new concept and Data Scientists weren't rock stars, the SciPy ecosystem was beginning to mature. It felt as though Python deserved to dominate the data analysis world but uptake was still slow outside the realm of the physical sciences and engineering. Something seemed …

Category: Data Science Tagged: pandas python

Blogging with IPython-notebook

Thu 14 March 2013

I have a new blogging engine to play with. This time I'm going with Pelican which generates static HTML, making it very easy to redeploy. So far I've been stunned at how quickly I can pull content into this blogging system. In particular it was very easy to convert IPython …

Category: Meta Tagged: ipython

Example of using esgf-pyclient to plot via OPeNDAP

Sun 24 February 2013

First import the required modules and select the CEDA ESGF-index node's search service.

from pyesgf.search import SearchConnection
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap

CEDA_SERVICE = 'http://esgf-index1.ceda.ac.uk/esg-search/search'

Let's ask the question "How many datasets from the HadCM3 decadal2000 simulations do we have …

Category: Data Science Tagged: esgf opendap

Analysis of replicated CMIP5 datasets

Wed 06 February 2013

This notebook describes the analysis of a snapshot of all replicas in the CMIP5 archive on February 6th 2013. A list of all dataset versions at each of the 3 replicating data centres is read, analysed and compared with the total number of datasets listed in the CMIP5 archive.

This …

Category: Data Science Tagged: esgf cmip5

Bringing Git to data archival

Tue 15 January 2013

I am increasingly excited about distributed version control and how it enables easy collaboration between software developers without technical and social barriers such as synchronisation of work and maintenance of control.

The obvious question is how the DVC systems can be applied to scientific collaboration and my particular specialism …

Category: Data Science Tagged: esgf cmip5
