Wednesday, December 9, 2015

How to structure code for individual research projects

I have been chatting a bit about coding recently with my supervisor (Richard Everitt).  We have a new student joining the group soon, so I think Richard is thinking a bit about what skills they may need to learn, as well as what the rest of us in the group might benefit from.  I have found it surprising how much my approach to coding continues to evolve, particularly the way that I organise my code.

I started coding in R during my master's where we had a lot of relatively short assignments that involved some computer programming.  For this kind of task it was mostly possible to write a single script with all the code needed for a given assignment in it.  That is quite easy to organise.

When I got into longer research projects that involved a significant amount of coding, I found that this approach was no longer very effective.  It becomes difficult to keep track of which files are recent and being actively used and which are outdated and redundant.  It also becomes difficult to find older versions of your code (e.g. when your most recent code has stopped working for some unknown reason and you want to find what you had last week when it was working).

My first solution to this problem was simply to number my scripts, starting at 1 and working upwards (e.g. r1.r, r2.r r3.r etc.)  If you put all the scripts in the same folder and give the folder a project name then it is still relatively easy to find your code, and you have a very easy way of switching between versions provided that you increment your file-names every time you make a significant change.

However this approach is still quite restrictive in the sense that it only really works well when you can put all the code for a given project in one script / file.

After a few iterative improvements, I am currently quite happy with the following approach to organising my code:

  • Create a new folder for a new project
  • Create a subfolder for code development that contains numbered scripts.
  • Develop code in numbered scripts and then when you have an end-product save it in the parent folder and give it a name that describes what it does.
  • Put any functions that are used in multiple scripts in a functions file so that you can re-use the same version of that function in multiple scripts.  In general, the less redundant / duplicated code in your active scripts the better.
  • Create an output sub-folder to save the results of running long scripts.  After you have run a long script and generated an output file, rename it by prefixing it with a date so that it doesn't get accidentally over-written in future.
  • Use a version control system like git with BitBucket.  Version control allows you to easily switch between different versions of your code / project, and to keep a well-structured record of the changes you have made.
It has taken me a long time to become convinced that version control systems like git are worth the effort for individual research projects, but I am a convert now.  My main objection was that many of the things I wanted to do were easier to do using Dropbox (e.g. sharing code with other people, and being able to access my work on multiple computers).  You can even find old versions of your files through Dropbox.  However, in the end I found that going from Dropbox to git / Bitbucket was like going from Word to LaTeX.  Once I got the hang of it, suddenly it seemed a lot quicker and a lot less hassle in the long-term.  RStudio has nice graphical interface to git, which I now use every day for committing and pushing my code.

Version control is also useful if you ever want to work on software development in a group, or to make your code publicly available.

Wednesday, November 11, 2015

A (selective) history of Neuroscience in 10 minutes!

​I recently gave a talk in Oxford to the Computational Statistics & Machine Learning group (Thanks again to Jeremy Heng for the invitation to speak). I was given a 1 hour slot, which is longer than the normal 20-30 minutes that PhD students are given to present their work. Rather than drawing out my results over a longer period (which I would guess might have been quite boring for the audience) I decided to make my introduction longer to talk more generally about the field(s) I work in - Neuroscience and inference in differential equation models. I really enjoyed preparing this material and think it is something that PhD students should be given the opportunity to do more often. Here is what I chose to talk about.

Neuroscience and Artificial Intelligence: 1940-1975, 1990's & recent developments

This could easily take up the subject of a book (perhaps in 3 volumes!) What I said was highly selective (not least because of the gaps in the timeline) and heavily biased by the things I've seen and found interesting.  Here is a summary of what I said.

Research in the years between 1940 and 1975 laid the foundations for modern Neuroscience and Artificial Intelligence, and many of the key ideas in the field have their origins in this period.  It was also a time when there was a lot of crossover between experimental work and computational / mathematical work:

For example, in the 1940's McCulloch & Pitts developed what we now know as artificial neural networks.  For McCulloch and Pitts, this was a model of how the brain works - they saw neurons essentially as logic gates that were either on or off, and the brain as a series of logic gates that simply calculate weighted sums of their input.

In the 1950's, Hodgkin & Huxley developed a more realistic single-neuron model that linked ionic currents to voltage inside the cell (what we know now as electrophysiology), for which they won the Nobel Prize in 1963.  In my opinion it remains as one of the finest achievements in computational science because of its combination of a nonlinear differential equation model with experimental results, and because it has fundamentally changed the way people think about how neurons process information.  For me, it makes things more mysterious in a way - how does all this electrophysiology end up generating a conscious self that is able to make (reasonably) effective decisions!?  I think that is something that scientists don't have a very good answer to, yet.

Leaving aside those potentially unanswerable philosophical questions, there were also a whole host of practical questions that arose as a result of Hodgkin & Huxley's work.  One question that is quite important for the work that I do is how to relate electrophysiology with the macroscopic obsverations that are obtained from EEG and MRI recordings.  This is challenging because the brain contains around 10^10 neurons, and the Hodgkin-Huxley equations only describe the activity of a single neuron.  Coupling together 10^10 neurons in a single model is (currently) computationally intractable, so in the early 1970's Wilson & Cowan developed what is called a mean-field model which lumps together populations of neurons with similar properties (e.g. excitatory / inhibitory) and describes how the mean activity of those populations evolve over time.  These models are useful because they still give some insight into electrophysiology, but they can also be related to observations.

In the 1990's research became more specialised into sub-fields of neuroscience and artificial intelligence:

Here I am using Statistical Neuroscience mainly to refer to the community of people who do statistical analysis of brain imaging data.  And with Artifical Intelligence I am mainly thinking of the development of artificial neural networks to solve things like classification problems.

Recently, more overlap has developed between the different fields (and there is also a lot more total research activity):

My PhD work is mainly in the intersection of Computational Neuroscience and Statistical Neuroscience.  There is also interesting work going on in the intersection of Computational Neuroscience and Machine Learning (e.g. Demis Hassabis and Google DeepMind).  At UCL, the Gatsby Computational Neuroscience Unit recently co-located with the new Sainsbury Wellcome Centre for Neural Circuits, which is led by Nobel Prize winner John O'Keefe.  It will be exciting to see what happens in the intersection of the 3 circles in the future!

One slightly depressing development from the viewpoint of statisticians is that state-of-the-art neuroscience models tend to be quite computationally expensive and hence are very challenging to do inference for.  I think statisticians should be more vocal in arguing that it is better to have a simpler model that you can estimate parameters for, than a more complex model that you can't estimate the parameters of.

Monday, October 5, 2015

Box's advice for statistical scientists

I recently discovered a new word - mathematistry, coined in 1976 by the statistician / data scientist George Box.  If I had to guess what it meant, I would think it might be something quite cool like a cross between mathematics and wizardry.  In fact, the way George Box defined it, the word is closer to a cross between mathematics and sophistry.

Box's full definition for mathematistry is as follows,

'.. the development of theory for theory's sake, which, since it seldom touches down with practice, has a tendency to redefine the problem rather than solve it.  Typically, there has once been a statistical problem with scientific relevance but this has long been lost sight of.'

According to Box, mathematistry also has a flip side which he calls 'cookbookery'.  This is

'the tendency to force all problems into the molds of one or two routine techniques, insufficient thoughts being given to the real objectives of the investigation or to the relavance of the assumptions implied by the imposed methods.'

There is so much useful advice contained in those two concepts!  Personally I find it quite easy to forget to 'touch down with practice'.  How many times have I spent a long time coming up with a solution to something, only to find that the attempt to put it into practice reveals an obvious flaw that could have been identified quite early on?  And I also find it much easier to focus on one way of solving a problem, rather than considering several different options, and investing time in evaluating which is the best approach.

I have recently been having a look at a kaggle challenge called Springleaf.  I am not particularly interested in trying to get to the top of the leaderboard, since the difference between the top 1000 entries seems pretty marginal to me, but I am interested in finding out what makes a big difference to predictive accuracy.

One of the things I have learnt is that being able to make use of all the variables in the data-set makes a big difference.  Regression trees are very easy to apply to multivariate data-sets (e.g. 100s or 1000s of variables), and I have found these to be quite effective for prediction.  There are also relatively straight-forward ways of training them to avoid over-fitting.

The other thing I have discovered is boosting.  This still seems a bit magical to me.  The basic idea is that by using an ensemble of different models and averaging predictions over them, you get a better prediction.  To me, it seems like there must be a single model that would give the same predictions as the ensemble.  In any case, I have found that boosting makes a surprisingly big difference to predictive accuracy, compared to approaches that try to fit a single model, at least for regression trees.

In The Elements of Statistical Learning by Hastie, Tibshirani & Friedman there is a performance comparison of different classification methods.  They include boosted trees, bagged neural networks and random forests, which (I think) are all different ways of creating an ensemble of models.  The one method that didn't create an ensemble was Bayesian neural nets, which uses MCMC to explore the parameter space.  The approach that came out on top was Bayesian neural networks but the ensembling methods were not too far behind.  This suggests to me that averaging over models and averaging over different parameters values within a sufficiently flexible model have a similar effect.  However it is something I would like to understand in more depth.

I would highly recommend The Elements of Statistical Learning as a great example of statistical science, that avoids both mathematistry and cookbookery.  The drawback of books is that they get out of date quite quickly, but this one is relatively recent and definitely still relevant.  The only thing that is worth being careful with is references to software packages, as these seem to evolving on a faster time-scale than the underlying concepts.

Monday, September 7, 2015

Late Summer Musings

Yesterday I had a go at explaining what I do to someone who doesn't know much about my research area.  I always find it interesting to discover what people think statisticians do.  This person had been given the job of analysing the results of a survey that had been done in their organisation.  This is something that requires some statistical training to do well.  However I found that when it came to explaining what I am doing for my PhD, it was quite difficult to make a connection between that type of work and what I do.  It made me wonder whether the statistics community is more like a sprawling family that have some common ancestry but have ended up doing quite different things?  Or are we a group of people from disparate backgrounds that have been drawn together by the need to solve a set of common problems?

I think both of these are potentially useful metaphors for understanding the statistics community.  On the one hand there has been branching between people who are practitioners of statistics and people who study mathematical statistics.  But at the same time I would argue that all statisticians are interested in being able to draw reliable conclusions from experimental and/or observational data.

Earlier in the summer I went on another APTS (Academy for PhD Training in Statistics) week in Warwick.  The courses were Applied Stochastic Processes (given by Stephen Connor) and Computer Intensive Statistics (given by Adam Johansen).  So what were these courses about, and how is the material in them helpful for drawing reliable conclusions from data?

Applied Stochastic Processes (ASP)

Stochastic Processes is a branch of probability theory.  Some people say that people are either statisticians or probabilists.  Although there are people who study probability theory but are not statisticians, I think that all statisticians must have a little bit of a probabilist inside them!  University courses that I have seen in maths and statistics teach probability distributions before they teach statistical inference.  Understanding what a p-value is (one of the most basic tools used in statistical inference) requires you to know something about probability distributions.

More generally there are two main reasons (I can think of) why statisticians need probability theory.  One is that they study phenomena that are intrinsically random, and are therefore best described by probability distributions.  The other is that they use numerical methods that are intrinsically random, and so understanding the behaviour (e.g. convergence) of these numerical methods requires probability theory.

ASP was primarily concerned with the second area, and in particular with the convergence of MCMC algorithms, although quite a few of the ideas along the way were relevant to the first area.

Reliable numerical methods are pretty essential for drawing conclusions from data, and with MCMC it can be pretty challenging to ensure that the method is reliable.  In an ideal world, probability theory would be able to tell you whether your numerical method was reliable.  However, in the same way that mathematical theory cannot currently account for all physical phenomena, probability theory cannot currently account for all probabilistic numerical methods.  Lots of the numerical methods that I use in my work are not particularly well understood from a theoretical point of view.  However the mathematical theory that exists for special cases can provide some useful guidance for more general cases.

I would like to be more specific about how the theory is useful beyond the special cases, but it is something I am still in the process of trying to understand better.

Computer Intensive Statistics (CIS)

A lot of recent advances in statistics have been catalysed by increased computer power.  Methods that would have been impractical 20-30 years are now routine.  This helps to continually extend the scope of research questions that can be answered, both in applied statistics and in computational / mathematical statistics.

The easiest way to ensure that a numerical method is reliable (and hence enables you to draw reliable conclusions from the data) is to use one that is not computer-intensive.  For a large part of the statistical community this approach works well, so many statisticians find that they never need to use computer-intensive methods.

Returning to our ideal world again for a moment, we would like statistical theory to be able to give us formulas that return the quantities we are interested in as a function of the data.  However in practice, such results only exist for simple models.  And if the phenomena you are interested in is not well described by a simple model, it is unlikely you will be able to draw reliable conclusions without a computer-intensive method.

The main areas covered in the course were the bootstrap and MCMC.  One of the things I learnt about for the first time was the Swensen-Wang algorithm, which is used in statistical physics.  It does seem a bit like a magic trick to me, the way you can make dependent variables conditionally independent through the addition of auxiliary variables.  Worth checking out if like a bit of mathematical aerobics!

Sunday, June 14, 2015

My Perspective on New Perspectives in MCMC

I recently got back from a summer school in Valladolid on New Perspectives in MCMC.

Here are a few thoughts on the lectures that were given.

MCMC-based integrators for SDEs (Nawaf Bou-Rabee)

Nawaf's interests are mainly in numerical solvers for Stochastic Differential Equations (SDEs) that describe diffusion and drift processes.  The first part of the course was about how standard MCMC algorithms can be seen as discretisations of SDEs.  The aim of MCMC algorithms is to sample from a probability distribution.  A valid MCMC algorithm can be thought of as an SDE discretisation that has the probability distribution of interest as its stationary distribution.

There are also many ways of discretising an SDE that do not correspond to existing MCMC algorithms, and that is what the second part of the course was about.  It made me wonder whether there is scope for designing new MCMC methods from these (or other) SDE numerical solvers?

Exact approximations of MCMC algorithms (Christophe Andrieu)

These lectures were about the pseudo-marginal approach, which I have discussed quite a bit in previous blog posts.  The key idea is to find ways of approximating the likelihood or the acceptance probability in such a way that resulting algorithm is still a valid MCMC algorithm, but is computationally much cheaper than the simple ideal algorithm (which might, for example, contain an intractable marginal likelihood term in the acceptance ratio).

The thing that I understood better than before was an idea called Rejuvenation.  (From what I understand this is the idea used in the particle Gibbs variant of particle MCMC).  Without Rejuvenation, the denominator in the acceptance ratio can be very large if the approximation of the likelihood is by chance very poor.  This means that the noisy acceptance probability can be much smaller than the acceptance probability of the ideal algorithm, and therefore the noisy algorithm can have a tendency to get stuck.  Rejuvenation is a way of revising the noisy estimates in the algorithm from one iteration to the next in a way that preserves the validity of the algorithm and makes it less likely to get stuck.

MCMC in High Dimensions (Andrew Stuart)

These lectures had a very clear message that was repeated through-out, which was that thinking in infinite dimensions results in good methodology for high dimensional problems, particularly high dimensional inverse problems and inference for diffusions.

From these lectures I got a much better idea of what it means for measures to be singular with respect to each other.  Understanding these concepts is helpful for designing valid proposals for MCMC algorithms in infinite-dimensional spaces, and leads naturally to good proposals in high-dimensional problems.  Coincidentally the concepts of singularity and equivalence are also very important in Reversible Jump methodology.  For me this illustrates how potent some mathematical ideas are for solving a wide range of problems and for drawing connections between seemingly disparate ideas.

Course website -

Monday, June 1, 2015

Making intractable problems tractable

I recently attended an i-like / SuSTaIn workshop in Bristol, organised by Christophe Andrieu.  I-like is short for Intractable Likelihoods, although the scope of the conference was a bit more general than that.  Most of the work that was presented was about making intractable problems tractable.  Often the most conceptually simple way of solving a problem is computationally intractable, i.e., it would require months / years / aeons, for current computers to solve.  Mathematics can be used to re-formulate or approximate problems in such a way that they become tractable, for example by identifying parts of the computation that do not make much difference to the final answer.  Or by identifying unnecessary computations that duplicate (or perhaps approximately duplicate) other computations.

Here is a selection of application areas that were talked about.

Foot & Mouth epidemics - intractable because of the need to model the interactions between thousands of different farms and model the stochastic dynamics of disease transmission.

High risk behaviour in HIV positive youth - challenging because of the discrepancies between reported and actual behaviour.  Intractable because there are so many different permutations of actual behaviour that underly / could explain reported behaviour.

Genetic recombination - given DNA sequences from a (current-day) population, scientists are interested in inferring how the DNA sequences have evolved / been modified over time.  Recombination is one mechanism for DNA sequence modification.  Inference in this setting is challenging because we cannot observe the past (i.e. ancestral DNA sequences).  There is a connection with the HIV example above: the problem is intractable because there are so many different permutations of past DNA sequences that could have generated the DNA sequences of the current population.

Testing the predictions of general relativity using massive detectors - this reminded me of the little that I know about the Large Hadron Collider experiments  However, in this case the detectors pick up natural signals coming from space, and the physics is completely different.  The general relativity detector problem is intractable because of the huge volume of data generated by the detector.  And also because generating predictions using the general relativity equations is computationally expensive.  (As an aside, it is quite awe-insipring to think about the forces at work on such a large scale in our universe.)

Predicting driving times - the simplest way of predicting driving times is just to use speed limits and assume that there is little of no traffic.  Anyone who has ever tried to get anywhere near a population center in the UK on a Friday evening will know that this does not produce very accurate driving time predictions.  Companies like Microsoft, Google, and Apple collect GPS data that is essentially a set of positions associated with time-stamps.  From this it is straight-forward to work out how long people's past journeys have taken.  However producing predictions of an arbitrary future journey is challenging because many future journeys will not follow an identical route to past journeys.

There were also a couple of presentations that focused more on methodology that could be applied in a wide range of different settings.

Exact-approximate MCMC - this is something I have talked about in previous blog posts and also goes under the name of the pseudo-marginal approach.  The basic question that motivates this work is, 'When is it ok to approximate the Metropolis-Hastings acceptance probability?'  There are lots of settings where it is possible to obtain a tractable approximation to the acceptance probability (State State Models, Markov Random Fields, Spatial Processes, Trans-dimensional problems).  Being able to use the tractable approximation makes it possible to do a lot more statistical inference for problems of this type.

Deep neural networks - this is something that is currently very popular in machine-learning / computer science / artificial intelligence.  There the interest is in training computers to produce predictions from data.  This is sometimes referred to as black-box prediction, because the way in which the computer produces the prediction does not necessarily give you any insight into the underlying process thats connects the x variables (determinants) with the y variable(s) (response).  From my perspective it does seem a lot like regression with more flexibility and automation in the transformations that are applied to the data.  Increasing flexibility means (at least to me) adding in more parameters.  This makes it more challenging to ensure that the algorithms are computationally tractable.

For researcher names and presentation / poster abstracts, see

Tuesday, May 5, 2015

Twelve twitter feeds worth following?

Having spent a long time doing my best to ignore Twitter, I recently decided to take the plunge and start following some feeds.  Below is a list of the ones I have found interesting, so far (in no particular order).

Tim Harford - this is the one that got me on Twitter in the first place.  He mentioned his Twitter feed in one of his books.  It's good for getting an evidence-based understanding of the news.

David Spiegelhalter - great statistician.  Does a lot of good work communicating statistics / risk to everyone.  Pretty funny as well.  Look up micromorts to get a flavour of what he does,

Tom Whipple - science journalist at The Times.  Also pretty funny.

Evan Davis - has down quite a wide range of news-related things for the BBC (Today programme, Newsnight).  Some similarities with Tim Harford in that he is an economist who understands numbers and how to interpret them.  I quite enjoyed his General Election Leader Interviews recently and his 'Mind the Gap' documentary last year.

Tim Montgomerie - writes for The Times.  If anyone can persuade you that it is possible to care for the poor and be conservative (politically), it's probably him.

Tim Keller - Christian author and pastor in New York.  Has written a lot of persuasive books contrasting the Christian worldview with the secular worldview.  I have got more out of his books than his tweets, but he does have quite a pithy way of summing things up which works well on Twitter.

Stat Fact - a good feed for little statistics tips and links.  Quite wide-ranging, not just about the theory of statistics.

Oxford Mindfulness - this feed is for a research group that has developed mindfulness practices and measured their effects in a scientific way.  The feed sometimes has links to radio or TV that is related to their work.

Pizza Artisan Oxford - another Oxford-based feed, but sadly for something that can only be enjoyed in Oxford...

Nature News&Comment - good for getting an idea of what is going on in the top tier of science.

edX - I have enjoyed doing some courses on edX and coursera.  It's great how accessible these courses are, and how many of them there are.  I have looked at Ancient Greek Heros, Statistical Analysis of fMRI data, and Learning how to Learn.

Gresham College - lots of public lectures from academics on an eclectic range of topics.

Tuesday, April 14, 2015

Easter Miscellany

Here are a few loosely connected things that I have been working on or thinking about over the past month or two...

Gaussian Processes

James Robert Lloyd, Zoubin Gharamani and others in his group have developed a really neat tool called the Automatic Statistician.  The basic idea is that you feed the Automatic Statistician some data and it estimates a model that fits the data well.  And it produces an automatically generated report describing what it has done.  The thing that I find most interesting about it is how flexible the model is.  The Automatic Statistician can identify linear trends, periodicity, and other patterns in the data.  It is the machinery of Gaussian Processes that makes this flexibility possible.  This 'Kernel Cookbook' page (from David Duvenand a former member of Zoubin Gharamani's group) gives some information about how to construct a simple Gaussian Process model.

I have had a go at fitting a Gaussian Process model to some bike-sharing data in order to forecast demand in bike-sharing schemes.  This data is available here.  I found that it was more difficult than I expected to find an appropriate Gaussian Process to model the daily pattern of usage.  The key problem that I faced was that most of the simplest kernels assume that the process you are trying to model is stationary.  However that is a lot of non-stationarity in the daily pattern of bike-sharing demand - there is a lot more variability in 7-9am (the rush hour peak) than there is in say 9-11pm.

All this make me intrigued to find out more about recent developments in Gaussian Processes, particularly for non-stationary processes.  From what I have seen so far, it looks a lot more challenging.  It will be interesting to see if the Automatic Statistician can model non-stationary processes.

Mean Field Models for brain activity

Brain activity can be modelled at a variety of temporal and spatial scales from the millisecond to minutes, and from single neurons to whole brain regions.

Mean Field Models describe the activity of populations of neurons, and can be used to model the evolution of a field of neural activity within a particular brain region.  They are therefore sometimes called mesoscopic models (somewhere in the middle).  This means that the models can be more biophysically realistic than models that describe interactions between brain regions.  But it is still possible to do inference with these models and human brain imaging data such as EEG.

I am looking at a Mean Field Model developed by David Liley and Ingo Bojak that models the effect of anaesthesia on brain activity.  This is proving quite challenging because there are so many unknowns: extracortical input, parameter values, initial conditions.

Broader Neuroscience reading

Some of the world's leading neuroscientists have written a collection of essays called The Future of the Brain.  A lot of the essays describe 'Big Science' multi-million dollar research projects.  There is a lot of focus on molecular biology experiments to probe the properties of single neurons, and on building large-scale microscopic models of brain activity that predict macroscopic effects.

As someone with a preference for simplicity in modelling, one thing I am interested in is to what extent mesoscopic (i.e. simpler) models can approximate microscopic (i.e. more complex) models.  As far as I know this is an open question, that could be answered by some of the large-scale neuroscience research projects that are currently underway.

Friday, March 6, 2015

The Pseudo-Marginal Miracle

Over the last month I have been making some interesting connections between different areas of statistics.  Explaining these connections (or even the things being connected) is perhaps an overly ambitious aim for a blog post, but I think it is worth a try!

The three problems below are all interconnected.

MCMC for doubly intractable problems

This typically occurs when you want to use MCMC to estimate parameters of a model where the normalization term is intractable and it is dependent on the parameters you are interested in estimating.

With the basic MCMC approach, the normalization term needs to be evaluated on every iteration of the algorithm.  Recent advances in MCMC (the Pseudo-Marginal approach) have lead to algorithms where a suitable approximation to the normalization terms can be used and the algorithm still gives the same results (asymptotically) as the basic algorithm.

Reversible Jump MCMC

This occurs when one of the things you don't know is the number of parameters in your model (e.g. number of clusters or components in a mixture model).

With the basic MCMC approach it is difficult to make good proposals for the parameter values when your proposal changes the number of parameters in the model.  However the the Pseudo-Marginal approach can also be applied to Reversible Jump MCMC.  This results in parameter proposals that are more likely to be accepted in the MCMC, and therefore makes the method much more computationally efficient because less time is spent generating proposals that subsequently get rejected.

MCMC parameter estimation for nonlinear stochastic processes

This is useful for some models in econometrics, epidemiology, and chemical kinetics.  Suppose you have some data (e.g. stock prices, infection numbers, chemical concentrations) and a nonlinear stochastic model for how these quantities evolve over time.  You may be interested in inferring posterior distributions for the parameter values in your model.

The Pseudo-Marginal approach was used to develop a method called particle MCMC that does this parameter estimation more efficiently than basic MCMC.

In all three cases (doubly intractable, reversible jump, nonlinear stochastic processes), applying basic MCMC results in the need to evaluate the density of an intractable marginal probability distribution.  Somewhat miraculously, the Pseudo-Marginal approach obviates the need to evaluate this probability density exactly, while still preserving key theoretical properties of the algorithm related to convergence and accurate estimation.

Thursday, February 5, 2015

New Perspectives in Markov Chain Monte Carlo

In my last post I mentioned that I had been finding out about SQMC (Sequential Quasi Monte Carlo), a new method for sequential problems (such as time-series modelling).  I would like to revise some of what I said.  I implied that SQMC would be useful for Hidden Markov Models.  However this is not entirely accurate as there are other alternatives to SQMC that can be used for Hidden Markov Models that are much more computationally efficient (such as the Viterbi algorithm for finding the most likely sequence of hidden states).  The benefit of SQMC (and its main competitor SMC) is that it can be applied to a very wide class of models (such as stochastic volatility models).

Since my last post I have been further expanding my repertoire / toolbox of methods for statistical inference.  The aim of this exercise is to expand the class of models that I know how to do statistical inference for, with a view to doing statistical inference for realistic models of neural (brain) activity.

I have been concentrating my efforts in the following two areas.

Particle MCMC

This is an elegant approach to combining a particle filter / SMC with Markov Chain Monte Carlo for parameter estimation.  The main applications in the original paper (Andrieu et al. 2010) are to state space models (such as the stochastic volatility model).  However the method has since been applied to Markov Random Fields (e.g. the Ising Model and random graphs models) and to stochastic models for chemical kinetics.

The exciting thing about Particle MCMC is that parameter estimation for nonlinear stochastic models is much more computationally efficient than it is with standard MCMC methods.

More generally it is also exciting to be working in a field where major breakthroughs are still being made.

MCMC Methods for Functions

Particle MCMC is a solution for cases where the processes in the model are too complex for standard MCMC to be applied.

The work on MCMC methods for functions is a solution for cases where the dimension of the problem is very high.  The paper I have been looking at (Cotter et al. 2013) presents applications to nonparametric density estimation, inverse problems in fluid mechanics, and some diffusion processes.  In all these cases we want to infer something about a function.  Some functions are low-dimensional (such as a straight line which only has 2 parameters).  However the functions considered in the paper require a large number of parameters in order for them to be accurately approximated, hence the statistical inference problem is high-dimensional.

The methods in the paper appear to be much more computationally efficient than standard MCMC methods for high-dimensional problems, and it looks like they are quite easy to implement.

However the methodology does not apply to as wide a range of models as Particle MCMC.

As far as I know there are not currently any methods that work well for high-dimensional complex models.  This means it is difficult to use bayesian inference for weather prediction where high-dimensional complex models are needed to make accurate predictions.  This is an active area of research at Reading (which has a very strong meteorology department), and I will be interested to see what progress is made in this area in the coming years.

I haven't got very far into the neuroscience modelling part of my project.  However I have a feeling I may be faced with a similar problem to the meteorologists, i.e. needing to use a high-dimensional complex model.

I will be continuing to develop my interest and knowledge in these areas by going to a summer school in Valladolid this summer (New Perspectives in Markov Chain Monte Carlo, June 8-12).

Monday, January 5, 2015

Autumn Term

Here are a few things I have found interesting from my first term as a PhD student at Reading University.

I started with the intention of quickly writing a short blog post, but that proved too difficult...

Parameter estimation for the Ising model

The Ising model comes from statistical physics, and describes how neighbouring atoms interact with each other to produce a lattice of spin states.  Each atom is a variable in the model, leading to a complex multivariate model.  For example, a 10x10 lattice is associated with a 100 variable joint probability mass function with a total of 2^100 states (assuming there are 2 spin states for each atom).  The combination of high dimension (number of variables) and dependencies between variables make it challenging to analyse or compute anything of interest.  Monte Carlo methods are needed to approximate the integrals that are used in Bayesian inference for this model.

Meeting people working in Neuroscience at Reading University

I have found out more research in the Systems Engineering department by meeting with Ingo Bojak and Slawek Nasuto.  Among other things, they work with neural field and neural mass models.  Their models include spatial interaction, stochasticity, and uncertain parameter values.  The combination of these model features means that the challenges of statistical inference may be the same as for the Ising model: high dimension and dependencies between variables.

Introduction to Neuroscience course

I am sitting in on an Introduction to Neuroscience undergraduate course to improve my subject knowledge in this area.  I have enjoyed learning random facts about the brain.  Just to pick out one area, the story of Phineas Gage is quite fascinating both at a superficial level (how could someone survive an iron bar going through their brain!?) and at a deeper level (how and why did Gage's mind change after the accident?)  The story is well told in Descartes' Error by Antonio Damasio.

As well as learning random facts, the good thing about taking a course is seeing how different things link together.  I have found the concepts of synaptic plasticity and receptive fields quite interesting.  Synaptic plasticity is the process by which connections between neurons change over time, and underpins learning.  Receptive fields are representations of how neurons respond to a range of different stimuli.  Receptive fields are important for understanding how physical stimuli are transformed into perception.  These concepts are fundamental for understanding a wide range of brain functions - hearing, vision, bodily sensation etc.

RSS Read Paper on SQMC

I travelled to London for an event held by the Royal Statistical Society on Sequential Quasi Monte Carlo, a new method developed by Mathieu Gerber and Nicolas Chopin (a relative of the great Frederic, or so I am told...).  Only a handful of papers each year are read before the society, so they are usually fairly significant advances.  This particular paper was presented as an advance from Sequential Monte Carlo (SMC).  SMC methods are useful for statistical inference of time series or dynamic models (e.g. Hidden Markov Models).  SQMC is also useful for statistical inference of time series / dynamic models, but, computationally, it is an order of magnitude faster than SMC.


The Academy for PhD Training in Statistics (APTS) held a week long course in Cambridge where I learnt about Statistical Inference and Statistical Computing.  There were 50-100 other first year PhD students in Statistics there from all over the UK and Ireland.

It was nice to hear from some statisticians who work in different application areas to me.  I enjoyed finding out about statistical modelling of volcanoes and of the Antarctic ice sheet.

It was also interesting to hear how academic statisticians have worked with government, e.g. MRC Biostatistics Unit, and the Cabinet Office National Risk Register.