Friday, December 8, 2017

tools for writing code, life, the universe and everything - can anything beat emacs?

I have recently finished a 6-month placement with NAG (the Numerical Algorithms Group), based in Oxford.  One of the things I picked up there was how to use emacs for writing code and editing other text.

Previously I had always written code in programs designed for specific languages, such as RStudio or Matlab.

Emacs is designed to be a more generic tool that, in principle, can be tailored to any kind of text editing, including coding.  As a popular open source project, emacs has many contributed packages.  I used it mainly for writing code in Fortran, but it has modes for pretty much every widely used programming language.  I also used it for writing LaTeX and for writing and editing to-do lists using Org mode.

Beyond its usefulness as a text editor, emacs has many other functions.  For example, it has a shell, which behaves similarly to a command-line terminal but with the useful property that you can treat printed output as you would any other text.  I quite frequently find myself wanting to copy and paste from terminal output, or to search it for things such as error messages.  This is quick and easy in emacs.

So will I ever use anything other than emacs again... for anything?  I think truly hardcore emacs fans do use it for literally everything - email, web browsing, even the games that ship with emacs.  But I am not part of that (increasingly exclusive) club.  I find emacs a pain for things that you do infrequently - a shortcut isn't really a shortcut if you have to use Google to remind you what it is!

I think the two main selling points of emacs are (i) anything you do repeatedly with a mouse, you will be able to do at least as quickly in emacs, and (ii) it does great syntax highlighting of pretty much any kind of text.

Thursday, March 30, 2017

Statistics in medicine

Last week I went to the AZ MRC Science Symposium, organised jointly by AstraZeneca and the MRC Biostatistics Unit.  Among a line-up of great speakers was Stephen Senn, who has an impressively encyclopaedic knowledge of statistics and its history, particularly relating to statistics in medicine.  Unfortunately his talk was only half an hour long and in the middle of the afternoon, when I was flagging a bit, so I came away thinking 'that would have been really interesting if I had understood it.'  In terms of what I remember, he made some very forceful remarks directed against personalised medicine, i.e. giving different treatments to different people based on their demography or genetics.  This was particularly memorable because several other speakers seemed to have great hopes for the potential of personalised medicine to transform healthcare.

His opposition to personalised medicine was based on the following obstacles, which I presume he thinks are insurmountable.

  1. Large sample sizes are needed to test for effects by sub-population.  This makes it much more expensive to run a clinical trial than in the more traditional case where you only test for effects at the population level (see the rough calculation sketched after this list).
  2. The analysis becomes more complicated when you include variables that cannot be randomised.  Most demographic or genetic variables fall into this category.  He talked about Nelder's theory of general balance, which can apparently account for this in a principled way.  Despite being developed in the 1970s, it has been ignored by a lot of people due to its complexity.
  3. Personalised treatment is difficult to market.  I guess this point is about making things as simple as possible for clinicians.  It is easier to say 'use treatment X for disease Y' than 'use treatment X_i for disease variant Y_j in sub-population Z_k'.
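To get a feel for the scale of point 1, here is a rough back-of-the-envelope power calculation in Python.  It uses the standard normal-approximation formula for a two-arm trial; the effect size (0.3 standard deviations) and the number of sub-populations (4) are made-up numbers, purely for illustration.

    # Rough sample-size arithmetic behind point 1.  The effect size (0.3 SD)
    # and the number of sub-populations (4) are hypothetical.
    from scipy.stats import norm

    def n_per_arm(effect_size, alpha=0.05, power=0.8):
        """Two-sample normal-approximation sample size per arm."""
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        return 2 * ((z_alpha + z_beta) / effect_size) ** 2

    d = 0.3                              # assumed standardised treatment effect
    n_population = n_per_arm(d)          # power the trial at the population level
    n_personalised = 4 * n_per_arm(d)    # power it within each of 4 sub-populations

    print(round(n_population), round(n_personalised))  # about 174 vs about 700 per arm

Powering the same comparison within each sub-population multiplies the required sample roughly by the number of sub-populations, and detecting differences between sub-populations (interactions) is harder still.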
Proponents of personalised medicine would argue that all these problems can be solved through the effective use of computers.  For example,
  1. Collecting data from GPs and hospitals may make it possible to analyse large samples of patients without needing to recruit any additional subjects for clinical trials.
  2. There is already a lot of software that automates part or all of complicated statistical analysis.  There is scope for further automation, enabling the more widespread use of complex statistical methodology.
  3. It should be possible for clinicians to have information on personalised effects at their fingertips.  It may even be possible to automate medical prescriptions.
It's difficult to know how big these challenges are.  Some of the speakers at the AZ MRC symposium said things along the lines of 'ask me again in 2030 whether what I'm doing now is a good idea.'  This doesn't exactly inspire confidence, but at least it is an open and honest assessment.

As well as commenting on the future, Stephen Senn has also written a lot about the past.  I particularly like his description of the origins of statistics in chapter 2 of his book 'Statistical Issues in Drug Development':

Statistics is the science of collecting, analysing and interpreting data.  Statistical theory has its origin in three branches of human activity: first, the study of mathematics as applied to games of chance; second, the collection of data as part of the art of governing a country, managing a business or, indeed, carrying out any other human enterprise; and third, the study of errors in measurement, particularly in astronomy.  At first, the connection between these very different fields was not evident but gradually it came to be appreciated that data, like dice, are also governed to a certain extent by chance (consider, for example, mortality statistics), that decisions have to be made in the face of uncertainty in the realms of politics and business no less than at the gaming tables, and that errors in measurement have a random component.  The infant statistics learned to speak from its three parents (no wonder it is such an interesting child) so that, for example, the word statistics itself is connected to the word state (as in country), whereas the words trial and odds come from gambling, and error (I mean the word!) has been adopted from astronomy.

Monday, March 6, 2017

Pushing the boundaries of what we know

I have recently been dipping into a book called 'What We Cannot Know' by Marcus du Sautoy.  Each chapter looks at a different area of physics.  The fall of a dice is used as a running example to explain things like probability, Newton's laws, and chaos theory.  There are also chapters on quantum theory and cosmology.  It's quite a wide-ranging book, and I found myself wondering how the author had found time to research all these complex topics, which are quite different from each other.  That is related to one of the messages of the book - that one person cannot know everything that humans have discovered.  It seems like Marcus du Sautoy has had a go at learning everything, and found that even he has limits!

I think the main message of the book is that many (possibly all) scientific fields have some kind of knowledge barrier beyond which it is impossible to pass.  There are fundamental assumptions which, if taken to be true, explain empirical phenomena.  The ideal in science (at least for physicists) is to be able to explain a wide range of (or perhaps even all) empirical phenomena from a small set of underlying assumptions.  But science cannot explain why its most fundamental assumptions are true.  They just are.

This raises an obvious question: where is the knowledge barrier?  And how close are we to reaching it?  Unfortunately this is another example of something we probably cannot know.

In my own field of Bayesian computation, I think there are limits to knowledge of a different kind.  In Bayesian computation it is very easy to write down what we want to compute - the posterior distribution.  It is not even that difficult to suggest ways of computing the posterior with arbitrary accuracy.  The problem is that, for a wide range of interesting statistical models, all the methods that have so far been proposed for accurately computing the posterior are computationally intractable.
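To spell that out: for data y and parameters theta, the posterior we want is given by Bayes' theorem,

    p(\theta \mid y) = \frac{p(y \mid \theta) \, p(\theta)}{\int p(y \mid \theta) \, p(\theta) \, d\theta}

The numerator - likelihood times prior - is usually cheap to evaluate at any particular theta.  The difficulty is the integral in the denominator (and, more generally, expectations with respect to the posterior), which for most interesting models has no closed form and sits in a high-dimensional parameter space.  Methods such as MCMC sidestep the integral but can take an impractically long time to give accurate answers, which is the sense of 'computationally intractable' I have in mind here.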

Here are some questions that could (at least in principle) be answered using Bayesian analysis.  What will Earth's climate be like in 100 years' time?  Or, given someone's current pattern of brain activity (e.g. EEG or fMRI signal), how likely are they to develop dementia in 10-20 years' time?

These are both questions for which it is unreasonable to expect a precise answer.  There is considerable uncertainty.  I would go further and argue that we do not even know how uncertain we are.  In the case of climate we have a fairly good idea of what the underlying physics is.  The problem is in numerically solving physical models at a resolution that is high enough to be a good approximation to the continuous field equations.  In the case of neuroscience, I am not sure we even know enough about the physics.  For example, what is the wiring diagram (or connectome) for the human brain?  We know the wiring diagram for the nematode worm brain - a relatively recent discovery that required a lot of work.  The human brain is a lot harder!  And even if we do get to the point of understanding the physics well enough, we will come up against the same problem with numerical computation that we have for the climate models.

There is a different route to answering these questions, which is to simplify the model so that computation is tractable.  Some people think that global temperature trends are fitted quite well by a straight line (see Nate Silver's book 'The Signal and the Noise').  When it comes to brain disease, if you record brain activity in a large sample of people and then wait 10-20 years to see whether they get the disease, it may be possible to construct a simple statistical model that predicts people's likelihood of getting the disease given their pattern of brain activity.  I went to a talk by Will Penny last week, and he has made some progress in this area using an approach called Dynamic Causal Modelling.
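As a toy illustration of the straight-line route, here is a least-squares trend fit in Python.  The 'temperature' series below is synthetic - a made-up trend plus noise - not real climate data.

    # Toy version of the 'simplify the model' route: fit a straight line to a
    # temperature-like series by least squares.  The data are synthetic.
    import numpy as np

    rng = np.random.default_rng(0)
    years = np.arange(1970, 2018)
    temps = 14.0 + 0.017 * (years - 1970) + rng.normal(0.0, 0.1, size=years.size)

    slope, intercept = np.polyfit(years, temps, deg=1)   # least-squares line
    trend_per_century = 100 * slope
    naive_2100 = intercept + slope * 2100                # blind extrapolation

    print(f"fitted trend: {trend_per_century:.2f} degrees C per century")
    print(f"naive extrapolation to 2100: {naive_2100:.1f} degrees C")

The entire model here is just a slope and an intercept - cheap to fit precisely because it ignores almost everything we know about the climate system.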

I see this as a valuable approach, but a somewhat limited one.  Its success relies on ignoring things that we know.  Surely by including more of what we know it should be possible to make better predictions?  I am sometimes surprised by how often the answer to this question is 'not really' or 'not by much'.

What is computable with Bayesian analysis remains an open question.  This is both frustrating and motivating.  Frustrating because a lot of the things people try don't work, and we have no guarantee that there are solutions to the problems we are working on.  Motivating because science as a whole has a good track record of making the seemingly unknowable known.