wiggins_talk_notes

caveats
("so let me start with warnings and disclaimers")

- warning: i am not a population geneticist
- disclaimer: i will likely not talk about **any** of my own work.
- "informal" talk, please ask questions

statement of the problem:
[|graph]

- quest for "truthiness" / using data to select best model
- examples
  - comparing models: neutral, neutral+mutation, neutral+selection, neutral+migration+sweeping
  - comparing # features: 50 sites, N-50 hitchhikers; 100 sites, N-100 hitchhikers; N sites, 0 hitchhikers?
- key ideas
  - chi^2 ergo ML ergo ME (jets)
  - prediction ergo CV (sharks)
- when would this be a bad idea? example: NN-regression
- good models are predictive [but/and]
- good models are interpretable
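a concrete version of the NN-regression caveat: a 1-nearest-neighbor regressor drives training error to exactly zero on any dataset, so selecting models by raw fit alone would always prefer it. a minimal sketch (made-up toy data):

```python
import random

random.seed(0)

# made-up data: y = x^2 plus gaussian noise
xs = [i / 10.0 for i in range(30)]
ys = [x * x + random.gauss(0.0, 0.5) for x in xs]

def one_nn_predict(x, train_x, train_y):
    """1-nearest-neighbor regression: return the y of the closest training x."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

# training error of 1-NN is exactly zero: every point is its own nearest neighbor
train_err = sum((one_nn_predict(x, xs, ys) - y) ** 2 for x, y in zip(xs, ys))
print(train_err)  # 0.0
```

zero training error here is a property of the estimator, not evidence about generalization, which is the pull toward cross validation.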

basic bayesian notions
- context + historical digression: who was this [|bayes] troublemaker?
- what bayes said: "the product rule"
  - (as opposed to "the sum rule")
- examples of what happens when you put product and sum together
  - diffusion
  - likelihood with additive and normal noise
- behold: why we fit
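the "additive and normal noise" point in one computation: with y_i = f(x_i) + eps, eps ~ N(0, sigma^2), the log-likelihood is a constant minus SSE/(2 sigma^2), so maximizing likelihood is the same as minimizing squared error. a sketch with a one-parameter model f(x) = a*x and made-up data:

```python
import math

# made-up data, roughly y = 2x + noise
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
sigma = 0.5

def log_likelihood(a):
    """gaussian log-likelihood of the data under the model f(x) = a*x"""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - a * x) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))

def sse(a):
    """sum of squared errors for the same model"""
    return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

# scan a grid of slopes: the max-likelihood slope is the least-squares slope
grid = [1.5 + 0.001 * i for i in range(1001)]
a_ml = max(grid, key=log_likelihood)
a_ls = min(grid, key=sse)
print(a_ml, a_ls)  # identical: maximizing L is minimizing squared error
```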

bayesian model selection
- if you believe in ML, why not MML=ME?
- [|plot of P(D|K)]
- being bayesian: parameters and priors
- payoff: BIC
  - WARNING: [|SKETCH OF DERIVATION LIKELY]
- "the razor"
- [|"truthiness" sketch]
- summary + swindles
- elaboration on priors
  - gaussian priors (exact case for linear regression; restatement of ridge/Tikhonov regression)
  - other priors: lasso, grouping, fusion, etc.
- who is the most bayesian? bayes' rule, etc.
- a note on graphical models
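a worked BIC example (made-up toy data, known noise sigma; the polynomial fitter is a plain normal-equations solve, nothing authoritative): BIC = k ln n - 2 ln L_hat, so each extra parameter must buy about ln(n)/2 nats of log-likelihood to be worth keeping.

```python
import math
import random

random.seed(1)

# made-up data from a quadratic, with known noise sigma = 1
n = 50
sigma = 1.0
xs = [4.0 * i / (n - 1) - 2.0 for i in range(n)]
ys = [1.0 - 2.0 * x + x * x + random.gauss(0.0, sigma) for x in xs]

def polyfit(deg):
    """least-squares polynomial fit via normal equations + gaussian elimination"""
    k = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):                      # forward elimination w/ pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):            # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

def bic(deg):
    """BIC = k ln n - 2 ln L_hat, gaussian noise with known sigma"""
    coef = polyfit(deg)
    sse = sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    loglik = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)
    return (deg + 1) * math.log(n) - 2 * loglik

for d in range(4):
    print(d, bic(d))  # fit improves monotonically; BIC typically bottoms out at 2
```

raw log-likelihood alone would keep improving with degree; the k ln n penalty is what makes the curve turn around ("the razor").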

frequentist ideas / cross validation
- modeling is about prediction, ergo minimize an empirical estimate of generalization error.
- fit (monotonic) & "truthiness" (peaked) plot
- nature never hands you distributions, only observations (re: P(D|M), BIC)
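the monotonic-fit vs peaked-"truthiness" point, numerically: training error of k-NN regression falls to zero as the model gets more flexible (k -> 1), while the leave-one-out CV estimate of generalization error does not. a sketch with made-up data:

```python
import random

random.seed(2)

# made-up data: smooth truth plus noise
n = 40
xs = [4.0 * i / (n - 1) for i in range(n)]
ys = [x * (4.0 - x) + random.gauss(0.0, 0.7) for x in xs]

def knn_predict(x, k, train):
    """k-nearest-neighbor regression: average y over the k closest training x"""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(p[1] for p in nearest) / k

def training_error(k):
    train = list(zip(xs, ys))
    return sum((knn_predict(x, k, train) - y) ** 2 for x, y in train) / n

def loo_error(k):
    """leave-one-out CV: empirical estimate of generalization error"""
    pts = list(zip(xs, ys))
    total = 0.0
    for i, (x, y) in enumerate(pts):
        train = pts[:i] + pts[i + 1:]   # hold out point i
        total += (knn_predict(x, k, train) - y) ** 2
    return total / n

for k in (1, 3, 9):
    print(k, training_error(k), loo_error(k))
# training error is zero at k=1 and grows with k (monotone "fit");
# the CV estimate is typically smallest at an intermediate k
```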

illustration: mixture modeling
- cf xing
- cf sharply peaked
- connection w/ stat mech + test distributions + mean field theory + BP

bayesian+variational
- "EM" [|movie]
- [|WARNING: SKETCH OF DERIVATION LIKELY]
- gibbs = jensen = feynman inequality
  - ln q(x) \propto <ln p(D,z,t|a,b)>_{~x} = dH/dq
  - q = exp(dH/dq)
  - <H>_{H'} = H'
- recall, as you learned on your mother's knee: m = tanh(Jm/T)
- cf section 33.3 of mackay's book
- "VB" / "ensemble learning"
  - E: q(z)
  - M: q(t)
- [|Movie of VB in action] for vector GMM
- a note on graphical models
  - [|gmm ML graphical model, c/o jake hofman]
  - [|gmm ME graphical model, c/o jake hofman]
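the mother's-knee self-consistency m = tanh(Jm/T) is itself solved by exactly the kind of fixed-point iteration the mean-field/VB updates use; a sketch:

```python
import math

def mean_field_magnetization(J, T, m0=0.9, iters=200):
    """iterate the self-consistency m = tanh(J*m/T) to its fixed point"""
    m = m0
    for _ in range(iters):
        m = math.tanh(J * m / T)
    return m

# above the mean-field critical temperature (T > J) only m = 0 survives;
# below it (T < J) a nonzero magnetization appears
print(mean_field_magnetization(1.0, 2.0))  # ~0
print(mean_field_magnetization(1.0, 0.5))  # ~0.957
```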

CV approach
- [|figures of number of mixtures set by cross validation]

pvalueology
- how good is my CV?
- how good is my likelihood?
- assumptions inherent
- conventions inherent
- multiple hypothesis testing
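the multiple-testing point in one line: m independent tests of true nulls at level alpha give P(at least one false positive) = 1 - (1 - alpha)^m; the bonferroni correction tests each at alpha/m to pull this back down. sketch:

```python
def family_wise_error(alpha, m):
    """P(at least one false positive) across m independent true-null tests"""
    return 1.0 - (1.0 - alpha) ** m

# 20 tests at the conventional alpha = 0.05:
print(family_wise_error(0.05, 20))       # ~0.64: a false positive is *likely*
# bonferroni: test each hypothesis at alpha/m instead
print(family_wise_error(0.05 / 20, 20))  # ~0.049: back under control
```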

alternative worldviews not mentioned in this talk
- discriminative learning, SVM/hinge loss, boosting loss...
- model selection via stability
- sampling/MCMC/gibbs (vs variational)

topics for penalty time
- "data mining"
- work on community detection as latent variable inference (inc. [|Hofman+Wiggins '08])
- deep thoughts: what is a model? what is a "good" model?

statistics books by physicists
- mackay
- bishop

online references
- [|beal's thesis on variational methods]
- [|yedidia's lecture notes]
- wikipedia pages
  - [|resampling statistics]
  - [|Gibbs sampling]
  - [|Variational Bayes]
  - [|Bayes]
  - [|Ridge/Tikhonov/Bayesian regression]

misc papers
- [|beal+ZG: vb vs other things]
- [|more yedidia], on corrections to MFT
- [|still more yedidia], diagrammatica, less pedagogical.
- [|mackay 1995: Probable networks and plausible predictions - a review of practical ...], p(D|M) figure; truthiness plots, the whole shebang. 41 pages.
- [|Schwarz 1978]

deep thoughts
- share your code
- [|"Everything should be made as simple as possible, but no simpler."]
- [|"There is always an easy solution to every problem - neat, plausible and wrong"]
- [|"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."]

Comments from Paul about AIC -
The book I mentioned is Model Selection and Multi-model Inference by KP Burnham and D Anderson. Amazon link [|here]. I got a lot out of reading this book. There are a lot of practical examples and some fairly deep understanding of methods. There is also a very dense chapter where the Akaike Information Criterion is derived. I am ashamed not to have understood this chapter as well as I ought to, so if someone thinks they know something, I would be pleased to talk to you. [|Here] is the wikipedia link for AIC. I have used AIC personally in recent papers on codon usage and on estimation of substitution rate models in molecular evolution. Seems to work. Happy to chat if you are interested in this kind of problem.

AIC is used in the [|Modeltest] program in phylogenetics. There is a fairly standard series of models for sequence evolution beginning with Jukes Cantor, and Kimura 2 parameter, and HKY etc which get progressively more complex. The maximum likelihood tree is obtained for a given set of data using each model, and AIC is used to compare the log likelihoods that come from each model.
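For concreteness, AIC = 2k - 2 ln(L_hat), so the comparison above just trades log-likelihood against parameter count. A toy sketch with made-up log-likelihoods (illustrative numbers only, not real substitution-model fits):

```python
def aic(k, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln(L_hat); lower is better."""
    return 2 * k - 2 * log_likelihood

# hypothetical ML log-likelihoods for a nested series of models
# (made-up numbers in the spirit of JC -> K2P -> HKY)
models = {
    "JC-like, k=0": (0, -1234.0),
    "K2P-like, k=1": (1, -1230.0),
    "HKY-like, k=4": (4, -1229.5),
}
scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
for name, score in scores.items():
    print(name, score)
best = min(scores, key=scores.get)
print(best)  # K2P-like: its likelihood gain pays for the extra parameter
```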

However, the central problem in phylogenetics is that there are many possible trees for each data set, even if you only have one model of sequence evolution. Each tree is a model in some sense. Max likelihood just takes a single tree. All trees have the same number of parameters, so you just take the one with the best likelihood. I would argue that Bayesian methods are better for this because there are many trees with likelihoods only slightly worse than the ML tree. You want to look at the ensemble of high-likelihood trees, not just the ML tree. This is also appealing to physicists - it is like looking at a finite-temperature equilibrium ensemble instead of just the ground state. Here is the link to our [|PHASE] software for doing Bayesian phylogenies by MCMC. Here is the [|MrBayes] program, which is the most commonly used software for this.

A thought - Model selection is usually done by comparing ML trees, but I have argued that tree selection is better using ensembles of trees. There ought to be a way of comparing the ensemble of trees generated with one model with the ensemble of trees generated with another model. Does anyone know how you do model selection with the ensembles?