Comments on Jeromy Anglim's Blog: Psychology and Statistics: Using R to replicate common SPSS multiple regression output
Feed updated 2017-08-23T18:27:31.706+10:00. 12 comments, newest first.

Jeromy Anglim (2013-12-11T22:47:15.890+11:00):

You can do it in SPSS using scatter - matrix, or you can do it in R; see here: http://www.statmethods.net/graphs/scatterplot.html

I suppose such a plot would give you a sense of whether the data is bivariate normal for each pair of variables.

Narbeh Yousefian (2013-12-11T10:56:17.410+11:00):

Thanks for the great post Jeromy. I have some thoughts for those that are predominantly from psych backgrounds and prefer a UI over syntax.

I will paint the picture like this: if the only tool in your toolbox is a hammer, then your solutions all end up looking like nails.

I would say the point and click market will always be larger purely because it's easier to work hard (hope that makes sense). I like using SPSS for certain tasks, though the reality for me these days is that it is becoming less and less useful. SPSS is great in a finite setting. Sadly, reality doesn't follow the massaged data that gets used and given to most students when they analyse data.

In defence of R (and I like SPSS too :)), I would argue there is greater skill in managing the data than in the simple point and click of the UI. To broaden the spectrum, when you add up how long it takes to do data munging, preparation, analysis and report writing, you will see that overall R kicks SPSS to the curb with the ease of doing all this.

Whether I am grabbing data through an API, scraping it from the web, using JSON, or connecting to a SQL DB, R takes care of this pretty easily and I have data frames ready to rock with exploratory analysis. When I finish my analysis, it is as easy as writing my report in Markdown within the same interface, where you can present all your findings in a very nice and presentable way.

Not everyone's approach, but simplicity here can be bent in many ways. For those that like syntax and querying, R will eventually be much simpler to use, not just for analysis but, as I mentioned, for managing your analytics from start to end. End here means publishing it, be it a paper or even a blog post/GitHub page. All this can be done very easily from an IDE like RStudio.

MH (2013-12-10T22:40:07.208+11:00):

Hi Jeromy,

I want to ask you something here because my mail never reached you (*delivery problem*). Sorry if it causes trouble.

In this presentation, Slide 28 http://web.psych.unimelb.edu.au/jkanglim/IntroductiontoSEM.pdf

You show an interesting scatter matrix. I would like to know if you think it's possible to assess multivariate normality with this test? My goal was to do some EFAs using maximum-likelihood (ML) and Promax, but ML requires multivariate normality.

Thanks.

Jeromy Anglim (2013-12-10T14:46:00.474+11:00):

That could be useful in some domains. In most psychology areas that I'm interested in, the null hypothesis of zero correlation is often clearly false, and the challenge is to get some reasonable sense of the confidence intervals around the correlations, and also to disentangle the main themes in the correlation matrix, but I guess that's a whole other set of issues.

Jeromy Anglim (2013-12-10T14:44:11.776+11:00):

Thanks for all the links. And yes, RCommander might be the way to proceed for some psychology students.

Jeromy Anglim (2013-12-10T14:40:15.389+11:00):

The above exercise aimed to replicate a specific set of SPSS output I had lying around.

That said, if you want to assess linearity, a few options include:

1. Plotting each predictor against the DV and fitting a line using something like a loess. You get this in the scatter plot.
2. You could vary the plot to control for other predictors (like in SPSS how you get the partial regression coefficient plots).
3. If you knew of some particular form of non-linearity, you could test for this. For example, you could include a quadratic term (e.g., x^2) in the regression.

Bob Muenchen (2013-12-10T02:05:49.434+11:00):

Nice job! For correlations, I prefer rcorr.adjust from the Rcmdr package. It offers regular p-values and then it corrects them for the number of tests done using Holm's sequential Bonferroni test.

m. devlin (2013-12-09T10:08:12.556+11:00):

I'm learning R rather than teaching, but here are two approaches to teaching with R in other fields (Physiology and Epidemiology) I've found useful. Neither is a complete solution, and RCmdr may be the optimal guide rail for introduction (Deducer is another very good GUI). Each approach may be relevant to considerations raised in your previous Evaluation paper though -- free-form coding and the computing environment.

Both impose some control over the R environment and use of packages, indicating the approved methods to use. That seems necessary to deal with the risk of students (and the teacher) getting lost on the wide open plains of R. Those still exist, but beyond the bounds of the teaching material. They're more accessible to student exploration based on that taught foundation, having gained experience using the application.

'Explorations in Statistics', a series by Douglas Curran-Everett, et al: http://advan.physiology.org/cgi/collection/explorations

"Because this series uses R solely as a vehicle with which to explore basic concepts in statistics, I provide the requisite R commands..."

These are all in base R, eliminating dependency on even the most popular packages (set-up is to start R within a folder for the course). Admittedly there's no homework and solutions and it's remote teaching, but there is code to explore:

"...When I teach and write about statistics, I want to engage my audience. To do this I use simulations as thought experiments my audience can see (11–14). From my perspective, the only thing better would be if my audience could run the simulations on their own. This series ... provides an opportunity to do just that: we will investigate basic aspects of statistics using a free software package. ... My goals are to provide a theoretical framework for and a vehicle with which to illustrate each concept."

'Analysis of epidemiological data using R and Epicalc' by Virasakdi Chongsuvivatwong takes a different approach, imposing much stricter control over the R environment, with use of attach and providing custom functions for convenience, because R:

"...is difficult to learn and to use compared with similar statistical packages for epidemiological data analysis such as Stata. The purpose of this book is therefore to bridge this gap by making R easy to learn for researchers from developing countries and also to promote its use."

e.g. custom functions for variable and value labels (akin to Hmisc I guess), an accommodation to those concepts in SPSS & Stata:

"...Epicalc presents a concept solution for common types of work where the data analyst works on one dataset at a time using only a few commands. ... eliminate the necessity of specifying the dataset and can avoid overloading of the search ... make tidying of memory easy ... easy to recognize the variables by adopting variable labels or descriptions which have been prepared from other software such as SPSS or Stata or locally prepared by Epicalc itself."

http://cran.r-project.org/doc/contrib/Epicalc_Book.pdf

Installing packages, if not already installed in some client-server installation on campus, seems a relatively minor obstacle compared to paying software licence fees and limited access to the tool. The range of GUIs and an IDE like RStudio render the interface much more palatable.

R plays so much better with other applications, e.g. LaTeX, RMarkdown etc, allowing for reproducible assignments and papers, and tapping into material outside a course. It seems more empowering than licensed alternatives.

Suresh Yalamanchili (2013-12-08T13:37:34.543+11:00):

Hi,

According to my understanding, one of the assumptions of regression is linearity. Forgive me if you have done that. I am not able to find it.

Rule 1 for linearity) There should exist a linear relation between each independent and dependent variable. Where are you checking it statistically, besides the scatter plot? Is there a lack of fit test?

Rule 2 for linearity) Overall, all the independent variables should possess a linear relationship with the dependent variable. Where have you checked that?

I can see you have checked for the other 3 assumptions of regression. Again, if you have checked already, I may be missing it. Could you help find it?

Jeromy Anglim (2013-12-05T16:09:49.111+11:00):

Hi Jeff,
They are all good points, and they are the issues that I'm pondering at the moment.

Just to make one point in R's defence, I could have started with a basic analysis in R and demonstrated how to reproduce it in SPSS. The result probably would have been very technical and obscure also.

For example, I could have
* run a multiple regression in R and added a couple of categorical predictors and maybe a quadratic effect.
* obtained bootstrap confidence intervals on the coefficients.
* extracted the AIC for the model
* etc.

This would all be relatively simple in R.

I imagine that replicating any of those steps in SPSS would take a bit of work looking up syntax workarounds, add-on packages, and so on.

If the set of output that SPSS provides is what you want, then SPSS is user friendly. If not, then R is often a lot easier to work with.

Jeff (2013-12-05T15:54:15.339+11:00):

Great post,

and very detailed.

Unfortunately, it highlights the biggest problem with R. I can teach a student to generate the same output in SPSS, even if the student has very little familiarity with SPSS, using SPSS's point-and-click interface in maybe 2 minutes.

Having to go through all of R's genuflections just to get this type of basic analysis done is precisely why it won't make a dent in this space. Until there is further development on a user-friendly front end that doesn't involve all this coding, installing various packages, loading libraries, arcane error messages, massively disconnected support options, etc... R just can't compete with SPSS for this purpose.

I'm no fan of SPSS, and in fact am increasingly irritated by their business model, which is leaving academics behind in pursuit of their enterprise level strategy. I'd love to make the switch, and think the Field text is outstanding.

But when you're talking undergraduate (and graduate level for that matter), *applied* statistics for social scientists, the learning curve of R is simply too steep to justify. Sure, you can Google any question with R. The trouble is that the language is so powerful/flexible, you're bound to find 10 different ways of doing something, all with their own idiosyncrasies. Students could spend inordinate amounts of time massaging code examples just to make an example work for them. Although it's technically not "fragmentation" per se, working with R can sure feel that way.

I do have my Ph.D. in Psych, have a good number of graduate hours in all manner of statistics, and teach graduate level stats. At this point, I really believe I'd be doing the students a disservice switching my curriculum to R, as it just doesn't suit their needs.

Doug Lawson (2013-12-05T10:56:32.040+11:00):

That's amazing. Thank you for taking the time and sharing your expertise in such a useful way.
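
A note on Bob Muenchen's comment about Holm's sequential Bonferroni correction: the procedure is short enough to sketch. The block below is a minimal illustration in plain Python, not the code behind rcorr.adjust. The function name holm_adjust and the example p-values are made up for illustration. In R itself, the base function p.adjust(p, method = "holm") performs this adjustment.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    Sketch of Holm's step-down procedure: sort the p-values ascending,
    multiply the smallest by m, the next by m - 1, and so on, cap each
    at 1, and force the adjusted values to be non-decreasing.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # rank 0 is the smallest p-value, so its multiplier is m.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # An adjusted p-value may never be smaller than an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for three correlations tested at once.
raw = [0.01, 0.04, 0.03]
print([round(p, 4) for p in holm_adjust(raw)])  # prints [0.03, 0.06, 0.06]
```

With three tests, the smallest p-value is multiplied by 3 and the next by 2. The largest would be multiplied by 1, but it is raised to 0.06 so that the adjusted values keep the same ordering as the raw ones.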