tag:blogger.com,1999:blog-1275149608391671670.comments2018-03-09T06:29:34.491-05:00SAS and RKen Kleinmanhttp://www.blogger.com/profile/09525118721291529157noreply@blogger.comBlogger626125tag:blogger.com,1999:blog-1275149608391671670.post-3245465380131329692018-02-16T14:18:42.855-05:002018-02-16T14:18:42.855-05:00This is very helpful. I was wondering if it is pos...This is very helpful. I was wondering if it is possible for SAS to make the reference group the average score of the outcome, instead of the order of the variable?<br />Nikkihttps://www.blogger.com/profile/14670891862528575390noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-57580687902677338532018-01-24T17:51:43.358-05:002018-01-24T17:51:43.358-05:00Is it possible to use both propensity score inform...Is it possible to use both propensity score information and post-treatment predictors in an outcome model? 
I haven't seen an example of how to do this in SAS or R.<br /><br /><br />In my data I have, among other variables:<br /><br />migrated (binary outcome- yes or no)<br />military service (binary treatment- yes or no)<br />age (predictive of outcome and treatment- continuous)<br />nativity (predictive of outcome and treatment- categorical)<br />postwar_occupation (occurs after treatment- categorical)<br /><br /><br />Can anyone provide some guidance on how to build the required models to best account for selection into the treatment group and for the effect of postwar occupation?<br /><br /><br />Thank you.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-24571135764325286882017-11-30T06:05:08.042-05:002017-11-30T06:05:08.042-05:00You can use whatever probabilities you like: the s...You can use whatever probabilities you like: the simulation can be structured to track the scenario of interest.Nick Hortonhttps://www.blogger.com/profile/00242216324355342047noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-78555907759646624732017-11-30T04:23:20.579-05:002017-11-30T04:23:20.579-05:00Thanks, that was what I meant. What I am intereste...Thanks, that was what I meant. What I am interested in is the probabilities of treatment assignment. Would I have one set of probabilities representative of all the datasets, or would each dataset have its own set computed separately? Safiya Shttps://www.blogger.com/profile/17928319076513960795noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-6314509654900545182017-11-27T07:45:40.552-05:002017-11-27T07:45:40.552-05:00Safiya, you could certainly use your dataset as th...Safiya, you could certainly use your dataset as the basis of your simulations and create new Y's using the approach we've described. 
Is that what you mean by "simulate several logistic regression results"?Nick Hortonhttps://www.blogger.com/profile/00242216324355342047noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-34301181556078238702017-11-27T05:08:11.853-05:002017-11-27T05:08:11.853-05:00Supposing I already have a dataset, can I use same...Supposing I already have a dataset, can I use the same data to simulate several logistic regression results?Safiya Shttps://www.blogger.com/profile/17928319076513960795noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-31878596121259393822017-11-14T04:05:39.360-05:002017-11-14T04:05:39.360-05:00download links are broken but you can get the file...The download links are broken, but you can get the files here:<br />https://www.jstatsoft.org/article/view/v046c02Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-76787583398348885982017-11-12T00:29:15.023-05:002017-11-12T00:29:15.023-05:00Is there an easy way to switch between inline and ...Is there an easy way to switch between inline and displayed mathematics?Jasonhttps://www.blogger.com/profile/09366339492888340194noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-3200695085341721052017-09-22T16:55:40.085-04:002017-09-22T16:55:40.085-04:00lm() is largely out of our control, and I don'...lm() is largely out of our control, and I don't think it is a good idea to write a replacement for lm().<br /><br />For numerical summaries, take a look at df_stats(). I think you will find it does what you want and interoperates well with ggformula. In particular, it always returns a tidy data frame (hence the d in df_stats). 
Here is an example:<br /><br /><br />require(mosaic)<br />HELPrct %>% filter(sex == "male") %>% df_stats(age ~ substance, mean, median)<br />## substance mean_age median_age<br />## 1 alcohol 37.95035 38.0<br />## 2 cocaine 34.36036 33.0<br />## 3 heroin 33.05319 32.5Randall Pruimhttps://www.blogger.com/profile/07500805842903136651noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-23874489521891099772017-09-22T16:10:45.837-04:002017-09-22T16:10:45.837-04:00This is awesome! I just wish I knew about this a ...This is awesome! I just wish I knew about this a week ago (just spent a week teaching my intro stats class a messy conglomerate of tidyverse and mosaic for plotting).<br /><br />I love that the gf_ functions work with the %>% operator! Unfortunately, other formula using methods from base (like lm) or mosaic (like tally or favstats) do not work with the operator, and need data = ., which can potentially be rather confusing.Janhttps://www.blogger.com/profile/06073872742931383080noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-66112573489488637002017-07-31T10:18:41.982-04:002017-07-31T10:18:41.982-04:00Our goal, consistent with the revised GAISE Colleg...Our goal, consistent with the revised GAISE College report (https://arxiv.org/abs/1705.09530) is to integrate the teaching of key statistical concepts with the use of appropriate technology. <br /><br />It's certainly possible to teach research methodology (e.g., addressing confounding using multiple regression) without software. 
I think that having students use real tools with a straightforward interface can augment such instruction.<br />Nick Hortonhttps://www.blogger.com/profile/00242216324355342047noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-33963424320220172142017-07-31T10:08:57.895-04:002017-07-31T10:08:57.895-04:00We very intentionally organized the material in th...We very intentionally organized the material in the book to ensure that there is a solid foundation in statistics. This permeates the data viz and data wrangling chapters (which are intended to answer a statistical question), the foundations in statistics chapter (which reviews key statistical concepts), and the topics chapters (e.g. text as data, spatial, network statistics). Such an approach seems critically important to be able to "think with data": http://amstat.tandfonline.com/doi/full/10.1080/00031305.2015.1094283Nick Hortonhttps://www.blogger.com/profile/00242216324355342047noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-52988935120908356832017-07-31T09:38:01.483-04:002017-07-31T09:38:01.483-04:00I agree! I mistakenly chose to get my second Mas...I agree! I mistakenly chose to get my second Master's in bioinformatics instead of biostatistics, and I'm very disappointed with my program. It is predicated on the assumption that you can teach people data science without any statistics. The whole field of "data science" seems to have this idea, and it will send us backward in time. It isn't leading to better research, just more research.Jessicahttps://www.blogger.com/profile/01125948286235962017noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-28766309650373788622017-07-31T03:18:18.811-04:002017-07-31T03:18:18.811-04:00Do not teach beginners how to use R statistics. Te...Do not teach beginners how to use R statistics. Teach them proper research methodology. They will learn by themselves how to use any statistical software including R. 
That is power in learning. They will never forget.gatara timothyhttps://www.blogger.com/profile/04946108944968998288noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-39790301932722802162017-07-28T09:59:00.518-04:002017-07-28T09:59:00.518-04:00The treatment of missing data is not really a matt...The treatment of missing data is not really a matter of whether formulas are used but of the particular function used. One difference between R and SAS is that R is written by a community and SAS by a company. So it is easier for SAS to enforce stronger consistency across functions. Naming conventions (camelCase, dots, underscores, etc.) are also inconsistent across R. (The reason functions like lm() discard missing data by default is that they use model.frame(), which has this as its default.)<br /><br />The reason mosaic doesn't change the default behavior with regard to missing values is that we decided early on that our versions of the functions should behave just like the originals in cases where the original functions produced sensible output. (It would be bad if mean() gave a different answer depending on whether mosaic was or was not loaded.) The user can, as you note, set the default behavior actively using options(). That seemed like a good compromise.<br /><br />For functions like favstats(), which we could write without worrying about compatibility with core R functions, we were free to do other things. In this case, we compute statistics after dropping missing values and also display the number of values that were missing.<br /><br />Regarding t.test(pre, post, paired = TRUE, data = mydata), I have received numerous requests for things like this. (Usually the request is along the lines of mean(x, data = mydata).) And for a time (and against my better judgment) we supported this. But it was a bad idea for several reasons. For starters, the code is ambiguous if x exists both in the environment and in mydata. 
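<br /><br />To make that ambiguity concrete, here is a small base R sketch. The objects x and mydata are invented for illustration, and with() is used only as a stand-in for the "bare variables + data" style; it approximates, but is not, the behavior mosaic once supported.

```r
# Sketch only: x and mydata are made-up objects for illustration.
x <- c(100, 200)                       # an x in the calling environment
mydata <- data.frame(x = c(1, 2, 3))   # a different x inside the data frame

mean_env  <- mean(x)                   # uses the environment's x
mean_data <- with(mydata, mean(x))     # with() looks inside mydata first
mean_col  <- mean(mydata$x)            # explicit column reference, no ambiguity

# The same bare name x yields different answers depending on where it is looked up.
c(env = mean_env, data = mean_data, explicit = mean_col)
```

A reader of mean(x, data = mydata) cannot tell from the call alone which of these two values of x is intended. In mosaic, the formula version of the call would be mean(~ x, data = mydata), which marks x as a name to be evaluated in the data.<br /><br />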
It also makes constructing the functions and keeping them compatible with their core R counterparts much more challenging and led to some subtle bugs. In R, a formula is the correct way to designate a name to be evaluated in a special way. Finally, it was somewhat confusing that some functions accepted the "bare variables + data" syntax and others required formulas. It is more systematic if they all require the formula.<br /><br />The bigger inconsistency -- that t.test(y ~ x) worked but t.test( ~ x) did not -- we fixed in mosaic.<br /><br /><br />In general, the problem with nonstandard evaluation is that it is nonstandard, so it can be hard to predict behavior. There are times where NSE is very useful, but it works best when operating within a well-defined system where the NSE can be correctly anticipated by the user. The use of formulas in lattice, ggformula, mosaic, and in functions like lm() and its cousins provides an established context for evaluation of formulas with a data context.<br />Randall Pruimhttps://www.blogger.com/profile/07500805842903136651noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-54888502585821656842017-07-28T09:01:57.734-04:002017-07-28T09:01:57.734-04:00Thanks for writing this thought-provoking article. I...Thanks for writing this thought-provoking article. I teach a lot of workshops for organizations that are migrating from SAS to R, and one of the things that confuses people is R's inconsistent treatment of missing values. Simple stat functions require setting na.rm = TRUE while formula-based ones don't. Using the mosaic functions, you can set options(na.rm = TRUE) and from then on, its simple functions will find that setting and work more like formula-based ones. Since mosaic is nice enough to add formulas to simple stat functions, it would be nice for that to be the default. <br /><br />Another inconsistency that I would like to see mosaic fix is that the data argument works only with formulas. 
So this finds the variables: t.test(y ~ group, data = mydata) while this does not: t.test(pre, post, paired = TRUE, data = mydata).Bob Muenchenhttps://www.blogger.com/profile/14224906531398701275noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-13309438569933827332017-07-28T07:54:12.714-04:002017-07-28T07:54:12.714-04:00While you can certainly begin with base graphics a...While you can certainly begin with base graphics and the plot() function, that only seems like a reasonable solution to me if you want to continue with base graphics throughout the course (which I don't). The different graphics systems don't play well together, so I find it best to pick one and stick with it. If you want to use lattice or ggformula, I'd suggest starting there and avoiding base graphics altogether.<br /><br />Note too, that plot() is a bit quirky in its choices. plot(~price, data = diamonds) is not a very good choice, for example. A student should expect something much better. On balance, I don't find plot() or qplot() from ggplot2 compelling. I generally want to get beyond what these provide, and I don't find teaching lattice or ggformula to be challenging without such a function.Randall Pruimhttps://www.blogger.com/profile/07500805842903136651noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-51392486450443399082017-07-28T07:21:06.579-04:002017-07-28T07:21:06.579-04:00An additional feature of the mosaic package is the...An additional feature of the mosaic package is the multi-purpose mplot() function available within RStudio. <br /><br />If you provide a linear model object as argument, it allows you to generate the typical regression diagnostics (including a regression coefficient plot). 
<br /><br />If you provide a dataframe as argument, you get an interactive data visualizer that lets you explore univariate, bivariate, and multivariate graphical displays (see http://escholarship.org/uc/item/84v3774z for an example of how we incorporate this on day one of an introductory statistics course).<br /><br />In RStudio, try running:<br /><br />library(mosaic)<br />mplot(HELPrct)<br /><br />The "Show Expression" feature is particularly useful: it's an easy way to see the syntax to generate the selected plot using lattice, ggplot2, or ggformula.<br />Nick Hortonhttps://www.blogger.com/profile/00242216324355342047noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-41662107859163601802017-07-28T06:18:53.747-04:002017-07-28T06:18:53.747-04:00Thanks for this post. I agree that strengthening t...Thanks for this post. I agree that strengthening the teaching of formula-based functions is a good idea and easy to learn for beginners. One useful R feature that is often overlooked in this context is that plot(y ~ x, data = df) chooses a suitable plot for various combinations of y and x. Of course, if both variables are numeric, this creates a scatter plot. For numeric "response" and categorical "explanatory variable" we get parallel boxplots. And if the response is categorical we get a spineplot or spinogram, respectively. For many data sets that are relevant to our students (business & economics) this goes quite a long way. 
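<br /><br />The cases just listed can be tried with a short base R sketch. The built-in mtcars data and the two derived factor columns are illustrative choices made here and are not part of the original comment.

```r
# Sketch only: mtcars and the derived factors are illustrative choices.
dat <- transform(
  mtcars,
  cyl = factor(cyl),                                    # categorical explanatory variable
  am  = factor(am, labels = c("automatic", "manual"))   # categorical response
)

plot(mpg ~ wt,  data = dat)   # numeric response, numeric explanatory: scatter plot
plot(mpg ~ cyl, data = dat)   # numeric response, categorical explanatory: parallel boxplots
plot(am ~ cyl,  data = dat)   # categorical response, categorical explanatory: spine plot
plot(am ~ wt,   data = dat)   # categorical response, numeric explanatory: spinogram
```

The call has the same shape in all four cases; only the types of the two variables change, which is what lets the formula interface pick the display.<br /><br />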
And from that point onwards I can teach what kind of principles - and corresponding R functions/packages - can be used to construct more complex displays etc.Achim Zeileishttps://eeecon.uibk.ac.at/~zeileis/noreply@blogger.comtag:blogger.com,1999:blog-1275149608391671670.post-8217944099962425032017-07-28T06:06:48.117-04:002017-07-28T06:06:48.117-04:00One point is that everything that it seems a good ...One point is that everything it seems a good data scientist should know (statistics, computing including information technology, and the art of graphics) is what statisticians always knew we needed. The problem seems to be that those of us who learnt statistics in the eighties or earlier only ever got taught statistics and have learnt everything else as we went along. My belief is that a good background in statistics and basic programming skills is the most important thing to know. What is worrying is that the university I'm at, like many others, has the computer science department offering the data science masters.<br /><br />One point on your book. k-means should die. I know everyone teaches it, but mixtures of multivariate normals is so much more powerful and leads on to mixtures as a solution to other problems. The main problem is that understanding it requires a proper statistical background, but without it nobody understands k-means either.Ken Bushwalkerhttps://www.blogger.com/profile/00356020862476667328noreply@blogger.com