Coding

Installing Edarf

I’ve been meaning to check out random forests in general and the edarf package in particular for a while now – at least since Zach Jones and Fridolin Linder posted their paper on Random Forests and EDA last spring.

I finally had a chance to go through the package, and it’s as great as I’d hoped it would be. The standard errors and partial dependence plots point toward a future where ML plays a much larger role in political science overall rather than just the subset devoted to prediction.

All that said, though, edarf is still in its early days.

For anyone trying to get up and running with edarf as well, below are a few issues I ran into while walking through Zach’s IMC tutorial for how to use edarf.

See the appendix for my exact R and OS versions.

1. Fortran compiler

I started by running:

devtools::install_github("zmjones/edarf")

That produced the following error message:

ld: warning: directory not found for option '-L/usr/local/lib/gcc/x86_64-apple-darwin13.0.0/4.8.2'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [edarf.so] Error 1
ERROR: compilation failed for package 'edarf'

Based on the clang error and the path in the first warning, it seemed like the linker couldn’t find the right version of the Fortran library. I’d recently updated to OS X 10.10 and reinstalled Xcode, so I thought that might have something to do with it.

After a bit of googling, it looked like the issue was that edarf relies on RcppArmadillo, which in turn relies on a version of gfortran that is not shipped with Mac OS X 10.9. Either the bug also applies to 10.10 or I botched the upgrade.

At any rate, to fix the issue I ran:

curl -O http://r.research.att.com/libs/gfortran-4.8.2-darwin13.tar.bz2
sudo tar fvxz gfortran-4.8.2-darwin13.tar.bz2 -C /

Installing edarf worked like a charm after that.
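Since that tar command unpacks into / as root, it can be worth looking at what the archive contains first. Here’s a minimal Python sketch that lists the members without extracting anything. The helper name list_archive is my own, and it assumes the tarball has already been downloaded to the current directory.

```python
import os
import tarfile

def list_archive(path):
    """Return the member paths of a tar archive without extracting anything."""
    # "r:*" asks tarfile to detect the compression itself (bzip2 for a .tar.bz2)
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()

if __name__ == "__main__":
    tarball = "gfortran-4.8.2-darwin13.tar.bz2"  # assumed to sit in the current directory
    if os.path.exists(tarball):
        for name in list_archive(tarball)[:20]:
            print(name)
```

If the listed paths all sit under usr/local, extracting with -C / puts them where the linker was looking.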

2. Version of party

edarf also relies on a forked version of the party package:

devtools::install_github("zmjones/party", subdir = "pkg")

The package installed ok, but I couldn’t call the library at first.

I’d already called library(party) in my environment – I think while I was going through Zach’s great international methods colloquium presentation – and there seemed to be a conflict I couldn’t get around.

After restarting R though I could load the library fine.

3. plot_imp() error

Most of the plots in the tutorial worked well. But then I ran the following (lines 30-31):

imp <- variable_importance(fit, features, type = "local", oob = TRUE, parallel = TRUE)
plot_imp(imp)

That yielded this:

Error in as.character(x$label) : 
cannot coerce type 'closure' to vector of type 'character'

Based on the coercion/closure issue, my hunch is that plot_imp() is trying to coerce some function itself to character rather than that function’s output.

4. RStudio Abort Cycle

After working through the tutorial and then switching to my own data, I somehow got into a cycle where something in my current project environment caused RStudio to abort. Since RStudio auto-loads the last project environment on startup, it would immediately hit the same issue again.

Eventually I just had to shut down the computer and restart everything.

I didn’t bother trying to reproduce the issue, which I think had something to do with socket settings and not edarf.

Conclusion

Bugs aside, the package is great. Can’t wait to see where it goes.

–

APPENDIX

Mac OS X version: 10.10.4

R Version Info:

platform       x86_64-apple-darwin13.4.0   
arch           x86_64                      
os             darwin13.4.0                
system         x86_64, darwin13.4.0        
status                                     
major          3                           
minor          2.1                         
year           2015                        
month          06                          
day            18                          
svn rev        68531                       
language       R                           
version.string R version 3.2.1 (2015-06-18)
nickname       World-Famous Astronaut

Poisson Saturated Log Likelihood

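I was poking around the web earlier for the saturated log-likelihood in a Poisson model, but couldn’t find it.

In the saturated model every observation gets its own mean, so the fitted value for each z is z itself. Zeros contribute nothing to the sum, which is why the code drops them.

Here goes, using an example from Millar’s Maximum Likelihood Estimation and Inference: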

library(MASS)

z <- c(rep(0,19), 1,1,2,2,3,3,3,3,4,4,4,5,6,6,6,6,7,7,7,9,9)

mod <- glm(z~1,family="poisson")

z.pos <- z[(z>0)]

saturated.ll <- sum(z.pos*log(z.pos)-z.pos-log(factorial((z.pos))))
intercept.ll <- as.numeric(logLik(mod))

saturated.ll
intercept.ll
(my.null.deviance <- 2*(saturated.ll-intercept.ll))
mod$null.deviance

If it’s easier to understand what’s going on, also try this:

total <- 0
for (i in seq_along(z)) {
  if (z[i] > 0) {
    total <- total + z[i]*log(z[i]) - z[i] - log(factorial(z[i]))
  } else {
    # 0*log(0) evaluates to NaN in R, so treat that term as 0;
    # z[i] is always 0 in this branch, so nothing is added
    total <- total + 0 - z[i] - 0
  }
}
total
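For anyone who’d rather check this outside R, here’s the same calculation as a minimal Python sketch using only the standard library. The function name poisson_loglik is mine, and the sketch relies on the fact that the intercept-only Poisson fit is just the sample mean.

```python
import math

z = [0] * 19 + [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 9, 9]

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y evaluated at fitted means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(mu) is taken to be 0 when y == 0, which also covers mu == 0
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)  # lgamma(y + 1) equals log(y!)
    return total

saturated_ll = poisson_loglik(z, z)  # each fitted mean equals its own observation
mean_z = sum(z) / len(z)
intercept_ll = poisson_loglik(z, [mean_z] * len(z))  # intercept-only fit
null_deviance = 2 * (saturated_ll - intercept_ll)

print(saturated_ll, intercept_ll, null_deviance)
```

The last number printed should agree with mod$null.deviance from the R code, up to floating point error.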

Example R and Python Code

In addition to Python, Lark now executes R at runtime as well, via knitr.

Still have an issue with Matplotlib, but otherwise things are looking good!

Here goes. First let’s try R:

x <- rnorm(100)
summary(x)

Now let’s switch over to Python:

>>> x = 3
>>> x + 1
4
>>> 2 + 2*x
8

Ok. Let’s try plotting stuff.

First in R:

plot(cars)

Now let’s try a plot in Python …

>>> import matplotlib.pyplot as plt

>>> fig = plt.figure()
>>> plt.plot([1, 2, 3, 4, 5], [6, 7, 2, 4, 5])
>>> fig

Bummer. Looks like the fig object exists; it’s just not being passed properly to Lark.

Hopefully we should be able to sort it out in time.

Lark Now Runs Python

One reason I’ve been writing Lark, the static site generator that now powers this site, was to have the freedom to cook up unique ways of merging writing and data analysis.

I’m glad to say I hit the first milestone on that path today. It took a bit of hacking, but thanks to Matthew Rocklin’s great new pymarkdown module, Lark now executes Python code blocks in posts when it builds the site.

The end result? I can now stick this in a post …

```Python
>>> x = 3
>>> x + 1
[should be 4]
>>> 2 + 2*x
missing or wrong results will be overwritten
```

… and have it show up as this:

>>> x = 3
>>> x + 1
4
>>> 2 + 2*x
8

Lark still needs a lot of work, but it’s fun to think about the possibilities this opens up.
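To give a feel for the idea, here’s a minimal sketch of re-running the prompts in a block and replacing whatever output was written underneath them. It uses only the standard library’s doctest parser. It is not how pymarkdown or Lark actually do it, and the function name refresh_outputs is made up for illustration.

```python
import contextlib
import doctest
import io

def refresh_outputs(block):
    """Re-run every >>> example in `block` and return it with freshly computed output."""
    namespace = {}  # shared across examples so later lines can see earlier variables
    rendered = []
    for example in doctest.DocTestParser().get_examples(block):
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            # "single" mode echoes expression values, like the interactive prompt
            exec(compile(example.source, "<post>", "single"), namespace)
        first, *rest = example.source.rstrip("\n").split("\n")
        rendered.append(">>> " + first)
        rendered.extend("... " + line for line in rest)
        output = captured.getvalue().rstrip("\n")
        if output:
            rendered.append(output)
    return "\n".join(rendered)

post_block = """\
>>> x = 3
>>> x + 1
[should be 4]
>>> 2 + 2*x
missing or wrong results will be overwritten
"""
print(refresh_outputs(post_block))
```

The stale text under each prompt is thrown away and replaced by whatever the code really prints, so the post can never drift out of sync with its own examples.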

Rd2pdf bug in devtools::check()

I’m currently trying to compile my first package in RStudio.

I was consistently getting the following warning:

* checking PDF version of manual ... 
WARNING LaTeX errors when creating PDF version.

I didn’t have a clue what was going on, and I wasn’t having any success following Hadley Wickham’s advice to check my LaTeX logs; I didn’t even know where the logs were.

Thankfully though, a bit of googling uncovered the problem: as Matt Bannert noted, in R 3.1.3 the commands R CMD Rd2pdf and R CMD check expected texi2dvi to be in /usr/local/bin, even though it was actually in /usr/bin.

To fix it, I followed Matt’s suggestion:

# to check whether the same issue exists for you
which texi2dvi
# if so
cd /usr/local/bin
ln -s /usr/bin/texi2dvi

Worked like a charm, thankfully. After setting the symlink I haven’t run into the same warning again.
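For reference, the same symlink step can be sketched in Python. The helper name link_into is my own, and it leaves anything already sitting at the target path alone.

```python
import os

def link_into(tool_path, target_dir):
    """Create target_dir/<tool name> as a symlink to tool_path unless something is already there."""
    link = os.path.join(target_dir, os.path.basename(tool_path))
    if not os.path.lexists(link):  # lexists also catches dangling symlinks
        os.symlink(tool_path, link)
    return link
```

Calling link_into("/usr/bin/texi2dvi", "/usr/local/bin") would mirror the ln -s above. Writing into /usr/local/bin may need elevated permissions, depending on how the machine is set up.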

Installing Valgrind on Mac OS X Yosemite

I decided to learn a lower-level language like C, in part because I want to learn about memory management, leaks, etc.

While following Zed Shaw’s tutorial I got stuck on installing Valgrind, which isn’t yet supported on Yosemite.

To get up and running, I generally followed Taras Kalapun’s tutorial, which installs Valgrind from svn. However, initially I ran into issues because aclocal and autoconf weren’t installed. Then I ran into issues because the Xcode command line tools weren’t installed.

Below is how I got everything to work.

###ACLOCAL AND AUTOCONF

After tinkering with yet another tutorial, first I installed autoconf:

curl -O http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar -xzvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure
make
make install

Second, I installed automake, which provides aclocal:

curl -O http://ftp.gnu.org/gnu/automake/automake-1.14.1.tar.gz
tar -xzvf automake-1.14.1.tar.gz
cd automake-1.14.1
./configure
make
make install

Finally, with both installed, I could now run Taras' tutorial:

svn co svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure
make
make install

Just kidding! When I ran make, I got this error:

Making all in coregrind
make[2]: *** No rule to make target `/usr/include/mach/mach_vm.defs', needed by `m_mach/mach_vmUser.c'.  Stop.
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

After some googling, it seemed the new problem was Xcode.

###INSTALLING THE XCODE COMMAND LINE TOOLS

Thanks to Jinhui Zhang’s tutorial, I realized that even though I had Xcode installed in my applications, I didn’t have its command line tools installed.

So, I ran:

xcode-select --install

Then I got some coffee. Xcode is a beast to install.

###MAKE VALGRIND

Once Xcode was installed, I could finally finish the Valgrind install. I changed back to my valgrind directory, then ran:

make
make install

And with that, finally, Valgrind worked like it was supposed to. I ran Zed’s tutorial and my output matched his.

Phew.
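Each of the failures above came down to one missing prerequisite, so a quick preflight check could surface all of them at once. Here’s a minimal Python sketch. The function name and the default lists are my own assumptions drawn from the errors in this post, not anything Valgrind ships.

```python
import os
import shutil

def missing_prerequisites(
    tools=("svn", "autoconf", "aclocal", "make"),
    files=("/usr/include/mach/mach_vm.defs",),
):
    """List the tools and files from this post that are absent on this machine."""
    # shutil.which returns None when a command is not on the PATH
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [path for path in files if not os.path.exists(path)]
    return missing

if __name__ == "__main__":
    for item in missing_prerequisites():
        print("missing:", item)
```

An empty result doesn’t guarantee the build will succeed, but anything listed is worth fixing before running ./autogen.sh.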

Loading Octave Graphics on Mac OS X 10.8.4

I decided to try my hand at Octave today, to see if it simplifies model prototyping.

One issue was that Octave wasn’t connecting properly to its graphics library. Anytime I ran hist(), for example, I would get:

dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
  Referenced from: /usr/X11R6/lib/libfontconfig.1.dylib
  Reason: Incompatible library version: libfontconfig.1.dylib requires version 13.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
  Referenced from: /usr/X11R6/lib/libfontconfig.1.dylib
  Reason: Incompatible library version: libfontconfig.1.dylib requires version 13.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
/Applications/Gnuplot.app/Contents/Resources/bin/gnuplot: line 71:   865 Trace/BPT trap          GNUTERM="${GNUTERM}" GNUPLOT_HOME="${GNUPLOT_HOME}" PATH="${PATH}" DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" HOME="${HOME}" GNUHELP="${GNUHELP}" DYLD_FRAMEWORK_PATH="${DYLD_FRAMEWORK_PATH}" GNUPLOT_PS_DIR="${GNUPLOT_PS_DIR}" DISPLAY="${DISPLAY}" GNUPLOT_DRIVER_DIR="${GNUPLOT_DRIVER_DIR}" "${ROOT}/bin/gnuplot-4.2.6" "$@"
/Applications/Gnuplot.app/Contents/Resources/bin/gnuplot: line 71:   871 Trace/BPT trap          GNUTERM="${GNUTERM}" GNUPLOT_HOME="${GNUPLOT_HOME}" PATH="${PATH}" DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" HOME="${HOME}" GNUHELP="${GNUHELP}" DYLD_FRAMEWORK_PATH="${DYLD_FRAMEWORK_PATH}" GNUPLOT_PS_DIR="${GNUPLOT_PS_DIR}" DISPLAY="${DISPLAY}" GNUPLOT_DRIVER_DIR="${GNUPLOT_DRIVER_DIR}" "${ROOT}/bin/gnuplot-4.2.6" "$@"
error: you must have gnuplot installed to display graphics; if you have gnuplot installed in a non-standard location, see the 'gnuplot_binary' function

So, after stumbling across this SO question, here’s how I resolved the issue. First, I ran:

vim /Applications/Gnuplot.app/Contents/Resources/bin/gnuplot

Then, in vim, I ran:

:%s/DYLD_LIBRARY_PATH/DYLD_FALLBACK_LIBRARY_PATH/gc
:wq

That worked like a charm.
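As I understand it, the fix works because dyld searches DYLD_LIBRARY_PATH before a library’s recorded install path, while DYLD_FALLBACK_LIBRARY_PATH is only consulted when the library can’t be found where it is expected. Here’s a minimal Python sketch of the same substitution that vim performs. The function name is mine, and the sample line is a shortened version of the wrapper line from the error output.

```python
def use_fallback_path(script_text):
    """Swap DYLD_LIBRARY_PATH for DYLD_FALLBACK_LIBRARY_PATH throughout a script."""
    # The replacement text does not contain the search text, so a second run changes nothing
    return script_text.replace("DYLD_LIBRARY_PATH", "DYLD_FALLBACK_LIBRARY_PATH")

line = 'DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" "${ROOT}/bin/gnuplot-4.2.6" "$@"'
print(use_fallback_path(line))
```

To apply it for real you’d read the wrapper script, pass its contents through the function, and write the result back, ideally after keeping a copy of the original file.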

Managing Ruby Versions

For a project I’m working on, I needed to install the command line interface for AWS Elastic MapReduce.

The catch is that the CLI requires using ruby 1.8.7, and is not compatible with later versions. Bummer, because when I ran ruby -v I learned I was using 1.9.3.

However, I had a vague memory of upgrading to 1.9.3 when I was either playing with Rails or installing Jekyll a couple months back. I also had a vague memory of upgrading using RVM, the Ruby Version Manager.

To check, I ran rvm -v, and sure enough I’d already installed it. This was good news, because it makes managing multiple versions of ruby much easier.

In my case, from here there were two ways to get to running 1.8.7.

###The lazy way

It turns out that RVM doesn’t touch the original system ruby or its gems. So getting back to 1.8.7, which was preinstalled on my Macbook, was simple. All I had to do was run:

rvm use system
ruby -v

Sure enough, I was back to 1.8.7. To get back to 1.9.3, all I had to do was run:

rvm use default

But switching back and forth from the system ruby to versions controlled by RVM didn’t seem right. It felt like a better approach in the long run would be to run a version of 1.8.7 that RVM itself controlled.

###The better way

So, I switched back to my RVM ruby and ran:

rvm install 1.8.7
rvm use 1.8.7

That’s it, surprisingly.

Installing the new version takes a while, obviously, but it’s time well spent. I feel like I’m on much more solid footing now, and not just for making use of the Amazon EMR CLI.

Pushing Jekyll to s3

I have two workstations. The first is a custom built desktop that runs Windows (more on this later, hopefully). The second is an Asus laptop that primarily runs Linux.

I recently moved my sites over to Jekyll. I host them locally in a Dropbox directory, then push them to s3.

On Windows, I use jekyll-s3 to do the push. It’s convenient and works well. Just run gem install jekyll-s3, then run jekyll-s3 whenever you want to push (the first time you do this, it walks you through configuration).

On Ubuntu, I couldn’t get the jekyll-s3 gem to install. So I use s3cmd. Installing was straightforward, just run:

sudo apt-get install s3cmd

But while configuring s3cmd I ran into a stupid roadblock: my Secret Access Key contains some I and l characters, which look exactly the same in my browser. I tried to configure s3cmd in the terminal by guessing which were “l”s and which were “I”s, never correctly, before I finally realized I could copy and paste the key out of the browser and into a text editor that would make the difference obvious. Why it took me that long to realize I don’t know, but it worked like a charm once I did.

For the encryption and other default config details, I followed the basic config setup for s3cmd using this tutorial here.

Once it was configured, I saved the following as create-md.sh in the root folder for all my sites:

#!/bin/bash
cd /path/to/mysite.com
jekyll --server

I also saved the following as push-md.sh in the root folder for all my sites:

#!/bin/bash
cd /path/to/mysite.com
s3cmd sync --delete-removed _site/ s3://mydomain.com
echo 's3cmd has been processed'

Ideally I’d like to combine the scripts (or use a git hook), but I haven’t found a way to programmatically shut down jekyll once it’s regenerated the site. Doing it this way also lets me check the changes on a local server first.

I then set an alias for each .sh file, so that all I have to do is open terminal and type create-md, check the site on a browser, and then push-md to push my site to s3.
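On combining the two scripts: one crude option is to start the server from a wrapper, give it a fixed amount of time, and then stop it before pushing. Here’s a minimal Python sketch of that idea. The function name run_briefly is my own, and the fixed wait is only a stand-in for actually detecting that the site has finished regenerating.

```python
import subprocess
import sys
import time

def run_briefly(command, seconds):
    """Start `command`, let it run for `seconds`, then shut it down."""
    proc = subprocess.Popen(command)
    try:
        time.sleep(seconds)
    finally:
        proc.terminate()  # polite stop request (SIGTERM on Unix)
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if the polite request is ignored
            proc.wait()
    return proc.returncode

if __name__ == "__main__":
    # Harmless demonstration: a child process that would otherwise sleep for 30 seconds
    code = run_briefly([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
    print("child exited with", code)
```

A wrapper could call run_briefly with the command from create-md.sh and a generous wait, then run the command from push-md.sh with subprocess.run. The downside is that it skips the manual check on the local server, which is the main thing the two-script setup buys.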

NB: A good list of all the options for the s3cmd commands is here.

Installing Jekyll on Windows

There are two ways to install jekyll: the easy way and the (relatively) hard way.

####THE EASY WAY

The easy way is to download Railsinstaller. It sets up ruby, rails, bundler and other packages in C:\Railsinstaller. (Note: since I already have git installed elsewhere, I unchecked the install dialog box for git and ssh.)

Once you’ve got Railsinstaller up and running, open powershell and run:

cd C:/Railsinstaller 
gem install jekyll

I tried to run gem install rdiscount as well, but couldn’t get it to work. As a result, in the _config.yml file for any jekyll project I create, I have to add markdown: kramdown. (The other option is markdown: maruku, but that also doesn’t work on Windows.)

####THE (RELATIVELY) HARDER WAY

(Note: since Railsinstaller alters the PATH, if you’ve ever installed ruby before, make sure you amend your PATH variable to not include ruby before you start this.)

First, go to rubyinstaller.org/downloads.

Then, download ruby 1.9.3 as well as the dev kit.

Run the ruby-1.9.3 executable. When prompted, be sure to check the box for adding ruby to your path; without this, ruby commands won’t work in powershell. Also check the other boxes. Finally, specify that the ruby files should be installed to C:/ruby193.

Now extract the devkit files into C:\RubyDevKit.

Open powershell and run:

cd C:\RubyDevKit
ruby dk.rb init
ruby dk.rb install

From within the same devkit directory, now run:

gem install jekyll
gem install rdiscount

Note: as with the easy method, I couldn’t get rdiscount to work here either. So I had to add markdown: kramdown to the _config.yml file of my projects using this method as well.