Comparing three R packages for data load speed


When your data is spread across hundreds of files with identical structure, you need some form of automation to load them for analysis. Using a set of 332 CSV files, I demonstrate how much the load time differs depending on which of three approaches you use.
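The scripts themselves are in the video, not in this post. As a rough sketch of what this kind of automation looks like, assuming all files sit in one directory and share the same columns, a plain base-R loader might be written as below. `load_all_csv()` and the demo directory are hypothetical names, and this is not necessarily one of the three approaches compared in the video.

```r
# Hypothetical helper: load every CSV in a directory into one data frame
# using base R only. Assumes all files have identical columns.
load_all_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  frames <- lapply(files, read.csv)  # parse each file into a data frame
  do.call(rbind, frames)             # stack them row-wise
}

# Demo with three small generated files (made-up data, not the 332 files)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = 1:5),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# Time the load, as one would when comparing approaches
elapsed <- system.time(all_data <- load_all_csv(demo_dir))["elapsed"]
print(dim(all_data))
```

Packages such as data.table (`fread()` plus `rbindlist()`) are commonly used as faster drop-in replacements for the `read.csv()` and `rbind` steps; wrapping each variant in `system.time()` is a simple way to compare them on your own files.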

In a 7-minute video, I go through my R scripts step by step so that R users of all skill levels can follow along, showing how the three approaches differ. In the examples, the load time drops from 45 seconds to no more than 0.08 seconds (80 ms) when a disk cache is used for data that has already been loaded once.
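The idea behind the disk cache is to pay the CSV parsing cost once and afterwards read a single pre-parsed file. A minimal sketch using base R's `saveRDS()`/`readRDS()` follows; `cached_load()` and the `combined.rds` file name are hypothetical, and the video may well use a different caching package.

```r
# Hypothetical disk cache: the first call parses all CSV files and saves the
# combined result as an .rds file; later calls just read that file back.
cached_load <- function(dir, cache_file = file.path(dir, "combined.rds")) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # fast path: data already loaded once
  }
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  combined <- do.call(rbind, lapply(files, read.csv))
  saveRDS(combined, cache_file)  # store the parsed data for next time
  combined
}

# Demo with three small generated files (made-up data)
demo_dir <- file.path(tempdir(), "cache_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = 1:5),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}
unlink(file.path(demo_dir, "combined.rds"))  # start without a cache

first  <- system.time(d1 <- cached_load(demo_dir))["elapsed"]  # parses CSVs
second <- system.time(d2 <- cached_load(demo_dir))["elapsed"]  # reads .rds
```

Note that this simple cache never notices if the CSV files change; deleting the `.rds` file (or comparing file modification times) would be needed to force a reload.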

For me, the end goal was to make the data available for analysis in a dashboard built with Datawatch Panopticon Designer. The Rserve connector in Datawatch Panopticon is an integration that brings all the possibilities of R straight into your dashboards. Datawatch relies on the standard CRAN distribution of R, so the thousands of packages on CRAN and Bioconductor are all at your disposal. As the video shows, your choice of packages matters: when loading data, you'll get very different load times depending on how you use R.

VIDEO: Speed Matters When Loading Data
