Comparing 3 R-packages for data load speed

 

When your data sits in hundreds of files of identical structure, you depend on some kind of automation to load the files for analysis. Using a set of 332 CSV files, I demonstrate how widely the load time varies depending on which of three R packages you use.

In a 7-minute video, I go over my R scripts step by step, so that R users of all skill levels can follow along and see how the three approaches differ. In the examples, I move from a 45-second load time down to no more than 0.08 seconds (80 ms) when using a disk cache for data that has already been loaded once.
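As a rough illustration of the general pattern (not the scripts from the video), here is a minimal sketch in base R only. It reads every CSV file in a folder, binds the pieces once, and saves the result to disk so that later runs can skip the parsing. The function name `load_csv_dir`, the cache file name `combined.rds`, and the small demo files are all hypothetical and chosen for illustration.

```r
# Minimal sketch: load many identically structured CSV files, with a disk cache.
# load_csv_dir and the cache file name are hypothetical, not from the video.
load_csv_dir <- function(data_dir,
                         cache_file = file.path(data_dir, "combined.rds")) {
  # Fast path: reuse the combined data frame saved by an earlier run.
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  # Read every file first, then bind once at the end. Growing a data frame
  # inside a loop copies it on every iteration and is much slower.
  parts <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, parts)
  saveRDS(combined, cache_file)
  combined
}

# Demo with three tiny files in a fresh temporary folder.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

first  <- system.time(d1 <- load_csv_dir(demo_dir))  # parses the CSV files
second <- system.time(d2 <- load_csv_dir(demo_dir))  # reads the cache instead
```

Note that this sketch never invalidates the cache: if the CSV files change, the cache file has to be deleted by hand before the next load.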

For me, the end goal was to make the data available for analysis in a dashboard built with Datawatch Panopticon Designer. The connector for Rserve in Datawatch Panopticon is an integration that brings all the possibilities of R straight into your dashboards. Datawatch relies on the standard CRAN distribution of R, which means that you have all the thousands of R packages on CRAN and Bioconductor at your disposal. As I will show you in this video, your choice of packages to use will matter: when loading data, you’ll get very different load times depending on how you use R.

VIDEO: Speed Matters When Loading Data

Swedish Police keep registry of 4000 people with Romani relations

The Swedish daily newspaper Dagens Nyheter today revealed that police in the Malmö area in southern Sweden have for many years been keeping a registry of citizens with Romani ancestry.

Pictures in the paper show that the registry is built with IBM’s software product Analyst’s Notebook. The registry is not a legacy leftover, since it has been updated very recently; 52 of the more than 4,000 people in the file (Total.anb) are under two years old, living all across the country. The oldest people in the registry are deceased, born in the 1800s.

The registry, in the form of a self-contained file of the Analyst’s Notebook file type .anb, is located on what appears to be a shared network drive with the drive letter X: in a Windows-based local network:

X:\SU\SU POMS\Kringresande\Total.anb

The folder name “Kringresande” can be translated as “Travellers”.

The file path structure suggests that there are other folders besides the “Kringresande” folder, containing people registered for other reasons, and possibly other .anb files in the “Kringresande” folder besides “Total.anb”.

In one highlighted case, the reason for registration was that the 2-year-old’s great-grandfather was Romani. The child lives in a flat with her family in the Stockholm suburb of Skarpnäck.

As the registration of young children indicates, the registry is not limited to convicted criminals; it contains all kinds of people. The only thing they have in common is Romani ancestry, even if only one eighth.

Dagens Nyheter reports that access to the file has been largely unrestricted within the police organization, with around 70 police officers having access to it on an ongoing basis. Further, the file has allegedly been shared between police colleagues via email, which poses a great risk of the file reaching people outside the police organization.

In any case, registering people on the basis of ethnicity is unlawful in Sweden, and the registry breaks several laws specifically regulating the work of the police. It is also assumed to be in violation of Article 8 of the European Convention on Human Rights.

In the words of IBM, Analyst’s Notebook has the following benefits:

With data analysis and visualization capabilities organizations can:

  • Rapidly piece together disparate data into a single cohesive intelligence picture.
  • Identify key people, events, connections, patterns and trends that might otherwise be missed.
  • Increase understanding of the structure, hierarchy, and method of operation of criminal, terrorist and fraudulent networks.
  • Simplify the communication of complex data to enable timely and accurate operational decision making.

Information on the software used, IBM’s Analyst’s Notebook: 

http://www-03.ibm.com/software/products/us/en/analysts-notebook

The original article in Swedish Dagens Nyheter:

Över tusen barn med i olaglig kartläggning – DN.SE.

Wilhelm Agrell: Breivik could have been stopped

Wilhelm Agrell refers to Taleb’s Black Swan theory when he explains the mistake that the Norwegian security service made with Anders Behring Breivik.

“The Norwegian security service could have stopped Anders Behring Breivik. But it was blinded by its search for terrorists with a background in Islamist movements. So says Wilhelm Agrell, professor of intelligence analysis.”

“Wilhelm Agrell warns against rigid thinking and against not keeping an open mind. Even if you have only ever seen white swans in your life, it is dangerous to conclude that no black one exists, he points out.”

Expert: Breivik hade kunnat stoppas – DN.SE.

http://en.wikipedia.org/wiki/Black_swan_theory

http://www.amazon.com/The-Black-Swan-Improbable-Robustness/dp/081297381X