Free books on R

Excellent list of free ebooks on the R language and statistical analysis.

Publicly available datasets

I teach a class on Processing, which is a simplified version of Java designed to let people easily create graphics. My class focuses on data visualization. Below is my list of publicly available data sets, which I encourage my students to use in their visualizations.


IMDB is a website that maintains a list of movies, actors, actresses, and information about them. They offer a set of downloadable information sets. The sets can be a bit challenging to parse, though, so there also exist some Perl parsing scripts.


StackOverflow has a list of publicly available data sets.

Data Privacy Day Education Resources

Here are a ton of privacy education resources for everyone from teens to adults. The materials were put up as part of Data Privacy Day 2011. They include everything from classroom lesson plans to educational videos to simple tips and tricks. Highly recommended for anyone trying to educate others on data privacy.

Guide to online privacy

Nice page on privacy resources.

“I Can Stalk U” using online pictures

A new website entitled I Can Stalk U has arrived fresh on the heels of the website Please Rob Me. Both websites try to raise awareness about oversharing on the internet today. Please Rob Me focused on how people proactively share their location online. I Can Stalk U looks at a slightly scarier form of oversharing: metadata in online photographs.

I Can Stalk U looks through Twitter for posts that include pictures. It then scans the metadata for each picture and tries to find the address where the picture was taken. If successful, it posts the username and the location in the ICanStalkU feed.
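To make the metadata step concrete, here is a minimal Python sketch of one piece of it. Photo metadata typically stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere letter, and these have to be converted to signed decimal degrees before they can be looked up on a map. This is not the site's actual code; the function name and the sample coordinates are hypothetical, and reading the metadata out of an image file is assumed to have happened already.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) and a hemisphere letter
    ("N", "S", "E", or "W") to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by convention.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical values, in the shape they might take in a photo's GPS metadata.
latitude = dms_to_decimal((40, 26, 46), "N")
longitude = dms_to_decimal((79, 58, 56), "W")
print(latitude, longitude)
```

A decimal latitude and longitude pair like this is all a reverse-geocoding service needs to return a street address, which is why an untouched photo can give away a location.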

Using web technologies for research

At the NSF IGERT 2010 Project Meeting this week I will be giving a set of 5-minute talks on how blogs, Twitter, wikis, and Google Docs can be used in research. Below are some of the links and examples I use in the talks, along with short descriptions of how these technologies can be used.


My lab, CUPS, maintains a blog where we post everything from news about the lab to detailed reports from conferences we go to. The blog lets us post information others might be interested in even if it isn’t necessarily a paper-worthy event.

Blogs are also an excellent way to learn about new information related to your area. Since there can be many blogs to track, I use an RSS feed aggregator, such as Google Reader, to subscribe to and keep track of multiple blogs.

Finally, blogs can be an excellent way to collect information about your area in one place where you and others can find it again. I use my personal blog to keep track of news articles related to my research. Also, when I solve a particularly intricate technological problem that was impeding my research, I post the solution to my blog for others to use.
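For readers curious what an aggregator does under the hood, the Python sketch below merges the entries of several RSS 2.0 feeds into one list. It is only an illustration, not how Google Reader works: the two sample feeds, their titles, and the example.org links are made up, and a real aggregator would also download the feeds and sort entries by date.

```python
import xml.etree.ElementTree as ET

# Two tiny, hypothetical RSS 2.0 feeds standing in for real blog feeds.
FEED_A = """<rss version="2.0"><channel><title>Lab Blog</title>
<item><title>Conference report</title><link>http://example.org/a1</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Privacy News</title>
<item><title>New study released</title><link>http://example.org/b1</link></item>
<item><title>Workshop announced</title><link>http://example.org/b2</link></item>
</channel></rss>"""


def list_items(feed_xml):
    """Return (blog title, post title, link) for every item in one feed."""
    root = ET.fromstring(feed_xml)
    blog = root.findtext("channel/title")
    return [(blog, item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def aggregate(feeds):
    """Combine the items of several feeds into a single reading list."""
    combined = []
    for feed_xml in feeds:
        combined.extend(list_items(feed_xml))
    return combined


for blog, title, link in aggregate([FEED_A, FEED_B]):
    print(f"{blog}: {title} ({link})")
```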


Twitter is an excellent way to aggregate and disseminate information quickly. Good examples are: CyLab, Electronic Frontier Foundation, and Wombat Security. You can easily create a Twitter account for a lab or research group and post interesting and exciting news about your lab.

Twitter is also an excellent way to keep track of what others are doing. For example, I have a list of security and privacy Twitter feeds that I follow. Everyone on the list posts interesting things about security and privacy, so I monitor their feeds for important information.

Finally, Twitter is an excellent way to connect with people online during conferences. On Twitter, anything that starts with a # symbol is called a tag (or hashtag), and it is easy to search for tags. For example, searching for #igert on Twitter brings up a list of all the Twitter posts tagged as #igert.
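The idea behind tag search is simple enough to sketch in a few lines of Python. This is an illustration only and does not use Twitter's actual search service; the sample posts and function names are hypothetical, and the pattern treats any run of letters, digits, or underscores after a # as a tag.

```python
import re

# A tag is a # followed by one or more word characters.
TAG_PATTERN = re.compile(r"#(\w+)")


def extract_tags(post):
    """Return the lowercase tags found in one post, without the # symbol."""
    return [tag.lower() for tag in TAG_PATTERN.findall(post)]


def posts_with_tag(posts, tag):
    """Return the posts that carry the given tag (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return [post for post in posts if wanted in extract_tags(post)]


# Hypothetical posts for demonstration.
posts = [
    "Great poster session this morning #IGERT",
    "Reading about privacy policies #privacy",
    "Slides from my talk are online #igert #research",
]
for post in posts_with_tag(posts, "#igert"):
    print(post)
```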


Wikis are a type of website that lets people easily create linked content. Wikis are extremely useful in research for keeping track of information. Basically, using a wiki, you can set up your own Wikipedia that is dedicated to just your research. There are many different types of wikis. Most let you create web pages like what you see on Wikipedia, but each type of wiki is special in its own way. Here are some popular ones:

  • MediaWiki – Originally designed to support Wikipedia, one of the more popular wiki software packages.
  • Trac – Wiki software designed to support people who are all working on the same project or code base. It has an issue tracking system built in, which lets people submit bug reports and mark bugs as fixed. It also integrates with SVN (version control) installations.
  • TikiWiki – Fairly standard wiki software with lots of features and plug-ins.

Not all wikis are public like Wikipedia. My lab manages a wiki that is only visible to members of the lab. We use it to coordinate shared resources, such as laptops, and to archive information, such as study procedures, for later use.

Some good wiki examples:

Google Docs

Google Docs is an online document editing site that lets you create and edit documents, presentations, spreadsheets, forms, and drawings online through Google’s interface. What is really nice about Google Docs is that you can create one document online and let other people see and edit it.

Google Docs is an extremely useful tool for working with collaborators in other parts of the world. You can easily create a shared document and edit it together at the same time. Google Docs also supports chat, so you can talk to the other person while you are both working on the same document.

Google Docs is also very useful for running surveys or setting up registration forms. I’ve created an example form where you can rate this presentation and tell me about how you use these types of technology in your research.

Architecture Is Policy: The Legal and Social Impact of Technical Design Decisions

Over on the CUPS blog I wrote up a summary of the EFF board panel on the legal and social impact of technical design decisions.


Technology design can maximize or decimate our basic rights to free speech, privacy, property ownership, and creative thought.  Board members of the Electronic Frontier Foundation (EFF) discuss some good and bad design decisions through the years and the societal impact of those decisions.

Book: Applied Security Visualization

I just ordered a book entitled “Applied Security Visualization” written by Raffael Marty. The author previously wrote a chapter in “Security Data Visualization: Graphical Techniques for Network Analysis”, another book on how to bring visualization techniques and tools to the aid of the security community. I was somewhat disappointed with the Security Data Visualization book, as I felt that it was just throwing eye candy at what I consider to be a serious problem. Many of the tools put forward by the Security Data Visualization book fail to follow the principles put forward by Edward Tufte on how to create useful and effective data visualizations. I have not yet had a chance to review “Applied Security Visualization”, but based on the author’s other work I am hopeful for a clearer and more useful application of visualizations to the security domain.

Behavioral Advertising

Behavioral advertising is used by groups, such as online advertisers, to track users as they move around the internet. This method allows third parties to infer and learn significant amounts of information about users and their browsing habits. Members of my research lab, CUPS, have studied how users perceive the issues surrounding behavioral advertising.

Researchers in the Computer Science Department at Worcester Polytechnic Institute are interested in educating users about what information their browsers share with the web pages they visit. They set up a web page called What They Know where users can go to see what information they are broadcasting. Visitors can also see the trends from past visitors.

Update: EFF has a site you can visit which shows the identifiable information your browser broadcasts to every site you visit.

Update: What They Know has published a report of their findings.

AES Explained

I just found an excellent stick-figure comic on AES and how it works. The comic is very accessible both for people who just want a simple explanation and for people who want heavy details. It starts at a high level with the history and gets progressively more complex.