RogueWolves

Rogue Wolves is the personal site of .

I'm currently a research scientist with Oculus Info Inc. in Toronto, Ontario, Canada.

My research interests include: adaptive user interfaces, machine learning, Bayesian reasoning and distributed artificial intelligence.

Plex Media Center

I’ve recently switched away from Boxee as the software powering my home media center. I was excited for the Boxee Box version of the software to be ported to PCs. Unfortunately, the new version of the PC software, while more polished in appearance, is actually a downgrade in terms of reliability and features. It’s a disappointment, and Boxee has stated their intention to abandon the PC version going forward, so don’t expect further improvements to this version:

This 1.5 release will be the last version of Boxee for PC/Mac/Ubuntu. It will be available on Boxee.tv through the end of January.

I understand their desire to focus their business on the Boxee Box and I wish them success. However, I needed to find a more reliable solution that fits my setup. My brother recommended I look at Plex as a replacement. First off, this application is gorgeous. It’s well designed and easy to use. Plex integrates with iTunes and iPhoto, allowing you to stream your music, videos and photos. It is also extensible via plugins, which can, for example, add new video sources. It’s a free download, so I thought I’d give it a try.

The software is split into two major components: 1) streaming media server; and 2) media clients. The server manages your media library, downloads meta-data and streams to the client software. There are media clients for Mac, iOS and Android devices. You can pretty much watch your media wherever you want.

I’m impressed with the software so far. It’s technically still a preview release, but it’s been pretty reliable. It’s freely available, so give it a try if you are looking for media center software.

Natural Language Processing with Python

A freely available textbook on using the Python Natural Language Toolkit (nltk). I’ve used nltk for several projects. It’s a fantastic toolkit and definitely worth a look if you have some NLP work. I have not read this text. It appears to be a hands-on introduction rather than a deep dive into NLP theory and nltk.

From the preface:

This book is a practical introduction to NLP. You will learn by example, write real programs, and grasp the value of being able to test an idea through implementation. If you haven’t learnt already, this book will teach you programming. Unlike other programming books, we provide extensive illustrations and exercises from NLP. The approach we have taken is also principled, in that we cover the theoretical underpinnings and don’t shy away from careful linguistic and computational analysis. We have tried to be pragmatic in striking a balance between theory and application, identifying the connections and the tensions. Finally, we recognize that you won’t get through this unless it is also pleasurable, so we have tried to include many applications and examples that are interesting and entertaining, sometimes whimsical.

Rise of the Machine-oriented Web

Stephen Wolfram proposes a Top Level Domain (TLD) to form a data web:

But wouldn’t it be nice if there was some standard way to get access to whatever structured data any organization wants to expose? […] My concept for the .data domain is to use it to create the “data web”-in a sense a parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like wolfram.com (http://www.wolfram.com/), there’d be wolfram.data.

Why have a top-level domain (TLD) rather than, say, a subdomain like data.roguewolves.com or a sitemap-like construct?

Now of course one could just start a convention that organizations should have a “/datamap.xml” file (or somesuch) in the root of their web domains, just like a sitemap-rather than having a whole separate .data site. But I think introducing a new .data top-level domain would give much more prominence to the creation of the data web-and would provide the kind of momentum that’d be needed to get good, widespread, standards for the various kinds of data. […] If a human went to wolfram.data, there’d be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it’d find just what it needs to ingest the data, and begin computing with it

This is an interesting idea. Creating a TLD could help promote organizing the web into a human-oriented web and a data web. This would be a good first step in making data available, but to support machine computation we need standards to describe the data. Semantic web standards are complex, which hampers their adoption. Recent proposals such as RDFa and Microformats have taken a different approach to the data web. Rather than keeping separate human and machine representations of web data, they mark up the data semantically inline with the human-readable content. The same data can be presented to humans and machines for consumption. This is a nice approach, but it is mostly appropriate for document-oriented data that is intended for human consumption. Large databases are highly valuable but rarely exposed in a human-consumable form. There is a need for a machine-oriented data web.
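To make the inline markup idea concrete, here is a minimal sketch of how a program might pull machine-readable values out of a page that a human can also read. The page fragment, the person's name and the `extract_properties` helper are all made up for illustration, and the parser only handles flat, non-nested `property` attributes. It is not a conforming RDFa processor, just an illustration of the principle using Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical page fragment (an assumption for this example): readable text
# with RDFa-style `property` attributes marking up the same facts for machines.
PAGE = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Ada Example</span> works as a
  <span property="jobTitle">research scientist</span> in
  <span property="addressLocality">Toronto</span>.
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # name of the property currently being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a marked-up element.
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


def extract_properties(html):
    """Return a dict mapping property names to their text content."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    for name, value in extract_properties(PAGE).items():
        print(f"{name}: {value}")
```

A browser renders the fragment as an ordinary sentence, while the script sees three named values. That is the appeal of inline markup, and also its limit: it only covers data that already lives inside a human-readable page.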

It’s been interesting for me in the past few years to be involved in the emergence of the modern data community. And from what I have seen, I think we’re now just reaching a critical point, where a wide range of organizations are ready to engage in delivering large-scale structured data in standardized forms.

The rise of the machine-oriented web has tremendous potential for data mining, search, automated inferencing and computation. The advanced reasoning capabilities that can be built on top of the machine-oriented web could transform our society. IBM’s Watson, Apple’s Siri and Wolfram Alpha are examples of the advanced capabilities that become possible when large amounts of structured data are available.

Citeology visualizing paper genealogy

Autodesk Research has an interesting interactive visualization project, Citeology, which visualizes citations in research publications.

Citeology

From the Citeology website:

Citeology looks at the relationship between research publications through their use of citations. The names of each of the 3,502 papers published at the CHI and UIST Human Computer Interaction (HCI) conferences between 1982 and 2010 are listed by year and sorted with the most cited papers in the middle. In total, 11,699 citations were made from one article to another within this collection. These citations are represented by the curved lines in the graphic, linking each paper to those that it referenced.

The application runs as a Java Applet in the browser, where you can select a paper and see the papers referenced (blue arcs) and papers that reference it (red arcs). This would be a very useful tool for navigating through related work.
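The two kinds of arcs correspond to the two directions of a citation graph. As a rough sketch (not Citeology's actual implementation, which I haven't seen), the lookup behind selecting a paper could be modelled like this. The `CitationGraph` class and the paper identifiers are hypothetical placeholders.

```python
from collections import defaultdict


class CitationGraph:
    """Directed graph of citations that can be queried in both directions."""

    def __init__(self):
        self._references = defaultdict(set)  # paper -> papers it cites
        self._cited_by = defaultdict(set)    # paper -> papers that cite it

    def add_citation(self, citing, cited):
        """Record that `citing` references `cited`."""
        self._references[citing].add(cited)
        self._cited_by[cited].add(citing)

    def references(self, paper):
        """Papers that `paper` references (the blue arcs)."""
        return sorted(self._references.get(paper, ()))

    def cited_by(self, paper):
        """Papers that reference `paper` (the red arcs)."""
        return sorted(self._cited_by.get(paper, ()))


# Placeholder identifiers, not real publications.
graph = CitationGraph()
graph.add_citation("paper_b", "paper_a")
graph.add_citation("paper_c", "paper_a")
graph.add_citation("paper_c", "paper_b")

if __name__ == "__main__":
    print("paper_b references:", graph.references("paper_b"))
    print("paper_b is cited by:", graph.cited_by("paper_b"))
```

Storing both directions costs a little extra memory but makes either query a single dictionary lookup, which suits an interactive view where any paper can be selected.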

(via flowingdata)

Perspective

This infographic puts our place in the known Universe into perspective. It's a thought to contemplate while we celebrate the beginning of 2012.

The Observable Universe

(via Techvert)

Also, check out this great video of the known Universe.

I’m always humbled when I try to wrap my head around the sheer scale of the Universe.