In a previous post I suggested that historians should use quantitative methods less to answer existing questions than to pose new ones. Such a digital humanities (DH) approach would be the reverse of the older social science history approach, in which social science tools were used to “answer” longstanding questions definitively.
What’s most striking about Leon Wieseltier’s essay in the New York Times Book Review is how it confirms almost every cliché about the humanities as technophobic, insular, and reactionary. Not to mention some stereotypes about grouchy old men. Now I should confess at the outset to being a longtime Wieseltier cynic. His misreadings of popular culture always seemed mildly ridiculous. But what’s striking about the NYT piece is his vast ignorance of the subject.
In fall 2014 I taught a freshman seminar on data visualization entitled “Charts, Maps, and Graphs.” Over the course of the semester I worked with the students to create vizs that passed Tukey’s “intra-ocular trauma” test: the results should hit you between the eyes. Over the coming months I’ll be blogging based on their final projects. Today’s post is based on the work of Jeffrey You, who used US professional sports data, comparing baseball and football.
In TV and movies men talk more than women, and women talk mostly about men. Hence the Bechdel test. But I thought I’d do a dataviz for this phenomenon using Ben Schmidt’s implementation of Bookworm. His data scraper uses the Open Subtitles database of closed-captioned subtitles for hundreds of TV shows. While it can’t measure who’s talking, it can measure who’s being talked about. Not surprisingly, the pronoun “he” is substantially more common than “she” for all TV shows.
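The kind of count behind that comparison can be sketched in a few lines of Python. This is only an illustration of tallying “he” and “she” in subtitle text; it is not Bookworm’s interface or Ben Schmidt’s scraper, and the function name `pronoun_counts` and the sample lines are made up for the example.

```python
import re
from collections import Counter

def pronoun_counts(lines, pronouns=("he", "she")):
    """Count whole-word occurrences of each pronoun, ignoring case."""
    counts = Counter()
    for line in lines:
        # Split into lowercase word tokens so "the" or "heat" never match "he".
        for token in re.findall(r"[a-z']+", line.lower()):
            if token in pronouns:
                counts[token] += 1
    return counts

# Hypothetical subtitle lines, invented for illustration only.
sample = [
    "He said he would call.",
    "She left before the show started.",
    "Did he tell them where she went?",
    "The heat was unbearable.",
]

counts = pronoun_counts(sample)
print(counts["he"], counts["she"])
```

Contractions such as “he’s” are treated as separate tokens here and are not counted; a real analysis would need to decide how to handle them.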
My nasty “cold” has been diagnosed as Influenza A, so it’s bed rest for 48 hours. And, of course, blogging about why Ebola gets all the news but not good ol’ killers like influenza. I got CDC figures for deaths and then ran Google searches for the related terms, totaling the number of hits. I was surprised at first. The number of hits seemed to roughly correspond to the death rate. Ebola was way off, massively overreported, but the general trend seemed right. However . . . .
The Guardian recently posted a dataviz comparing Ebola to other infectious diseases. It’s from a forthcoming book entitled Knowledge is Beautiful and it is indeed beautiful. Unfortunately, it’s a really bad viz. Below is my alternative viz (using the Guardian’s data), along with a critique. The basic issue is evolution. Because viruses reproduce quickly, they’re a great example of Darwin at work. Basically a win for a virus is to reproduce a lot. A lot, a lot, a lot.
Playing around with the new ngramr package for R, I came up with a simple viz for both the sushi boom and the rise of US foodie culture. Sometimes a picture is worth a thousand words, but at least sushi is low in calories.
Just discovered a great blog post on “data illustration” versus “data visualization” at Information for Humans. AIS argues that data illustration is “for advancing theories” and “for journalism or story-telling.” By contrast data visualization “generate[s] discovery and greater perspective.” I love this distinction, although I’m not sure I like the specific language. Tukey famously argued that data visualization was for developing new theories.
The rise of digital humanities suggests the need to rethink some basic questions in quantitative history. Why, for example, should historians use regression analysis? The conventional answer is simple: regression analysis is a social science tool, and historians should use it to do social science history. But that is a limited and constraining answer.
Aaron at Plan Space from Outer Nine has a valuable insight about how standard statistics textbooks often favor technique over understanding. I think we could extend this approach from “central tendency” to the broader question of “association.” We tend to view various measures of association (for example, Chi-square χ², Spearman’s rho ρ, Pearson r, R², etc.) as completely different measurements.
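A small sketch in Python shows how closely some of these measures are related. It relies on three standard textbook identities rather than anything specific to the post: Spearman’s ρ is Pearson’s r computed on ranks, R² in a one-predictor regression equals r², and for two 0/1 variables χ²/n equals r². The data and function names below are invented for illustration.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def chi_square_2x2(xs, ys):
    """Pearson chi-square (no continuity correction) for two 0/1 variables."""
    n = len(xs)
    a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 0)
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 7, 9]
bx = [1, 1, 1, 0, 0, 0, 1, 0]
by = [1, 1, 0, 0, 0, 1, 1, 0]

print(spearman_rho(xs, ys), pearson_r(ranks(xs), ranks(ys)))
print(r_squared(xs, ys), pearson_r(xs, ys) ** 2)
print(chi_square_2x2(bx, by) / len(bx), pearson_r(bx, by) ** 2)
```

Each printed pair should agree up to floating-point rounding, which is the point: these are variations on one idea of association, applied to different kinds of data.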
Why graph? And why, in particular, use innovative and unfamiliar graphing techniques? I started this blog without addressing these questions, but a recent blog post by Adam Crymble, critical of “shock and awe” graphs, made me realize the need to explain EDA (Exploratory Data Analysis) and data visualization.
After reviewing a book on religion in 19th century Japan, I became curious about the quantitative dimension of religious practice, particularly the persecution of Buddhism. My initial visualizations turned into an exploration of how to visualize spatial variation. The 1871 census data reported two types of religious practitioners (monks and priests) totaled by either domain or prefecture. The data show a striking regional trend.
What is clioviz? A blog devoted to data visualization in history and the humanities. What’s data visualization? An interdisciplinary approach to graphics that seeks to make trends and patterns in quantitative data visually apparent. In a well-designed data viz, patterns jump out at the viewer/reader, and results are obvious without the use of descriptive statistics.