Blog post by Brittany Fiore-Silfvast

The Data Science Environment (DSE) Summit took place in beautiful Monterey, CA at the Asilomar Conference Center. The Summit brought together over a hundred participants across three universities (UW, UC Berkeley and NYU) involved in the Moore and Sloan Foundations’ Data Science Environment grant.

As a data science ethnographer, I typically take on the role of participant-observer at data science events, but at the DSE Summit I ended up being more of a participant than an observer. The high degree of participation at times made it hard to listen as closely as I would have liked for underlying rhythms and patterns across the group. However, by participating in the discussion sessions and interactions, I identified some important undercurrents, which I draw out into the two main themes discussed in this post.


Photo credit: Kevin Koy


From UW CSE News:


SeaFlow, a research instrument developed in the lab of UW School of Oceanography director Ginger Armbrust, analyzes 15,000 marine microorganisms per second, generating up to 15 gigabytes of data every single day of a typical multi-week-long oceanographic research cruise.

UW professor of astronomy Andy Connolly is preparing for the unveiling of the Large Synoptic Survey Telescope (LSST), which will map the entire night sky every three days and produce about 100 petabytes of raw data about our universe over the course of 10 years. (One petabyte of music in MP3 format would take 2,000 years to play.)
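The MP3 comparison above is easy to sanity-check. The sketch below is a back-of-the-envelope calculation, not anything from the article: it assumes decimal units (1 PB = 10^15 bytes) and a constant 128 kbps MP3 bitrate, neither of which the article specifies. It also spreads LSST's projected 100 petabytes evenly over ten years to get a rough daily average; actual nightly output will vary.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Assumptions (not stated in the article): decimal units (1 PB = 10**15 bytes)
# and a constant MP3 bitrate of 128 kilobits per second.

PETABYTE = 10**15                      # bytes, decimal (SI) convention
MP3_BITRATE_BPS = 128_000              # bits per second (assumed typical bitrate)
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def playback_years(num_bytes: float, bitrate_bps: float = MP3_BITRATE_BPS) -> float:
    """Years needed to play num_bytes of audio encoded at bitrate_bps."""
    seconds = num_bytes * 8 / bitrate_bps  # bytes -> bits -> seconds of audio
    return seconds / SECONDS_PER_YEAR


def average_daily_volume(total_bytes: float, years: float) -> float:
    """Average bytes per day if total_bytes accumulates evenly over `years`."""
    return total_bytes / (years * 365.25)


if __name__ == "__main__":
    print(f"1 PB of 128 kbps MP3: {playback_years(PETABYTE):,.0f} years")
    lsst_daily = average_daily_volume(100 * PETABYTE, 10)
    print(f"LSST average: {lsst_daily / 10**12:.1f} TB per day")
```

Under these assumptions the script prints about 1,981 years for one petabyte of music, in line with the article's round figure of 2,000 years, and an LSST average of roughly 27.4 TB per day. A lower bitrate would stretch the playback time further; a higher one would shorten it.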

What scientists like Armbrust and Connolly have is popularly known as "big data," and as rich and exciting as it can be, big data can also be a big problem.

"Every field of discovery is transitioning from data-poor to data-rich, and the people doing the research don’t have the wherewithal to cope with this data deluge," says Ed Lazowska, director of the UW’s eScience Institute.

And now, the eScience team (the core team includes faculty from 12 departments representing five schools and colleges) is poised to scale way up. Last year, the UW won a five-year, $37.8 million grant from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation that will be shared with New York University and the University of California, Berkeley, to foster a data science culture at the three universities.

“We don’t want this to be a magic trick that only computer scientists know how to do,” [eScience Institute Associate Director Bill] Howe says. “It should be something that everybody can do.”

Read more in HTML or PDF.



The first Astro Hack Week took place September 15-19, 2014, at the University of Washington, with about 45 attendees over the course of the week. We spent the mornings together learning new coding, statistics, and data analysis skills, and the afternoons working in pairs and groups on a wide variety of projects. These projects ranged from short exercises to the development of teaching materials to full-blown research projects that will likely lead to publications!

Along with these hacks, the afternoons were punctuated by informal breakout sessions on everything from using Git to constructing probabilistic graphical models. Thanks to all the participants who stepped up to lead these breakouts and share their expertise with others!

(Photo by Adrian Price-Whelan)


This Fall, the UW eScience Institute will be running the second offering of the Data Science Incubation Program.

We invite short proposals (1-2 pages) for one-quarter exploratory data-intensive science projects requiring collaboration in scalable data management, scalable machine learning, open source software development, cloud and cluster computing, and/or visualization.

Important dates (see website for more details):

  • Sep 8: 1-hour information session, 11:00 am - 12:00 pm, Paul Allen Center, 403.
  • Sep 18: 1-page proposals due
  • Sep 24: Notification
  • Sep 29: Kickoff meeting

Each project will involve one or more project leads who will join us in the Data Science Studio on Tuesdays and Thursdays during the Fall quarter.

Each project lead will "own" their project (and its results) and be responsible for its successful completion, with the eScience team providing guidance on methods, technologies, and best practices for extracting knowledge from large, noisy, and/or heterogeneous datasets, as well as on general software engineering.

In reviewing the proposals, we will be looking for high-risk, high-reward science that this program can help push in a new direction. In addition, we hope to select a set of projects with shared requirements; we find that participants are most successful when they interact with each other as well as with our group.

More information, including instructions on how to submit project proposals, is available on the incubator website:



