
UW CSE News reports that UW has waived indirect cost on cloud services, removing a disincentive to the rational selection of research computing and storage options.

eScience Director Ed Lazowska writes:

This decision removes one of several bizarre disincentives to the rational selection of research computing and storage options – disincentives that plague universities nationwide.

Federal guidelines waive indirect cost on purchased equipment – so purchasing a $100K cluster costs a grant budget $100K, despite the fact that this equipment must be housed, powered, cooled, backed up, replaced …

Meanwhile, indirect cost is charged on outsourced cloud services – so purchasing $100K of AWS or Azure services costs $157K (at UW’s rates – different institutions have different markups), despite the fact that the only actual overhead is paying an invoice.

UW IT and the UW Office of Research have now decided to unilaterally waive this nonsensical charge.

Progress! Hopefully others will follow!

Read more here.

Three footnotes:

  1. There is precedent for national action: several years ago it was ruled, nationally, that indirect cost should not be charged on outsourced gene sequencing services.
     
  2. There are additional bizarre disincentives to the rational selection of research computing and storage options. If you want to purchase a large cluster, your NSF program officer will send you to the Major Research Instrumentation program, which is not charged against any specific Program, Division or Directorate – so it’s “free” to his/her program … what could be finer? And once the cluster arrives at your university, Santa Claus pays for the power, Mrs. Santa Claus pays for the cooling, Rudolph shares his space, and the Elves do the backup … all of these, which have very real costs, appear free to the investigator at most universities.
     
  3. Finally, it goes without saying that cloud services are not the right choice for every application. What UW’s decision does is simply take one step toward leveling the playing field, so that researchers can make rational choices.
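To make the $100K-versus-$157K comparison concrete, here is a small Python sketch of the arithmetic. The 57% markup is an assumption inferred from the post's own figures (rates differ by institution), and `grant_cost` is a hypothetical helper for illustration, not part of any UW budgeting tool.

```python
# Hypothetical sketch of the indirect-cost arithmetic described above.
# ASSUMPTION: a 57% markup, inferred from "$100K ... costs $157K";
# actual rates differ by institution.


def grant_cost(direct_cost: int, rate_percent: int, charged_indirect: bool) -> int:
    """Return what a purchase costs a grant budget, in whole dollars."""
    if charged_indirect:
        # Indirect cost is added on top of the direct cost
        # (integer arithmetic, rounded down to whole dollars).
        return direct_cost * (100 + rate_percent) // 100
    # Exempt purchases cost the grant only their sticker price.
    return direct_cost


RATE_PERCENT = 57

cluster = grant_cost(100_000, RATE_PERCENT, charged_indirect=False)      # equipment: exempt
cloud_before = grant_cost(100_000, RATE_PERCENT, charged_indirect=True)  # cloud, before waiver
cloud_after = grant_cost(100_000, RATE_PERCENT, charged_indirect=False)  # cloud, after waiver

print(f"$100K cluster:              ${cluster:,}")
print(f"$100K cloud, before waiver: ${cloud_before:,}")
print(f"$100K cloud, after waiver:  ${cloud_after:,}")
```

The sketch models only what the grant is charged. As the footnotes point out, the cluster's power, cooling, space, and backup have very real costs, but none of them show up in this calculation.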

 

Last week the four project teams from our Data Science for Social Good summer program gathered in the WRF Data Science Studio to present project updates to eScience's Executive Committee and other invited guests. It was a midterm, of sorts, to see how everything and everyone was coming along.

Modeled on previous eScience incubator programs, as well as on similar programs at the University of Chicago and Georgia Tech, the ten-week summer program aims to identify organizations devoted to social good and to use data science to increase each organization's reach and impact. It's also an opportunity to train data scientists.

Each project team consists of four undergraduate and graduate students, a project lead from the partner organization who serves as a domain expert, and an eScience data scientist or research scientist. This year's program is also providing internship opportunities for high school students from the Alliances for Learning and Vision for Underrepresented Americans (ALVA).

Along with focusing on their respective projects, students are participating in tutorials led by eScience staff, as well as talks and tutorials presented by outside speakers, including Socrata, Dato, and the City of Seattle. Students are also taking part in a weekly journal club with readings focused on data science. And we're pleased to announce that each project has launched its own blog, where students share their thoughts about the challenges and victories they're encountering along the way.

Because each project's midterm presentation time was limited, the slides came fast and furious. The takeaway for everyone in attendance was that, even though each team has faced hurdles around data quality and how best to write algorithms to clean, organize, and use the data, all four projects have made significant progress over the first half of the program, and each team deserved (and received) loud congratulations on what it has accomplished to date. Bravo, everyone!

You can learn more about the projects and organizations involved with this year's DSSG program by clicking here.

And you can follow each project's blog at the links below.

Predictors of Permanent Housing for Homeless Families in King, Snohomish, & Pierce County:
http://uwescience.github.io/DSSG2015-predicting-permanent-housing/

Assessing Community Well-Being Through Open Data and Social Media:
http://uwescience.github.io/DSSG2015-wellbeing/

Open Sidewalk Graph for Accessible Trip Planning:
http://uwescience.github.io/DSSG_sidewalk/

King County Metro Paratransit:
http://dssg-paratransit.github.io/main_repo/

 

Overview: The NSF-sponsored Graduate Data Science Workshop will bring together 100 graduate students from diverse domain sciences and engineering with data scientists from industry and academia to discuss and collaborate on Big Data / Data Science challenges.

Participation: To participate in the workshop, submit a white paper in PDF format that describes either a Big Data / Data Science challenge faced by your scientific or engineering discipline or an idea for a new tool or method addressing a Big Data / Data Science problem. White papers will be reviewed using NSF scoring criteria, and attendees will be selected based on the strength of their white papers. If you are selected to attend, you must bring a poster to present at one of the two poster sessions. The authors of the highest-scoring white papers will be invited to give lightning talks of a few slides during the plenary session to describe their challenges or methods. The white paper submission deadline is June 20th, 2015. Invitees will be notified on July 1st, 2015.

Program: In addition to keynote presentations from high-profile speakers, the participants will present posters covering their own research and work collaboratively to begin to solve some of the Grand Challenge problems facing Data Enabled Science & Engineering disciplines.

Community building: After the workshop, the output from the collaborative teams will be published in an open access environment. Through the shared work at the workshop and beyond, the participants will form lasting, collaborative relationships with their peers, with senior academic partners, and with industry participants, including those from companies such as Amazon, Google and Microsoft.

If you are invited to participate, travel support of up to $1,000 will be available, which can be used to cover registration and lodging fees in addition to airfare. Most meals are included. Workshop registration is $200. Lodging for two nights is $123 (room with two beds) or $195 (single room).
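For invitees planning a budget, the Python sketch below works out how much of the travel support is left for airfare. It assumes each quoted lodging price is the attendee's total for both nights and that all remaining support goes to airfare; `airfare_budget` is a hypothetical helper for illustration only.

```python
# Hypothetical sketch: how much of the $1,000 travel support remains for
# airfare after registration and lodging.
# ASSUMPTION: each lodging price is the attendee's total for the two nights.

TRAVEL_SUPPORT_CAP = 1_000  # maximum travel support, in USD
REGISTRATION_FEE = 200      # workshop registration, in USD
LODGING_TWO_NIGHTS = {      # lodging options, in USD for two nights
    "two beds": 123,
    "single": 195,
}


def airfare_budget(room: str) -> int:
    """Return the support left for airfare with the given room type."""
    return TRAVEL_SUPPORT_CAP - REGISTRATION_FEE - LODGING_TWO_NIGHTS[room]


for room in LODGING_TWO_NIGHTS:
    print(f"{room}: ${airfare_budget(room)} left for airfare")
```

Under these assumptions, $677 (two beds) or $605 (single) remains for airfare.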

Data Science Venn Diagram

O'Reilly Publishing has released the preface to the Python Data Science Handbook (Early Release) by Jake VanderPlas, eScience's Senior Data Scientist and Director of Research, Physical Sciences.

"What is data science?" VanderPlas writes. "It's a surprisingly hard definition to nail down, especially given how ubiquitous the term has become. Despite its hype-laden veneer, [data science] is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia."

And why Python? "[It] has emerged over the last couple decades as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets."

VanderPlas' book is geared toward technically minded students, researchers, and developers with a strong background in writing code and using computational and numerical tools. It is organized around a "mental model" of data science as the overlap of computational, statistical, and domain expertise, known as the Data Science Venn Diagram. The first four sections of the Python Data Science Handbook focus on the computational component: the Python language and the extensive ecosystem of data-focused tools available within it. The rest of the book discusses the fundamental concepts of statistics and mathematics and their use in analyzing datasets. "The goal," says VanderPlas, "is that by the end readers will be poised to use these Python tools [to] process, describe, model, and draw inferences from the various data they encounter."

VanderPlas encourages readers not to think of data science as a new domain or expertise to learn, but "a new set of skills that you can apply within your current area of expertise. Whether you are reporting election results, forecasting stock returns, optimizing online ad clicks, identifying microorganisms in microscope photos, seeking new classes of astronomical objects, or working with data in any other field, my goal is that the content of this book would give you the ability to ask and answer new questions about your chosen subject area."

You can read the preface to Python Data Science Handbook (Early Release) here:
https://beta.oreilly.com/learning/introduction-to-pandas

 

The First International Workshop on Smart Cities and Urban Analytics (UrbanGIS 2015), in conjunction with ACM SIGSPATIAL 2015, has announced a call for papers ahead of its November 3, 2015, workshop in Seattle, WA.

http://engineering.nyu.edu/urbangis2015/ 

About half of humanity lives in urban environments today, and that share will grow to 80% by the middle of this century; North America is already 80% urban and will rise to 90% by 2050. Cities are thus the loci of resource consumption, economic activity, and innovation; they are the cause of our looming sustainability problems but also where those problems must be solved. Smart cities are leveraging advanced analytics solutions, usually with spatio-temporal data, to support urban management and more informed decision making. Big urban data, if properly acquired, integrated, and analyzed, can take us beyond today's imperfect and often anecdotal understanding of cities to enable better operations, informed planning, and improved policy.

Despite many efforts to tackle the challenges of smart cities through big data and spatio(-temporal) analysis, there is no standard spatio(-temporal) data infrastructure able to support the wide range of requirements in different problem areas. This workshop will provide a forum for researchers from various domains to present their results and to work together toward developing such an infrastructure. This includes, but is not limited to, the techniques, policies, and standards required to acquire, process, and use spatio(-temporal) data, particularly in the urban context.

UrbanGIS 2015 is soliciting papers (including significant work-in-progress) that describe academic research efforts as well as applications and prototypes that leverage spatial or spatio-temporal data analysis to address urban challenges. Areas of research include, but are not limited to:

  • Application and experimental experiences in smart cities
  • Data indexing techniques for massive spatio-temporal datasets
  • Human mobility modeling and analytics
  • Large-scale visualization of urban data
  • Machine learning for predictive models
  • Parallel and distributed computing of big urban data
  • Safety, security, and privacy for smart cities
  • Smart buildings, grids, transportation, and utilities
  • Social computing, sensing and IoT for smart cities
  • Streaming/real-time processing of spatio-temporal data
  • Urban informatics

Submissions should be at most 8 pages for full papers and at most 4 pages for short papers or work-in-progress, formatted according to ACM formatting guidelines. Papers will be evaluated by the program committee for the significance and relevance of their research contributions, as well as their presentation. Short papers are expected to describe work in progress or work of smaller scale, but they will be evaluated using the same criteria as full papers.

Important Dates:
Paper Submission: September 1, 2015 (midnight PT)
Notification of Acceptance: September 19, 2015
Workshop date: November 3, 2015
Paper Submission Site: https://easychair.org/conferences/?conf=urbangis2015

Organizers:
Huy T. Vo, New York University
Juliana Freire, New York University
Claudio T. Silva, New York University

Program Committee:
Charlie Catlett, Argonne National Lab & University of Chicago
Alex Chohlas-Wood, New York Police Department
Theo Damoulas, University of Warwick
Bill Howe, University of Washington
James T. Klosowski, AT&T Labs - Research
Ming Li, University of Nevada - Reno
David Maier, Portland State University
Carlos Scheidegger, University of Arizona
Manuela Veloso, Carnegie Mellon University
Lucien Wilson, KPF & Columbia University
Jianting Zhang, City University of New York