Workshop on Open Data in the IFIP eGOV Conference

10th International IFIP eGOV Conference 
The ENGAGE Workshop on Open Data 
Monday 29th August, 09:00 - 13:00
Delft, The Netherlands

Open Governmental Data - From Governments to Science and Society: The ENGAGE project 

More than 30 high-level eGovernance, ICT and policy experts participated in the ENGAGE Open Data Workshop in Delft, the Netherlands. Open data has been recognized as a strategic tool for governments all over the world in their efforts to increase citizen trust, engagement and collaborative action. Despite its significance and the political support at EU level, many challenges remain open for member states in their effort to provide on-line services for the discovery and use of public sector datasets, especially for scientists in non-ICT domains.

The workshop focused on discussing the scientific base of ICT-enabled governance for open data, thus harnessing open data sources and methodologies for annotating, visualising and making open data available to scientists and citizens. Participants presented innovative approaches and provided their views on open data, including value-adding examples and best practices.

The ENGAGE project
The main goal of the ENGAGE project (ENGAGE, 2011) is the deployment and use of an advanced service infrastructure, incorporating distributed and diverse public sector information resources as well as data curation, semantic annotation and visualisation tools, capable of supporting scientific collaboration and governance-related research from multi-disciplinary scientific communities, while also empowering the deployment of open governmental data towards citizens.

The main topics of the workshop were structured around the state of the art, the visionary scenarios, the research gaps and the future research challenges in the area of ICT for governance and public sector’s open data. The following presentations inspired constructive dialogue:
  • Yannis Charalabidis, University of Aegean & Marijn Janssen, Delft University of Technology: “Open Data Sources, Resources, Annotation Mechanisms and On-line Services Outlook from the ENGAGE project” 
  • Timo Wandhoefer & Mark Thamm, GESIS - Leibniz Institute for the Social Sciences: “Public Politician Profiles on Facebook and the gap of Authenticity: WeGov interview results with the German Bundestag” 
  • Sicco Verwer, Susan van den Braak & Sunil Choenni, Research and Documentation Centre (WODC), Dutch Ministry of Security and Justice: “Sharing data using multiple imputation” 
  • Nils Barnickel, Matthias Flügge, Jens Klessmann, Fraunhofer FOKUS: “Provision of public sector information at the local level. Practical experiences from the State of Berlin” 
The second half of the workshop was dedicated to gathering ideas from the audience on open data sources and curation. The following usage scenarios were discussed with the workshop audience:

ENGAGE usage scenarios (with the actor, where one is listed):
  • Storing or linking (making accessible), annotating and visualising a public sector data set. Actor: Public Servant
  • Getting the ENGAGE metadata specifications (for applying them in my systems). Actor: Public Servant
  • Getting useful information (through browsing datasets or visualisations). Actor: Citizen, Public Servant
  • Getting data for my research work
  • Linking my system with ENGAGE, for uploading data. Actor: Public Servant
  • Linking my system with ENGAGE, for downloading data
  • Storing data in draft form (to be further curated)
  • Adding research-oriented data and annotating it
  • Posting my needs for public sector data. Actor: Citizen / Researcher

Finally, more than 15 key challenges and research objectives for Open Data and Linked Data were collected and prioritised.

A view from the audience at the ENGAGE Open Data Workshop 

Workshop Chairs
Yannis Charalabidis, University of Aegean, Greece, yannisx@aegean.gr
Sunil Choenni, Ministry of Justice, the Netherlands, r.choenni@minjus.nl
Marijn Janssen, Delft University of Technology, The Netherlands, m.f.w.h.a.janssen@tudelft.nl

What if Google search was running on humans?

When discussing process automation with public sector officials and practitioners, I always stress the importance of solving semantic interoperability issues: this will allow services to be executed at machine time (milliseconds) rather than at human time (hours or even days). This way, your new building permit might be issued (or rejected) within a couple of seconds.

But sometimes the message does not get through: people tend to take everything achieved for granted, not realising the difference that technology has made in some cases. So, I had to devise this simple benchmark:

What if Google search was running on humans?
Let's tackle this small problem in five steps:

1. How many web sites exist?
According to the Netcraft January 2011 Web Server Survey, there are globally almost 300 million hostnames and almost 100 million active web sites.

2. How many web pages exist?
According to a comparison between Yahoo-indexed web pages and Netcraft reports in 2005, there were globally around 270 pages per web site (active or not). Applied to the 300 million hostnames above, that ratio would give a total of 270 × 300 million = 81 billion web pages. According to a report by Google in late 2008, almost 1 trillion web pages were indexed by Google at that time, including a large share of duplicates or automatically generated pages that could amount to as much as 90%, leaving less than 100 billion web pages. According to http://www.worldwidewebsize.com/, an estimation algorithm puts the number of web pages indexed by Google, Bing and Yahoo at approximately 50 billion. As these three estimates are of the same order of magnitude, we will adopt the smallest number: approximately 50 billion web pages exist.
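The three estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, the input figures are the ones quoted in the text, and the 90% duplicate share is treated as an exact value although the text gives it as an upper bound.

```python
# Figures quoted in the text; variable names are illustrative only.
hostnames = 300_000_000              # Netcraft, January 2011 ("almost 300 million")
pages_per_site = 270                 # Yahoo/Netcraft comparison, 2005
estimate_netcraft = hostnames * pages_per_site

google_indexed = 1_000_000_000_000   # Google, late 2008 ("almost 1 trillion")
duplicate_percent = 90               # assumed exact here; the text says "up to"
estimate_google = google_indexed * (100 - duplicate_percent) // 100

estimate_wwwsize = 50_000_000_000    # worldwidewebsize.com

estimates = [estimate_netcraft, estimate_google, estimate_wwwsize]
pages = min(estimates)               # adopt the most conservative figure
spread = max(estimates) / min(estimates)

print(f"Netcraft-based estimate: {estimate_netcraft:,}")
print(f"Google-based estimate:   {estimate_google:,}")
print(f"Adopted page count:      {pages:,}")
print(f"Largest / smallest:      {spread}")
```

The largest and smallest estimates differ by a factor of two, which is why they can be treated as being of the same order of magnitude.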

3. How many person hours would be needed for one (manual) search?
Suppose we had an infrastructure able to distribute web pages to humans, who would each search for a specific word, and that it takes 10 seconds to judge whether the word is contained in a page. Not much, you might say (try to locate a specific word in 10 pages and you will see the issue). However, we need 50 billion pages × 10 seconds = 500 billion person-seconds to complete one search over all 50 billion pages. So, if you need the answer within 10 seconds (the best this system can do), you need 50 billion humans working in parallel, one page each. And this does not even include ranking ...
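To see how the workforce for a single search depends on the response time you are willing to accept, here is a minimal sketch. The function names are hypothetical, and the model assumes the work splits perfectly across people, with no coordination overhead.

```python
def person_seconds_per_search(pages: int, seconds_per_page: int) -> int:
    """Total human effort for one exhaustive search, in person-seconds."""
    return pages * seconds_per_page


def humans_needed(pages: int, seconds_per_page: int, deadline_seconds: int) -> int:
    """People needed to finish one search within the deadline.

    Assumes perfectly parallel work; a deadline shorter than the time
    needed to read a single page cannot be met at all.
    """
    if deadline_seconds < seconds_per_page:
        raise ValueError("deadline is shorter than the time to check one page")
    total = person_seconds_per_search(pages, seconds_per_page)
    return -(-total // deadline_seconds)  # ceiling division on integers


PAGES = 50_000_000_000      # adopted page count from step 2
SECONDS_PER_PAGE = 10       # estimate from the text

total_seconds = person_seconds_per_search(PAGES, SECONDS_PER_PAGE)
person_hours = total_seconds / 3600

print(f"Person-seconds per search: {total_seconds:,}")
print(f"Person-hours per search:   {person_hours:,.0f}")
print(f"Humans for a 10 s answer:  {humans_needed(PAGES, SECONDS_PER_PAGE, 10):,}")
print(f"Humans for a 1 h answer:   {humans_needed(PAGES, SECONDS_PER_PAGE, 3600):,}")
```

Even if you accept waiting a full hour for each answer, a single search still needs well over a hundred million people.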

4. So?
According to a 2010 report by Search Engine Land, Google handles 34,000 searches per second (or approximately 3 billion searches per day). So, every 10 seconds, we need to handle about 340,000 searches, bringing the total number of humans needed to 50 billion × 340,000 = 17,000,000,000,000,000. As this number is quite big, we divide it by the population of the earth (6.9 billion people) and reach a conclusion:

If Google Search ran on human power, we would need almost 2.5 million times the global population (or 17 quadrillion people) to reach an average response time of 10 seconds.
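For readers who want to check the numbers or try other assumptions, the whole estimate fits in a short Python sketch. The constant and function names are hypothetical, and the inputs are the rough figures quoted in the steps above, so the result is an order-of-magnitude illustration rather than a measurement.

```python
PAGES = 50_000_000_000             # step 2: adopted number of web pages
SECONDS_PER_PAGE = 10              # step 3: time for one human to check one page
SEARCHES_PER_SECOND = 34_000       # step 4: Search Engine Land, 2010
WORLD_POPULATION = 6_900_000_000   # figure used in the text


def humans_for_load(pages: int, seconds_per_page: int,
                    searches_per_second: int, response_seconds: int) -> int:
    """People needed to answer every incoming search within response_seconds.

    Assumes perfectly parallel work and that each person checks one page
    for one search at a time.
    """
    searches_in_flight = searches_per_second * response_seconds
    person_seconds = pages * seconds_per_page * searches_in_flight
    return -(-person_seconds // response_seconds)  # ceiling division


searches_per_day = SEARCHES_PER_SECOND * 86_400
humans_total = humans_for_load(PAGES, SECONDS_PER_PAGE, SEARCHES_PER_SECOND, 10)
earths = humans_total / WORLD_POPULATION

print(f"Searches per day:        {searches_per_day:,}")
print(f"Humans needed:           {humans_total:,}")
print(f"Multiples of population: {earths:,.0f}")
```

Changing `SECONDS_PER_PAGE` or the response time shows how sensitive the result is to these guesses. In this simple model the workforce scales linearly with the time per page, while lengthening the response time does not help: the number of searches in flight grows as fast as the time available to answer them.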

Think again before saying that "we do not need machines" in the public sector ...