The census is broken. Can AI fix it?

Greg Yetman is co-director of the Center for International Earth Science Information Network (CIESIN), part of Columbia University’s Climate School. Under a contract with NASA, CIESIN has been exploring ways to provide socio-economic data by observing the Earth since the early 1990s. Yetman says details such as how common it is for people to live in basement apartments in New York’s Queens borough are “always hard to capture and really hard to measure from space.” Apartment conversions, sublets by owner or occupier, and unregistered establishments – all likely to increase as the cost of living rises – are not often captured by the census, or by satellites either. And if a person is homeless or has few financial records, they may not show up in location-sharing data collected by private brokers.

There is room for improvement in the census in the United States, but the Constitution requires that a census be taken every ten years, and Yetman says the country is “rich in data.” By comparison, some countries have not conducted detailed household surveys for decades. Barriers such as cost, conflict, or the difficulty of reaching remote locations can make some communities harder to count.

In 2017, the Nigerian government, CIESIN, and others working with funds from the Bill & Melinda Gates Foundation used satellite imagery and machine learning to map the country’s population for measles vaccination. Since then, says Vince Seaman, senior program manager at the Gates Foundation, the effort has expanded to five more African countries as a project known as Grid3. This work, he adds, demonstrates that technology is only part of the solution. After machine learning was applied to the satellite images, community surveys were conducted to reach thousands of people in person and verify the results.

In a study published last month, satellite imagery and machine learning were used to automatically identify housing plots and predict population, age and gender in five provinces in the western half of the Democratic Republic of the Congo (DRC). The project brought together Grid3 participants like the University of Southampton in the UK with groups like the DRC National Bureau of Statistics. Anonymous surveys of nearly 80,000 people were conducted by the Kinshasa School of Public Health and the University of California, Los Angeles School of Public Health to validate the performance of a deep learning model, which achieved an accuracy of approximately 80%. The co-authors say their method is no substitute for a true attempt at enumerating the entire population, but it can provide a predictive snapshot of society in places where data is sparse or of poor quality. No national census has taken place in the DRC since 1984.

Yetman has spent over 20 years working with satellite imagery. He works with Pop Grid, a data collaboration for a diverse group of organizations that count populations, including the European Commission, Facebook, the German Aerospace Center, and NASA. He says deep learning models for identifying buildings can’t always tell where one roof ends and another begins, and he cautions that there isn’t a model that works everywhere in the world.

In the United States, he explains, an AI model trained on images of rooftops in the western part of the country is problematic when applied to homes on the East Coast, because the country’s westward expansion followed a grid-based system, while cities like Boston grew with less uniformity. Similarly, a roof in South Africa is different from one in Zambia. AI can easily confuse the roof of a stall in a commercial market in Accra, Ghana with the roof of an unregistered house, or struggle to accurately predict the number of people in urban settlements or rural villages. “Without the field survey that indicates there’s a slum or informal settlement here, it’s really hard to tell just from the structure of the roof models,” says Yetman. He adds that getting high-quality data to train models to detect buildings or residential plots based on local conditions is the hardest part of the job.
