Flood Analytics Information System (FAIS): A National Scale Big Data Engineering and Gathering Pipeline To Improve Flood Situational Awareness

CZNet Coordinating Hub

Posted: May 20, 2022

A Hydroinformatics Blog Post
Organized by the CUAHSI Informatics Standing Committee. Contributions are welcome; please contact Veronica Sosa Gonzalez at vgonzalez@cuahsi.org.

By: Vidya Samadi, Clemson University, USA.

Background and/or Rationale for the Work

Floods are among the most destructive natural hazards. They affect millions of people across the world, causing severe loss of life and damage to property, critical infrastructure, and the environment. During flooding events, citizens increasingly act as human sensors, collecting and sharing millions of flood-related posts, images, and videos on social media to report flood magnitude, damage, and impacts. Images, videos, and geotagged texts posted on social media platforms such as Facebook, Twitter, YouTube, and Flickr, as well as other online forums, can provide valuable real-time information about flood situations. By using the content and user metadata from volunteered geographic information shared online, we can identify potentially at-risk neighborhoods around inundated areas. In addition, several government agencies, such as the US Geological Survey (USGS) and the Department of Transportation (DOT), have installed real-time surveillance cameras across numerous river and road networks to support timely assessment of road and river flooding. These real-time videos and images can be used to track rising flood levels during a storm and to continuously monitor the potential impacts of flooding on nearby locations.

The accessibility of voluntarily generated and often publicly published content on social media is a strong draw for disaster-related research. However, timely and geographically targeted early flood warnings are hampered by a lack of real-time information and of appropriate tools that let stakeholders and residents take proactive actions for themselves, their property, and their community. Pipelines that gather data and identify flood-relevant tweets have proved useful for assessing real-time flooding impacts and damage in Sao Paulo, Brazil (de Assis et al., 2016), Jakarta, Indonesia (Eilander et al., 2016), the River Elbe, Germany (Herfort et al., 2014), and across Great Britain (Barker and Macleod, 2019). As the need for flood impact assessment and insight grows, stakeholders face fragmented data environments and warehouses built on multiple technologies, often spread across multiple web services. There is a need to automate the real-time collection of big data and crowd-sourced information and to create a map-based dashboard that better identifies at-risk locations and flood situations. Recent advances in technology offer an opportunity to gather and combine social media data with ground-based observations and imagery, and to translate this information into a web-based application that monitors and assesses flooding hazards and communicates them to citizens in real time. The aim of this post is to discuss the functionality and workflow of the Flood Analytics Information System (FAIS) as a national-scale flood big data gathering pipeline.

Method and Infrastructure

Multiple Python algorithms were developed and integrated within the FAIS application. Specifically, the FAIS workflow uses multiple Internet of Things Application Programming Interfaces (IoT-APIs) and various machine learning approaches to transmit, process, and load big data: the application gathers information and data from various web servers and replicates them to a data warehouse (an IBM cloud service). Users can directly stream and download US Geological Survey (USGS) and Department of Transportation (DOT) river and road flooding images and save them to local or cloud storage. River measurements, imagery, and tabular data are displayed on a web-based remote dashboard, and the information can be plotted in real time.

In addition, FAIS uses natural language processing (NLP) to extract, cleanse, filter, and group flood-related tweets, including tweet geolocation information and related images. To do so, a Twitter streaming bot (which functions on both iOS and Mac) was developed and deployed on the Heroku cloud platform, outside of the application; access to it is controlled through the Heroku user interface. Heroku is a platform as a service (PaaS) that enables system-level supervision and coordination of Twitter APIs, crowd-sourced data, and tweets. The FAIS Twitter bot automates tweet gathering and continuously cleans and monitors all Twitter activity. During real-time flooding events, the bot is notified when new content, such as tweets that match certain criteria (keywords), is created. Overall, eight keywords are programmed within the FAIS Twitter NLP approach: “Flood Damage”, “Road Closure”, “Emergency Management and Response”, “Flooded Neighborhood”, “Infrastructure Damage”, “Evacuation Route”, “Shelter and Rescue”, and “Storm Surge”.
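The keyword filtering step can be illustrated with a minimal sketch. This is not the FAIS bot itself: the helper names (normalize, matched_keywords, group_by_keyword) and the assumption that each tweet is a dictionary with a "text" field are hypothetical, and the real pipeline may match keywords differently (for example, through the Twitter streaming API's own track filter). The sketch only shows one plausible way to test a tweet against the eight phrases listed above and group the matches.

```python
import re

# The eight keyword phrases listed in the post.
FLOOD_KEYWORDS = [
    "Flood Damage",
    "Road Closure",
    "Emergency Management and Response",
    "Flooded Neighborhood",
    "Infrastructure Damage",
    "Evacuation Route",
    "Shelter and Rescue",
    "Storm Surge",
]


def normalize(text):
    """Lowercase, drop URLs, and collapse punctuation and whitespace to single spaces."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def matched_keywords(tweet_text, keywords=FLOOD_KEYWORDS):
    """Return the keyword phrases that occur as whole phrases in the tweet (hypothetical helper)."""
    padded = f" {normalize(tweet_text)} "
    return [kw for kw in keywords if f" {normalize(kw)} " in padded]


def group_by_keyword(tweets):
    """Group tweet dicts (assumed to carry a 'text' field) under each keyword they match."""
    groups = {kw: [] for kw in FLOOD_KEYWORDS}
    for tweet in tweets:
        for kw in matched_keywords(tweet["text"]):
            groups[kw].append(tweet)
    return groups
```

Matching on normalized whole phrases keeps the filter insensitive to case, hashtags, and punctuation, at the cost of missing paraphrases such as "roads are closed"; a production filter would likely need stemming or a trained classifier on top of this.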

Using geotagged tweets, FAIS is able to identify locations at risk of flooding. The application first cycles through a set of USGS web addresses for river gauge height readings, parses these flat files using Python web scraping, and obtains the latest river levels. Each river level reading is compared with its respective cached long-term average level to identify the highest relative river levels in real time. The gauges with the highest relative river levels are then intersected with watershed polygons and geotagged tweets to identify locations at risk of flooding. The list of prioritized areas can be updated every 15 minutes (depending on the timescale of the USGS flood monitoring data) as the crowdsourced data and environmental conditions change. Each geotagged tweet's coordinates are taken as the center point of a square box approximately 16 km wide (Donratanapat et al., 2020). This size was chosen arbitrarily to cover the areas near each gauge. During operational use, the retrieved tweets are continuously stored in the IBM database system, which provides an open data source well suited to post-flood studies.
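As a rough illustration of the two geometric and ranking steps described here, the sketch below ranks gauges by their latest level relative to a cached long-term mean and builds an approximately 16 km wide box around a tweet's coordinates. Several details are assumptions rather than the FAIS implementation: the post does not say whether "relative" means a ratio or a difference (a ratio is used here), the function names are hypothetical, and the box uses a simple flat-earth approximation of about 111.32 km per degree of latitude.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def relative_levels(latest, long_term_mean):
    """Rank gauges by latest height divided by cached long-term mean, highest first.

    Both arguments are dicts keyed by site id. Using a ratio is an assumption;
    the post only says each reading is compared with its long-term average.
    """
    ratios = {
        site: latest[site] / long_term_mean[site]
        for site in latest
        if long_term_mean.get(site, 0) > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


def square_box(lat, lon, width_km=16.0):
    """Return (south, west, north, east) bounds of a square box centered on a point."""
    half = width_km / 2.0
    dlat = half / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def in_box(point, box):
    """True if a (lat, lon) point falls inside the (south, west, north, east) box."""
    lat, lon = point
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east
```

For example, relative_levels({"A": 3.0, "B": 10.0}, {"A": 1.5, "B": 8.0}) places gauge A first, because its level is twice its long-term mean even though gauge B has the higher absolute reading. The intersection with watershed polygons mentioned above would need a geometry library and is left out of this sketch.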

In addition, FAIS uses various convolutional neural networks (CNNs), such as YOLOv3 (You Only Look Once, version 3), Fast R-CNN (Region-based CNN), Mask R-CNN, SSD MobileNet (Single Shot MultiBox Detector MobileNet), and EfficientDet (Efficient Object Detection), to perform both flood image object detection and segmentation simultaneously (see Pally and Samadi, 2022). Flood frequency analysis (FFA) is another functionality embedded within the FAIS application. The pipeline estimates flood quantiles and their associated uncertainties by combining observational analysis, stochastic probability distributions, and design return periods. FFA techniques estimate the flow values that correspond to specific return periods or probabilities along a river and how they could change over different design periods. FAIS currently uses multiple probability distributions, such as the Normal, Lognormal, Gamma, Gumbel, Pearson Type III, Weibull, and Loglogistic distributions, to compute FFA for any given flood gauging station in the US.
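To make the FFA idea concrete, here is a minimal sketch for one of the listed distributions, the Gumbel distribution, fitted by the method of moments. It is not the FAIS code: the fitting method, the function names, and the input (a list of annual peak flows) are assumptions chosen for brevity, and the sketch does not estimate the uncertainties mentioned above. For a return period of T years, the quantile is the flow whose annual non-exceedance probability is 1 - 1/T.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_peaks):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    mean = statistics.fmean(annual_peaks)
    std = statistics.stdev(annual_peaks)  # sample standard deviation
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period_years):
    """Flow with annual exceedance probability 1/T under the fitted Gumbel distribution."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(non_exceedance))


def flood_quantiles(annual_peaks, return_periods=(2, 10, 50, 100)):
    """Map each return period (years) to its estimated flow (hypothetical helper)."""
    location, scale = fit_gumbel_moments(annual_peaks)
    return {t: gumbel_quantile(location, scale, t) for t in return_periods}
```

Calling flood_quantiles on a gauge's annual peak series returns flows that increase with the return period. Comparing the same quantiles across several candidate distributions is one common way to judge how sensitive a design flow is to the distribution choice.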

Results and Conclusion

FAIS was developed as a national-scale big data gathering and engineering pipeline. The application includes many tools and functionalities, including USGS data gathering approaches, IoT-APIs, NLP and crowd intelligence mechanisms, CNNs, and FFA. FAIS helps define the geospatial footprint of flood events using georeferenced tweets. FAIS proved proficient and user-friendly for real-time flood data gathering and assessment when it was tested during a 2-day Hurricane Dorian flooding event (2019) across the Carolinas, where more than 15,000 geotagged tweets were collected to identify 38 dynamic areas at risk of flooding. During a real-time event, the time between a tweet appearing online and being plotted for a user as potentially relevant (in terms of location and content) is on the order of a few seconds to minutes, so this rapid analysis can provide an early information channel for flood situational assessment. Please note that FAIS does not perform flood modeling or forecasting; rather, it enables flood data gathering and analytics for any location in the US.

Additional Resources

The FAIS application is publicly available at the Clemson-IBM cloud service. The FAIS Python package is freely available via Git and pip.


This research is funded by the U.S. National Science Foundation (NSF) Directorate for Engineering under grant CBET 1901646. Any opinions, findings, and discussions expressed in this blog post are those of the author and do not necessarily reflect the views of the NSF. The author acknowledges IBM for providing free credits to deploy and sustain the FAIS application. The author also acknowledges the contribution of the Clemson Hydrosystem and Hydroinformatics Research (HHR) group to the FAIS development.


The author and Clemson University assume no responsibility for errors or omissions in the contents of FAIS services.

About the author: Vidya Samadi is an Assistant Professor at Clemson University.


  1. Barker J.L.P., Macleod C.J.A. 2019. Development of a national-scale real-time Twitter data mining pipeline for social geodata on the potential impacts of flooding on communities. Environmental Modelling & Software, 115, pp. 213-227.

  2. De Assis L.F.F.G., De Albuquerque J.P., Herfort B., Steiger E., Horita F.E.A. 2016. Geographical prioritization of social network messages in near real-time using sensor data streams: an application to floods. Brazilian Journal of Cartography, 68, pp. 1231-1240.

  3. Donratanapat N., Samadi S., Vidal M.J., Sadeghi Tabas S. 2020. A National-scale Big Data Prototype for Real-time Flood Emergency Response and Management. Environmental Modelling & Software. DOI: 10.1016/j.envsoft.2020.104828.

  4. Eilander D., Trambauer P., Wagemaker J., van Loenen A. 2016. Harvesting social media for generation of near real-time flood maps. Procedia Engineering, 154, pp. 176-183.

  5. Herfort B., Schelhorn S.J., Albuquerque J.P.D., Zipf A. 2014. Does the spatiotemporal distribution of tweets match the spatiotemporal distribution of flood phenomena? A study about the River Elbe Flood in June 2013. Proceedings of the ISCRAM 2014 Conference, Pennsylvania, USA, May 2014.

  6. Pally R., Samadi S. 2022. Application of Image Processing and Convolutional Neural Networks for Flood Image Classification and Semantic Segmentation. Environmental Modelling & Software. DOI: 10.1016/j.envsoft.2021.105285.