The frustrations of gathering open data

gyiernahfufieland
5 min read · Mar 19, 2021

It was a rocky journey.

I have been working on a project on the effects of the number of flood events, population density and meteorological data on the number of Dengue incidents in Malaysia. To do so, I needed to find ways to gather the necessary data. Since it’s my first ‘serious’ project, it opened my eyes to how difficult it is to gather such data in Malaysia.

Meteorological data

There are two types of meteorological data that I had to gather: historical meteorological data for all states (last 5 years) and forecasted meteorological data for 2021. The data is expected to contain information such as rainfall amount, temperature and wind speed. My first approach was to search data.gov.my. Sadly, the site has no relevant information. My second step was to approach the Department of Statistics, Malaysia (DOSM). I submitted a data request and got a reply about a month later, telling me to approach Malaysia’s Meteorological Department (MET). My experience with MET was by far the best in terms of response. The person in charge responded to my requests promptly, and I usually got replies within 2–3 business days, which was great. However, I was quoted between RM 4,000 and RM 6,000 for the data I required. Which … I obviously can’t afford.

On a side note, I stumbled across this really nice site from our neighbouring country, Singapore, where you can download historical daily meteorological data for FREE (sigh).

I googled high and low for the data I required and found this site from NASA POWER. It’s a great site where you can easily download the meteorological data you need based on your input location. However, the data seems ‘off’ to me. The temperature it gives for Kuala Lumpur ranges between 23°C and 28°C. Having grown up in Kuala Lumpur, I know that the temperature should be between 24°C and 32°C. Hence I decided to keep NASA POWER as my backup plan and continued my search.

There are a couple of sites out there providing daily weather forecasts, such as weather.my, Accuweather and The Weather Channel. I decided to try my luck with them. Fortunately, the Accuweather Premium and Professional accounts provide historical data for any location desired. You will have to subscribe to one of them, since it is offered as a ‘Premium service’. One disadvantage, though, is that they do not allow you to download the data as files in any format. The monthly data is shown as a table online, and you will have to find your own way to download it. I used a web scraping tool to assist with the task, which I will talk about in another post. Accuweather also provides forecasted meteorological data without the need to subscribe to their premium services.
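To give a rough idea of what pulling an online table into a file involves, here is a minimal sketch using only Python's standard library. It is not the scraping tool I used, and the markup in `SAMPLE_HTML`, the column names and the values are all made up for illustration; a real page will have a different structure and may need a proper scraping library.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup and values, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>High (C)</th><th>Low (C)</th></tr>
  <tr><td>2020-01-01</td><td>32</td><td>24</td></tr>
  <tr><td>2020-01-02</td><td>31</td><td>25</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can then be written to a file, one per month, and merged later.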

Number of Dengue incidents

As with the previous data, I approached both DOSM and data.gov.my. Fortunately, I managed to gather the data from data.gov.my here. The data, however, was last updated in 2018, which means that I am missing the data for 2019 and 2020. I wrote to the Ministry of Health (MOH) on the advice of DOSM, and was told that any data to be shared requires approval from the Director of Health, who is obviously very occupied with the current COVID-19 situation. I’m not sure if this is standard practice, but I followed the procedure and wrote to his office. I didn’t get a reply, though. But I managed to gather the number of Dengue cases from MOH’s media reports. It was sort of a hassle since I had to download the PDFs separately and extract the data, but hey, as long as I get the data, right? The data is not consistent, though: the media reports for some weeks are missing.
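Once the weekly figures are extracted, it helps to list exactly which weeks are missing. Below is a small sketch of one way to do that. It assumes each report can be labelled with a (year, week) pair and that weeks follow ISO numbering; the reports may well use a different week convention, so treat that as an assumption. The function names and the sample input are hypothetical.

```python
from datetime import date


def weeks_in_year(year):
    """Number of ISO weeks in a year (52 or 53)."""
    # 28 December always falls in the last ISO week of its year.
    return date(year, 12, 28).isocalendar()[1]


def missing_weeks(found, years):
    """Return the (year, week) pairs in `years` that are absent from `found`."""
    expected = [
        (year, week)
        for year in years
        for week in range(1, weeks_in_year(year) + 1)
    ]
    found = set(found)
    return [pair for pair in expected if pair not in found]


# Hypothetical input: reports found for every week of 2019 except 7 and 30.
found_reports = [(2019, w) for w in range(1, 53) if w not in (7, 30)]
print(missing_weeks(found_reports, [2019]))
```

With the gaps listed, you can decide whether to look for the missing reports elsewhere or to handle those weeks as missing values in the analysis.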

Population Density

I managed to get this data from DOSM at the price of RM 6x. However, they only provided data for 2017–2020 (I requested data for 2015–2020). At the time of writing this post I am still in communication with them. I also found that the provided data has discrepancies with some other sources I found online, so I wrote to them for clarification. They do respond, but the replies are totally irrelevant to what I asked. For example, I asked why the data for 2015–2016 is not available and where I could get it since it is not available on their end, and they responded that the data is not available. Literally. They also have not clarified the discrepancies, and I am still waiting for their response after following up multiple times.

I decided to look into some backup plans in case they did not respond in time. I found this at the DOSM portal. Although they do not update it very frequently, it has just the data I required. I also found this other link that contains 2015 population data.

Number of Flood events

I had the worst experience finding the data on the number of flood events. The dataset is available on data.gov.my, but only for 2000–2010. I have yet to hear from DOSM on my request for this dataset, and I had no idea who else to reach out to. According to data.gov.my, the previous dataset was updated by the Department of Irrigation and Drainage, so I decided to first approach them at pro@water.gov.my. I did not receive any reply at all, even after a few follow-ups. I then reached out to ppb@water.gov.my and was told to reach out to the corporate division at ppkor@water.gov.my. As with the first recipient, I received no reply even after a month of follow-ups. I contacted ppb@water.gov.my again, and hopefully they can help me follow up on the request.

Can any kind soul point me in the right direction? I tried searching for this and couldn’t find any alternatives.

Anyway, that concludes my rocky journey of gathering open data. I wish we had a better system for data sharing in the future, perhaps a single source of truth rather than having to knock on the doors of different departments. The data.gov.my portal was meant for open data sharing, but not all departments and state offices are consistently updating their data.

Have you had the same experience? How did you overcome it? Share with me!
