Using Power Query M Language for Scraping Any Website Data

Author by Concurrency Blog

Power Query operates at the back end using a special language called as M language. Website scraping can be very easy using the Power Query M language. Once you understand “M” language, working with Power Query will be much more fruitful and you can extract data from a website within minutes.

I was wondering how to extract the weather data for the entire 2012 year for Phoenix city in Arizona state. I found a very nice website www.wunderground.com which provides detailed weather data. The challenge here is that the website provides the data month-wise in HTML. So, I can see only a month’s data and not a full year data. I have therefore written a scraping program to pull the data for a month from HTML page into excel using Power Query. I then edited the parameters in the code which are highlighted in yellow for pulling the data for the next month. Following is the code:

power query code

The above code helps me to pull the data for the month of January 2012 only. I am going to reuse this code to pull the data for the entire 2012 by modifying the words highlighted in yellow color.

As you can see, the function Web.Page helps to pull the data from the website www.wunderground.com. Now, if you edit the year highlighted in yellow from 2012 to 2013 in source=web.page section, Power Query will extract the weather data for 2013. Isn’t that cool?!!! Also, you can study other functions as given below:

Table.transformcolumntypes, Table.RemoveColumns, Table.Add, table.RenameColumns Table.CombineColumns which are all self-explanatory. For an example, Table.RenameColumn function can be used to rename a column in the dataset. You can use these functions in future to easily manipulate the data using Power Query. I edited the paramaters highlighted in yellow to scrape the website and get the entire year’s weather data.  The process of website scraping is automated by making minor changes to the code. Hope this gives you an idea to further explore the Power Query M language and automate your scripts to extract the data. This is not only applicable to extract website data. Power Query M language can also help you write codes quicker and automate your routine processes too.

Author

Concurrency Blog

The latest about Concurrency