Under current market conditions, Geophysical Engineering graduates face an increasingly difficult search for job opportunities. At this point, I would like to ask every graduate to keep an open mind and understand the market. In this article I present the facts of job distribution based on data from LinkedIn, a public professional career platform.
Data were collected from the public website with automation scripts written in Python using Selenium. For visualization, we used Power BI's Network Navigator chart and Treemap diagram to present the resulting data set.
In 2020, we live in an era where skills and knowledge determine and increase our chances of joining a wide range of job opportunities. We should keep learning, catch up with the changes around us, adapt, and survive the rapid movement of the world.
“… Studying is not about acquiring skills and expertise; it is about shaping people's minds to think correctly and systematically, solve problems, and create valuable impact for society …” ~ Mario Teguh ~
How It Works
Selenium is a portable framework for testing web applications. It provides a playback tool (Selenium IDE) for authoring functional tests without the need to learn a test scripting language (Wikipedia). Put simply, Selenium can drive a web browser (Chrome, Mozilla Firefox, etc.) to open a specific link automatically. This was my first time using the method; I followed online tutorials on using Python + Selenium to gather data automatically from LinkedIn's public pages.
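As a minimal sketch of what driving a browser with Selenium looks like (the headless-Chrome setup and the `fetch_page_source` helper are my illustration, not the author's actual script; it assumes the third-party `selenium` package and a matching ChromeDriver are installed):

```python
import time


def fetch_page_source(url: str, wait_seconds: float = 2.0) -> str:
    """Open `url` in a headless Chrome session and return the rendered HTML.

    Requires the third-party `selenium` package plus a ChromeDriver that
    matches the installed Chrome. The import is kept inside the function so
    the sketch can be read without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for JavaScript to render
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned HTML string can then be handed to any parser for extraction.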
The idea behind collecting data with Python and Selenium is to scrape the HTML of the destination web pages. Every page rendered on our monitor is a combination of HTML and CSS code; with a few lines of scraping code in Python, we can extract exactly the information we are looking for.
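To make the scraping idea concrete, here is a small self-contained sketch using only Python's standard-library `html.parser` (the `headline` class name and the sample HTML are made up for illustration; real LinkedIn markup differs, and in practice a library such as BeautifulSoup is more convenient):

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of elements whose class attribute equals `target`."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.results = []
        self._capture = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start capturing text when we enter a matching element.
        if dict(attrs).get("class") == self.target:
            self._capture = True
            self._buffer = []

    def handle_endtag(self, tag):
        # On leaving the element, join the buffered text fragments.
        if self._capture:
            text = "".join(self._buffer).strip()
            if text:
                self.results.append(text)
            self._capture = False

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)


sample_html = (
    '<div class="headline">Geophysicist at Oil&amp;Gas Co</div>'
    '<div class="other">ignore me</div>'
)
parser = HeadlineParser("headline")
parser.feed(sample_html)
print(parser.results)  # ['Geophysicist at Oil&Gas Co']
```

The same pattern scales up: feed each downloaded page into the parser and collect the fields of interest into rows.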
In this work, I collected several key pieces of information from each LinkedIn page:
- Current job position
The image below shows the framework I used for this project.
The Processing Code
The process began with collecting a list of target URLs, automated with Selenium and Python, from Google searches with specific keywords. We used two keywords: “Teknik Geofisika” and “Institut Teknologi Bandung”.
This method has some limitations: Google returns at most 30 result pages, and near the last pages the results degrade, matching only one of the two keywords. Even so, we successfully collected 200 records. For better coverage, the search should be repeated several times with different keywords, followed by a more thorough data-cleansing step at the end.
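The paginated Google queries themselves can be sketched as plain URL construction; Google's `start` parameter advances the result offset ten entries at a time. The `site:linkedin.com/in` filter and the function name here are my assumptions, since the article does not show the exact query:

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/search"


def search_urls(keyword: str, pages: int = 30, per_page: int = 10):
    """Build one Google result-page URL per page for `keyword`.

    The `site:linkedin.com/in` filter restricts results to public LinkedIn
    profiles; the 30-page default mirrors the limit described in the text.
    Illustrative only -- not the author's actual query string.
    """
    query = f'site:linkedin.com/in "{keyword}"'
    return [
        BASE + "?" + urlencode({"q": query, "start": page * per_page})
        for page in range(pages)
    ]


urls = search_urls("Teknik Geofisika", pages=2)
```

Each URL in the list can then be opened by the Selenium driver and its result links scraped.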
After successfully designing and running the code, we gathered the data and information below:
To produce better results, we must clean and handle data that may not be in good shape. Some records require manual lookup or supplementary external information to complete the data set.
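A hedged sketch of that semi-automatic cleaning step in plain Python (the field names `name` and `current_position` are illustrative, since the article does not specify the schema):

```python
def clean_records(records):
    """Tidy scraped rows: trim whitespace, drop duplicates, and replace a
    missing job title with an explicit 'Unknown' label.

    Field names are illustrative, not the author's actual schema.
    """
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        title = (rec.get("current_position") or "").strip() or "Unknown"
        key = (name.lower(), title.lower())  # case-insensitive dedup key
        if name and key not in seen:
            seen.add(key)
            cleaned.append({"name": name, "current_position": title})
    return cleaned


raw = [
    {"name": " Budi ", "current_position": "Geophysicist"},
    {"name": "Budi", "current_position": "Geophysicist"},  # duplicate
    {"name": "Sari", "current_position": None},            # missing title
]
print(clean_records(raw))
```

Records still missing information after this pass are the ones flagged for manual searching or enrichment from external sources.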
Note that this data set has not gone through a formal validation step, so the information I present here may not be entirely correct. Any mistakes or corrections should be reported to the writer for further improvement.
After producing a cleaner data set, we can present the data using Power BI. Here we used the Network Navigator chart and the Treemap chart.
According to the results, the majority of Geophysical Engineering graduates still work in the Oil & Gas industry. A considerable number of students and graduates list no employment information. Next come the Education, Financial, Information Technology, Consulting, and Mining sectors, all of which have grown over the course of the 21st century.
1. For a better data set, the automated collection should be repeated several times with different keywords related to the topic. This will introduce duplicate results, so it must be followed by better data-cleansing and validation procedures. Configuring the Google search to return more than 30 result pages would also help.
2. The data cleansing and validation steps are still semi-automatic; for better results they should be fully automated or supported with additional information.
This publication is produced for educational and informational purposes only. If there are any mistakes in the data, judgement, methodology, or grammar/wording used in this publication, the author will follow up and update the information in the future.
* Please consider contacting the writer using the contact information in my profile. I would be happy to discuss and share more about the topic. Thank you.