The Educator's PLN

The personal learning network for educators

Before using Big Data, you need to extract Web Data

I want to share an interesting article about data scraping that you might need in your business. The article below is mainly reprinted from here.

Nowadays, big data is not new to us. Some of us use big data almost every day, but how can we extract high-volume web data in a short time? Let's talk about it.

“Advances in data gathering, computing power and connectivity mean that we have more information than ever before at our fingertips. IBM estimates that by 2020 there will be 300 times more information in the world than there was in 2005.” – John Hsu, Guardian Journalist

A large volume of data lives on the web and in apps. So we can say that web data capture is part of the big data architecture and provides its basic data source.

When we want to build a text corpus, we need artificial intelligence to fetch the data we need.

When we analyze consumer behavior, we need to collect comments from social media platforms.

When we set marketing and pricing strategies, we need to track prices and collect the data.

When we want to win at betting, we need to extract enough historical gambling data to analyze.

To accomplish the things above, we need hundreds of thousands of data records. But most of the data on the Internet is unstructured, and extracting it sounds quite troublesome. In this case, you need someone who is good at writing web crawlers, a developer for example, to create one that extracts the web data you need. Besides, the code has to be tested after it is written, before you spend most of your time and energy collecting data for a whole day, with a few cups of tasty coffee. Don't you think that's boring?
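To give a feel for what such hand-written extraction code involves, here is a minimal sketch in Python that turns a small piece of unstructured HTML into structured price records. It uses only the standard library. The sample markup, the class names `name` and `price`, and the helper names `PriceParser` and `extract_prices` are all made up for illustration; a real crawler would also have to download the pages and cope with each site's own layout.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect name/price records from the hypothetical markup below."""

    def __init__(self):
        super().__init__()
        self.records = []   # finished rows: {"name": ..., "price": ...}
        self._field = None  # field currently being read, if any
        self._row = {}      # row being assembled

    def handle_starttag(self, tag, attrs):
        # Assumption: the fields we want are marked with class="name" / class="price".
        css_class = dict(attrs).get("class", "")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field:
            self._field = None
            # Once both fields are present, store the row and start a new one.
            if "name" in self._row and "price" in self._row:
                self._row["price"] = float(self._row["price"].lstrip("$"))
                self.records.append(self._row)
                self._row = {}


# Made-up sample page; a real crawler would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Notebook</span><span class="price">$3.50</span></li>
  <li class="product"><span class="name">Pencil</span><span class="price">$0.75</span></li>
</ul>
"""


def extract_prices(html):
    """Return a list of {"name", "price"} dicts found in the given HTML."""
    parser = PriceParser()
    parser.feed(html)
    return parser.records


records = extract_prices(SAMPLE_HTML)
for record in records:
    print(record["name"], record["price"])
```

Even this toy version needs its own parsing rules and testing, and it would break as soon as the page layout changed, which is the kind of effort the paragraph above describes.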

We can go online and ask for help. Search Google for "web data extractor" and you will find many useful tools to meet your different needs. Usually, though, you have to pay for the service or purchase a package. Maybe you are familiar with import.io, Mozenda or other tools, but now it’s time for you to try Octoparse, a free yet powerful program. It charges only a small fee when you need a lot of cloud servers to help you gather information, and it provides adequate support for users. I love this software because it can extract what I want from web pages, and I recommend it to you if you need to capture high-volume web data.

About

Thomas Whitby created this Ning Network.

© 2021 Created by Thomas Whitby.