I have been crawling and parsing websites for a while, using PHP and cURL. Year after year, it became clear that my extraction routines running on my server were getting harder and harder to keep in good working shape. Websites regularly change minor things on their pages, and in the best case you stop receiving some or all of the expected data; in the worst case, you receive completely inaccurate data. Then came, for me (and, I must admit, my limited skills), THE hammer: AJAX! Yes, HTML + JavaScript + CSS + DOM, and the dynamic pages that don't load at first sight, that wait for you to click a button, that only appear as you scroll down, that swap static picture URLs for JavaScript-rendered pictures. So I had to find a way to keep extracting the data I needed without earning an engineering degree in information technology. I tried a few scraping tools, and my final choice was Octoparse. AJAX is handled as easily as a basic HTML URL, as if there were no AJAX routines on the pages at all. I wish I had discovered this jewel years ago. If you do any marketing and want to gather data for stats, or just build a database from any website, it is super easy to do; I recommend it. And now devs are asking me for stats on scraped data, not the other way around.
I did not have to ask our devs to write a scraper; the time I spent building the scraper myself was about the same as the time I would have spent just discussing with our devs how to scrape the content. It takes about a second to open a page, so roughly you can scrape one page per second per task.
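That one-page-per-second figure makes runtime easy to estimate. A quick back-of-the-envelope sketch (the rate is an approximation from my own usage, not an Octoparse specification):

```python
# Rough wall-clock estimate for an Octoparse run, assuming the observed
# ~1 page/second/task rate (an approximation, not a documented spec).

def scrape_hours(num_pages: int, pages_per_second: float = 1.0, tasks: int = 1) -> float:
    """Estimated hours to scrape num_pages spread across parallel tasks."""
    seconds = num_pages / (pages_per_second * tasks)
    return seconds / 3600

# 50,000 links on a single task: about 13.9 hours.
print(round(scrape_hours(50_000), 1))
# Split across 2 concurrent tasks (the free-plan limit): about 6.9 hours.
print(round(scrape_hours(50_000, tasks=2), 1))
```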
You can have 2 active tasks running at the same time for free; if you want more, you can upgrade to a paid version. Another pro: you can create tasks on your computer and load them onto your server just by restarting the app. Nowhere does it say that tasks are saved to Octoparse's servers, which is probably why it has trouble with very large tasks. I had a hard time adding a list of 50,000 links into the queue, but that is not a real problem, because you can split the links across multiple tasks (30-40K links each, in my case). If one task suddenly shuts down because of some error, the other Octoparse tasks keep working as if nothing happened. The configuration tool and the scraper run as separate programs. You can also back up your scraped data to Octoparse; it is saved together with your task. You can export to Excel, directly to an SQL Server, MySQL, or Oracle database, or to a CSV, TXT, or HTML file.
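Splitting a big link list into per-task batches is easy to script. A minimal sketch in Python, assuming the links live in memory as a list and that each chunk file is then imported into its own Octoparse task (the batch size and file-name prefix are illustrative assumptions):

```python
# Split a large URL list into smaller batches, one file per Octoparse task.
# The 40,000-link batch size and "task_links" prefix are arbitrary examples.

def chunk(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def split_links(links, batch_size=40_000, prefix="task_links"):
    """Write each batch to its own text file; return the file names written."""
    filenames = []
    for i, batch in enumerate(chunk(links, batch_size), start=1):
        name = f"{prefix}_{i}.txt"
        with open(name, "w") as f:
            f.write("\n".join(batch))
        filenames.append(name)
    return filenames

# Example: 50,000 links end up as one 40,000-link file and one 10,000-link
# file, each of which can be loaded into a separate Octoparse task.
```

Because the tasks are independent, a crash in one batch does not affect the others, which matches the behavior described above.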
During scraping it opens the pages in a real browser, so JavaScript and AJAX websites work as well. I don't know how, but this was the only scraper that could analyze and grab a specific text on the page without my setting any rules; the other scrapers I tried had a hard time and required complicated rules. That's it: no need to select specific HTML divs or write regex code. The GUI is simple to understand: dump in the list of links that need to be scraped, select the content on the page that should go into the Excel spreadsheet, and click start. No Node.js learning or programming needed.
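For contrast, here is roughly what the hand-written extraction rules that Octoparse makes unnecessary look like. A minimal sketch using Python's standard library on a hypothetical, well-formed page snippet (the markup and field names are invented for illustration, not taken from any real site):

```python
# The manual alternative: per-site parsing rules written by hand.
# Every field needs its own rule, and each rule breaks whenever the
# site tweaks its markup -- the maintenance burden described above.
import xml.etree.ElementTree as ET

# Hypothetical snippet standing in for a fetched product page.
html = """<div class="product">
  <h1 class="title">Blue Widget</h1>
  <span class="price">19.99</span>
</div>"""

root = ET.fromstring(html)
title = root.find("h1").text.strip()          # rule for the title field
price = float(root.find("span").text.strip())  # rule for the price field

print(title, price)  # Blue Widget 19.99
```

With a point-and-click tool, this rule-writing step is replaced by selecting the text directly on the rendered page.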
It installs on Windows, so I could use a spare Windows Server for scraping. It took me about a day to look into all the available web scrapers, and in the end I settled on Octoparse for a couple of reasons, although I would rather have these tasks done in Python. My setup: the Octoparse API provides data extraction on a schedule, and once per day the data from many different tasks should be transferred to Google Sheets, updating the sheet with the new data. The API connection should be built so that it can easily be edited and reused in the future with new tasks and spreadsheets; it is important for me to understand how this connection works so I can customize it later.
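A rough shape for that daily task-to-spreadsheet transfer, sketched in Python. The fetch and write steps are left as injected callables because the real endpoint paths, auth flows, and payload formats belong to the Octoparse OpenAPI and Google Sheets API docs and are not reproduced here; only the row-reshaping step is concrete:

```python
# Sketch of a once-per-day "many tasks -> one spreadsheet" sync.
# fetch_task_data and append_to_sheet are ASSUMED callables standing in
# for real Octoparse API and Google Sheets API clients; consult both
# vendors' documentation for actual endpoints and authentication.
from typing import Callable, Dict, List

def records_to_rows(records: List[Dict], columns: List[str]) -> List[List[str]]:
    """Reshape exported task records into sheet rows, header row first.

    Missing fields become empty cells so every row has the same width.
    """
    rows = [list(columns)]
    for record in records:
        rows.append([str(record.get(col, "")) for col in columns])
    return rows

def daily_sync(task_ids: List[str],
               fetch_task_data: Callable[[str], List[Dict]],
               append_to_sheet: Callable[[List[List[str]]], None],
               columns: List[str]) -> None:
    """Pull each task's records and append them to the spreadsheet."""
    for task_id in task_ids:
        records = fetch_task_data(task_id)        # e.g. an Octoparse API call
        append_to_sheet(records_to_rows(records, columns))  # e.g. Sheets write
```

Passing the two API clients in as plain functions is what makes the connection easy to edit and reuse: pointing it at new tasks or a new spreadsheet only means swapping the task IDs or the sheet writer, without touching the reshaping logic.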