Posting Links on iCM title comments

Fergenaprido
Donator
Posts: 5488
Joined: June 3rd, 2014, 6:00 am
Location: Canada
Contact:

#841

Post by Fergenaprido »

Thanks. Comments deleted and user reported.
🧚‍♂️🦫
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#843

Post by joachimt »

dead youtube-link removed
ICM-profile
Fergenaprido: "I find your OCD to be adorable, J"
Torgo
Posts: 2955
Joined: June 30th, 2011, 6:00 am
Location: Germany
Contact:

#844

Post by Torgo »

Contrary to what wasabi is claiming (2 years ago :P ), his link doesn't work anymore:
https://www.icheckmovies.com/movies/chu ... ryori-ten/


Dimitris' link here doesn't open for me in any way:
https://www.icheckmovies.com/movies/garakuta+no+machi/
.. and returns this weird error:
Spoiler
Unable to determine IP address from host name kissanime.ru

The DNS server returned:

No DNS records
This means that the cache was not able to resolve the hostname presented in the URL. Check if the address is correct.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#845

Post by joachimt »

done
kongs_speech
Posts: 1492
Joined: April 4th, 2020, 10:32 pm
Location: FL
Contact:

#846

Post by kongs_speech »

🏳️‍⚧️
Quartoxuma wrote: A deeply human, life-affirming disgusting check whore.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#847

Post by joachimt »

removed
Torgo
Posts: 2955
Joined: June 30th, 2011, 6:00 am
Location: Germany
Contact:

#848

Post by Torgo »

joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#849

Post by joachimt »

deleted
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#850

Post by monk-time »

I've been updating my tool for finding dead video links, and wanted to clarify the criteria for being 'dead', specifically for copyright-restricted videos.

Such a video can have either a list of countries where it is blocked or a list of countries where it is allowed. My guess is that the tool should only report as dead the ones that are allowed in 10 or fewer countries, or blocked in 50 or more (out of 249). Mods, how does that sound?

Another option would be to report only videos that are blocked everywhere.
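
A minimal sketch of how both options could be expressed, assuming the restriction data has the shape of the YouTube Data API's `contentDetails.regionRestriction` object (either an `allowed` or a `blocked` list of country codes). The function name `is_dead` and the constants are hypothetical and not taken from the actual tool:

```python
# Hypothetical sketch of the proposed thresholds; not the tool's real code.
TOTAL_COUNTRIES = 249
MAX_ALLOWED = 10   # "allowed in 10 or fewer countries"
MIN_BLOCKED = 50   # "blocked in 50 or more countries"


def is_dead(region_restriction, strict=False):
    """Decide whether a region-restricted video should be reported as dead.

    region_restriction: dict with an "allowed" or a "blocked" list of
    country codes, or None/empty if the video has no restriction.
    strict=True is the second option: report only if blocked everywhere.
    """
    if not region_restriction:
        return False
    if "allowed" in region_restriction:
        allowed = len(set(region_restriction["allowed"]))
        return allowed == 0 if strict else allowed <= MAX_ALLOWED
    blocked = len(set(region_restriction.get("blocked", [])))
    return blocked >= TOTAL_COUNTRIES if strict else blocked >= MIN_BLOCKED
```

Each threshold is applied only to its own kind of list here, which is one possible reading of the rule above.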
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#851

Post by joachimt »

Your numbers sound reasonable. On the other hand, dealing with links that are only partly blocked is a bit annoying. We don't remove those links, so if you run your script you'll find a lot of links that were already dealt with last time. Checking all of those takes time, which I'd rather spend on other things. So personally I would prefer to just have the links that are dead for everyone.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#852

Post by monk-time »

joachimt wrote: ↑May 1st, 2021, 4:46 pm So personally I would prefer to just have the links that are dead for everyone.
Sure. That makes things a bit simpler on my side too. :thumbsup:
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#854

Post by joachimt »

Removed
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#855

Post by monk-time »

Markdown, CSV (click 'Raw' to download)

1405 dead links in all comments from users in Top-5000s by rank and by total checks, in total from 5989 users. The tool found and checked 11479 video links on youtube, vimeo and dailymotion (didn't find any on googlevideo), so 12.2% of the links were dead.

Btw, everyone is encouraged to scroll through the file and remove their comments themselves to make it easier for the mods.
Ebbywebby
Posts: 4080
Joined: September 10th, 2012, 6:00 am
Location: Orange County, CA
Contact:

#856

Post by Ebbywebby »

I removed mine.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#857

Post by joachimt »

Thanks, Bobby!
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#858

Post by joachimt »

I will message users on iCM asking to remove the comments themselves.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#859

Post by joachimt »

Thanks for the csv, btw. This format works a lot faster than the previous text format.

I've sent PMs to everyone with at least 10 dead links. I'll continue down to 5. I'll do the rest myself (unless the user posts here that they already did it), because sending a message for fewer than 5 comments hardly saves time.
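
For anyone who wants to do the same split outside Excel, here is a rough Python sketch. The column name `user` and the sample rows are assumptions for illustration only; the real CSV's layout isn't shown in this thread.

```python
import csv
import io
from collections import Counter


def split_by_threshold(csv_text, threshold=5, user_column="user"):
    """Count dead links per user and split users into two groups:
    those worth messaging (>= threshold) and those a mod handles directly."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[user_column] for row in rows)
    to_message = {u: n for u, n in counts.items() if n >= threshold}
    handle_directly = {u: n for u, n in counts.items() if n < threshold}
    return to_message, handle_directly


# Made-up sample data, only to show the shape of the input.
sample = "user,comment_url\n" + "alice,u\n" * 6 + "bob,u\n" * 2
to_message, handle_directly = split_by_threshold(sample)
```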
Harco
Donator
Posts: 649
Joined: May 3rd, 2013, 6:00 am
Location: Groningen
Contact:

#860

Post by Harco »

I deleted my three dead links.
:ICM: | :letbxd:
Fergenaprido
Donator
Posts: 5488
Joined: June 3rd, 2014, 6:00 am
Location: Canada
Contact:

#861

Post by Fergenaprido »

Thanks monk-time. I removed mine after Joachim PMed me. I left two up because they've been set to "private" as opposed to being deleted or the account being suspended, as they may become public again. I don't know if your script can differentiate that though.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#862

Post by monk-time »

Fergenaprido wrote: ↑May 2nd, 2021, 6:15 pm I left two up because they've been set to "private" as opposed to being deleted or the account being suspended, as they may become public again. I don't know if your script can differentiate that though.
Unfortunately it can't. The YouTube Data API just straight up returns no data for private or dead videos. The only thing I could do is switch from the API to parsing the whole page, but for such a script-heavy site that would be much slower and thornier than using the API.
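
To illustrate the limitation: when video IDs are sent to the API's `videos.list` endpoint, unavailable videos are simply absent from the `items` array of the response, so all a script can do is compare the IDs it asked for with the ones that came back. A rough sketch that works on an already-decoded response; the function name and the sample response are made up:

```python
def find_unavailable(requested_ids, api_response):
    """Return the requested video IDs that are missing from the response.

    A missing ID may mean deleted, private, or from a terminated account;
    the response itself gives no way to tell these apart.
    """
    returned = {item["id"] for item in api_response.get("items", [])}
    return [vid for vid in requested_ids if vid not in returned]


# Made-up response in the shape of a videos.list result.
response = {"items": [{"id": "aaa"}, {"id": "ccc"}]}
missing = find_unavailable(["aaa", "bbb", "ccc", "ddd"], response)
```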
MMDan
Posts: 221
Joined: January 10th, 2016, 7:00 am
Contact:

#863

Post by MMDan »

All my dead link comments are deleted. Thanks for the list joachimt, made it super easy.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#864

Post by joachimt »

MMDan wrote: ↑May 3rd, 2021, 3:29 am All my dead link comments are deleted. Thanks for the list joachimt, made it super easy.
Better thank monk-time for his wonderful script.


Messaging all those people really saves time. A lot have been dealt with already. Down to 902 dead links.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#865

Post by monk-time »

joachimt wrote: ↑May 2nd, 2021, 3:22 pm Thanks for the csv, btw. This format works a lot faster than the previous text format.
The workflow that I had in mind for the text format was to use a browser extension like Snap Links Plus/Linkclump to open multiple links at once with one mouse gesture and then go through tabs with Ctrl+Tab (next tab), Space (scroll down), click to delete a comment, and Ctrl+W (close tab):

Image

You can also drag the selection rectangle a bit further to also grab video links in case you want to double-check them.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#866

Post by joachimt »

But to keep track of what I have done, I need to delete lines. So with the previous format I used Linkclump to move everything to Excel.
Also, I changed all the urls to the beta, because the buttons to edit and delete comments are only on the beta. Using the new csv in Excel I could easily change that for everything. After that I added a tab with unique usernames and notes on which users I had messaged, which are inactive, etc... Also a counter. Having the data in Excel is the most useful in the end.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#867

Post by monk-time »

Oh, I see. I keep underestimating how useful Excel can be for tasks like this. Can you open multiple urls quickly from Excel too?
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#868

Post by joachimt »

Don't know if you can open multiple urls, but I don't think that would save any time. I make the links clickable, and that works fast enough.

I'm done with all the links from users with only one or two dead links. Half of the list is done (including the ones done by the owners of the comments).
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#869

Post by joachimt »

I'm done with everyone who didn't receive a PM. Also just did all the links of Armoreska. Down to 515 left. Need a break now.
joachimt
Donator
Posts: 33903
Joined: February 16th, 2012, 7:00 am
Location: Netherlands
Contact:

#870

Post by joachimt »

All is done.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#871

Post by monk-time »

joachimt wrote: ↑May 3rd, 2021, 6:11 pm All is done.
Wow. Great job! :poshclap:
Ebbywebby
Posts: 4080
Joined: September 10th, 2012, 6:00 am
Location: Orange County, CA
Contact:

#872

Post by Ebbywebby »

I am so envious of people who can do Javascript. I feel like I don't understand the process on the most fundamental level. Like, how do you set up a loop to examine every page of a website to produce data like this? Or to loop through your IMDb ratings or ICM checks? Or.... I can't get my head around that. I get loops with numbers or letters, but...pages? Rankings? How?

I'd love to find a great, free online course. And I sure could use this skill for all the "fixing" I do on ICM.
pitchorneirda
Posts: 850
Joined: February 11th, 2019, 12:07 pm
Location: France
Contact:

#873

Post by pitchorneirda »

If you get loops with numbers, you get loops with pages...

For i going from 1 to n, check https://www.url.com/page(i)

It will open https://www.url.com/page1 first, then scrape anything you like, then move on to https://www.url.com/page2 etc. until https://www.url.com/pagen
"Art is like a fire, it is born from the very thing it burns" - Jean-Luc Godard
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#874

Post by monk-time »

Ebbywebby wrote: ↑May 3rd, 2021, 7:34 pm I am so envious of people who can do Javascript. I feel like I don't understand the process on the most fundamental level. Like, how do you set up a loop to examine every page of a website to produce data like this? Or to loop through your IMDb ratings or ICM checks? Or.... I can't get my head around that. I get loops with numbers or letters, but...pages? Rankings? How?

I'd love to find a great, free online course. And I sure could use this skill for all the "fixing" I do on ICM.
The gist is that you first need to find something that has the stuff you want to loop through, like a page with a list of movies, for example. And let's say you want to, I don't know, get all comments for each movie on that list.

First you write the code that fetches that page and extracts a list of urls for all movies on there. For that you usually inspect the page with your browser's dev tools (F12) to find out how the page is structured and what the HTML elements that contain each movie look like. Then you write a CSS selector that grabs exactly those tags and nothing else. Plug it into your code after you have fetched the page from the web - and boom, you have a list that you can loop over, where every element is an HTML tag. :) And in your loop, for each element you'd want to extract the url (again, a matter of looking at the HTML and figuring out where the data you need is stored; in this case it's the 'href' attribute of an anchor tag).

Now you have a list of urls, and with it the code that takes a movie list url and returns a list of movie urls. And from this point it's just more of the same: you write another piece of code that takes a movie url and extracts whatever you want, in this case a list of comments. You know how to construct a url that opens movie comments (slap /comments/ onto the end of a url), so all that is left is to repeat exactly the same steps as above, this time picking different data from a different page.
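
The steps so far could look roughly like this in Python. To keep the sketch dependency-free it uses the standard library's `html.parser` instead of a CSS-selector library, and it parses an inline HTML string instead of fetching a live page. The markup (a `title` class on movie links, relative hrefs) is an assumption, not iCM's real page structure:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.icheckmovies.com"


class MovieLinkParser(HTMLParser):
    """Collect href values of <a> tags carrying an assumed CSS class."""

    def __init__(self, css_class="title"):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def movie_urls(list_page_html):
    """Take the HTML of a list page, return absolute movie urls."""
    parser = MovieLinkParser()
    parser.feed(list_page_html)
    return [urljoin(BASE, href) for href in parser.hrefs]


def comments_url(movie_url):
    """Append /comments/ to a movie url."""
    return movie_url.rstrip("/") + "/comments/"


# Made-up list page, standing in for a fetched one.
html = """
<ul>
  <li><a class="title" href="/movies/first+film/">First Film</a></li>
  <li><a class="title" href="/movies/second+film/">Second Film</a></li>
  <li><a href="/about/">About</a></li>
</ul>
"""
urls = [comments_url(u) for u in movie_urls(html)]
```

In a real script, the `html` string would come from a fetched page, and each url in `urls` would be fetched and parsed the same way.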

You might want to add some code to check if there's more than 1 page of comments and fetch and parse those too, but the gist always stays the same - take/construct a url, fetch it, parse it, grab a list of HTML elements, extract data from them. Rinse and repeat. Incidentally this is exactly how one tiny script I wrote for my tool works (ignore the last two lines). 50 lines is enough code to find out e.g. that only 7 people left more than 20 comments each on shorts from the list iCheckMovies' Most Favorite Shorts:

Code: Select all

('ClassicLady', 54), ('Emiam', 47), ('John Milton', 34), ('nicolaskrizan', 33), ('greenhorg', 26), ('frankqb', 25), ('Dieguito', 22)
Btw, from my experience for tasks like this (fetching pages from different websites) it's easier to use Python than JS+Node. Python has a ton of helpful libraries that can do everything you need for scraping (like "requests" for fetching stuff from the net, and "beautifulsoup4" for parsing HTML). And there are tons of "scraping with Python" tutorials around, like this one from a very popular intro book to Python.
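
The tally of commenters shown earlier in this post is the kind of result `collections.Counter` produces. A small sketch, assuming the scraping step has already yielded one username per comment (the names below are invented, and this is not the linked script itself):

```python
from collections import Counter


def prolific_commenters(usernames, min_comments=20):
    """Return (user, count) pairs for users with more than min_comments,
    most prolific first."""
    counts = Counter(usernames)
    return [(u, n) for u, n in counts.most_common() if n > min_comments]


# Invented data: one entry per scraped comment.
comments = ["ann"] * 25 + ["ben"] * 21 + ["cat"] * 20 + ["dan"] * 3
top = prolific_commenters(comments)
```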
Ebbywebby
Posts: 4080
Joined: September 10th, 2012, 6:00 am
Location: Orange County, CA
Contact:

#875

Post by Ebbywebby »

Hmmm. Well, see, going to a single page and extracting elements from it makes sense to me. Sure. But how do you tell the program to move onto OTHER pages to examine? You can't just "add 1" to the current URL and continue. How do you construct a loop that's going to examine every movie page on ICM? I guess there's a way for the process to go to "All Movies" and then proceed through the 32,300 pages? (With 25 films to examine per page?)
pitchorneirda
Posts: 850
Joined: February 11th, 2019, 12:07 pm
Location: France
Contact:

#876

Post by pitchorneirda »

Ebbywebby wrote: ↑May 3rd, 2021, 9:08 pm Hmmm. Well, see, going to a single page and extracting elements from it makes sense to me. Sure. But how do you tell the program to move onto OTHER pages to examine? You can't just "add 1" to the current URL and continue. How do you construct a loop that's going to examine every movie page on ICM? I guess there's a way for the process to go to "All Movies" and then proceed through the 32,300 pages? (With 25 films to examine per page?)
Yes you can! Let's say in Python: the first two lines that would go to "All Movies" and then proceed through the 32,300 pages are like this:

Code: Select all

# Loop over page numbers 1..32300 (range's upper bound is exclusive)
for i in range(1, 32301):
    URL = "https://www.icheckmovies.com/movies/?page={}".format(i)
    # ... fetch and scrape URL here ...
i is gonna take every value from 1 to 32300. The "format" method replaces the {} in the string with the current value of i.
So during the first iteration, i = 1, my scraper will open https://www.icheckmovies.com/movies/?page=1, do the scraping, and then when it's finished, i takes 2 as value, my scraper will open https://www.icheckmovies.com/movies/?page=2, do the scraping, etc. etc.
Ebbywebby
Posts: 4080
Joined: September 10th, 2012, 6:00 am
Location: Orange County, CA
Contact:

#877

Post by Ebbywebby »

pitchorneirda wrote: ↑May 3rd, 2021, 9:41 pm
Ebbywebby wrote: ↑May 3rd, 2021, 9:08 pm Hmmm. Well, see, going to a single page and extracting elements from it makes sense to me. Sure. But how do you tell the program to move onto OTHER pages to examine? You can't just "add 1" to the current URL and continue. How do you construct a loop that's going to examine every movie page on ICM? I guess there's a way for the process to go to "All Movies" and then proceed through the 32,300 pages? (With 25 films to examine per page?)
Yes you can! Let's say in Python: the first two lines that would go to "All Movies" and then proceed through the 32,300 pages are like this:

Code: Select all

for i in range(1,32301):
    URL = "https://www.icheckmovies.com/movies/?page={}".format(i)
    [...] your stuff [...]
i is gonna take every value from 1 to 32300. The "format" function replaces the {} in the string by the current value of i.
So during the first iteration, i = 1, my scraper will open https://www.icheckmovies.com/movies/?page=1, do the scraping, and then when it's finished, i takes 2 as value, my scraper will open https://www.icheckmovies.com/movies/?page=2, do the scraping, etc. etc.
All right. Now we're getting somewhere. THAT, I understand.

But I thought all these scripts people share are in Javascript. So many are in Python?
pitchorneirda
Posts: 850
Joined: February 11th, 2019, 12:07 pm
Location: France
Contact:

#878

Post by pitchorneirda »

I just took the example in Python since it's the language I work with the most, but it's possible to do something very similar with Javascript. I don't have the exact syntax, but it shouldn't be that different.
monk-time
Posts: 1425
Joined: March 23rd, 2015, 6:00 am
Contact:

#879

Post by monk-time »

Ebbywebby wrote: ↑May 3rd, 2021, 10:14 pm But I thought all these scripts people share are in Javascript. So many are in Python?
It's a bit trickier to do this with JS than with Python. With Python you get a script as a file and you run it with Python's interpreter. With JS you need to either use Node.js and its interpreter (I haven't used it at all, but it's definitely possible; I don't know if the libraries are as powerful, though) or a userscript for your browser that uses GM_xmlhttpRequest for requests to domains other than the one you launch the script from. Python and Node.js are both fine choices; a userscript is more awkward but still doable.

Python seems like the go-to language for web scraping though, with tons of powerful libraries, like Selenium if you need to do complex scraping from a JS-heavy site.
Torgo
Posts: 2955
Joined: June 30th, 2011, 6:00 am
Location: Germany
Contact:

#880

Post by Torgo »
