
Python crawler for downloading pure girl pictures

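Before running the code, install the third-party packages it depends on: requests, beautifulsoup4 (imported as bs4), and lxml, which the code uses as the HTML parser. The os and time modules are part of the Python standard library and need no installation.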
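As a quick pre-flight check, the sketch below reports which of the required modules cannot be found on the current system. The helper name missing_modules is hypothetical and not part of the original script; it relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found on this system."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == '__main__':
    # bs4 is the import name of the beautifulsoup4 package.
    missing = missing_modules(['bs4', 'requests', 'lxml'])
    if missing:
        print('Please install: ' + ', '.join(missing))
    else:
        print('All required modules are available.')
```

If anything is reported as missing, install it with your usual package manager before running the crawler.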

from bs4 import BeautifulSoup
import requests
import time
import os

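def get_html(url):
    """Fetch a page and return its text, or None if the request fails."""
    try:
        response = requests.get(url)
        # Decode the page as gb2312, the encoding this code assumes for the site.
        response.encoding = 'gb2312'
        if response.status_code == 200:
            print('Successfully connected! URL is ' + url)
            return response.text
    except requests.RequestException:
        return None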

def get_url_and_name(url):
    """Take the URL of a list page and return a two-element list: element 1 holds the gallery links, element 2 holds the gallery names."""
    html = get_html(url)
    soup = BeautifulSoup(html, 'lxml')
    name = []
    url_1 = []
    list2 = soup.find_all(class_='t')
    sign = 1
    for item in list2:
        # The 1st and 42nd matches are skipped (presumably they are not gallery entries).
        if sign != 1 and sign != 42:
            url_temp = item.find('a').get('href')
            name_temp = item.find(class_='title').find('a').get('title')
            url_1.append(url_temp)
            name.append(name_temp)
        sign = sign + 1
    temp = [url_1, name]
    return temp

def get_pic_url(url):
    """Take the URL of a gallery and return the links to all images in that gallery."""
    address = []
    html1 = get_html(url)
    soup = BeautifulSoup(html1, 'lxml')
    list4 = soup.find(class_='page').find_all('a')
    temp = 1
    while temp < len(list4):
        # Page 1 is the gallery URL itself; page n is the same URL with '_n' before '.html'.
        if temp == 1:
            url_3 = url
        else:
            url_3 = url.replace('.html', '_' + str(temp) + '.html')
        temp = temp + 1
        html2 = get_html(url_3)
        soup1 = BeautifulSoup(html2, 'lxml')
        list3 = soup1.find(class_='content').find_all('img')
        for item in list3:
            address.append(item.get('src'))
    return address
    
def pic_download(url, name, path):
    """url is the list of all image links for one gallery, name is the gallery name, and path is the download directory."""
    folder = os.path.join(path, name)
    # os.mkdir raises an error if the folder already exists, so the folder must not exist beforehand.
    os.mkdir(folder)
    print('The gallery being downloaded is named ' + name)
    index = 1
    for i1 in url:
        filename = os.path.join(folder, str(index) + '.jpg')
        with open(filename, 'wb') as f:
            img = requests.get(i1).content
            f.write(img)
        index += 1
        # Pause between downloads.
        time.sleep(2)
    print(name + ' download completed!')

def main(i):
    # i is the page number of the gallery list page to crawl
    url = 'https://www.keke234.com/gaoqing/list_5_' + str(i) + '.html'
    # path is the download directory; change it to suit your machine
    path = r'H:\autoDownLoadPictures\savePicture'
    information = get_url_and_name(url)
    num = 0
    for item in information[0]:
        address = get_pic_url(item)
        pic_download(address, information[1][num], path)
        num = num + 1

if __name__ == '__main__':
    for i in range(1, 2):
        main(i)
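
The page-URL scheme that get_pic_url relies on can be isolated into a small pure function, which makes it easy to check without touching the network. This is only a sketch: the helper name gallery_page_urls and the example URL are hypothetical, and, like the original code, it assumes the gallery URL ends in '.html'.

```python
def gallery_page_urls(url, page_count):
    """Return the URLs of pages 1..page_count of a gallery.

    Page 1 is the gallery URL itself; page n (n >= 2) inserts '_n' before
    '.html', mirroring the url.replace(...) call in get_pic_url.
    """
    urls = []
    for page in range(1, page_count + 1):
        if page == 1:
            urls.append(url)
        else:
            urls.append(url.replace('.html', '_' + str(page) + '.html'))
    return urls


if __name__ == '__main__':
    # Hypothetical gallery URL, used for illustration only.
    for page_url in gallery_page_urls('https://example.com/gaoqing/123.html', 3):
        print(page_url)
```

In get_pic_url the number of pages visited is len(list4) - 1, where list4 holds the links found inside the element with class 'page', so that value would be passed as page_count.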
