How to download pictures from Baidu Image with Python

With the script below, you can download the pictures that Baidu Image returns for a keyword into a folder of your choice.


# coding: utf-8
import requests
import os
 
def getManyPages(keyword,pages):
    params=[]
    for i in range(30,30*pages+30,30):
        params.append({
                      'tn': 'resultjson_com',
                      'ipn': 'rj',
                      'ct': 201326592,
                      'is': '',
                      'fp': 'result',
                      'queryWord': keyword,
                      'cl': 2,
                      'lm': -1,
                      'ie': 'utf-8',
                      'oe': 'utf-8',
                      'adpicid': '',
                      'st': -1,
                      'z': '',
                      'ic': 0,
                      'word': keyword,
                      's': '',
                      'se': '',
                      'tab': '',
                      'width': '',
                      'height': '',
                      'face': 0,
                      'istype': 2,
                      'qc': '',
                      'nc': 1,
                      'fr': '',
                      'pn': i,
                      'rn': 30,
                      'gsm': '1e',
                      '1488942260214': ''
                  })
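    url = 'https://image.baidu.com/search/acjson'
    urls = []  # one list of result entries per requested page
    for i in params:
        # Each response is JSON; its 'data' field holds the picture entries.
        urls.append(requests.get(url,params=i).json().get('data'))

    return urls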
 
 
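def getImg(dataList, localPath):
    # Create the target folder, including any missing parent folders.
    if not os.path.exists(localPath):
        os.makedirs(localPath)

    x = 0
    for page in dataList:
        for i in page or []:  # a page is None if the response had no 'data'
            if i.get('thumbURL') is not None:
                print('downloading %s' % i.get('thumbURL'))
                ir = requests.get(i.get('thumbURL'))
                # os.path.join works whether or not localPath ends with a separator.
                with open(os.path.join(localPath, '%d.jpg' % x), 'wb') as f:
                    f.write(ir.content)
                x += 1
            else:
                print("The picture link does not exist")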
 
if __name__ == '__main__':
    dataList = getManyPages('Specific pictures to download',10)  # param 1: keyword, param 2: number of pages to download
    getImg(dataList,'folder\\to\\store\\the\\picture\\') # param 2: the folder to save the pictures in

In the second-to-last line, replace 'Specific pictures to download' with the keyword for the pictures you want, and in the last line change the path to the folder where the images should be saved. Save the script as downloadPictures.py, open a command console, and run python downloadPictures.py. The pictures will be downloaded automatically to the specified directory.
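
The paging and file-naming logic of the script can be checked without making any network requests. The following is only a minimal sketch: it assumes that 'pn' is the result offset and 'rn' the page size, as the parameter values in the script suggest, and the helper names page_offsets and target_path are hypothetical and not part of the original script.

```python
import os

PAGE_SIZE = 30  # matches 'rn': 30 in the script


def page_offsets(pages, page_size=PAGE_SIZE):
    """Return the 'pn' values the script requests for the given number of pages."""
    # Same range as in getManyPages: it starts at page_size, not at 0.
    return list(range(page_size, page_size * pages + page_size, page_size))


def target_path(folder, index):
    """Build the output file name for the index-th downloaded picture."""
    return os.path.join(folder, '%d.jpg' % index)


print(page_offsets(3))         # [30, 60, 90]
print(target_path('pics', 0))  # e.g. pics/0.jpg (separator depends on the OS)
```

Because the range starts at 30, the first request is sent with pn=30. If 'pn' really is a zero-based offset, which is an assumption about the endpoint and not something the script confirms, starting the range at 0 would also fetch the first 30 results.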
