""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
"""
import urlparse
import urllib

__all__ = ["RobotFileParser"]


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urlparse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        opener = URLopener()
        f = opener.open(self.url)
        lines = [line.strip() for line in f]
        f.close()
        self.errcode = opener.errcode
        if self.errcode in (401, 403):
            self.disallow_all = True
        elif self.errcode >= 400:
            self.allow_all = True
        elif self.errcode == 200 and lines:
            self.parse(lines)

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last; the first one wins
            if self.default_entry is None:
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """parse the input lines from a robots.txt file.
           We allow that a user-agent: line is not preceded by
           one or more blank lines."""
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        linenumber = 0
        entry = Entry()

        for line in lines:
            linenumber += 1
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # search for given user agent matches; the first match counts
        parsed_url = urlparse.urlparse(urllib.unquote(url))
        url = urlparse.urlunparse(('', '', parsed_url.path,
            parsed_url.params, parsed_url.query, parsed_url.fragment))
        url = urllib.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def __str__(self):
        return ''.join([str(entry) + "\n" for entry in self.entries])


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        self.path = urllib.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return (self.allowance and "Allow" or "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""
    def __init__(self):
        self.useragents = []
        self.rulelines = []

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.extend(["User-agent: ", agent, "\n"])
        for line in self.rulelines:
            ret.extend([str(line), "\n"])
        return ''.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True


class URLopener(urllib.FancyURLopener):
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
        self.errcode = 200

    def prompt_user_passwd(self, host, realm):
        # A password-protected robots.txt is treated as if it were absent.
        return None, None

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        self.errcode = errcode
        return urllib.FancyURLopener.http_error_default(self, url, fp, errcode,
                                                        errmsg, headers)