SpiderFoot: OSINT collection and reconnaissance tool (github.com/smicallef)
272 points by axiomdata316 on Jan 31, 2020 | 26 comments


I'm going to offer an opinion not matching the groupthink here.

I think this is a great project. Many of us gather intelligence on targets and threats by hand because we're used to it, it's the way we've always done it, and it works.

Automation is more thorough and more complete, and it speeds up the process, allowing more searches in less time.

Sure, the software isn't perfect, but it's better than almost anything else in terms of being more of a product and less of a toolkit. It's open source, so if you think a data source like Facebook is missing, feel free to add it in.

Well done, I say.


Yes, I still mine data by hand, usually inserted directly into SQLite using DataGrip. And if I use a script, it's bespoke, specifically crafted for a single site (or single page) ;)
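
To give a flavour, a "bespoke script" for me is usually nothing more than this (URL, table name and selector are all invented for the example):

    import sqlite3

    import requests
    from bs4 import BeautifulSoup

    # Throwaway scrape of a single page, straight into SQLite.
    conn = sqlite3.connect("osint.db")
    conn.execute("CREATE TABLE IF NOT EXISTS findings (url TEXT, label TEXT)")

    page = requests.get("https://example.com/target-page", timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")

    for link in soup.select("a[href]"):
        conn.execute("INSERT INTO findings VALUES (?, ?)",
                     (link["href"], link.get_text(strip=True)))
    conn.commit()
    conn.close()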

I applaud this effort as well. The web is a chaotic soup of unstructured data, and this is a first step towards agents that can surf the web for us and answer complex queries posed in natural human language!


Appreciate it, thanks!


Yeah, this looks pretty awesome! I ran it against one of my own domains and got a fair number of false positives and a flood of information, but the data is really interesting!


I've used and set up SpiderFoot for others. It works fine and usually does enough that it helps more than hurts.


Thanks... it’s been a ton of work over the years, with valuable contributions from others as well, so I’m really glad to see it’s appreciated.

If there’s any pain point you experience with SpiderFoot, however, I’d be happy to hear about it.


I am new to OSINT and this looks like a really cool way to get started.

Maybe this exists and I missed it, but it would be nice to have a way to avoid running scans when an API key is missing. Even better would be an easy way to link to instructions for obtaining missing API keys; for example, each module could have a metadata slot for an 'API key generation URL', along the lines of the sketch below. It would be a lot of work, so I understand if it's not on the roadmap.
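
Purely as a hypothetical sketch of the idea (SpiderFoot's actual module format is surely different):

    # Hypothetical metadata slot; not SpiderFoot's real module format.
    meta = {
        "name": "ExampleService lookup",
        "apiKeyInstructionsUrl": "https://example-service.invalid/docs/api-keys",
    }

    def pre_scan_check(opts):
        # Refuse to start a scan without a key, and point the user at
        # the key-generation instructions instead.
        if not opts.get("api_key"):
            raise RuntimeError("Missing API key; generate one at "
                               + meta["apiKeyInstructionsUrl"])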

Thanks for your hard work on this project.


I am excited to get into this.

Anecdotal disclaimer: I have been both an OSINT analyst and a developer supporting OSINT work in the past. I understand completely why OSINT analysts like to do things by hand - they can understand where the data comes from; they can cite sources as they go; they can tailor things specifically to their use case. Understanding the process is indeed important to knowing how we got to the conclusion.

And yet, we're frustrated when there's a panic situation and we need a report of any kind in two hours. We're frustrated when we're handed sixteen targets all at once and have to suss them out over the weekend. We're frustrated that visualizations suck and we have to custom-make them ourselves. Better tooling, an opinionated process, and automation help solve all of that, or at least advance things a bit.

You can't have anything perfect (entity resolution sits at #1 on the imperfect list), but putting together a tool as well thought out as this one is an excellent step forward. Kudos, @smicallef.


So happy to hear it helps, thanks a lot!


Author of the project (not the OP) here. Must say it was quite a surprise to see this land in my HN feed today! I’ll do my best to answer the points raised below, but if you have any questions or further feedback, I’m glad to hear it!


It is nice to see OSINT receive more tooling, but I shudder at the 'most complete' hyperbole. For starters, this appears to be almost exclusively centered on web-domain OSINT. But what about reverse image searches? Government records searches? Video analysis? None of these domains appears to be represented at all, to name just a few. Even within web analysis, the tool is missing critical queries such as historic whois and leaked-database lookups.


Actually, you can target a bunch of things beyond domains, including IPs, usernames, phone numbers and more. And historic whois and leak-database modules are indeed there. In some cases you need API keys, though most services offer free tiers for low volumes.

But yes, it’s not covering some of the other sources you mentioned... yet.


For something that claims comprehensiveness, I am somewhat surprised by the list of sources. It’s almost entirely restricted to the web domain, i.e. domain/IP/email data.

That’s understandable, considering how any data you glean can immediately be fed into another run of all these tools. But I was under the impression that "OSINT" has a few more data sources connecting it to the "real world"? Company data, published books, or the court system come to mind.

(Sorry if I just missed some)


Company data is partly there, via OpenCorporates’ API. The tool originated with a smaller scope and has grown over 8 years of development. I can imagine how different it will look in another few years.


The syntax similarity that Recon-NG shares with MSF is really what is keeping me from trying this tool.

Swiss-knife projects like this are great, but if you don't provide a relatively intuitive set of commands for flipping through the various modules and interfaces, you're going to lose a lot of your userbase.

Simply put, no one wants to learn how to use a new tool if they already know the input and expected output and can't leverage existing navigational habits.


I hadn’t even considered that people would need MSF compatibility, so I’ll take that into consideration for a future release. The sfcli.py CLI was a starting point for that kind of functionality, but it isn’t MSF-compatible. Thanks for the feedback though!


Jack of all trades...?

I've not tested SpiderFoot to see how it compares to other, more specialised OSINT tools in their specific areas, but there is a line of thinking that would suggest an amalgamation of tools might fare better.


It is kind of an amalgamation of tools. Install it, sign up for the dozens and dozens of different external services it queries, and configure those API keys in SpiderFoot. Then fire away. It is really good, but some of the better visualizations, sorting options, reporting and diffing come with the premium SaaS version, SpiderFoot HX.


It scans Wikipedia for edits from a specific IP, but doesn't scan Facebook for accounts matching names?

This tool seems to do some very niche things and miss out on some very big things...


It actually does search for social media accounts linked to a name or username, but does so using Google/Bing APIs and not Facebook’s.
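
Very roughly, the idea is a site-scoped query through a search engine API. A simplified illustration only (this uses Bing's Web Search v7 endpoint; it is not SpiderFoot's actual module code):

    import requests

    # Look for accounts on a given site via a site-scoped search query.
    # Simplified illustration; the real modules differ.
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": "YOUR_BING_KEY"},
        params={"q": 'site:facebook.com "target_username"'},
        timeout=10,
    )
    for hit in resp.json().get("webPages", {}).get("value", []):
        print(hit["name"], hit["url"])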


RDAP. Whois is so 1990s.


Indeed, whois sucks for parsing and has been losing value as an OSINT source since GDPR. I’ll take a look at RDAP.
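
For anyone curious, RDAP is just structured JSON over HTTP, which already makes it far friendlier to parse than free-form whois output. A quick sketch using the rdap.org bootstrap redirector:

    import requests

    # rdap.org redirects to the authoritative RDAP server for the domain.
    data = requests.get("https://rdap.org/domain/example.com",
                        timeout=10).json()
    print(data.get("ldhName"))              # the domain name itself
    for event in data.get("events", []):    # registration, expiry, etc.
        print(event.get("eventAction"), event.get("eventDate"))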


Does it compare to Palantir?


"There's something fascinating about hairy little spider feet. They look like they belong on dogs. Or maybe even cats.

Recently, images of hairy spider "paws" have circulated on social media with people oohing and aahing about how cute they are and how much they resemble furry pet appendages."

If you haven’t googled spider feet, do it.


>The black "hairs" on the Araniella villanii spider are innervated, meaning they are sensory organs, much like a cat's whisker.

Source: https://www.livescience.com/newly-discovered-math-spider.htm...


Inside one of the plugins I found this code:

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        for opt in list(userOpts.keys()):
            self.opts[opt] = userOpts[opt]
And I'm left wondering... since the class that implements this method has an opts dictionary as a class attribute, why didn't the author write a simple self.opts.update(userOpts) that takes care of it? It does exactly the same thing, and is faster and clearer.
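
For comparison, the whole loop collapses to one line (assuming self.opts is a plain dict, as it appears to be):

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        # Same effect as iterating userOpts key by key:
        self.opts.update(userOpts)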

That in itself made me lose a lot of the interest I had in this project. I'll still pass it around to a few colleagues who work in infosec to see what they think of the tool from a practical point of view.

I'm also a bit confused that the author says it's Python 3, yet there are a lot of idioms around that point to 2/3 compatibility. I can understand that the project might have started with a 2/3 mentality and then progressed. However, some of these idioms not only affect readability but can introduce side effects (anywhere from annoyances to bugs).



