SpiderFoot: OSINT collection and reconnaissance tool (github.com/smicallef)
272 points by axiomdata316 on Jan 31, 2020 | 26 comments


I'm going to offer an opinion not matching the groupthink here.

I think this is a great project. Many of us gather intelligence on targets and threats by hand because we're used to it, it's the way we've always done it, and it works.

Automation is more thorough and more complete, and it speeds up the process, allowing more searches in less time.

Sure, the software isn't perfect, but it's better than almost anything else in terms of being more of a product and less of a toolkit. It's open source, so if you think a data source like Facebook is missing, feel free to add it in.

Well done, I say.


Yes, I still mine data by hand, usually inserted directly into SQLite using DataGrip. And if I use a script, it's bespoke, specifically crafted for a single site (or single page) ;)
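
To give a flavour, a "bespoke script" for me is usually nothing more than this (URL, table name and selector are all invented for the example):

    import sqlite3

    import requests
    from bs4 import BeautifulSoup

    # Throwaway scrape of a single page, straight into SQLite.
    conn = sqlite3.connect("osint.db")
    conn.execute("CREATE TABLE IF NOT EXISTS findings (url TEXT, label TEXT)")

    page = requests.get("https://example.com/target-page", timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")

    for link in soup.select("a[href]"):
        conn.execute("INSERT INTO findings VALUES (?, ?)",
                     (link["href"], link.get_text(strip=True)))
    conn.commit()
    conn.close()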

I applaud this effort as well. The web is a chaotic soup of unstructured data, and this is a first step towards agents that can surf the web for us and answer complex queries posed in natural human language!


Appreciate it, thanks!


Yeah, this looks pretty awesome! I ran it against one of my own domains and got a fair number of false positives and a flood of information, but the data is really interesting!


I've used and set up SpiderFoot for others. It works fine and usually does enough that it helps more than hurts.


Thanks... it’s been a ton of work over the years, with valuable contributions from others as well, so I’m really glad to see it’s appreciated.

If there’s any pain point you experience with SpiderFoot, however, I’d be happy to hear about it.


I am new to OSINT and this looks like a really cool way to get started.

Maybe this exists and I missed it, but it would be nice to have a way to avoid running scans when an API key is missing. Even better would be an easy way to link to instructions for obtaining missing API keys; for example, each module could have a metadata slot for an 'API key generation URL', along the lines of the sketch below. It would be a lot of work, so I understand if it's not on the roadmap.
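
Purely as a hypothetical sketch of the idea (SpiderFoot's actual module format is surely different):

    # Hypothetical metadata slot; not SpiderFoot's real module format.
    meta = {
        "name": "ExampleService lookup",
        "apiKeyInstructionsUrl": "https://example-service.invalid/docs/api-keys",
    }

    def pre_scan_check(opts):
        # Refuse to start a scan without a key, and point the user at
        # the key-generation instructions instead.
        if not opts.get("api_key"):
            raise RuntimeError("Missing API key; generate one at "
                               + meta["apiKeyInstructionsUrl"])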

Thanks for your hard work on this project.


I am excited to get into this.

Anecdotal disclaimer: I have been both an OSINT analyst and a developer supporting OSINT work in the past. I understand completely why OSINT analysts like to do things by hand - they can understand where the data comes from; they can cite sources as they go; they can tailor things specifically to their use case. Understanding the process is indeed important to knowing how we got to the conclusion.

And yet, we're frustrated when there's a panic situation and we need a report of any kind in two hours. We're frustrated when we're handed sixteen targets all at once and have to suss them out over the weekend. We're frustrated that visualizations suck and we have to custom-make them ourselves. Better tooling, an opinionated process, and automation help solve all of that, or at least advance things a bit.

You can't have anything perfect (entity resolution sits at #1 on the imperfect list), but putting together a tool as well thought out as this one is an excellent step forward. Kudos, @smicallef.


So happy to hear it helps, thanks a lot!


Author of the project (not the OP) here. Must say it was quite a surprise to see this land in my HN feed today! I’ll do my best to answer the points raised below, but if you have any questions or further feedback, I’m glad to hear it!


It is nice to see OSINT receive more tooling, but I shudder at the 'most complete' hyperbole. For starters, this appears to be almost exclusively centered on web-domain OSINT. But what about reverse image searches? Government records searches? Video analysis? None of these domains appears to be represented at all, to name just a few. Even within web analysis, the tool is missing critical queries such as historic whois and leaked-database lookups.


Actually, you can target a bunch of things beyond domains, including IPs, usernames, phone numbers and more. And historic whois and leak-database modules are indeed there. In some cases you need API keys, though most services offer free tiers for low volumes.

But yes, it’s not covering some of the other sources you mentioned... yet.


For something that claims comprehensiveness, I am somewhat surprised by the list of sources. It’s almost entirely restricted to the web domain, i.e. domain/IP/email data.

That’s understandable, considering how any data you glean can immediately be fed into another run of all these tools. But I was under the impression that "OSINT" has a few more data sources connecting it to the "real world"? Company data, published books, or the court system come to mind.

(Sorry if I just missed some)


Company data is partly there, via OpenCorporates’ API. The tool originated with a smaller scope and has grown over 8 years of development. I can imagine how different it will look in another few years.


The syntax similarity that Recon-NG shares with MSF is really what is keeping me from trying this tool.

Swiss-knife projects like this are great, but if you don't provide a relatively intuitive set of commands for flipping through the various modules and interfaces, you're going to lose a lot of your userbase.

Simply put, no one wants to learn how to use a new tool if they already know the input and expected output and can't leverage existing navigational habits.


I hadn’t even considered that people would need MSF compatibility, so I’ll take that into consideration for a future release. The sfcli.py CLI was a starting point for that kind of functionality, but it isn’t MSF-compatible. Thanks for the feedback though!


Jack of all trades...?

I've not tested SpiderFoot to see how it compares to other, more specialised OSINT tools in their specific areas, but there is a line of thinking that would suggest an amalgamation of tools might fare better.


It is kind of an amalgamation of tools. Install it, sign up for the dozens and dozens of different external services it queries, and configure those API keys in SpiderFoot. Then fire away. It is really good, but some of the better visualizations, sorting options, reporting and diffing come with the premium SaaS version, SpiderFoot HX.


It scans Wikipedia for edits from a specific IP, but doesn't scan Facebook for accounts matching names?

This tool seems to do some very niche things and miss out on some very big things...


It actually does search for social media accounts linked to a name or username, but does so using Google/Bing APIs and not Facebook’s.
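
Very roughly, the idea is a site-scoped query through a search engine API. A simplified illustration only (this uses Bing's Web Search v7 endpoint; it is not SpiderFoot's actual module code):

    import requests

    # Look for accounts on a given site via a site-scoped search query.
    # Simplified illustration; the real modules differ.
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": "YOUR_BING_KEY"},
        params={"q": 'site:facebook.com "target_username"'},
        timeout=10,
    )
    for hit in resp.json().get("webPages", {}).get("value", []):
        print(hit["name"], hit["url"])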


RDAP. Whois is so 1990s.


Indeed, whois sucks for parsing and has been losing value as an OSINT source since GDPR. I’ll take a look at RDAP.
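
For anyone curious, RDAP is just structured JSON over HTTP, which already makes it far friendlier to parse than free-form whois output. A quick sketch using the rdap.org bootstrap redirector:

    import requests

    # rdap.org redirects to the authoritative RDAP server for the domain.
    data = requests.get("https://rdap.org/domain/example.com",
                        timeout=10).json()
    print(data.get("ldhName"))              # the domain name itself
    for event in data.get("events", []):    # registration, expiry, etc.
        print(event.get("eventAction"), event.get("eventDate"))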


Does it compare to Palantir?


"There's something fascinating about hairy little spider feet. They look like they belong on dogs. Or maybe even cats.

Recently, images of hairy spider "paws" have circulated on social media with people oohing and aahing about how cute they are and how much they resemble furry pet appendages."

If you haven’t googled spider feet, do it.


>The black "hairs" on the Araniella villanii spider are innervated, meaning they are sensory organs, much like a cat's whisker.

Source: https://www.livescience.com/newly-discovered-math-spider.htm...


Inside one of the plugins I found this code:

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        for opt in list(userOpts.keys()):
            self.opts[opt] = userOpts[opt]
And I'm left wondering... since the class that implements this method has an opts dictionary as a class attribute, why didn't the author write a simple self.opts.update(userOpts) that takes care of it? It does exactly the same thing, and is faster and clearer.
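
For comparison, the whole loop collapses to one line (assuming self.opts is a plain dict, as it appears to be):

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        # Same effect as iterating userOpts key by key:
        self.opts.update(userOpts)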

That in itself made me lose a lot of the interest I had in this project. I'll still pass it around to a few colleagues who work in infosec to see what they think of the tool from a practical point of view.

I'm also a bit confused that the author says it's Python 3, yet there are a lot of idioms around that point to 2/3 compatibility. I can understand that the project might have started with a 2/3 mentality and then progressed. However, some of these idioms not only affect readability but can introduce side effects (anywhere from annoyances to bugs).



