# Scanning IPs from selected log files



## toprank (Feb 2, 2018)

How do I scan files for unique IP addresses where the IP may not be the first field in each line? Fortunately, httpd-access.log has IPs as the first field so `awk '{if (!unique[$1]++) {print $1}}' /var/log/httpd-access.log` works, but how do I do this for something like auth.log where the IPs are elsewhere?


----------



## SirDice (Feb 2, 2018)

"Userland programming and scripting" is probably a better place for this, thread moved.

In general I use Perl for things like this, especially for the combination of log files and some clever regular expressions. Admittedly I'm quite used to Perl, having used it for a number of years. Still, I think Perl is ideal for this type of situation; it's often expanded as "Practical Extraction and Report Language" for a reason, and it really excels at tasks like this.


----------



## linux->bsd (Feb 3, 2018)

As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: `grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq`. Probably best not to expand that to support IPv6.
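
As a rough sketch of one possible refinement: the pattern above also accepts out-of-range strings such as 999.1.1.1, so one option is to post-filter the matches on octet range. The function name `extract_ipv4` is made up for illustration, and the sketch assumes a grep that supports `-o` and `-h` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the distinct IPv4-looking
# strings found anywhere in the given files (or on stdin), sorted.
extract_ipv4() {
    # -o prints each match on its own line;
    # -h drops file-name prefixes when several files are given.
    grep -Eho '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$@" |
        # Keep only candidates whose four octets are all <= 255.
        awk -F. '$1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255' |
        # sort -u gives the same result as sort | uniq.
        LC_ALL=C sort -u
}

# Example: extract_ipv4 /var/log/auth.log
```

This is still only a heuristic: it will also pick up version numbers that happen to look like addresses, and it can match the tail of a longer digit run.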


----------



## fullauto2012 (Feb 3, 2018)

```
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
```
This will output only the unique IPs in that file...


----------



## toprank (Feb 3, 2018)

linux->bsd said:


> As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: `grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq`. Probably best not to expand that to support IPv6.



Thank you. This worked perfectly!



fullauto2012 said:


> ```
> cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
> ```
> This will output only the unique IPs in that file...



Thank you. This worked, too, but printed out duplicates.


----------



## Nicola Mingotti (Feb 9, 2018)

toprank said:


> Thank you. This worked, too, but printed out duplicates.



That is because of the "uniq" command, which does not do what one expects.
I never use it; its name is misleading.

I will add that Ruby is also very good for this kind of thing. If you try it you will love it, especially if you come from Perl.


----------



## tingo (Feb 11, 2018)

uniq(1) works, but you have to sort(1) the lines of the input first. The unix way, you know.
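
To make the adjacency rule concrete, here is a small sketch on made-up input (the `sample_input` helper exists only for illustration). It also suggests why the `uniq -u` pipeline earlier in the thread gave surprising results: `-u` prints only lines that are not repeated, and without a prior sort, uniq compares neighbouring lines only.

```shell
#!/bin/sh
# Made-up four-line input: b, a, b, b
sample_input() { printf '%s\n' b a b b; }

sample_input | uniq            # prints b, a, b: only adjacent repeats collapse
sample_input | sort | uniq     # prints a, b: sorting makes duplicates adjacent
sample_input | sort | uniq -u  # prints a: only lines that occur exactly once
```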


----------



## Nicola Mingotti (Feb 11, 2018)

I never said "it does not work". I said "uniq" has a misleading name, and I still believe that.
It is true that it is not spelled "unique", but you read it like that; it is misleading.

It would be like calling a command "maximum" and then finding that, oh no, "maximum" works only if
its input is sorted. In that case it should not be called "maximum".

BTW, AFAIR (I studied this a long time ago, so I may be talking nonsense now), if you need to work
on "n" lines, a simple "unique" operation (using a hash table) would take O(n), while a sort + unique
takes O(n*log(n)) + O(n). [disregarding space, for now]

I don't know why "uniq" was implemented the way it is; maybe someone a bit older
knows the rationale. If it had been my decision, I would have made "uniq" do a real "unique"
operation, and maybe "uniq -a" would operate on adjacent lines.
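
For what it's worth, a single-pass "unique" along these lines can be sketched with awk, using the same `!array[key]++` idiom as the first post in this thread. The function name `unique_lines` is hypothetical, and the array holds every distinct line in memory, which is the space cost set aside above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct line once, in order of first
# appearance, without sorting. awk keeps one array entry per distinct line.
unique_lines() {
    awk '!seen[$0]++' "$@"
}

# Example on made-up input:
printf '%s\n' b a b c a | unique_lines   # prints b, a, c
```

Fed from the grep shown earlier in the thread, for instance `grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | unique_lines`, this would list each address once in the order it first appears.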


----------



## Nicola Mingotti (Feb 11, 2018)

For example, with Ruby:

Create a file of 10M lines, where each line is a random number:

```
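# Write 10 million lines, each a random integer between 0 and 9999.
# The block form of File.open closes the file automatically.
File.open("data.txt", "w") do |f|
  10_000_000.times do
    f.puts Random.rand(10000)
  end
end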
```

Then create a true "unique" command called "unique.rb":

```
#!/usr/local/bin/ruby
# Print each distinct input line once, in order of first appearance.
seen = {}
while line = gets
  unless seen.key?(line)
    seen[line] = true
    puts line
  end
end
```

Now compare "unique.rb" with sort + uniq

```
time cat data.txt | sort -n | uniq > data2.txt
real    1m31.001s
user    1m17.652s
sys     0m4.327s

time cat data.txt | ./unique.rb > data3.txt
real    0m8.550s
user    0m7.534s
sys     0m0.275s
```


----------

