
What Linux terminal command can do a proximity text search?

Search a directory, recursively, for files that contain:
2 or more words,
within 99 characters of each other,
in any order.

Example 1:
search by city and first name:
Berlin
Bob

The above input should find a file containing:
Bob Smith
123 Main Street Apt. 101
Berlin Ohio USA 54321
Phone ...
email ...

This question was inspired by example 2, which works on a web page rather than on the local disk, where I need it:
Search for 2 words:
living
soul
https://www.biblegateway.com/quicksearch/?quicksearch=living+soul&resultspp=250&version=NIV

This finds 4 verses.
Here is one of them, with the 2 words in the reverse order of the input:
soul
living

Psalm 84:2
My soul yearns, even faints, for the courts of the Lord;
my heart and my flesh cry out for the living God.

Instead of the web page proximity text search above:

What Linux terminal command can do a proximity text search for 2 or more words on a local disk?

Using:
OS: Kubuntu 22.04.4 LTS x86_64

To display the above:
neofetch --stdout | grep 'OS:'

--

1 Answer


You can easily achieve this with a combination of 'find', '-exec' and 'grep'.

The logic is the following:

  1. Find all files recursively within a specific path/directory
  2. For each file, execute a grep command
  3. For each match, also print some lines after it (for example, 2).

The command (to run in your Linux shell) is:

find /some/path/ -type f -exec egrep -hi -A2 "bob|berlin" {} +
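The command above approximates proximity by lines (a match plus the 2 lines after it), not by the 99-character distance from the question. A closer sketch follows. It assumes GNU grep built with PCRE support (-P), plain alphanumeric search words, and that "within 99 characters" means at most 99 characters between the two words, line breaks included. proximity_files is a hypothetical helper name.

```shell
# Hypothetical helper: list files under DIR in which WORD1 and WORD2 occur,
# in either order, with at most MAX characters (default 99) between them.
# Assumes GNU grep with PCRE support and plain alphanumeric words
# (the words are inserted into the regex unescaped).
proximity_files() {
  local dir=$1 w1=$2 w2=$3 max=${4:-99}
  # -r: recurse   -l: print file names only   -i: ignore case
  # -z: NUL-separated records, so a normal text file is one record
  #     and a match may span several lines
  # -P: Perl-compatible regex; (?s) lets "." also match newlines,
  #     \b keeps "bob" from matching inside "bobby"
  grep -rlizP "(?s)\b${w1}\b.{0,${max}}\b${w2}\b|\b${w2}\b.{0,${max}}\b${w1}\b" -- "$dir"
}

# Example (same placeholder path as above):
#   proximity_files /some/path/ bob berlin
```

Both orders are spelled out in the regex because a single pattern cannot say "these two words in any order" more compactly; for 3 or more words the number of orderings grows quickly, so chaining separate greps (as in the alternatives below) scales better.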

Note that, with grep:

  • Option -A NUM: show NUM lines after the match
  • Option -B NUM: show NUM lines before the match
  • Option -C NUM: show NUM lines before and after the match
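A small illustration of these three options on made-up input (the sample function is hypothetical). Note that NUM counts lines, not characters.

```shell
# Hypothetical sample input: five lines, with "bob" on line 3.
sample() { printf 'line1\nline2\nbob\nline4\nline5\n'; }

sample | grep -A1 bob   # prints: bob, line4
sample | grep -B1 bob   # prints: line2, bob
sample | grep -C1 bob   # prints: line2, bob, line4
```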

Hope it helps.

Alternative 1

If you want to display the matched data only from files that contain BOTH of the values you are looking for, you may try this:

grep -lri "bob" $(grep -lri "berlin" /some/path/) | xargs egrep -Hi -A2 "bob|berlin"

Some explanations:

  • The piece of code grep -lri "bob" $(grep -lri "berlin" /some/path/) returns the paths of all files that contain both of the strings you are looking for (bob and berlin).
  • The piece of code xargs egrep -Hi -A2 "bob|berlin" takes those file paths as input (via xargs) and displays every line containing bob or berlin (case-insensitive), plus the 2 lines after it, each prefixed with the path of the file it comes from.
  • Option l (grep): print file names instead of contents.
  • Option r (grep): search recursively through a folder.
  • Option i (grep): case-insensitive. grep matches regardless of letter case (e.g. LetTerCaSe).
  • Option H (grep): prefix each output line with the path of the file it was matched in.
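To see what Alternative 1 prints, here is a sketch that runs it on a throwaway directory with made-up files (the alt1 function name and the sample data are hypothetical). It uses grep -E, the recommended spelling of egrep (newer GNU grep versions warn that egrep is obsolescent); the behavior is the same.

```shell
# Alternative 1 wrapped in a hypothetical function; $1 is the directory.
alt1() {
  grep -lri "bob" $(grep -lri "berlin" "$1") | xargs grep -EHi -A2 "bob|berlin"
}

# Made-up sample data in a temporary directory.
demo=$(mktemp -d)
printf 'Bob Smith\n123 Main Street Apt. 101\nBerlin Ohio USA 54321\nPhone ...\n' > "$demo/contact.txt"
printf 'Bob Jones\nParis France\n' > "$demo/other.txt"

# Run from inside the directory so the output paths stay short.
(cd "$demo" && alt1 .)
```

Only contact.txt is listed, because other.txt contains bob but not berlin. The output should be:

./contact.txt:Bob Smith
./contact.txt-123 Main Street Apt. 101
./contact.txt:Berlin Ohio USA 54321
./contact.txt-Phone ...

grep marks matching lines with ':' after the file name and context lines with '-'.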

Hope it helps more.

Alternative 2

The previous commands do not properly handle filenames containing spaces. The unquoted $(...) substitution and xargs both split such a name into two separate words, so grep fails with an error like: "No such file or directory".

The following command should resolve this issue by handling filenames properly (even if they contain spaces):
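Both the unquoted $(...) used in Alternative 1 and xargs split their input on whitespace. This can be reproduced with a made-up file name, without creating any files:

```shell
# A hypothetical file name that contains a space.
name='my contacts.txt'

# Unquoted $(...) is split on whitespace, so the name becomes 2 arguments.
set -- $(printf '%s\n' "$name")
echo "$# arguments"                             # prints: 2 arguments

# xargs also splits its input on blanks by default.
printf '%s\n' "$name" | xargs printf '[%s]\n'   # prints [my] then [contacts.txt]
```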

find /some/path -type f -exec grep -li bob {} + |xargs -I% grep -li berlin % |xargs -I% egrep -Hi -A2 "bob|berlin" %

This command line is not very efficient (with -I%, xargs runs a separate grep for each file), but it works.
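A possible refinement, sketched under the assumption of GNU grep and GNU xargs: pass file names NUL-separated (grep -Z, xargs -0) instead of one per line, so names with spaces, quotes or even newlines survive, and let each grep handle many files at once. It also accepts any number of words, which covers the "2 + word" part of the question. files_with_all and keep_with are hypothetical helper names.

```shell
# Hypothetical helper: read a NUL-separated list of file names on stdin and
# keep only those that contain every remaining WORD (case-insensitive).
keep_with() {
  if [ "$#" -eq 0 ]; then
    cat                      # no words left: pass the list through
    return
  fi
  local word=$1
  shift
  # xargs -0: NUL-separated input; -r: do not run grep on an empty list.
  # grep -l: names only, -i: ignore case, -Z: NUL after each name.
  xargs -0r grep -liZ -e "$word" -- | keep_with "$@"
}

# Hypothetical helper: NUL-separated list of files under DIR containing all WORDs.
# usage: files_with_all DIR WORD [WORD...]
files_with_all() {
  local dir=$1
  shift
  local first=$1
  shift
  grep -rliZ -e "$first" -- "$dir" | keep_with "$@"
}

# Example: show the matching lines, 2 lines of context and file names.
#   files_with_all /some/path bob berlin | xargs -0r grep -EHi -A2 'bob|berlin'
```

Like the commands above, this only checks that all words occur somewhere in the same file; it does not enforce the 99-character distance.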

  • petitradisgris wrote "Hope it helps." It partially helps. I used egrep --version (grep (GNU grep) 3.7) and command1: find . -type f -exec egrep -hi -A2 "bob|berlin" {} +; 1. command1 shows a false positive: it finds text with bob (string1) in a file that does not contain berlin (string2). 2. The output is not the file names but the contents showing bob and berlin, so it is a wall of text with no indication of which files contain that text.
    – joseph22
    Commented May 7 at 16:32
  • stackoverflow.com/questions/6637882/…
    – Destroy666
    Commented May 7 at 16:36
  • As for the mismatches, you'll need to provide accurate examples of what should be matched and what not.
    – Destroy666
    Commented May 7 at 16:36
  • See example 1.
    – joseph22
    Commented May 7 at 16:53
  • Hello joseph. Your comment helped me understand your need better, so I have edited my post. False positive 1 is resolved: only data from files containing BOTH strings is displayed. Issue 2 is resolved by the -H option of grep, which prefixes each matched line with its filename.
    Commented May 9 at 0:22
