A few shell commands for working with files: deleting the last column, deleting the first line, diff, and more

Shared by LinuxBoy on 2019-03-24 05:03:02

Delete the first line of a file: sed '1d' filename (the result goes to standard output; the file itself is not modified)
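A minimal sketch of the command above, run on a made-up sample file (the file contents and variable names are assumptions for illustration only). It also shows tail -n +2, which should give the same result:

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf 'header\nrow1\nrow2\n' > "$tmp"

# sed '1d' prints everything except line 1; the file itself is unchanged.
without_first=$(sed '1d' "$tmp")

# tail -n +2 starts the output at line 2, which gives the same lines.
with_tail=$(tail -n +2 "$tmp")

# The original file still has all three lines.
line_count=$(wc -l < "$tmp" | tr -d ' ')

echo "$without_first"
rm -f "$tmp"
```

To keep the result, redirect the output to a new file, as the script further down does.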

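Extract the last column of a file: awk '{print $NF}' filename (note that this prints only the last column; to delete the last column instead, awk '{NF--; print}' filename works in GNU awk)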
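The difference between printing the last column and dropping it can be sketched as follows. The sample text is made up, and the loop is just one portable way to drop the last field on awk versions that do not rebuild the line when NF is decremented:

```shell
# Hypothetical two-line sample with three whitespace-separated columns.
sample='a b c
d e f'

# $NF is the last field, so this keeps ONLY the last column.
last_col=$(printf '%s\n' "$sample" | awk '{print $NF}')

# Rebuild each line from fields 1..NF-1, which drops the last column.
without_last=$(printf '%s\n' "$sample" |
  awk '{out = ""; for (i = 1; i < NF; i++) out = out (i > 1 ? OFS : "") $i; print out}')

printf 'last column:\n%s\n' "$last_col"
printf 'without last column:\n%s\n' "$without_last"
```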

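Two ways to compare files:

1) comm -3 --nocheck-order file1 file2: outputs the lines that appear in only one of the two files (lines common to both are suppressed). comm expects sorted input; --nocheck-order is a GNU option that turns off the sort-order check

2) grep -v -f file1 file2: outputs the lines of file2 that match no line of file1. Each line of file1 is used as a regular-expression pattern that may match anywhere in a line; add -F -x for exact whole-line comparison

And of course there is diff file1 file2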
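The comparison methods above can behave differently on the same data. The following sketch uses two made-up files (names and contents are assumptions) to show comm -13, plain grep -v -f, and the stricter grep -F -x -v -f side by side:

```shell
# Hypothetical sample data, already sorted, created only for this demonstration.
dir=$(mktemp -d)
printf 'a.log\nb.log\nc.log\n' > "$dir/file1"
printf 'b.log\nb.log.bak\nc.log\nd.log\n' > "$dir/file2"

# comm expects sorted input; -13 keeps only the lines unique to file2.
only_in_file2=$(LC_ALL=C comm -13 "$dir/file1" "$dir/file2")

# Plain grep -v -f treats each line of file1 as a regex that may match
# anywhere in a line, so the pattern "b.log" also filters out "b.log.bak".
grep_loose=$(grep -v -f "$dir/file1" "$dir/file2")

# -F (fixed strings) plus -x (whole-line match) compares lines exactly.
grep_exact=$(grep -F -x -v -f "$dir/file1" "$dir/file2")

printf 'comm -13:\n%s\n' "$only_in_file2"
printf 'grep -v -f:\n%s\n' "$grep_loose"
printf 'grep -F -x -v -f:\n%s\n' "$grep_exact"
rm -rf "$dir"
```

With this data, the plain grep form drops b.log.bak even though file1 has no such line, while the -F -x form agrees with comm. Unlike comm, grep does not need sorted input.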

Here is a shell script I wrote yesterday:

#!/bin/bash

# Timestamps (date -d is a GNU date option).
date_time=$(date +'%H_%M_%S')
yesterday=$(date -d "-1 day" +'%Y_%m_%d')
today=$(date +'%Y_%m_%d')
date_day_time=$(date +'%Y_%m_%d_%H_%M_%S')

# Shared locations, defined once instead of repeating the full paths.
work_dir=/home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining
mid_dir=$work_dir/mid_files
hadoop=/opt/hadoop/program/bin/hadoop

mkdir -p "$work_dir/same_similiar_log/$today"

# Collect the input files that have not been processed yet.
today_input=/home/crawler/petabyte/crawllog/news_data/$today
yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday

"$hadoop" fs -ls "$yesterday_input/" > "$mid_dir/all_get"
"$hadoop" fs -ls "$today_input/" >> "$mid_dir/all_get"

# Drop the first line of the listing, then keep only the last column (the path).
# Note: each hadoop fs -ls listing normally begins with its own "Found N items"
# header, and sed '1d' removes only the first one.
sed '1d' "$mid_dir/all_get" > "$mid_dir/all_get_without_first_line"
awk '{print $NF}' "$mid_dir/all_get_without_first_line" > "$mid_dir/all_input"

# Alternative comparison with comm, kept for reference:
#comm -3 --nocheck-order "$mid_dir/all_input" "$mid_dir/input_done" > "$mid_dir/today_diff"

# Keep the lines of all_input that match no line of input_done.
grep -v -f "$mid_dir/input_done" "$mid_dir/all_input" > "$mid_dir/today_diff"
awk '{print $NF}' "$mid_dir/today_diff" > "$mid_dir/today_new_input"
mv "$mid_dir/all_input" "$mid_dir/input_done"

# Build the comma-separated input list for the same/similar news computation.
# Note: as written, the result begins with a leading comma.
inputfile1=""
while read -r line
do
inputfile1=$inputfile1,${line}
done < "$mid_dir/input_done"

echo "$inputfile1"
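As a possible variation on the listing and joining steps of the script, the sketch below filters out every "Found N items" header instead of only the first line, and joins the paths with paste so that no leading comma appears. The listing is simulated and all paths in it are made up; this is not the author's method, only one way the same steps could be written:

```shell
# Simulated output of two concatenated `hadoop fs -ls` calls (made-up paths).
listing=$(mktemp)
printf '%s\n' \
  'Found 2 items' \
  '-rw-r--r--   3 crawler supergroup   1024 2019-03-22 23:10 /data/2019_03_22/part-0' \
  '-rw-r--r--   3 crawler supergroup   2048 2019-03-22 23:40 /data/2019_03_22/part-1' \
  'Found 1 items' \
  '-rw-r--r--   3 crawler supergroup    512 2019-03-23 00:05 /data/2019_03_23/part-0' \
  > "$listing"

# Skip every "Found N items" header, then keep the last column (the path).
paths=$(awk '!/^Found [0-9]+ items$/ {print $NF}' "$listing")

# paste -s joins all input lines into one; -d, sets a comma as the separator,
# so the result has no leading or trailing comma.
joined=$(printf '%s\n' "$paths" | paste -sd, -)

echo "$joined"
rm -f "$listing"
```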
