A few shell commands for working with files: extracting the last column, deleting the first line, diff, and more


 
Delete the first line of a file: sed '1d' filename
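A small self-contained demo, run on a throwaway file created with mktemp (the file and its contents are made up for illustration). sed '1d' writes the result to stdout and leaves the file unchanged; tail -n +2 gives the same output; GNU sed's -i edits the file in place (BSD/macOS sed needs -i '' instead, so that line assumes GNU sed).

```shell
#!/bin/bash
# Demo: three ways to drop the first line of a file.
tmp=$(mktemp)
printf 'header\nline1\nline2\n' > "$tmp"

# 1) sed: delete line 1 and print the rest to stdout (file is unchanged)
via_sed=$(sed '1d' "$tmp")

# 2) tail: start printing from line 2 (same result)
via_tail=$(tail -n +2 "$tmp")

# 3) GNU sed -i: delete line 1 in the file itself
sed -i '1d' "$tmp"
in_place=$(cat "$tmp")

printf '%s\n' "$via_sed" "$via_tail" "$in_place"
rm -f "$tmp"
```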
 
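Print the last column of a file: awk '{print $NF}' filename ($NF is the last field, so this extracts the last column rather than deleting it)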
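In awk, NF is the number of fields on the current line and $NF is the value of the last one, so awk '{print $NF}' keeps only the last column. The sketch below contrasts that with actually deleting the last column by printing fields 1 through NF-1. The sample input is made up, and note that rebuilding the line this way joins fields with OFS (a single space by default), so runs of spaces or tabs in the original are not preserved.

```shell
#!/bin/bash
# Demo: extracting vs. deleting the last whitespace-separated column.
input=$'a b c\nd e f'

# $NF is the last field, so this *extracts* the last column
last_col=$(printf '%s\n' "$input" | awk '{print $NF}')

# To *delete* the last column, print fields 1..NF-1 joined by OFS
without_last=$(printf '%s\n' "$input" | awk '{
  out = ""
  for (i = 1; i < NF; i++) out = out (i > 1 ? OFS : "") $i
  print out
}')

echo "$last_col"
echo "$without_last"
```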
 
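Two ways to compare files:
 
1) comm -3 --nocheck-order file1 file2: hides the lines common to both files and shows lines found only in file1 (first column) and lines found only in file2 (second column, indented by a tab). comm assumes both inputs are sorted; --nocheck-order only suppresses the sortedness check, so sort the files first.
2) grep -v -f file1 file2: prints the lines of file2 that do not appear in file1. Each line of file1 is used as a regular expression matched anywhere in the line; to compare whole lines literally, use grep -v -x -F -f file1 file2.
 
And of course there is diff file1 file2.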
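A small demo of both approaches on two made-up files in a temporary directory. It sorts the inputs for comm with bash process substitution (a bash feature, not POSIX sh), and shows how plain grep -v -f over-matches: the pattern apple also filters out apple_pie, while -x -F keeps it.

```shell
#!/bin/bash
# Demo: comparing two line-based files with comm and grep.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/file1"
printf 'banana\ncherry\ndate\napple_pie\n' > "$dir/file2"

# comm needs sorted input, so sort both files first.
# -3 hides common lines; lines only in file2 are prefixed with a tab.
only_either=$(comm -3 <(sort "$dir/file1") <(sort "$dir/file2"))

# Lines of file2 not in file1.
# Without -x -F, "apple" is a regex that also matches inside "apple_pie".
loose=$(grep -v -f "$dir/file1" "$dir/file2")
exact=$(grep -v -x -F -f "$dir/file1" "$dir/file2")

printf '%s\n' "$only_either" "---" "$loose" "---" "$exact"
rm -r "$dir"
```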
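In scripts, the exit status of diff is often more useful than its output: it is 0 when the files are identical, 1 when they differ, and 2 on trouble (for example, a missing file). The sketch below uses two made-up files to show the default output, the unified format (-u), and a status check with -q.

```shell
#!/bin/bash
# Demo: diff output formats and exit status.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/file1"
printf 'a\nB\nc\n' > "$dir/file2"

# Default ("normal") format: "2c2" means line 2 was changed
normal=$(diff "$dir/file1" "$dir/file2")
echo "$normal"

# Unified format, the one most patch tools expect
diff -u "$dir/file1" "$dir/file2"

# -q only reports whether the files differ; check the exit status
diff -q "$dir/file1" "$dir/file2" > /dev/null
rc=$?
case $rc in
  0) status=same ;;
  1) status=different ;;
  *) status=error ;;
esac
echo "$status"
rm -r "$dir"
```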
 
Here is a shell script I wrote yesterday:
 
#!/bin/bash
# Find news files (from yesterday's and today's HDFS directories) that have not
# been processed yet, then build a comma-separated list of input files.

HADOOP=/opt/hadoop/program/bin/hadoop
WORK=/home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining
MID=$WORK/mid_files

date_time=$(date +'%H_%M_%S')
yesterday=$(date -d '-1 day' +'%Y_%m_%d')   # GNU date syntax
today=$(date +'%Y_%m_%d')
date_day_time=$(date +'%Y_%m_%d_%H_%M_%S')

mkdir -p "$WORK/same_similiar_log/$today"

# begin to get input files which haven't been dealt with
today_input=/home/crawler/petabyte/crawllog/news_data/$today
yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday

# Each `hadoop fs -ls` listing starts with a header line ("Found N items"),
# so strip the first line of each listing separately; a single sed '1d' on the
# concatenated output would leave the second header behind.
"$HADOOP" fs -ls "$yesterday_input/" | sed '1d' > "$MID/all_get_without_first_line"
"$HADOOP" fs -ls "$today_input/" | sed '1d' >> "$MID/all_get_without_first_line"

# The last column of each listing line is the file path
awk '{print $NF}' "$MID/all_get_without_first_line" > "$MID/all_input"

# input_done holds the paths handled by earlier runs; create it on the first run
touch "$MID/input_done"

#comm -3 --nocheck-order "$MID/all_input" "$MID/input_done" > "$MID/today_diff"

# Keep paths from all_input that are not in input_done
# (-F: literal strings, not regexes; -x: whole-line matches only)
grep -v -x -F -f "$MID/input_done" "$MID/all_input" > "$MID/today_diff"

awk '{print $NF}' "$MID/today_diff" > "$MID/today_new_input"

mv "$MID/all_input" "$MID/input_done"

# begin to compute same_similar_news
inputfile1=""
while IFS= read -r line
do
  inputfile1=$inputfile1,$line
done < "$MID/input_done"
inputfile1=${inputfile1#,}   # drop the leading comma left by the loop
echo "$inputfile1"
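The while loop at the end of the script joins every line of a file into one comma-separated string (presumably to pass several input paths to the job in one argument, which is an assumption about the script's purpose). The standard paste utility does the same in one command: -s joins all lines of a file serially and -d ',' sets the separator, with no leading comma to strip. The paths below are hypothetical.

```shell
#!/bin/bash
# Demo: join the lines of a file into one comma-separated string.
tmp=$(mktemp)
# Hypothetical paths, for illustration only
printf '%s\n' /data/day1/part-00000 /data/day2/part-00000 > "$tmp"

# Loop version, then strip the leading comma
joined_loop=""
while IFS= read -r line; do
  joined_loop=$joined_loop,$line
done < "$tmp"
joined_loop=${joined_loop#,}

# paste -s: join all lines of the file; -d ',': use a comma between them
joined_paste=$(paste -s -d ',' "$tmp")

echo "$joined_loop"
echo "$joined_paste"
rm -f "$tmp"
```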
 